
05/27/11
Paul Savage

Removing sitemap.xml & robots.txt from the SERPs


It can happen that your sitemap.xml or your robots.txt file finds its way into the index. Run the query site:yourdomain.com filetype:xml to see which XML files from your domain are listed. Here is an example of some files indexed for the domain court.us.

It’s probably not what you really want, as basically it’s just trash in the SERPs. To fix this and remove the files from the SERPs, you can add a few lines to your .htaccess file that send the proper X-Robots-Tag header. The Header directive used here requires Apache’s mod_headers module.

For your .htaccess file

<FilesMatch "sitemap\.xml">
Header set X-Robots-Tag "noindex"
</FilesMatch>

<FilesMatch "robots\.txt">
Header set X-Robots-Tag "noindex"
</FilesMatch>
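
If you prefer a single block, the two rules above can be combined. This is a minimal sketch, assuming mod_headers may or may not be loaded on your server; the IfModule guard and the anchored pattern are additions for illustration and are not part of the snippet above. Without the guard, a Header line in .htaccess causes a server error when mod_headers is missing.

```apache
# Assumption: .htaccess overrides are enabled for this directory (AllowOverride FileInfo).
# The whole block is skipped if mod_headers is not loaded.
<IfModule mod_headers.c>
    # Match exactly sitemap.xml or robots.txt (anchored, unlike the patterns above).
    <FilesMatch "^(sitemap\.xml|robots\.txt)$">
        Header set X-Robots-Tag "noindex"
    </FilesMatch>
</IfModule>
```

Note that the anchored pattern will not match variants such as sitemap.xml.gz, so loosen it if you serve those.
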
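This method can also be used to control how Word documents or similar files appear in the index. The example below keeps .doc files indexed but tells search engines not to show a cached copy or a snippet; use "noindex" instead to remove them from the index entirely.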

<FilesMatch "\.doc$">
Header set X-Robots-Tag "index, noarchive, nosnippet"
</FilesMatch>
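
To cover several "similar" file types with one rule, the pattern can list more than one extension. This is only a sketch, assuming you want these files dropped from the index entirely (hence "noindex"); the extension list is a made-up example, not a recommendation.

```apache
# Hypothetical extension list; adjust it to the file types you actually serve.
<FilesMatch "\.(doc|docx|pdf)$">
    # noindex asks search engines to drop these files from the index entirely.
    Header set X-Robots-Tag "noindex"
</FilesMatch>
```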

 

To check your headers

http://redbot.org/ is a handy tool for checking HTTP response headers, cache control and content types, so you can confirm that the X-Robots-Tag header is actually being sent. Its code is open source, so you can run a version on your own server.

Thanks

Thanks to some people like Carlo Zottmann, JohnMu & Paul Cawley for giving me some pointers on this. :)