On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:
There are a number of OpenGuides page types that web spiders don't really need to index, and we have code to stop them doing it, e.g.
http://dev.openguides.org/changeset/573 http://dev.openguides.org/changeset/1132
However, it doesn't seem to be working. See for instance: http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%...
If you view the source of that page, it does indeed have
<meta name="robots" content="noindex,nofollow" /> in the <head>.
But from the Apache logs: 66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Am I missing something obvious?
In order to read the meta tag at all, the bot has to make a request for the page, so a fetch will still show up in the logs. It won't index that page or follow links from it, though.
It might be worth adding some rules to robots.txt to stop the requests entirely, but I don't know whether robots.txt patterns can match against request parameters.
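Actually, I think plain prefix rules would do it: robots.txt Disallow paths are matched as prefixes against everything after the hostname, including the query string, so `Disallow: /wiki.cgi?action=list_all_versions` should block the URL from the log above — though it does rely on `action` being the first parameter, as it is in OpenGuides URLs. (Googlebot also understands non-standard `*` wildcards, e.g. `Disallow: /*action=list_all_versions`, which drops that assumption.) A quick sketch of checking a candidate rule with Python's stdlib parser before deploying it — the rule and test URLs here are just illustrations, and the other noindexed action types would get their own Disallow lines:

```python
# Sketch: sanity-check candidate robots.txt rules with urllib.robotparser.
# Disallow paths are prefix-matched against the URL's path plus query
# string, so this rule assumes "action" is the first query parameter.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /wiki.cgi?action=list_all_versions
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The page Googlebot fetched above is now disallowed...
print(rp.can_fetch("Googlebot",
    "http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%20IG9"))

# ...while an ordinary node page (hypothetical name) stays crawlable.
print(rp.can_fetch("Googlebot",
    "http://london.randomness.org.uk/wiki.cgi?id=Some%20Node"))
```

Note that, unlike the meta tag, a robots.txt Disallow stops the requests themselves, which is what we'd want here.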
David