On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:
There are a number of OpenGuides page types that web spiders don't really need to index, and we have code to stop them doing it.
However, it doesn't seem to be working. See for instance: http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%...
which, if you view the source, does indeed have
<meta name="robots" content="noindex,nofollow" /> in the <head>.
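One quick way to confirm what a spider would see is to pull the robots directives out of the served HTML. The following is a minimal sketch using only Python's standard-library `html.parser`. It is not the OpenGuides code (which is Perl), and the helper name `robots_directives` is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from any <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing tags
        # like <meta ... /> are routed here by the default handle_startendtag.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots directives (e.g. {"noindex"}) found in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = ('<html><head><meta name="robots" content="noindex,nofollow" />'
        '</head><body></body></html>')
print(sorted(robots_directives(page)))
```

Run against the head shown above, this reports both `noindex` and `nofollow`. A page without the tag yields an empty set.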
But from the Apache logs: 66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
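To find every such request rather than spotting them by eye, each log line can be split into fields and its query string decoded. The sketch below assumes Apache's combined log format, as in the line above, and accepts both `;` and `&` as query separators because OpenGuides URLs use `;`. The set `NOINDEX_ACTIONS` is a hypothetical stand-in: only `list_all_versions` is confirmed above to carry the noindex tag.

```python
import re
from urllib.parse import unquote, urlsplit

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical: actions whose pages are served with noindex,nofollow.
NOINDEX_ACTIONS = {"list_all_versions"}


def parse_line(line):
    """Parse one combined-format log line into a dict, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    params = {}
    # Split the query on either ';' or '&' and percent-decode keys and values.
    for pair in re.split(r"[;&]", urlsplit(rec["path"]).query):
        if pair:
            key, _, value = pair.partition("=")
            params[unquote(key)] = unquote(value)
    rec["params"] = params
    return rec


def is_bot_hit_on_noindex_page(rec, bot="Googlebot"):
    """True if the user agent mentions `bot` and the action is in NOINDEX_ACTIONS."""
    return bot in rec["agent"] and rec["params"].get("action") in NOINDEX_ACTIONS


line = ('66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] '
        '"GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" '
        '200 3151 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
rec = parse_line(line)
print(rec["ip"], rec["params"], is_bot_hit_on_noindex_page(rec))
```

Matching on the user-agent string is only a heuristic, since anything can claim to be Googlebot.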
Am I missing something obvious?
Yes, what you're missing is that Google don't pay attention to robots.txt or the meta thingy. I expect that they cache it and then ignore changes for some time.
Yahoo do the same.
And yes, I do have the logs to prove that. I've had to ban their IP ranges from cpandeps and from wikiproxy.cantrell.org.uk.
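Banning by IP range comes down to a CIDR membership test. Here is a minimal sketch with Python's standard `ipaddress` module. The range in `BANNED_RANGES` is purely illustrative: it was picked because it contains the address in the log line quoted earlier. It is not a statement of which ranges Google uses or which ranges were actually banned on cpandeps or wikiproxy.

```python
import ipaddress

# Illustrative only: a hypothetical ban list. 66.249.64.0/19 spans
# 66.249.64.0 to 66.249.95.255, which includes 66.249.67.153.
BANNED_RANGES = [ipaddress.ip_network("66.249.64.0/19")]


def is_banned(addr):
    """Return True if addr (a string) falls inside any banned network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BANNED_RANGES)


print(is_banned("66.249.67.153"), is_banned("192.0.2.1"))
```

In practice the same check would normally live in the web server's own access-control configuration rather than in application code.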