On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:
> There are a number of OpenGuides page types that web spiders don't
> really need to index, and we have code to stop them doing it, e.g.
>
> http://dev.openguides.org/changeset/573
> http://dev.openguides.org/changeset/1132
>
> However, it doesn't seem to be working. See for instance:
>
> http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%...
>
> which if you view the source does indeed have
>
> <meta name="robots" content="noindex,nofollow" /> in the <head>.
>
> But from the Apache logs:
>
> 66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
>
> Am I missing something obvious?
'noindex,nofollow' means: "Don't put this page's contents in Google's index, and don't follow any links from this page." Googlebot can't know that the page carries those instructions without fetching (crawling) it first.
"Crawling" and "Indexing" are two different things: the only way to have a page not be crawled is to: * Not have any links pointing to it anywhere that Google can get to * Including it in robots.txt.
Regards,