On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:
> There are a number of OpenGuides page types that web spiders don't
> really need to index, and we have code to stop them doing it, e.g.
>
> http://dev.openguides.org/changeset/573
> http://dev.openguides.org/changeset/1132
>
> However, it doesn't seem to be working. See for instance:
>
> http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%...
>
> which if you view the source does indeed have
>
> <meta name="robots" content="noindex,nofollow" /> in the <head>.
>
> But from the Apache logs:
>
> 66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
>
> Am I missing something obvious?
'noindex,nofollow' means: "Don't put this page's contents in Google's index, and don't follow any links from this page." Googlebot can't know that the page carries those instructions without fetching (crawling) it first.
"Crawling" and "Indexing" are two different things: the only way to have a page not be crawled is to: * Not have any links pointing to it anywhere that Google can get to * Including it in robots.txt.
Regards,