Re: [OGDev] Robot deterrence

25 Jun 2008


On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:
...
> There are a number of OpenGuides page types that web spiders don't
> really need to index, and we have code to stop them doing it.
> However, it doesn't seem to be working.  See for instance:
>   http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%...
> which if you view the source does indeed have
>   <meta name="robots" content="noindex,nofollow" />
> in the <head>.
> But from the Apache logs:
>   66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> Am I missing something obvious?
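
For anyone wanting to count such hits rather than spot them by eye, here is a rough sketch in Python. It assumes the logs are in Apache's "combined" format, as the line quoted above appears to be. The names crawler_hit, NOINDEX_ACTIONS and CRAWLER_MARKERS are made up for illustration and are not part of OpenGuides; the set of actions and the user-agent substrings are assumptions to adjust for a real site.

```python
import re
from urllib.parse import unquote

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: actions whose pages carry the noindex,nofollow meta tag.
NOINDEX_ACTIONS = {"list_all_versions"}
# Assumption: substrings that identify the crawlers of interest.
CRAWLER_MARKERS = ("Googlebot", "Slurp")


def parse_query(path):
    """Return the query parameters of a request path as a dict.

    Pairs are split on ';' as well as '&', since the URLs in this
    thread use ';' as the separator.
    """
    _, _, query = path.partition("?")
    params = {}
    for pair in re.split(r"[;&]", query):
        if pair:
            key, _, value = pair.partition("=")
            params[key] = unquote(value)
    return params


def crawler_hit(line):
    """Return details if the line is a crawler request for a noindex page,
    otherwise None."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    agent = match.group("agent")
    if not any(marker in agent for marker in CRAWLER_MARKERS):
        return None
    params = parse_query(match.group("path"))
    if params.get("action") not in NOINDEX_ACTIONS:
        return None
    return {
        "host": match.group("host"),
        "time": match.group("time"),
        "action": params["action"],
        "id": params.get("id"),
    }
```

Running crawler_hit over each line of an access log and collecting the non-None results gives a list of crawler fetches of pages that were meant to be skipped.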
Yes, what you're missing is that Google don't pay attention to
robots.txt or the meta thingy.  I expect that they cache it and then
ignore changes for some time.
Yahoo do the same.
And yes, I do have the logs to prove that.  I've had to ban their IP
ranges from cpandeps and from wikiproxy.cantrell.org.uk.
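
As an illustration of the range check behind such a ban, here is a minimal Python sketch using the standard ipaddress module. The BANNED_RANGES list and the is_banned name are hypothetical. The single range shown is a placeholder picked only because it contains the address in the log line quoted above; it is not a verified list of any crawler's addresses, and it is not the set of ranges actually banned on the sites mentioned.

```python
import ipaddress

# Placeholder range for illustration only (see the note above).
BANNED_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),
]


def is_banned(addr):
    """Return True if addr falls inside any banned network."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        # Not a valid IP address, so it cannot match a range.
        return False
    return any(ip in network for network in BANNED_RANGES)
```

In practice a ban like this is more likely to live in the web server or firewall configuration; the sketch only shows the membership test itself.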
-- 
David Cantrell | top google result for "topless karaoke murders"

    Godliness is next to Englishness
