On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:
There are a number of OpenGuides page types that web spiders don't really need to index, and we have code to stop them doing it, e.g.
http://dev.openguides.org/changeset/573 http://dev.openguides.org/changeset/1132
However, it doesn't seem to be working. See for instance: http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%...
If you view the source of that page, it does indeed have
<meta name="robots" content="noindex,nofollow" /> in the <head>.
But from the Apache logs: 66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Am I missing something obvious?
In order to read the meta tag at all, the bot has to make a request for the page, so a fetch will still show up in the logs. It won't index that page or follow links from it, though.
It might be worth adding some rules to robots.txt to stop the requests entirely, but I don't know whether robots.txt patterns can match against request parameters.
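Actually, I think plain prefix rules would do it: robots.txt Disallow paths are matched as prefixes against everything after the hostname, including the query string, so `Disallow: /wiki.cgi?action=list_all_versions` should block the URL from the log above — though it does rely on `action` being the first parameter, as it is in OpenGuides URLs. (Googlebot also understands non-standard `*` wildcards, e.g. `Disallow: /*action=list_all_versions`, which drops that assumption.) A quick sketch of checking a candidate rule with Python's stdlib parser before deploying it — the rule and test URLs here are just illustrations, and the other noindexed action types would get their own Disallow lines:

```python
# Sketch: sanity-check candidate robots.txt rules with urllib.robotparser.
# Disallow paths are prefix-matched against the URL's path plus query
# string, so this rule assumes "action" is the first query parameter.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /wiki.cgi?action=list_all_versions
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The page Googlebot fetched above is now disallowed...
print(rp.can_fetch("Googlebot",
    "http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%20IG9"))

# ...while an ordinary node page (hypothetical name) stays crawlable.
print(rp.can_fetch("Googlebot",
    "http://london.randomness.org.uk/wiki.cgi?id=Some%20Node"))
```

Note that, unlike the meta tag, a robots.txt Disallow stops the requests themselves, which is what we'd want here.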
David