From dave@earth.li Wed Jun 25 15:14:10 2008
From: David Sheldon
To: openguides-dev@lists.openguides.org
Subject: Re: [OGDev] Robot deterrence
Date: Wed, 25 Jun 2008 15:14:05 +0100
Message-ID: <20080625141405.GE10103@ox.compsoc.net>
In-Reply-To: <20080625140759.GA478@the.earth.li>

On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:
> There are a number of OpenGuides page types that web spiders don't
> really need to index, and we have code to stop them doing it, e.g.
>
> http://dev.openguides.org/changeset/573
> http://dev.openguides.org/changeset/1132
>
> However, it doesn't seem to be working. See for instance:
> http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%20IG9
>
> which if you view the source does indeed have
>     <meta name="robots" content="noindex,nofollow">
> in the <head>.
>
> But from the Apache logs:
> 66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
>
> Am I missing something obvious?

In order to read the meta tags, the bot has to make a request for the
page, so the fetch will still show up in the logs. It won't index the
page or follow links from it, though.

It might be worth adding rules to robots.txt to stop the requests
altogether, but I don't know whether robots.txt patterns can match on
request parameters (see the P.S. for a sketch).

David
-- 
"I think 'small and fluffy' is a good term, which should be used more
 often" --Andie
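
P.S. A sketch of the kind of robots.txt rule I mean (untested, and it
assumes the crawler matches Disallow values against the query string
as well as the path; Google document this behaviour for Googlebot, but
the original robots exclusion standard only talks about path
prefixes). It also relies on "action" being the first parameter in the
query string, as it is in the URLs above:

    User-agent: *
    Disallow: /wiki.cgi?action=list_all_versions

Googlebot also treats "*" as a wildcard, so a line like

    Disallow: /wiki.cgi?*action=list_all_versions

would catch the parameter wherever it appears in the query string,
though the wildcard is a non-standard extension that other crawlers
may ignore.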