From: David Cantrell
To: openguides-dev@lists.openguides.org
Subject: Re: [OGDev] Robot deterrence
Date: Wed, 25 Jun 2008 16:06:28 +0100
Message-ID: <20080625150628.GE19237@bytemark.barnyard.co.uk>
In-Reply-To: <20080625140759.GA478@the.earth.li>

On Wed, Jun 25, 2008 at 03:07:59PM +0100, Kake L Pugh wrote:

> There are a number of OpenGuides page types that web spiders don't
> really need to index, and we have code to stop them doing it.
>
> However, it doesn't seem to be working.  See for instance:
>   http://london.randomness.org.uk/wiki.cgi?action=list_all_versions;id=Locale%20IG9
>
> which if you view the source does indeed have the robots meta tag
> in the <head>.
>
> But from the Apache logs:
>   66.249.67.153 - - [25/Jun/2008:14:59:00 +0100] "GET /wiki.cgi?action=list_all_versions;id=Locale%20IG9 HTTP/1.1" 200 3151 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
>
> Am I missing something obvious?

Yes, what you're missing is that Google don't pay attention to robots.txt
or the meta thingy.  I expect that they cache it and then ignore changes
for some time.  Yahoo do the same.  And yes, I do have the logs to prove
that.  I've had to ban their IP ranges from cpandeps and from
wikiproxy.cantrell.org.uk.

-- 
David Cantrell | top google result for "topless karaoke murders"

  Godliness is next to Englishness