This one time, at band camp, Earle Martin wrote:
1) Somebody at 80.68.93.162 has been slurping the whole site, indexes,
pages, format varieties (rdf, raw) and all with WWW::Mechanize at the
rate of one request per second.
This is something we can fix! That IP == ivorw.vm.bytemark.co.uk.
Ivor, care to back off slurping for a bit?
a) and b) are going to happen this afternoon; c), d) and e) hopefully
within a day or two; f) as soon as e) is complete. If load *still*
goes too high after the spamwall has been raised, that may indicate a
deeper problem meriting heavy investigation, and I'll put the site
back into maintenance mode.
I think a rudimentary publishing mode would be helpful. The front page
doesn't need to be built from the database every time, so publishing
it as a static page might be good. I suspect putting the latest changes on there
causes a fair bit of pain.
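To illustrate the publishing idea, here is a minimal sketch in Python
(not the wiki's own code). It assumes a hypothetical `render` callable
that builds the front page from the database and an assumed refresh
interval `max_age`. The page is only rebuilt when the published copy is
missing or stale; otherwise the static file is served as-is.

```python
import os
import tempfile
import time
from typing import Callable


def published_page(path: str, render: Callable[[], str], max_age: float = 300.0) -> str:
    """Return the published copy of a page, re-rendering only when stale.

    `render` is a hypothetical stand-in for whatever builds the page from
    the database; `max_age` is an assumed refresh interval in seconds.
    """
    try:
        age = time.time() - os.path.getmtime(path)
        if age < max_age:
            # Fresh enough: serve the static copy without touching the database.
            with open(path, encoding="utf-8") as fh:
                return fh.read()
    except FileNotFoundError:
        pass  # nothing published yet, so build it now

    html = render()
    # Write to a temp file and rename it over the old copy, so readers
    # never see a half-written page.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the web server read it
    os.replace(tmp, path)
    return html
```

The expensive bit (e.g. the latest-changes list) then runs at most once
per `max_age` seconds, however many requests hit the front page.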
Incidental thought: since robots.txt doesn't allow wildcards, perhaps
we should put 'rel="nofollow"' on all our links to resources that are
useless to search engines. I believe the better-behaved robots should
respect this.
Indeed, this should most certainly be done for all revisions of a
page except the current one, so that if someone reverts spam without
the admin password, the spammed revision isn't indexed by the crawlers.
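A minimal sketch of the link rule, again in Python rather than the
wiki's own code, assuming a hypothetical `?id=...;version=...;format=...`
URL scheme and a hypothetical `revision_link` helper: old revisions and
the alternate formats (rdf, raw) get rel="nofollow", while the current
page is linked normally.

```python
from html import escape
from typing import Optional
from urllib.parse import quote


def revision_link(page: str, text: str, version: Optional[int] = None,
                  current_version: Optional[int] = None,
                  fmt: Optional[str] = None) -> str:
    """Build an <a> tag, adding rel="nofollow" where crawlers gain nothing.

    The ?id=...;version=...;format=... URL scheme is hypothetical.
    """
    href = "?id=" + quote(page)
    nofollow = False
    if version is not None:
        href += f";version={version}"
        # Any revision other than the current one may still contain spam.
        nofollow = version != current_version
    if fmt is not None:
        href += ";format=" + quote(fmt)
        # Alternate formats (rdf, raw) are useless to search engines.
        nofollow = True
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a href="{escape(href)}"{rel}>{escape(text)}</a>'
```

For example, `revision_link("Home", "version 2", version=2, current_version=3)`
gives `<a href="?id=Home;version=2" rel="nofollow">version 2</a>`.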
PS: I strongly doubt it's Google causing problems. Google is a very
well-behaved bot. Others, like MSN's, are much less well behaved.
--
Rev Simon Rumble <simon(a)rumble.net>
www.rumble.net
The Tourist Engineer
Geeks need vacations too.
http://engineer.openguides.org/
Politics is the gentle art of getting votes from the poor
and campaign funds from the rich by promising to protect
each from the other.
- Politicians and Other Scoundrels by Ferdinand Lundberg