This one time, at band camp, Earle Martin wrote:
1) Somebody at 22.214.171.124 has been slurping the whole site, indexes,
pages, format varieties (rdf, raw) and all with WWW::Mechanize at the
rate of one request per second.
This is something we can fix! That IP == ivorw.vm.bytemark.co.uk.
Ivor, care to back off slurping for a bit?
a) and b) are going to happen this afternoon. c) d) and e) hopefully
within a day or two. f) as soon as e) is complete. If load *still*
goes too high after the spamwall has been raised, it may be indicative
of a deeper problem meriting heavy investigation, and I'll put the
site back into maintenance mode again.
I think a rudimentary publishing mode would be helpful. The front page
doesn't need to be built from the database every time, so having it
published might be good. I suspect putting the latest changes on there
causes a fair bit of pain.
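
A minimal sketch of what a rudimentary publishing mode could look like: render the page once, keep the result, and rebuild only when the copy is older than some limit or has been explicitly invalidated. This is Python for illustration only; `PublishedPage`, the `render` callable and the 300-second default are all hypothetical names and values, not anything taken from the site's code.

```python
import time
from typing import Callable, Optional


class PublishedPage:
    """Keep a rendered page and rebuild it only when it has gone stale."""

    def __init__(self, render: Callable[[], str], max_age: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render      # hypothetical: the expensive, database-backed renderer
        self._max_age = max_age    # assumed staleness limit, in seconds
        self._clock = clock        # injectable so the behaviour can be tested
        self._html: Optional[str] = None
        self._built_at = 0.0

    def get(self) -> str:
        """Return the published copy, rebuilding it first if it is missing or stale."""
        now = self._clock()
        if self._html is None or now - self._built_at >= self._max_age:
            self._html = self._render()
            self._built_at = now
        return self._html

    def invalidate(self) -> None:
        """Force a rebuild on the next request, e.g. after an edit."""
        self._html = None
```

With something like this in front of the front page, a burst of requests costs one database-backed render per interval rather than one per hit, at the price of the latest changes being up to `max_age` seconds out of date.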
Incidental thought: since robots.txt doesn't allow
we should put 'rel="nofollow"' on all our links to resources useless
to search engines. I believe the better-behaved robots should respect
Indeed, this should most certainly be done for all revisions of a
page except the current, so that if someone reverts spam without the
admin password, it's not indexed by the crawlers.
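
As a rough illustration of the rule above, a link builder could add rel="nofollow" whenever the link points at a version other than the current one. This is a Python sketch under assumptions: the function name `revision_link` and the `id` and `version` query parameters are made up for the example and may not match the real URL scheme. Note also that nofollow is only a hint that well-behaved crawlers honour.

```python
from html import escape
from typing import Optional
from urllib.parse import urlencode


def revision_link(base_url: str, node: str, text: str,
                  version: Optional[int] = None,
                  current_version: Optional[int] = None) -> str:
    """Build an HTML link to a page, marking old revisions rel="nofollow"."""
    params = {"id": node}  # hypothetical parameter names
    if version is not None:
        params["version"] = version
    # A link counts as "old" only when it names a version that is not current.
    is_old = version is not None and version != current_version
    href = base_url + "?" + urlencode(params)
    rel = ' rel="nofollow"' if is_old else ""
    return '<a href="%s"%s>%s</a>' % (escape(href, quote=True), rel, escape(text))
```

A link to the current page comes out unchanged, while a link to an older version carries the attribute, so a reverted spam revision would not be advertised to crawlers that respect it.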
PS: I strongly doubt it's Google causing problems. Google is a very
well behaved bot. Others like the MSN one are much less well behaved.
Rev Simon Rumble <simon(a)rumble.net>
The Tourist Engineer
Geeks need vacations too.
Politics is the gentle art of getting votes from the poor
and campaign funds from the rich by promising to protect
each from the other.
- Politicians and Other Scoundrels by Ferdinand Lundberg