[OGDev] URGENT Re: [OpenGuides-Dev] OG performance

Rev Simon Rumble simon at rumble.net
Thu Jul 26 01:24:50 BST 2007


This one time, at band camp, Earle Martin wrote:

> 1) Somebody at 80.68.93.162 has been slurping the whole site, indexes,
> pages, format varieties (rdf, raw) and all with WWW::Mechanize at the
> rate of one request per second.

This is something we can fix!  That IP resolves to ivorw.vm.bytemark.co.uk.
Ivor, care to back off slurping for a bit?
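
For anyone mirroring the site, here is a minimal sketch of what "backing off" could look like on the client side. It is an illustration only, not what Ivor's script does: the function name polite_fetch, the fetch callback and the 10 second default gap are all assumptions of mine, and the right delay is whatever the site admins ask for.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def polite_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    min_delay: float = 10.0,                 # assumed gap; pick what the admins ask for
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, body) pairs, leaving at least min_delay seconds
    between the start of one request and the start of the next."""
    last_start = None
    for url in urls:
        if last_start is not None:
            # Sleep only for whatever part of the gap has not already elapsed.
            wait = min_delay - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield url, fetch(url)
```

The fetch callback is whatever does the actual HTTP request; sleep and clock are parameters only so the delay logic can be checked without really waiting.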

> a) and b) are going to happen this afternoon. c) d) and e) hopefully
> within a day or two. f) as soon as e) is complete. If load *still*
> goes too high after the spamwall has been raised, it may be indicative
> of a deeper problem meriting heavy investigation, and I'll put the
> site back into maintenance mode again.

I think a rudimentary publishing mode would be helpful.  The front page
doesn't need to be built from the database on every request, so serving a
pre-built (published) copy would cut the load.  I suspect including the
latest changes on it causes a fair bit of the pain.
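
To make the idea concrete, here is a rough sketch of a publishing mode as a time-based cache. It is not OpenGuides code: the class name PublishedPage, the render callback and the 300 second max_age are hypothetical, and a real version would more likely write a static file to disk and rebuild it when a page is edited.

```python
import time
from typing import Callable, Optional


class PublishedPage:
    """Serve a pre-rendered copy of a page, rebuilding it from the
    (expensive) render callback at most once per max_age seconds."""

    def __init__(
        self,
        render: Callable[[], str],            # hypothetical: builds the HTML from the database
        max_age: float = 300.0,               # assumed freshness window, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render
        self._max_age = max_age
        self._clock = clock
        self._html: Optional[str] = None
        self._built_at = 0.0

    def get(self) -> str:
        """Return the published copy, rebuilding it only if missing or stale."""
        now = self._clock()
        if self._html is None or now - self._built_at >= self._max_age:
            self._html = self._render()
            self._built_at = now
        return self._html

    def invalidate(self) -> None:
        """Drop the published copy, e.g. after an edit, so the next get() rebuilds."""
        self._html = None
```

With something like this, the recent changes box on the front page would be at most max_age seconds out of date, and calling invalidate() from the edit path would keep it current at the cost of one rebuild per edit.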

> Incidental thought: since robots.txt doesn't allow wildcards, perhaps
> we should put 'rel="nofollow"' on all our links to resources useless
> to search engines. I believe the better-behaved robots should respect
> this.

Indeed, this should most certainly be done for links to all revisions of a
page except the current one, so that if someone reverts spam without the
admin password (which leaves the spammed revision in the history), the
spam doesn't get indexed by the crawlers.
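
A small sketch of the rule, in case it helps: links to the current revision are left alone, links to any older revision get rel="nofollow".  The function name revision_link and the "id" and "version" query parameters are placeholders for illustration, not necessarily what the OpenGuides templates use.

```python
from html import escape
from urllib.parse import urlencode


def revision_link(base_url: str, page_id: str, version: int,
                  current_version: int, label: str) -> str:
    """Build an <a> tag for one revision of a page.

    Only links to non-current revisions are marked rel="nofollow".
    The 'id' and 'version' parameter names are assumptions.
    """
    is_current = version == current_version
    params = {"id": page_id}
    if not is_current:
        params["version"] = version
    href = base_url + "?" + urlencode(params)
    rel = "" if is_current else ' rel="nofollow"'
    # Escape the URL and the label so neither can break out of the markup.
    return '<a href="%s"%s>%s</a>' % (escape(href, quote=True), rel, escape(label))
```

The same check could also be applied to the rdf and raw format links mentioned earlier in the thread, if those count as useless to search engines.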

PS: I strongly doubt it's Google causing problems.  Google is a very
well-behaved bot.  Others, like the MSN one, are much less well behaved.

-- 
Rev Simon Rumble <simon at rumble.net>
www.rumble.net

The Tourist Engineer
Geeks need vacations too.
http://engineer.openguides.org/

Politics is the gentle art of getting votes from the poor
and campaign funds from the rich by promising to protect
each from the other.
- Politicians and Other Scoundrels by Ferdinand Lundberg


