Summary:
I think I know why OpenGuides has a habit of occasionally
taking over servers. If you've experienced problems with occasional
ridiculously high load averages, take a look at the Action paragraph
below.
Action:
In the short term, I suggest that anyone who's been having trouble
with OpenGuides taking over their machines should comment the per-user
RSS feed links out of userstats.tt - copy it to your custom_templates
directory and either delete everything between [% IF username %] and
the next [% END %] statement (inclusive), or change [% IF username %]
to [% IF username AND 0 %] - both will have the same effect.
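For anyone who would rather script the second option, here is a minimal sketch in Python. It assumes the stock userstats.tt contains the literal text [% IF username %] exactly as quoted above. The function names and the paths passed to install_override are hypothetical, so adjust them for your own install.

```python
from pathlib import Path

# Marker text as quoted in the post; assumed to appear verbatim in userstats.tt.
OLD = "[% IF username %]"
NEW = "[% IF username AND 0 %]"


def disable_user_feeds(template_text):
    """Return the template with the first [% IF username %] forced to be false."""
    if OLD not in template_text:
        raise ValueError("marker not found; the template may differ from what is assumed")
    # Only the first occurrence is changed, matching the instructions above.
    return template_text.replace(OLD, NEW, 1)


def install_override(stock_template, custom_dir):
    """Write a patched copy into custom_dir, leaving the stock template untouched.

    Both arguments are hypothetical paths supplied by the caller.
    """
    stock = Path(stock_template)
    target = Path(custom_dir) / stock.name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(disable_user_feeds(stock.read_text()))
    return target
```

Because the patched copy lives in custom_templates, removing it later restores the stock behaviour.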
Details:
I suspect that I have at last found out what makes OpenGuides take
people's servers over. It's the per-user RSS feeds, which are usually
linked from user stats pages, e.g.
http://cambridge.openguides.org/wiki/?username=Kake;action=userstats
The problematic feeds are those which omit minor edits, and they're
only a problem on large guides (Cambridge is fine, RGL is not). The
reason for this is that they effectively select _everything_ from the
database and then winnow it down. When a spider is hitting one of
these feeds every second... I think you can see where this is going,
and it's not pretty.
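To illustrate the general pattern (this is not the actual Wiki::Toolkit code or schema), here is a small Python/SQLite sketch. The table, column names and data are all made up for the example. It compares fetching every row and winnowing in application code with letting the database apply the filter and the limit.

```python
import sqlite3


def build_demo_db(n_edits=1000):
    """Build a hypothetical in-memory edits table (not the real schema).

    Even-numbered edits belong to "Kake"; every 10th edit is a major one.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edits (id INTEGER PRIMARY KEY, username TEXT, "
        "modified INTEGER, minor INTEGER)"
    )
    rows = [
        (i, "Kake" if i % 2 == 0 else "Other", i, 0 if i % 10 == 0 else 1)
        for i in range(1, n_edits + 1)
    ]
    conn.executemany("INSERT INTO edits VALUES (?, ?, ?, ?)", rows)
    return conn


def recent_major_winnow(conn, username, limit):
    """Select everything, then filter and truncate in application code."""
    rows = conn.execute(
        "SELECT id, username, minor FROM edits ORDER BY modified DESC"
    ).fetchall()
    wanted = [rid for rid, user, minor in rows if user == username and not minor]
    return wanted[:limit], len(rows)


def recent_major_in_sql(conn, username, limit):
    """Let the database apply the username/minor filter and the limit."""
    rows = conn.execute(
        "SELECT id FROM edits WHERE username = ? AND minor = 0 "
        "ORDER BY modified DESC LIMIT ?",
        (username, limit),
    ).fetchall()
    return [r[0] for r in rows], len(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(recent_major_winnow(conn, "Kake", 5))
    print(recent_major_in_sql(conn, "Kake", 5))
```

Both functions return the same ids, but the first pulls every row out of the database on each request. The second count is only the number of rows handed back to the application; how much work the database itself does still depends on its indexes.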
The reason this has gone unnoticed for so long is that (a) nobody
really uses these feeds anyway, so they only get accessed by spiders
etc., and spiders aren't going to email us and say "hey, this link is
very slow", and (b) these feeds are the only thing in OpenGuides that
uses "last n changes" as a selection criterion - Recent Changes uses
"last n days", which is much more efficient.
The root problem is with the _find_recent_changed_by_criteria method
in Wiki::Toolkit::Store::Database, specifically the "metadata_wasnt"
parameter in combination with the "limit" one. I am working on a
rewrite of this method to make it more efficient, but this is going to
require some thinking and hence some time.
Kake