On Thu, Jul 26, 2007 at 01:24:50AM +0100, Rev Simon wrote:
Indeed, this should most certainly be done for all revisions of a page except the current one, so that if someone reverts spam without the admin password, it's not indexed by the crawlers.
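That rule could be sketched roughly as below — a minimal illustration in Python rather than wiki.cgi's actual Perl, with all names hypothetical:

```python
# Hedged sketch: decide which robots meta tag (if any) a wiki page
# should embed in its <head>. Names are illustrative, not from wiki.cgi.

def robots_meta(requested_version: int, current_version: int) -> str:
    """Return a robots meta tag for the requested page revision."""
    if requested_version < current_version:
        # Historical revision (possibly reverted spam): keep it out of
        # search indexes and stop crawlers following its links.
        return '<meta name="robots" content="noindex,nofollow">'
    # Current revision: indexable as usual, so emit nothing.
    return ''
```

The template would then drop the returned string into the page head, so only the live revision is ever indexable.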
PS: I strongly doubt it's Google causing problems. Google is a very
well behaved bot. Others like the MSN one are much less well behaved.
I'm not convinced of that.
Google routinely fetches *large* pages on the Open Guide to Boston that almost never change. Think of the Category Restaurant page: roughly 1MB, changes maybe once a week, yet Google fetches it daily.
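Conditional GET would cut most of that cost: a page that changes weekly can answer a daily crawl with 304 Not Modified instead of the full megabyte. A rough Python sketch with illustrative names, not wiki.cgi's actual code:

```python
# Hedged sketch: answer a crawler's conditional GET with 304 Not Modified
# when the page hasn't changed since the crawler last fetched it.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple

FULL_PAGE = b"<html>...the full ~1MB category page...</html>"

def respond(page_mtime: datetime,
            if_modified_since: Optional[str]) -> Tuple[int, bytes]:
    """Return (HTTP status, body) for a GET that may carry an
    If-Modified-Since header."""
    if if_modified_since:
        try:
            client_seen = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_seen = None  # unparseable header: fall through to 200
        if client_seen and page_mtime <= client_seen:
            return 304, b""  # unchanged: send headers only, no body
    return 200, FULL_PAGE
```

For a crawler that revisits daily, six out of seven fetches of a weekly-changing page would then cost only headers.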
Granted, OG Boston is particularly poorly optimized for this because we use index_list in our category pages; the index_value etc. modes in wiki.cgi are significantly more lightweight. (Bad decision on my part.) But a crawler doesn't have to fetch much data to hurt a site: even if Google is only requesting pages slowly, it can still exceed the rate at which the server can return them.
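One blunt server-side lever here is a crawl delay in robots.txt — msnbot honors Crawl-delay, while Googlebot ignores it (Google's crawl rate is set through its Webmaster Tools instead). A fragment with purely illustrative values:

```text
# Illustrative robots.txt; the delay values are guesses, tune to your server.
User-agent: msnbot
Crawl-delay: 30

User-agent: *
Crawl-delay: 10
```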
Sorry, I'm not really following most of this conversation due to the flooding...
Dominic Hargreaves |