On Thu, Jul 26, 2007 at 01:24:50AM +0100, Rev Simon Rumble wrote:
Indeed, this should most certainly be done for all revisions of a page
except the current, so that if someone reverts spam without the admin
password, it's not indexed by the crawlers.
PS: I strongly doubt it's Google causing problems. Google is a very
well behaved bot. Others like the MSN one are much less well behaved.
I'm not convinced of that.
Google routinely and regularly fetches *large* pages on the Open Guide
to Boston that almost never change. Think Category Restaurant page --
1MB page, changes maybe once a week, Google fetches it daily.
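
To put rough numbers on that, here is a small sketch using only the figures above (a 1MB page, fetched daily, changing about once a week). The function name and parameters are made up for illustration; nothing here comes from wiki.cgi or from Google's crawler.

```python
def redundant_fetches(page_mb, fetches_per_week, changes_per_week):
    """Return (redundant fetch count, redundant MB) per week for one page."""
    # At most one fetch per change can bring back new content;
    # every other fetch re-downloads a page the crawler already has.
    useful = min(fetches_per_week, changes_per_week)
    redundant = fetches_per_week - useful
    return redundant, redundant * page_mb


# Figures from the Category Restaurant example: 1MB, daily fetch, weekly change.
count, wasted_mb = redundant_fetches(page_mb=1.0, fetches_per_week=7, changes_per_week=1)
print(f"{count} redundant fetches, about {wasted_mb:.0f} MB re-sent per week")
```

That is for a single page; every other large, rarely changing category page adds its own share.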
Granted, OG Boston is particularly poorly optimized for this because we
use index_list in our category pages. The actual index_value, etc. mode
in wiki.cgi is significantly more lightweight. (Bad decision on my
part.) But it doesn't take much traffic to hurt a site: even if Google
is only requesting things slowly, it can still request pages faster
than the server can generate and return them.
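
A back-of-the-envelope sketch of that last point, assuming a single worker that handles one request at a time. The interval and generation time below are hypothetical numbers chosen for illustration, not measurements from OG Boston.

```python
def backlog(request_interval_s, service_time_s, duration_s):
    """Approximate number of requests still waiting after duration_s seconds.

    Assumes one worker serving requests one at a time (a simplification).
    """
    arrived = duration_s // request_interval_s  # requests the crawler has sent
    capacity = duration_s // service_time_s     # requests the server could finish
    # The server cannot serve more requests than have arrived.
    return max(0, arrived - capacity)


# Hypothetical: one request every 10 s (slow by crawler standards),
# but a heavy page takes 15 s to generate.
print(backlog(request_interval_s=10, service_time_s=15, duration_s=3600))  # 120
```

One request every ten seconds is hardly aggressive, yet under these assumptions the queue grows by 120 requests an hour, because each page takes longer to build than the gap between requests.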