[OGDev] URGENT Re: [OpenGuides-Dev] OG performance
Paul Makepeace
paulm at paulm.com
Thu Jul 26 14:46:22 BST 2007
On 7/26/07, Christopher Schmidt <crschmidt at crschmidt.net> wrote:
> On Thu, Jul 26, 2007 at 01:24:50AM +0100, Rev Simon Rumble wrote:
> > Indeed, this should most certainly be done for all revisions of a
> > page except the current, so that if someone reverts spam without the
> > admin password, it's not indexed by the crawlers.
> >
> > PS: I strongly doubt it's Google causing problems. Google is a very
> > well behaved bot. Others like the MSN one are much less well behaved.
>
> I'm not convinced of that.
>
> Google routinely and regularly fetches *large* pages on the Open Guide
> to Boston that almost never change. Think Category Restaurant page --
> 1MB page, changes maybe once a week, Google fetches it daily.
Spoke with the crawler guys here and your site changes more often than
you seem to think. That page also has a high page rank which affects
crawl frequency.
Your HTTP headers could help more: try sending Last-Modified and
answering If-Modified-Since requests with a 304 when the page hasn't
changed. Consider also a caching reverse proxy to reduce load. You can
reduce the crawl frequency with the webmaster console if you still
think it's too much.
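For what it's worth, here is a rough sketch of the If-Modified-Since
handling in Python. It is not OpenGuides code: the function name
conditional_get and its arguments are made up for illustration, and it
assumes you can cheaply look up when a page last changed as a
timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(if_modified_since, last_modified):
    """Return (status, headers) for a page last changed at `last_modified`.

    Hypothetical helper: `if_modified_since` is the raw request header
    value (or None), `last_modified` is a timezone-aware UTC datetime.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
        if since is not None and since.tzinfo is None:
            # A date without a usable zone is treated as UTC here.
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers  # client's copy is current; send no body

    return 200, headers  # caller renders and sends the full page
```

The point is that the 304 branch returns before the expensive page
render, so a crawler re-checking an unchanged category page costs a
timestamp lookup instead of a megabyte of output.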
You could restructure the page to not be a megabyte too, of course ;-)
HTH,
Paul (not speaking as a representative of his employer, just trying to help out)
>
> Granted, OG Boston is particularly poorly optimized for this because we
> use index_list in our category pages. The actual index_value, etc. mode
> in wiki.cgi is significantly more lightweight. (Bad decision on my
> part.) But you don't have to have someone fetching much data to hurt a
> site, and even if Google is only requesting things slowly, they can
> still exceed the return rate of the server.
>
> Regards,
> --
> Christopher Schmidt
> Web Developer
>
> --
> OpenGuides-Dev mailing list - OpenGuides-Dev at lists.openguides.org
> http://lists.openguides.org/cgi-bin/mailman/listinfo/openguides-dev
>