Hiya.
It occurs to me that there's very little point in search engines crawling certain pages of an OpenGuide - anything with "action=edit" or "action=delete" in the URL, at the very least, has no real value to someone searching for information. Unfortunately, unless I'm missing something, robots.txt syntax doesn't allow for matching on anything other than the start of the path component of a URL, which doesn't help us here.
It's been suggested to me that ``<meta robots="noindex">'' tags in the <head> of a page are effective here, but I'm not sure of the best way to implement this for edit pages without implementing it for *all* pages, which would be careless.
I'm thinking of something in header.tt conditional upon the requested URI containing "action=edit" or "action=delete", but before I wander off learning how to talk Template Toolkit, I'd be interested to hear better suggestions -- and other values of action we might care about, I guess.
The other, more complex but possibly "better" alternative, would be for someone to run with the idea mooted in this thread: http://openguides.org/mail/openguides-dev/2004-April/000258.html
Then edit pages could be /edit/Node_Name, deletes /delete/Node_Name, and so on. This makes setting up a suitable robots.txt very simple indeed, though it makes setting up the Apache rewrite rules a) a requirement, instead of just a nice thing, and b) more complex than at present.
Thoughts, comments, suggestions and the like all invited.
Cheers, James.
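As a rough illustration of the robots.txt point above, here is a minimal sketch using Python's standard urllib.robotparser. The host example.org and the /edit/ and /delete/ prefixes are assumptions taken from the proposed URL scheme, not an existing OpenGuides layout. It suggests that a prefix rule can cover a path-style edit URL, while the same rules leave the current query-string form untouched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the proposed /edit/Node_Name scheme.
ROBOTS_TXT = """\
User-agent: *
Disallow: /edit/
Disallow: /delete/
"""

def build_parser(robots_txt):
    """Parse robots.txt text held in a string rather than fetched over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Path-style URL: the rule matches because the path starts with /edit/.
path_style = "http://example.org/edit/Node_Name"
# Query-style URL: the path is /index.cgi, so no prefix rule applies.
query_style = "http://example.org/index.cgi?id=Node_Name;action=edit"

print(parser.can_fetch("*", path_style))
print(parser.can_fetch("*", query_style))
```

Rules are compared against the start of the URL path, which is why "action=edit" buried in the query string cannot be targeted this way.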
This one time, at band camp, James Green wrote:
It occurs to me that there's very little point in search engines crawling certain pages of an OpenGuide - anything with "action=edit" or "action=delete" in the URL, at the very least, has no real value to someone searching for information. Unfortunately, unless I'm missing something, robots.txt syntax doesn't allow for matching on anything other than the start of the path component of a URL, which doesn't help us here.
Solution here would be to change the syntax of the action= URLs to be something like: /action=edit/MyWikiPage
Then you could put /action=edit/ in your robots.txt file.
This would probably involve Apache rewriting, which, if you're not already doing it, would be a bit more effort than is really warranted.
Using the meta tag doesn't remove the problem of the bandwidth used, since a crawler still has to fetch each page before it can see the tag.
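The mapping that such a rewrite would need to perform can be sketched in Python. This is only an illustration of the URL translation, not an Apache configuration. The function name to_cgi_url, the index.cgi target, and the restriction to edit and delete are assumptions made for the example.

```python
import re

# Hypothetical path-style form from the suggestion above: /action=edit/MyWikiPage
PATH_STYLE = re.compile(r"^/action=(edit|delete)/([^/?]+)$")

def to_cgi_url(path):
    """Translate /action=edit/MyWikiPage into the existing query-string form.

    Returns None when the path is not one of the rewritten action URLs.
    """
    match = PATH_STYLE.match(path)
    if match is None:
        return None
    action, node = match.groups()
    return f"/index.cgi?id={node};action={action}"

print(to_cgi_url("/action=edit/MyWikiPage"))
print(to_cgi_url("/MyWikiPage"))
```

Whatever performs the rewrite has to accept every action URL the guide emits, which is part of the extra effort mentioned above.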
On Sun 29 Aug 2004, James Green jkg@earth.li wrote:
It occurs to me that there's very little point in search engines crawling certain pages of an OpenGuide - anything with "action=edit" or "action=delete" in the URL, at the very least, has no real value to someone searching for information. [...]
It's been suggested to me that ``<meta robots="noindex">'' tags in the
<head> of a page are effective here, [...]
I made a start on this; see for example http://london.openguides.org/kakemirror/index.cgi?id=Locale_Fulham;action=ed...
Questions: (a) where else on http://london.openguides.org/kakemirror/index.cgi do we need these tags (please do check they're not already there before spitting out suggestions), (b) am I doing the tags right?
Kake
On Sun, Sep 19, 2004 at 01:22:22AM +0100, Kake L Pugh wrote:
On Sun 29 Aug 2004, James Green jkg@earth.li wrote:
It's been suggested to me that ``<meta robots="noindex">'' tags in the
<head> of a page are effective here, [...]
I made a start on this [...]
Cool!
Questions: (a) where else on http://london.openguides.org/kakemirror/index.cgi do we need these tags (please do check they're not already there before spitting out suggestions),
My main concerns were the edit and delete pages, which you've covered. The only other thing that springs to mind is newpage.cgi.
(b) am I doing the tags right?
No, apparently I goofed on that one. The correct form is: `<meta name="robots" content="noindex">', at least according to http://www.robotstxt.org/wc/meta-user.html
Mea culpa.
James.
On Sun 19 Sep 2004, James Green jkg@earth.li wrote:
My main concerns were the edit and delete pages, which you've covered. The only other thing that springs to mind is newpage.cgi.
Done.
No, apparently I goofed on that one. The correct form is: `<meta name="robots" content="noindex">', at least according to http://www.robotstxt.org/wc/meta-user.html
Fixed. These changes will be in the next release.
Kake
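For anyone wanting to see the rule discussed in this thread spelled out, here is a minimal sketch in Python. It is not the OpenGuides implementation, which lives in the templates. The function name robots_meta and the way the script name and query string are passed in are assumptions for illustration. It emits the corrected tag for edit and delete actions and for newpage.cgi, and nothing otherwise.

```python
import re

NOINDEX_TAG = '<meta name="robots" content="noindex">'

# Pages named in the thread as not worth indexing.
NOINDEX_ACTIONS = {"edit", "delete"}
NOINDEX_SCRIPTS = {"newpage.cgi"}

def robots_meta(script_name, query_string):
    """Return the noindex tag for edit, delete and new-page requests, else ''."""
    if script_name in NOINDEX_SCRIPTS:
        return NOINDEX_TAG
    # Parameters may be separated by ';' (as in the URLs above) or by '&'.
    for pair in re.split(r"[;&]", query_string):
        key, _, value = pair.partition("=")
        if key == "action" and value in NOINDEX_ACTIONS:
            return NOINDEX_TAG
    return ""

print(robots_meta("index.cgi", "id=Locale_Fulham;action=edit"))
print(robots_meta("index.cgi", "id=Locale_Fulham"))
```

Checking the parsed action parameter, rather than searching the whole URI for the substring "action=edit", avoids tagging a page whose node name merely contains that text.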
openguides-dev@lists.openguides.org