[moved to mailing list since this is rambling now]
On Sun, Oct 12, 2003 at 09:32:45AM -0400, via RT wrote:
> Full context and any attached attachments can be found at:
> <URL: http://rt.cpan.org/NoAuth/Bug.html?id=4077 >
> [guest - Sun Oct 12 06:54:12 2003]:
> > Hence I marked it as a wishlist item. While I can understand you not
> > being interested in it, I don't understand why you closed the bug.
> I closed it because I'm not planning to do anything about it, and I'd
> rather only have things that I do plan to do "on my radar". I'll leave
> it open, though, since you'd like it that way.
Fair enough. However, in the interests of encouraging others to work on
the project (which I think is something you were interested in), it might
help to widen the scope of the bugs list. Having a pool of things to work
on, even ones you don't plan to tackle yourself, might encourage more
people to hack.
Here's a log of a conversation Ivor and I had on IRC the other week about
searching. I meant to post it earlier but forgot. Comments?
00:09 -!- Irssi: Starting query in perl with ivorw
00:09 <ivorw> Hi, I've been thinking about how to restructure the search on
00:12 <hex> hi.
00:12 <hex> oho? do go on...
00:13 <ivorw> I'm thinking that we keep the existing idea of priming a page
cache that is then searched
00:14 * hex nods
00:14 <ivorw> However, the cache can be primed with pages resulting from an
SQL query, in addition to those from a keyword inverted index search
00:15 <ivorw> The idea is that including locale=camden causes all locale
camden pages to be loaded into the cache
00:16 <ivorw> But a full and & or syntax is available for applying to the
00:17 <ivorw> I'm now looking for a syntax for metadata qualifiers on a
00:19 <hex> hmm... so the search would be faster because locale camden would
00:20 <ivorw> hex, not quite - the search tree only works over a single hash
(cache). The idea is that everything that could possibly match the search is
pre-loaded into the cache.
00:21 <ivorw> Prior to O/G, the usemod search worked by slurping the whole
wiki into the hash every time.
00:22 <hex> blimey
00:22 <ivorw> Although this works, SII et al provide a better mechanism for
subsetting the data, and something that will not blow the server up with a
substantial query and dataset
00:23 <hex> right, yes.
00:23 <ivorw> My bodge (which works pretty well) is to 'prime' the input hash
with the results of an inverted index search on all of the keywords supplied
- regardless of and/or syntax
00:24 <ivorw> However, only the SII is used, so the search will only find
words in the body text of the page, not in the metadata
00:25 <hex> so we need a mechanism for searching metadata?
00:26 <ivorw> I want to keep this idea of a primed cache, and load it with
SQL query results, to provide just that
00:28 <ivorw> How about: King's Head&locale=acton&category=real ale
00:29 <ivorw> Or, how about: locale=west end&category=pubs
00:30 <hex> ah, you mean search syntax
00:30 <hex> I quite like Google style
00:30 <hex> King's Head locale:"West End" category:Pubs
00:31 <ivorw> Ah, but are spaces allowed in metadata field names?
00:32 <hex> I dunno, but surely that could be handled without the user being
00:32 <hex> s/ /_/g on the fly sort of thing and vice versa
00:33 <ivorw> Just thinking that some syntax might be tricky or ambiguous
00:34 <ivorw> the last post code:foo #Is this matching on post code or
00:35 <hex> oh, right!
00:35 <hex> no, I think all metadata is one word
00:36 <hex> there could always be a pidgin syntax for it anyway
00:36 <hex> phone:12345
00:36 <ivorw> taken from google again (dig them pidgeons :)
00:37 <ivorw> If we can name every meta field with \w chars, this will be OK
00:38 <ivorw> We could have aliases, phone, telephone, tel, etc.
00:39 <hex> yup!
00:39 <ivorw> Given my previous suggestion, I would quite like to be able to
do regexp matches
00:40 <ivorw> e.g. king's head&post_code=~W3
00:41 <hex> must the search terms be joined by '&'?
00:42 <hex> I would have thought a magic word would suffice, or the =
00:42 <ivorw> that was just my previous syntax.
00:42 <hex> ah, I follow.
00:42 <hex> but yes, regexen++
00:44 <ivorw> How about dropping the &s: locale=west end category=pubs
00:45 <ivorw> Note, that's the grid ref of Holborn tube
00:46 <hex> I prefer colons to equals, simply because I think people are
more used to Google style
00:46 <hex> but that may just be me
00:47 <ivorw> How about colons for a straight match, and =~ for a regexp
00:47 <hex> "west end" would need quotes so you don't search for "end" in
00:47 <hex> yes, that sounds great.
00:48 <ivorw> might want to delimit the regexp
00:48 <hex> actually, could it be :~ for a regexp, for consistency?
00:48 <hex> I know that's not very perlish....
00:48 <ivorw> yup, why not tho
00:48 <hex> cool.
00:49 <ivorw> what do you think of my idea of a distance 'function'?
00:51 <hex> how about:
00:51 <hex> near:530546,181503 range:200m
00:51 <hex> a little easier to read
00:51 <hex> (and write)
00:51 <hex> I love the idea
00:52 <ivorw> with a default range presumably
00:52 <hex> hmm, yes
00:52 <hex> units: m, ft, yds, mi, km
00:52 <hex> (maybe)
00:53 <ivorw> Didn't someone give a talk on a module to handle dimensions?
00:53 <hex> dunno :)
00:53 <ivorw> I recall one last year at State51
00:54 <ivorw> Alex Gough: Meaningful Strong Typing with Data::Dimensions
00:55 <ivorw> unfortunately, the link for the slides is broken :(
00:55 <hex> bug him on the list :)
00:55 <hex> listen, I must run, or rather sleep, my eyes are closing
00:56 <hex> this is very promising stuff
00:56 <ivorw> OK, noe wurriz - Thanks for the braindump receptacle
00:56 <hex> no problem!
00:56 <hex> seeya...
00:56 <ivorw> nn
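Here is a minimal sketch of the primed-cache approach Ivor describes. It is in Python purely for illustration and is not the wiki's actual code. Every name below (`Page`, `PAGES`, `keyword_index`, `metadata_query`, `prime_cache`, `search`) is hypothetical. `keyword_index` stands in for the SII inverted-index lookup (body text only), and `metadata_query` stands in for an SQL query over metadata. Both are faked with in-memory data, and the full and/or evaluation is simplified to a plain AND.

```python
"""Hypothetical sketch of the 'primed cache' search idea (not project code)."""
from dataclasses import dataclass, field


@dataclass
class Page:
    name: str
    body: str
    metadata: dict = field(default_factory=dict)  # e.g. {"locale": ["camden"]}


# Stand-in for the wiki's page store (made-up example data).
PAGES = {
    "King's Head": Page("King's Head", "Real ale pub near the station",
                        {"locale": ["acton"], "category": ["pubs", "real ale"]}),
    "Camden Lock": Page("Camden Lock", "Market by the canal",
                        {"locale": ["camden"], "category": ["markets"]}),
    "Dublin Castle": Page("Dublin Castle", "Live music pub",
                          {"locale": ["camden"], "category": ["pubs"]}),
}


def keyword_index(word):
    """Stand-in for the inverted-index lookup: matches body text only."""
    word = word.lower()
    return {n for n, p in PAGES.items() if word in p.body.lower().split()}


def metadata_query(field_name, value):
    """Stand-in for an SQL query on a metadata field such as locale."""
    value = value.lower()
    return {n for n, p in PAGES.items()
            if value in (v.lower() for v in p.metadata.get(field_name, []))}


def prime_cache(keywords, qualifiers):
    """Load every page that could possibly match, ignoring and/or structure."""
    names = set()
    for word in keywords:
        names |= keyword_index(word)
    for field_name, value in qualifiers:
        names |= metadata_query(field_name, value)
    return {n: PAGES[n] for n in names}


def search(keywords, qualifiers):
    """Evaluate the query (simplified to AND of all terms) over the cache only."""
    hits = []
    for name, page in prime_cache(keywords, qualifiers).items():
        words = page.body.lower().split()
        if not all(w.lower() in words for w in keywords):
            continue
        if all(v.lower() in (m.lower() for m in page.metadata.get(f, []))
               for f, v in qualifiers):
            hits.append(name)
    return sorted(hits)
```

For example, `search(["pub"], [("locale", "camden")])` primes the cache with all three pages (two match "pub", two match the locale), but only "Dublin Castle" survives the real evaluation. Priming is deliberately generous. That way the expensive step runs over a small subset rather than the whole wiki, as the old UseMod slurp did.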
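A sketch of the Google-style syntax discussed in the log:

- `field:value` gives a straight match.
- `field:"quoted value"` handles values containing spaces.
- `field:~regex` gives a regexp match.

Field names are restricted to \w characters, and a few aliases map onto a single field. The alias table and the names `Term`, `parse_query` and `term_matches` are made up for illustration. The regexp here is left undelimited, although the log suggests delimiting it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical alias table: several user-facing names map to one field.
ALIASES = {"telephone": "phone", "tel": "phone"}

# A token is an optional \w+ field name followed by ":" (straight match) or
# ":~" (regex match), then either a "quoted string" or a run of non-spaces.
TOKEN = re.compile(
    r'(?:(?P<field>\w+)(?P<op>:~|:))?'
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))'
)


@dataclass
class Term:
    field: Optional[str]  # None for a plain keyword or quoted phrase
    op: str               # "word", "eq" or "regex"
    value: str


def parse_query(query):
    """Split a Google-style query into keywords and field qualifiers."""
    terms = []
    for m in TOKEN.finditer(query):
        quoted = m.group("quoted")
        value = quoted if quoted is not None else m.group("bare")
        name = m.group("field")
        if name is None:
            terms.append(Term(None, "word", value))
            continue
        name = name.lower()
        op = "regex" if m.group("op") == ":~" else "eq"
        terms.append(Term(ALIASES.get(name, name), op, value))
    return terms


def term_matches(term, values):
    """':' is a case-insensitive exact match; ':~' is a regex search."""
    if term.op == "regex":
        return any(re.search(term.value, v, re.IGNORECASE) for v in values)
    return any(term.value.lower() == v.lower() for v in values)
```

With this grammar, hex's example `King's Head locale:"West End" category:Pubs` yields two plain keywords plus two straight-match qualifiers. A quoted phrase without a field name is kept whole, so "west end" is not split into separate words.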
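The `near:` / `range:` idea can be sketched as follows. It assumes `near:` takes a grid reference as easting,northing in metres, as in hex's `near:530546,181503` example, so straight-line distance on the grid is adequate at city scale. The unit table covers the units hex lists. The 500 m default range is an arbitrary placeholder, since the log leaves the default open. A plain lookup table stands in for a units module such as the Data::Dimensions one mentioned. The function names are hypothetical.

```python
import math
import re

# Metres per unit, covering the units listed in the log (m, ft, yds, mi, km).
UNITS = {"m": 1.0, "km": 1000.0, "ft": 0.3048, "yd": 0.9144, "yds": 0.9144,
         "mi": 1609.344}
DEFAULT_RANGE_M = 500.0  # hypothetical default; the log leaves this open

RANGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]+)?\s*$", re.IGNORECASE)


def parse_range(text):
    """Turn '200m', '2km', '1mi' etc. into metres; None gives the default."""
    if text is None:
        return DEFAULT_RANGE_M
    m = RANGE_RE.match(text)
    if not m:
        raise ValueError(f"bad range: {text!r}")
    number, unit = m.groups()
    unit = (unit or "m").lower()  # a bare number is taken as metres
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number) * UNITS[unit]


def parse_near(text):
    """Parse 'easting,northing' into a pair of floats (metres)."""
    easting, northing = (float(x) for x in text.split(","))
    return easting, northing


def within_range(origin, point, range_m):
    # Grid refs are in metres, so plain Euclidean distance is used here.
    return math.hypot(point[0] - origin[0], point[1] - origin[1]) <= range_m
```

In a primed-cache design, a distance term would probably filter the cache rather than prime it, unless the page store can also be queried by bounding box.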
# Earle Martin http://c2.com/cgi/wiki?EarleMartin
I believe we met at Semantics at the Zoo.
I have a Zaurus 750 (excellent!) and would like to get Kismet and GPS on it.
Are you going to linuxfest at Olympia next week?
Can we meet up?
I need help setting up, but I have a willing team to create a detailed local
map of Lambeth and parts of Wandsworth, Southwark and the West End.
They need this for independent travel :-)
People with Learning Difficulties creating the web.