Here's a log of a conversation Ivor and I had on IRC the other week about searching - meant to post it before, but forgot. Comments?
Cheers,
Earle.
00:09 -!- Irssi: Starting query in perl with ivorw
00:09 <ivorw> Hi, I've been thinking about how to restructure the search on OG
00:12 <hex> hi.
00:12 <hex> oho? do go on...
00:13 <ivorw> I'm thinking that we keep the existing idea of priming a page cache that is then searched
00:14 * hex nods
00:14 <ivorw> However, the cache can be primed with pages resulting from an SQL query, in addition to those from a keyword inverted index search
00:15 <ivorw> The idea is that including locale=camden causes all locale camden pages to be loaded into the cache
00:16 <ivorw> But a full and & or syntax is available for applying to the cache
00:17 <ivorw> I'm now looking for a syntax for metadata qualifiers on a search
00:19 <hex> hmm... so the search would be faster because locale camden would be cached?
00:20 <ivorw> hex, not quite - the search tree only works over a single hash (cache). The idea is that everything that could possibly match the search is pre-loaded into the cache.
00:21 <ivorw> Prior to O/G, the usemod search worked by slurping the whole wiki into the hash every time.
00:22 <hex> blimey
00:22 <ivorw> Although this works, SII et al provide a better mechanism for subsetting the data, and something that will not blow the server up with a substantial query and dataset
00:23 <hex> right, yes.
00:23 <ivorw> My bodge (which works prety well) is to 'prime' the input hash with the results of an inverted index search on all of the keywords supplied - regardless of and/or syntax
00:24 <ivorw> However, only the SII is used, so the search will only find words in the body text of the page, not in the metadata
00:25 <hex> so we need a mechanism for searching metadata?
00:26 <ivorw> I want to keep this idea of a primed cache, and load it with SQL query results, to provide just that
00:28 <ivorw> How about: King's Head&locale=acton&category=real ale
00:29 <ivorw> Or, how about: locale=west end&category=pubs
00:30 <hex> ah, you mean search syntax
00:30 <hex> I quite like Google style
00:30 <hex> King's Head locale:"West End" category:Pubs
00:31 <ivorw> Ah, but are spaces allowed in metadata field names?
00:32 <hex> I dunno, but surely that could be handled without the user being involved...
00:32 <hex> s/ /_/g on the fly sort of thing and vice versa
00:33 <ivorw> Just thinking that some syntax might be tricky or ambiguous
00:34 <ivorw> the last post code:foo #Is this matching on post code or code?
00:35 <hex> oh, right!
00:35 <hex> no, I think all metadata is one word
00:36 <hex> there could always be a pidgin syntax for it anyway
00:36 <hex> phone:12345
00:36 <ivorw> taken from google again (dig them pidgeons :)
00:37 <ivorw> If we can name every meta field with \w chars, this will be OK
00:38 <ivorw> We could have aliases, phone, telephone, tel, etc.
00:39 <hex> yup!
00:39 <ivorw> Given my previous suggestion, I would quite like to be able to do regexp matches
00:40 <ivorw> e.g. king's head&post_code=~W3
00:41 <hex> must the search terms be joined by '&'?
00:42 <hex> I would have thought a magic word would suffice, or the =
00:42 <ivorw> that was just my previous syntax.
00:42 <hex> ah, I follow.
00:42 <hex> but yes, regexen++
00:44 <ivorw> How about dropping the &s: locale=west end category=pubs distance(530546,181503)<200
00:45 <ivorw> Note, that's the grid ref of Holborn tube
00:46 <hex> I prefer colons to equals, simply because I think people are more used to Google style
00:46 <hex> but that may just be me
00:47 <ivorw> How about colons for a straight match, and =~ for a regexp
00:47 <hex> "west end" would need quotes so you don't search for "end" in "locale:west"
00:47 <hex> yes, that sounds great.
00:48 <ivorw> might want to delimit the regexp
00:48 <hex> actually, could it be :~ for a regexp, for consistency?
00:48 <hex> I know that's not very perlish....
00:48 <ivorw> yup, why not tho
00:48 <hex> cool.
00:49 <ivorw> what do you think of my idea of a distance 'function'?
00:51 <hex> how about:
00:51 <hex> near:530546,181503 range:200m
00:51 <hex> a little easier to read
00:51 <hex> (and write)
00:51 <hex> I love the idea
00:52 <ivorw> with a default range presumably
00:52 <hex> hmm, yes
00:52 <hex> units: m, ft, yds, mi, km
00:52 <hex> (maybe)
00:53 <ivorw> Didn't someone give a talk on a module to handle dimensions?
00:53 <hex> dunno :)
00:53 <ivorw> I recall one last year in State51
00:54 <ivorw> Alex Gough: - Meaningful Strong Typing with Data::Dimensions
00:55 <ivorw> unfortunately, the link for the slides is broken :(
00:55 <hex> bug him on the list:)
00:55 <hex> listen, I must run, or rather sleep, my eyes are closing
00:56 <hex> this is very promising stuff
00:56 <ivorw> OK, noe wurriz - Thanks for the braindump receptacle
00:56 <hex> no problem!
00:56 <hex> seeya...
00:56 <ivorw> nn
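
To make the syntax from the log a bit more concrete, here is a rough sketch of how a query in that style might be tokenised: bare keywords, field:value for a straight match, field:~value for a regexp, quotes around values containing spaces, and near:/range: for the distance idea. It is Python purely for illustration and is not taken from any existing search code. The names parse_query, parse_range, within_range, ALIASES and the 500 m default range are all invented for the example, and the distance check assumes the grid reference is a pair of easting/northing values in metres, so a straight-line distance is good enough.

```python
import math
import re

# Hypothetical alias table: the log only suggests "phone, telephone, tel, etc."
ALIASES = {"telephone": "phone", "tel": "phone"}

# Units floated in the log, converted to metres.
UNITS_IN_METRES = {"m": 1.0, "ft": 0.3048, "yds": 0.9144, "mi": 1609.344, "km": 1000.0}

# Assumption: the log agrees a default range is needed but never picks a value.
DEFAULT_RANGE_M = 500.0

TOKEN = re.compile(r'''
    (?P<field>\w+):(?P<op>~?)(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))  # field:value or field:~value
  | "(?P<phrase>[^"]*)"                                            # quoted keyword phrase
  | (?P<word>\S+)                                                   # plain keyword
''', re.VERBOSE)

RANGE = re.compile(r'(\d+(?:\.\d+)?)([a-z]*)$')


def parse_range(text):
    """Turn '200m', '1km' or a bare '200' (taken as metres) into metres."""
    m = RANGE.match(text.lower())
    unit = (m.group(2) or "m") if m else None
    if unit not in UNITS_IN_METRES:
        raise ValueError("unrecognised range: %r" % text)
    return float(m.group(1)) * UNITS_IN_METRES[unit]


def parse_query(query):
    """Split a query string into keywords, metadata filters and a distance clause."""
    parsed = {"keywords": [], "filters": [], "near": None, "range_m": None}
    for m in TOKEN.finditer(query):
        if m.group("field") is None:
            phrase = m.group("phrase")
            parsed["keywords"].append(phrase if phrase is not None else m.group("word"))
            continue
        field = m.group("field").lower()
        field = ALIASES.get(field, field)
        quoted = m.group("quoted")
        value = quoted if quoted is not None else m.group("bare")
        if field == "near":
            easting, northing = (int(n) for n in value.split(","))
            parsed["near"] = (easting, northing)
        elif field == "range":
            parsed["range_m"] = parse_range(value)
        else:
            op = "regex" if m.group("op") else "exact"
            parsed["filters"].append((field, op, value))
    if parsed["near"] is not None and parsed["range_m"] is None:
        parsed["range_m"] = DEFAULT_RANGE_M
    return parsed


def within_range(parsed, easting, northing):
    """True if the point lies inside the near:/range: circle (or no circle was given)."""
    if parsed["near"] is None:
        return True
    e0, n0 = parsed["near"]
    return math.hypot(easting - e0, northing - n0) <= parsed["range_m"]
```

With this sketch, parse_query('King\'s Head locale:"West End" category:Pubs') gives two keywords and two exact-match filters. The "last post code:foo" ambiguity from the log resolves to a filter on code, which is one more argument for keeping field names to \w characters.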