Here's a log of a conversation Ivor and I had on IRC the other week about
searching - meant to post it before, but forgot. Comments?
Cheers,
Earle.
00:09 -!- Irssi: Starting query in perl with ivorw
00:09 <ivorw> Hi, I've been thinking about how to restructure the search on
OG
00:12 <hex> hi.
00:12 <hex> oho? do go on...
00:13 <ivorw> I'm thinking that we keep the existing idea of priming a page
cache that is then searched
00:14 * hex nods
00:14 <ivorw> However, the cache can be primed with pages resulting from an
SQL query, in addition to those from a keyword inverted index search
00:15 <ivorw> The idea is that including locale=camden causes all locale
camden pages to be loaded into the cache
00:16 <ivorw> But a full and & or syntax is available for applying to the
cache
00:17 <ivorw> I'm now looking for a syntax for metadata qualifiers on a
search
00:19 <hex> hmm... so the search would be faster because locale camden would
be cached?
00:20 <ivorw> hex, not quite - the search tree only works over a single hash
(cache). The idea is that everything that could possibly match the search is
pre-loaded into the cache.
00:21 <ivorw> Prior to O/G, the usemod search worked by slurping the whole
wiki into the hash every time.
00:22 <hex> blimey
00:22 <ivorw> Although this works, SII et al provide a better mechanism for
subsetting the data, and something that will not blow the server up with a
substantial query and dataset
00:23 <hex> right, yes.
00:23 <ivorw> My bodge (which works pretty well) is to 'prime' the input hash
with the results of an inverted index search on all of the keywords supplied
- regardless of and/or syntax
00:24 <ivorw> However, only the SII is used, so the search will only find
words in the body text of the page, not in the metadata
00:25 <hex> so we need a mechanism for searching metadata?
00:26 <ivorw> I want to keep this idea of a primed cache, and load it with
SQL query results, to provide just that
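
[Editor's sketch] The primed-cache idea described above could look roughly like this. It is a toy illustration in Python, not the OpenGuides code: the PAGES data, the in-memory index standing in for the inverted index, the dictionary filter standing in for the SQL query, and every function name are assumptions made for the example.

```python
# Toy wiki data; page names and fields are made up for illustration.
PAGES = {
    "Kings Head": {"body": "real ale pub with garden", "locale": "acton"},
    "Red Lion": {"body": "pub with pool table", "locale": "camden"},
    "Cafe Nero": {"body": "coffee and cake", "locale": "camden"},
    "Town Hall": {"body": "council offices", "locale": "acton"},
}


def build_inverted_index(pages):
    """Map each body word to the set of page names containing it."""
    index = {}
    for name, page in pages.items():
        for word in page["body"].lower().split():
            index.setdefault(word, set()).add(name)
    return index


def prime_cache(pages, index, keywords, metadata):
    """Load every page that could possibly match into the cache.

    Keyword hits are unioned regardless of and/or syntax; metadata
    qualifiers add every page carrying that value (the SQL stand-in).
    """
    candidates = set()
    for word in keywords:
        candidates |= index.get(word.lower(), set())
    for field, value in metadata.items():
        candidates |= {n for n, p in pages.items() if p.get(field) == value}
    return {name: pages[name] for name in candidates}


def search(cache, predicate):
    """Apply the full and/or logic, expressed as a predicate, to the cache only."""
    return sorted(name for name, page in cache.items() if predicate(page))


index = build_inverted_index(PAGES)
cache = prime_cache(PAGES, index, ["garden", "pool"], {"locale": "camden"})
hits = search(
    cache,
    lambda p: p["locale"] == "camden" and "pool" in p["body"].split(),
)
```

Here "Town Hall" never enters the cache, so the and/or evaluation only ever touches the three candidate pages rather than the whole wiki.
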
00:28 <ivorw> How about: King's Head&locale=acton&category=real ale
00:29 <ivorw> Or, how about: locale=west end&category=pubs
00:30 <hex> ah, you mean search syntax
00:30 <hex> I quite like Google style
00:30 <hex> King's Head locale:"West End" category:Pubs
00:31 <ivorw> Ah, but are spaces allowed in metadata field names?
00:32 <hex> I dunno, but surely that could be handled without the user being
involved...
00:32 <hex> s/ /_/g on the fly sort of thing and vice versa
00:33 <ivorw> Just thinking that some syntax might be tricky or ambiguous
00:34 <ivorw> the last post code:foo #Is this matching on post code or
code?
00:35 <hex> oh, right!
00:35 <hex> no, I think all metadata is one word
00:36 <hex> there could always be a pidgin syntax for it anyway
00:36 <hex> phone:12345
00:36 <ivorw> taken from google again (dig them pidgeons :)
00:37 <ivorw> If we can name every meta field with \w chars, this will be OK
00:38 <ivorw> We could have aliases, phone, telephone, tel, etc.
00:39 <hex> yup!
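
[Editor's sketch] The two ideas just discussed, swapping spaces for underscores on the fly and accepting aliases for a field, might be combined along these lines. The alias table and the canonical names "phone" and "post_code" are assumptions for the example, not the actual metadata field names.

```python
import re

# Hypothetical alias table: alternative spellings -> canonical field name.
ALIASES = {
    "telephone": "phone",
    "tel": "phone",
    "postcode": "post_code",
}


def canonical_field(name):
    """Normalise a user-typed field name to \\w characters, then resolve aliases."""
    key = re.sub(r"\s+", "_", name.strip().lower())
    return ALIASES.get(key, key)


def display_field(key):
    """The reverse direction: turn a stored field name into something readable."""
    return key.replace("_", " ")
```

With this, "Tel", "telephone" and "phone" all resolve to the same field, and "Post Code" becomes "post_code" without the user being involved.
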
00:39 <ivorw> Given my previous suggestion, I would quite like to be able to
do regexp matches
00:40 <ivorw> e.g. king's head&post_code=~W3
00:41 <hex> must the search terms be joined by '&'?
00:42 <hex> I would have thought a magic word would suffice, or the =
00:42 <ivorw> that was just my previous syntax.
00:42 <hex> ah, I follow.
00:42 <hex> but yes, regexen++
00:44 <ivorw> How about dropping the &s: locale=west end category=pubs
distance(530546,181503)<200
00:45 <ivorw> Note, that's the grid ref of Holborn tube
00:46 <hex> I prefer colons to equals, simply because I think people are
more used to Google style
00:46 <hex> but that may just be me
00:47 <ivorw> How about colons for a straight match, and =~ for a regexp
00:47 <hex> "west end" would need quotes so you don't search for "end" in
"locale:west"
00:47 <hex> yes, that sounds great.
00:48 <ivorw> might want to delimit the regexp
00:48 <hex> actually, could it be :~ for a regexp, for consistency?
00:48 <hex> I know that's not very perlish....
00:48 <ivorw> yup, why not tho
00:48 <hex> cool.
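
[Editor's sketch] The syntax agreed at this point (bare keywords, field:value for a straight match, field:"quoted value" for values with spaces, and field:~pattern for a regexp) can be tokenised with a single regular expression. This is a minimal sketch, assuming field names consist only of \w characters as suggested earlier; parse_query and the tuple layout are invented for the example.

```python
import re

# One token is either field:value, field:"quoted value", field:~regex,
# a free-standing "quoted phrase", or a bare word.
TOKEN = re.compile(
    r'''
      (?P<field>\w+):(?P<regex>~)?(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))
    | "(?P<phrase>[^"]*)"
    | (?P<word>\S+)
    ''',
    re.VERBOSE,
)


def parse_query(query):
    """Split a query into body keywords and (field, op, value) qualifiers."""
    keywords, filters = [], []
    for m in TOKEN.finditer(query):
        if m.group("field"):
            quoted = m.group("quoted")
            value = quoted if quoted is not None else m.group("bare")
            op = "regex" if m.group("regex") else "exact"
            filters.append((m.group("field").lower(), op, value))
        elif m.group("phrase") is not None:
            keywords.append(m.group("phrase"))
        else:
            keywords.append(m.group("word"))
    return keywords, filters


keywords, filters = parse_query(
    'King\'s Head locale:"West End" category:Pubs post_code:~W3'
)
```

Because a field name must be a single \w+ word, the ambiguous example from earlier, the last post code:foo, parses as three keywords plus a qualifier on "code".
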
00:49 <ivorw> what do you think of my idea of a distance 'function'?
00:51 <hex> how about:
00:51 <hex> near:530546,181503 range:200m
00:51 <hex> a little easier to read
00:51 <hex> (and write)
00:51 <hex> I love the idea
00:52 <ivorw> with a default range presumably
00:52 <hex> hmm, yes
00:52 <hex> units: m, ft, yds, mi, km
00:52 <hex> (maybe)
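
[Editor's sketch] A near:/range: pair could be evaluated roughly as follows. This assumes the two numbers are easting and northing in metres on a flat grid, so plain Pythagoras is good enough over a few hundred metres. The default range of 500 m is an arbitrary placeholder, since no default was settled on in the conversation, and the function names are hypothetical.

```python
import math
import re

# Conversion factors to metres for the units floated above.
UNIT_IN_METRES = {
    "m": 1.0,
    "km": 1000.0,
    "ft": 0.3048,
    "yds": 0.9144,
    "mi": 1609.344,
}

DEFAULT_RANGE_M = 500.0  # assumption: no default was agreed in the log


def parse_range(text):
    """Turn '200m', '1.5km' or a bare '200' into a distance in metres."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-z]*)", text.strip().lower())
    if not m:
        raise ValueError(f"cannot parse range: {text!r}")
    unit = m.group(2) or "m"
    if unit not in UNIT_IN_METRES:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(m.group(1)) * UNIT_IN_METRES[unit]


def parse_near(text):
    """Turn '530546,181503' into an (easting, northing) pair of floats."""
    easting, northing = text.split(",")
    return float(easting), float(northing)


def within(page_xy, near_xy, range_text=None):
    """True if the page lies within the given (or default) range of the point."""
    limit = parse_range(range_text) if range_text else DEFAULT_RANGE_M
    dx = page_xy[0] - near_xy[0]
    dy = page_xy[1] - near_xy[1]
    return math.hypot(dx, dy) <= limit


holborn = parse_near("530546,181503")
```

A page 100 m east of that point passes within(page, holborn, "200m"), while one 300 m away fails it but still falls inside the placeholder default.
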
00:53 <ivorw> Didn't someone give a talk on a module to handle dimensions?
00:53 <hex> dunno :)
00:53 <ivorw> I recall one last year in State51
00:54 <ivorw> Alex Gough: - Meaningful Strong Typing with Data::Dimensions
00:55 <ivorw> unfortunately, the link for the slides is broken :(
00:55 <hex> bug him on the list:)
00:55 <hex> listen, I must run, or rather sleep, my eyes are closing
00:56 <hex> this is very promising stuff
00:56 <ivorw> OK, noe wurriz - Thanks for the braindump receptacle
00:56 <hex> no problem!
00:56 <hex> seeya...
00:56 <ivorw> nn
--
# Earle Martin
http://c2.com/cgi/wiki?EarleMartin
$a="f695a9a2176a7dd1618af6649896ee10f05ea986de18af6277e9a1d8ef4696644569a1d".
"8ef46961ae1e64277e9896eea7d92ea8003e9a1d8ef4696f6950";$b="8ALB6AIA4.BA2";$c=
join"",unpack"C*",$b;$c=~s/7/2/g;@b=split"",$c;foreach$d(@b){$e=hex(substr($a
,$f,$d));while(length($e)<8){substr($e,0,0)=0;}print pack"b8",$e;$f+=$d;}