Here's a log of a conversation Ivor and I had on IRC the other week about
searching - meant to post it before, but forgot. Comments?
Cheers,
Earle.
00:09 -!- Irssi: Starting query in perl with ivorw
00:09 <ivorw> Hi, I've been thinking about how to restructure the search on
OG
00:12 <hex> hi.
00:12 <hex> oho? do go on...
00:13 <ivorw> I'm thinking that we keep the existing idea of priming a page
cache that is then searched
00:14 * hex nods
00:14 <ivorw> However, the cache can be primed with pages resulting from an
SQL query, in addition to those from a keyword inverted index search
00:15 <ivorw> The idea is that including locale=camden causes all locale
camden pages to be loaded into the cache
00:16 <ivorw> But a full and & or syntax is available for applying to the
cache
00:17 <ivorw> I'm now looking for a syntax for metadata qualifiers on a
search
00:19 <hex> hmm... so the search would be faster because locale camden would
be cached?
00:20 <ivorw> hex, not quite - the search tree only works over a single hash
(cache). The idea is that everything that could possibly match the search is
pre-loaded into the cache.
00:21 <ivorw> Prior to O/G, the usemod search worked by slurping the whole
wiki into the hash every time.
00:22 <hex> blimey
00:22 <ivorw> Although this works, SII et al provide a better mechanism for
subsetting the data, and something that will not blow the server up with a
substantial query and dataset
00:23 <hex> right, yes.
00:23 <ivorw> My bodge (which works pretty well) is to 'prime' the input hash
with the results of an inverted index search on all of the keywords supplied
- regardless of and/or syntax
00:24 <ivorw> However, only the SII is used, so the search will only find
words in the body text of the page, not in the metadata
00:25 <hex> so we need a mechanism for searching metadata?
00:26 <ivorw> I want to keep this idea of a primed cache, and load it with
SQL query results, to provide just that
00:28 <ivorw> How about: King's Head&locale=acton&category=real ale
00:29 <ivorw> Or, how about: locale=west end&category=pubs
00:30 <hex> ah, you mean search syntax
00:30 <hex> I quite like Google style
00:30 <hex> King's Head locale:"West End" category:Pubs
00:31 <ivorw> Ah, but are spaces allowed in metadata field names?
00:32 <hex> I dunno, but surely that could be handled without the user being
involved...
00:32 <hex> s/ /_/g on the fly sort of thing and vice versa
00:33 <ivorw> Just thinking that some syntax might be tricky or ambiguous
00:34 <ivorw> the last post code:foo #Is this matching on post code or
code?
00:35 <hex> oh, right!
00:35 <hex> no, I think all metadata is one word
00:36 <hex> there could always be a pidgin syntax for it anyway
00:36 <hex> phone:12345
00:36 <ivorw> taken from google again (dig them pidgeons :)
00:37 <ivorw> If we can name every meta field with \w chars, this will be OK
00:38 <ivorw> We could have aliases, phone, telephone, tel, etc.
00:39 <hex> yup!
00:39 <ivorw> Given my previous suggestion, I would quite like to be able to
do regexp matches
00:40 <ivorw> e.g. king's head&post_code=~W3
00:41 <hex> must the search terms be joined by '&'?
00:42 <hex> I would have thought a magic word would suffice, or the =
00:42 <ivorw> that was just my previous syntax.
00:42 <hex> ah, I follow.
00:42 <hex> but yes, regexen++
00:44 <ivorw> How about dropping the &s: locale=west end category=pubs
distance(530546,181503)<200
00:45 <ivorw> Note, that's the grid ref of Holborn tube
00:46 <hex> I prefer colons to equals, simply because I think people are
more used to Google style
00:46 <hex> but that may just be me
00:47 <ivorw> How about colons for a straight match, and =~ for a regexp
00:47 <hex> "west end" would need quotes so you don't search for "end" in
"locale:west"
00:47 <hex> yes, that sounds great.
00:48 <ivorw> might want to delimit the regexp
00:48 <hex> actually, could it be :~ for a regexp, for consistency?
00:48 <hex> I know that's not very perlish....
00:48 <ivorw> yup, why not tho
00:48 <hex> cool.
00:49 <ivorw> what do you think of my idea of a distance 'function'?
00:51 <hex> how about:
00:51 <hex> near:530546,181503 range:200m
00:51 <hex> a little easier to read
00:51 <hex> (and write)
00:51 <hex> I love the idea
00:52 <ivorw> with a default range presumably
00:52 <hex> hmm, yes
00:52 <hex> units: m, ft, yds, mi, km
00:52 <hex> (maybe)
00:53 <ivorw> Didn't someone give a talk on a module to handle dimensions?
00:53 <hex> dunno :)
00:53 <ivorw> I recall one last year in State51
00:54 <ivorw> Alex Gough: - Meaningful Strong Typing with Data::Dimensions
00:55 <ivorw> unfortunately, the link for the slides is broken :(
00:55 <hex> bug him on the list:)
00:55 <hex> listen, I must run, or rather sleep, my eyes are closing
00:56 <hex> this is very promising stuff
00:56 <ivorw> OK, noe wurriz - Thanks for the braindump receptacle
00:56 <hex> no problem!
00:56 <hex> seeya...
00:56 <ivorw> nn
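The syntax the log converges on (bare keywords, field:value, field:"two words", field:~regexp, near:E,N and range:200m) can be sketched as a small tokenizer. This is a minimal illustration in Python rather than the project's Perl, and not the SuperSearch implementation: parse_query, within_range, the ALIASES table and the result layout are all hypothetical names, the alias mapping only covers the phone/telephone/tel example from the log, and no default range is assumed because none was agreed. Grid references are assumed to be full eastings and northings in metres.

```python
import math
import re

# Aliases floated in the log (phone, telephone, tel); the mapping is an assumption.
ALIASES = {"telephone": "phone", "tel": "phone"}

# Units suggested in the log, converted to metres.
UNITS_IN_METRES = {"m": 1.0, "ft": 0.3048, "yds": 0.9144, "mi": 1609.344, "km": 1000.0}

TOKEN = re.compile(r'''
    (?P<field>\w+):(?P<regex>~?)(?:"(?P<qval>[^"]*)"|(?P<val>\S+))  # field:value, field:"two words", field:~regex
  | "(?P<phrase>[^"]*)"                                             # "quoted phrase"
  | (?P<word>\S+)                                                   # bare keyword
''', re.VERBOSE)

RANGE = re.compile(r'(\d+(?:\.\d+)?)(m|ft|yds|mi|km)?$')


def parse_query(query):
    """Split a query into body-text keywords, metadata filters and a location."""
    parsed = {"keywords": [], "filters": [], "near": None, "range_m": None}
    for match in TOKEN.finditer(query):
        field = match.group("field")
        if field is None:
            # Bare keyword or quoted phrase: left for the body-text search.
            phrase = match.group("phrase")
            parsed["keywords"].append(phrase if phrase is not None else match.group("word"))
            continue
        qval = match.group("qval")
        value = qval if qval is not None else match.group("val")
        field = ALIASES.get(field.lower(), field.lower())
        if field == "near":
            easting, northing = value.split(",")
            parsed["near"] = (int(easting), int(northing))
        elif field == "range":
            # Malformed ranges are not handled in this sketch.
            number, unit = RANGE.match(value).groups()
            parsed["range_m"] = float(number) * UNITS_IN_METRES[unit or "m"]
        else:
            op = "regex" if match.group("regex") else "eq"
            parsed["filters"].append((field, op, value))
    return parsed


def within_range(parsed, easting, northing):
    """True if a point lies within range_m of the near: point (metres assumed)."""
    dx = easting - parsed["near"][0]
    dy = northing - parsed["near"][1]
    return math.hypot(dx, dy) <= parsed["range_m"]
```

Because field names are restricted to \w characters, "the last post code:foo" parses unambiguously as three keywords plus a filter on "code", which is the ambiguity ivorw raised; a field called post_code has to be written with the underscore.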
--
# Earle Martin http://c2.com/cgi/wiki?EarleMartin
$a="f695a9a2176a7dd1618af6649896ee10f05ea986de18af6277e9a1d8ef4696644569a1d".
"8ef46961ae1e64277e9896eea7d92ea8003e9a1d8ef4696f6950";$b="8ALB6AIA4.BA2";$c=
join"",unpack"C*",$b;$c=~s/7/2/g;@b=split"",$c;foreach$d(@b){$e=hex(substr($a
,$f,$d));while(length($e)<8){substr($e,0,0)=0;}print pack"b8",$e;$f+=$d;}
I just got this very justified bug report. The entire install
procedure really needs overhauling. Can I have a volunteer to
spearhead this, please? If I don't get one within a week then I'll
ask around more widely. Please don't volunteer unless you are
committed to producing some real changes within, say, two weeks of
starting.
I'm feeling the need to expand the team anyway since the workload does
seem to be getting greater than we can manage in a reasonable timeframe.
I am feeling the burden falling quite heavily on me, and this is meant
to be a team effort.
Kake
----- Forwarded message from Guest via RT <bug-OpenGuides(a)rt.cpan.org> -----
From: "Guest via RT" <bug-OpenGuides(a)rt.cpan.org>
Date: Tue, 30 Sep 2003 19:04:10 -0400 (EDT)
To: "AdminCc of cpan Ticket #3916": ;
Subject: [cpan #3916] Config.pm leaks database password
This message about OpenGuides was sent to you by guest <> via rt.cpan.org
Full context and any attached attachments can be found at:
<URL: https://rt.cpan.org/Ticket/Display.html?id=3916 >
In most standard installs, perl module files will be installed world-readable; therefore it is not appropriate to store the database password in Config.pm.
(I'm not actually sure why this file exists at all; surely we should be using the relevant wiki.conf anyway? storing configuration data in a perl module strikes me as exceedingly horrid)
----- End forwarded message -----
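The condition the report describes can be checked mechanically. The following is a minimal sketch in Python, assuming a POSIX system; readable_by_others is a hypothetical helper, not part of OpenGuides, and it only inspects the permission bits of whatever path it is given.

```python
import os
import stat


def readable_by_others(path):
    """Return True if group or other users can read the file at `path`."""
    mode = os.stat(path).st_mode
    # S_IRGRP and S_IROTH are the group-read and world-read permission bits.
    return bool(mode & (stat.S_IRGRP | stat.S_IROTH))
```

An installer could run a check like this against any file that holds the database password and refuse to continue, or tighten the mode with os.chmod, when it returns True.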
Earle,
The rather irritating bug whereby tables generate loads of blank lines has been fixed in the latest CGI::Wiki::Formatter::UseMod.
Kake is running the Vegan Oxford site with this version in there, and it's working fine.
Can we please fix the London site (and any others)?
Thanx,
Ivor.
Accidentally sent this to the cgi-wiki list. doh!!
As promised to Kake last night, I've written some config file location
code that isn't specific to my installation. Can people take a look and
see what they think please?
Jody
----- Original Message -----
From: "Kate L Pugh" <kake(a)earth.li>
To: "Discussion of the Open Guide to London." <openguides-london(a)openguides.org>
Cc: <cgi-wiki-dev(a)earth.li>
Sent: 11 September 2003 20:09
Subject: Re: [OpenGuides-London] Openguides london error message
> On Wed 03 Sep 2003, Billy Abbott <billy(a)cowfish.org.uk> wrote:
> > just got this error when trying to add a new entry,
> >
> > Search::InvertedIndex::update() - Failed to save updated
> > 'ged_000000000000_a_000000000972' -> (list of ranked keys) at
> > /home/earle/openguides.org/lib/CGI/Wiki/Search/SII.pm line 204
>
> I've had this one before. There is *something* up with
> Search::InvertedIndex but for the life of me I know not what. The
> author of Search::InvertedIndex is a very nice person but is too busy
> to spend much time on it. Does this scratch anyone's itch? I have
> some code from him that is more recent than the latest CPAN version,
> plus a reworking of the docs that I did to make them more
> understandable. Give me a shout if Search::InvertedIndex is something
> that you'd like to play with.
>
Kake,
You'll be pleased to know that I am actively working on the enhancements to supersearch. One of the effects will be reducing our
dependence on the SII. In fact category and locale searches will not use the SII at all.
If there are any problems with querying the inverted index (rather than updating it), I would be interested and willing to assist,
especially as I know exactly how supersearch uses it.
Ivor.
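The primed-cache approach (collect every page that any search term could match, from the inverted index for body words and from a metadata lookup for locale and category, then apply the full and/or tree to that cache only) might look roughly like this. It is a sketch in Python under stated assumptions, not the supersearch code: PAGES is made-up toy data, the dictionary scan in prime_cache stands in for the SQL query, and the tuple-based search tree is a hypothetical representation.

```python
# Toy data standing in for the wiki store; page names and fields are made up.
PAGES = {
    "Kings Head, Acton": {"body": "real ale and a garden",
                          "metadata": {"locale": ["Acton"], "category": ["Pubs"]}},
    "Red Lion, Camden": {"body": "real ale",
                         "metadata": {"locale": ["Camden"], "category": ["Pubs"]}},
    "Camden Market": {"body": "stalls and food",
                      "metadata": {"locale": ["Camden"], "category": ["Markets"]}},
}


def build_index(pages):
    """Build a word -> set of page names map, standing in for the inverted index."""
    index = {}
    for name, page in pages.items():
        for word in page["body"].lower().split():
            index.setdefault(word, set()).add(name)
    return index


def leaves(tree):
    """Yield every ("word", w) and ("meta", field, value) leaf of a search tree."""
    if tree[0] in ("and", "or"):
        for sub in tree[1:]:
            yield from leaves(sub)
    else:
        yield tree


def has_meta(page, field, value):
    return value.lower() in [v.lower() for v in page["metadata"].get(field, [])]


def prime_cache(pages, index, tree):
    """Load every page that any leaf could match, ignoring the and/or structure."""
    names = set()
    for leaf in leaves(tree):
        if leaf[0] == "word":
            names |= index.get(leaf[1].lower(), set())
        else:
            # Metadata leaf: this scan stands in for the SQL query.
            names |= {n for n, p in pages.items() if has_meta(p, leaf[1], leaf[2])}
    return {n: pages[n] for n in names}


def matches(page, tree):
    """Evaluate the full and/or tree against a single cached page."""
    kind = tree[0]
    if kind == "and":
        return all(matches(page, sub) for sub in tree[1:])
    if kind == "or":
        return any(matches(page, sub) for sub in tree[1:])
    if kind == "word":
        return tree[1].lower() in page["body"].lower().split()
    return has_meta(page, tree[1], tree[2])


def search(pages, index, tree):
    cache = prime_cache(pages, index, tree)
    return sorted(name for name, page in cache.items() if matches(page, tree))
```

With a tree such as ("and", ("word", "ale"), ("meta", "locale", "camden")), the cache is primed with the union of both leaf results and the tree then narrows it down, so a pure locale or category search never needs to touch the inverted index.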
Hi Pete,
I understand you had some trouble with the OpenGuides user interface
the other week. Could you let us know what the problem was so we can
fix it? Thanks!
Kake
On Sat 06 Sep 2003, Ivor Williams <ivor.williams(a)tiscali.co.uk> wrote:
> Not sure if this reached you or whether the list is up...
Did you send a mail to the list that didn't make it through? Can you
mail Tantrix if so?
>> OK, Have had a brief look - SuperSearch.pm is not checked into CVS ?!
>> but I found it in the site-perl directory tree.
Whoops, OK, it's in CVS now. Thanks.
>> I'm having a look at the code. It looks good, and much along the
>> lines I was thinking. What is still broken, that you'd like me to
>> look at?
Off the top of my head, the " " and [ ] searches don't seem to work -
if you'd like to do some work then those might be useful.
>> Have you got a version of OG up which includes all the changes so far?
I've released 0.24 to CPAN now. The Vegan Oxford guide is running on
it, and I think hex plans to upgrade London today.
Kake
I'm *sure* I already sent this, but I can't find it in my outbox.
Ivor, I have ripped out the supersearch.cgi stuff into a module
OpenGuides::SuperSearch, and added tests. Can you take a look at it
ASAP as I don't like the search on london.openguides being broken, so
I'd like to do a new release in the next few days.
Kake
[cced to OpenGuides and CGI::Wiki dev lists - Matt, please feel free
to join either or both of these. URLs at bottom of mail.]
Hi Matt,
Paul Mison pointed me at your post about wikis:
http://a.wholelottanothing.org/archives.blah/007391
Sounds like you might be interested in some stuff I've been writing.
CGI::Wiki is a perl distribution (available on CPAN) that provides a
backend for wikis and wiki-like applications. Despite the name, it
does no CGI whatsoever - it just stores, indexes and retrieves content
and metadata.
From the start of my work on CGI::Wiki I wanted to make it generic
enough to be used in as wide a variety of situations as possible.
There's a choice of backends (currently MySQL, Postgres and SQLite,
but people have expressed interest in writing a flat-file backend
too), and since it's a toolkit - providing methods - rather than an
actual application, you can use it wherever you like.
OpenGuides (also available on CPAN) is built on top of CGI::Wiki.
It's a complete web application providing a management system for a
collaboratively-written city guide. The London install is currently
the largest - see
http://openguides.org/london/
Other installs are linked from
http://openguides.org/
OpenGuides uses CSS and the Template Toolkit and our aim has always
been to produce something completely suited to its purpose, rather
than shoehorning our content into something that looks and smells like
every other wiki site out there. It does RDF too, so it's all semantic
and stuff.
Both CGI::Wiki and OpenGuides are in active development. They're
certainly not what I would call mature yet, but they're definitely
usable. Here are the links for the dev lists:
http://www.earth.li/cgi-bin/mailman/listinfo/cgi-wiki-dev
http://realprogrammers.com/mm/listinfo/openguides-dev
And the code is on CPAN:
http://search.cpan.org/author/KAKE/
Kake