----- Original Message -----
From: "Kate L Pugh" <kake(a)earth.li>
To: "Discussion of development on the OpenGuides software."
<openguides-dev(a)openguides.org>
Sent: 12 October 2003 07:49
Subject: Re: [OpenGuides-Dev] Search and search syntax
In fact, it's not AND or OR by default, but a phrase search. In this case it is
looking for "sausages oysters", which doesn't make much sense.
I think I'm going to have to have another look at the code when I've
not just got up, because I can't see why this is. Can someone explain
it to me?
The relevant line of the RecDescent grammar is as follows:
| word(s) {$return = ['word', @{$item[1]}]}
This results in a series of words being turned into a list with 'word' at the
head, i.e. ['word','sausages','oysters']
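A minimal plain-Perl sketch of what that action does, with Parse::RecDescent's @item faked by hand (the rule name 'phrase' in $item[0] is a placeholder, not taken from the actual grammar). For a repetition such as word(s), $item[1] is a reference to the array of matched words, so the action just dereferences it and puts 'word' in front:

```perl
use strict;
use warnings;

# Faked @item: $item[0] is the rule name (placeholder here), and for a
# repetition like word(s) the subrule result $item[1] is an array reference.
my @item = ('phrase', ['sausages', 'oysters']);

# The action from the grammar: flatten the matched words and prepend the
# node type, giving a single 'word' node.
my $return = ['word', @{ $item[1] }];

print join(',', @$return), "\n";   # prints "word,sausages,oysters"
```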
[snip...] - but the thinking behind the way the tree is built up. You need
to say *why* you're doing what you're doing.
Fine. What the code is doing is constructing a tree of nodes representing the query.
Each node is an arrayref, and the first item of the array is a node type.
Here are some examples.
['word', 'pub']
['word', 'the', 'green', 'man']
['AND', ['word', 'restaurant'], ['word', 'vegan'] ]
['OR', ['word', 'cheap'], ['word', 'value'] ]
In the original CGI version of SuperSearch there was a commented-out debug line:
# print $outstr,pre(Dumper($tree));
Although we don't necessarily want the <pre> tag, this is a way to see the
output of the parse and to fix problems with the grammar.
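One way to get the same dump without the <pre> tag, sketched with the core Data::Dumper module: send it to STDERR, which under CGI usually ends up in the web server's error log. The tree here is just the AND example above, not real parser output.

```perl
use strict;
use warnings;
use Data::Dumper;

# The AND example from above, standing in for the parser's output.
my $tree = ['AND', ['word', 'restaurant'], ['word', 'vegan']];

# Terse drops the "$VAR1 =" prefix; Indent = 1 gives a compact layout.
local $Data::Dumper::Terse  = 1;
local $Data::Dumper::Indent = 1;

# Write to STDERR instead of into the HTML page, so no <pre> is needed.
print STDERR Dumper($tree);
```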
-_-_-
What happens to this tree is that it is walked recursively; this is what
_matched_items does. The walk results in calls to matched_word, matched_AND, etc.
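A rough sketch of that kind of recursive walk, in plain Perl. The handler names follow the matched_<TYPE> pattern described above, but these are stand-in functions over a made-up in-memory index, not the methods in SuperSearch.pm. A multi-word 'word' node is treated here as "all of these words" rather than a true phrase match, since a real phrase check needs the page text.

```perl
use strict;
use warnings;

# Made-up inverted index: word => pages containing that word.
my %index = (
    restaurant => ['Vegan Cafe', 'Steak House'],
    vegan      => ['Vegan Cafe', 'Health Food Shop'],
    cheap      => ['Greasy Spoon'],
    value      => ['Greasy Spoon', 'Steak House'],
);

# Recursive walk: read the node type and dispatch to matched_<TYPE>.
# Every handler returns a hashref whose keys are the matching page names.
sub _matched_items {
    my ($node) = @_;
    my ($type, @args) = @$node;
    my $handler = __PACKAGE__->can("matched_$type")
        or die "Unknown node type '$type'\n";
    return $handler->(@args);
}

# 'word' node: pages containing every listed word (not a true phrase match).
sub matched_word {
    my @words = map { lc } @_;
    my $first = shift @words;
    my %hits  = map { $_ => 1 } @{ $index{$first} || [] };
    for my $word (@words) {
        my %also = map { $_ => 1 } @{ $index{$word} || [] };
        delete $hits{$_} for grep { !$also{$_} } keys %hits;
    }
    return \%hits;
}

# AND node: intersection of the children's results.
sub matched_AND {
    my ($first, @rest) = map { _matched_items($_) } @_;
    my %hits = %$first;
    for my $other (@rest) {
        delete $hits{$_} for grep { !$other->{$_} } keys %hits;
    }
    return \%hits;
}

# OR node: union of the children's results.
sub matched_OR {
    my %hits;
    for my $child (@_) {
        $hits{$_} = 1 for keys %{ _matched_items($child) };
    }
    return \%hits;
}

my $tree = ['AND', ['word', 'restaurant'], ['word', 'vegan']];
print join(', ', sort keys %{ _matched_items($tree) }), "\n";   # prints "Vegan Cafe"
```

Dispatching on the node name means a new node type (NOT, say) only needs a new matched_NOT handler; the walker itself does not change.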
Also of note is that the word nodes trigger a call to _prime_wikitext to load up the base
text. I think there is a potential bug here: the original intention was to prime the
wikitext once, based on a complete OR of all the word searches in the inverted index,
and then to apply the boolean logic via the parse tree. I wouldn't be surprised if some
of the ANDs, ORs and NOTs don't work properly.
The solution is to do a pre-pass of the tree, priming on each word node. To do this,
_prime_wikitext must also stop emptying the hash every time, as it is now being called
more than once.
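A sketch of that pre-pass, using hypothetical stand-ins words_in_tree, prime_wikitext and a fake %pages_for lookup (none of these are the real SuperSearch.pm code): collect every word from the tree first, then prime once, with a cache that is only ever added to.

```perl
use strict;
use warnings;

# Pre-pass: gather every word from every 'word' node, however deeply it is
# nested under AND / OR / NOT nodes.
sub words_in_tree {
    my ($node) = @_;
    my ($type, @args) = @$node;
    return @args if $type eq 'word';
    return map { words_in_tree($_) } @args;
}

# Made-up lookup standing in for the inverted index: word => {page => text}.
my %pages_for = (
    restaurant => { 'Vegan Cafe' => 'vegan restaurant', 'Steak House' => 'steak restaurant' },
    vegan      => { 'Vegan Cafe' => 'vegan restaurant' },
);

# Cache of page text. It is only ever added to, never emptied, so it is
# safe to prime it more than once.
my %wikitext;

sub prime_wikitext {
    for my $word (@_) {
        my $pages = $pages_for{ lc $word } or next;
        $wikitext{$_} //= $pages->{$_} for keys %$pages;
    }
}

# Prime once from the de-duplicated words of the whole tree.
my $tree = ['AND', ['word', 'restaurant'], ['word', 'vegan']];
my %seen;
prime_wikitext(grep { !$seen{ lc $_ }++ } words_in_tree($tree));
print join(', ', sort keys %wikitext), "\n";   # prints "Steak House, Vegan Cafe"
```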
Looking at _prime_wikitext, I see you have incorporated category and locale searches
into it. I had started to do this, but didn't know about your call to
list_nodes_by_metadata.
Anyway, here is an alternative grammar, a drop-in replacement, which does a
Google-style AND by default and gives you OR if you separate the words with
commas. Phrase search is still available by passing a string enclosed in
double quotes.
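For illustration only, here is a hand-rolled plain-Perl tokenizer (not the RecDescent grammar itself) that produces the same tree shapes for the two unambiguous cases: bare words ANDed together by default, and double-quoted strings as phrase nodes. The name parse_search is made up, and comma handling is left out of this sketch.

```perl
use strict;
use warnings;

# Turn a query string into the node shapes used above:
#   "quoted words"  -> ['word', 'quoted', 'words']   (phrase node)
#   bare word       -> ['word', 'bare']
# More than one term is wrapped in a single AND node (Google-style default).
# Commas are not handled here; an unmatched quote ends the parse.
sub parse_search {
    my ($query) = @_;
    my @terms;
    while ($query =~ /\G\s*(?:"([^"]*)"|([^\s"]+))/gc) {
        if (defined $1) {
            my $phrase = $1;
            my @words  = split ' ', $phrase;
            push @terms, ['word', @words] if @words;
        }
        else {
            push @terms, ['word', $2];
        }
    }
    return undef unless @terms;
    return @terms == 1 ? $terms[0] : ['AND', @terms];
}
```

For example, parse_search('"green man" pub') gives ['AND', ['word', 'green', 'man'], ['word', 'pub']], while parse_search('pub') gives just ['word', 'pub'].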
Thanks - can you write some tests for this? Don't worry if you can't,
it'll just mean a short delay while I find time to do it.
I've not got my head round where the test database is and what it's got in it.
I also have a version of SuperSearch.pm in which the functionality of _perform_search
is split into separate methods, _build_parser and _apply_parser.
This is attached. Beware: this code has branched seriously from the SuperSearch.pm in
the latest release. I am providing it for ideas, and for a resolution of the
_prime_wikitext issue above (which is solved by making _prime_wikitext recursive and
giving it the whole parse tree). Enjoy.
Also, if you have any further questions or issues on this, I will be willing to help.
Ivor.