----- Original Message -----
From: "Kate L Pugh" kake@earth.li
To: "Discussion of development on the OpenGuides software." openguides-dev@openguides.org
Sent: 12 October 2003 07:49
Subject: Re: [OpenGuides-Dev] Search and search syntax
In fact, it's not AND or OR by default, but a phrase search. In this case looking for "sausages oysters", which doesn't make much sense.
I think I'm going to have to have another look at the code when I've not just got up, because I can't see why this is. Can someone explain it to me?
The relevant line of the RecDescent is as follows:
| word(s) {$return = ['word', @{$item[1]}]}
This results in a series of words being turned into a list with 'word' at the head, i.e. ['word','sausages','oysters']
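To show why this ends up as a phrase search, here is a minimal sketch of what that action does, runnable without the parser. It assumes the usual Parse::RecDescent behaviour that a repeated subrule such as word(s) delivers an array reference of its matches in $item[1]. The @item array is faked by hand and the rule name in $item[0] is a placeholder.

```perl
use strict;
use warnings;

# Assumption: @item is filled in by hand here; inside the grammar,
# Parse::RecDescent supplies it. 'rule_name' is a placeholder.
my @item = ('rule_name', ['sausages', 'oysters']);

# Same expression as the action in the grammar line above.
my $node = ['word', @{ $item[1] }];

print join(', ', @$node), "\n";   # word, sausages, oysters
```

All the words land in a single 'word' node, so whatever handles that node sees them together and has no way of knowing that two separate terms were meant.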
[snip...] - but the thinking behind the way the tree is built up. You need to say *why* you're doing what you're doing.
Fine. What the code is doing is constructing a tree of nodes representing the query. Each node is an arrayref, and the first item of the array is a node type. Here are some examples.
['word', 'pub']
['word', 'the', 'green', 'man']
['AND', ['word', 'restaurant'], ['word', 'vegan'] ]
['OR', ['word', 'cheap'], ['word', 'value'] ]
The original CGI version of SuperSearch had a debug line, commented out:
# print $outstr,pre(Dumper($tree));
We don't necessarily want the <pre> tag, but this is a way to see the output of the parse and to fix problems with the grammar.
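As a rough illustration of the same idea without the CGI wrapper, something like the following dumps a tree to STDERR. The tree is written out by hand rather than coming from the parser, and the two Data::Dumper settings are only my preference for readable output.

```perl
use strict;
use warnings;
use Data::Dumper;

# Optional: tighter indentation and stable key order in the dump.
$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;

# Hand-built tree standing in for the parser's return value.
my $tree = ['AND', ['word', 'restaurant'], ['word', 'vegan']];

# STDERR keeps the debug output out of the generated page.
print STDERR Dumper($tree);
```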
-_-_-
What happens to this tree is that it is walked recursively. This is what _matched_items does. This results in calls to matched_word, matched_AND etc.
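The recursive walk can be sketched roughly as below. This is not the code from SuperSearch.pm: the subs are plain functions instead of methods, %wikitext is a made-up stand-in for the loaded page text, matching is a simple substring test, and NOT is left out to keep it short.

```perl
use strict;
use warnings;

# Hypothetical page texts; the real module loads these from the wiki.
my %wikitext = (
    'Green Man'   => 'a cheap pub called the green man',
    'Vegan Cafe'  => 'a vegan restaurant with good value lunches',
    'Steak House' => 'an expensive restaurant',
);

# Dispatch table keyed on the node type (first element of each node).
my %handler = (
    word => \&matched_word,
    AND  => \&matched_AND,
    OR   => \&matched_OR,
);

# Walk one node: pick the handler for its type and pass it the rest.
sub matched_items {
    my ($node) = @_;
    my ($type, @args) = @$node;
    my $code = $handler{$type} or die "Unknown node type '$type'";
    return $code->(@args);
}

# A word node with several words is matched as one phrase.
sub matched_word {
    my $phrase = lc join ' ', @_;
    my %hits = map { $_ => 1 }
               grep { index(lc $wikitext{$_}, $phrase) >= 0 }
               keys %wikitext;
    return \%hits;
}

# AND: keep only pages that every child matched.
sub matched_AND {
    my ($first, @rest) = map { matched_items($_) } @_;
    my %hits = %{ $first || {} };
    for my $set (@rest) {
        delete @hits{ grep { !$set->{$_} } keys %hits };
    }
    return \%hits;
}

# OR: keep pages that any child matched.
sub matched_OR {
    my %hits;
    for my $child (@_) {
        $hits{$_} = 1 for keys %{ matched_items($child) };
    }
    return \%hits;
}

my $hits = matched_items(['AND', ['word', 'restaurant'], ['word', 'vegan']]);
print join(', ', sort keys %$hits), "\n";   # Vegan Cafe
```

With this toy data, ['word', 'sausages', 'oysters'] matches nothing, because no page contains the two words next to each other.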
Also of note is that the word nodes trigger a call to _prime_wikitext to load up the base text. I think that there is a potential bug here, as the original intention here was to prime the wikitext once based on a complete OR of all word searches in the inverted index, then applying boolean logic via the parse tree. I wouldn't be surprised if some of the ANDs, ORs and NOTs don't work properly.
The solution is to do a pre-pass of the tree, priming on each word node. Also, to do this, _prime_wikitext should not empty out the hash every time, as it's being called more than once.
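A minimal sketch of such a pre-pass, under my own assumptions: %index is a made-up inverted index, %primed stands in for the hash that _prime_wikitext fills, and prime_tree is a hypothetical name. The real method loads page text, whereas this only records which pages are candidates.

```perl
use strict;
use warnings;

# Hypothetical inverted index: word => pages containing that word.
my %index = (
    restaurant => ['Vegan Cafe', 'Steak House'],
    vegan      => ['Vegan Cafe'],
    cheap      => ['Green Man'],
);

# Accumulates across calls; deliberately never emptied inside prime_tree.
my %primed;

# Visit every node once, priming on each word node and recursing otherwise.
sub prime_tree {
    my ($node) = @_;
    my ($type, @args) = @$node;
    if ($type eq 'word') {
        for my $word (@args) {
            $primed{$_} = 1 for @{ $index{lc $word} || [] };
        }
    }
    else {
        prime_tree($_) for @args;
    }
    return;
}

prime_tree(['AND', ['word', 'restaurant'],
                   ['OR', ['word', 'cheap'], ['word', 'vegan']]]);
print join(', ', sort keys %primed), "\n";   # Green Man, Steak House, Vegan Cafe
```

The result is the union over all word nodes, whatever boolean operators sit above them, so the boolean logic can then be applied against one complete candidate set.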
Looking at _prime_wikitext, I see you have incorporated category and locale searches in it. I had started to do this, but didn't know about your call list_nodes_by_metadata.
Anyway, here is an alternative grammar which is a drop-in replacement. It does a Google-style AND by default, and gives you OR if you separate the words with commas. Phrase search is still available, by passing a string enclosed in double quotes ("").
Thanks - can you write some tests for this? Don't worry if you can't, it'll just mean a short delay while I find time to do it.
I've not got my head round where the test database is and what it's got in it.
I also have a version of SuperSearch.pm in which the functionality of _perform_search is split out into other methods, _build_parser and _apply_parser.
This is attached. Beware, this code has seriously branched from the SuperSearch.pm in the latest release. I am providing it for ideas, and for a resolution of the _prime_wikitext issue above (which is solved by making _prime_wikitext recursive and giving it the whole parse tree). Enjoy.
Also, if you have any further questions or issues on this, I will be willing to help.
Ivor.