Working on the search has caused me to mull over a particular problem:
Which searches will currently find "King's head"?
king      Yes, ' is a non-word character which matches \b
kings     No
king's    Yes
I think ideally that we want the middle one to work also.
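To make the table above concrete, here is a minimal Python sketch of a whole-word, case-insensitive match. It assumes the search behaves like a \b-delimited regular expression, which is inferred from the table rather than taken from the actual search code; the function name `finds` is made up for illustration.

```python
import re

def finds(query, title):
    """Case-insensitive match of the query as whole word(s) inside the title."""
    # re.escape keeps characters such as ' or . literal in the pattern
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None

title = "King's head"
for query in ("king", "kings", "king's"):
    print(f"{query!r}: {finds(query, title)}")
```

The apostrophe is a non-word character, so a word boundary falls between "King" and "'s"; that is why "king" matches and "kings" does not.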
Then I thought of the epic "King's Cross St Pancras". How many ways to write that one out?
And I realised that this is not just a search issue but a linking issue as well. I recall the problem we had with "Regent's Park" and "Regents Park", which has been worked around with a redirect.
Also, I'm wondering about having a list of standard abbreviations somewhere, which gets applied in-line as part of the node_name_to_node_title munging:
ave => avenue
ct => court
gdns => gardens
hse => house
rd => road
st => street
st => saint ...oops!
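A rough sketch of how such a table might be applied, in Python for illustration only. This is not the node_name_to_node_title code, and `ABBREVIATIONS` and `expansions` are hypothetical names. Because "st" is ambiguous, the sketch returns every candidate expansion rather than picking one.

```python
from itertools import product

# Hypothetical table: each abbreviation maps to every word it might stand for.
ABBREVIATIONS = {
    "ave": ["avenue"],
    "ct": ["court"],
    "gdns": ["gardens"],
    "hse": ["house"],
    "rd": ["road"],
    "st": ["street", "saint"],  # the ambiguous one
}

def expansions(name):
    """Return every candidate expansion of a name, lower-cased."""
    # Drop full stops so that "St." and "St" are treated alike
    words = name.lower().replace(".", "").split()
    # Words not in the table are kept unchanged as a single choice
    choices = [ABBREVIATIONS.get(word, [word]) for word in words]
    return [" ".join(combo) for combo in product(*choices)]

for candidate in expansions("St John St"):
    print(candidate)
```

Returning all candidates pushes the street/saint decision onto whatever does the lookup, which could then accept a name if any candidate matches a stored title.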
This is the end of my braindump on this. I need input from others.
It will also be generating more test cases for the search :).
Ivor.
Which searches will currently find "King's head"?
I think ideally that we want the middle one to work also.
Then I thought of the epic "King's Cross St Pancras".
definitely yes. i'd never really considered this in mudlondon. i think the canonical epic comes with a 'St.' ;)
ave => avenue
ct => court
st => saint ...oops!
cute. hrmm, the redirect problem.
- you could implement something like symbolic links between pages
- a redirect in a little loop of code which looks for more known special cases like 'kings', perhaps from a config file
- also allow the user to set alternate spellings when they create a page. we did this for state51's music search. (and many artists may have aliases or changed names)
is that overcomplicating? this does look like a tricky one. the opinion of a search strategy person would be interesting.
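For what it is worth, the symbolic-link idea could be as small as the following Python sketch. The alias table, its entries, and the `resolve` function are all hypothetical; in practice the table might be read from a config file or filled in by users when they create a page.

```python
# Hypothetical alias table mapping alternate spellings to a canonical page name
ALIASES = {
    "regents park": "regent's park",
    "kings head": "king's head",
    "kings cross st pancras": "king's cross st pancras",
}

def resolve(name, aliases=ALIASES, max_hops=10):
    """Follow alias links until a name with no further alias is reached."""
    current = name.lower()
    seen = set()
    while current in aliases:
        # Guard against aliases that point at each other, or very long chains
        if current in seen or len(seen) >= max_hops:
            raise ValueError(f"alias loop or chain too long at {current!r}")
        seen.add(current)
        current = aliases[current]
    return current

print(resolve("Regents Park"))
```

Names without an alias pass through unchanged, so the lookup can be applied to every incoming page name.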
zx
--
"Common sense won't tell you. We have to tell each other." -DNA
Ivor Williams wrote:
Working on the search has caused me to mull over a particular problem:
Which searches will currently find "King's head"?
king      Yes, ' is a non-word character which matches \b
kings     No
king's    Yes
I think ideally that we want the middle one to work also.
You have to normalise data before stuffing it into the database, and normalise user input before comparing it to the database.
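As a minimal sketch of that, in Python for illustration: the same `normalise` function is applied to the stored title and to the user's query before they are compared. The particular rules (lower-case, drop apostrophes and full stops, collapse whitespace) and the word-by-word comparison in `matches` are assumptions, not the existing search behaviour.

```python
import re

def normalise(text):
    """Lower-case, drop apostrophes and full stops, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"['’.]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def matches(query, title):
    """True if every normalised query word is a normalised title word."""
    query_words = set(normalise(query).split())
    title_words = set(normalise(title).split())
    return query_words <= title_words

for query in ("kings", "king's", "Regents Park"):
    for title in ("King's head", "Regent's Park"):
        print(f"{query!r} vs {title!r}: {matches(query, title)}")
```

One trade-off to be aware of: with the apostrophe stripped, "King's" is stored as the single word "kings", so a bare "king" no longer matches under this scheme.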
Also, I'm wondering about having a list of standard abbreviations somewhere, which gets applied in-line as part of the node_name_to_node_title munging:
ave => avenue
also av
ct => court
gdns => gardens
also gdn
hse => house
rd => road
st => street
st => saint ...oops!
pa => pass => passage
gt => great
x => cross (eg Charing X, Kings X)
lwr => lower
up => upper
sq => square
cir => circ => circle or circus
pk => park
la => lane
TCR ;-)
stn => station
jcn => jct => junction
comm => common
va => vale
lvl => level
hth => heath
br => bridge
ga => gate
cnr => corner
bor => borough
openguides-dev@lists.openguides.org