Hi folks,
I'm sorry for the short notice but I have intervened and disabled supersearch.cgi as follows,
chmod a-x /home/earle/openguides.org/london/supersearch.cgi
chmod a-x /home/earle/openguides.org/manchester/supersearch.cgi
chmod a-x /home/earle/openguides.org/reading/supersearch.cgi
I have never edited permissions on anyone's files on any site without being requested to, so this is exceptional. I didn't receive any plans on how to fix this issue since I highlighted it 21 Oct 04 and, with about 20 instances of this process driving the load over 25 just now, I took action.
Feel free to switch it on at your discretion when you feel it's not a liability anymore. Please let me know when you do this.
(For those curious about the extent to which a process has to go to bring this machine down, it's a 1.8GHz P4 with 512MB RAM and RAID0 7,200rpm disks.)
Paul
On Tue, Nov 02, 2004 at 01:31:28PM +0000, Paul Makepeace wrote:
I didn't receive any plans on how to fix this issue since I highlighted it 21 Oct 04 and
Kake, are you still working on this? Do you need anything from anyone else to speed things up (eg to help refactor code to retain S::II support or similar).
Cheers,
Dominic.
Je 2004-11-02 14:57:37 +0000, Dominic Hargreaves skribis:
On Tue, Nov 02, 2004 at 01:31:28PM +0000, Paul Makepeace wrote:
I didn't receive any plans on how to fix this issue since I highlighted it 21 Oct 04 and
Kake, are you still working on this? Do you need anything from anyone else to speed things up (eg to help refactor code to retain S::II support or similar).
Even just something that kept track of the number of currently executing processes and hung until one finished would be enough. A server-push animation à la airline companies would do it.
Is there a log of what's searched for, where from, and when? Would be interesting to see the pattern of these "attacks" (it certainly appears like an attack), not to mention investigating caching common searches, if you don't already.
Paul
On Tue, Nov 02, 2004 at 03:10:34PM +0000, Paul Makepeace wrote:
Even just something that kept track of the number of currently executing processes and hung until one finished would be enough. A server-push animation à la airline companies would do it.
One possibility, but I believe Kake has made some improvements to the efficiency of searching too.
Is there a log of what's searched for, where from, and when? Would be interesting to see the pattern of these "attacks" (it certainly appears like an attack), not to mention investigating caching common searches, if you don't already.
That information would normally be available in the Apache access logs; I can't speak for the precise setup on london.openguides.org though.
Dominic.
Je 2004-11-02 15:14:57 +0000, Dominic Hargreaves skribis:
On Tue, Nov 02, 2004 at 03:10:34PM +0000, Paul Makepeace wrote:
Is there a log of what's searched for, where from, and when? Would be interesting to see the pattern of these "attacks" (it certainly appears like an attack), not to mention investigating caching common searches, if you don't already.
That information would normally be available in the Apache access logs; I can't speak for the precise setup on london.openguides.org though.
Ah, it's GET. Cool.
$ grep "GET /supersearch.cgi" /var/log/apache/london.openguides.org-access.log > /tmp/supersearch
Edited /etc/analog.cfg to include an expression to teach it about OpenGuides searching,
--- /etc/analog.cfg.orig  2004-11-02 15:29:31.000000000 +0000
+++ /etc/analog.cfg       2004-11-02 15:29:40.000000000 +0000
@@ -98,2 +98,3 @@
 SEARCHENGINE http://*/pursuit query
+SEARCHENGINE http://*openguides.*/* search
 ROBOTINCLUDE REGEXPI:robot
$ analog /tmp/supersearch > ~paulm/tmphtml/supersearch.html
http://junk.paulm.com/supersearch.html
Fewer than I thought. It must really be working hard.
HTH, Paul
On Tue 02 Nov 2004, Dominic Hargreaves dom@earth.li wrote:
Kake, are you still working on this? Do you need anything from anyone else to speed things up (eg to help refactor code to retain S::II support or similar).
I think I'm going to be able to keep the S::II support, so you can breathe easy. I don't think anyone else can really help - you'd need time to get the hang of the code; it's truly labyrinthine. But thanks. It'll be a lot clearer after the rewrite.
Kake
On Tue 02 Nov 2004, Paul Makepeace openguides.org@paulm.com wrote:
I didn't receive any plans on how to fix this issue since I highlighted it 21 Oct 04
Sorry Paul, I thought you were on the list so I didn't cc my original reply to you. I profiled the search and found out that the reason it's so inefficient is that it's basically ignoring all indexing and doing the search with a regex!
I started working on an improvement but health issues got in the way - it's about half done; I'll have it finished by the end of the weekend (that's factoring in a few days for potentially being ill again).
Kake
----- Original Message -----
From: "Kake L Pugh" kake@earth.li
To: "OpenGuides software developers" openguides-dev@openguides.org
Sent: 02 November 2004 20:30
Subject: Re: [OpenGuides-Dev] supersearch.cgi disabled
... I profiled the search and found out that the reason it's so inefficient is that it's basically ignoring all indexing and doing the search with a regex!
That's interesting. Have you done some profiling to show that the code is not spending the majority of its time in Plucene or SII?
How big a result set are these back-ends returning?
The original idea was that they return only a set of candidate pages which could satisfy the search, and the regex does the syntactic fine filtering on this.
Iterating the parse tree and regexen was only designed to be run on a very small subset of the wiki pages - the returned candidate matches.
If the backend searches are returning large result sets, this could explain the problem.
Hope this helps,
Ivor.
On Fri 05 Nov 2004, IvorW ivorw-openguides@xemaps.com wrote:
That's interesting. Have you done some profiling to show that the code is not spending the majority of its time in Plucene or SII?
It's spending a negligible amount of time in everything apart from that one statement with the regex in. A single search on "pub" spends *over eleven seconds* there.
How big a result set are these back-ends returning?
For a search on "pub" - almost 300. They return what they've been asked for, which is every page with that word in.
That's fine in itself; the problem is that the "super"search then hunts through each and every one of these using a regex to score them - and create a summary for each one, regardless of whether or not it's needed for the current page of results.
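Just to illustrate what I mean - this is invented code rather than the real thing, and score_page() and make_summary() are made-up names:

#!/usr/bin/perl
# Hypothetical illustration only - not the actual OpenGuides code.
use strict;
use warnings;

# Pretend these came from the backend search and the CGI params:
my @candidates = ( 'Pub Crawls', 'Red Lion', 'Ye Olde Mitre' );
my @terms      = ('pub');
my ( $page_num, $per_page ) = ( 1, 20 );

# Score every candidate, but only build the expensive regex-based
# summaries for the results that appear on the current page.
my @scored = sort { $b->{score} <=> $a->{score} }
             map  { { name => $_, score => score_page( $_, @terms ) } }
             @candidates;

my $start = ( $page_num - 1 ) * $per_page;
my $end   = $start + $per_page - 1;
$end = $#scored if $end > $#scored;

$_->{summary} = make_summary( $_->{name}, @terms )
    for @scored[ $start .. $end ];

# score_page() and make_summary() stand in for whatever does the
# regex matching and snippet extraction; both names are invented.
sub score_page   { my ( $page, @t ) = @_; return 1 }
sub make_summary { my ( $page, @t ) = @_; return '...' }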
Hope this helps,
There's no help for a mess like this but to rewrite it from scratch. I can only apologise to everyone for having failed to properly supervise this part of the code. Hopefully Dom will keep a sharper eye on his developers... :)
Kake
On Tue, Nov 02, 2004 at 01:31:28PM +0000, Paul Makepeace wrote:
I'm sorry for the short notice but I have intervened and disabled supersearch.cgi as follows,
Fair enough. I've put in a holding page for the moment to avoid getting server errors.
I've put up a preliminary version of the rewritten supersearch at http://london.openguides.org/kakemirror/supersearch.cgi
Profiling indicates this is way, way faster than the old version.
The summaries aren't working as I write, though they may be as you read. (I'm going to have some breakfast first though).
Please can everyone test this and give me feedback ASAP? Minor tweaking of the result ordering can be done later; I'm mostly interested in things that are glaringly wrong or obviously broken.
(It's not in CVS yet but it will be once I've eaten and checked it over. Search::InvertedIndex support is probably intact but also probably not as good with the ordering.)
Thanks,
Kake
On Sat 06 Nov 2004, Kake L Pugh kake@earth.li wrote:
(It's not in CVS yet but it will be once I've eaten and checked it over. Search::InvertedIndex support is probably intact but also probably not as good with the ordering.)
Now committed. Not done the summaries yet so set that test to skip.
Kake
On Sat, Nov 06, 2004 at 07:31:16AM +0000, Kake L Pugh wrote:
I've put up a preliminary version of the rewritten supersearch at http://london.openguides.org/kakemirror/supersearch.cgi
Please can everyone test this and give me feedback ASAP? Minor tweaking of the result ordering can be done later; I'm mostly interested in things that are glaringly wrong or obviously broken.
I've had a play with this install and it seems to be working fine with my tests. It still feels quite slow to me (up to 10s of wallclock) for a page to be returned, but as long as that is down to external factors, it looks good.
Cheers,
Dominic.
Je 2004-11-09 19:38:11 +0000, Dominic Hargreaves skribis:
On Sat, Nov 06, 2004 at 07:31:16AM +0000, Kake L Pugh wrote:
I've put up a preliminary version of the rewritten supersearch at http://london.openguides.org/kakemirror/supersearch.cgi
Please can everyone test this and give me feedback ASAP? Minor tweaking of the result ordering can be done later; I'm mostly interested in things that are glaringly wrong or obviously broken.
I've had a play with this install and it seems to be working fine with my tests. It still feels quite slow to me (up to 10s of wallclock) for a page to be returned, but as long as that is down to external factors, it looks good.
Encouraging news. I'm going to say it again since I've been Warnocked on this so far:
The issue is that the system is not resilient to bursts of searches. The system needs to be able to resist that. Making it go faster is great as it increases the number of searches possible per second, but until there is something that queues searches when there are more than 'n' happening, we're still running a risk.
Does this make sense? Presumably this is not hard to solve, and certainly easier than optimising search algos.
Paul
On Tue 09 Nov 2004, Paul Makepeace openguides.org@paulm.com wrote:
I'm going to say it again since I've been Warnocked on this so far:
The issue is that the system is not resilient to bursts of searches. The system needs to be able to resist that. Making it go faster is great as it increases the number of searches possible per second, but until there is something that queues searches when there are more than 'n' happening, we're still running a risk.
I think what people aren't understanding about your idea is what makes "a CGI script that searches an OpenGuide" so intrinsically different from all other CGI scripts that the only way to make it not take down a server is to limit the number of copies that can run at one time.
Or are you saying that there's something unusual about the server setup? Or that all CGI scripts other than the trivial should do this? Or that you've checked the code and looked at Plucene and done something compsci-ish (*waves hands*) to determine that the problem can't be solved in a non-CPU-intensive way?
(So I think this was Warnock Dilemma #4.)
Kake
Je 2004-11-10 06:12:18 +0000, Kake L Pugh skribis:
On Tue 09 Nov 2004, Paul Makepeace openguides.org@paulm.com wrote:
I'm going to say it again since I've been Warnocked on this so far:
The issue is that the system is not resilient to bursts of searches. The system needs to be able to resist that. Making it go faster is great as it increases the number of searches possible per second, but until there is something that queues searches when there are more than 'n' happening, we're still running a risk.
I think what people aren't understanding about your idea is what makes "a CGI script that searches an OpenGuide" so intrinsically different from all other CGI scripts that the only way to make it not take down a server is to limit the number of copies that can run at one time.
You're right - anything that's not executing pretty quickly is open to suffering this.
I don't know why there are sometimes dozens of instances of this script in the process table. Perhaps it's some harvesting exercise; something search scripts might be vulnerable to? Wild guess.
I'm also trying to offer a solution that isn't heavy on your time so the service could come back up asap - queuing searches during load.
Or are you saying that there's something unusual about the server setup? Or that all CGI scripts other than the trivial should do this? Or that you've checked the code and looked at Plucene and done something compsci-ish (*waves hands*) to determine that the problem can't be solved in a non-CPU-intensive way?
I don't have that much time/skill :)
P
On Wed 10 Nov 2004, Paul Makepeace openguides.org@paulm.com wrote:
I don't know why there are sometimes dozens of instances of this script in the process table. Perhaps it's some harvesting exercise; something search scripts might be vulnerable to? Wild guess.
I think it's probably because the script is so slow. People press submit, wait, "nothing happens", they think it didn't work, they press submit again and again.
I'm also trying to offer a solution that isn't heavy on your time so the service could come back up asap - queuing searches during load.
Do you know of some already-working thing that does this? There's been discussion on IRC but we haven't come up with a good way to do it yet.
Kake
Je 2004-11-11 16:23:43 +0000, Kake L Pugh skribis:
On Wed 10 Nov 2004, Paul Makepeace openguides.org@paulm.com wrote:
I don't know why there are sometimes dozens of instances of this script in the process table. Perhaps it's some harvesting exercise; something search scripts might be vulnerable to? Wild guess.
I think it's probably because the script is so slow. People press submit, wait, "nothing happens", they think it didn't work, they press submit again and again.
Hmm. Thought: if you had a queuing system, one component of that would be the "please hold" server-push page. This could be re-used (pre-used?) to let users know their search is in progress, quite apart from any queuing, to deter users behaving like coke-starved labrats.
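Very roughly, and totally untested, that holding page could be an nph- style CGI doing old-fashioned server push; search_slot_free() and run_search() are just invented placeholders:

#!/usr/bin/perl
# Untested sketch of a "please hold" server-push page.
# Would need to be installed as an nph- script so Apache doesn't buffer it.
use strict;
use warnings;

$| = 1;
my $boundary = 'EndOfSection';
print "HTTP/1.0 200 OK\r\n";
print "Content-Type: multipart/x-mixed-replace;boundary=$boundary\r\n\r\n";
print "--$boundary\r\n";

# search_slot_free() is a made-up check on however the queue ends up working.
until ( search_slot_free() ) {
    print "Content-Type: text/html\r\n\r\n";
    print "<p>Your search is queued, please wait...</p>\r\n";
    print "--$boundary\r\n";
    sleep 2;
}

print "Content-Type: text/html\r\n\r\n";
print run_search();    # run_search() stands in for the real search
print "\r\n--$boundary--\r\n";

sub search_slot_free { return 1 }                    # placeholder
sub run_search       { return '<p>results</p>' }     # placeholder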
I'm also trying to offer a solution that isn't heavy on your time so the service could come back up asap - queuing searches during load.
Do you know of some already-working thing that does this? There's been discussion on IRC but we haven't come up with a good way to do it yet.
I'm a frayed knot. London.pm?
P
On Thu, Nov 11, 2004 at 04:28:46PM +0000, Paul Makepeace wrote:
Je 2004-11-11 16:23:43 +0000, Kake L Pugh skribis:
I'm also trying to offer a solution that isn't heavy on your time so the service could come back up asap - queuing searches during load.
Do you know of some already-working thing that does this? There's been discussion on IRC but we haven't come up with a good way to do it yet.
I'm a frayed knot. London.pm?
Quick n' dirty solution ... on every search request, do this:
send "please wait" page if lockfile doesn't exist in /tmp create it endif flock the lockfile (use a blocking exclusive lock) [expensive bit goes here - searching and (perhaps) sorting] unflock send results
This will prevent parallel searching; the only race condition is if two processes both create the lockfile, but you need not care about that.
Expanding it to support up to N concurrent searches is fairly easy - just have N lockfiles and use non-blocking flocks to try them all, and if none are available sleep for a bit before trying again. Making N dynamic based on system load is also pretty trivial.
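Something like this, perhaps (untested; the lockfile paths and the value of N are picked out of the air):

#!/usr/bin/perl
# Untested sketch of the N-lockfile version described above.
use strict;
use warnings;
use Fcntl qw(:flock);

my $max_searches = 3;    # N - arbitrary for this sketch
my $lock;

SLOT: while (1) {
    for my $i ( 1 .. $max_searches ) {
        open my $fh, '>>', "/tmp/supersearch.$i.lock"
            or die "can't open lockfile: $!";
        if ( flock $fh, LOCK_EX | LOCK_NB ) {
            $lock = $fh;    # got a slot - keep the handle, and with it the lock
            last SLOT;
        }
        close $fh;
    }
    sleep 1;                # every slot busy - wait a bit and retry
}

# [expensive bit goes here - searching and (perhaps) sorting]

flock $lock, LOCK_UN;
close $lock;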
This one time, at band camp, Kake L Pugh wrote:
Do you know of some already-working thing that does this? There's been discussion on IRC but we haven't come up with a good way to do it yet.
If you mean code, no idea. For the concept, search for a flight on this: http://www.traveljungle.co.uk/