On Thu, Dec 16, 2004 at 06:27:30AM -0800, Jo Walsh wrote:
perhaps a content-based wiki spam filter would be useful; it wouldn't
hold off dissociated-text attacks but would certainly deal with the
'long list of medication spam links' style wikispam we are seeing now.
a blacklist service?
Would using some metrics of sentence complexity and structure be useful?
There was some talk recently on the london.pm list about this, in which
Lingua::EN::Fathom, diction(1) and style(1) were recommended.
http://london.pm.org/pipermail/london.pm/Week-of-Mon-20041011/029431.html
and follow-ups.
This works on the assumption that typical comment spam will get
outlandish scores. Not that I know, cos I've never seen any, let alone
measured it.
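
For the sake of argument, here is a rough Python sketch of the sort of
scoring I have in mind. It is not what Lingua::EN::Fathom, diction(1) or
style(1) actually compute; the function names, the two metrics (words per
sentence, URLs per word) and the 0.2 threshold are all made up for
illustration and would need tuning against real spam.

```python
import re

# Illustrative patterns only; real text needs more careful tokenising.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
SENTENCE_END_RE = re.compile(r"[.!?]+")
WORD_RE = re.compile(r"[A-Za-z']+")


def text_metrics(text):
    """Return crude structure metrics for a block of submitted text."""
    urls = URL_RE.findall(text)
    # Strip URLs first so their dots and path segments are not counted
    # as sentence breaks or words.
    prose = URL_RE.sub(" ", text)
    words = WORD_RE.findall(prose)
    # Only count chunks that contain at least one word as sentences.
    sentences = [s for s in SENTENCE_END_RE.split(prose) if WORD_RE.search(s)]
    return {
        "words": len(words),
        "urls": len(urls),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "urls_per_word": len(urls) / max(len(words), 1),
    }


def looks_like_link_spam(text, max_urls_per_word=0.2):
    """Flag text whose link density exceeds an assumed, untested threshold."""
    return text_metrics(text)["urls_per_word"] > max_urls_per_word
```

A 'long list of links' edit should come out with a very high URLs-per-word
figure, while ordinary prose should sit near zero. Dissociated-text spam
would sail straight through this, as noted above.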
I'm going to pop over to spam-l now and see if anyone there has any
ideas ...
--
David Cantrell | Reality Engineer, Ministry of Information
While researching this email, I was forced to carry out some
investigative work which unfortunately involved a bucket of
puppies and a belt sander
-- after JoeB, in the Monastery