On Thu, Dec 16, 2004 at 06:27:30AM -0800, Jo Walsh wrote:
> perhaps a content-based wiki spam filter would be useful; it wouldn't hold off dissociated-text attacks but would certainly deal with the 'long list of medication spam links' style wikispam we are seeing now. a blacklist service?
Would using some metrics of sentence complexity and structure be useful? There was some talk recently on the london.pm list about this, in which Lingua::EN::Fathom, diction(1) and style(1) were recommended.
See http://london.pm.org/pipermail/london.pm/Week-of-Mon-20041011/029431.html and follow-ups.
This works on the assumption that typical comment spam will get outlandish scores. Not that I know, cos I've never seen any, let alone measured it.
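To make the idea concrete, here is a rough sketch in Python of the sort of check I have in mind. It is not what Lingua::EN::Fathom, diction(1) or style(1) actually compute; the two metrics (words per sentence, and the share of tokens that are URLs), the function names and both thresholds are made up for illustration and would need tuning against real spam.

```python
import re

# Crude patterns, good enough for a sketch; real text needs more care.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
SENTENCE_END_RE = re.compile(r"[.!?]+(?:\s+|$)")
WORD_RE = re.compile(r"[A-Za-z']+")


def text_metrics(text):
    """Return two rough structure metrics for a block of text."""
    urls = URL_RE.findall(text)
    # Strip URLs first so the dots inside them are not counted as sentence ends.
    prose = URL_RE.sub(" ", text)
    words = WORD_RE.findall(prose)
    sentences = [s for s in SENTENCE_END_RE.split(prose) if WORD_RE.search(s)]
    n_sentences = max(len(sentences), 1)
    total_tokens = len(words) + len(urls)
    return {
        "words_per_sentence": len(words) / n_sentences,
        "url_fraction": len(urls) / total_tokens if total_tokens else 0.0,
    }


def looks_outlandish(text, max_words_per_sentence=40.0, max_url_fraction=0.3):
    """Flag text whose metrics exceed the (assumed, unmeasured) thresholds."""
    m = text_metrics(text)
    return (m["words_per_sentence"] > max_words_per_sentence
            or m["url_fraction"] > max_url_fraction)


if __name__ == "__main__":
    normal = "The pub is on the corner. It serves good beer and the food is fine."
    spam = ("cheap pills http://a.example/1 http://a.example/2 "
            "http://a.example/3 http://a.example/4")
    print(looks_outlandish(normal), text_metrics(normal))
    print(looks_outlandish(spam), text_metrics(spam))
```

A 'long list of links' page should trip the URL fraction straight away, while the words-per-sentence figure is a stand-in for the sentence complexity scores the tools above report. Neither would do much against dissociated-text attacks.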
I'm going to pop over to spam-l now and see if anyone there has any ideas ...