I've written a spam remover tool for OpenGuides... it detects at least the popular spam on the Lancaster guide, and requests the page to delete it.
You'll need Ruby.
Find it at http://www.lancs.ac.uk/~shawc2/programming/ogspam/ogspam.tgz
Clair
Fantastic, thanks Clair!
Does it also handle the Category and Locale spam (e.g. Category Free Ringtones)?
I can't work out how it gets in exactly (seems an odd/cunning method that doesn't show up in many places) but damn it's annoying (and seemingly pointless, as there's so little content in the pages, not even any spam links - weird).
Tom.
On 05/07/06, Clair openguides-lancaster@nimoll.co.uk wrote:
> I've written a spam remover tool for OpenGuides... it detects at least the popular spam on the Lancaster guide, and requests the page to delete it.
> You'll need Ruby.
> Find it at http://www.lancs.ac.uk/~shawc2/programming/ogspam/ogspam.tgz
> Clair
-- OpenGuides-Dev mailing list - OpenGuides-Dev@openguides.org http://openguides.org/mm/listinfo/openguides-dev
Tom Heath wrote:
> Fantastic, thanks Clair!
> Does it also handle the Category and Locale spam (e.g. Category Free Ringtones)?
> I can't work out how it gets in exactly (seems an odd/cunning method that doesn't show up in many places) but damn it's annoying (and seemingly pointless, as there's so little content in the pages, not even any spam links - weird).
It doesn't - I've never seen it before.
Do you have any examples? It should be simple enough to add.
Clair
Sure...
Looking at http://everywhere.openguides.org/ suggests it comes in some slightly different forms. At the OGMK we get new Categories and Locales that go something like:
Locale/Category Rolex Replicas
Locale/Category How to Make Money
but they have no content. From http://everywhere.openguides.org/ scroll down to Open Guide to Milton Keynes, then entries "by Auto Create at July 05, 2006 11:38 AM". I've since removed these pages from our guide, but they _will_ reappear <sigh>
At OGLancaster (and OGLondon) it looks like you're getting similar stuff, but with URLs embedded in the category and locale names, which weirdly we don't get in the OGMK (a slightly different config perhaps??).
Either way, http://everywhere.openguides.org/ is a great way of spotting these, as it reads the RSS feed of recent changes, which has *different* data to the HTML page of Recent Changes. I think there's something to be said for this (though not clear about how it's working in the underlying code), as it keeps the spam off the Home and Recent Changes pages, whilst still allowing Admins to read the RSS feeds to see what's really going on. I have no idea if this was an explicit design decision, or simply the way it happened.
Hope this helps :) and good luck. Is your tool something that could be deployed centrally to clean spam from all OGs? If yes, what do people think of that suggestion?
Cheers,
Tom.
On Thu, Jul 06, 2006 at 10:37:52AM +0100, Tom Heath wrote:
> Hope this helps :) and good luck. Is your tool something that could be deployed centrally to clean spam from all OGs? If yes, what do people think of that suggestion?
I haven't looked at Clair's tool yet, but any long-term solution needs to be based on preventing the spam from going live to start with (via moderation, rule-based spam filtering etc).
I've mentioned this a couple of times before but no one's taken me up on it: I use a hack on Oxford to implement moderation. Regular contributors know the moderation URL so they will be able to approve their stuff, but I get emailed and try to process all genuine contributions quickly.
It *is* a hack and certainly not the recommended way to implement moderation - that comes later with Wiki::Toolkit's proper moderation support, when we get round to it :)
In the meantime, if you're comfortable with running a hack and possibly cleaning up your database manually later, you can grab it from:
http://www.larted.org.uk/~dom/computing/code/openguides/moderation.patch (patch against OpenGuides.pm - please change the email address!)
http://www.larted.org.uk/~dom/computing/code/openguides/moderate.cgi moderation CGI interface.
Dominic.
On Thu, Jul 06, 2006 at 10:54:03AM +0100, Dominic Hargreaves wrote:
> http://www.larted.org.uk/~dom/computing/code/openguides/moderation.patch (patch against OpenGuides.pm - please change the email address!)
> http://www.larted.org.uk/~dom/computing/code/openguides/moderate.cgi moderation CGI interface.
To clarify: these are for OpenGuides 0.56 with the Wiki::Toolkit DB schema. I can dig out the old versions if you really want, but really, just upgrade to the latest :)
Dominic.
On Thu, 6 Jul 2006, Dominic Hargreaves wrote:
> It *is* a hack and certainly not the recommended way to implement moderation - that comes later with Wiki::Toolkit's proper moderation support, when we get round to it :)
In case people don't know, with Wiki::Toolkit, it's possible to flag individual pages as requiring moderation (optionally also all new pages). These enter the versions list (content/metadata), but don't update the current version (nodes), until someone moderates them.
What we need for OpenGuides to use it is:
* an admin interface to let you toggle moderation on and off for nodes (Wiki::Toolkit->set_node_moderation)
* an admin interface to let you moderate entries (Wiki::Toolkit->moderate_node)
* optionally also a config flag to allow you to select whether new entries automatically get their "required moderation" flag set (optional 5th parameter to Wiki::Toolkit->write_node)
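
To make the behaviour described above concrete, here is a toy in-memory model in Ruby (the language of Clair's tool). It is not Wiki::Toolkit code and makes no claim about how Wiki::Toolkit is implemented: the `ToyWiki` class, its `Version` struct and the `new_nodes_moderated:` keyword are invented for illustration, and the three method names are borrowed from the list above only as labels. It shows edits to a moderated node landing in the version history without changing the live version until someone approves them.

```ruby
# Toy model of per-node moderation. NOT Wiki::Toolkit: all names are illustrative.
class ToyWiki
  Version = Struct.new(:number, :content, :moderated)

  def initialize
    @versions = Hash.new { |h, k| h[k] = [] } # every submitted version, per node
    @current = {}                             # node name => live version number
    @needs_moderation = {}                    # node name => true/false
  end

  # Toggle the per-node flag (compare set_node_moderation in the list above).
  def set_node_moderation(name, required)
    @needs_moderation[name] = required
  end

  # Store a new version; it goes live only if the node is not moderated.
  # new_nodes_moderated stands in for the optional flag on write_node.
  def write_node(name, content, new_nodes_moderated: false)
    @needs_moderation[name] = new_nodes_moderated unless @needs_moderation.key?(name)
    live = !@needs_moderation[name]
    number = @versions[name].size + 1
    @versions[name] << Version.new(number, content, live)
    @current[name] = number if live
    number
  end

  # Approve one stored version (compare moderate_node in the list above).
  def moderate_node(name, number)
    version = @versions[name].find { |v| v.number == number }
    return false unless version

    version.moderated = true
    # Assumption of this sketch: only move the live pointer forward.
    @current[name] = number if @current[name].nil? || number > @current[name]
    true
  end

  # Content of the live version, or nil if nothing has been approved yet.
  def current_content(name)
    number = @current[name]
    number && @versions[name][number - 1].content
  end

  # Version numbers still waiting for a moderator.
  def pending(name)
    @versions[name].reject(&:moderated).map(&:number)
  end
end

wiki = ToyWiki.new
wiki.write_node("Red Lion", "A pub on the high street.")
wiki.set_node_moderation("Red Lion", true)
wiki.write_node("Red Lion", "Great site! Buy watches.")
puts wiki.current_content("Red Lion") # still the first version
p wiki.pending("Red Lion")            # the spam edit waits here
```

In this model a spam edit to a moderated page never reaches the live page; an admin only has to ignore or discard the pending version.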
Nick
Tom Heath wrote:
> Sure...
> Looking at http://everywhere.openguides.org/ suggests it comes in some slightly different forms. At the OGMK we get new Categories and Locales that go something like:
> Locale/Category Rolex Replicas
> Locale/Category How to Make Money
> but they have no content. From http://everywhere.openguides.org/ scroll down to Open Guide to Milton Keynes, then entries "by Auto Create at July 05, 2006 11:38 AM". I've since removed these pages from our guide, but they _will_ reappear <sigh>
Ah - that's an easy one to kill :)
> At OGLancaster (and OGLondon) it looks like you're getting similar stuff, but with URLs embedded in the category and locale names, which weirdly we don't get in the OGMK (a slightly different config perhaps??).
Seems there's less of a pattern to this spam than I thought, so I'll have to find a new way of recognising it.
> Either way, http://everywhere.openguides.org/ is a great way of spotting these, as it reads the RSS feed of recent changes, which has *different* data to the HTML page of Recent Changes. I think there's something to be said for this (though not clear about how it's working in the underlying code), as it keeps the spam off the Home and Recent Changes pages, whilst still allowing Admins to read the RSS feeds to see what's really going on. I have no idea if this was an explicit design decision, or simply the way it happened.
Yeah, I'm not sure how it works myself :) Found lots when I looked though - thanks for reminding me!
> Hope this helps :) and good luck. Is your tool something that could be deployed centrally to clean spam from all OGs? If yes, what do people think of that suggestion?
The problem with that is the app requires a password, and since every guide has a different password, it isn't feasible. (Though the scripts could be run centrally, if required, with different config files.) If there's a need for it, I could implement a feature where you could enter the URL/password on the command line?
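
As a rough idea of what the command-line option suggested above might look like, here is a minimal Ruby sketch using the standard `optparse` library. The flag names `--url` and `--password`, the `ogspam` program name in the banner and the `parse_guide_options` helper are all assumptions for illustration; the thread does not show the tool's real interface.

```ruby
require "optparse"

# Parse hypothetical --url / --password flags into a plain hash.
# Flag names and the "ogspam" banner are illustrative, not the tool's real CLI.
def parse_guide_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: ogspam [options]"
    opts.on("--url URL", "Base URL of the guide to clean") { |v| options[:url] = v }
    opts.on("--password PASSWORD", "Admin password for that guide") { |v| options[:password] = v }
  end
  parser.parse(argv) # non-destructive variant: argv itself is left untouched

  missing = [:url, :password].reject { |key| options.key?(key) }
  raise ArgumentError, "missing: #{missing.join(', ')}" unless missing.empty?

  options
end

opts = parse_guide_options(["--url", "http://example.org/guide/wiki.cgi", "--password", "secret"])
puts opts[:url]
```

One trade-off to keep in mind: a password given on the command line is usually visible to other users of the same machine in the process list, so for a central deployment the per-guide config files mentioned above may be the safer of the two options.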
Clair
Thanks to Tom, I've now completed v1.2 - the change being that it now checks for certain keywords in the category/locale list, rather than for "Great site!"
There's a small chance some spam will be missed, or there will be false positives; I'd appreciate any reports of these.
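
The thread does not include the tool's source, so the following is only a sketch of what a keyword check on category/locale names could look like, not the actual v1.2 logic. The keyword list is made up from the examples mentioned earlier in the thread, and the `suspicious_names` helper is hypothetical. It also flags names with an embedded URL, since that form was reported for the Lancaster and London guides.

```ruby
# Illustrative keyword list only; the tool's real list is not given in the thread.
SPAM_KEYWORDS = ["ringtones", "rolex", "replica", "make money"].freeze

# Return the Category/Locale names that contain a spam keyword or an embedded URL.
def suspicious_names(names, keywords = SPAM_KEYWORDS)
  names.select do |name|
    # Only look at Category/Locale pages; ordinary pages are out of scope here.
    next false unless name.match?(/\A(Category|Locale)\s/i)

    lowered = name.downcase
    keywords.any? { |kw| lowered.include?(kw) } || lowered.match?(%r{https?://})
  end
end

names = [
  "Category Free Ringtones",
  "Locale Rolex Replicas",
  "Category How to Make Money",
  "Category Pubs",
  "Locale City Centre"
]
puts suspicious_names(names)
```

Plain substring matching like this is where both misses and false positives come from: a new spam topic is not in the list, while a legitimate category that happens to contain a listed word would be flagged.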
Of course, I agree totally with Dom's view that we ought to be preventing spam from getting there in the first place, something which I think having real user registration will help with. (We can do things like moderate IP edits, allow certain users to moderate, etc. more easily.)
Clair
Sounds great Clair :)
Yes I also agree that the moderation route sounds like a good one. In the end it can't be any more onerous than clearing out the spam. Thanks Nick and Dom for the Wiki::Toolkit information.
Tom.
openguides-dev@lists.openguides.org