I added a feedback function to some of my programs. Unfortunately I forgot to include any kind of spam protection, so users can send anything they want to my server, where every submission is stored in a huge database.
At first I checked the feedback periodically, kept what was usable and deleted the garbage. The problem is that I get 900 submissions per day. Only 4-5 are really useful; the rest are mostly two types of gibberish:
- nonsense: jfvgasdjkfahs kdlfjhasdf (people smashing their heads on the keyboard)
- languages I don’t understand
What I have done so far:
- I installed a filter that deletes any feedback containing ‘asdf’, ‘qwer’, etc. -> only 700 per day
- I installed a word filter that deletes anything containing bad language -> 600 per day (don’t ask; there are many strange people out there)
- I filter out any message containing letters that are not used in my language -> 400 per day
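The three filters above could be sketched in Python roughly as follows. This is a minimal illustration, not the poster's actual code: the extra keyboard pattern `zxcv`, the placeholder `BAD_WORDS` list, and the assumption that the language is written in plain ASCII are all hypothetical and should be replaced with real values.

```python
import re
import string

# 'asdf' and 'qwer' come from the post; 'zxcv' is a hypothetical extra. Extend as needed.
KEYBOARD_PATTERNS = ("asdf", "qwer", "zxcv")

# Hypothetical placeholder list: substitute a real bad-word list for your language.
BAD_WORDS = {"badword1", "badword2"}

# Assumption: "my language" is written with plain ASCII characters.
# Add your own letters (e.g. accented characters) to this set.
ALLOWED_CHARS = set(string.printable)


def is_keyboard_mash(text: str) -> bool:
    """True if the message contains a known keyboard-row sequence."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in KEYBOARD_PATTERNS)


def has_bad_language(text: str) -> bool:
    """True if any whole word in the message is on the bad-word list."""
    words = re.findall(r"\w+", text.lower())
    return any(word in BAD_WORDS for word in words)


def has_foreign_chars(text: str) -> bool:
    """True if the message uses a character outside the allowed set."""
    return any(ch not in ALLOWED_CHARS for ch in text)


def keep_feedback(text: str) -> bool:
    """Apply the three filters; keep only messages that pass all of them."""
    return not (
        is_keyboard_mash(text)
        or has_bad_language(text)
        or has_foreign_chars(text)
    )
```

Usage would be something like `kept = [m for m in messages if keep_feedback(m)]`. Matching whole words in `has_bad_language`, rather than any substring, is a deliberate choice here so that harmless words which merely contain a bad word are not deleted.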
But 400 per day is still way too many, so I’m wondering whether anybody has dealt with such a problem before and knows an algorithm for filtering out meaningless messages.
Any help would really be appreciated!
How about just using an existing implementation of a Bayesian spam filter instead of implementing your own? I have had good results with DSpam.
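The code below is not DSpam or its interface; it is only a minimal sketch of the general idea behind Bayesian filtering: a multinomial naive Bayes classifier with add-one (Laplace) smoothing, trained on messages that have already been labeled by hand. Using character trigrams instead of whole words is an assumption here, since keyboard mash rarely repeats real words. The class name `NaiveBayesFilter` and the labels `"useful"`/`"junk"` are hypothetical.

```python
import math
from collections import Counter

LABELS = ("useful", "junk")  # hypothetical label names


def trigrams(text: str) -> list[str]:
    """Split a message into overlapping character trigrams (padded with spaces)."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class NaiveBayesFilter:
    """Multinomial naive Bayes over character trigrams, with add-one smoothing."""

    def __init__(self) -> None:
        self.gram_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        """Record the trigram counts of one hand-labeled message."""
        grams = trigrams(text)
        self.gram_counts[label].update(grams)
        self.doc_counts[label] += 1
        self.vocab.update(grams)

    def log_score(self, text: str, label: str) -> float:
        """Return log P(label) + sum of log P(trigram | label)."""
        # Smoothed prior, so a label with no training messages does not crash log()
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
        # Smoothed likelihood of each trigram given the label
        total_grams = sum(self.gram_counts[label].values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen trigrams
        for gram in trigrams(text):
            count = self.gram_counts[label][gram]
            score += math.log((count + 1) / (total_grams + vocab_size))
        return score

    def classify(self, text: str) -> str:
        """Pick the label with the highest log score."""
        return max(LABELS, key=lambda label: self.log_score(text, label))
```

You would call `train` on the submissions you have already sorted by hand, then `classify` on each new one. Because useful messages are so rare (4-5 out of hundreds), it is probably wise to review what gets marked as junk for a while before deleting anything automatically.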