I am trying to clean up user-submitted comments in PHP using regex, but I have become rather stuck and confused!
Is it possible using regex to:
- Remove punctuation repeated more than twice so that:
  OMG it was AWESOME!!!! becomes OMG it was AWESOME!!
  !!!!!!!!!!.........------ becomes !!..--
  !?!?!? becomes !?
- Remove duplicated words or phrases (for example, when a user has copied and pasted a message several times) so that:
  spamspamspamspam becomes spam
  I love copy and paste. I love copy and paste. I love copy and paste. becomes I love copy and paste.
- Convert runs of capital letters and spaces longer than, say, 10 letters to lower case so that:
  I LOVE CAPITALS THEY ARE SO AWESOME becomes I love capitals they are so awesome
  GOOD that sounds stays the same
Any suggestions?
This is for a student system (hence the urge to at least try to tidy up what they post), although I do not want to go as far as filtering or blocking their messages; I just want to “correct” them with some regex.
Thanks for your time,
Edit:
If it isn’t possible using regex (or regex mixed with other PHP), how would you do it?
1:
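For point 1, a minimal sketch using two `preg_replace()` passes. The helper name `collapsePunctuation()` is hypothetical, and `[[:punct:]]` only covers ASCII punctuation, which is an assumption about the input. The first pass cuts any single punctuation character repeated three or more times down to two; the second reduces a two-character punctuation pair repeated back to back (the `!?!?!?` case) to a single pair.

```php
<?php
// Hypothetical helper: tones down repeated punctuation in a comment.
function collapsePunctuation(string $text): string
{
    // Pass 1: one punctuation character repeated 3+ times is cut to two,
    // e.g. "!!!!" -> "!!", "........." -> "..".
    $text = preg_replace('/([[:punct:]])\1{2,}/', '$1$1', $text);

    // Pass 2: a two-character punctuation pair repeated back to back is cut
    // to a single pair, e.g. "!?!?!?" -> "!?". Runs after pass 1 so that
    // an already-shortened "!!" is not touched.
    return preg_replace('/([[:punct:]]{2})\1+/', '$1', $text);
}

echo collapsePunctuation('OMG it was AWESOME!!!!'), "\n";     // OMG it was AWESOME!!
echo collapsePunctuation('!!!!!!!!!!.........------'), "\n";  // !!..--
echo collapsePunctuation('!?!?!?'), "\n";                     // !?
```

Doing it in two passes keeps each pattern simple; the order matters, because running pass 2 first would leave runs such as `!!!` for pass 1 anyway.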
2:
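For point 2, a rough heuristic rather than an exact rule, sketched with a backreference: a chunk of at least 3 characters that appears three or more times in a row (optionally separated by whitespace) is reduced to one copy. The helper name `collapseRepeats()`, the 3-character minimum and the three-copy threshold are all assumptions, chosen so that ordinary words survive; a plain "any repeat" rule would turn "hello" into "helo" and "Mississippi" into "Missippi".

```php
<?php
// Hypothetical helper: collapses a chunk repeated 3+ times in a row.
function collapseRepeats(string $text): string
{
    // (.{3,}?)       lazily capture the shortest candidate chunk (3+ chars)
    // (?:\s*\1){2,}  the same chunk must follow at least twice more,
    //                optionally separated by whitespace
    // /u             treat the input as UTF-8 so "." matches whole characters
    return preg_replace('/(.{3,}?)(?:\s*\1){2,}/u', '$1', $text);
}

echo collapseRepeats('spamspamspamspam'), "\n"; // spam
echo collapseRepeats('I love copy and paste. I love copy and paste. I love copy and paste.'), "\n";
```

Because the engine tries every chunk length at every position, this pattern can get slow on very long input; for comment-sized text it should be fine.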
3:
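For point 3, this is where regex mixed with other PHP helps: `preg_replace_callback()` finds each run of capitals and spaces, and plain PHP decides whether it is long enough to lower-case. In this sketch the helper name `calmCapitals()` is hypothetical, only ASCII A-Z is handled, and re-capitalising a standalone "I" is an extra rule inferred from the example output.

```php
<?php
// Hypothetical helper: lower-cases long runs of capitals.
function calmCapitals(string $text, int $minLetters = 10): string
{
    // A run starts and ends on a capital letter and may contain spaces.
    return preg_replace_callback(
        '/\b[A-Z][A-Z ]*[A-Z]\b/',
        function (array $m) use ($minLetters): string {
            $run = $m[0];
            // Count letters only; spaces do not count toward the threshold.
            if (strlen(str_replace(' ', '', $run)) <= $minLetters) {
                return $run; // short emphasis such as "GOOD" is left alone
            }
            $lower = strtolower($run);
            // Restore the pronoun "I", which should stay capitalised.
            return preg_replace('/\bi\b/', 'I', $lower);
        },
        $text
    );
}

echo calmCapitals('I LOVE CAPITALS THEY ARE SO AWESOME'), "\n"; // I love capitals they are so awesome
echo calmCapitals('GOOD that sounds'), "\n";                    // GOOD that sounds
```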
Try it here: http://codepad.org/iQsZ2vJ0