I am trying to write a validation function that checks whether a string is copy+paste filler.
Background:
We have a CMS where users can enter description texts with a minimum length of, for example, 200 characters. A lot of users write texts that are too short and get the “you have to use more than 200 letters” error message.
To get around this, they copy and paste the same text again, or add dummy strings like “AAAAA”, to reach the limit.
I am now looking for a function / method / regex that detects such copy+paste strings so we can reject them with a message.
I know there is no 100% solution against dummy texts, but we want to reduce them at least a little. Any ideas?
There’s not going to be a fast, reliable, undefeatable solution. But I can think of a compromise: a regex search that returns True for strings that contain repeated sequences of one to four characters (when they’re repeated at least three times). So it would match filler like “AAAAA” from the question.
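A minimal Python sketch of that idea, assuming the standard `re` module. The pattern and the name `looks_like_filler` are hypothetical illustrations, not necessarily the exact regex meant above; "repeated at least three times" is read here as three or more consecutive copies of the unit.

```python
import re

# Hypothetical pattern: capture a unit of 1-4 characters, then require at
# least two more copies of it right after (3+ consecutive copies in total).
# re.DOTALL lets "." match newlines too, so line breaks count as characters.
FILLER_PATTERN = re.compile(r"(.{1,4})\1{2,}", re.DOTALL)


def looks_like_filler(text: str) -> bool:
    """Return True if text contains a 1-4 character unit repeated 3+ times in a row."""
    return FILLER_PATTERN.search(text) is not None


print(looks_like_filler("AAAAA"))                        # True
print(looks_like_filler("abcabcabc"))                    # True
print(looks_like_filler("We sell handmade furniture."))  # False
```

Note that with one-character units, legitimate text such as "Wait..." or "!!!" also matches, so in practice you may want to require more copies (for example `\1{4,}`) before rejecting a description.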
It would not detect longer repetitions (a repeated unit of five or more characters), but the complexity of the regex will grow exponentially if you try to match longer repeats, so I think four characters is a workable compromise.
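To illustrate that limit with the same hypothetical 1-4 character pattern (a sketch, not the original answer's code): a four-character unit repeated three times is caught, while a five-character unit slips through.

```python
import re

# Same hypothetical pattern as a 1-4 character repeat detector.
FILLER_PATTERN = re.compile(r"(.{1,4})\1{2,}", re.DOTALL)

for sample in ["abcdabcdabcd", "abcdeabcdeabcde"]:
    found = FILLER_PATTERN.search(sample) is not None
    print(f"{sample!r}: {found}")
# 'abcdabcdabcd': True
# 'abcdeabcdeabcde': False
```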
Alternatively, you might want to anchor the repeats to the end of the string (which is where most people would put the filler).
But of course, then a string with the filler at the beginning or in the middle
would not be detected. Your choice 🙂
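An end-anchored variant could look like this in Python. Again, this is a hypothetical sketch: the pattern name and `ends_with_filler` are illustrative, and `\Z` is chosen over `$` because `$` also matches just before a trailing newline.

```python
import re

# Hypothetical end-anchored variant: the repeated 1-4 character unit must run
# all the way to the very end of the string (\Z, not $, so a trailing newline
# is not silently skipped).
TRAILING_FILLER_PATTERN = re.compile(r"(.{1,4})\1{2,}\Z", re.DOTALL)


def ends_with_filler(text: str) -> bool:
    """Return True if text ends in a 1-4 character unit repeated 3+ times."""
    return TRAILING_FILLER_PATTERN.search(text) is not None


print(ends_with_filler("Our shop sells handmade chairs. AAAAAAA"))  # True
print(ends_with_filler("AAAAAAA Our shop sells handmade chairs."))  # False
```

Since a single trailing space after the filler would defeat the anchor, calling `text.rstrip()` before the check is probably worthwhile.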