I am using a regex to find duplicates in a list. It is only a short comma-separated list, and performance is not an issue, so there is no need to tell me I should not use regex for those reasons.
// returns a match because some is repeated
"some,thing,here,some,whatever".match(/(^|,)(.+?)(,|,.+,)\2(,|$)/g)
Questions…
- Can this regex be improved?
- Does it cover all possible scenarios, assuming no comma appears inside the separated strings?
- Is there a better (preferably more readable and more efficient) way to do this?
I don’t see the purpose of using regexes here, unless you like unimaginable pain. If I had to find duplicates, I would:
- Obtain an array of words
- Optionally lowercase everything, if you feel like doing that
- Sort the array
Duplicates should now all be in consecutive positions of the array.
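A minimal sketch of those steps, assuming the list arrives as a single comma-separated string. The function name `findDuplicates` and the `ignoreCase` flag are made up for illustration and are not from the question:

```javascript
// Hypothetical helper: the name and the ignoreCase parameter are illustrative only.
function findDuplicates(list, ignoreCase = false) {
  // Obtain an array of words.
  let words = list.split(",");

  // Optionally lowercase everything.
  if (ignoreCase) {
    words = words.map((w) => w.toLowerCase());
  }

  // Sort the array so equal words end up in consecutive positions.
  words.sort();

  const duplicates = [];
  for (let i = 1; i < words.length; i++) {
    // Report each repeated word once, even if it occurs three or more times.
    if (words[i] === words[i - 1] && duplicates[duplicates.length - 1] !== words[i]) {
      duplicates.push(words[i]);
    }
  }
  return duplicates;
}

console.log(findDuplicates("some,thing,here,some,whatever")); // [ 'some' ]
```

Unlike the regex, this returns the repeated words themselves, so an empty array means there were no duplicates.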
As an extra advantage, I'm pretty sure this would be vastly more efficient than a regex version.