I need to strip any non-alphanumeric characters from the end of strings using PHP’s preg_replace:
"Word One, Two, -", "Word One, Two, " (with a trailing space), "Word One, Two," and "Word One, Two" should all become "Word One, Two".
I have tried preg_replace('/(.+)\\W+$/', '$1', 'Word One, Two, -'); but this only strips the last non-word character. I also tried '/(.+)\\W*$/' as I assumed this would make it work if 0 or 1 non-word characters are found (as I need) but it then doesn’t match at all. I think I need to make the \W greedy but I’m not sure how. Any ideas? Also, please feel free to explain to me what I am doing wrong so I don’t find myself haunting the SO regex tag 😉
This is because (.+) eats up all the characters, including the non-word characters. The regex engine starts out with every character of the string in the capturing group. Only then does it notice that the \W at the end of the pattern has nothing left to match, so it backs up, tentatively giving up a single character for the \W to match. But a single character is all that’s needed to satisfy the \W+, so the engine stops there and strips just that one character. That’s also the reason why (.+)\W*$ strips nothing at all: \W* is content with matching nothing.
Use
preg_replace('/\W+$/', '', $string);
instead. This avoids the problem by just replacing the trailing non-word characters without even trying to match anything else.
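To see the difference concretely, here is a minimal sketch that runs the greedy pattern from the question next to a replacement anchored on \W+$ alone. The helper name stripTrailingNonWord and the $inputs array are made up for illustration; only preg_replace itself comes from the question.

```php
<?php
// Hypothetical helper (name is an assumption, not from the original post).
function stripTrailingNonWord(string $s): string
{
    // \W+$ matches only the run of non-word characters at the very end,
    // so nothing else in the string has to be matched or re-inserted.
    return preg_replace('/\W+$/', '', $s) ?? $s;
}

// Sample strings taken from the question.
$inputs = ['Word One, Two, -', 'Word One, Two, ', 'Word One, Two,', 'Word One, Two'];

foreach ($inputs as $input) {
    // Greedy attempt from the question: (.+) grabs everything, then gives
    // back a single character so that \W+ can match.
    $greedy = preg_replace('/(.+)\W+$/', '$1', $input);
    $fixed  = stripTrailingNonWord($input);
    printf("input: [%s]  greedy: [%s]  fixed: [%s]\n", $input, $greedy, $fixed);
}
```

Note that \W treats the underscore as a word character, so a trailing _ is left alone; if it should go as well, a character class such as [\W_]+$ is one way to cover it.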
Another option would be
preg_replace('/(.+?)\W+$/', '$1', $string);
which uses a lazy quantifier (+?) for the capturing group. This quantifier tries to satisfy the match while matching as little as possible (as opposed to +, which tries to match as much as possible, as we saw above). But generally I’d avoid replacing parts of the match with themselves if you can. To strip things from a string, you certainly don’t need to match more than what you want to strip.
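As a rough illustration of why matching less is the safer habit, the sketch below compares a lazy capturing group with a plain \W+$ replacement. Both function names are hypothetical. The two agree on the strings from the question, but they can differ when the whole string consists of non-word characters, because (.+?) has to consume at least one character.

```php
<?php
// Hypothetical helpers (names are assumptions made for this example).
function stripWithLazyGroup(string $s): string
{
    // (.+?) grows one character at a time, so \W+ ends up holding the
    // entire trailing run of non-word characters.
    return preg_replace('/(.+?)\W+$/', '$1', $s) ?? $s;
}

function stripDirect(string $s): string
{
    // Matches only what should be removed.
    return preg_replace('/\W+$/', '', $s) ?? $s;
}

$inputs = ['Word One, Two, -', 'Word One, Two, ', 'Word One, Two,', 'Word One, Two', ', -'];

foreach ($inputs as $input) {
    printf(
        "input: [%s]  lazy: [%s]  direct: [%s]\n",
        $input,
        stripWithLazyGroup($input),
        stripDirect($input)
    );
}
```

For the last input, the lazy version keeps the leading comma because the group must match something, while the direct version returns an empty string.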