I have an automatically generated regular expression, which basically is one big “or” group like so:
(\bthe\b|\bcat\b|\bin\b|\bhat\.\b|\bhat\b)
I’ve noticed that for the input
hat.
it matches “hat” only, not “hat.” as I want. Is there a way to make it more greedy?
UPDATE: forgot about word boundaries, sorry for that.
Put “hat\.” before “hat” in the regular expression. The first alternative that matches in an alternation wins: “hat” already matches the start of “hat.”, so “hat\.” is never tried.

A better way would be to write that part as “hat\.?” rather than “hat\.|hat”. That makes the period optional, so you don’t need two terms in the alternation.

After your edit:
There is no word boundary between “.” and, say, a space (both are non-word characters). So “\bhat\.\b” only matches in things like “hat.x”, where a word character immediately follows the period. This means that in an ordinary sentence, the “\bhat\b” alternative is the one that gets matched. I see you found a solution, however.
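
For anyone who wants to see both effects side by side, here is a minimal sketch using Python’s re module. The question doesn’t say which regex engine is in use, so Python is an assumption; the helper first_match and the sample sentences are made up for illustration. The last pattern is just one possible workaround and not necessarily the solution the asker found.

```python
import re


def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None


sentence = "the cat in the hat."

# Alternation order: the leftmost alternative that matches wins.
print(first_match(r"hat|hat\.", sentence))   # "hat" comes first, so the period is dropped
print(first_match(r"hat\.|hat", sentence))   # "hat\." comes first, so the period is kept
print(first_match(r"hat\.?", sentence))      # optional period, no alternation needed

# Word boundaries: \b after the period needs a word character right after it.
bounded = r"\bhat\.\b|\bhat\b"
print(first_match(bounded, "the hat. is red"))  # falls back to the \bhat\b alternative
print(first_match(bounded, "the hat.x"))        # \bhat\.\b matches because "x" follows

# One possible workaround (an assumption, not the asker's fix):
# keep the boundaries around the word and make the period optional after it.
relaxed = r"\bhat\b\.?"
print(first_match(relaxed, "the hat. is red"))
print(first_match(relaxed, "the hat is red"))
```

With the relaxed pattern the period is picked up when it is there and skipped when it is not, while “hats” still doesn’t match because of the boundary after “hat”.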