From regular-expressions.info:
`\b\w+(?<!s)\b`. This is definitely not the same as `\b\w+[^s]\b`. When applied to `Jon's`, the former will match `Jon` and the latter `Jon'` (including the apostrophe). I will leave it up to you to figure out why. (Hint: `\b` matches between the apostrophe and the `s`). The latter will also not match single-letter words like “a” or “I”.
Can you explain why?
Also, can you make clear what exactly `\b` does, and why it matches between the apostrophe and the `s`?
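`\b` is a zero-width assertion that means word boundary. These character positions (taken from that link) are considered word boundaries:

- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.

Word characters are of course any `\w`. `s` is a word character, but `'` is not. In the above example, the area between the `'` and the `s` is a word boundary.

The string `"Jon's"` looks like this if I highlight the anchors and boundaries (the first and last `\b`s occur in the same positions as `^` and `$`):

`^Jon\b'\bs$`

The negative lookbehind assertion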
`(?<!s)\b` means it will only match a word boundary if it’s not preceded by the letter `s` (i.e. the last word character is not an `s`). So it looks for a word boundary under a certain condition.

Therefore the first regex works like this:

1. `\b\w+` matches the first three letters `Jon`.
2. There’s actually another word boundary between `n` and `'` as shown above, so `(?<!s)\b` matches this word boundary because it’s preceded by an `n`, not an `s`.
3. Since the end of the pattern has been reached, the resultant match is `Jon`.

The complementary character class
`[^s]\b` means it will match any character that is not the letter `s`, followed by a word boundary. Unlike the above, this looks for one character followed by a word boundary.

Therefore the second regex works like this:

1. `\b\w+` matches the first three letters `Jon`.
2. Since the `'` is not the letter `s` (it fulfills the character class `[^s]`), and it’s followed by a word boundary (between `'` and `s`), it’s matched.
3. Since the end of the pattern has been reached, the resultant match is `Jon'`. The letter `s` is not matched because the word boundary before it has already been matched.
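
If you want to check this walkthrough yourself, here is a minimal sketch using Python's `re` module. The choice of Python is my assumption (the question doesn't name a regex flavor), and the variable names are purely illustrative. It lists the word-boundary positions in `Jon's`, then runs both patterns against it and against a single-letter string.

```python
import re

text = "Jon's"

# \b is zero-width, so each match is an empty string; we only record where it occurs.
# Index 3 is between n and ', index 4 is between ' and s.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 3, 4, 5]

# First regex: a word boundary that must not be preceded by the letter s.
lookbehind = re.compile(r"\b\w+(?<!s)\b")
# Second regex: one extra non-s character, then a word boundary.
negated = re.compile(r"\b\w+[^s]\b")

first_match = lookbehind.findall(text)
second_match = negated.findall(text)
print(first_match)   # ['Jon']
print(second_match)  # ["Jon'"]

# On a string that is just one letter, the second regex has nothing left
# for [^s] to consume after \w+ takes the letter, so it cannot match.
single_lookbehind = lookbehind.findall("a")
single_negated = negated.findall("a")
print(single_lookbehind)  # ['a']
print(single_negated)     # []
```

The key difference shows up in what each pattern consumes: the lookbehind only inspects the previous character, while `[^s]` has to consume a character of its own, which is why it picks up the apostrophe and needs at least two characters in total.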