Can anyone explain the difference between the \b and \w regular expression metacharacters? It is my understanding that both of these metacharacters are used for word boundaries. Apart from this, which metacharacter is more efficient for multilingual content?
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length. There are three different positions that qualify as word boundaries:
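Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.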
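To make the three positions concrete, here is a minimal sketch that lists every index where \b matches. It assumes Python 3's `re` module; the helper name `boundary_positions` and the sample string are made up for illustration.

```python
import re

def boundary_positions(text):
    # Each \b match is zero-length, so its start index equals its end index.
    return [m.start() for m in re.finditer(r"\b", text)]

positions = boundary_positions("hi, you")
print(positions)  # [0, 2, 4, 7]
```

Index 0 is the start of the string (the first character is a word character), 2 and 4 lie between a word character and a non-word character, and 7 is the end of the string (the last character is a word character). No characters are consumed at any of them.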
Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
In all flavors, the characters [a-zA-Z0-9_] are word characters. These are also matched by the shorthand character class \w, which stands for “word character”. Notice the inclusion of the underscore and digits. Flavors showing “ascii” for word boundaries in the flavor comparison recognize only these as word characters.
So \w is not a word boundary: it matches one word character and consumes it, while \b matches a position between characters and consumes nothing.
\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.
\W is short for [^\w], the negated version of \w.
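The sketch below puts \b, \w, \B and \W side by side, and then touches on the multilingual part of the question. It assumes Python 3's `re` module, where \w and \b are Unicode-aware by default for `str` patterns and the `re.ASCII` flag restricts them to [a-zA-Z0-9_]. Other flavors behave differently, so treat this as an illustration rather than a general rule. All sample strings are made up.

```python
import re

text = "cat concat cat_1 category"

# Whole-word search: \bcat\b matches "cat" only where it stands alone.
# "cat_1" does not count, because the underscore is a word character.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]
print(whole_word)  # [0]

# \w consumes word characters, so \w+ returns the words themselves.
words = re.findall(r"\w+", text)
print(words)  # ['cat', 'concat', 'cat_1', 'category']

# \B matches where \b does not: here, "cat" with word characters on both sides.
inside = re.findall(r"\Bcat\B", "concatenate")
print(inside)  # ['cat']

# \W matches a single non-word character.
non_word = re.findall(r"\W", "a_b-c d")
print(non_word)  # ['-', ' ']

# Multilingual text: "na\u00efve caf\u00e9" is "naive cafe" written with accents.
sample = "na\u00efve caf\u00e9"
unicode_words = re.findall(r"\w+", sample)                # Unicode-aware (default)
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)  # ASCII-only \w
print(unicode_words)
print(ascii_words)  # ['na', 've', 'caf']
```

With ASCII-only semantics the accented letters count as non-word characters, so words are split in the middle and \b would report boundaries inside them. For multilingual content the relevant question is therefore whether the flavor's \w and \b are Unicode-aware, not which of the two metacharacters is faster.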