I am looking to match the following pattern:
(1) 10 digits, then sometext (e.g. 1235873490 ABCD EFGK)
in text that may contain pattern (1) as well as a very similar pattern like this one:
(2) 10 digits, then sometext, then a decimal number (e.g. 9835873490 VBGF XMF 23.233)
How can I write a regular expression that matches only pattern (1) and ignores pattern (2)?
I have tried using a negative lookahead, with something like this:
(\d{10})\s*([A-Za-z0-9]+(?:\s+[A-Za-z0-9]+)(?:\s+[A-Za-z0-9]+))\s*(?!(\d+.\d+))
but I cannot get it to work. Any ideas? By the way, I am using C++ with boost::regex.
First, start with the straightforward version:
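A minimal sketch of such a straightforward version, assuming sometext is one or more \w words separated by whitespace and that the decimal check is written as the negative lookahead (?!\s*\d+\.\d+). It uses std::regex only so that it builds without Boost; boost::regex and boost::regex_search should accept the same pattern through the same kind of calls. The function name sometext_naive is hypothetical.

```cpp
#include <regex>
#include <string>

// Naive version (illustrative sketch): 10 digits, then one or more \w words
// separated by whitespace, not followed by a decimal number.
// Returns the captured sometext of the first match, or "" if nothing matched.
// The word group can never match an empty string, so "" always means "no match".
std::string sometext_naive(const std::string& text) {
    // Group 1: the 10 digits. Group 2: the words (sometext).
    static const std::regex re(R"((\d{10})\s*(\w+(?:\s+\w+)*)(?!\s*\d+\.\d+))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return "";
    }
    return m[2].str();
}
```

With this pattern the first example yields "ABCD EFGK", but the second example also matches, yielding "VBGF XMF 23".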
I changed your [A-Za-z0-9] to \w for simplicity, and allowed the word group to repeat any number of times.
However, this will also match the second string: it will gobble up the “23” at the end, then see that it is not followed by a decimal number (it is followed by “.233”), so it will match.
To prevent this, we can require that sometext be followed by a space or the end of the text:
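A sketch of that requirement, assuming it is expressed as the lookahead (?=\s|$) placed right after the word group. std::regex is used so the sketch builds without Boost, and the function name sometext_with_boundary is hypothetical.

```cpp
#include <regex>
#include <string>

// Boundary version (illustrative sketch): the words must end at whitespace or
// at the end of the text, checked by (?=\s|$), and must not be followed by a
// decimal number. Returns the captured sometext of the first match, or "".
std::string sometext_with_boundary(const std::string& text) {
    static const std::regex re(
        R"((\d{10})\s*(\w+(?:\s+\w+)*)(?=\s|$)(?!\s*\d+\.\d+))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return "";
    }
    return m[2].str();
}
```

The first example still yields "ABCD EFGK". On the second example this still reports a match, with sometext captured as just "VBGF".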
However, this still has a problem. Now it will match up to “…XMF”, see that it is followed by a decimal number, and backtrack. It will back up to “…VBGF” and then match, since “VBGF” is not followed by a decimal number.
To prevent this, we can tell the regex engine that it must not backtrack into our main section once it has matched it (an atomic group):
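std::regex's ECMAScript grammar has no atomic group syntax. This sketch therefore emulates one with a capturing lookahead followed by a backreference, (?=(...))\2. ECMAScript never backtracks into a lookahead, so once the words are captured there they cannot be given back. Boost.Regex's Perl syntax supports independent sub-expressions directly, so with boost::regex the same part could be written (?>(\w+(?:\s+\w+)*)). The pattern and the function name sometext_atomic are illustrative assumptions, not the answer's original regex.

```cpp
#include <regex>
#include <string>

// Atomic version (illustrative sketch). (?=(X))\2 emulates the atomic group
// (?>(X)): the lookahead matches X greedily once and is never re-entered on
// backtracking, and \2 then consumes exactly what it captured. With
// boost::regex (Perl syntax) this could be written (?>(\w+(?:\s+\w+)*)).
// Returns the captured sometext of the first match, or "" if nothing matched.
std::string sometext_atomic(const std::string& text) {
    static const std::regex re(
        R"((\d{10})\s*(?=(\w+(?:\s+\w+)*))\2(?=\s|$)(?!\s*\d+\.\d+))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return "";
    }
    return m[2].str();
}
```

The first example still yields "ABCD EFGK", while the second example no longer matches at all. If a text holds the second example followed by a newline and then the first example, only the first example's record is found.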
Alternatively, if you know that sometext always consists of exactly 2 parts, this will also solve the backtracking problem:
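A sketch of the two-part variant, assuming each part is a \w word and keeping the (?=\s|$) boundary. std::regex is used so it builds without Boost, and the function name sometext_two_words is hypothetical. Because the word group is fixed at two words, there is no shorter list of words to retreat to. The boundary is still needed, because otherwise the engine could shorten "XMF" to "XM", which is not followed by a decimal number.

```cpp
#include <regex>
#include <string>

// Two-word version (illustrative sketch): sometext is assumed to be exactly
// two \w words. The (?=\s|$) boundary stops the engine from shortening the
// second word to dodge the decimal-number check.
// Returns the captured sometext of the first match, or "" if nothing matched.
std::string sometext_two_words(const std::string& text) {
    static const std::regex re(
        R"((\d{10})\s*(\w+\s+\w+)(?=\s|$)(?!\s*\d+\.\d+))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return "";
    }
    return m[2].str();
}
```

It yields "ABCD EFGK" for the first example and no match for the second. The trade-off is that a one-word sometext is not matched, and a three-word sometext is captured as only its first two words.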