I have a string like this:
Please refer to document ABC.123.1234.1234 and document CBA.321.4321
I’m running two different regex searches to separately identify two different document identifiers. The expression for the first identifier works great:
ABC.123.1234.1234 = \b[A-Z]{3}\.\d{1,4}\.\d{1,4}\.\d{1,4}\b
Now, the problem I’m having is with trying to extract the smaller identifier using the following expression:
\b[A-Z]{3}\.\d{1,3}\.\d{1,4}\b
Unfortunately this returns both results, ABC.123.1234 & CBA.321.4321. The only result I require the second expression to return is CBA.321.4321.
Not sure which regex system you are using, since they all have slightly different syntax.
What you want is a negative zero-width lookahead assertion, so that a match only counts if it isn’t followed by \.\d{4}.
Also, are the numbers in your data actually variable-width? If not, it would be easier to get matches if you match {4} instead of {1,4}. The lookahead assertion isn’t as easy to implement otherwise.
You could still implement it, though. Simply make your negative lookahead match \d*\.\d{1,4} (the \d* being the important part to avoid partial matches).
Edit:
Since you’re using VB.Net, the syntax for a negative lookahead assertion in that Regex implementation is (?!pattern), where pattern is whatever must not follow the match.
So your regex might become something like:
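The snippet that originally followed here isn't available, so this is only a sketch of what such a pattern could look like, not necessarily the exact expression from the answer. It is written in Python rather than VB.Net so it can be run quickly; the (?!...) lookahead syntax is the same in both engines. The variable names are made up for illustration.

```python
import re

text = "Please refer to document ABC.123.1234.1234 and document CBA.321.4321"

# Original expression for the long identifier, unchanged from the question.
long_pattern = r"\b[A-Z]{3}\.\d{1,4}\.\d{1,4}\.\d{1,4}\b"

# Short identifier plus a negative lookahead: reject the match if it is
# followed by optional digits, a dot, and another group of 1-4 digits.
short_pattern = r"\b[A-Z]{3}\.\d{1,3}\.\d{1,4}(?!\d*\.\d{1,4})"

long_ids = re.findall(long_pattern, text)
short_ids = re.findall(short_pattern, text)

print(long_ids)   # ['ABC.123.1234.1234']
print(short_ids)  # ['CBA.321.4321']
```

In VB.Net the same pattern string should be usable with System.Text.RegularExpressions.Regex, but test it against your real data first.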
The important part to remove the longer matches, and deal with the variable width numbers is:
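The fragment that followed here is missing as well. Going by the earlier description, it was presumably something along the lines of (?!\d*\.\d{1,4}). As a rough illustration (Python again, with hypothetical variable names, and without a trailing \b), this compares a lookahead without the \d* against one with it:

```python
import re

text = "Please refer to document ABC.123.1234.1234 and document CBA.321.4321"

# Lookahead without \d*: the engine can shorten the last number group to
# "123", after which the next character is "4" rather than ".", so the
# lookahead passes and a partial match slips through.
naive = r"[A-Z]{3}\.\d{1,3}\.\d{1,4}(?!\.\d{1,4})"

# Lookahead with \d*: any leftover digits are consumed inside the lookahead,
# so shortening the last group no longer helps and the long identifier is
# rejected entirely.
guarded = r"[A-Z]{3}\.\d{1,3}\.\d{1,4}(?!\d*\.\d{1,4})"

naive_hits = re.findall(naive, text)
guarded_hits = re.findall(guarded, text)

print(naive_hits)    # ['ABC.123.123', 'CBA.321.4321']
print(guarded_hits)  # ['CBA.321.4321']
```

If you keep the trailing \b from your original expression, the word boundary already blocks the shortened match, but the \d* does no harm and covers the case where the boundary is dropped.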