This is my first experience with pattern matching using regular expressions so
any help is appreciated.
I am trying to search a string for the following substrings:
"(TPU 1-999)
http://somewebaddress.com"
I want to keep TPU, 1-999 and the link as separate substrings.
This is the pattern I am using:
^\s{3}\(([AEINPRSTUW]{3})\s(\d{1,3}.\d{2,5})\)$^\s{3}(http+\s{1,100})$
I’ll break it down to explain my reasoning:
^\s{3} – beginning of string (or line in this case), followed by 3 spaces
\( – left parenthesis
([AEINPRSTUW]{3}) – 3 instances of any of the letters in brackets, TPU being one example
\s(\d{1,3}.\d{2,5}) – a space and then 1-3 numeric digits, separated by any one character from 2-5 more numeric digits
\)$ – right parenthesis, end of line
^\s{3} – beginning of next line followed by three spaces
(http+\s{1,100})$ – the characters “http” followed by anywhere between 1 and 100 non whitespace characters, and the end of the line.
This pattern doesn’t work right now but am I headed in the right direction?
$^ cannot work. $ is the end of a line (before the line break), and ^ is the beginning of a line (after the line break). But the line break is a character (or two), while anchors do not advance the position of the regex engine. So $ and ^ try to match the same position, which can only ever happen if they are the end and beginning of an empty line, and even then putting them in this order would be greatly misleading. If you want to make sure that there is exactly one line break between them, match the line break explicitly between the two anchors, for example with \r?\n.
However, as ridgerunner pointed out in the comments, the following \s{3} could match (up to 3) more line breaks, since they are whitespace as well.
Also note that . as a separator of your numbers might not be the best idea. At least use a non-digit character, such as \D.
Note also that your last \s needs to be \S (because \s is whitespace, \S is non-whitespace).
Also note that the string you have shown us does not contain those three whitespace characters you are trying to match. So making them optional (as CaptainMurphy suggested) might be helpful, too.
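To make the points above concrete, here is a minimal sketch. The question does not name a regex flavor, so Python's re module is an assumption, and the variable names are only illustrative. It also assumes a plain \n line break and uses [ ]{0,3} for the optional leading spaces so that they cannot swallow extra line breaks; that choice is mine, not necessarily what was originally suggested.

```python
import re

# Sample text from the question (no leading spaces, "\n" line break assumed).
sample = "(TPU 1-999)\nhttp://somewebaddress.com"

# "$" and "^" are both zero-width, so "$^" never consumes the line break
# that sits between the two lines.
adjacent_anchors = re.compile(r"\)$^http", re.MULTILINE)

# Line break matched explicitly between the anchors.
# Assumptions: "[ ]{0,3}" makes the leading spaces optional without matching
# line breaks, "\D" replaces ".", and "\S" replaces the last "\s".
with_line_break = re.compile(
    r"^[ ]{0,3}\(([AEINPRSTUW]{3})\s(\d{1,3}\D\d{2,5})\)$\r?\n"
    r"^[ ]{0,3}(http+\S{1,100})$",
    re.MULTILINE,
)

no_match = adjacent_anchors.search(sample)
match = with_line_break.search(sample)
groups = match.groups() if match else None

print(no_match)  # the "$^" version finds nothing
print(groups)    # the three captured substrings
```

With this sample the first search should find nothing, while the second should capture the letter code, the number range and the link as three separate groups.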
And since we are already matching that line break, we could also remove the anchors around it completely; they do not really help any more.
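A version without the inner anchors might look like the sketch below. Again, Python's re module is an assumption, ENTRY and parse_entry are hypothetical names, and [ ]{0,3} is one possible way to make the leading spaces optional.

```python
import re

# Hypothetical pattern name. The "$" and "^" around the line break are gone;
# only the outer "^" and "$" remain.
ENTRY = re.compile(
    r"^[ ]{0,3}\(([AEINPRSTUW]{3})\s(\d{1,3}\D\d{2,5})\)\r?\n"
    r"[ ]{0,3}(http+\S{1,100})$",
    re.MULTILINE,
)


def parse_entry(text):
    """Return (letters, numbers, link) for the first entry, or None."""
    m = ENTRY.search(text)
    return m.groups() if m else None


# Works without leading spaces and with "\n" ...
print(parse_entry("(TPU 1-999)\nhttp://somewebaddress.com"))
# ... and with three leading spaces and a "\r\n" line break.
print(parse_entry("   (TPU 1-999)\r\n   http://somewebaddress.com"))
```

Because the line break is consumed by \r?\n itself, this form should handle both \n and \r\n line endings in the input.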