I have a logfile where we parse each line using a regex in Python.
Part of each line contains a phrase, which is one or more words.
For example, in the line below, the phrase is “SOME PHRASE”.
12-09-95 10:37:46,082 [3] INFO Foobar <> - 1995-Dec-09 10:37:47.189025 --- [5571467078570868736::TYPE ::SOME PHRASE ::1995-Dec-09 10:37:47.165672::1995-Dec-09 10:37:47.188790::00:00:00.023117]
In other lines, it may only be a single word, for example “PHRASE”.
12-09-95 10:37:46,082 [3] INFO Foobar <> - 1995-Dec-09 10:37:47.189025 --- [5571467078570868736::TYPE ::PHRASE ::1995-Dec-09 10:37:47.165672::1995-Dec-09 10:37:47.188790::00:00:00.023117]
We need to extract all the words of the phrase, including any spaces in between words, but minus any whitespace either to the left or right of it.
The phrase itself is easy – the relevant part of our regex:
::(?P<phrase>[\w\s]+)::
However, I’m not sure how to discard the whitespace on the right using regex – the logfile usually has a bunch of extraneous spaces after the phrase we want.
I know I could just use str.rstrip() to remove it afterwards, but I’d rather have the regex itself simply not pick it up – is there a way of doing this?
Cheers,
Victor
You could leave the trailing whitespace out of the capturing group, like so:
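Here is a minimal sketch of that idea, not a drop-in replacement for your full regex. The phrase group is written as a word followed by any number of "whitespace plus word" repeats, so it can only end on a word character; the surrounding whitespace is consumed by `\s*` outside the group. The `id` and `type` groups and the `extract_phrase` helper are names I made up for illustration, and I am assuming the bracketed section always starts with a numeric id and a single-word type. The sample line is yours with extra spaces padded in after the phrase.

```python
import re

# Assumed layout of the bracketed section: [<id>::<type> ::<phrase> ::<timestamps>...]
# The group names "id" and "type" are hypothetical, used only to anchor the match.
LINE_RE = re.compile(
    r"\[(?P<id>\d+)::(?P<type>\w+)\s*"
    # Phrase = word, then zero or more (whitespace + word). It cannot end in
    # whitespace, so the padding is matched by the \s* outside the group.
    r"::\s*(?P<phrase>\w+(?:\s+\w+)*)\s*::"
)


def extract_phrase(line):
    """Return the phrase without surrounding whitespace, or None if no match."""
    match = LINE_RE.search(line)
    return match.group("phrase") if match else None


sample = (
    "12-09-95 10:37:46,082 [3] INFO Foobar <> - 1995-Dec-09 10:37:47.189025 --- "
    "[5571467078570868736::TYPE ::SOME PHRASE      ::1995-Dec-09 10:37:47.165672::"
    "1995-Dec-09 10:37:47.188790::00:00:00.023117]"
)
print(repr(extract_phrase(sample)))  # 'SOME PHRASE'
```

If you only want to change the fragment you posted, the equivalent would be `::(?P<phrase>\w+(?:\s+\w+)*)\s*::` in place of `::(?P<phrase>[\w\s]+)::`.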