"^(?:(2\d\d\d)\s+)?(?:Comm\. Rep\.\s+)?(?:CONG\s+)?(\S+)\s+(\S+)\s+(?:No\.\s+)?(\S+)(?:\s+\(.*?\))?$"
Currently this is able to parse a string like
2009 IA H.B. 184 (NS)
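A quick way to see what each group captures is to run the pattern unchanged. This sketch uses Python's re module, which is an assumption because the question does not name a language; the pattern only uses features Python supports. Group 1 is the optional year, and groups 2 to 4 are the three (\S+) tokens.

```python
import re

# The pattern from the question, unchanged (split across two raw strings
# only for readability; Python concatenates them).
PATTERN = re.compile(
    r"^(?:(2\d\d\d)\s+)?(?:Comm\. Rep\.\s+)?(?:CONG\s+)?"
    r"(\S+)\s+(\S+)\s+(?:No\.\s+)?(\S+)(?:\s+\(.*?\))?$"
)

m = PATTERN.match("2009 IA H.B. 184 (NS)")
# Group 1 is the year; groups 2-4 are the three \S+ tokens.
print(m.groups())  # ('2009', 'IA', 'H.B.', '184')
```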
How can I make it parse text like
2009 IA HEART RATE 184 (NS)
I’m looking for a tweak that will make it parse the space-separated words HEART RATE as the third word.
EDIT:
It seems to work as long as the third word contains no spaces. For example, it works for 2009 IA REG 184 (NS), but as soon as the third word is made up of several space-separated words, like HEART RATE, it goes out of whack.
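The failure can be reproduced the same way (again assuming Python's re). Each (\S+) stops at the first space, so HEART and RATE are consumed as two separate tokens. The leftover 184 (NS) cannot be matched by the optional parenthesis group before the $ anchor, so the whole match fails rather than returning a wrong group. The helper name parse is hypothetical.

```python
import re

# The question's pattern, unchanged.
PATTERN = re.compile(
    r"^(?:(2\d\d\d)\s+)?(?:Comm\. Rep\.\s+)?(?:CONG\s+)?"
    r"(\S+)\s+(\S+)\s+(?:No\.\s+)?(\S+)(?:\s+\(.*?\))?$"
)

def parse(text):
    """Hypothetical helper: return all groups, or None if the pattern fails."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(parse("2009 IA REG 184 (NS)"))         # ('2009', 'IA', 'REG', '184')
print(parse("2009 IA HEART RATE 184 (NS)"))  # None
```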
I’m going to assume that you want all of those space-separated words. That doesn’t work right now because you read the “third word”, HEART RATE (the third capture group, since the year is group 1), only up to the first space: that is all the second (\S+) can match. To fix this, I’ll assume the “third word” is all the space-separated words until you hit a number or a word starting with No. (tell me if this assumption is wrong!). That is the ((?:\S|\s(?!\d|No\.))+) in the solution. Here is my solution:
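The full regex was not preserved here. As a minimal sketch, assuming the solution is simply the question's pattern with the second (\S+) swapped for ((?:\S|\s(?!\d|No\.))+), it could look like this in Python (SOLUTION and third_word are hypothetical names). Each repetition of the group matches either a non-space character or a whitespace character that is not immediately followed by a digit or by the literal No., so the group can span HEART RATE but has to stop before 184.

```python
import re

# Assumed reconstruction: the question's regex with the "third word" group
# widened so it may contain spaces, as long as a space is not followed by
# a digit or by "No.".
SOLUTION = re.compile(
    r"^(?:(2\d\d\d)\s+)?(?:Comm\. Rep\.\s+)?(?:CONG\s+)?"
    r"(\S+)\s+((?:\S|\s(?!\d|No\.))+)\s+(?:No\.\s+)?(\S+)(?:\s+\(.*?\))?$"
)

def third_word(text):
    """Hypothetical helper: return capture group 3, or None on no match."""
    m = SOLUTION.match(text)
    return m.group(3) if m else None

print(third_word("2009 IA HEART RATE 184 (NS)"))  # HEART RATE
print(third_word("2009 IA H.B. 184 (NS)"))        # H.B.
```

Because the space before 184 cannot be absorbed into group 3, the number still lands in the last (\S+), which is group 4.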
When I test it on
it (still) finds the third word to be H.B.

When I test it on
it finds the third word to be HEART RATE.

When I test it on
it finds the third word to be HEART RATE None.

When I test it on
it finds the third word to be HEART RATE.

Looks good?
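The exact test strings in this answer were not preserved, so the inputs below are illustrative assumptions chosen to exercise similar cases: the original H.B. string, a spaced word, a word that merely starts with No (None), and an explicit No. marker. The pattern is again an assumed reconstruction, not necessarily the answer's exact regex.

```python
import re

# Assumed reconstruction of the solution regex (not the original text).
SOLUTION = re.compile(
    r"^(?:(2\d\d\d)\s+)?(?:Comm\. Rep\.\s+)?(?:CONG\s+)?"
    r"(\S+)\s+((?:\S|\s(?!\d|No\.))+)\s+(?:No\.\s+)?(\S+)(?:\s+\(.*?\))?$"
)

# Hypothetical inputs; the originals were lost.
cases = [
    "2009 IA H.B. 184 (NS)",
    "2009 IA HEART RATE 184 (NS)",
    "2009 IA HEART RATE None 184 (NS)",  # "None" is not the "No." marker
    "2009 IA HEART RATE No. 184 (NS)",   # "No. " is consumed by (?:No\.\s+)?
]

results = {}
for text in cases:
    m = SOLUTION.match(text)
    # Group 3 holds the "third word", group 4 the number.
    results[text] = (m.group(3), m.group(4)) if m else None
    print(f"{text!r} -> {results[text]}")
```

This reports H.B., HEART RATE, HEART RATE None and HEART RATE as the third words, with 184 as the number each time. The None case shows why the lookahead checks for No. with the dot rather than just No.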
PS gskinner is awesome.