I was wondering how to select words near each other using regular expressions.
For example, I would like to select the digits and the word miles from the following phrases:
"140,000 mostly freeway miles"
"173k commuter miles. "
"154K(all highway) miles"
I don’t know how to fill in the optional words in the middle:
[0-9]+ ???? miles
*“Near” could be defined as 1-3 words apart. Thanks for pointing that out.
Here is an answer in R. The other answers could work with some modification. Mostly, they need “double escapes” (inside an R string every regex backslash must be written twice, e.g. `\\d` instead of `\d`), and you will have to use the paired functions `regexpr` and `regmatches`. The pattern captures numbers, punctuation, or a k in group 1, followed by anything. That is followed by group 2, which is the word miles, followed by anything else.
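The original snippet isn't reproduced here, so this is only a minimal base-R sketch of the pattern just described, not necessarily the exact code. `regexpr()` plus `regmatches()` pulls out the whole match; to see the two groups separately, the sketch swaps in `regexec()` (also base R), which records capture groups. `perl = TRUE` is an assumption chosen so the greedy matching behaves predictably; the names `x`, `pattern`, `number` and `word` are illustrative.

```r
# Example phrases from the question
x <- c("140,000 mostly freeway miles",
       "173k commuter miles. ",
       "154K(all highway) miles")

# Group 1: one or more digits, punctuation marks, or k/K.
# Then anything (.*), then group 2: the word "miles", then anything else.
pattern <- "([[:digit:][:punct:]kK]+).*(miles).*"

# regexpr() finds the first match in each string; regmatches() extracts it.
whole <- regmatches(x, regexpr(pattern, x, perl = TRUE))
print(whole)

# regexec() also records the capture groups, so regmatches() returns
# c(full match, group 1, group 2) for each string.
groups <- regmatches(x, regexec(pattern, x, perl = TRUE))
number <- vapply(groups, `[`, character(1), 2)
word   <- vapply(groups, `[`, character(1), 3)
print(data.frame(number, word))
```

Because `(` counts as punctuation, group 1 for the third phrase comes back as `154K(` rather than `154K`.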
You could also use the “normal” regex syntax:
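That snippet is also missing. Assuming “normal” means the shorthand classes `\\d`, `\\w` and `\\W` (double-escaped as described above) instead of POSIX classes like `[[:digit:]]`, here is a sketch that also enforces the question's “near” limit by allowing at most three words in between. Reading 1-3 words apart as zero to three intervening words is itself an assumption, as is the extra fourth phrase, which is hypothetical and exists only to show a non-match.

```r
# Example phrases from the question, plus one hypothetical phrase in which
# the number and "miles" are too far apart to count as "near".
x <- c("140,000 mostly freeway miles",
       "173k commuter miles. ",
       "154K(all highway) miles",
       "150 of those were highway and city miles")

# Group 1: a digit, then more digits or commas, then an optional k/K.
# \\W+ skips the non-word characters after it; (?:\\w+\\W+){0,3} allows
# at most three intervening words; group 2 is the word "miles".
# Every backslash is doubled because the pattern lives in an R string.
near <- "(\\d[\\d,]*[kK]?)\\W+(?:\\w+\\W+){0,3}(miles)\\b"

hits <- regmatches(x, regexec(near, x, perl = TRUE))
# A non-matching string gives character(0), so taking element 2 yields NA.
number <- vapply(hits, `[`, character(1), 2)
print(data.frame(x, number))
```

Here the optional `[kK]?` keeps the `(` out of group 1, so the third phrase gives `154K`.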
However, I would clean up the data first (e.g. convert it with `tolower` and remove punctuation) and then do some simpler matching!
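Here is a minimal base-R sketch of that clean-up-first approach. It assumes commas are thousands separators, so they are deleted, while other punctuation becomes a space so that `154K(all` doesn't fuse into a single word. The word-gap step also assumes the number is the first word of each phrase. The variable names are illustrative.

```r
x <- c("140,000 mostly freeway miles",
       "173k commuter miles. ",
       "154K(all highway) miles")

# 1. Clean up: lower-case, drop thousands separators, turn the remaining
#    punctuation into spaces, then collapse and trim whitespace.
clean <- tolower(x)
clean <- gsub(",", "", clean, fixed = TRUE)
clean <- gsub("[[:punct:]]", " ", clean)
clean <- trimws(gsub("[[:space:]]+", " ", clean))

# 2. Simpler matching: a leading number (optionally ending in k) that is
#    followed somewhere later by the word "miles".
num <- regmatches(clean,
                  regexpr("^[0-9]+k?(?= .*\\bmiles\\b)", clean, perl = TRUE))

# 3. Count the words between the number and "miles", assuming the number
#    is the first word (0 would mean the two are adjacent).
words <- strsplit(clean, " ", fixed = TRUE)
gap <- vapply(words, function(w) match("miles", w) - 2L, integer(1))

print(data.frame(clean, num, gap))
```

After cleaning, a number can be checked for being “near” miles by simply comparing `gap` with the allowed distance, with no complicated regex needed.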