I need to find prices in a text document. My code looks like this:
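import re
sentence = "This is test text $25,000 $25,000$20,000 $30"
# Non-capturing group, so findall() returns whole matches rather than the group contents
pattern = re.compile(ur'[$€£]?\d+(?:[.,]\d+)?', re.UNICODE | re.MULTILINE | re.DOTALL)
print pattern.findall(sentence)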
The desired result is:
['$25,000', '$30']
I don’t want $25,000$20,000 in the result because it is not a valid price for my task. I need only whole-word matches.
But I get this result:
['$25,000', '$25,000', '$20,000', '$30']
How can I rewrite my regex to include only prices separated by whitespace or punctuation?
Try the following:
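A minimal sketch of such a pattern, assuming Python 3 (so the `ur` prefix and the flags from the question are dropped) and a non-capturing group so that `findall()` returns whole matches:

```python
import re

sentence = "This is test text $25,000 $25,000$20,000 $30"

# (?<!\S) : the match must not be preceded by a non-space character
# (?!\S)  : the match must not be followed by a non-space character
# (?:...) : non-capturing, so findall() returns the full match
pattern = re.compile(r'(?<!\S)[$€£]?\d+(?:[.,]\d+)?(?!\S)')

matches = pattern.findall(sentence)
print(matches)  # ['$25,000', '$30']
```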
I added the negative assertions (?<!\S) and (?!\S), which mean “fail to match if preceded by a non-space” and “fail to match if followed by a non-space” respectively. Tested:
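A rough check of the two assertions on a few edge cases, assuming Python 3. The helper name `find_prices` is made up for this illustration:

```python
import re

# Price pattern wrapped in the two whitespace assertions.
PRICE = re.compile(r'(?<!\S)[$€£]?\d+(?:[.,]\d+)?(?!\S)')

def find_prices(text):
    """Return every price that is delimited by whitespace or the string ends."""
    return PRICE.findall(text)

print(find_prices("$30"))             # ['$30']  (start/end of string count as boundaries)
print(find_prices("€9.99 and £5"))    # ['€9.99', '£5']
print(find_prices("$25,000$20,000"))  # []  (glued prices are rejected)
print(find_prices("Total: $30."))     # []  (the trailing period is a non-space)
```

The last case shows why strict whitespace boundaries can be too narrow when prices sit next to punctuation.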
If you want to allow certain non-space characters before or after the match, replace \S with [^\s<chars>], where <chars> are the characters you want to allow. For example, (?<![^\s:]) and (?![^\s,.]) allow the pattern to be preceded by a colon and followed by a comma or a period.
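The relaxed variant can be sketched as follows, again assuming Python 3. The sample text and the name `relaxed` are illustrative only:

```python
import re

# (?<![^\s:])  : preceded only by whitespace, a colon, or the start of the string
# (?![^\s,.])  : followed only by whitespace, a comma, a period, or the end
relaxed = re.compile(r'(?<![^\s:])[$€£]?\d+(?:[.,]\d+)?(?![^\s,.])')

found = relaxed.findall("Price:$30, then $25,000. Done")
print(found)  # ['$30', '$25,000']

# Caveat: because a comma may now follow a match, the engine can
# backtrack to a shorter match inside glued prices.
glued = relaxed.findall("$25,000$20,000")
print(glued)  # ['$25']
```

If that partial match matters for your input, the set of allowed trailing characters probably needs to be narrowed for your data.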