I am trying to learn more about regular expressions. I have one below that I believe finds cases where there is a missing close paren on a number up to 999 billion. The one below it I thought should do the same, but I do not get similar results.
missingParenReg=re.compile(r'^\([$]*[0-9]{1,3}[,]?[0-9]{0,3}[,]?[0-9]{0,3}[,]?[0-9]{0,3}[.]*[0-9]*[^)]$')
missingParenReg2=re.compile(r'^\([$]?([0-9]{1,3}[,]?)+[.]*[0-9]*[^)]$')
I think the second one says: There must be an open paren to start
There may or may not be as many as one dollar sign
The next group must exist at least once but can exist an unlimited number of times
The group should have at least one digit but may have as many as three
The group may have as few as 0 and as many as 1 comma
Following this group there may or may not be a decimal point
If there is a decimal point, it will be followed by as few as 0 but as many as an unlimited number of occurrences of digits
At the end there should not be a closing paren.
I am trying to understand this magic stuff so I would appreciate a correction to my regex (if it can be corrected) in addition to a more elegant solution if you have it.
The trickier part about regular expressions isn’t making them accept valid input, it’s making them reject invalid input. For example, the second expression accepts input that is clearly wrong, including:
(1,2,3,4 - one digit between each comma
(12,34,56 - two digits between each comma
(1234......5 - unlimited number of decimal points
(1234,.5 - comma before decimal point
(123,456789,012 - if there are some commas, they should be between each triple
(01234 - leading zero is not conventional
(123.4X - last char is not a closing paren
Here’s an alternative regular expression that should reject the examples above:
[-+]?[$]?(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})*)(\.\d+)?
Regarding the parens, if all you care about is whether the parens are balanced, then you can disregard parsing out the numeric format precisely; just trust that any combination of digits, decimal points, and commas between the parens is valid. Then use the (?!...) construct, which evaluates as a match if the input doesn’t match the regular expression inside:
(?!\([$\d.,]+\))
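To make the comparison concrete, here is a minimal Python sketch that runs the patterns above against the problem inputs. A few things in it are my own assumptions rather than part of the patterns themselves: the numeric pattern is applied with `fullmatch` to the text after the opening paren, the lookahead is followed by `\(` so that only strings starting with an open paren count, and the names `number`, `unbalanced`, and `has_missing_close_paren` are just illustrative.

```python
import re

# The second pattern from the question, unchanged.
missing_paren_2 = re.compile(r'^\([$]?([0-9]{1,3}[,]?)+[.]*[0-9]*[^)]$')

# The stricter numeric pattern. It has no anchors of its own, so this
# sketch uses fullmatch() to require that the whole string is a number.
number = re.compile(r'[-+]?[$]?(0|[1-9]\d*|[1-9]\d{0,2}(,\d{3})*)(\.\d+)?')

# Negative lookahead: succeed only if the text does NOT start with a
# balanced "(...)" group. The trailing \( is an assumption added here so
# that only strings that begin with an open paren are flagged.
unbalanced = re.compile(r'(?!\([$\d.,]+\))\(')

bad_inputs = [
    "(1,2,3,4",
    "(12,34,56",
    "(1234......5",
    "(1234,.5",
    "(123,456789,012",
    "(01234",
    "(123.4X",
]


def has_missing_close_paren(text):
    """Return True if text opens a paren that is not closed after the number."""
    return unbalanced.match(text) is not None


for s in bad_inputs:
    loose = missing_paren_2.match(s) is not None      # question's pattern
    strict = number.fullmatch(s[1:]) is not None      # stricter pattern
    print(f"{s!r}: question's pattern accepts={loose}, strict number={strict}")
```

Every entry in `bad_inputs` is accepted by the question's pattern and rejected by the stricter numeric pattern, while `has_missing_close_paren("(1,234")` is true and `has_missing_close_paren("(1,234)")` is false. Note that the lookahead version deliberately does not validate the number format; it only checks for the closing paren.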