I’m writing a small script that extracts a shoe size from a shoe identifier (SKU).
There are a few cases that I want to be able to handle. Given the following list:
sizes = ['315122-603 10 A', '315122-608_12.0', '317982-019', '364781-019_5.5Y', 'V24088-001_10', '609048-035 8.5', '7-20Spm8231B5 10', 'G17295-001_9.5']
I want to be able to get the size for each one (10, 12, 5.5, etc.).
My knowledge of regular expressions is very limited. I have been looking at snippets here and there and came up with the following:
import re
r = r'\d{1,2}.\d+'
for size in sizes:
    print(re.findall(r, size))
['315122', '603']
['315122', '608', '12.0']
['317982', '019']
['364781', '019', '5.5']
['24088', '001']
['609048', '035', '8.5']
['7-20', '8231', '5 10']
['17295', '001', '9.5']
But as you can see, it doesn’t work. I want to match only the size itself: the digits before the decimal point and the digits after it, and nothing else.
A few problems:
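- A dot (.) has a special meaning in a regular expression: it matches any character. If you literally want to match a dot, you need to escape it (\.).
- Nothing stops your pattern from matching in the middle of a longer number. Add something that marks where the number starts or ends, such as \D, \b or (?!\d).
- re.findall finds multiple matches. If you know there’s only going to be one match, use re.search.
Try this: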
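The pattern below is a minimal sketch of those three points, not a definitive solution. It assumes the size always follows a non-digit character and always has a decimal part. The names DECIMAL_SIZE and find_decimal_size are made up for illustration.

```python
import re

sizes = ['315122-603 10 A', '315122-608_12.0', '317982-019', '364781-019_5.5Y',
         'V24088-001_10', '609048-035 8.5', '7-20Spm8231B5 10', 'G17295-001_9.5']

# Assumed pattern (a sketch, not the only option):
#   \D       one non-digit, so the match cannot start in the middle of a number
#   \d{1,2}  one or two digits before the decimal point
#   \.       a literal dot (escaped)
#   \d+      the digits after the decimal point
DECIMAL_SIZE = re.compile(r'\D(\d{1,2}\.\d+)')

def find_decimal_size(sku):
    """Return the first decimal size found in sku as a string, or None."""
    match = DECIMAL_SIZE.search(sku)  # search stops at the first match
    return match.group(1) if match else None

for sku in sizes:
    print(sku, '->', find_decimal_size(sku))
```

With the sample list, this finds 12.0, 5.5, 8.5 and 9.5 and returns None for the four SKUs that have no decimal point. Because \D has to consume a character, it would also miss a size at the very start of a string.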
Note that some of your strings contain underscores or no decimal separator. You haven’t really described what should happen in these cases, and this pattern won’t handle all the cases in your example, but it will hopefully give you a good start.
You may also want to consider writing a different regular expression for each input type rather than trying to write a single regular expression to handle all possible inputs.
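One way to sketch the per-format approach is to keep a short list of patterns and try them in order. The two formats below are guesses based only on the sample list: a size after an underscore, and a size after a space. A size without a decimal part is accepted in both, and a SKU with neither separator is treated as having no size. The names NUMBER, PATTERNS and extract_size are hypothetical.

```python
import re

sizes = ['315122-603 10 A', '315122-608_12.0', '317982-019', '364781-019_5.5Y',
         'V24088-001_10', '609048-035 8.5', '7-20Spm8231B5 10', 'G17295-001_9.5']

# One or two digits, an optional decimal part, and no further digit afterwards.
NUMBER = r'(\d{1,2}(?:\.\d+)?)(?!\d)'

# Assumed input formats, tried in order; add or reorder entries as needed.
PATTERNS = [
    re.compile('_' + NUMBER),  # e.g. '315122-608_12.0', '364781-019_5.5Y'
    re.compile(' ' + NUMBER),  # e.g. '609048-035 8.5', '315122-603 10 A'
]

def extract_size(sku):
    """Return the size as a float, or None if no known format matches."""
    for pattern in PATTERNS:
        match = pattern.search(sku)
        if match:
            return float(match.group(1))
    return None

for sku in sizes:
    print(sku, '->', extract_size(sku))
```

Keeping each format as its own entry makes it easier to see which rule produced a result, and a new SKU layout only needs one more pattern rather than a rewrite of a single large expression.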