I need to process some text to create a dictionary {name: quantity}.
The text comes in these variants:
2 Cardname
3 Cardname Two
1 Cardname Three
Cardname
Cardname Two
Cardname Three
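So I wrote some basic code:
import re

card_list = card_area.splitlines()
card_dict = {}
for card in card_list:
    # Optional leading quantity (1 to 6 digits); default to 1 if absent.
    qty_re = re.search(r'^\d{1,6}', card)
    if qty_re:
        qty = int(qty_re.group())
    else:
        qty = 1
    # Card name: the trailing run of letters and spaces, minus padding.
    name_re = re.search(r'[A-Za-z ]+$', card)
    if name_re:
        name = name_re.group().strip()
    else:
        name = None
    if name:
        card_dict[name] = qty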
The first question: can I use the groupdict method if some parts of a string are missing (no quantity, or an empty string)?
Second: I also want to handle formats like these:
2 x Cardname
3x Cardname Two
1 xCardname Three
1xCardname Four
What is the best way?
A solution. Notes to follow.
Notes:
I recommend pre-compiling the regular expression pattern, for speed.
The best way to handle this is a single regular expression pattern that grabs both the count and the card. I have added an optional pattern that recognizes card formats with the optional ‘x’; using a character class I made it match either upper- or lower-case ‘x’. The white space between the number and the ‘x’ is optional but there must be white space between the ‘x’ and the card name, or else the ‘x’ will be treated as part of the card name.
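A minimal sketch of such a pattern, written to follow the description in these notes. The names `CARD_RE` and `parse_line` are made up for this illustration, and a missing count is assumed to mean one copy, as in the question's code.

```python
import re

# Assumed pattern, built from the description in these notes:
#   (\d*)          count: zero or more digits (may be empty)
#   \s*            optional whitespace
#   (?:[xX]\s+)?   optional 'x' or 'X' that must be followed by whitespace
#   (\S.*)         card name: one non-whitespace character, then anything
CARD_RE = re.compile(r"(\d*)\s*(?:[xX]\s+)?(\S.*)")


def parse_line(line):
    """Return (name, qty) for one line, or None for a blank line."""
    m = CARD_RE.match(line.strip())
    if m is None:
        return None
    count, name = m.groups()
    # An empty count group means no number was given; assume one copy.
    qty = int(count) if count else 1
    return name, qty


for line in ["2 Cardname", "Cardname Two", "3x Cardname Two", "1xCardname Four"]:
    print(parse_line(line))
```

With this pattern the last line keeps the 'x' as part of the name, because no white space follows it.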
If you are not familiar with regular expressions, here is how to read this one: form a match group that matches zero or more digits. This is followed by zero or more white space characters. This is followed by another group, but this following group is flagged with `(?:` rather than just `(` so it is a group but will not make a match group in the output; this group is a character class matching ‘x’ or ‘X’ followed by one or more white space characters, and the trailing `?` makes the whole group optional. Form another match group, which starts with one non-whitespace character and is followed by zero or more of any character.

I believe you want to sum all the cards of the same name? The best tool for that is `defaultdict()` as I showed here.

If no legal card name ever starts with ‘x’ or ‘X’, you could change the pattern to not keep the ‘x’ even when there is no space between it and the card name. To do that, change the part of the pattern that matches the ‘x’ from this: `(?:[xX]\s+)?` to this: `(?:[xX]\s*)?` (Note that a single `+` changed to a single `*` after the `\s`, so zero whitespace characters will now be accepted.)
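As a rough sketch, the relaxed pattern and the summing idea can be combined as below. `RELAXED_RE`, `count_cards`, and `sample` are hypothetical names chosen for this example, and the code assumes that no card name starts with ‘x’ or ‘X’ and that a missing count means one copy.

```python
import re
from collections import defaultdict

# Relaxed variant: \s* after the 'x' lets "1xCardname Four" drop the 'x'.
# Only safe under the assumption that no card name starts with 'x' or 'X'.
RELAXED_RE = re.compile(r"(\d*)\s*(?:[xX]\s*)?(\S.*)")


def count_cards(text):
    """Sum quantities per card name over all lines of text."""
    totals = defaultdict(int)  # names not seen yet start at 0
    for line in text.splitlines():
        m = RELAXED_RE.match(line.strip())
        if m is None:  # blank line, nothing to count
            continue
        count, name = m.groups()
        totals[name] += int(count) if count else 1
    return dict(totals)


sample = (
    "2 x Cardname\n"
    "3x Cardname Two\n"
    "1 xCardname Three\n"
    "1xCardname Four\n"
    "Cardname\n"
)
print(count_cards(sample))
```

Because `defaultdict(int)` supplies 0 for a name it has not seen, the two ‘Cardname’ lines in the sample add up to 3 without any explicit membership check.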