In a previous question, it was suggested that, in order to divide a string and store it, I should use a list, like so:
[a for a in re.split(r'([A-Z][a-z]*)', 'MgSO4') if a]
['Mg', u'S', u'O', u'4']
What I’d like to ask this time is how I can store the resulting strings in variables so that I can look them up in the CSV file I have, if that’s possible at all. The ‘MgSO4’ literal would instead come from a variable called ‘formula’, which is read with raw_input, like so:
formula = raw_input("Enter formula: ")
Full program code can be found here, and I’ve included the more relevant part below. Thanks in advance for any help!
formula = raw_input("Enter formula: ")
[a for a in re.split(r'([A-Z][a-z]*)', 'MgSO4') if a]
weight_sum = sum(float(formul_data.get(elem.lower())) for elem in elements)
print "Total weight =", weight_sum
If your goal is to be able to add up the molecular weights of the atoms comprising a molecule, I suggest doing your regular expressions a bit differently. Instead of having the numbers mixed in with the element symbols in your split list, attach them to the preceding element instead (and attach a 1 if there was no number). Here’s how I’d do that:
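A minimal sketch of that approach follows. It assumes a `weights` dict mapping element symbols to atomic weights; the dict and its rounded values are placeholders for illustration, and the helper names `parse_formula` and `molecular_weight` are hypothetical.

```python
import re

# Hypothetical lookup table with approximate atomic weights (illustration only).
weights = {"Mg": 24.305, "S": 32.06, "O": 15.999, "H": 1.008}

def parse_formula(formula):
    """Return (symbol, count) pairs, treating a missing number as 1."""
    # Each match is a tuple: (element symbol, digits that follow it or '').
    pairs = re.findall(r'([A-Z][a-z]*)(\d*)', formula)
    return [(symbol, int(count) if count else 1) for symbol, count in pairs]

def molecular_weight(formula):
    """Sum weight * count over every element in the formula."""
    return sum(weights[symbol] * count for symbol, count in parse_formula(formula))
```

With this, `parse_formula('MgSO4')` gives `[('Mg', 1), ('S', 1), ('O', 4)]`, and `molecular_weight('MgSO4')` comes out at about 120.36 with the placeholder weights above.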
To make this fit with the code you’ve shown, replace weights[symbol] with something like formul_data.get(symbol.lower(), 0) (or whatever is necessary to get appropriate atomic weights by symbol in your code).

This should handle any empirical formula, and many structural ones, as long as there are no parentheses. To handle fully parenthesized formulas you’ll need to write a better parser, as simple regular expressions won’t work.
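As a rough illustration of what such a parser might look like, here is one possible stack-based sketch. It is only an assumption about how to approach the problem: the function name `count_atoms` is hypothetical, and it handles nested parentheses with an optional multiplier but nothing more exotic (no hydrates, charges, or square brackets).

```python
import re
from collections import Counter

def count_atoms(formula):
    """Count atoms in a formula that may contain parentheses, e.g. 'Mg(OH)2'."""
    # Tokens are element symbols or parentheses, each with optional trailing digits.
    tokens = re.findall(r'([A-Z][a-z]*|\(|\))(\d*)', formula)
    stack = [Counter()]  # one Counter per currently open group
    for token, digits in tokens:
        count = int(digits) if digits else 1
        if token == '(':
            stack.append(Counter())
        elif token == ')':
            if len(stack) < 2:
                raise ValueError("unbalanced ')' in formula")
            group = stack.pop()
            # Fold the closed group into its parent, scaled by the multiplier.
            for symbol, n in group.items():
                stack[-1][symbol] += n * count
        else:
            stack[-1][token] += count
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in formula")
    return dict(stack[0])
```

For example, `count_atoms('Mg(OH)2')` returns one Mg, two O and two H; the resulting counts can then be multiplied by the atomic weights in the same way as before.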