Say I have a data string whose formatting varies. Nominally the fields in the string are separated by spaces, but that’s not always the case, so a simple .split(' ') won’t work here.
An example string is:
string = '2012 05 06 04:20:00.0500 FOOBAR 4.7E+10 -55 33.0 555~2767 B 12 \r\n'
To get all the numbers I need, which can contain exponents, can start with -, +, or ~, and are not always separated by a space, I can use:
re.findall(r'[~+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?', string)
# giving the result:
['2012', '05', '06', '04', '20', '00.0500', '4.7E+10', '-55', '33.0', '555', '~2767', '12']
I also need the single character (in this case B) from the string. This character can be B, F, or O, and I can get it while avoiding the FOOBAR in my string by using:
re.findall(r'((?:(?:\b))[FBO]\b)', string)
# giving the result:
['B']
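As a side note, the nested non-capturing groups around `\b` in that pattern add nothing, and the one capturing group spans the whole match, so a plain `\b[FBO]\b` should behave the same. A small check, using `data` as the variable name (an assumption made here to avoid shadowing the standard `string` module):

```python
import re

data = '2012 05 06 04:20:00.0500 FOOBAR 4.7E+10 -55 33.0 555~2767 B 12 \r\n'

# (?:(?:\b)) is just \b, and the outer capture group covers the whole match,
# so both patterns should find exactly the same standalone B/F/O characters.
original = re.findall(r'((?:(?:\b))[FBO]\b)', data)
simplified = re.findall(r'\b[FBO]\b', data)

print(original)    # ['B']
print(simplified)  # ['B']
```

The F, B, and O inside FOOBAR are skipped because each is touching another word character, so there is no word boundary on both sides.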
But what I need is a result that combines the two above. I could simply append the second result to the first list, but I would really like the items to appear in the order in which they occur in the original string `string`. That is, I want a list that looks like:
['2012', '05', '06', '04', '20', '00.0500', '4.7E+10', '-55', '33.0', '555', '~2767', 'B', '12']
Any ideas? Or is there a better way?
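One way to keep both patterns separate and still get source order is to use `re.finditer`, which yields match objects that remember their start offset, and then sort the combined matches by `m.start()`. This is a minimal sketch that assumes the two patterns never match overlapping text (true for this example, since B, F, and O are not digits); the variable name `data` is an assumption chosen to avoid shadowing the `string` module.

```python
import re

data = '2012 05 06 04:20:00.0500 FOOBAR 4.7E+10 -55 33.0 555~2767 B 12 \r\n'

number_re = re.compile(r'[~+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?')
flag_re = re.compile(r'\b[FBO]\b')

# Gather match objects from both searches, then order them by where
# each match begins in the original string.
matches = list(number_re.finditer(data)) + list(flag_re.finditer(data))
matches.sort(key=lambda m: m.start())
result = [m.group() for m in matches]

print(result)
# ['2012', '05', '06', '04', '20', '00.0500', '4.7E+10', '-55', '33.0', '555', '~2767', 'B', '12']
```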
How about:
This returns:
Also, not to nag, but overwriting the Python type `str` with a variable name made me shudder for a second there.
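For reference, a common way to get both kinds of token in order is to join the two patterns into one regex with `|` alternation. Because the regex engine scans the string once from left to right, `re.findall` returns matches in the order they appear. This is a sketch under the assumption that the number pattern and `\b[FBO]\b` from the question are the only token types needed; `data` again stands in for the original variable.

```python
import re

data = '2012 05 06 04:20:00.0500 FOOBAR 4.7E+10 -55 33.0 555~2767 B 12 \r\n'

# Two alternatives in one pattern: a number (optional ~, +, or - prefix,
# optional fraction, optional exponent) or a standalone B, F, or O.
# The pattern has no capturing groups, so findall returns whole matches.
pattern = re.compile(r'[~+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|\b[FBO]\b')

result = pattern.findall(data)
print(result)
# ['2012', '05', '06', '04', '20', '00.0500', '4.7E+10', '-55', '33.0', '555', '~2767', 'B', '12']
```

If capturing groups are ever added to either alternative, `findall` would start returning groups instead of whole matches; switching to `[m.group() for m in pattern.finditer(data)]` avoids that.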