I’ve got a problem with the following Python script, which extracts some options from the text in a text area of an internal company web app.
import re
text = 'option one\noption two, option three, option four'
correct = 'option one, option two, option three, option four'
pattern = re.compile(r'(\s*[,]\s*)')
fixed = pattern.sub(', ', text)
print(fixed)
option one
option two, option three, option four
print(fixed.split(', '))
['option one\noption two', 'option three', 'option four']
This obviously fails to split 'option one\noption two' into 'option one' and 'option two'.
So the input could end up as
option one
option two, option three, option four
which would need to be converted to
option one, option two, option three, option four
It works fine if it's a comma, or a comma followed by a newline, but not if it's just a newline by itself.
Extend your character class from [,] to [,\n], maybe? Also, why don't you split on the regex directly, rather than search-and-replacing first and then splitting? The re.split function could come in handy for this: http://docs.python.org/library/re.html?highlight=re.split#re.split
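To make both suggestions concrete, here is a minimal sketch using the sample text from the question. The names separator and options are only illustrative, and the sketch assumes the input has no leading or trailing separators (those would produce empty strings in the result).

```python
import re

text = 'option one\noption two, option three, option four'

# Separator: a comma OR a newline, with optional whitespace on either side.
# There is no capturing group, so re.split will not return the separators.
separator = re.compile(r'\s*[,\n]\s*')

# Suggestion 1: keep the search-and-replace approach, with the wider class.
fixed = separator.sub(', ', text)
print(fixed)

# Suggestion 2: split on the regex directly and skip the replace step.
options = separator.split(text)
print(options)
```

If stray separators at the start or end of the input are possible, filtering the result with something like [o for o in options if o] should drop the empty entries.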