I’m using the re module in Python and trying to split a chunk of text on periods and exclamation marks. However, when I split it, I get a “None” in the result:
a = "This is my text...I want it to split by periods. I also want it to split \
by exclamation marks! Is that so much to ask?"
This is my code:
re.split(r'((?<=\w)\.(?!\..))|(!)', a)
Note that I have this (?<=\w)\.(?!\..) because I want it to avoid ellipses. Nevertheless, the above code spits out:
['This is my text...I want it to split by periods', '.', None,
 ' I also want it to split by exclamation marks', None, '!',
 ' Is that so much to ask?']
As you can see, where a period or exclamation mark is, it has added a special “None” into my list. Why is this and how do I get rid of it?
Try the following:
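A minimal sketch of the kind of call this points to, assuming (from the closing remark about removing the parentheses around the entire expression) that the suggested pattern wraps both alternatives in a single capturing group. The exact pattern is an assumption, and the variable name parts is only illustrative.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# Assumed form of the fix: one capturing group around the whole alternation.
# Every match fills this single group, so re.split has no empty group to
# report as None.
parts = re.split(r'((?<=\w)\.(?!\..)|!)', a)
print(parts)
```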
You get the None because you have two capturing groups, and all groups are included as a part of the re.split() result.
So any time you match a '.' the second capture group is None, and any time you match a '!' the first capture group is None.
Here is the result:
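The sketch below illustrates both points on the string from the question: what each match of the original two-group pattern captures, and what re.split() returns once the alternatives share one group. The single-group pattern is an assumed reconstruction, and the names two_groups and one_group are illustrative.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

two_groups = r'((?<=\w)\.(?!\..))|(!)'  # pattern from the question
one_group = r'((?<=\w)\.(?!\..)|!)'     # assumed single-group variant

# Each match fills only one of the two groups; the other one is None,
# and re.split inserts every group into its result.
for m in re.finditer(two_groups, a):
    print(m.groups())
# ('.', None)
# (None, '!')

print(re.split(one_group, a))
# ['This is my text...I want it to split by periods', '.',
#  ' I also want it to split by exclamation marks', '!',
#  ' Is that so much to ask?']
```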
If you don’t want to include '.' and '!' in your result, just remove the parentheses that surround the entire expression: r'(?<=\w)\.(?!\..)|!'
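As a quick check of that last pattern, here is a short sketch run against the string from the question. The second half shows an alternative that is not part of the answer above: keeping the original two-group pattern and filtering out the None entries afterwards. The names pieces and kept are illustrative.

```python
import re

a = ("This is my text...I want it to split by periods. I also want it to split "
     "by exclamation marks! Is that so much to ask?")

# No capturing groups: re.split returns only the text between delimiters.
pieces = re.split(r'(?<=\w)\.(?!\..)|!', a)
print(pieces)
# ['This is my text...I want it to split by periods',
#  ' I also want it to split by exclamation marks',
#  ' Is that so much to ask?']

# Alternative (an assumption, not from the answer): keep the two-group
# pattern and drop the None placeholders after splitting.
kept = [p for p in re.split(r'((?<=\w)\.(?!\..))|(!)', a) if p is not None]
print(kept)
```

The first form drops the delimiters entirely; the second keeps '.' and '!' as separate list items.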