I have been trying to teach myself regular expressions in Python, and I decided to print out all the sentences of a text. I have been tinkering with the regular expressions for the past 3 hours to no avail.
I just tried the following, but it does not work:
import re

p = open('anan.txt')
process = p.read()
regexMatch = re.findall('^[A-Z].+\s+[.!?]$', process, re.I)
print(regexMatch)
p.close()
My input file is like this:
OMG is this a question ! Is this a sentence ? My.
name is.
This prints nothing. But when I remove "My. name is.", it prints "OMG is this a question" and "Is this a sentence" together, as if it only reads the first line.
What is the best regex for finding all the sentences in a text file, regardless of whether a sentence carries over onto a new line, that also reads the entire text? Thanks.
Something like this works:
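Notice how "name is." is not in the result, because it does not start with an uppercase letter.
Your problem comes from the use of the ^ and $ anchors: by default they match only at the start and end of the whole text, not at the start and end of each line or sentence.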
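As a minimal sketch of an anchor-free approach: the pattern below is an assumption chosen for illustration, as are the names SENTENCE_RE and find_sentences and the inline sample text (it stands in for the contents of the file). It drops the ^ and $ anchors and the re.I flag, so a match must start with a real uppercase letter, and it uses a negated character class, which also matches newlines, so a sentence may continue onto the next line.

```python
import re

# Assumed pattern: one uppercase letter, then any run of characters that are
# not sentence terminators (newlines included), then a single terminator.
# No re.I flag, so [A-Z] only matches real uppercase letters.
SENTENCE_RE = re.compile(r'[A-Z][^.!?]*[.!?]')


def find_sentences(text):
    """Return each sentence-like match, with internal whitespace collapsed."""
    return [' '.join(match.split()) for match in SENTENCE_RE.findall(text)]


# Sample text standing in for the contents of the input file.
sample = "OMG is this a question ! Is this a sentence ? My.\nname is.\n"
print(find_sentences(sample))
# → ['OMG is this a question !', 'Is this a sentence ?', 'My.']
```

This treats every ., ! or ? as the end of a sentence, so abbreviations such as "e.g." and decimal numbers would be split in the wrong place; handling those needs a more elaborate pattern or a dedicated sentence tokenizer.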