This question (Best way to strip punctuation from a string in Python) deals with stripping punctuation from an individual string. However, I’m hoping to read text from an input file and print only ONE COPY of each string, without its ending punctuation. I have started something like this:
f = open('#file name ...', 'r')
for x in set(f.read().split()):
    print(x)
But the problem is that if the input file has, for instance, this line:
This is not is, clearly is: weird
It treats the three occurrences of “is” (“is”, “is,” and “is:”) as different strings, but I want to ignore any punctuation and have it print “is” only once, rather than three times. How do I remove any kind of ending punctuation and then put the resulting string in the set?
Thanks for any help. (I am really new to Python.)
Extracting the words with a regular expression should distinguish them more reliably.
This regular expression finds compact groups of alphanumerical characters (a-z, A-Z, 0-9, _).
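A minimal sketch of that idea, assuming the pattern is `\w+` applied with `re.findall` (the answer's own snippet is not shown here, and the helper name `unique_words` is made up for illustration):

```python
import re

def unique_words(text):
    # \w+ matches runs of letters, digits and underscores,
    # so punctuation around a word is never part of a match.
    return set(re.findall(r'\w+', text))

words = unique_words("This is not is, clearly is: weird")
print(sorted(words))  # ['This', 'clearly', 'is', 'not', 'weird']
```

With a file, the same function could be called on the result of `f.read()` instead of a literal string.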
If you want to find letters only (no digits and no underscore), then replace the \w with [a-zA-Z].
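For comparison, here is a small sketch of the letters-only variant, assuming the pattern becomes `[a-zA-Z]+`. The function name `unique_letter_words` and the sample string are invented for illustration:

```python
import re

def unique_letter_words(text):
    # [a-zA-Z]+ matches runs of ASCII letters only, so digits and
    # underscores split words instead of being part of them.
    return set(re.findall(r'[a-zA-Z]+', text))

print(sorted(unique_letter_words("is_2 is, is: x1")))  # ['is', 'x']
```

With `\w+`, the same input would also keep `is_2` and `x1` as separate words, so the choice depends on whether digits and underscores should count as part of a word.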