How can I read character strings from a file, the way fscanf does? I need to read a whole .txt file word by word and keep a count for each word.
import collections

collectwords = collections.defaultdict(int)
with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        v = ""
        for char in line:
            if str(char) != " ":
                v = v + str(char)
            elif str(char) == " ":
                collectwords[v] += 1
                v = ""
This way, I can't read the last word.
You might also consider using collections.Counter if you are using Python >= 2.7: http://docs.python.org/library/collections.html#collections.Counter
It adds a number of methods like ‘most_common’, which might be useful in this type of application.
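As a minimal sketch of that idea (the helper name `count_words` and the sample lines are made up for illustration), a Counter can be fed the result of `line.split()`. Splitting on whitespace also discards the trailing newline, so the last word of each line is counted:

```python
import collections


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = collections.Counter()
    for line in lines:
        # split() with no arguments splits on any run of whitespace and
        # ignores the trailing newline, so no word is left uncounted.
        counts.update(line.split())
    return counts


# Hypothetical sample input standing in for the lines of DatoSO.txt.
sample = ["the quick brown fox\n", "jumps over the lazy dog\n"]
counts = count_words(sample)
print(counts.most_common(1))  # [('the', 2)]
```

Because a file object is itself an iterable of lines, the same function should work when passed the open file from the question (`filetxt`) instead of the sample list.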
From Doug Hellmann’s PyMOTW:
http://www.doughellmann.com/PyMOTW/collections/counter.html (although this does letter counts instead of word counts). In the c.update line, you would want to replace line.rstrip().lower() with line.split() and perhaps add some code to get rid of punctuation.

Edit: To remove punctuation, here is probably the fastest solution:
(borrowed from the question "Best way to strip punctuation from a string in Python")
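One translate-based way to do this, sketched here under the assumption of Python 3 (in Python 2 the `translate` call takes different arguments), is shown below. The function names `strip_punctuation` and `count_clean_words` are invented for this example, and only ASCII punctuation from `string.punctuation` is removed:

```python
import collections
import string

# Translation table that deletes every ASCII punctuation character
# (Python 3 form of str.maketrans; an assumption of this sketch).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text):
    """Return text with ASCII punctuation removed."""
    return text.translate(PUNCT_TABLE)


def count_clean_words(lines):
    """Count lowercased words after stripping punctuation from each line."""
    counts = collections.Counter()
    for line in lines:
        counts.update(strip_punctuation(line).lower().split())
    return counts


print(strip_punctuation("Hello, world!"))  # Hello world
```

Lowercasing before splitting is a design choice rather than a requirement: it makes "Hello" and "hello" count as the same word, which may or may not be what you want.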