I want to count words from text files that contain data like the following:
ROK :
ROK/(NN)
New :
New/(SV)
releases, :
releases/(NN) + ,/(SY)
week :
week/(EP)
last :
last/(JO)
compared :
compare/(VV) + -ed/(EM)
year :
year/(DT)
releases :
releases/(NN)
Expressions like /(NN), /(SV), and /(EP) are categories.
I want to extract the word just before each category and count how many times each word occurs in the whole text.
I want to write the result to a new text file like this:
(NN)
releases 2
ROK 1
(SV)
New 1
(SY)
, 1
(EP)
week 1
(JO)
last 1
......
Please help me out!
Here is my garbage code ;_; it doesn’t work.
import os, sys
import re
wordset = {}
for line in open('E:\\mach.txt', 'r'):
    if '/(' in line:
        word = re.findall(r'(\w)/\(', line)
        print word
        if word not in wordset: wordset[word]=1
        else: wordset[word]+=1
f = open('result.txt', 'w')
for word in wordset:
    print>> f, word, wordset[word]
f.close()
You’re welcome (=
If you also want to count odd tokens like “-ed” or “,”, tune the regexp to match any character except whitespace:
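A minimal sketch of that idea, assuming Python 3 and assuming every category tag is a run of uppercase letters in parentheses, as in the sample data. The helper names `count_by_category` and `format_report` are made up for illustration and are not from the original post.

```python
import re
from collections import Counter, defaultdict

# Matches "token/(TAG)". \S+? accepts any non-whitespace run, so "-ed"
# and "," are captured as well as ordinary words.
# Assumption: tags consist of uppercase letters only, e.g. (NN), (SY).
TOKEN_RE = re.compile(r'(\S+?)/(\([A-Z]+\))')


def count_by_category(lines):
    """Return {category: Counter({token: count})} for all token/(TAG) pairs."""
    counts = defaultdict(Counter)
    for line in lines:
        # findall returns one (token, category) tuple per match on the line
        for token, category in TOKEN_RE.findall(line):
            counts[category][token] += 1
    return counts


def format_report(counts):
    """Render each category followed by its tokens, most frequent first."""
    out = []
    for category, counter in counts.items():
        out.append(category)
        for token, n in counter.most_common():
            out.append('%s %d' % (token, n))
    return '\n'.join(out)
```

To produce the result file, you could pass the open input file to `count_by_category` (a file object iterates over its lines) and write the string returned by `format_report` to `result.txt`. Header lines such as `ROK :` contain no `/(` and are simply skipped by the pattern.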