I’m trying to generate unigrams from a text file, but only the unigrams for the first line of the file are shown. I want to show the unigrams for all the sentences in the file.
import string;
import sys;
import tokenize;
f = open("data.txt", 'r');
line=f.readline();
while line:
    line = line.rstrip();
    list = line.split();
    for word in list:
        print word
        line = f.readline();
Why is it not showing the unigrams for all the sentences, and how can I turn this into a bigram generator?
Thanks in advance.
data.txt is the text file which contains the sentences.
It has two sentences:
Hello world this is a test code
today is 29th november 2011
I’m getting the output:
Hello
world
this
is
a
test
code
There are some obvious problems with that code snippet.
Semicolons (;) are not required at the end of a statement.
None of the imported modules (string, sys, tokenize) are used. This is valid, but pointless.
You do not show the structure of the text file, but I’m assuming each sentence is on a separate line (i.e. a text file with two sentences will contain two lines).
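As a rough sketch of the cleaned-up loop (Python 3, assuming one sentence per line): iterating over the lines directly avoids the manual readline calls. If the last readline call in the question sits inside the for loop, every printed word consumes another line of the file, which would explain output that stops after the first sentence. The names unigrams and sample are only illustrative, and the sample list stands in for the contents of data.txt.

```python
def unigrams(lines):
    """Yield every whitespace-separated word from an iterable of lines."""
    for line in lines:
        # split() with no arguments also discards the trailing newline
        for word in line.split():
            yield word


# Stand-in for the two lines of data.txt shown in the question
sample = [
    "Hello world this is a test code\n",
    "today is 29th november 2011\n",
]

for word in unigrams(sample):
    print(word)
```

With the real file, the same function can be fed the open file object, e.g. inside `with open("data.txt") as f:` call `unigrams(f)`, since a file object yields its lines one at a time.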
I’m unsure exactly what a bigram is in this case, so you may need to replace the bigram function.
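One possible bigram function, as a minimal sketch in Python 3. It assumes a bigram means a pair of adjacent words within a single line, so pairs never span two sentences; if you need a different definition, this is the part to swap out. The names bigrams and sample are illustrative, and sample again stands in for data.txt.

```python
def bigrams(line):
    """Return the adjacent word pairs of one line as a list of tuples.

    Assumption: a bigram is two consecutive whitespace-separated words.
    """
    words = line.split()
    # Pair each word with its successor; zip stops at the shorter input,
    # so a line with fewer than two words gives an empty list.
    return list(zip(words, words[1:]))


# Stand-in for the two lines of data.txt shown in the question
sample = [
    "Hello world this is a test code\n",
    "today is 29th november 2011\n",
]

for line in sample:
    for first, second in bigrams(line):
        print(first, second)
```

Working line by line keeps the last word of one sentence from being paired with the first word of the next, which is usually what is wanted when each line is a separate sentence.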