I’m trying to generate unigrams from a text file, but only the bigram for

Question


Asked: May 27, 2026


I’m trying to generate unigrams from a text file, but only the unigrams for the first line of the file are shown. I want to show the unigrams for all the sentences in the file.

import string;
import sys;
import tokenize;

f = open("data.txt", 'r');
line=f.readline();
while line:
    line = line.rstrip();
    list = line.split();
    for word in list:
         print word
    line = f.readline();

Why is it not showing the unigrams for all the sentences, and how can I turn this into a bigram generator?

Thanks in advance.

data.txt is the text file that contains the sentences.
It has two sentences:

        Hello world this is a test code
        today is 29th november 2011

I’m getting this output:

    Hello
    world
    this
    is
    a
    test
    code


1 Answer

Editorial Team · Answer 1 · 2026-05-27T05:35:31+00:00

There are some obvious problems with that code snippet.

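- Semicolons (;) are not required at the end of Python statements.
- None of the imported modules (string, sys, tokenize) are used. This is valid, but pointless.
- The loop over the file lines uses while with readline(), which works but is unusual; iterating directly over the file object (for line in f) is the idiomatic form.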
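A minimal sketch of the question’s unigram loop with those three points addressed, written for Python 3. The generator name unigrams_per_line and the in-memory sample list (standing in for data.txt) are illustrative assumptions, not part of the original code.

```python
def unigrams_per_line(lines):
    """Yield the list of words (unigrams) for each line of text."""
    for line in lines:
        # split() with no arguments splits on any whitespace and drops the
        # trailing newline, so a separate rstrip() call is not needed.
        yield line.split()


# Hypothetical in-memory stand-in for data.txt. With the real file, use:
#     with open("data.txt") as f:
#         for words in unigrams_per_line(f): ...
sample = ["Hello world this is a test code\n", "today is 29th november 2011\n"]

for words in unigrams_per_line(sample):
    for word in words:
        print(word)
```

Because iterating over an open file object yields one line at a time, passing the file instead of the sample list should process every line of data.txt.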

You do not show the structure of the text file, but I’m assuming each sentence is on a separate line (i.e. a text file with two sentences will contain two lines).

I’m unsure exactly what a bigram means in your case. Here it is taken to be a pair of adjacent words on the same line; if you need something else, replace the bigrams function.
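To make that adjacent-words definition concrete: since split() already returns a list, the pairs are simply the list zipped with itself shifted by one position. A sketch in Python 3; bigrams_from_list is a hypothetical name. On Python 3.10 and later, itertools.pairwise(words) yields the same pairs.

```python
def bigrams_from_list(words):
    # Pair each word with the one that follows it: (w0, w1), (w1, w2), ...
    # zip stops at the shorter argument, so the last word is not paired
    # with anything, and a list of fewer than two words gives no bigrams.
    return list(zip(words, words[1:]))


print(bigrams_from_list("today is 29th november 2011".split()))
# [('today', 'is'), ('is', '29th'), ('29th', 'november'), ('november', '2011')]
```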

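from itertools import tee

def bigrams(iterable):
    # Pair each item with the next one: (w0, w1), (w1, w2), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one item
    # zip is lazy in Python 3; in Python 2, itertools.izip is the lazy version
    return zip(a, b)

with open("data.txt", 'r') as f:
    for line in f:
        words = line.strip().split()
        uni = words
        bi = bigrams(words)
        print(uni)
        print(list(bi))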
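If you later need trigrams or longer n-grams, the same tee trick generalizes: make n copies of the iterator and advance the i-th copy by i items before zipping them. A sketch for Python 3; the ngrams name and its signature are hypothetical, not from any library.

```python
from itertools import tee


def ngrams(iterable, n):
    """Return an iterator of n-tuples of consecutive items from iterable."""
    # Make n independent copies of the input iterator.
    copies = tee(iterable, n)
    # Advance the i-th copy by i items so the copies are staggered:
    # copy 0 starts at item 0, copy 1 at item 1, and so on.
    for i, it in enumerate(copies):
        for _ in range(i):
            next(it, None)
    # zip stops at the shortest copy, so every tuple has exactly n items.
    return zip(*copies)


words = "Hello world this is a test code".split()
print(list(ngrams(words, 1)))  # unigrams, as 1-tuples
print(list(ngrams(words, 2)))  # the same pairs as bigrams(words) above
print(list(ngrams(words, 3)))  # trigrams
```

With n=2 this is equivalent to the bigrams function above, and with n=1 it just wraps each word in a 1-tuple.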
