Creating a basic ngram implementation in Python as a personal challenge. Started with unigrams

Question


Asked: June 18, 2026

I'm creating a basic n-gram implementation in Python as a personal challenge. I started with unigrams and worked up to trigrams:

def unigrams(text):
    uni = []
    for token in text:
        uni.append([token])
    return uni

def bigrams(text):
    bi = []
    token_address = 0
    for token in text[:len(text) - 1]:
        bi.append([token, text[token_address + 1]])
        token_address += 1
    return bi

def trigrams(text):
    tri = []
    token_address = 0
    for token in text[:len(text) - 2]:
        tri.append([token, text[token_address + 1], text[token_address + 2]])
        token_address += 1
    return tri

Now the fun part: generalizing to n-grams. The main problem with generalizing my approach is building the list of length n that gets passed to append. I initially thought lambdas might be a way to do it, but I can’t figure out how.

Also, other implementations I’m looking at are taking an entirely different tack (no surprise), e.g. here and here, so I’m starting to wonder if I’m at a dead end.

Before I give up on this approach, I’m curious: 1) is there a one-line or Pythonic way to create a list of arbitrary size like this? 2) what are the downsides of approaching the problem this way?


1 Answer

Editorial Team · Answer 1 · 2026-06-18T03:12:08+00:00


The following function should work for a general n-gram model.

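def ngram(text, grams):
    # model will hold one slice of length `grams` per starting position
    # (lists if text is a list of tokens, substrings if text is a string)
    model = []
    # range() is empty when grams > len(text), so no short n-grams are produced
    for start in range(len(text) - grams + 1):
        model.append(text[start:start + grams])
    return model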
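
On the first question: the same idea can be written as a single expression. The sketch below is one possible way to do it, not the only one. The names ngrams_slices and ngrams_zip are made up for illustration, and it assumes the tokens are held in a list (or another sequence, not a one-shot generator) and that n is at least 1. It returns tuples rather than lists so the n-grams can be counted directly.

```python
from collections import Counter
from itertools import islice


def ngrams_slices(tokens, n):
    """Return every run of n consecutive tokens as a tuple (tokens must support slicing)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngrams_zip(tokens, n):
    """Same result for n >= 1, built by zipping n staggered views of the tokens."""
    # islice(tokens, i, None) skips the first i tokens; zip stops at the shortest view.
    return list(zip(*(islice(tokens, i, None) for i in range(n))))


tokens = "the cat sat on the cat".split()
print(ngrams_slices(tokens, 2))
# Tuples are hashable, so Counter can tally them; lists would raise TypeError.
print(Counter(ngrams_zip(tokens, 2)).most_common(1))  # [(('the', 'cat'), 2)]
```

On the second question, the hand-written unigram/bigram/trigram approach has a few likely downsides: each new n needs its own function, the manual index counter is easy to get off by one, and n-grams stored as lists cannot be used as dictionary keys or passed to Counter without converting them to tuples first.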
