The Archive Base

Editorial Team
Asked: June 13, 2026


Quick question that is confusing me. I have NLTK installed and it has been working fine. However, I am trying to get the bigrams of a corpus and basically want to call bigrams(corpus), but it says that bigrams is not defined when I use “from nltk import bigrams”.

The same happens with trigrams. Am I missing something? Also, how could I get bigrams from the corpus manually?

I am also looking to calculate the frequencies of bigrams, trigrams, and quads (4-grams), but I am unsure exactly how to go about this.

I have the corpus tokenized with "<s>" and "</s>" at the beginning and end of each sentence. My program so far:

#!/usr/bin/env python
from nltk.corpus import brown

def alter_list(row):
    # Wrap a tokenized sentence in <s> ... </s>; a final '.' is replaced by </s>.
    row = list(row)  # work on a copy so the corpus data is not modified
    if row and row[-1] == '.':
        row[-1] = '</s>'
    else:
        row.append('</s>')
    return ['<s>'] + row

# Needs the Brown corpus data, e.g. via nltk.download('brown')
news = brown.sents(categories='editorial')
print(len(news), '\n')

for row in news:
    print(alter_list(row))
1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 1:10 pm

    I tested this in a virtualenv and it works:

    In [20]: from nltk import bigrams
    
    In [21]: bigrams('This is a test')
    Out[21]: 
    [('T', 'h'),
     ('h', 'i'),
     ('i', 's'),
     ('s', ' '),
     (' ', 'i'),
     ('i', 's'),
     ('s', ' '),
     (' ', 'a'),
     ('a', ' '),
     (' ', 't'),
     ('t', 'e'),
     ('e', 's'),
     ('s', 't')]
    

    Is that the only error you’re getting?

    By the way, as for your second question:

    from collections import Counter
    In [44]: b = bigrams('This is a test')
    
    In [45]: Counter(b)
    Out[45]: Counter({('i', 's'): 2, ('s', ' '): 2, ('a', ' '): 1, (' ', 't'): 1, ('e', 's'): 1, ('h', 'i'): 1, ('t', 'e'): 1, ('T', 'h'): 1, (' ', 'i'): 1, (' ', 'a'): 1, ('s', 't'): 1})
    

    For words:

    In [49]: b = bigrams("This is a test".split(' '))
    
    In [50]: b
    Out[50]: [('This', 'is'), ('is', 'a'), ('a', 'test')]
    
    In [51]: Counter(b)
    Out[51]: Counter({('is', 'a'): 1, ('a', 'test'): 1, ('This', 'is'): 1})
    

    Splitting on spaces like this is obviously very superficial, but depending on your application it may suffice. You could also use NLTK’s tokenize module (for example nltk.word_tokenize), which is far more sophisticated.
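As for getting bigrams from the corpus manually, here is a minimal sketch that does not need NLTK at all. The helper name ngrams_manual and the sample sentence are made up for illustration; the idea is just to zip the token list against shifted copies of itself.

```python
def ngrams_manual(tokens, n):
    """Return the n-grams of a token list as a list of tuples."""
    # zip over n staggered views of the list: tokens[0:], tokens[1:], ...
    # zip stops at the shortest view, so no partial n-grams are produced.
    return list(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sentence, already padded with <s> and </s> as in the question.
sentence = ['<s>', 'This', 'is', 'a', 'test', '</s>']

bigrams_manual = ngrams_manual(sentence, 2)
trigrams_manual = ngrams_manual(sentence, 3)

print(bigrams_manual)
print(trigrams_manual)
```

The same function should work on each row returned by alter_list in the question, since those rows are plain lists of tokens.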

    To accomplish your final goal, you can do something like this:

    In [56]: d = "Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."
    
    In [56]: from nltk import trigrams
    In [57]: tri = trigrams(d.split(' '))
    
    In [60]: counter = Counter(tri)
    
    In [61]: import random
    
    In [62]: random.sample(counter, 5)
    Out[62]: 
    [('Ipsum', 'has', 'been'),
     ('industry.', 'Lorem', 'Ipsum'),
     ('Ipsum', 'passages,', 'and'),
     ('was', 'popularised', 'in'),
     ('galley', 'of', 'type')]
    

    I trimmed the output because it was unnecessarily large, but you get the idea.
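To get frequencies for bigrams, trigrams, and quads over a whole list of padded sentences, one option is a Counter per n. This is only a sketch under assumptions: ngrams_manual and ngram_counts are hypothetical helpers, and the two toy sentences stand in for the rows of your corpus. Counting sentence by sentence keeps n-grams from spanning a sentence boundary.

```python
import random
from collections import Counter

def ngrams_manual(tokens, n):
    # n staggered views of the token list, zipped into n-tuples
    return zip(*(tokens[i:] for i in range(n)))

def ngram_counts(sentences, n):
    """Count n-grams over a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        counts.update(ngrams_manual(tokens, n))
    return counts

# Toy data standing in for the padded corpus sentences (an assumption).
sentences = [
    ['<s>', 'the', 'cat', 'sat', '</s>'],
    ['<s>', 'the', 'cat', 'ran', '</s>'],
]

# One Counter each for bigrams (2), trigrams (3), and quads (4).
freqs = {n: ngram_counts(sentences, n) for n in (2, 3, 4)}

print(freqs[2].most_common(2))

# On recent Python versions random.sample wants a sequence,
# so convert the Counter's keys to a list first.
sample = random.sample(list(freqs[2]), 3)
print(sample)
```

Counter.most_common(k) is usually more useful than a random sample when you want to inspect the most frequent n-grams.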

© 2021 The Archive Base. All Rights Reserved