The Archive Base

Editorial Team
Asked: May 26, 2026

Remark: I know there are many similar questions on SO, but none specific to the C language, hence why I am asking this.

Here’s the problem I am facing: I will be provided a large text (e.g., 150,000 words) and after that a series of phrases (each phrase has from 1 up to 10 words). For each of those phrases I need to find the word that immediately follows the phrase in the text and return it.

My only idea to solve it so far: create a struct that holds:

  • the current word
  • the 3 words that preceded that word
  • the word that follows

Then I would parse the text, creating one struct for each word, and store all those structs in a hash table. As each phrase comes along, I would search the hash table for the last word of that phrase, check whether the previous 3 words match, and then return the next word. I believe going back 3 words would be enough to uniquely identify phrases, but I could increase that number.
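To make the idea concrete, here is a minimal sketch of the record and the matching step. It assumes the text has already been split into an array of words; the names `WordCtx`, `build_contexts`, `ctx_matches` and `next_word` are made up for illustration, and a linear scan stands in for the hash table lookup on the last word.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CTX_PREV 3  /* how many preceding words each record keeps */

/* One record per word position in the text (hypothetical layout). */
typedef struct {
    const char *word;            /* the current word */
    const char *prev[CTX_PREV];  /* prev[0] is the word just before; NULL if none */
    const char *next;            /* the word that follows; NULL at end of text */
} WordCtx;

/* Fill recs[0..n-1] from an already tokenized text. */
static void build_contexts(const char *const *words, size_t n, WordCtx *recs)
{
    for (size_t i = 0; i < n; i++) {
        recs[i].word = words[i];
        for (size_t k = 0; k < CTX_PREV; k++)
            recs[i].prev[k] = (i > k) ? words[i - 1 - k] : NULL;
        recs[i].next = (i + 1 < n) ? words[i + 1] : NULL;
    }
}

/* Does this record end the given phrase? Only the last word and up to
 * CTX_PREV words before it are compared, as described in the question. */
static int ctx_matches(const WordCtx *rec, const char *const *phrase, size_t len)
{
    assert(len > 0);
    if (strcmp(rec->word, phrase[len - 1]) != 0)
        return 0;
    for (size_t k = 0; k < CTX_PREV && k + 1 < len; k++) {
        const char *want = phrase[len - 2 - k];
        if (rec->prev[k] == NULL || strcmp(rec->prev[k], want) != 0)
            return 0;
    }
    return 1;
}

/* Return the word after the first matching record, or NULL if none.
 * The linear scan stands in for a hash table lookup on the last word. */
static const char *next_word(const WordCtx *recs, size_t n,
                             const char *const *phrase, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (ctx_matches(&recs[i], phrase, len))
            return recs[i].next;
    return NULL;
}
```

One thing the sketch makes visible: for phrases longer than four words, only the tail of the phrase is checked, so a different occurrence that shares the same last four words would also match.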

Do you think this would work? Do you know a better way?

1 Answer

Editorial Team
Added an answer on May 26, 2026 at 2:57 pm

    A much easier approach: run through the text and store every n-gram (a subsequence of n consecutive words) for 1 <= n <= 10 in a hash table or trie, mapping each n-gram to the word that follows it. Retrieval is then trivial: look up the phrase in the hash table or trie and return the stored word.

    In the hash table version, you’d store each n-gram as a key built by concatenating its words with a single normalized space between them.
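A minimal sketch of the hash table version follows. Everything named here (`NgramTable`, `ngram_build`, `ngram_find`, the bucket count, the FNV-1a hash, separate chaining) is an assumption chosen for illustration, not something the answer prescribes. It also assumes the text is already tokenized, that the caller normalizes each phrase to single spaces before lookup, and that the first occurrence of a repeated n-gram wins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_N 10        /* longest phrase length, from the question */
#define NBUCKETS 4096   /* illustrative; size it to the text in practice */

typedef struct Entry {
    char *key;            /* n-gram: words joined by single spaces */
    const char *next;     /* word that follows the n-gram in the text */
    struct Entry *chain;  /* next entry in the same bucket */
} Entry;

typedef struct { Entry *buckets[NBUCKETS]; } NgramTable;  /* zero-initialize */

/* 32-bit FNV-1a string hash. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Return the word stored for key, or NULL if the n-gram is unknown. */
static const char *ngram_find(const NgramTable *t, const char *key)
{
    for (const Entry *e = t->buckets[hash_str(key) % NBUCKETS]; e; e = e->chain)
        if (strcmp(e->key, key) == 0)
            return e->next;
    return NULL;
}

/* Insert key -> next unless the key is already present. */
static void ngram_put(NgramTable *t, const char *key, const char *next)
{
    if (ngram_find(t, key)) return;
    Entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->key = malloc(strlen(key) + 1);
    assert(e->key != NULL);
    strcpy(e->key, key);
    e->next = next;
    size_t b = hash_str(key) % NBUCKETS;
    e->chain = t->buckets[b];
    t->buckets[b] = e;
}

/* Index every n-gram (1 <= n <= MAX_N) that has a following word. */
static void ngram_build(NgramTable *t, const char *const *words, size_t count)
{
    for (size_t i = 0; i + 1 < count; i++) {
        size_t cap = 0;
        for (size_t n = 0; n < MAX_N && i + n < count; n++)
            cap += strlen(words[i + n]) + 1;  /* word plus space or NUL */
        char *key = malloc(cap);
        assert(key != NULL);
        size_t len = 0;
        /* Grow the key one word at a time, storing each prefix as an n-gram. */
        for (size_t n = 0; n < MAX_N && i + n + 1 < count; n++) {
            if (n > 0) key[len++] = ' ';
            size_t wl = strlen(words[i + n]);
            memcpy(key + len, words[i + n], wl);
            len += wl;
            key[len] = '\0';
            ngram_put(t, key, words[i + n + 1]);
        }
        free(key);
    }
}

/* Release all entries; the word strings themselves belong to the caller. */
static void ngram_free(NgramTable *t)
{
    for (size_t b = 0; b < NBUCKETS; b++)
        while (t->buckets[b]) {
            Entry *e = t->buckets[b];
            t->buckets[b] = e->chain;
            free(e->key);
            free(e);
        }
}
```

After `ngram_build`, a query is a single call such as `ngram_find(&table, "quick brown")`, which returns the following word or `NULL` when the phrase does not occur.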

    The drawback of the hash table version is memory: you’ll need up to about 10 * N entries, where N is the number of words in the text (one entry per starting position for each n from 1 to 10), and the keys grow to 10 words each. Lookup should be very fast, though, and 150,000 words is a small enough dataset to make this work.
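As a rough check on the size, the n-gram occurrences can be counted directly. Assuming N >= 10 and that an n-gram of length n can start at any of the N - n + 1 positions where it still fits in the text:

$$
\sum_{n=1}^{10} (N - n + 1) = 10N - \sum_{n=1}^{10} (n - 1) = 10N - 45
$$

For N = 150,000 this gives 1,499,955 occurrences. Repeated n-grams share a single entry, so the number of distinct keys in the table can be lower.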
