Remark : I know there are many similar questions on SO, but none specific

Question

Asked: May 26, 2026 (2026-05-26T14:57:19+00:00)

Remark: I know there are many similar questions on SO, but none specific to the C language, which is why I am asking this.

Here’s the problem I am facing: I will be given a large text (e.g., 150,000 words), followed by a series of phrases (each phrase is 1 to 10 words long). For each of those phrases, I need to find the word that immediately follows the phrase in the text and return it.

My only idea to solve it so far: create a struct that holds:

the current word
the 3 words that preceded that word
the word that follows

Then I would parse the text, creating one struct for each word, and store all those structs in a hash table. As each phrase comes along, I would search the hash table for the last word of that phrase, check whether the previous 3 words match, and then return the next word. I believe going back 3 words would be enough to uniquely identify phrases, but I could increase that number.
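For reference, the record described above might look roughly like this in C. It is only a sketch under assumptions: the names `word_record`, `build_records`, `record_matches` and `find_next` are made up for illustration, the text is assumed to be already split into an array of words, and a plain linear scan stands in for the hash table lookup on the last word.

```c
#include <stddef.h>
#include <string.h>

#define CONTEXT 3  /* how many preceding words each record remembers */

/* One record per word of the text; all pointers refer into the word array. */
struct word_record {
    const char *word;           /* the current word */
    const char *prev[CONTEXT];  /* prev[0] is the nearest predecessor, NULL if none */
    const char *next;           /* the word that follows, NULL at the end of the text */
};

/* Fill rec[0..n-1] from the n words of the text. */
static void build_records(const char *const *words, size_t n,
                          struct word_record *rec)
{
    for (size_t i = 0; i < n; i++) {
        rec[i].word = words[i];
        for (size_t k = 0; k < CONTEXT; k++)
            rec[i].prev[k] = (i > k) ? words[i - 1 - k] : NULL;
        rec[i].next = (i + 1 < n) ? words[i + 1] : NULL;
    }
}

/* Does this record end the phrase? Only the last 1 + CONTEXT phrase words
   are compared, so longer phrases are checked on their tail only. */
static int record_matches(const struct word_record *rec,
                          const char *const *phrase, size_t len)
{
    if (len == 0 || strcmp(rec->word, phrase[len - 1]) != 0)
        return 0;
    for (size_t k = 0; k < CONTEXT && k + 2 <= len; k++) {
        if (rec->prev[k] == NULL ||
            strcmp(rec->prev[k], phrase[len - 2 - k]) != 0)
            return 0;
    }
    return 1;
}

/* Linear scan standing in for the hash table lookup (an assumption of this
   sketch). Returns the follower of the first matching record, or NULL. */
static const char *find_next(const struct word_record *rec, size_t n,
                             const char *const *phrase, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (record_matches(&rec[i], phrase, len))
            return rec[i].next;
    return NULL;
}
```

One consequence is visible in `record_matches`: a phrase longer than four words is only verified on its last four words, so a match is not guaranteed to be exact unless `CONTEXT` is raised to 9.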

Do you think this would work? Do you know a better way?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T14:57:20+00:00

A much easier approach: run through the text, storing all n-grams (subsequences of n consecutive words) for 1 <= n <= 10 in a hash table or trie. Retrieval is then trivial: just look up the n-gram in the hash table or trie.
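For the trie variant, a minimal sketch could look like the following. The assumptions are mine: the text is already tokenized into an array of words, every name here (`trie_node`, `trie_add`, `trie_index`, `trie_next`) is hypothetical, runs of up to 11 words are inserted so that each 10-word phrase still has a follower stored beneath it, and when a phrase occurs several times the follower of its first occurrence is returned.

```c
#include <stdlib.h>
#include <string.h>

#define TRIE_DEPTH 11  /* 10 phrase words plus the word that follows */

struct trie_node {
    const char *word;           /* label of this node, points into the word array */
    struct trie_node *child;    /* first child (first follower seen in the text) */
    struct trie_node *sibling;  /* next node with the same parent */
};

/* Find the child of parent labelled word, or NULL if there is none. */
static struct trie_node *trie_child(const struct trie_node *parent,
                                    const char *word)
{
    for (struct trie_node *c = parent->child; c; c = c->sibling)
        if (strcmp(c->word, word) == 0)
            return c;
    return NULL;
}

/* Find the child labelled word, appending it if missing (so children keep
   first-seen order). Returns NULL on allocation failure. */
static struct trie_node *trie_add(struct trie_node *parent, const char *word)
{
    struct trie_node **slot = &parent->child;
    for (; *slot; slot = &(*slot)->sibling)
        if (strcmp((*slot)->word, word) == 0)
            return *slot;
    struct trie_node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    n->word = word;
    n->child = NULL;
    n->sibling = NULL;
    *slot = n;
    return n;
}

/* Insert the run of up to TRIE_DEPTH words starting at every position.
   Returns 0 on success, -1 on allocation failure. */
static int trie_index(struct trie_node *root, const char *const *words,
                      size_t count)
{
    for (size_t i = 0; i < count; i++) {
        struct trie_node *node = root;
        for (size_t d = 0; d < TRIE_DEPTH && i + d < count; d++) {
            node = trie_add(node, words[i + d]);
            if (!node)
                return -1;
        }
    }
    return 0;
}

/* Walk the phrase down from the root and return the first word recorded
   after it, or NULL if the phrase is absent or has no follower. */
static const char *trie_next(const struct trie_node *root,
                             const char *const *phrase, size_t len)
{
    const struct trie_node *node = root;
    for (size_t k = 0; k < len && node; k++)
        node = trie_child(node, phrase[k]);
    return (node && node->child) ? node->child->word : NULL;
}
```

Children are kept in a linked list here purely to keep the sketch short; for a large vocabulary a per-node hash or sorted array would make each step faster.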

In the hash table version, you’d just store the n-grams as concatenations of word strings with normalized space in between.
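A rough sketch of that hash table version follows. Several things are assumptions for illustration rather than part of the approach itself: the text is already split into words, each key maps to the word that follows the n-gram, the first occurrence wins when an n-gram repeats, the bucket count is fixed and arbitrary, and the names (`ngram_entry`, `ngram_lookup`, `ngram_insert`, `ngram_index`) are invented. The hash is the common djb2 string hash.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_N 10       /* longest phrase length, taken from the question */
#define NBUCKETS 4096  /* arbitrary; a real table would scale with the text */

struct ngram_entry {
    char *key;                  /* n-gram: words joined by single spaces */
    const char *next;           /* word that follows this n-gram in the text */
    struct ngram_entry *chain;  /* next entry in the same bucket */
};

static struct ngram_entry *ngram_table[NBUCKETS];

/* djb2 string hash */
static unsigned long ngram_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the stored follower for key, or NULL if the key is absent. */
static const char *ngram_lookup(const char *key)
{
    struct ngram_entry *e = ngram_table[ngram_hash(key) % NBUCKETS];
    for (; e; e = e->chain)
        if (strcmp(e->key, key) == 0)
            return e->next;
    return NULL;
}

/* Store key -> next unless key already exists. Returns 0 on success,
   -1 on allocation failure. */
static int ngram_insert(const char *key, const char *next)
{
    if (ngram_lookup(key))
        return 0;
    struct ngram_entry *e = malloc(sizeof *e);
    if (!e)
        return -1;
    e->key = malloc(strlen(key) + 1);
    if (!e->key) {
        free(e);
        return -1;
    }
    strcpy(e->key, key);
    e->next = next;
    unsigned long b = ngram_hash(key) % NBUCKETS;
    e->chain = ngram_table[b];
    ngram_table[b] = e;
    return 0;
}

/* Index every n-gram (1 <= n <= MAX_N) that has a following word. */
static int ngram_index(const char *const *words, size_t count)
{
    size_t cap = 0;
    for (size_t i = 0; i < count; i++)
        cap += strlen(words[i]) + 1;  /* generous upper bound on any key */
    char *buf = malloc(cap + 1);
    if (!buf)
        return -1;
    for (size_t i = 0; i + 1 < count; i++) {
        size_t len = 0;
        for (size_t n = 1; n <= MAX_N && i + n < count; n++) {
            /* grow the key by one word so it spans words[i..i+n-1] */
            if (n > 1)
                buf[len++] = ' ';
            size_t wl = strlen(words[i + n - 1]);
            memcpy(buf + len, words[i + n - 1], wl + 1);
            len += wl;
            if (ngram_insert(buf, words[i + n]) != 0) {
                free(buf);
                return -1;
            }
        }
    }
    free(buf);
    return 0;
}
```

An incoming phrase would need the same normalization (words separated by exactly one space) before being passed to `ngram_lookup`.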

The problem with this approach is memory: with a hash table, you’ll need up to 10 * N entries (one per phrase length at each starting position), where N is the number of words in the text, and the keys repeat each word many times over. Lookup should be very fast, though, and 150,000 words is a small enough dataset to make this work.
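To put a number on the entry count: a text of N words contains N - n + 1 n-grams of length n, so lengths 1 through 10 give 10 * N - 45 in total before duplicates are merged. The helper below (a hypothetical name, written only to check the arithmetic) computes this directly.

```c
#include <stddef.h>

/* Number of n-grams with 1 <= n <= max_n in a text of n_words words,
   counting one n-gram per starting position (duplicates included). */
static size_t ngram_count(size_t n_words, size_t max_n)
{
    size_t total = 0;
    for (size_t n = 1; n <= max_n && n <= n_words; n++)
        total += n_words - n + 1;
    return total;
}
```

For the 150,000-word example, `ngram_count(150000, 10)` comes to 1,499,955, so roughly 1.5 million entries in the worst case.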
