I have a text with about 300 – 500 words. Also I got about

Asked: May 22, 2026

I have a text with about 300 – 500 words. I also have about 200k keywords, and I want to know which of the keywords are contained in the text. Calling String.contains once per keyword is quite slow. Is there some way to preprocess the string?

I thought about using a suffix tree, but I'm not sure it is the best choice.

Also, are there any good libraries for this task? semanticdiscoverytoolkit, for example, has a suffix tree implementation, but after adding the string I can't figure out how to look up whether a string is contained in the tree.

Greetings,

Nico
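On the lookup question: with any suffix structure, a keyword occurs in the text exactly when it is a prefix of some suffix of the text. The sketch below shows that lookup using a suffix array (a sorted array of suffix start offsets), which is a simpler relative of the suffix tree. It is not the semanticdiscoverytoolkit API, and the class and method names are hypothetical. The text is preprocessed once, and each keyword then costs a binary search of about log2(n) prefix comparisons.

```java
import java.util.Arrays;

public class SuffixArrayIndex {
    private final String text;
    private final Integer[] suffixes; // start offsets of all suffixes, sorted lexicographically

    public SuffixArrayIndex(String text) {
        this.text = text;
        int n = text.length();
        suffixes = new Integer[n];
        for (int i = 0; i < n; i++) {
            suffixes[i] = i;
        }
        // Naive build (worst case O(n^2 log n) char comparisons); fine for a few thousand chars.
        Arrays.sort(suffixes, (a, b) -> compareSuffixes(a, b));
    }

    // Lexicographic comparison of the suffixes starting at a and b.
    private int compareSuffixes(int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int c = Character.compare(text.charAt(a), text.charAt(b));
            if (c != 0) {
                return c;
            }
            a++;
            b++;
        }
        // One suffix is a prefix of the other: the shorter one sorts first.
        return Integer.compare(n - a, n - b);
    }

    // Compares the suffix at 'start' with keyword over at most keyword.length() chars;
    // returns 0 when keyword is a prefix of that suffix.
    private int compareSuffixToKeyword(int start, String keyword) {
        int n = text.length();
        for (int i = 0; i < keyword.length(); i++) {
            if (start + i >= n) {
                return -1; // suffix ran out first, so it sorts before keyword
            }
            int c = Character.compare(text.charAt(start + i), keyword.charAt(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /** True if keyword occurs somewhere in the text. */
    public boolean contains(String keyword) {
        // Binary search for the first suffix that is not smaller than keyword.
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareSuffixToKeyword(suffixes[mid], keyword) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // If any suffix starts with keyword, the first such suffix is at position lo.
        return lo < suffixes.length && text.startsWith(keyword, suffixes[lo]);
    }

    public static void main(String[] args) {
        SuffixArrayIndex index = new SuffixArrayIndex("the quick brown fox");
        System.out.println(index.contains("brown")); // true
        System.out.println(index.contains("own f")); // true
        System.out.println(index.contains("dog"));   // false
    }
}
```

For 200k keywords this means one index build for the text plus 200k binary searches, instead of 200k linear scans of the text.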

1 Answer

Editorial Team · Answer 1 · 2026-05-22T12:12:06+00:00

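You can try the Rabin-Karp string search algorithm. Most positions in the text are rejected by comparing two hashes (a single integer comparison) instead of comparing the keyword character by character, so it is usually much faster than plain string comparisons. A full string comparison is only needed when the hashes match.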

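1. Compute the hash of the keyword.
2. Compute the hash of the text window at the current position, with the same length as the keyword. After the first window, update it with a rolling hash in constant time instead of rehashing the whole window.
3. Compare these two hashes. If they match, perform the actual string comparison.
4. Advance the position by one character and repeat from step 2 until you reach the end of the text.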
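A minimal Java sketch of these four steps for a single keyword follows. The base (256) and modulus (1,000,000,007) are arbitrary illustrative choices, not values from this answer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RabinKarp {
    // Illustrative hash parameters (assumptions, not part of the answer).
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    /** Returns the start index of every occurrence of pattern in text. */
    public static List<Integer> search(String text, String pattern) {
        List<Integer> matches = new ArrayList<>();
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return matches;
        }

        // BASE^(m-1) mod MOD: weight of the character leaving the window.
        long highPow = 1;
        for (int i = 0; i < m - 1; i++) {
            highPow = (highPow * BASE) % MOD;
        }

        // Step 1: hash of the keyword. Step 2: hash of the first text window.
        long patternHash = 0;
        long windowHash = 0;
        for (int i = 0; i < m; i++) {
            patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
            windowHash = (windowHash * BASE + text.charAt(i)) % MOD;
        }

        for (int start = 0; ; start++) {
            // Step 3: cheap integer check first; full comparison only on a hash hit.
            if (windowHash == patternHash && text.regionMatches(start, pattern, 0, m)) {
                matches.add(start);
            }
            if (start + m >= n) {
                break; // last window checked
            }
            // Step 4: roll the window one character right (drop leftmost, add next).
            windowHash = (windowHash - text.charAt(start) * highPow % MOD + MOD) % MOD;
            windowHash = (windowHash * BASE + text.charAt(start + m)) % MOD;
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(search("abracadabra", "abra")); // [0, 7]
    }
}
```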

As an analogy, the rolling hash is like a “sliding window” that scrolls along the text: at each position, the hash of the substring inside the window is compared against the hash of the keyword.
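With 200k keywords, running the single-keyword version once per keyword would still scan the text 200k times. One common way to apply the same sliding-window idea to many keywords, sketched here as an assumption rather than as part of the answer, is to group the keywords by length and store their hashes in a map. A single pass of the window per distinct keyword length then checks every keyword of that length at once. Names and hash parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MultiKeywordRabinKarp {
    private static final long BASE = 256;             // illustrative choice
    private static final long MOD = 1_000_000_007L;   // illustrative choice

    // Polynomial hash of s[from, from + len).
    private static long hash(String s, int from, int len) {
        long h = 0;
        for (int i = from; i < from + len; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    /** Returns the keywords that occur somewhere in text. */
    public static Set<String> findContained(String text, Iterable<String> keywords) {
        // length -> (keyword hash -> keywords of that length with that hash)
        Map<Integer, Map<Long, List<String>>> byLength = new HashMap<>();
        for (String kw : keywords) {
            if (kw.isEmpty() || kw.length() > text.length()) {
                continue; // cannot occur in the text
            }
            byLength.computeIfAbsent(kw.length(), k -> new HashMap<>())
                    .computeIfAbsent(hash(kw, 0, kw.length()), h -> new ArrayList<>())
                    .add(kw);
        }

        Set<String> found = new HashSet<>();
        int n = text.length();
        // One sliding-window pass over the text per distinct keyword length.
        for (Map.Entry<Integer, Map<Long, List<String>>> entry : byLength.entrySet()) {
            int m = entry.getKey();
            Map<Long, List<String>> table = entry.getValue();
            long highPow = 1;
            for (int i = 0; i < m - 1; i++) {
                highPow = (highPow * BASE) % MOD;
            }
            long windowHash = hash(text, 0, m);
            for (int start = 0; ; start++) {
                List<String> candidates = table.get(windowHash);
                if (candidates != null) {
                    // Hash hit: confirm with a real comparison to rule out collisions.
                    for (String kw : candidates) {
                        if (text.regionMatches(start, kw, 0, m)) {
                            found.add(kw);
                        }
                    }
                }
                if (start + m >= n) {
                    break;
                }
                windowHash = (windowHash - text.charAt(start) * highPow % MOD + MOD) % MOD;
                windowHash = (windowHash * BASE + text.charAt(start + m)) % MOD;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> hits = findContained("the quick brown fox", List.of("quick", "fox", "dog", "own"));
        System.out.println(new TreeSet<>(hits)); // [fox, own, quick]
    }
}
```

The number of passes equals the number of distinct keyword lengths, which for natural-language keywords is typically small compared with the number of keywords.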
