I am using this simple algorithm for searching some text in a document and tagging

Asked: May 29, 2026

I am using this simple algorithm to search for some text in a document and tag the page on which I found it:

for (int i = 1; i <= a.PageCount; i++)
{
    Buf.Append(a.Pages[i].Text);
    String contain = Buf.ToString();
    if (contain != "")
    {
        // inside maps each key to the list of pages on which it was found
        foreach (KeyValuePair<string, List<string>> pair in inside)
        {
            if (contain.Contains(pair.Key))
                inside[pair.Key].Add((i).ToString());
        }
    }

    Buf.Clear();
}

I have no problem with it, but when I search a 700-page document for over 500 keys, it is very slow: it takes about 1-2 minutes to finish. Is there any way to speed it up? I am using C#.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-29T21:40:38+00:00

A few points:

Get rid of Buf; just assign a.Pages[i].Text directly to contain.
inside[pair.Key] wastes time looking up the value associated with that key; you already have a much cheaper reference to that object in pair.Value.
If you have a list of integer values, why are you storing them as strings?

Sample code:

for (int i = 1; i <= a.PageCount; i++)
{
    string contain = a.Pages[i].Text;
    if (contain != "")
    {
        // inside maps each key to the list of pages on which it was found
        foreach (KeyValuePair<string, List<int>> pair in inside)
        {
            if (contain.Contains(pair.Key))
                pair.Value.Add(i);
        }
    }
}

Finally, make sure Pages does in fact use a one-based index. Collections are more commonly zero-indexed.

EDIT since Pages is a dictionary:

foreach (KeyValuePair<int, Page> kvp in a.Pages)
{
    string contain = kvp.Value.Text;
    if (contain == "")
        continue;
    foreach (KeyValuePair<string, List<int>> pair in inside)
        if (contain.Contains(pair.Key))
            pair.Value.Add(kvp.Key);
}

How many times did you time the first code sample? The time could vary depending on many external factors; the fact that a single run of one approach is faster or slower than a single run of another doesn’t really tell you much, especially since the suggestions I made probably don’t address the bulk of the problem.
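To make a comparison between two versions more meaningful, each one can be timed over several runs instead of once. The following is only a minimal sketch: the Timing class and its Measure method are hypothetical helpers that are not part of the original code. It relies on System.Diagnostics.Stopwatch and runs the action once as a warm-up before recording anything.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once as a warm-up (JIT compilation, caches),
    // then `runs` more times, returning each elapsed time in milliseconds.
    public static double[] Measure(Action action, int runs)
    {
        action(); // warm-up run, not recorded

        var results = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            results[i] = sw.Elapsed.TotalMilliseconds;
        }
        return results;
    }
}
```

Assuming the search loop is wrapped in a method (called SearchAllPages here purely for illustration), Timing.Measure(() => SearchAllPages(), 5) returns five timings whose spread indicates how much of any difference is just noise. Keep in mind that the search loop appends to the lists in inside, so those lists should be reset between runs.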

As someone else pointed out, the main problem is that you’re calling contain.Contains(pair.Key) 350,000 times (700 pages × 500 keys); that’s probably your bottleneck. You can profile the method to find out whether that is true. If it is, then something like the Rabin-Karp algorithm, as suggested by Miserable Variable, is probably your best bet.
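As a rough illustration of the Rabin-Karp idea applied to many keys at once, here is a sketch under stated assumptions, not a tuned implementation. The MultiKeySearch class and its FindKeys method are hypothetical names. Matching is ordinal, like string.Contains, and empty keys are skipped, whereas Contains("") would return true. Because Rabin-Karp needs a fixed window length, the keys are grouped by length and the page text is scanned once per distinct length with a rolling hash. Every hash hit is confirmed with a real comparison, so collisions cannot produce false matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultiKeySearch
{
    // Polynomial hash base; arithmetic wraps modulo 2^64 via unchecked ulong math.
    private const ulong Base = 257;

    private static ulong Hash(string s, int start, int length)
    {
        ulong h = 0;
        for (int i = 0; i < length; i++)
            h = unchecked(h * Base + s[start + i]);
        return h;
    }

    // Returns the keys that occur somewhere in text (ordinal comparison).
    public static HashSet<string> FindKeys(string text, IEnumerable<string> keys)
    {
        var found = new HashSet<string>();

        // One pass over the text per distinct key length.
        foreach (var group in keys.Where(k => k.Length > 0).GroupBy(k => k.Length))
        {
            int m = group.Key;
            if (m > text.Length) continue;

            // Map each key hash to the keys of this length that share it.
            var byHash = new Dictionary<ulong, List<string>>();
            foreach (string key in group)
            {
                ulong h = Hash(key, 0, m);
                if (!byHash.TryGetValue(h, out List<string> list))
                    byHash[h] = list = new List<string>();
                list.Add(key);
            }

            // Base^(m-1), used to remove the leading character from the window hash.
            ulong high = 1;
            for (int i = 1; i < m; i++) high = unchecked(high * Base);

            ulong window = Hash(text, 0, m);
            for (int pos = 0; ; pos++)
            {
                if (byHash.TryGetValue(window, out List<string> candidates))
                    foreach (string key in candidates)
                        if (string.CompareOrdinal(text, pos, key, 0, m) == 0)
                            found.Add(key); // hash hit confirmed by comparison

                if (pos + m >= text.Length) break;

                // Roll the window: drop text[pos], append text[pos + m].
                window = unchecked((window - text[pos] * high) * Base + text[pos + m]);
            }
        }
        return found;
    }
}
```

Inside the page loop, the inner foreach over inside could then become a loop over MultiKeySearch.FindKeys(contain, inside.Keys), adding the page number to inside[key] for each key returned. Whether this actually beats 500 Contains calls per page depends on how many distinct key lengths there are, so it is worth measuring on the real data.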
