I have a function which works a very slow for my task (it must


Editorial Team

Asked: June 3, 2026


I have a function that is far too slow for my task (it needs to be 10-100 times faster).

Here is the code:

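using System.Collections.Generic;
using System.Linq; // for Where()

// Counts how many sequences contain words as a contiguous run (each sequence counted once).
public long Support(List<string[]> sequences, string[] words)
{
    var count = 0;
    foreach (var sequence in sequences)
    {
        // Try every start position at which words can still fit.
        for (int i = 0; i < sequence.Length - words.Length + 1; i++)
        {
            bool foundSeq = true;
            // Note: this loop keeps running after a mismatch; && only skips the comparison.
            for (int j = 0; j < words.Length; j++)
            {
                foundSeq = foundSeq && sequence[i + j] == words[j];
            }
            if (foundSeq)
            {
                count++;
                break; // one match per sequence is enough
            }
        }
    }

    return count;
}

// Computes Support, in parallel, for every SequenceInfo that does not have it yet.
public void Support(List<string[]> sequences, List<SequenceInfo> sequenceInfoCollection)
{
    System.Threading.Tasks.Parallel.ForEach(sequenceInfoCollection.Where(x => x.Support == null), sequenceInfo =>
    {
        sequenceInfo.Support = Support(sequences, sequenceInfo.Sequence);
    });
}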

Here, List<string[]> sequences is a list of word arrays. It usually contains 250k+ rows, each about 4-7 words long. string[] words is the array of words we are trying to count (every one of these words appears in sequences at least once).

The bottleneck is the line foundSeq = foundSeq && sequence[i + j] == words[j]; it takes most of the execution time (Enumerable.MoveNext is in second place). I want to hash all the words in my array, since comparing numbers is faster than comparing strings, right? I think that could give me a 30%-80% improvement, but I need 10x! What can I do? For context, this is part of the Apriori algorithm.

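The Support function checks whether the words sequence occurs as a contiguous run inside each sequence in the sequences list, and counts how many sequences contain it (each sequence is counted at most once, because of the break).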
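A minimal sketch of the hashing idea above, assuming all distinct words fit in one Dictionary: each word gets a dense int ID once, every row is encoded once up front, and Support then compares ints. It also stops comparing at the first mismatching word, whereas the loop in the question keeps iterating j after foundSeq becomes false. WordIds and IntSupport are hypothetical names, not part of any library.

```csharp
using System.Collections.Generic;

// Hypothetical helper: assigns each distinct word a dense integer ID.
// Not thread-safe: encode everything before any Parallel.ForEach.
public sealed class WordIds
{
    private readonly Dictionary<string, int> _ids = new Dictionary<string, int>();

    // Returns the existing ID for word, or assigns the next unused one.
    public int GetOrAdd(string word)
    {
        if (!_ids.TryGetValue(word, out int id))
        {
            id = _ids.Count;
            _ids.Add(word, id);
        }
        return id;
    }

    // Converts a row (or a candidate pattern) of words into IDs.
    public int[] Encode(string[] words)
    {
        var result = new int[words.Length];
        for (int i = 0; i < words.Length; i++)
            result[i] = GetOrAdd(words[i]);
        return result;
    }
}

// Hypothetical int-based version of the question's Support.
public static class IntSupport
{
    // Counts rows that contain pattern as a contiguous run; each row counts once.
    public static long Support(List<int[]> sequences, int[] pattern)
    {
        long count = 0;
        foreach (var sequence in sequences)
        {
            for (int i = 0; i <= sequence.Length - pattern.Length; i++)
            {
                // Advance j only while words match, so a mismatch ends the scan early.
                int j = 0;
                while (j < pattern.Length && sequence[i + j] == pattern[j])
                    j++;
                if (j == pattern.Length)
                {
                    count++;
                    break; // one match per row is enough
                }
            }
        }
        return count;
    }
}
```

Use one WordIds instance for both the rows and the candidate patterns so the same word always maps to the same ID, and do all encoding before the Parallel.ForEach. Since Support is called for many candidates against the same rows, the encoding cost is paid only once.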


1 Answer

Editorial Team · Answer 1 · 2026-06-03T09:23:27+00:00

Knuth–Morris–Pratt algorithm

In computer science, the Knuth–Morris–Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a “word” W within a main “text string” S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.

The algorithm was conceived in 1974 by Donald Knuth and Vaughan Pratt, and independently by James H. Morris. The three published it jointly in 1977.

From Wikipedia: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

This is one of the improvements you should make, with one twist: a word in your code plays the role of a character in the algorithm's terminology, and your words array is what KMP calls the word.
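A minimal sketch of that mapping, with each string treated as one KMP character and the words array as the KMP pattern. KmpWordSearch and its methods are hypothetical names. The failure table is built once per words array and reused for every row, and after a mismatch the row is never re-read. With rows of only 4-7 words, how much this saves over the original loop is something to measure rather than assume.

```csharp
using System.Collections.Generic;

// Sketch: KMP where each string in the arrays plays the role of one "character".
public static class KmpWordSearch
{
    // failure[q] = length of the longest proper prefix of pattern[0..q]
    // that is also a suffix of pattern[0..q].
    public static int[] BuildFailureTable(string[] pattern)
    {
        var failure = new int[pattern.Length];
        int k = 0;
        for (int q = 1; q < pattern.Length; q++)
        {
            while (k > 0 && pattern[q] != pattern[k])
                k = failure[k - 1];
            if (pattern[q] == pattern[k])
                k++;
            failure[q] = k;
        }
        return failure;
    }

    // True if pattern occurs as a contiguous run of words in text.
    public static bool Contains(string[] text, string[] pattern, int[] failure)
    {
        if (pattern.Length == 0) return true;
        int matched = 0; // number of pattern words matched so far
        foreach (var word in text)
        {
            // On a mismatch, fall back via the table instead of re-reading text.
            while (matched > 0 && word != pattern[matched])
                matched = failure[matched - 1];
            if (word == pattern[matched])
                matched++;
            if (matched == pattern.Length)
                return true;
        }
        return false;
    }

    // Same contract as the question's Support: rows containing words, each counted once.
    public static long Support(List<string[]> sequences, string[] words)
    {
        var failure = BuildFailureTable(words); // built once per pattern, not per row
        long count = 0;
        foreach (var sequence in sequences)
            if (Contains(sequence, words, failure))
                count++;
        return count;
    }
}
```

In the question's Parallel.ForEach, the call would become KmpWordSearch.Support(sequences, sequenceInfo.Sequence).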

The idea is that when you search for “abc def ghi jkl” and have already matched “abc def ghi”, but the next word does not match, you can jump three positions ahead: no match can start at “def” or “ghi”, because the search phrase starts with “abc”.

Search:   abc def ghi jkl
Text:     abc def ghi klm abc def ghi jkl
i=0:      abc def ghi jkl?
skipped:      XXX XXX        <--- i=1 and i=2 are never tried: two iterations saved
i=3:                  abc?
i=4:                      abc? ...
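The size of that jump comes from the KMP failure table. A small sketch (KmpShift and ShiftAfterMismatch are hypothetical names) computes how far the start position moves after some words matched and the next one failed: the number of matched words minus the failure value of the matched prefix. For “abc def ghi jkl” the shift after three matched words is the full 3; for a phrase such as “abc def abc jkl” it is only 2, because the matched “abc def abc” ends with “abc”, which could begin a new match.

```csharp
using System;

// Hypothetical helper showing how far KMP moves the alignment after a mismatch.
public static class KmpShift
{
    // failure[q] = length of the longest proper prefix of pattern[0..q]
    // that is also a suffix of pattern[0..q] (the standard KMP table, over words).
    public static int[] BuildFailureTable(string[] pattern)
    {
        var failure = new int[pattern.Length];
        int k = 0;
        for (int q = 1; q < pattern.Length; q++)
        {
            while (k > 0 && pattern[q] != pattern[k])
                k = failure[k - 1];
            if (pattern[q] == pattern[k])
                k++;
            failure[q] = k;
        }
        return failure;
    }

    // Start-position shift after `matched` pattern words matched
    // (followed by a mismatch, or a full match when matched == pattern.Length).
    public static int ShiftAfterMismatch(string[] pattern, int matched)
    {
        if (matched < 1 || matched > pattern.Length)
            throw new ArgumentOutOfRangeException(nameof(matched));
        int[] failure = BuildFailureTable(pattern);
        return matched - failure[matched - 1];
    }
}
```

For example, ShiftAfterMismatch("abc def ghi jkl".Split(' '), 3) returns 3, which is the jump from i=0 to i=3 in the diagram.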
