Currently I’m trying to enhance my search algorithm. For better understanding, here’s the current

Question

Asked: May 18, 2026 (2026-05-18T07:33:09+00:00)

Currently I’m trying to enhance my search algorithm.

For better understanding, here’s the current logic behind it:
We have objects with n keywords attached in the database. This is modeled with two tables (Object, Keyword), where the Keyword table has a foreign key to Object. When I build my search trees, I create a line value from all keywords of an object (i.e. remove umlauts, convert to lower case, …). The same conversion routine (NormalizeSearchPattern()) is applied to the search patterns. I only support AND search, and only keywords with a minimum length of 2 characters!

The search-algorithm is currently a variant of fast-reverse-search (this example is not optimized):

bool IsMatch(string source, string searchPattern)
{
    // example:
    // source: "hello world"
    // searchPattern: "hello you freaky funky world"
    // patterns[]: { "hello", "you", "freaky", "funky", "world" }

    searchPattern = NormalizeSearchPattern(searchPattern);
    var patterns = MagicMethodToSplitPatternIntoPatterns(searchPattern);
    foreach (var pattern in patterns)
    {
        var success = false;
        var patternLength = pattern.Length;
        var firstChar = pattern[0];
        var secondChar = pattern[1];

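        // Scan candidate start positions from right to left.
        var lengthDifference = source.Length - patternLength;
        while (lengthDifference >= 0)
        {
            // Post-decrement: after this check the candidate start is lengthDifference + 1.
            if (source[lengthDifference--] != firstChar)
            {
                continue;
            }
            if (source[lengthDifference + 2] != secondChar)
            {
                continue;
            }

            // The first two characters match; compare the remaining ones.
            var l = lengthDifference + 3;
            var m = 2;
            while (m < patternLength)
            {
                if (source[l] != pattern[m])
                {
                    break;
                }
                l++;
                m++;
            }

            if (m == patternLength)
            {
                success = true;
                break; // one occurrence is enough for this pattern
            }
        }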
        if (!success)
        {
            return false;
        }
    }
    return true;
}
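MagicMethodToSplitPatternIntoPatterns() is not shown above. As a rough sketch only, assuming the normalized pattern is separated by single spaces and that keywords shorter than 2 characters should be dropped, it could look like the following (the names PatternSplitter and SplitIntoPatterns are hypothetical, not taken from the real code):

```csharp
using System;
using System.Linq;

static class PatternSplitter
{
    // Hypothetical stand-in for MagicMethodToSplitPatternIntoPatterns():
    // splits an already normalized search pattern on spaces, drops tokens
    // shorter than 2 characters, and removes duplicate tokens.
    public static string[] SplitIntoPatterns(string normalizedPattern)
    {
        return normalizedPattern
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => token.Length >= 2)
            .Distinct()
            .ToArray();
    }
}
```

Dropping the short tokens here also protects IsMatch() from reading pattern[1] on a one-character pattern.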

Normalization is done as follows (this example is not optimized either):

    string RemoveTooShortKeywords(string keywords)
    {
        while (Regex.IsMatch(keywords, TooShortKeywordPattern, RegexOptions.Compiled | RegexOptions.Singleline))
        {
            keywords = Regex.Replace(keywords, TooShortKeywordPattern, " ", RegexOptions.Compiled | RegexOptions.Singleline);
        }

        return keywords;
    }

    string RemoveNonAlphaDigits(string value)
    {
        value = value.ToLowerInvariant(); // culture-independent, so e.g. "I" never becomes a dotless "ı"
        value = value.Replace("ä", "ae");
        value = value.Replace("ö", "oe");
        value = value.Replace("ü", "ue");
        value = value.Replace("ß", "ss");

        return Regex.Replace(value, "[^a-z 0-9]", " ", RegexOptions.Compiled | RegexOptions.Singleline);
    }

    string NormalizeSearchPattern(string searchPattern)
    {
        var resultNonAlphaDigits = RemoveNonAlphaDigits(searchPattern);
        var resultTrimmed = RemoveTooShortKeywords(resultNonAlphaDigits);
        return resultTrimmed;
    }

So this is pretty straightforward, and it's obvious that I can only cope with those differences between source and searchPattern that I've implemented in NormalizeSearchPattern() (as mentioned above: umlauts, case differences, …).

But how should I enhance the algorithm (or NormalizeSearchPattern()) so that it is insensitive to:

singular/plural
mistyping (e.g. “hauserr” <-> “hauser”)
…

Some more details about the design:
This app is written in C#. It stores the search trees and objects in a static variable (so the database is queried only once at init), and the performance has to be outstanding (currently 500,000 line values are queried in less than 300 ms).

1 Answer

Editorial Team · Answer 1 · 2026-05-18T07:33:10+00:00

You might also be interested in a Trigram and Bigram search matching algorithm:

Trigram search is a powerful method of searching for text when the exact syntax or spelling of the target object is not precisely known. It finds objects which match the maximum number of three-character strings in the entered search terms, i.e. near matches. A threshold can be specified as a cutoff point, after which a result is no longer regarded as a match.
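To make the idea concrete, here is a minimal sketch in C#. It is only an illustration under some assumptions: words are already normalized (e.g. via NormalizeSearchPattern()), each word is padded with two leading spaces and one trailing space before being cut into trigrams, similarity is measured as shared trigrams divided by all distinct trigrams (Jaccard), and the default threshold of 0.5 is an arbitrary starting value. The class and method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

static class TrigramSearch
{
    // Returns the set of 3-character substrings of a word.
    // Padding (an assumption of this sketch) gives word starts extra weight.
    public static HashSet<string> Trigrams(string word)
    {
        var padded = "  " + word + " ";
        var result = new HashSet<string>();
        for (var i = 0; i + 3 <= padded.Length; i++)
        {
            result.Add(padded.Substring(i, 3));
        }
        return result;
    }

    // Jaccard similarity of the two trigram sets, between 0.0 and 1.0.
    public static double Similarity(string a, string b)
    {
        var trigramsA = Trigrams(a);
        var trigramsB = Trigrams(b);

        var shared = 0;
        foreach (var trigram in trigramsA)
        {
            if (trigramsB.Contains(trigram))
            {
                shared++;
            }
        }

        var union = trigramsA.Count + trigramsB.Count - shared;
        return union == 0 ? 0.0 : (double)shared / union;
    }

    // The threshold is the cutoff below which a pair is no longer a match.
    public static bool IsNearMatch(string a, string b, double threshold = 0.5)
    {
        return Similarity(a, b) >= threshold;
    }
}
```

With these assumptions, “hauserr” and “hauser” share 6 of 9 distinct trigrams, so they count as a near match, while “hello” and “world” share none. For 500,000 line values you would probably not compare every pair at query time, but precompute a lookup from trigram to the objects that contain it; that part is left out here.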
