Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 947423
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 15, 20262026-05-15T23:04:49+00:00 2026-05-15T23:04:49+00:00

I need three fast-on-large-strings functions: fast search, fast search and replace, and fast count

  • 0

I need three fast-on-large-strings functions: fast search, fast search and replace, and fast count of substrings in a string.

I have run into Boyer-Moore string searches in C++ and Python, but the only Delphi Boyer-Moore algorithm used to implement fast search and replace that I have found is part of the FastStrings by Peter Morris, formerly of DroopyEyes software, and his website and email are no longer working.

I have already ported FastStrings forward to work great for AnsiStrings in Delphi 2009/2010, where a byte is equal to one AnsiChar, but making them also work with the String (UnicodeString) in Delphi 2010 appears non-trivial.

Using this Boyer-Moore algorithm, it should be possible to easily do case insensitive searches, as well as case-insensitive search and replace, without any temporary string (using StrUpper etc), and without calling Pos() which is slower than Boyer-Moore searching when repeated searches over the same text are required.

(Edit: I have a partial solution, written as an answer to this question, it is almost 100% complete, it even has a fast string replace function. I believe it MUST have bugs, and especially think that since it pretends to be Unicode capable that it must be that there are glitches due to unfulfilled Unicode promises. )

(Edit2: Interesting and unexpected result; The large stack size of a unicode code-point table on the stack – SkipTable in the code below puts a serious damper on the amount of win-win-optimization you can do here in a unicode string boyer-moore string search. Thanks to Florent Ouchet for pointing out what I should have noticed immediately.)

  • 1 1 Answer
  • 1 View
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-15T23:04:50+00:00Added an answer on May 15, 2026 at 11:04 pm

    This answer is now complete and works for case sensitive mode, but does not work for case insensitive mode, and probably has other bugs too, since it’s not well unit tested, and could probably be optimized further, for example I repeated the local function __SameChar instead of using a comparison function callback which would have been faster, and actually, allowing the user to pass in a comparison function for all these would be great for Unicode users who want to provide some extra logic (equivalent sets of Unicode glyphs for some languages).

    Based on Dorin Dominica’s code, I built the following.

    { _FindStringBoyer:
      Boyer-Moore search algorith using regular String instead of AnsiSTring, and no ASM.
      Credited to Dorin Duminica.
    }
    function _FindStringBoyer(const sString, sPattern: string;
      const bCaseSensitive: Boolean = True; const fromPos: Integer = 1): Integer;
    
        function __SameChar(StringIndex, PatternIndex: Integer): Boolean;
        begin
          if bCaseSensitive then
            Result := (sString[StringIndex] = sPattern[PatternIndex])
          else
            Result := (CompareText(sString[StringIndex], sPattern[PatternIndex]) = 0);
        end; // function __SameChar(StringIndex, PatternIndex: Integer): Boolean;
    
    var
      SkipTable: array [Char] of Integer;
      LengthPattern: Integer;
      LengthString: Integer;
      Index: Integer;
      kIndex: Integer;
      LastMarker: Integer;
      Large: Integer;
      chPattern: Char;
    begin
      if fromPos < 1 then
        raise Exception.CreateFmt('Invalid search start position: %d.', [fromPos]);
      LengthPattern := Length(sPattern);
      LengthString := Length(sString);
      for chPattern := Low(Char) to High(Char) do
        SkipTable[chPattern] := LengthPattern;
      for Index := 1 to LengthPattern -1 do
        SkipTable[sPattern[Index]] := LengthPattern - Index;
      Large := LengthPattern + LengthString + 1;
      LastMarker := SkipTable[sPattern[LengthPattern]];
      SkipTable[sPattern[LengthPattern]] := Large;
      Index := fromPos + LengthPattern -1;
      Result := 0;
      while Index <= LengthString do begin
        repeat
          Index := Index + SkipTable[sString[Index]];
        until Index > LengthString;
        if Index <= Large then
          Break
        else
          Index := Index - Large;
        kIndex := 1;
        while (kIndex < LengthPattern) and __SameChar(Index - kIndex, LengthPattern - kIndex) do
          Inc(kIndex);
        if kIndex = LengthPattern then begin
          // Found, return.
          Result := Index - kIndex + 1;
          Index := Index + LengthPattern;
          exit;
        end else begin
          if __SameChar(Index, LengthPattern) then
            Index := Index + LastMarker
          else
            Index := Index + SkipTable[sString[Index]];
        end; // if kIndex = LengthPattern then begin
      end; // while Index <= LengthString do begin
    end;
    
    { Written by Warren, using the above code as a starter, we calculate the SkipTable once, and then count the number of instances of
      a substring inside the main string, at a much faster rate than we
      could have done otherwise.  Another thing that would be great is
      to have a function that returns an array of find-locations,
      which would be way faster to do than repeatedly calling Pos.
    }
    function _StringCountBoyer(const aSourceString, aFindString : String; Const CaseSensitive : Boolean = TRUE) : Integer;
    var
      foundPos:Integer;
      fromPos:Integer;
      Limit:Integer;
      guard:Integer;
      SkipTable: array [Char] of Integer;
      LengthPattern: Integer;
      LengthString: Integer;
      Index: Integer;
      kIndex: Integer;
      LastMarker: Integer;
      Large: Integer;
      chPattern: Char;
        function __SameChar(StringIndex, PatternIndex: Integer): Boolean;
        begin
          if CaseSensitive then
            Result := (aSourceString[StringIndex] = aFindString[PatternIndex])
          else
            Result := (CompareText(aSourceString[StringIndex], aFindString[PatternIndex]) = 0);
        end; // function __SameChar(StringIndex, PatternIndex: Integer): Boolean;
    
    begin
      result := 0;
      foundPos := 1;
      fromPos := 1;
      Limit := Length(aSourceString);
      guard := Length(aFindString);
      Index := 0;
      LengthPattern := Length(aFindString);
      LengthString := Length(aSourceString);
      for chPattern := Low(Char) to High(Char) do
        SkipTable[chPattern] := LengthPattern;
      for Index := 1 to LengthPattern -1 do
        SkipTable[aFindString[Index]] := LengthPattern - Index;
      Large := LengthPattern + LengthString + 1;
      LastMarker := SkipTable[aFindString[LengthPattern]];
      SkipTable[aFindString[LengthPattern]] := Large;
      while (foundPos>=1) and (fromPos < Limit) and (Index<Limit) do begin
    
        Index := fromPos + LengthPattern -1;
        if Index>Limit then
            break;
        kIndex := 0;
        while Index <= LengthString do begin
          repeat
            Index := Index + SkipTable[aSourceString[Index]];
          until Index > LengthString;
          if Index <= Large then
            Break
          else
            Index := Index - Large;
          kIndex := 1;
          while (kIndex < LengthPattern) and __SameChar(Index - kIndex, LengthPattern - kIndex) do
            Inc(kIndex);
          if kIndex = LengthPattern then begin
            // Found, return.
            //Result := Index - kIndex + 1;
            Index := Index + LengthPattern;
            fromPos := Index;
            Inc(Result);
            break;
          end else begin
            if __SameChar(Index, LengthPattern) then
              Index := Index + LastMarker
            else
              Index := Index + SkipTable[aSourceString[Index]];
          end; // if kIndex = LengthPattern then begin
        end; // while Index <= LengthString do begin
    
      end;
    end; 
    

    This is really a nice Algorithm, because:

    • it’s way faster to count instances of substring X in string Y this way, magnificently so.
    • For merely replacing Pos() the _FindStringBoyer() is faster than the pure asm version of Pos() contributed to Delphi by FastCode project people, that is currently used for Pos, and if you need the case-insensitivity, you can imagine the performance boost when we don’t have to call UpperCase on a 100 megabyte string. (Okay, your strings aren’t going to be THAT big. But still, Efficient Algorithms are a Thing of Beauty.)

    Okay I wrote a String Replace in Boyer-Moore style:

    function _StringReplaceBoyer(const aSourceString, aFindString,aReplaceString : String; Flags: TReplaceFlags) : String;
    var
      errors:Integer;
      fromPos:Integer;
      Limit:Integer;
      guard:Integer;
      SkipTable: array [Char] of Integer;
      LengthPattern: Integer;
      LengthString: Integer;
      Index: Integer;
      kIndex: Integer;
      LastMarker: Integer;
      Large: Integer;
      chPattern: Char;
      CaseSensitive:Boolean;
      foundAt:Integer;
      lastFoundAt:Integer;
      copyStartsAt:Integer;
      copyLen:Integer;
        function __SameChar(StringIndex, PatternIndex: Integer): Boolean;
        begin
          if CaseSensitive then
            Result := (aSourceString[StringIndex] = aFindString[PatternIndex])
          else
            Result := (CompareText(aSourceString[StringIndex], aFindString[PatternIndex]) = 0);
        end; // function __SameChar(StringIndex, PatternIndex: Integer): Boolean;
    
    begin
      result := '';
      lastFoundAt := 0;
      fromPos := 1;
      errors := 0;
      CaseSensitive := rfIgnoreCase in Flags;
      Limit := Length(aSourceString);
      guard := Length(aFindString);
      Index := 0;
      LengthPattern := Length(aFindString);
      LengthString := Length(aSourceString);
      for chPattern := Low(Char) to High(Char) do
        SkipTable[chPattern] := LengthPattern;
      for Index := 1 to LengthPattern -1 do
        SkipTable[aFindString[Index]] := LengthPattern - Index;
      Large := LengthPattern + LengthString + 1;
      LastMarker := SkipTable[aFindString[LengthPattern]];
      SkipTable[aFindString[LengthPattern]] := Large;
      while (fromPos>=1) and (fromPos <= Limit) and (Index<=Limit) do begin
    
        Index := fromPos + LengthPattern -1;
        if Index>Limit then
            break;
        kIndex := 0;
        foundAt := 0;
        while Index <= LengthString do begin
          repeat
            Index := Index + SkipTable[aSourceString[Index]];
          until Index > LengthString;
          if Index <= Large then
            Break
          else
            Index := Index - Large;
          kIndex := 1;
          while (kIndex < LengthPattern) and __SameChar(Index - kIndex, LengthPattern - kIndex) do
            Inc(kIndex);
          if kIndex = LengthPattern then begin
    
    
            foundAt := Index - kIndex + 1;
            Index := Index + LengthPattern;
            //fromPos := Index;
            fromPos := (foundAt+LengthPattern);
            if lastFoundAt=0 then begin
                    copyStartsAt := 1;
                    copyLen := foundAt-copyStartsAt;
            end else begin
                    copyStartsAt := lastFoundAt+LengthPattern;
                    copyLen := foundAt-copyStartsAt;
            end;
    
            if (copyLen<=0)or(copyStartsAt<=0) then begin
                    Inc(errors);
            end;
    
            Result := Result + Copy(aSourceString, copyStartsAt, copyLen ) + aReplaceString;
            lastFoundAt := foundAt;
            if not (rfReplaceAll in Flags) then
                     fromPos := 0; // break out of outer while loop too!
            break;
          end else begin
            if __SameChar(Index, LengthPattern) then
              Index := Index + LastMarker
            else
              Index := Index + SkipTable[aSourceString[Index]];
          end; // if kIndex = LengthPattern then begin
        end; // while Index <= LengthString do begin
      end;
      if (lastFoundAt=0) then
      begin
         // nothing was found, just return whole original string
          Result := aSourceString;
      end
      else
      if (lastFoundAt+LengthPattern < Limit) then begin
         // the part that didn't require any replacing, because nothing more was found,
         // or rfReplaceAll flag was not specified, is copied at the
         // end as the final step.
        copyStartsAt := lastFoundAt+LengthPattern;
        copyLen := Limit; { this number can be larger than needed to be, and it is harmless }
        Result := Result + Copy(aSourceString, copyStartsAt, copyLen );
      end;
    
    end;
    

    Okay, problem: Stack footprint of this:

    var
      skiptable : array [Char] of Integer;  // 65536*4 bytes stack usage on Unicode delphi
    

    Goodbye CPU hell, Hello stack hell. If I go for a dynamic array, then I have to resize it at runtime. So this thing is basically fast, because the Virtual Memory system on your computer doesn’t blink at 256K going on the stack, but this is not always an optimal piece of code. Nevertheless my PC doesn’t blink at big stack stuff like this. It’s not going to become a Delphi standard library default or win any fastcode challenge in the future, with that kinda footprint. I think that repeated searches are a case where the above code should be written as a class, and the skiptable should be a data field in that class. Then you can build the boyer-moore table once, and over time, if the string is invariant, repeatedly use that object to do fast lookups.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.