Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6370175
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 25, 20262026-05-25T00:53:46+00:00 2026-05-25T00:53:46+00:00

Don’t ask how I got there, but I was playing around with some masking,

  • 0

Don’t ask how I got there, but I was playing around with some masking, loop unrolling etc. In any case, out of interest I was thinking about how I would implement an indexof method, and long story short, all that masking etc aside, this naive implementation:

public static unsafe int IndexOf16(string s, int startIndex, char c) {
            if (startIndex < 0 || startIndex >= s.Length) throw new ArgumentOutOfRangeException("startIndex");
            fixed (char* cs = s) {
                for (int i = startIndex; i < s.Length; i++) {
                    if ((cs[i]) == c) return i;
                }
                return -1;
            }
        }

is faster than string.IndexOf(char). I wrote some simple tests, and it seems to match output exactly.
Some sample output numbers from my machine (it varies to some degree of course, but the trend is clear):

short haystack 500k runs
1741 ms for IndexOf16
2737 ms for IndexOf32
2963 ms for IndexOf64
2337 ms for string.IndexOf <-- buildin

longer haystack:
2888 ms for IndexOf16
3028 ms for IndexOf32
2816 ms for IndexOf64
3353 ms for string.IndexOf <-- buildin

IndexOfChar is marked extern, so you cant reflector it. However I think this should be the (native) implementation:
http://www.koders.com/cpp/fidAB4768BA4DF45482A7A2AA6F39DE9C272B25B8FE.aspx?s=IndexOfChar#L1000

They seem to use the same naive implementation.

Questions come to my mind:

1) Am I missing something in my implementation that explains why its faster? I can only think of extended chars support, but their implementation suggests they don’t do anything special for that either.

2) I assumed much of the low level methods would ultimately be implemented in hand assembler, that seems not the case. If so, why implement it natively at all, instead of just in C# like my sample implementation?

(Complete test here (I think its too long to paste here): http://paste2.org/p/1606018 )

(No this is not premature optimization, it’s not for a project I am just messing about) 🙂

Update: Thnx to Oliver for the hint about nullcheck and the Count param. I have added these to my IndexOf16Implementation like so:

public static unsafe int IndexOf16(string s, int startIndex, char c, int count = -1) {
    if (s == null) throw new ArgumentNullException("s");
    if (startIndex < 0 || startIndex >= s.Length) throw new ArgumentOutOfRangeException("startIndex");
    if (count == -1) count = s.Length - startIndex;
    if (count < 0 || count > s.Length - startIndex) throw new ArgumentOutOfRangeException("count");

    int endIndex = startIndex + count;
    fixed (char* cs = s) {
        for (int i = startIndex; i < endIndex; i++) {
            if ((cs[i]) == c) return i;
        }
        return -1;
    }
}

The numbers changed slightly, however it is still quite significantly faster (32/64 results omitted):

short haystack 500k runs
1908 ms for IndexOf16
2361 ms for string.IndexOf
longer haystack:
3061 ms for IndexOf16
3391 ms for string.IndexOf

Update2: This version is faster yet (especially for the long haystack case):

public static unsafe int IndexOf16(string s, int startIndex, char c, int count = -1) {
            if (s == null) throw new ArgumentNullException("s");
            if (startIndex < 0 || startIndex >= s.Length) throw new ArgumentOutOfRangeException("startIndex");
            if (count == -1) count = s.Length - startIndex;
            if (count < 0 || count > s.Length - startIndex) throw new ArgumentOutOfRangeException("count");

            int endIndex = startIndex + count;
            fixed (char* cs = s) {
                char* cp = cs + startIndex;
                for (int i = startIndex; i <= endIndex; i++, cp++) {
                    if (*cp == c) return i;
                }
                return -1;
            }
        }

Update 4:
Based on the discussion with LastCoder I believe this to be architecture depended. My Xeon W3550 at works seems to prefer this version, while his i7 seems to like the buildin version. My home machine (Athlon II) appears to be in between. I am surprised about the large difference though.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-25T00:53:47+00:00Added an answer on May 25, 2026 at 12:53 am

    Possibility 1)
    This may not hold (as true) in C# but when I did optimization work for x86-64 assembler I quickly found out while benchmarking that calling code from a DLL (marked external) was slower than implementing the same exact function within my executable. The most obvious reason is paging and memory, the DLL (external) method is loaded far away in memory from the rest of the running code and if it wasn’t accessed previously it’ll need to be paged in. Your benchmarking code should do some warm up loops of the functions you are benchmarking to make sure they are paged in memory first before you time them.

    Possibility 2)
    Microsoft tends not to optimize string functions to the fullest, so out optimizing a native string length, substring, indexof etc. isn’t really unheard of. Anecdote; in x86-64 assembler I was able to create a version of WinXP64’s RtlInitUnicodeString function that ran 2x faster in almost all practical use cases.

    Possibility 3) Your benchmarking code shows that you’re using the 2 parameter overload for IndexOf, this function likely calls the 3 parameter overload IndexOf(Char, Int32, Int32) which adds an extra overhead to each iteration.


    This may be even faster because your removing the i variable increment per iteration.

                char* cp = cs + startIndex;
                char* cpEnd = cp + endIndex;
                while (cp <= cpEnd) {
                    if (*cp == c) return cp - cs;
                    cp++;
                }
    

    edit In reply regarding (2) for your curiosity, coded back in 2005 and used to patch the ntdll.dll of my WinXP64 machine. http://board.flatassembler.net/topic.php?t=4467

    RtlInitUnicodeString_Opt: ;;rcx=buff rdx=ucharstr 77bytes
                 xor    r9d,r9d
                 test   rdx,rdx
                 mov    dword[rcx],r9d
                 mov    [rcx+8],rdx
                 jz     .end
                 mov    r8,rdx
       .scan:
                 mov    eax,dword[rdx]
    
                 test   ax,ax
                 jz     .one
                 add    rdx,4
                 shr    eax,16
                 test   ax,ax
                 jz     .two
                 jmp    .scan
       .two:
                 add    rdx,2
       .one:
                 mov    eax,0fffch
                 sub    rdx,r8
                 cmp    rdx,0fffeh
                 cmovnb rdx,rax
                 mov    [ecx],dx
                 add    dx,2
                 mov    [ecx+2],dx
                 ret
       .end:
                 retn  
    

    edit 2 Running your example code (updated with your fastest version) the string.IndexOf runs faster on my Intel i7, 4GB RAM, Win7 64bit.

    short haystack 500k runs
    2590 ms for IndexOf16
    2287 ms for string.IndexOf
    longer haystack:
    3549 ms for IndexOf16
    2757 ms for string.IndexOf
    

    Optimizations are sometimes very architecture reliant.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

No related questions found

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.