Editorial Team
Asked: May 10, 2026


Note

The question below was asked in 2008 about some code from 2003. As the OP’s update shows, this entire post has been obsoleted by vintage 2008 algorithms and persists here only as a historical curiosity.


I need to do a fast case-insensitive substring search in C/C++. My requirements are as follows:

  • Should behave like strstr() (i.e. return a pointer to the match point).
  • Must be case-insensitive (doh).
  • Must support the current locale.
  • Must be available on Windows (MSVC++ 8.0) or easily portable to Windows (i.e. from an open source library).

Here is the current implementation I am using (taken from the GNU C Library):

    /* Return the offset of one string within another.
       Copyright (C) 1994,1996,1997,1998,1999,2000 Free Software Foundation, Inc.
       This file is part of the GNU C Library.

       The GNU C Library is free software; you can redistribute it and/or
       modify it under the terms of the GNU Lesser General Public
       License as published by the Free Software Foundation; either
       version 2.1 of the License, or (at your option) any later version.

       The GNU C Library is distributed in the hope that it will be useful,
       but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       Lesser General Public License for more details.

       You should have received a copy of the GNU Lesser General Public
       License along with the GNU C Library; if not, write to the Free
       Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
       02111-1307 USA.  */

    /*
     * My personal strstr() implementation that beats most other algorithms.
     * Until someone tells me otherwise, I assume that this is the
     * fastest implementation of strstr() in C.
     * I deliberately chose not to comment it.  You should have at least
     * as much fun trying to understand it, as I had to write it :-).
     *
     * Stephen R. van den Berg, berg@pool.informatik.rwth-aachen.de */

    /*
     * Modified to use table lookup instead of tolower(), since tolower() isn't
     * worth s*** on Windows.
     *
     * -- Anders Sandvig (anders@wincue.org)
     */

    #if HAVE_CONFIG_H
    # include <config.h>
    #endif

    #include <ctype.h>
    #include <string.h>

    typedef unsigned chartype;

    char char_table[256];

    void init_stristr(void)
    {
      int i;
      char string[2];

      string[1] = '\0';
      for (i = 0; i < 256; i++)
      {
        string[0] = i;
        _strlwr(string);
        char_table[i] = string[0];
      }
    }

    #define my_tolower(a) ((chartype) char_table[a])

    char *
    my_stristr (phaystack, pneedle)
         const char *phaystack;
         const char *pneedle;
    {
      register const unsigned char *haystack, *needle;
      register chartype b, c;

      haystack = (const unsigned char *) phaystack;
      needle = (const unsigned char *) pneedle;

      b = my_tolower (*needle);

      if (b != '\0')
      {
        haystack--;             /* possible ANSI violation */
        do
          {
            c = *++haystack;
            if (c == '\0')
              goto ret0;
          }
        while (my_tolower (c) != (int) b);

        c = my_tolower (*++needle);
        if (c == '\0')
          goto foundneedle;

        ++needle;
        goto jin;

        for (;;)
        {
          register chartype a;
          register const unsigned char *rhaystack, *rneedle;

          do
            {
              a = *++haystack;
              if (a == '\0')
                goto ret0;
              if (my_tolower (a) == (int) b)
                break;
              a = *++haystack;
              if (a == '\0')
                goto ret0;
    shloop:
              ;
            }
          while (my_tolower (a) != (int) b);

    jin:
          a = *++haystack;
          if (a == '\0')
            goto ret0;

          if (my_tolower (a) != (int) c)
            goto shloop;

          rhaystack = haystack-- + 1;
          rneedle = needle;

          a = my_tolower (*rneedle);

          if (my_tolower (*rhaystack) == (int) a)
            do
              {
                if (a == '\0')
                  goto foundneedle;

                ++rhaystack;
                a = my_tolower (*++needle);
                if (my_tolower (*rhaystack) != (int) a)
                  break;

                if (a == '\0')
                  goto foundneedle;

                ++rhaystack;
                a = my_tolower (*++needle);
              }
            while (my_tolower (*rhaystack) == (int) a);

          needle = rneedle;       /* took the register-poor approach */

          if (a == '\0')
            break;
        }
      }
    foundneedle:
      return (char*) haystack;
    ret0:
      return 0;
    }

Can you make this code faster, or do you know of a better implementation?

Note: I noticed that the GNU C Library now has a new implementation of strstr(), but I am not sure how easily it can be modified to be case-insensitive, or if it is in fact faster than the old one (in my case). I also noticed that the old implementation is still used for wide character strings, so if anyone knows why, please share.

Update

Just to make things clear—in case it wasn’t already—I didn’t write this function, it’s a part of the GNU C Library. I only modified it to be case-insensitive.

Also, thanks for the tip about strcasestr() and about checking out implementations from other sources (like OpenBSD, FreeBSD, etc.). It seems to be the way to go. The code above is from 2003, which is why I posted it here in the hope that a better version would be available, which apparently there is. 🙂

1 Answer

  1. Added an answer on May 10, 2026 at 7:45 pm

    The code you posted is noticeably slower than strcasestr: in the benchmark below it takes about 65% longer (6.31 s versus 3.82 s of user time).

        $ gcc -Wall -o my_stristr my_stristr.c
        steve@solaris:~/code/tmp
        $ gcc -Wall -o strcasestr strcasestr.c
        steve@solaris:~/code/tmp
        $ ./bench ./my_stristr > my_stristr.result ; ./bench ./strcasestr > strcasestr.result;
        steve@solaris:~/code/tmp
        $ cat my_stristr.result
        run 1... time = 6.32
        run 2... time = 6.31
        run 3... time = 6.31
        run 4... time = 6.31
        run 5... time = 6.32
        run 6... time = 6.31
        run 7... time = 6.31
        run 8... time = 6.31
        run 9... time = 6.31
        run 10... time = 6.31
        average user time over 10 runs = 6.3120
        steve@solaris:~/code/tmp
        $ cat strcasestr.result
        run 1... time = 3.82
        run 2... time = 3.82
        run 3... time = 3.82
        run 4... time = 3.82
        run 5... time = 3.82
        run 6... time = 3.82
        run 7... time = 3.82
        run 8... time = 3.82
        run 9... time = 3.82
        run 10... time = 3.82
        average user time over 10 runs = 3.8200
        steve@solaris:~/code/tmp

    The main function was:

        int main(void)
        {
                char * needle="hello";
                char haystack[1024];
                int i;

                /* fill the haystack with filler, leaving room for the needle at the end */
                for(i=0;i<sizeof(haystack)-strlen(needle)-1;++i)
                {
                        haystack[i]='A'+i%57;
                }
                memcpy(haystack+i,needle, strlen(needle)+1);
                /*printf("%s\n%d\n", haystack, haystack[strlen(haystack)]);*/
                init_stristr();

                for (i=0;i<1000000;++i)
                {
                        /*my_stristr(haystack, needle);*/
                        strcasestr(haystack,needle);
                }

                return 0;
        }

    It was suitably modified to test both implementations. I notice as I am typing this up I left in the init_stristr call, but it shouldn’t change things too much. bench is just a simple shell script:

        #!/bin/bash
        function bc_calc()
        {
                echo $(echo "scale=4;$1" | bc)
        }
        time="/usr/bin/time -p"
        prog="$1"
        accum=0
        runs=10
        for a in $(jot $runs 1 $runs)
        do
                echo -n "run $a... "
                t=$($time $prog 2>&1| grep user | awk '{print $2}')
                echo "time = $t"
                accum=$(bc_calc "$accum+$t")
        done

        echo -n "average user time over $runs runs = "
        echo $(bc_calc "$accum/$runs")