Editorial Team
Asked: May 21, 2026


My first post here (it’s a real pity I hadn’t discovered this great community earlier).

Anyway, I’ve coded a C function that removes from string s any character contained in string del. I was wondering if there’s room for improvement, speed-wise, especially for the part that looks for chars contained in del, inside the for-loop (I was using strpbrk(), but pmg wisely suggested strchr() ).

Bug-hunters are most welcome too! I think it’s robust, but you never know.

Here’s the code (thanks in advance for any answers)…

Current version

#include <string.h>   // strchr()

// remove from string s any char contained in string del (return modified s)
// alg:
// parse s via cp1, keep desired *cp1's by copying them via cp2 to the start of s
// null terminate & return the trimmed s

char *s_strip(char *s, const char *del)
{
    char *cp1;                      // for parsing the whole s
    char *cp2;                      // for keeping desired *cp1's

    for (cp1=s, cp2=s; *cp1; cp1++ )
        if ( !strchr(del, *cp1) )   // *cp1 is NOT contained in del (thanks pmg!)
            *cp2++ = *cp1;          // copy it via cp2

    *cp2 = 0;                       // null terminate the trimmed s
    return s;
}

Original version

#include <string.h>   // strpbrk()

char *s_strip(char *s, const char *del)
{
    char *cp1;                              // for parsing the whole s
    char *cp2;                              // for keeping desired *cp1's

    for (cp1=s, cp2=s; *cp1; cp1++ )
        if ( cp1 != strpbrk(cp1, del) ) {   // *cp1 is NOT contained in del
            *cp2 = *cp1;                    // copy it via cp2
            cp2++;
        }

    *cp2 = 0;                               // null terminate the trimmed s
    return s;
}

1 Answer

  1. Editorial Team
    Added an answer on May 21, 2026 at 7:52 pm

    How long are the input strings going to be, and how many characters are you going to be deleting, on average? If you work with 8-bit code sets and longer source strings, think about:

    char *s_strip(char *s, const char *del)
    {
        char map[256] = { 0 };
        const unsigned char *up1 = (const unsigned char *)del;
        unsigned char *up2 = (unsigned char *)s;
        unsigned char *up3 = (unsigned char *)s;
    
        while (*up1 != '\0')
            map[*up1++] = 1;
    
        for ( ; *up2 != '\0'; up2++)
        {
            if (map[*up2] == 0)
                *up3++ = *up2;
        }
        *up3 = '\0';
    
        return (char *)up3;
    }
    

    This replaces the function call (strpbrk() or strchr()) with a simple array lookup at the cost of initializing the array on entry to the function. If the processed strings are fairly long, this could easily pay itself back in the vastly reduced search time.

    This version of the code returns a pointer to the null at the end of the string – which allows you to compute the length of the compressed string on return without calling strlen(). I find this a more useful design than returning the pointer to the start of the string – I already knew where the string started, but I no longer know where it ends, yet the function s_strip() does know that. (If we were able to start over, I’d lobby for the same design for strcpy() and strcat() too.)

    The coercions to ‘unsigned char’ ensure that it works properly on machines with signed and unsigned characters. I’m not really happy with the slathering of casts, but it is hard to avoid them (well, the cast assigning s to up3 could be avoided by copying up2 instead).

    You probably could come up with a design that handles signed characters too, along the lines of:

    char realmap[256] = { 0 };
    char *map = &realmap[128];
    

    You can then rely on the signed characters being in a range such that map[-128] .. map[+127] still lands within the realmap array.


    The function above has been corrected a couple of times since it was first posted – it now seems to work consistently with the two solutions suggested in the question. I put the three solutions into a test harness to time them, mainly because I was curious about how the map code would perform, having lost out to assembler code inline functions in the past. The mapping solution very quickly shows itself to be fastest on my test system (Mac Mini 2 GHz Intel Core 2 Duo, Mac OS X 10.6.7, GCC 4.1.2).

    The times are average times in seconds. Each algorithm was given 10 runs.
    The size is the size of the source string; the delete string was 22 characters (6 each of lower-case letters, upper-case letters and digits, plus 4 punctuation). The data was a fixed string built from the letters, digits and punctuation in ASCII, repeated as often as necessary to reach the stated size, which excludes the terminating null. Note that the timing includes copying the source string each time – the ‘null’ column indicates the time spent on the copying.

    size     map        strchr     strpbrk    null       micro1     micro2
       2     0.000542   0.002292   0.001009   0.000106   0.000639   0.000707
       8     0.000654   0.004125   0.017524   0.000106   0.001012   0.000966
      32     0.001667   0.015815   0.063314   0.000196   0.002549   0.002247
     128     0.006385   0.064513   0.313749   0.000171   0.008455   0.007188
     512     0.022231   0.257910   1.293040   0.000282   0.013284   0.011829
    2048     0.089066   1.035052   5.297966   0.000819   0.043391   0.037597
    

    Even with very short source strings, the map algorithm is faster than either the strchr() or strpbrk() algorithms. On the figures above it is roughly 4 to 12 times as fast as the strchr() algorithm, and from about 2 times (size 2) to nearly 60 times (size 2048) as fast as the strpbrk() algorithm, with the disparity growing with source string size. (I did not expect this result – because there is setup overhead in the map code.)

    The ‘micro1’ and ‘micro2’ algorithms correspond to the modifications suggested by AShelly. When the strings are long enough (somewhere between 128 and 512 bytes is the switch-over), then the micro-optimized versions are quicker than the simple map.


    Test Code

    Contact me (see my profile) if you want the source for timer.c, timer.h. I’d normally build them into a library and link with the library, but it was simpler to include the files in the program this time.

    #include <string.h>
    
    extern char *s_strip1(char *s, const char *del);
    extern char *s_strip2(char *s, const char *del);
    extern char *s_strip3(char *s, const char *del);
    
    char *s_strip3(char *s, const char *del)
    {
        char map[256] = { 0 };
        const unsigned char *up1 = (const unsigned char *)del;
        unsigned char *up2 = (unsigned char *)s;
        unsigned char *up3 = (unsigned char *)s;
    
        while (*up1 != '\0')
            map[*up1++] = 1;
    
        for ( ; *up2 != '\0'; up2++)
        {
            if (map[*up2] == 0)
                *up3++ = *up2;
        }
        *up3 = '\0';
    
        return (char *)up3;
    }
    
    char *s_strip2(char *s, const char *del)
    {
        char *cp1;
        char *cp2;
    
        for (cp1=s, cp2=s; *cp1; cp1++ )
            if ( !strchr(del, *cp1) )
                *cp2++ = *cp1;
    
        *cp2 = 0;
        return s;
    }
    
    char *s_strip1(char *s, const char *del)
    {
        char *cp1;
        char *cp2;
    
        for (cp1=s, cp2=s; *cp1; cp1++ )
            if ( cp1 != strpbrk(cp1, del) ) {
                *cp2 = *cp1;
                cp2++;
            }
    
        *cp2 = 0;
        return s;
    }
    
    #include <stdio.h>
    #include "timer.h"
    #include "timer.c"
    
    enum { NUM_REPEATS = 10000 };
    typedef char *(*Function)(char *str, const char *del);
    
    static void fill_bytes(char *buffer, size_t buflen)
    {
        static const char source[] =
            "abcdefghijklmnopqrstuvwxyz"
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "0123456789[]{}\\|,./?><;:'\"=+-_)(*&^%$#@!";
        char *end = buffer + buflen;
    
        while (buffer < end)
        {
            size_t numbytes = sizeof(source) - 1;
            if ((size_t)(end - buffer) < sizeof(source)-1)
                numbytes = end - buffer;
            memmove(buffer, source, numbytes);
            buffer += numbytes;
        }
    }
    
    static void test(Function f, const char *fn, const char *del, size_t numbytes)
    {
        Clock clk;
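        char refbuf[numbytes + 1];      // +1 for the terminating null
        char buffer[numbytes + 1];
        char clkbuf[32];
        fill_bytes(refbuf, numbytes);
        refbuf[numbytes] = '\0';        // fill_bytes() does not terminate the string
        strcpy(buffer, refbuf);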
        clk_init(&clk);
        clk_start(&clk);
        for (size_t i = 0; i < NUM_REPEATS; i++)
        {
            memmove(buffer, refbuf, sizeof(buffer));
            if (f)
                (*f)(buffer, del);
        }
        clk_stop(&clk);
        printf("%-17s (%4zu) = %10s (%.64s)\n", fn, numbytes,
               clk_elapsed_us(&clk, clkbuf, sizeof(clkbuf)), buffer);
    }
    
    int main(void)
    {
        for (int size = 2; size <= 2048; size = size * 4)
        {
            for (int i = 0; i < 10; i++)
            {
               test(s_strip1, "s_strip1:strpbrk:", "AJQRSTajqrst234567=+[]", size);
               test(s_strip2, "s_strip2:strchr:",  "AJQRSTajqrst234567=+[]", size);
               test(s_strip3, "s_strip3:map",      "AJQRSTajqrst234567=+[]", size);
               test(0,        "s_strip4:null",     "AJQRSTajqrst234567=+[]", size);
            }
        }
        return 0;
    }
    

    Micro-optimizations

    extern char *s_strip4(char *s, const char *del);
    extern char *s_strip5(char *s, const char *del);
    
    char *s_strip5(char *s, const char *del)
    {
        char map[256];
        const unsigned char *up1 = (const unsigned char *)del;
        unsigned char *up2 = (unsigned char *)s;
        unsigned char *up3 = (unsigned char *)s;
    
        memset(map, 1, sizeof(map));
    
        while (*up1 != '\0')
            map[*up1++] = 0;
    
        for ( ; *up2 != '\0'; up2++)
        {
            *up3 = *up2;
            up3 += map[*up2];
        }
        *up3 = '\0';
    
        return (char *)up3;
    }
    
    char *s_strip4(char *s, const char *del)
    {
        char map[256] = { 0 };
        const unsigned char *up1 = (const unsigned char *)del;
        unsigned char *up2 = (unsigned char *)s;
        unsigned char *up3 = (unsigned char *)s;
    
        while (*up1 != '\0')
            map[*up1++] = 1;
    
        for ( ; *up2 != '\0'; up2++)
        {
            *up3 = *up2;
            up3 += !map[*up2];
        }
        *up3 = '\0';
    
        return (char *)up3;
    }
    