my first post here (it’s a real pity I hadn’t discovered this great community

Asked: May 21, 2026

my first post here (it’s a real pity I hadn’t discovered this great community earlier).

Anyway, I’ve coded a C function that removes from string s any character contained in string del. I was wondering if there’s room for improvement, speed-wise, especially for the part that looks for chars contained in del, inside the for-loop (I was using strpbrk(), but pmg wisely suggested strchr() ).

Bug-hunters are most welcome too! I think it’s robust, but you never know.

Here’s the code (thanks in advance for any answers)…

Current version

#include <string.h>   // for strchr()

// remove from string s any char contained in string del (return modified s)
// alg:
// parse s via cp1, keep desired *cp1's by copying them via cp2 to the start of s
// null terminate & return the trimmed s

char *s_strip(char *s, const char *del)
{
    char *cp1;                      // for parsing the whole s
    char *cp2;                      // for keeping desired *cp1's

    for (cp1=s, cp2=s; *cp1; cp1++ )
        if ( !strchr(del, *cp1) )   // *cp1 is NOT contained in del (thanks pmg!)
            *cp2++ = *cp1;          // copy it via cp2

    *cp2 = 0;                       // null terminate the trimmed s
    return s;
}
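For the bug-hunters: one thing worth spelling out is that s is modified in place, so it has to point at writable storage (an array, not a string literal), and neither argument may be NULL. Here is a minimal usage sketch under those assumptions; the helper name s_strip_usage is made up for illustration, and s_strip is repeated only so the snippet compiles on its own.

```c
#include <assert.h>
#include <string.h>

/* Same algorithm as the current version above. */
char *s_strip(char *s, const char *del)
{
    char *cp1;                      /* for parsing the whole s */
    char *cp2;                      /* for keeping desired *cp1's */

    for (cp1 = s, cp2 = s; *cp1; cp1++)
        if (!strchr(del, *cp1))     /* *cp1 is NOT contained in del */
            *cp2++ = *cp1;

    *cp2 = 0;
    return s;
}

/* Hypothetical usage check, not part of the original function. */
void s_strip_usage(void)
{
    char text[] = "Hello, World!";  /* writable array: s is modified in place */
    char same[] = "abc";

    assert(strcmp(s_strip(text, "lo,!"), "He Wrd") == 0);
    assert(strcmp(s_strip(same, ""), "abc") == 0);  /* empty del removes nothing */
}
```

Calling s_strip("Hello", "l") with a string literal as the first argument would be undefined behaviour, since the function writes through s.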

Original version

char *s_strip(char *s, const char *del)
{
    char *cp1;                              // for parsing the whole s
    char *cp2;                              // for keeping desired *cp1's

    for (cp1=s, cp2=s; *cp1; cp1++ )
        if ( cp1 != strpbrk(cp1, del) ) {   // *cp1 is NOT contained in del
            *cp2 = *cp1;                    // copy it via cp2
            cp2++;
        }

    *cp2 = 0;                               // null terminate the trimmed s
    return s;
}

1 Answer

Editorial Team · Answer 1 · 2026-05-21T19:52:30+00:00

How long are the input strings going to be, and how many characters are you going to be deleting, on average? If you work with 8-bit code sets and longer source strings, think about:

char *s_strip(char *s, const char *del)
{
    char map[256] = { 0 };
    const unsigned char *up1 = (const unsigned char *)del;
    unsigned char *up2 = (unsigned char *)s;
    unsigned char *up3 = (unsigned char *)s;

    while (*up1 != '\0')
        map[*up1++] = 1;

    for ( ; *up2 != '\0'; up2++)
    {
        if (map[*up2] == 0)
            *up3++ = *up2;
    }
    *up3 = '\0';

    return (char *)up3;
}

This replaces the function call (strpbrk() or strchr()) with a simple array lookup, at the cost of initializing the array on entry to the function. If the processed strings are fairly long, this could easily pay for itself in the vastly reduced search time.

This version of the code returns a pointer to the null at the end of the string – which allows you to compute the length of the compressed string on return without calling strlen(). I find this a more useful design than returning the pointer to the start of the string – I already knew where the string started, but I no longer know where it ends, yet the function s_strip() does know that. (If we were able to start over, I’d lobby for the same design for strcpy() and strcat() too.)
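To illustrate the point about the return value, here is a small sketch of how a caller might use the end pointer to get the new length without strlen(). The helper name strip_length is an assumption for this example; s_strip is the map version from above, repeated so the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map-based version from above: returns a pointer to the terminating null. */
char *s_strip(char *s, const char *del)
{
    char map[256] = { 0 };
    const unsigned char *up1 = (const unsigned char *)del;
    unsigned char *up2 = (unsigned char *)s;
    unsigned char *up3 = (unsigned char *)s;

    while (*up1 != '\0')
        map[*up1++] = 1;            /* mark characters to delete */

    for ( ; *up2 != '\0'; up2++)
    {
        if (map[*up2] == 0)
            *up3++ = *up2;          /* keep unmarked characters */
    }
    *up3 = '\0';

    return (char *)up3;
}

/* Hypothetical helper: strip s in place and return the new length. */
size_t strip_length(char *s, const char *del)
{
    assert(s != NULL && del != NULL);
    char *end = s_strip(s, del);
    return (size_t)(end - s);       /* pointer difference, no strlen() needed */
}
```

With the start-of-string return convention, the same caller would have to make a second pass over the result to find its length.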

The coercions to ‘unsigned char’ ensure that it works properly on machines with signed and unsigned characters. I’m not really happy with the slathering of casts, but it is hard to avoid them (well, the cast assigning s to up3 could be avoided by copying up2 instead).

You probably could come up with a design that handles signed characters too, along the lines of:

char realmap[256] = { 0 };
char *map = &realmap[128];

You can then rely on the signed characters being in a range such that map[-128] .. map[+127] still lands within the realmap array.
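A rough sketch of how that offset-map idea might look as a complete function. The name s_strip_signed is hypothetical, and the code assumes 8-bit bytes with signed char covering -128 .. +127. It goes through explicit signed char pointers so that it also behaves on platforms where plain char is unsigned, which means the casts do not go away.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#if CHAR_BIT != 8 || SCHAR_MIN != -128
#error "this sketch assumes 8-bit signed char in the range -128..127"
#endif

/* Hypothetical variant using a map indexed by signed character values. */
char *s_strip_signed(char *s, const char *del)
{
    char realmap[256] = { 0 };
    char *map = &realmap[128];      /* map[-128] .. map[+127] stay inside realmap */
    const signed char *sp1 = (const signed char *)del;
    signed char *sp2 = (signed char *)s;
    signed char *sp3 = (signed char *)s;

    assert(s != NULL && del != NULL);

    while (*sp1 != '\0')
        map[*sp1++] = 1;            /* index may be negative; that is intended */

    for ( ; *sp2 != '\0'; sp2++)
    {
        if (map[*sp2] == 0)
            *sp3++ = *sp2;
    }
    *sp3 = '\0';

    return (char *)sp3;             /* pointer to the terminating null */
}
```

Indexing with plain char instead would only be safe where char is signed; on an unsigned-char platform, bytes 128 .. 255 would index past the end of realmap.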

The function above has been corrected a couple of times since it was first posted; it now appears to produce the same stripped strings as the two solutions suggested in the question. I put the three solutions into a test harness to time them, mainly because I was curious about how the map code would perform, having lost out to assembler code inline functions in the past. The mapping solution very quickly shows itself to be fastest on my test system (Mac Mini, 2 GHz Intel Core 2 Duo, Mac OS X 10.6.7, GCC 4.1.2).

The times are average times in seconds. Each algorithm was given 10 runs. The size is the size of the source string; the delete string was 22 characters (6 each of lower-case letters, upper-case letters and digits, plus 4 punctuation). The data was a fixed string built from the letters, digits and punctuation in ASCII, repeated as often as necessary to reach the stated size, which excludes the terminating null. Note that the timing includes copying the source string each time; the ‘null’ column indicates the time spent on the copying.

size     map        strchr     strpbrk    null       micro1     micro2
   2     0.000542   0.002292   0.001009   0.000106   0.000639   0.000707
   8     0.000654   0.004125   0.017524   0.000106   0.001012   0.000966
  32     0.001667   0.015815   0.063314   0.000196   0.002549   0.002247
 128     0.006385   0.064513   0.313749   0.000171   0.008455   0.007188
 512     0.022231   0.257910   1.293040   0.000282   0.013284   0.011829
2048     0.089066   1.035052   5.297966   0.000819   0.043391   0.037597

Even with very short source strings, the map algorithm is faster than either the strchr() or strpbrk() algorithms. From the table, it is roughly 4 to 12 times as fast as the strchr() algorithm, and from about 2 times as fast as the strpbrk() algorithm at size 2 to nearly 60 times as fast at size 2048, with the disparity growing with the source string size. (I did not expect this result, because there is setup overhead in the map code.)

The ‘micro1’ and ‘micro2’ algorithms correspond to the modifications suggested by AShelly. When the strings are long enough (somewhere between 128 and 512 bytes is the switch-over), then the micro-optimized versions are quicker than the simple map.

Test Code

Contact me (see my profile) if you want the source for timer.c, timer.h. I’d normally build them into a library and link with the library, but it was simpler to include the files in the program this time.

#include <string.h>

extern char *s_strip1(char *s, const char *del);
extern char *s_strip2(char *s, const char *del);
extern char *s_strip3(char *s, const char *del);

char *s_strip3(char *s, const char *del)
{
    char map[256] = { 0 };
    const unsigned char *up1 = (const unsigned char *)del;
    unsigned char *up2 = (unsigned char *)s;
    unsigned char *up3 = (unsigned char *)s;

    while (*up1 != '\0')
        map[*up1++] = 1;

    for ( ; *up2 != '\0'; up2++)
    {
        if (map[*up2] == 0)
            *up3++ = *up2;
    }
    *up3 = '\0';

    return (char *)up3;
}

char *s_strip2(char *s, const char *del)
{
    char *cp1;
    char *cp2;

    for (cp1=s, cp2=s; *cp1; cp1++ )
        if ( !strchr(del, *cp1) )
            *cp2++ = *cp1;

    *cp2 = 0;
    return s;
}

char *s_strip1(char *s, const char *del)
{
    char *cp1;
    char *cp2;

    for (cp1=s, cp2=s; *cp1; cp1++ )
        if ( cp1 != strpbrk(cp1, del) ) {
            *cp2 = *cp1;
            cp2++;
        }

    *cp2 = 0;
    return s;
}

#include <stdio.h>
#include "timer.h"
#include "timer.c"

enum { NUM_REPEATS = 10000 };
typedef char *(*Function)(char *str, const char *del);

static void fill_bytes(char *buffer, size_t buflen)
{
    static const char source[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789[]{}\\|,./?><;:'\"=+-_)(*&^%$#@!";
    char *end = buffer + buflen;

    while (buffer < end)
    {
        size_t numbytes = sizeof(source) - 1;
        if ((size_t)(end - buffer) < sizeof(source)-1)
            numbytes = end - buffer;
        memmove(buffer, source, numbytes);
        buffer += numbytes;
    }
}

static void test(Function f, const char *fn, const char *del, size_t numbytes)
{
    Clock clk;
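    char refbuf[numbytes + 1];      // +1: the stated size excludes the terminating null
    char buffer[numbytes + 1];
    char clkbuf[32];
    fill_bytes(refbuf, numbytes);
    refbuf[numbytes] = '\0';        // fill_bytes() does not terminate the string
    strcpy(buffer, refbuf);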
    clk_init(&clk);
    clk_start(&clk);
    for (size_t i = 0; i < NUM_REPEATS; i++)
    {
        memmove(buffer, refbuf, sizeof(buffer));
        if (f)
            (*f)(buffer, del);
    }
    clk_stop(&clk);
    printf("%-17s (%4zu) = %10s (%.64s)\n", fn, numbytes,
           clk_elapsed_us(&clk, clkbuf, sizeof(clkbuf)), buffer);
}

int main(void)
{
    for (int size = 2; size <= 2048; size = size * 4)
    {
        for (int i = 0; i < 10; i++)
        {
           test(s_strip1, "s_strip1:strpbrk:", "AJQRSTajqrst234567=+[]", size);
           test(s_strip2, "s_strip2:strchr:",  "AJQRSTajqrst234567=+[]", size);
           test(s_strip3, "s_strip3:map",      "AJQRSTajqrst234567=+[]", size);
           test(0,        "s_strip4:null",     "AJQRSTajqrst234567=+[]", size);
        }
    }
    return 0;
}
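Since timer.c and timer.h are not shown here, the same kind of measurement can be approximated with the standard clock() function. This is only a sketch under that substitution: time_strip, strip_map and the fill pattern are names and choices made up for this example, and clock() measures processor time at a coarser resolution than the timings in the table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef char *(*Function)(char *str, const char *del);

/* Map-based strip, as in s_strip3 above. */
char *strip_map(char *s, const char *del)
{
    char map[256] = { 0 };
    const unsigned char *up1 = (const unsigned char *)del;
    unsigned char *up2 = (unsigned char *)s;
    unsigned char *up3 = (unsigned char *)s;

    while (*up1 != '\0')
        map[*up1++] = 1;
    for ( ; *up2 != '\0'; up2++)
        if (map[*up2] == 0)
            *up3++ = *up2;
    *up3 = '\0';
    return (char *)up3;
}

/* Hypothetical timing helper: returns elapsed CPU seconds for 'repeats'
 * calls of f on a fresh numbytes-long string, or -1.0 if allocation fails.
 * Passing f == NULL measures only the copying (the 'null' case). */
double time_strip(Function f, const char *del, size_t numbytes, int repeats)
{
    static const char pattern[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    char *ref = malloc(numbytes + 1);
    char *buf = malloc(numbytes + 1);
    double seconds = -1.0;

    assert(del != NULL && repeats > 0);

    if (ref != NULL && buf != NULL)
    {
        for (size_t i = 0; i < numbytes; i++)
            ref[i] = pattern[i % (sizeof(pattern) - 1)];
        ref[numbytes] = '\0';       /* terminate the reference string */

        clock_t start = clock();
        for (int i = 0; i < repeats; i++)
        {
            memcpy(buf, ref, numbytes + 1);   /* restore the source each time */
            if (f != NULL)
                (*f)(buf, del);
        }
        seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    free(ref);
    free(buf);
    return seconds;
}
```

A call such as time_strip(strip_map, "AJQRSTajqrst234567=+[]", 2048, 10000) would then correspond loosely to one run of the map column, though the absolute numbers will differ by machine and compiler.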

Micro-optimizations

extern char *s_strip4(char *s, const char *del);
extern char *s_strip5(char *s, const char *del);

char *s_strip5(char *s, const char *del)
{
    char map[256];
    const unsigned char *up1 = (const unsigned char *)del;
    unsigned char *up2 = (unsigned char *)s;
    unsigned char *up3 = (unsigned char *)s;

    memset(map, 1, sizeof(map));

    while (*up1 != '\0')
        map[*up1++] = 0;

    for ( ; *up2 != '\0'; up2++)
    {
        *up3 = *up2;
        up3 += map[*up2];
    }
    *up3 = '\0';

    return (char *)up3;
}

char *s_strip4(char *s, const char *del)
{
    char map[256] = { 0 };
    const unsigned char *up1 = (const unsigned char *)del;
    unsigned char *up2 = (unsigned char *)s;
    unsigned char *up3 = (unsigned char *)s;

    while (*up1 != '\0')
        map[*up1++] = 1;

    for ( ; *up2 != '\0'; up2++)
    {
        *up3 = *up2;
        up3 += !map[*up2];
    }
    *up3 = '\0';

    return (char *)up3;
}
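Because the branch-free loops always store a byte and only conditionally advance the output pointer, it seems worth checking that they still leave the same string behind as the straightforward code. A minimal sketch of such a check, assuming the hypothetical names strip_strchr, strip_branchless and variants_agree; the two strip functions mirror the strchr() version from the question and s_strip5 above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* strchr() version from the question. */
static char *strip_strchr(char *s, const char *del)
{
    char *src, *dst;

    for (src = s, dst = s; *src; src++)
        if (!strchr(del, *src))
            *dst++ = *src;
    *dst = '\0';
    return s;
}

/* Branch-free map version, same idea as s_strip5. */
static char *strip_branchless(char *s, const char *del)
{
    char map[256];
    const unsigned char *up1 = (const unsigned char *)del;
    unsigned char *up2 = (unsigned char *)s;
    unsigned char *up3 = (unsigned char *)s;

    memset(map, 1, sizeof(map));    /* 1 = keep */
    while (*up1 != '\0')
        map[*up1++] = 0;            /* 0 = delete */
    for ( ; *up2 != '\0'; up2++)
    {
        *up3 = *up2;                /* always store the byte ... */
        up3 += map[*up2];           /* ... but only step past it if it is kept */
    }
    *up3 = '\0';
    return (char *)up3;
}

/* Hypothetical helper: 1 if both variants produce the same string,
 * 0 otherwise (including allocation failure). */
int variants_agree(const char *text, const char *del)
{
    assert(text != NULL && del != NULL);

    size_t size = strlen(text) + 1;
    char *a = malloc(size);
    char *b = malloc(size);
    int same = 0;

    if (a != NULL && b != NULL)
    {
        memcpy(a, text, size);
        memcpy(b, text, size);
        strip_strchr(a, del);
        strip_branchless(b, del);
        same = (strcmp(a, b) == 0);
    }
    free(a);
    free(b);
    return same;
}
```

Note that the two functions differ in what they return (start of string versus terminating null), so the check compares the buffers rather than the return values.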
