I’m going to have to code a very basic checksum function, something like: char


Asked: May 23, 2026


I’m going to have to code a very basic checksum function, something like:

char sum(const char * data, const int len)
{
    char sum(0);
    for (const char * end=data+len ; data<end ; ++data)
        sum += *data;
    return sum;
}

That’s trivial. Now, how should I optimize this?
First, I should probably use some std::for_each with a lambda or something like that:

#include <algorithm>  // std::for_each

char sum2(const char * data, const int len)
{
    char sum(0);
    // Same byte-wise sum as above, expressed with a lambda.
    std::for_each(data, data+len, [&sum](char b){sum+=b;});
    return sum;
}

Next, I could use multiple threads/cores to sum up chunks, then add the results. I won’t write it down. I’m afraid the cost of creating threads (or getting them from a pool), cutting up the array, and dispatching everything would outweigh the gain, since I would mostly calculate checksums for small arrays: usually 10-100 bytes, rarely up to 1000.

But what I really want is something lower level: some SIMD instruction that would sum bytes held in 128-bit registers, or add two registers byte by byte without propagating the carry from one byte into the next, or both.

Is there any such thing out there?

Note: This IS actual premature optimization, but it’s fun, so what the hell?

Edit: I still need a way to sum up all the bytes in an SSE register, something better than

char ptr[16];
_mm_storeu_si128((__m128i*)ptr, sum);
checksum += ptr[0] + ptr[1] + ptr[2]  + ptr[3]  + ptr[4]  + ptr[5]  + ptr[6]  + ptr[7]
          + ptr[8] + ptr[9] + ptr[10] + ptr[11] + ptr[12] + ptr[13] + ptr[14] + ptr[15];


1 Answer

Editorial Team · Answer 1 · 2026-05-23T18:42:15+00:00

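Yes, there are such instructions. The MMX instruction set has a “Packed ADD” for bytes (PADDB), which adds eight bytes at once with no carry between them:

_mm_add_pi8 (Intel intrinsic from <mmintrin.h>, available in Visual C++ and gcc)
__builtin_ia32_paddb (gcc-specific builtin)

And the SSE2 instruction set has the 128-bit version, which adds sixteen bytes at once:

_mm_add_epi8 (Intel intrinsic from <emmintrin.h>, available in Visual C++ and gcc)
__builtin_ia32_paddb128 (gcc-specific builtin)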
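To make this concrete, here is a minimal sketch of the checksum loop built on _mm_add_epi8. The function name checksum_sse2 is made up for illustration, and I'm assuming unsigned char data (rather than the char in the question) so the wrap-around modulo 256 is well defined. The final reduction just spills the register to memory, as in the question's edit; it is not meant to be the fastest option.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical helper: sum of all bytes modulo 256, 16 bytes per iteration.
unsigned char checksum_sse2(const unsigned char* data, std::size_t len)
{
    __m128i acc = _mm_setzero_si128();  // sixteen independent 8-bit partial sums
    std::size_t i = 0;

    // Main loop: unaligned load, then a byte-wise add with no carry between lanes.
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        acc = _mm_add_epi8(acc, chunk);
    }

    // Reduce the sixteen partial sums by spilling them to memory.
    alignas(16) unsigned char lanes[16];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    unsigned char sum = 0;
    for (unsigned char lane : lanes)
        sum = static_cast<unsigned char>(sum + lane);

    // Scalar tail for the last len % 16 bytes.
    for (; i < len; ++i)
        sum = static_cast<unsigned char>(sum + data[i]);

    return sum;
}
```

For the 10-100 byte inputs mentioned in the question, the scalar tail and the reduction make up a large share of the work, so it is worth measuring against the plain loop before adopting this.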

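EDIT: A faster way to add the partial sums:

__m128i sums;  // assumed to already hold the sixteen per-byte partial sums

// Each step adds the register to a byte-shifted copy of itself;
// after the fourth step, byte 0 holds the sum of all sixteen bytes (mod 256).
sums = _mm_add_epi8(sums, _mm_srli_si128(sums, 1));
sums = _mm_add_epi8(sums, _mm_srli_si128(sums, 2));
sums = _mm_add_epi8(sums, _mm_srli_si128(sums, 4));
sums = _mm_add_epi8(sums, _mm_srli_si128(sums, 8));
// Only the low byte is meaningful, so truncate before adding.
checksum += static_cast<char>(_mm_cvtsi128_si32(sums));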
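For reference, here is the same reduction wrapped in a function, next to an alternative that is not part of the answer above: _mm_sad_epu8 (PSADBW, also SSE2) against a zero register, which sums the bytes of each 64-bit half in one instruction. The names hsum_shift and hsum_sad are made up for this sketch, and both return the total truncated to one byte.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Shift-and-add reduction: byte 0 ends up holding the sum of all 16 bytes (mod 256).
unsigned char hsum_shift(__m128i v)
{
    v = _mm_add_epi8(v, _mm_srli_si128(v, 1));
    v = _mm_add_epi8(v, _mm_srli_si128(v, 2));
    v = _mm_add_epi8(v, _mm_srli_si128(v, 4));
    v = _mm_add_epi8(v, _mm_srli_si128(v, 8));
    return static_cast<unsigned char>(_mm_cvtsi128_si32(v));  // keep the low byte only
}

// Alternative: sum of absolute differences against zero yields one 16-bit
// byte sum per 64-bit half; add the two halves and truncate to a byte.
unsigned char hsum_sad(__m128i v)
{
    __m128i s = _mm_sad_epu8(v, _mm_setzero_si128());
    s = _mm_add_epi32(s, _mm_srli_si128(s, 8));  // low half + high half
    return static_cast<unsigned char>(_mm_cvtsi128_si32(s));
}
```

Both functions should agree on any input; which one is faster depends on the CPU, so treat the second as something to benchmark rather than a guaranteed win.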
