Some time ago I used the (blazing fast) primesieve in python that I found

Question

0

Editorial Team

Asked: May 20, 20262026-05-20T14:16:37+00:00 2026-05-20T14:16:37+00:00

Some time ago I used the (blazing fast) primesieve in python that I found

0

Some time ago I used the (blazing fast) primesieve in python that I found here: Fastest way to list all primes below N

To be precise, this implementation:

def primes2(n):
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    n, correction = n-n%6+6, 2-(n%6>1)
    sieve = [True] * (n/3)
    for i in xrange(1,int(n**0.5)/3+1):
      if sieve[i]:
        k=3*i+1|1
        sieve[      k*k/3      ::2*k] = [False] * ((n/6-k*k/6-1)/k+1)
        sieve[k*(k-2*(i&1)+4)/3::2*k] = [False] * ((n/6-k*(k-2*(i&1)+4)/6-1)/k+1)
    return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]

Now I can slightly grasp the idea of the optimizing by automaticly skipping multiples of 2, 3 and so on, but when it comes to porting this algorithm to C++ I get stuck (I have a good understanding of python and a reasonable/bad understanding of C++, but good enough for rock ‘n roll).

What I currently have rolled myself is this (isqrt() is just a simple integer square root function):

template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T sievemax = (N-3 + (1-(N % 2))) / 2;
    T i;
    T sievemaxroot = isqrt(sievemax) + 1;

    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();

    primes.push_back(2);

    for (i = 0; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            primes.push_back(2*i+3);
            for (T j = 3*i+3; j <= sievemax; j += 2*i+3) sieve[j] = 0; // filter multiples
        }
    }

    for (; i <= sievemax; i++) {
        if (sieve[i]) primes.push_back(2*i+3);
    }
}

This implementation is decent and automatically skips multiples of 2, but if I could port the Python implementation I think it could be much faster (50%-30% or so).

To compare the results (in the hope this question will be successfully answered), the current execution time with N=100000000, g++ -O3 on a Q6600 Ubuntu 10.10 is 1230ms.

Now I would love some help with either understanding what the above Python implementation does or that you would port it for me (not as helpful though).

EDIT

Some extra information about what I find difficult.

I have trouble with the techniques used like the correction variable and in general how it comes together. A link to a site explaining different Eratosthenes optimizations (apart from the simple sites that say “well you just skip multiples of 2, 3 and 5” and then get slam you with a 1000 line C file) would be awesome.

I don’t think I would have issues with a 100% direct and literal port, but since after all this is for learning that would be utterly useless.

EDIT

After looking at the code in the original numpy version, it actually is pretty easy to implement and with some thinking not too hard to understand. This is the C++ version I came up with. I’m posting it here in full version to help further readers in case they need a pretty efficient primesieve that is not two million lines of code. This primesieve does all primes under 100000000 in about 415 ms on the same machine as above. That’s a 3x speedup, better then I expected!

#include <vector>
#include <boost/dynamic_bitset.hpp>

// http://vault.embedded.com/98/9802fe2.htm - integer square root
unsigned short isqrt(unsigned long a) {
    unsigned long rem = 0;
    unsigned long root = 0;

    for (short i = 0; i < 16; i++) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        root++;

        if (root <= rem) {
            rem -= root;
            root++;
        } else root--;

    }

    return static_cast<unsigned short> (root >> 1);
}

// https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
// https://stackoverflow.com/questions/5293238/porting-optimized-sieve-of-eratosthenes-from-python-to-c/5293492
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T i, j, k, l, sievemax, sievemaxroot;

    sievemax = N/3;
    if ((N % 6) == 2) sievemax++;

    sievemaxroot = isqrt(N)/3;

    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();

    primes.push_back(2);
    primes.push_back(3);

    for (i = 1; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            k = (3*i + 1) | 1;
            l = (4*k-2*k*(i&1)) / 3;

            for (j = k*k/3; j < sievemax; j += 2*k) {
                sieve[j] = 0;
                sieve[j+l] = 0;
            }

            primes.push_back(k);
        }
    }

    for (i = sievemaxroot + 1; i < sievemax; i++) {
        if (sieve[i]) primes.push_back((3*i+1)|1);
    }
}

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-20T14:16:37+00:00

I’ll try to explain as much as I can. The sieve array has an unusual indexing scheme; it stores a bit for each number that is congruent to 1 or 5 mod 6. Thus, a number 6*k + 1 will be stored in position 2*k and k*6 + 5 will be stored in position 2*k + 1. The 3*i+1|1 operation is the inverse of that: it takes numbers of the form 2*n and converts them into 6*n + 1, and takes 2*n + 1 and converts it into 6*n + 5 (the +1|1 thing converts 0 to 1 and 3 to 5). The main loop iterates k through all numbers with that property, starting with 5 (when i is 1); i is the corresponding index into sieve for the number k. The first slice update to sieve then clears all bits in the sieve with indexes of the form k*k/3 + 2*m*k (for m a natural number); the corresponding numbers for those indexes start at k^2 and increase by 6*k at each step. The second slice update starts at index k*(k-2*(i&1)+4)/3 (number k * (k+4) for k congruent to 1 mod 6 and k * (k+2) otherwise) and similarly increases the number by 6*k at each step.

Here’s another attempt at an explanation: let candidates be the set of all numbers that are at least 5 and are congruent to either 1 or 5 mod 6. If you multiply two elements in that set, you get another element in the set. Let succ(k) for some k in candidates be the next element (in numerical order) in candidates that is larger than k. In that case, the inner loop of the sieve is basically (using normal indexing for sieve):

for k in candidates:
  for (l = k; ; l += 6) sieve[k * l] = False
  for (l = succ(k); ; l += 6) sieve[k * l] = False

Because of the limitations on which elements are stored in sieve, that is the same as:

for k in candidates:
  for l in candidates where l >= k:
    sieve[k * l] = False

which will remove all multiples of k in candidates (other than k itself) from the sieve at some point (either when the current k was used as l earlier or when it is used as k now).

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Some time ago I used the (blazing fast) primesieve in python that I found

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply