I’ve been trying to research multi-threading a prime-number generator I wrote in C++, and I’ve discovered that what I want to do is called “parallel processing”. I’ve been researching this for about the past 45 minutes, and I just can’t seem to figure it out.
The code I want to do this on is about 95 lines, which is too long to post here, but this is the basic concept:
unsigned long long i, total = 0;
for(i = 0; true; i++){
    total = total + i;
    cout << "Your new total is " << total << endl;
}
Is there any way I could spread this across 2 processors so that they work together instead of racing each other? If so, how would I code it? I’m somewhat familiar with C++, but there’s still a lot that I don’t know, so an in-depth answer would be very much appreciated.
EDIT: I posted the wrong kind of algorithm the first time. I think this is it.
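As written, the loop never ends and every iteration needs the previous total (and prints it), so it cannot be split as-is. A finite sum can be split, though, because addition is associative: each thread sums its own slice and the partial results are added once all threads finish. This is a sketch assuming a fixed upper bound n; `parallel_sum` and `partial_sum` are hypothetical names, and it uses C++11 std::thread rather than anything from the original post.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Sums the integers in [begin, end) into *out. Each thread writes only its
// own output slot, so no locking is needed.
static void partial_sum(unsigned long long begin, unsigned long long end,
                        unsigned long long* out) {
    unsigned long long total = 0;
    for (unsigned long long i = begin; i < end; ++i) {
        total += i;
    }
    *out = total;
}

// Hypothetical helper: sums 0 + 1 + ... + (n - 1) by giving each of
// num_threads threads its own contiguous slice of the range.
unsigned long long parallel_sum(unsigned long long n, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<unsigned long long> results(num_threads, 0);
    std::vector<std::thread> workers;
    unsigned long long chunk = n / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        unsigned long long begin = t * chunk;
        // The last thread also takes any remainder.
        unsigned long long end = (t + 1 == num_threads) ? n : begin + chunk;
        workers.emplace_back(partial_sum, begin, end, &results[t]);
    }
    for (auto& w : workers) {
        w.join();
    }
    // Combine the per-thread results only after every thread has finished.
    unsigned long long total = 0;
    for (auto r : results) {
        total += r;
    }
    return total;
}
```

For example, parallel_sum(1000, 2) returns 499500, the same as summing 0 through 999 sequentially. With g++, compile with -pthread.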
EDIT 2: Since a lot of the answers are saying it depends on my algorithm, I’ll just post my code; it’s only 95 lines.
/*Generic GPL stuff, coded by me */
#include <iostream>
#include <fstream>
using namespace std;
int main(){
    //Declare some variables and what not.
    unsigned long long count = 0, misc = 0, length = 0, limit = 0, value = 0;
    ifstream inFile;
    ofstream outFile;
    cout << "Initializing starting values based on your existing file of generated prime numbers.\n";
    //Now let's get our starting values.
    //First, find the highest prime generated so far (the last number in the file).
    //This assumes primes.txt already ends with an odd prime, because the main loop steps by 2.
    inFile.open("/home/user/Desktop/primes.txt");
    for(unsigned long long x = 1; inFile >> value; x++){
        count = value;
        if(x % 100000000 == 0){
            cout << x / 100000000 << "00000000 primes read so far...\n";
        }
    }
    inFile.close();
    cout << "Highest generated prime found.\n";
    //Now, as much as I hate to say it, we need to parse part of the source file again now that we have the largest prime.
    //Read primes until misc * misc >= count; length ends up as the number of primes read.
    inFile.open("/media/ssd/primes_src.txt");
    for(length = 0; limit < count && (inFile >> misc); length++){
        limit = misc * misc;
    }
    inFile.close();
    cout << "Initialization complete. Now generating primes.\n";
    //Loop forever (a while loop instead of the label and goto).
    while(true){
        //We're just going to flat-out skip even numbers.
        count += 2;
        //Check whether the candidate is beyond the current limit of accuracy.
        if(count >= limit){
            //If it is, we have 1 more possible prime factor.
            length++;
            inFile.open("/media/ssd/primes_src.txt");
            for(unsigned long long x = 0; x < length; x++){
                inFile >> misc;
            }
            inFile.close();
            limit = misc * misc;
        }
        bool isPrime = true;
        inFile.open("/media/ssd/primes_src.txt");
        inFile >> misc; //We don't care about 2, since even numbers are skipped.
        for(unsigned long long x = 1; x < length; x++){
            inFile >> misc;
            if(count % misc == 0){
                isPrime = false;
                break;
            }
        }
        inFile.close();
        //Only append the candidate to the file if no prime divided it.
        if(isPrime){
            outFile.open("/home/user/Desktop/primes.txt", ios::out | ios::app);
            outFile << count << endl;
            outFile.close();
        }
    }
}
/home/user/Desktop/primes.txt is my file holding all generated primes.
/media/ssd/primes_src.txt is my file holding all primes up to 2^32 plus 1 prime for good measure.
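The core of this generator is trial division by the primes up to the square root of the candidate, but it reopens primes_src.txt for every candidate. Here is a sketch assuming the small primes are loaded once into a sorted std::vector (`is_prime_odd` is a hypothetical helper). Because the vector is only read after loading, several threads could share it without any locking.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: trial division of an odd candidate by the primes in
// small_primes (assumed sorted ascending: 2, 3, 5, ...), loaded once into
// memory instead of re-reading primes_src.txt for every candidate.
// Assumes small_primes covers every prime up to sqrt(candidate).
bool is_prime_odd(unsigned long long candidate,
                  const std::vector<unsigned long long>& small_primes) {
    assert(candidate % 2 == 1);
    if (candidate < 3) return false;           // 1 is not prime
    for (unsigned long long p : small_primes) {
        if (p == 2) continue;                  // even numbers are skipped anyway
        if (p * p > candidate) break;          // no factor <= sqrt(candidate)
        if (candidate % p == 0) return false;  // found a divisor
    }
    return true;
}
```

If small_primes stops short of sqrt(candidate), the function wrongly returns true, which is the same "limit of accuracy" issue the original code tracks with limit.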
I don’t know if your algorithm is suitable for this method, but one way I’ve done parallel work is to create multiple threads that all run completely independently, aside from ONE point where each thread updates the “next candidate”. (I was calculating weird numbers, so my update was i = __sync_fetch_and_add(&current, 2), where current is the “numbers processed so far”.) __sync_fetch_and_add() is a built-in in g++, and Microsoft compilers have the same kind of thing, called InterlockedAdd(); strictly, InterlockedExchangeAdd() is the closer match, since like __sync_fetch_and_add() it returns the old value. When I ran my “benchmark”, I got just a fraction short of a 400% improvement from 4 cores on my machine (100% = 1 core).
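The same “grab the next candidate” pattern can be written with C++11 std::atomic, whose fetch_add() is the standard counterpart of __sync_fetch_and_add(): it returns the old value and advances the counter in one atomic step, so no two threads ever receive the same candidate. This is a sketch with a hypothetical function name; the actual per-candidate test is left as a comment, and each thread only counts what it processed.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch: every worker claims the next odd candidate from a
// shared atomic counter until the range [first, limit) is exhausted.
// Returns how many candidates were processed in total.
unsigned long long claim_candidates(unsigned long long first,
                                    unsigned long long limit,
                                    unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<unsigned long long> current(first);
    std::vector<unsigned long long> processed(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&current, &processed, limit, t] {
            for (;;) {
                // Old value is ours; the counter moves on by 2 atomically.
                unsigned long long i = current.fetch_add(2);
                if (i >= limit) break;  // range exhausted: this thread exits
                // ... test candidate i here (e.g. trial division) ...
                ++processed[t];         // each thread touches only its slot
            }
        });
    }
    for (auto& w : workers) w.join();
    unsigned long long total = 0;
    for (auto n : processed) total += n;
    return total;
}
```

claim_candidates(3, 101, 4) returns 49 (the odd numbers 3 through 99) however the four threads happen to interleave.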
I used plain pthread_create(), and each thread exits when it reaches the “max” of the range given in the inputs.
As promised: A simple prime number finder:
Comments: main() starts “threads” number of threads (specified by -t num on the command line; there is also a -e num that defines the “max”). Each thread “picks” a number using the __sync_fetch_and_add() function, then checks whether it’s a prime by iterating j to try to divide the number. If the number is a prime, it’s printed; otherwise the thread just picks the next number.
If you wanted to, instead of printing numbers (and given sufficiently large numbers, you may run into problems calling cout << from within the thread), you can use an array: get a slot with int my_index = __sync_fetch_and_add(&index, 1); and store the result into the array at that index.
Naturally, this method does NOT work if each loop iteration can’t run completely independently; then stuff gets much more complicated.
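The program listing itself did not survive here, so what follows is a minimal sketch of what the comments describe, not the answerer’s actual code. It uses pthreads, __sync_fetch_and_add() to pick numbers, and a second __sync_fetch_and_add() on an index to reserve array slots instead of printing from the threads. The names (`Shared`, `worker`, `find_primes`) are hypothetical, and the -t/-e command-line parsing is replaced by function parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <pthread.h>
#include <vector>

// Shared state (hypothetical names). The two counters are only ever
// modified through GCC's __sync_fetch_and_add builtin.
struct Shared {
    long next;                 // next number to hand out
    long end;                  // the "max" (what -e num sets in the original)
    long index;                // next free slot in *found
    std::vector<long>* found;  // results, written slot by slot
};

// Plain trial division: iterate j to try to divide n.
static bool is_prime(long n) {
    if (n < 2) return false;
    for (long j = 2; j * j <= n; ++j) {
        if (n % j == 0) return false;
    }
    return true;
}

static void* worker(void* arg) {
    Shared* s = static_cast<Shared*>(arg);
    for (;;) {
        long n = __sync_fetch_and_add(&s->next, 1);  // "pick" a number
        if (n >= s->end) break;                        // past the max: exit
        if (is_prime(n)) {
            // Reserve a unique slot rather than printing from the thread.
            long my_index = __sync_fetch_and_add(&s->index, 1);
            (*s->found)[my_index] = n;
        }
    }
    return nullptr;
}

// Finds all primes below `end` with `threads` pthreads, returned sorted.
std::vector<long> find_primes(long end, int threads) {
    assert(threads > 0 && end >= 0);
    std::vector<long> found(static_cast<std::vector<long>::size_type>(end));
    Shared s{2, end, 0, &found};
    std::vector<pthread_t> ids(threads);
    for (int t = 0; t < threads; ++t) {
        pthread_create(&ids[t], nullptr, worker, &s);
    }
    for (int t = 0; t < threads; ++t) {
        pthread_join(ids[t], nullptr);
    }
    found.resize(s.index);                  // drop the unused slots
    std::sort(found.begin(), found.end());  // threads finish out of order
    return found;
}
```

find_primes(30, 4) returns 2, 3, 5, ..., 29. The final sort is needed only because the threads finish candidates in no particular order; like the original, this sketch skips checking pthread_create()’s return value.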
Edit: Note that a lot of useful error checking is missing in this code. If you give zero threads, it won’t do anything; if you give an end value that is negative, who knows what happens; and so on.
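As a sketch, one way to add the missing checks is to parse -t and -e by hand with strtol() and reject zero, negative, non-numeric, or missing values. `parse_args` and `parse_positive` are hypothetical helpers, not part of the original program.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: parses a strictly positive integer, rejecting empty
// strings, trailing junk, negatives, zero and out-of-range values.
static bool parse_positive(const char* text, long& out) {
    if (text == nullptr || *text == '\0') return false;
    char* rest = nullptr;
    errno = 0;
    long value = std::strtol(text, &rest, 10);
    if (errno == ERANGE || *rest != '\0' || value <= 0) return false;
    out = value;
    return true;
}

// Hypothetical sketch of the missing checks for "-t num" and "-e num":
// returns false (leaving the outputs untouched) if either option is
// missing, unknown, or invalid.
bool parse_args(int argc, char** argv, long& threads, long& end) {
    long t = 0, e = 0;
    bool have_t = false, have_e = false;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-t") == 0 && i + 1 < argc) {
            have_t = parse_positive(argv[++i], t);
            if (!have_t) return false;
        } else if (std::strcmp(argv[i], "-e") == 0 && i + 1 < argc) {
            have_e = parse_positive(argv[++i], e);
            if (!have_e) return false;
        } else {
            return false;  // unknown option or missing value
        }
    }
    if (!have_t || !have_e) return false;
    threads = t;
    end = e;
    return true;
}
```

main() could then print a usage message and exit whenever parse_args() returns false, instead of silently doing nothing.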
$ time ./prime -t 1 -e 100000 > /dev/null
and:
$ time ./prime -t 4 -e 100000 > /dev/null
As you can see, it’s pretty much 4x faster.