I am trying to use OpenMP in my program (I am new to OpenMP), and the program produces errors in two places.
Here is some example code:
#include <iostream>
#include <cstdint>
#include <vector>
#include <boost/multi_array.hpp>
#include <omp.h>

class CNachbarn {
public:
    CNachbarn () { a = 0; }
    uint32_t Get_Next_Neighbor() { return a++; }
private:
    uint32_t a;
};

class CNetwork {
public:
    CNetwork ( uint32_t num_elements_ );
    ~CNetwork();
    void Validity();
    void Clean();
private:
    uint32_t num_elements;
    uint32_t nachbar;
    std::vector<uint32_t> remove_node_v;
    CNachbarn *Nachbar;
};

CNetwork::CNetwork( uint32_t num_elements_ ) {
    num_elements = num_elements_;
    Nachbar = new CNachbarn();
    remove_node_v.reserve( num_elements );
}

CNetwork::~CNetwork() {
    delete Nachbar;
}

inline void CNetwork::Validity() {
    #pragma omp parallel for
    for ( uint32_t i = 0 ; i < num_elements ; i++ ) {
        #pragma omp critical
        remove_node_v.push_back(i);
    }
}

void CNetwork::Clean () {
    #pragma omp parallel for
    for ( uint8_t j = 0 ; j < 2 ; j++ ) {
        nachbar = Nachbar->Get_Next_Neighbor();
        // j is the loop index; cast it so it prints as a number, not a character
        std::cout << "j: " << static_cast<unsigned>(j) << ", neighbor: " << nachbar << std::endl;
    }
    remove_node_v.clear();
}

int main() {
    uint32_t num_elements = 1u << 3;
    uint32_t i = 0;
    CNetwork Network( num_elements );
    do {
        Network.Validity();
        Network.Clean();
    } while (++i < 2);
    return 0;
}

I would like to know:
- Is #pragma omp critical a good solution for push_back()? (Does it solve this problem?) Would it be better to give each thread its own vector and then combine them (using insert()), or to use some kind of lock?
- In my original code I get a runtime error at:
nachbar = Nachbar->Get_Next_Neighbor( &remove_node_v[i] );
but not in this example. Nevertheless, I would like OpenMP to use as many CNachbarn instances as there are cores, since CNachbarn performs a recursive computation and should not be influenced by the other threads. The question is how to do this in a smart way. (I don't think it is smart to construct CNachbarn each time I start the for loop, since I call this function more than a million times in my simulation and time is important.)
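Concerning your first problem:
Your function Validity is a perfect way to achieve below-serial performance in a parallel loop: the critical section lets only one thread call push_back() at a time, so the loop effectively runs serially with added synchronization overhead. However, you already gave the correct answer yourself. You should fill an independent vector for each thread and merge them afterwards.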
EDIT: A possible remedy could look like this (if you need the elements in serial order, you will have to change the loop a bit):
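As a hedged sketch of that idea, each thread can fill a private vector and append it to the shared one exactly once, so the critical section is entered once per thread instead of once per element. The function name `collect_nodes` is hypothetical and only stands in for `CNetwork::Validity`; the unsigned loop index assumes a compiler supporting OpenMP 3.0 or later.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CNetwork::Validity: returns the collected indices.
std::vector<uint32_t> collect_nodes(uint32_t num_elements) {
    std::vector<uint32_t> result;
    result.reserve(num_elements);

    #pragma omp parallel
    {
        // Declared inside the parallel region, so every thread has its own copy.
        std::vector<uint32_t> local;

        // nowait: a thread may start merging as soon as its own share is done.
        #pragma omp for nowait
        for (uint32_t i = 0; i < num_elements; i++) {
            local.push_back(i);
        }

        // One merge per thread; only this short section is serialized.
        #pragma omp critical
        result.insert(result.end(), local.begin(), local.end());
    }
    return result;
}
```

The merged vector contains every index exactly once, but the order of the per-thread chunks depends on which thread reaches the critical section first. If the pragmas are ignored (OpenMP disabled), the function still works and simply runs serially.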
Your second problem could be solved by defining an array of CNachbarn with the size of the maximum number of OMP threads possible, and accessing a distinct element of the array from each thread, like:
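A minimal sketch of this, under some assumptions: the instances are created once in the constructor (so nothing is constructed per call), `omp_get_max_threads()` is queried at construction time, and `Clean` is given a hypothetical signature (an iteration count in, a call count out) purely so the example can be checked. The `#else` branch provides serial fallbacks and is also an addition for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds when OpenMP is disabled.
static int omp_get_max_threads() { return 1; }
static int omp_get_thread_num() { return 0; }
#endif

class CNachbarn {  // same toy class as in the question
public:
    CNachbarn() : a(0) {}
    uint32_t Get_Next_Neighbor() { return a++; }
private:
    uint32_t a;
};

class CNetwork {
public:
    // One CNachbarn per possible thread, allocated once up front.
    CNetwork() : Nachbar(static_cast<std::size_t>(omp_get_max_threads())) {}

    // Hypothetical signature: returns how many neighbor lookups were made.
    uint32_t Clean(int iterations) {
        uint32_t calls = 0;
        #pragma omp parallel for reduction(+:calls)
        for (int j = 0; j < iterations; j++) {
            // Each thread only ever touches its own instance.
            CNachbarn &mine = Nachbar[static_cast<std::size_t>(omp_get_thread_num())];
            // Local variable instead of a shared class member, so no data race.
            uint32_t nachbar = mine.Get_Next_Neighbor();
            (void)nachbar;
            calls++;
        }
        return calls;
    }

private:
    std::vector<CNachbarn> Nachbar;
};
```

Note that the array size is fixed at construction, so raising the thread count afterwards (for example with `omp_set_num_threads`) beyond that size would index out of bounds. Because the instances sit next to each other in memory, padding them to a cache line may be worth measuring if `CNachbarn` is updated very frequently.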