I’m having an interesting problem with seg faults when I attempt to move a malloc call inside an OpenMP for loop. Each thread must calculate its own copy of the distances vector in order to compute the classification correctly, so the vector must be private. However, when I run it with more than one thread, it seg faults. This does not occur if the p_distances vector is declared as shared, although of course that results in inaccurate distance calculations since the threads overwrite each other. Is there some very obvious rule I’m violating here? Also, I’m aware that there are other bad coding practices evident in my code; I’m always open to suggestions regarding style, but please help me focus on what’s actually causing the issue.
int *labels_train;
float *data_train;
int *labels_test;
float *data_test;
float *s_distances;
int *s_results, *p_results;
int i, j, k, h;
int N, D, K, M, thread_count;

void sort(float *_distances, int *_labels_train, int _N);

void computeParallelKNN()
{
    // this is the target loop for multi-point parallelization
    // seg fault here whenever p_distances malloc is moved inside parallel for loop and declared private
    #pragma omp parallel for num_threads(thread_count) private(h, j, i)
    for (i = 0; i < M; i++)
    {
        float *p_distances = (float*)malloc(N * sizeof(float));
        k = 0;
        // This is the target loop for single point parallelization
        // No dependencies on outer loop (each thread can calculate distance for current point with some
        // different training point)
        for (h = 0; h < N*D; h+=D)
        {
            float dTmp = 0;
            // Reduction operation..no dependencies here either (I don't think?)
            // dTmp is critical variable for parallel operations
            for (j = 0; j < D; j++)
            {
                dTmp += pow(data_test[i*D+j] - data_train[h+j],2);
            }
            p_distances[k] = (float)sqrt((double)dTmp);
            k++;
        }
        // Make a copy of labels (since sort will invalidate original data/labels correlation)
        int *temp_labels;
        temp_labels = (int*)malloc(N * sizeof(int));
        for (h = 0; h < N; h++)
            temp_labels[h] = labels_train[h];
        // Sort distances/labels_train vector
        sort(p_distances, temp_labels, N);
        // Calculate/print KNN classification
        int neg = 0;
        int pos = 0;
        for (h = 0; h < K; h++)
        {
            if(temp_labels[h] == -1) neg++;
            else pos++;
        }
        if (pos > neg) p_results[i] = 1;
        else p_results[i] = -1;
        free(p_distances);
    }
}
// Insertion sort algorithm modified to sort labels according to distance data
void sort(float *_distances, int *_labels_train, int _N)
{
    int k;
    for (k = 1; k < _N; ++k)
    {
        float dist_key = _distances[k];
        int label_key = _labels_train[k];
        int i = k - 1;
        while ((i >= 0) && (dist_key < _distances[i]))
        {
            _distances[i + 1] = _distances[i];
            _labels_train[i + 1] = _labels_train[i];
            --i;
        }
        _distances[i + 1] = dist_key;
        _labels_train[i + 1] = label_key;
    }
}
I can post the complete code but this is definitely the area where the fault is occurring. Thanks in advance, hopefully it’s just a stupid mistake I’m making.
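First of all, there is k, which is shared across all threads: it is a global and is not listed in the private clause, and nothing marks the updates to it as a critical section or as atomic. With more than one thread, the k = 0 and k++ statements race, so k can run past N - 1 and p_distances[k] then writes beyond the end of the allocated buffer, which is the likely cause of the seg fault.
Rewrite your code in a cleaner way and avoid global variables as much as possible; you can define your variables when you have just entered a new scope.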
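As a sketch of what such a rewrite might look like for the loop in the question (not the asker's actual code): the function name knn_classify, the helper sort_by_distance, and the parameter list are hypothetical, introduced only for illustration. The sketch also assumes that ranking by squared distance is acceptable (it yields the same neighbours as ranking by distance, so the sqrt is dropped), that a result of 0 is an acceptable marker for a failed allocation, and it frees the label copy as well as the distance buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Insertion sort on dist[], moving labels[] along with it
   (same idea as sort() in the question). */
static void sort_by_distance(float *dist, int *labels, int n)
{
    for (int k = 1; k < n; ++k) {
        float dist_key = dist[k];
        int label_key = labels[k];
        int i = k - 1;
        while (i >= 0 && dist_key < dist[i]) {
            dist[i + 1] = dist[i];
            labels[i + 1] = labels[i];
            --i;
        }
        dist[i + 1] = dist_key;
        labels[i + 1] = label_key;
    }
}

/* Hypothetical rewrite of computeParallelKNN(): inputs are parameters,
   and every scratch variable is declared inside the loop body, so each
   thread gets its own copy without any private() clause. */
void knn_classify(const float *data_train, const int *labels_train, int N,
                  const float *data_test, int M, int D, int K,
                  int *results, int thread_count)
{
    assert(N > 0 && D > 0 && K > 0 && K <= N && thread_count > 0);

    #pragma omp parallel for num_threads(thread_count)
    for (int i = 0; i < M; i++) {
        float *dist = malloc((size_t)N * sizeof *dist);
        int *labels = malloc((size_t)N * sizeof *labels);
        if (dist == NULL || labels == NULL) {
            free(dist);
            free(labels);
            results[i] = 0;   /* assumption: 0 marks "could not classify" */
            continue;
        }

        /* t indexes the training point directly, replacing the shared k. */
        for (int t = 0; t < N; t++) {
            float sum = 0.0f;
            for (int j = 0; j < D; j++) {
                float diff = data_test[i * D + j] - data_train[t * D + j];
                sum += diff * diff;
            }
            dist[t] = sum;               /* squared Euclidean distance */
            labels[t] = labels_train[t]; /* per-thread copy of the labels */
        }

        sort_by_distance(dist, labels, N);

        /* Majority vote over the K nearest neighbours. */
        int pos = 0, neg = 0;
        for (int h = 0; h < K; h++) {
            if (labels[h] == -1) neg++;
            else pos++;
        }
        results[i] = (pos > neg) ? 1 : -1;

        free(dist);
        free(labels);
    }
}
```

Because the index into the distance buffer is now the inner loop counter t, which lives inside the parallel loop body, no thread can push another thread's index out of range. Compiled without OpenMP support, the pragma is ignored and the function runs serially with the same results.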
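For example, declaring the loop counters and temporaries outside the loop and listing them in a private clause is the same as declaring them inside the loop body, where they are private to each thread automatically.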
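A minimal sketch of that equivalence, using a made-up row-sum task rather than the code from the question; the function names row_sums_clause and row_sums_scoped are hypothetical. The first variant declares its temporaries at function scope and has to list them in private(); the second declares them where they are used and needs no clause.

```c
#include <assert.h>

/* Variant 1: j and tmp are declared outside the parallel loop, so they
   would be shared unless listed in private(). The loop counter i of the
   parallel for itself is made private by OpenMP. */
void row_sums_clause(const float *a, int rows, int cols, float *out)
{
    assert(rows >= 0 && cols >= 0);
    int i, j;
    float tmp;
    #pragma omp parallel for private(j, tmp)
    for (i = 0; i < rows; i++) {
        tmp = 0.0f;
        for (j = 0; j < cols; j++)
            tmp += a[i * cols + j];
        out[i] = tmp;
    }
}

/* Variant 2: the same computation with every temporary declared inside
   the loop body; each thread gets its own copy with no clause needed. */
void row_sums_scoped(const float *a, int rows, int cols, float *out)
{
    assert(rows >= 0 && cols >= 0);
    #pragma omp parallel for
    for (int i = 0; i < rows; i++) {
        float tmp = 0.0f;
        for (int j = 0; j < cols; j++)
            tmp += a[i * cols + j];
        out[i] = tmp;
    }
}
```

The scoped form is harder to get wrong: forgetting a name in the private list of the first variant silently turns it into a shared variable, which is exactly what happened to k in the question.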