I’m implementing a distance matrix that stores the distance between each point and every other point. I have 100,000 points, so the matrix is 100,000 x 100,000. I implemented it using vector<vector<double> > dist, but at this data size it fails with an out-of-memory error. My code is below; any help would be really appreciated.
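```cpp
// dat holds the points; c is the number of coordinates per point.
// This allocates a full dat.size() x dat.size() matrix of doubles.
vector<vector<double> > dist(dat.size(), vector<double>(dat.size()));
size_t p, j;
ptrdiff_t i;
#pragma omp parallel for private(p,j,i) default(shared)
for (p = 0; p < dat.size(); ++p)
{
    // #pragma omp parallel for private(j,i) default(shared)
    for (j = p + 1; j < dat.size(); ++j)
    {
        // Euclidean distance between points p and j.
        double ecl = 0.0;
        for (i = 0; i < c; ++i)
        {
            double d = dat[p][i] - dat[j][i];
            ecl += d * d;
        }
        ecl = sqrt(ecl);
        // The matrix is symmetric, so fill both halves.
        dist[p][j] = ecl;
        dist[j][p] = ecl;
    }
}
```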
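A 100000 x 100000 matrix? A quick calculation shows why this is never going to work: 100,000 × 100,000 = 10^10 doubles, and at 8 bytes per double that is 8 × 10^10 bytes, about 80 GB (roughly 74.5 GiB) for the matrix alone.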
Even if it were possible to allocate this much memory, I very much doubt that it would be an efficient approach for a real problem.
If you’re looking to do geometric processing on large data sets, you may be interested in a spatial tree structure: kd-trees, quadtrees or R-trees, maybe?