I am working on a graph with 875713 nodes and 5105039 edges. Using vector<bitset<875713>> vec(875713) or array<bitset<875713>, 875713> causes a segfault. I need to calculate all-pairs shortest paths with path recovery. What alternative data structures could I use?
I found this SO Thread but it doesn’t answer my query.
EDIT
I tried this after reading the suggestions, and it seems to work. Thanks everyone for helping me out.
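vector<vector<uint>> neighboursOf; // An edge between i and j exists if
                                   // neighboursOf[i] contains j
neighboursOf.resize(nodeCount);
// Loop on getline itself: testing input.good() first would process one
// extra, failed read at end of file and store a spurious edge 0 -> 0
while (getline(input, line))
{
    // Skip blank lines and comments in the input file
    if (line.empty() || line[0] == '#')
        continue;
    uint fromNodeId = 0;
    uint toNodeId = 0;
    // Each line is of the format "<fromNodeId> [TAB] <toNodeId>"
    // %u is the matching conversion for unsigned int (%d expects int)
    if (sscanf(line.c_str(), "%u\t%u", &fromNodeId, &toNodeId) != 2)
        continue; // Skip malformed lines
    if (fromNodeId >= neighboursOf.size())
        continue; // Skip node ids outside the allocated range
    // Store the edge
    neighboursOf[fromNodeId].push_back(toNodeId);
}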
You could store lists of edges per node in a single array. If the number of edges per node is variable you can terminate the lists with a null edge. This will avoid the space overhead for many small lists (or similar data structures). The result could look like this:
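A minimal sketch of such a layout, under some assumptions: the names PackedGraph, kNoEdge, pack and forEachNeighbour are hypothetical, the maximum 32-bit value is chosen arbitrarily as the "null edge", and a separate firstEdge index (not mentioned above) is added so that a node's list can be found without scanning the array from the start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical "null edge": a value that is never a valid node id.
constexpr std::uint32_t kNoEdge = std::numeric_limits<std::uint32_t>::max();

struct PackedGraph {
    // firstEdge[n] is the index in targets where node n's list starts.
    std::vector<std::size_t> firstEdge;
    // All edge lists stored back to back, each terminated by kNoEdge.
    std::vector<std::uint32_t> targets;
};

// Packs per-node adjacency lists into one array. All edges of a node
// must be known when its list is written.
PackedGraph pack(const std::vector<std::vector<std::uint32_t>>& neighboursOf) {
    PackedGraph g;
    g.firstEdge.reserve(neighboursOf.size());
    for (const auto& list : neighboursOf) {
        g.firstEdge.push_back(g.targets.size());
        g.targets.insert(g.targets.end(), list.begin(), list.end());
        g.targets.push_back(kNoEdge);  // terminator for this node's list
    }
    return g;
}

// Calls visit(target) for every edge leaving node, in stored order.
template <typename Visit>
void forEachNeighbour(const PackedGraph& g, std::uint32_t node, Visit visit) {
    assert(node < g.firstEdge.size());
    for (std::size_t i = g.firstEdge[node]; g.targets[i] != kNoEdge; ++i) {
        visit(g.targets[i]);
    }
}
```

For the graph in the question, targets would hold 5105039 + 875713 = 5980752 32-bit entries, roughly 24 MB, plus the firstEdge index.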
Minimizing the space overhead is very important since you have a huge number of small data structures. The overhead for each list is just one integer, which is much less than the overhead of, e.g., a std::vector (typically three pointers, i.e. 24 bytes on a 64-bit platform, plus allocator bookkeeping for its heap block). Also, the lists are laid out contiguously in memory, which means that there is no wasted space between any two lists. With separately allocated, variable-sized vectors this will not be the case.
Reading all edges for any given node will be very fast because the edges for any node are stored contiguously in memory.
The downside of this data arrangement is that when you initialize the arrays and construct the edge lists, you need to have all the edges for a node at hand. This is not a problem if you get the edges sorted by node, but does not work well if the edges are in random order.
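If the edges do arrive in random order, one common workaround (not part of the suggestion above) is a two-pass counting build: count the edges per node first, turn the counts into start offsets, then drop each edge into its slot. The sketch below uses hypothetical names (CsrGraph, buildFromUnsorted), assumes every node id is smaller than nodeCount, and marks the end of each list with an offsets array of nodeCount + 1 entries rather than a terminator value. It also needs the whole edge list in memory, or two reads of the input file.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct CsrGraph {
    // Node n's edges are targets[offsets[n]] .. targets[offsets[n + 1] - 1].
    std::vector<std::size_t> offsets;
    std::vector<std::uint32_t> targets;
};

// Builds the packed layout from (from, to) pairs given in any order.
// Assumes every node id in edges is < nodeCount.
CsrGraph buildFromUnsorted(
    std::size_t nodeCount,
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    CsrGraph g;
    g.offsets.assign(nodeCount + 1, 0);

    // Pass 1: count outgoing edges per node, shifted one slot to the right.
    for (const auto& e : edges) {
        ++g.offsets[e.first + 1];
    }
    // Prefix sum turns the counts into start offsets.
    for (std::size_t n = 0; n < nodeCount; ++n) {
        g.offsets[n + 1] += g.offsets[n];
    }

    // Pass 2: write each edge into the next free slot of its source node.
    g.targets.resize(edges.size());
    std::vector<std::size_t> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) {
        g.targets[next[e.first]++] = e.second;
    }
    return g;
}
```

Both passes are linear in the number of edges, so no sorting of the input is required.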