I am working on a graph with 875713 nodes and 5105039 edges. Using


Asked: June 7, 2026 (2026-06-07T17:10:45+00:00)


I am working on a graph with 875713 nodes and 5105039 edges. Using vector<bitset<875713>> vec(875713) or array<bitset<875713>, 875713> causes a segfault. I need to calculate all-pairs shortest paths with path recovery. What alternative data structures do I have?

I found this SO Thread but it doesn’t answer my query.

EDIT

I tried this after reading the suggestions, and it seems to work. Thanks everyone for helping me out.

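// An edge between i and j exists if neighboursOf[i] contains j
vector<vector<unsigned>> neighboursOf(nodeCount);

string line;
while (getline(input, line))    // stops cleanly at end of file or on a read error
{
    // Skip blank lines and comments in the input file
    if (line.empty() || line[0] == '#')
        continue;

    unsigned fromNodeId = 0;
    unsigned toNodeId = 0;

    // Each line is of the format "<fromNodeId> [TAB] <toNodeId>".
    // %u matches the unsigned type; lines that do not parse are skipped.
    if (sscanf(line.c_str(), "%u\t%u", &fromNodeId, &toNodeId) != 2)
        continue;

    // Skip ids that would index past the end of neighboursOf
    if (fromNodeId >= nodeCount || toNodeId >= nodeCount)
        continue;

    // Store the edge
    neighboursOf[fromNodeId].push_back(toNodeId);
}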


1 Answer

Editorial Team · Answer 1 · 2026-06-07T17:10:46+00:00

You could store lists of edges per node in a single array. If the number of edges per node is variable you can terminate the lists with a null edge. This will avoid the space overhead for many small lists (or similar data structures). The result could look like this:

enum {
    MAX_NODES = 875713,
    MAX_EDGES = 5105039,
};

int nodes[MAX_NODES+1];         // contains index into array edges[].
                                // index zero is reserved as null node
                                // to terminate lists.

int edges[MAX_EDGES+MAX_NODES]; // contains null terminated lists of edges.
                                // each edge occupies a single entry in the
                                // array. each list ends with a null node.
                                // there are MAX_EDGES entries and MAX_NODES
                                // lists.

[...]

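/* find edges for node; node must hold a valid id in 1..MAX_NODES */
int node = 1;   /* example id */
int edge, edge_index;
for (edge_index = nodes[node]; edges[edge_index] != 0; edge_index++) {
    edge = edges[edge_index];   /* id of the node at the other end */
    /* do something with edge... */
}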

Minimizing the space overhead is very important since you have a huge number of small data structures. The overhead for each list is just one terminator entry in edges[] plus one index in nodes[], which is much less than the overhead of, e.g., an STL vector. The lists are also laid out contiguously in memory, which means that there is no wasted space between any two lists. With variable-sized vectors this will not be the case.
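To put rough numbers on the space argument, here is a small sketch that estimates the memory needed for a full bit matrix versus the packed nodes[]/edges[] arrays. The function names bitMatrixBytes and packedListBytes are made up for illustration, and the estimates assume 4-byte array entries and bitset rows padded to whole 64-bit words, which is typical but implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for an n x n bit matrix, assuming every row is padded up to a whole
// number of 64-bit words (an assumption about the bitset implementation).
std::uint64_t bitMatrixBytes(std::uint64_t n) {
    std::uint64_t wordsPerRow = (n + 63) / 64;
    return n * wordsPerRow * 8;
}

// Bytes for the packed layout: nodes[n + 1] plus edges[m + n],
// assuming each entry is a 4-byte integer.
std::uint64_t packedListBytes(std::uint64_t n, std::uint64_t m) {
    return ((n + 1) + (m + n)) * 4;
}
```

With the sizes from the question, bitMatrixBytes(875713) comes to 95,866,053,536 bytes (roughly 96 GB), while packedListBytes(875713, 5105039) comes to 27,425,864 bytes (roughly 27 MB).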

Reading all edges for any given node will be very fast because the edges for any node are stored contiguously in memory.
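As an example of reading the lists, here is a minimal sketch of a breadth-first search over this layout that records a parent for every reached node, so that a path can be recovered afterwards. It assumes the edges are unweighted (so the shortest path is the one with the fewest hops), that node ids start at 1, and that the arrays are passed as std::vector<int>. The names bfsParents and recoverPath are hypothetical. For all pairs you could run it once per source node and process each result before starting the next, rather than keeping a table for every pair.

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <vector>

// BFS from source over the packed layout. Returns parent[], where parent[v]
// is the node from which v was first reached, or 0 if v is the source or
// was never reached.
std::vector<int> bfsParents(const std::vector<int>& nodes,
                            const std::vector<int>& edges,
                            int source) {
    std::vector<int> parent(nodes.size(), 0);
    std::vector<char> seen(nodes.size(), 0);
    std::queue<int> pending;
    seen[source] = 1;
    pending.push(source);
    while (!pending.empty()) {
        int u = pending.front();
        pending.pop();
        // Walk u's null-terminated neighbour list.
        for (int i = nodes[u]; edges[i] != 0; ++i) {
            int v = edges[i];
            if (!seen[v]) {
                seen[v] = 1;
                parent[v] = u;
                pending.push(v);
            }
        }
    }
    return parent;
}

// Follows parent[] back from target to source. Returns the path in
// source-to-target order, or an empty vector if target was not reached.
std::vector<int> recoverPath(const std::vector<int>& parent,
                             int source, int target) {
    std::vector<int> path;
    for (int v = target; v != 0; v = parent[v]) {
        path.push_back(v);
        if (v == source) break;
    }
    if (path.empty() || path.back() != source) return {};
    std::reverse(path.begin(), path.end());
    return path;
}
```

Each call to bfsParents uses memory proportional to the number of nodes, on top of the graph arrays themselves.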

The downside of this data arrangement is that when you initialize the arrays and construct the edge lists, you need to have all the edges for a node at hand. This is not a problem if you get the edges sorted by node, but does not work well if the edges are in random order.
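If the edges do arrive in random order, one possible workaround (not part of the layout described above, just a common counting approach) is to go over the edge list twice: first count the edges per node to work out where each list starts, then fill the lists. The sketch below assumes the whole edge list fits in memory as pairs, that node ids run from 1 to nodeCount (zero-based input would need shifting by one, since 0 is the terminator), and uses made-up names PackedGraph, buildPackedGraph and listNeighbours.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct PackedGraph {
    std::vector<int> nodes;  // nodes[v] = index of v's first entry in edges; nodes[0] is unused
    std::vector<int> edges;  // neighbour ids; every list ends with a 0 entry
};

// Builds the packed layout from directed edges (from, to) given in any order.
PackedGraph buildPackedGraph(int nodeCount,
                             const std::vector<std::pair<int, int>>& edgeList) {
    PackedGraph g;
    g.nodes.assign(nodeCount + 1, 0);
    g.edges.assign(edgeList.size() + nodeCount, 0);  // zeros double as terminators

    // Pass 1: count the outgoing edges of every node.
    std::vector<int> degree(nodeCount + 1, 0);
    for (const auto& e : edgeList) {
        assert(e.first >= 1 && e.first <= nodeCount);
        assert(e.second >= 1 && e.second <= nodeCount);
        ++degree[e.first];
    }

    // Each node gets degree + 1 slots; the extra slot holds the terminator.
    int next = 0;
    for (int v = 1; v <= nodeCount; ++v) {
        g.nodes[v] = next;
        next += degree[v] + 1;
    }

    // Pass 2: drop every edge into the next free slot of its source node.
    std::vector<int> cursor(g.nodes);
    for (const auto& e : edgeList) {
        g.edges[cursor[e.first]++] = e.second;
    }
    return g;
}

// Copies the neighbour list of one node, mainly to show how it is read.
std::vector<int> listNeighbours(const PackedGraph& g, int node) {
    std::vector<int> out;
    for (int i = g.nodes[node]; g.edges[i] != 0; ++i) {
        out.push_back(g.edges[i]);
    }
    return out;
}
```

The price is a temporary counter and cursor per node plus the in-memory edge list during construction. The finished arrays have the same shape as when the edges arrive sorted by node.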
