I want to understand how to store a graph that holds a huge amount of data. I am designing an application that works on the graph of a huge railway route network, where the vertices are railway station names. I implemented it as an adjacency list in C++, but I have found that it consumes a lot of memory, and sometimes I even get an out-of-memory error. How are such huge graphs stored so that graph algorithms can still be run on them?
The graph is defined as:
std::map<std::string, std::set<std::string> > railway_graph;
Or, how do Google and Facebook store their graph data structures?
Your choice of data structure requires a lot of superfluous memory, dynamically allocated on the heap.
std::map and std::set allocate a separate piece of memory for every single entry (plus the container's own overhead). std::string also allocates a piece of memory for the string contents. This is convenient and totally OK in many cases, but not for large data structures.
In the end you have a map that contains pointers (each allocated one by one) to sets, which contain pointers (again allocated one by one) to strings, which in turn contain pointers to the actual string buffers.
Your actual problem is the overhead that dynamic allocation incurs. On many platforms, every heap allocation costs around 16 extra bytes of memory just for heap management (though the exact numbers vary…).
I suggest that you redefine your graph in the following way:
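As a minimal sketch of what such a redefinition could look like (not a definitive implementation): give every station a small integer id, store each name once in a node, and keep the neighbors as a contiguous vector of ids. The names `StationId`, `Station`, `RailwayGraph`, `intern` and `add_route` are made up for this illustration, and I am assuming that routes are undirected and that fewer than 2^32 stations exist.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names, chosen only for this sketch.
using StationId = std::uint32_t;

struct Station {
    std::string name;                  // the name is stored in the node itself
    std::vector<StationId> neighbors;  // one contiguous array of 4-byte ids
};

class RailwayGraph {
public:
    // Returns the id for `name`, creating the station on first use.
    StationId intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        StationId id = static_cast<StationId>(stations_.size());
        stations_.push_back(Station{name, {}});
        ids_.emplace(name, id);
        return id;
    }

    // Adds an undirected route between two existing stations.
    // Assumption: the caller does not add the same route twice.
    void add_route(StationId a, StationId b) {
        assert(a < stations_.size() && b < stations_.size());
        stations_[a].neighbors.push_back(b);
        stations_[b].neighbors.push_back(a);
    }

    const Station& station(StationId id) const { return stations_.at(id); }
    std::size_t size() const { return stations_.size(); }

private:
    std::vector<Station> stations_;
    // Lookup by name only; this keeps a second copy of each name and
    // can be dropped after loading if it is no longer needed.
    std::unordered_map<std::string, StationId> ids_;
};
```

Each edge now costs one 4-byte id inside a vector instead of a separately allocated set node holding a whole string, so the number of heap allocations grows with the number of stations rather than with the number of edges.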
Or, alternatively, the following data structures may be easier for your use case. They are similar to your example, but much more compact in their memory representation:
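A rough sketch of this second variant, under my own assumptions: it keeps the map-of-neighbors shape of your original definition, but the keys and neighbors are integer ids and each neighbor collection is a sorted vector instead of a std::set. `NodeIdList` is the typedef mentioned in the edit note; `NodeId`, `CompactGraph`, `add_node`, `add_edge` and `has_edge` are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::size_t NodeId;
typedef std::vector<NodeId> NodeIdList;

// Hypothetical wrapper, only to keep the sketch self-contained.
struct CompactGraph {
    std::vector<std::string> names;          // names[id] is the station name
    std::map<NodeId, NodeIdList> adjacency;  // id -> sorted list of neighbor ids

    // Registers a new node and returns its id.
    NodeId add_node(const std::string& name) {
        names.push_back(name);
        return names.size() - 1;
    }

    // Adds a directed edge; the list stays sorted and free of duplicates,
    // which mimics the behavior of std::set without per-element allocations.
    void add_edge(NodeId from, NodeId to) {
        assert(from < names.size() && to < names.size());
        NodeIdList& list = adjacency[from];
        NodeIdList::iterator pos = std::lower_bound(list.begin(), list.end(), to);
        if (pos == list.end() || *pos != to) list.insert(pos, to);
    }

    // Binary search in the sorted neighbor list.
    bool has_edge(NodeId from, NodeId to) const {
        std::map<NodeId, NodeIdList>::const_iterator it = adjacency.find(from);
        return it != adjacency.end() &&
               std::binary_search(it->second.begin(), it->second.end(), to);
    }
};
```

The map still allocates one node per station, but the per-edge allocations are gone, and the edges usually outnumber the stations.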
EDIT: Added and used NodeIdList…
If this still consumes too much memory, then you should think about keeping the data on disk and loading it on demand.
If your node names are constant, then you should also think about some kind of string table, a more compact representation of string data in memory. But this is rather low-level stuff.
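To give an idea of what I mean by a string table, here is a minimal sketch, assuming the names are only appended and never changed, and that all names together stay below 4 GiB. All characters live in one buffer, and a second array records where each name starts. `StringTable`, `add` and `get` are hypothetical names, not an existing library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical append-only string table: two allocations in total,
// no matter how many names are stored.
class StringTable {
public:
    // Appends `s` to the shared buffer and returns its index.
    std::uint32_t add(const std::string& s) {
        offsets_.push_back(static_cast<std::uint32_t>(chars_.size()));
        chars_.insert(chars_.end(), s.begin(), s.end());
        return static_cast<std::uint32_t>(offsets_.size() - 1);
    }

    // Returns a copy of the string stored at `index`.
    std::string get(std::uint32_t index) const {
        assert(index < offsets_.size());
        std::size_t first = offsets_[index];
        // A string ends where the next one starts (or at the end of the buffer).
        std::size_t last = (index + 1 < offsets_.size()) ? offsets_[index + 1]
                                                         : chars_.size();
        return std::string(chars_.begin() + static_cast<std::ptrdiff_t>(first),
                           chars_.begin() + static_cast<std::ptrdiff_t>(last));
    }

    std::size_t size() const { return offsets_.size(); }

private:
    std::vector<char> chars_;            // all names back to back, no terminators
    std::vector<std::uint32_t> offsets_; // start position of each name
};
```

The index returned by `add` can double as the node id, so the graph itself only needs to store integers.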
Try to use better data structures first!