I currently have a graph with about 10 million nodes and 35 million edges. At the moment the complete graph is loaded into memory at program start. This takes a couple of minutes (it is Java after all) and needs about half a gigabyte of RAM. It currently runs on a machine with a dual-core processor and 4 gigabytes of RAM.
When the graph is searched using a breadth-first search, memory usage rises to a peak of one gigabyte and a search takes ten seconds on average.
I would like to deploy the program on a couple of computers. Apart from the graph search, the program needs very few resources. My target system is very small and has only 512 megabytes of RAM.
Any suggestions on how to implement a method (probably using a database) to search that graph without consuming too much memory? The program is idle most of the time as it is accessing a hardware device, so the path-finding could take about 5 minutes max for the mentioned graph…
Thanks for any thoughts thrown in my direction.
UPDATE:
Just found neo4j. Does anybody know if it would be suitable for this kind of humongous graph?
Your question is a little vague, but in general, a good strategy that mostly follows breadth-first semantics while using about the same amount of memory as a depth-first search is iterative deepening. The idea is that you do a depth-first search limited to 1 level at first; if that fails to find a solution, you start from scratch and limit it to 2 levels; if that fails, you try 3 levels, and so on.
This may seem a bit redundant at first, but since each pass is a depth-first search, you keep far fewer nodes in memory (essentially just the current path), and you always search one level less than a straightforward breadth-first search would. Since the number of nodes per level grows roughly exponentially, on larger graphs it’s very likely that saving that one last extra level pays for searching all the preceding levels redundantly.
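To make the idea concrete, here is a minimal sketch of iterative deepening in Java. It assumes the graph is available as an adjacency array (`adj[v]` holds the successors of node `v`); that representation, the class name `IterativeDeepening`, and the `maxDepth` cut-off are all illustrative assumptions, not something taken from your program. With a database-backed graph, the `adj[node]` lookup would be replaced by a query for the neighbours of `node`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IterativeDeepening {

    /**
     * Returns a path from start to goal with the fewest edges,
     * or null if no path with at most maxDepth edges exists.
     * adj[v] lists the successors of node v (hypothetical layout).
     */
    public static List<Integer> findPath(int[][] adj, int start, int goal, int maxDepth) {
        // Restart the depth-first search with a larger limit each round.
        for (int limit = 0; limit <= maxDepth; limit++) {
            Deque<Integer> path = new ArrayDeque<>();
            Set<Integer> onPath = new HashSet<>();
            path.addLast(start);
            onPath.add(start);
            if (depthLimited(adj, start, goal, limit, path, onPath)) {
                return new ArrayList<>(path);
            }
        }
        return null;
    }

    // Depth-first search that follows at most `remaining` further edges.
    private static boolean depthLimited(int[][] adj, int node, int goal, int remaining,
                                        Deque<Integer> path, Set<Integer> onPath) {
        if (node == goal) {
            return true;
        }
        if (remaining == 0) {
            return false;
        }
        for (int next : adj[node]) {
            if (onPath.contains(next)) {
                continue; // skip nodes already on the current path (avoids cycles)
            }
            path.addLast(next);
            onPath.add(next);
            if (depthLimited(adj, next, goal, remaining - 1, path, onPath)) {
                return true;
            }
            // Backtrack: only the current path is ever kept in memory.
            path.removeLast();
            onPath.remove(next);
        }
        return false;
    }
}
```

Apart from the graph itself, the search only stores the current path, so its extra memory is proportional to the depth limit. The trade-off is that, on a graph with many alternative routes between the same nodes, the depth-limited passes can visit a node many times, so the running time may end up well above that of a breadth-first search with a visited set.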