I need to quickly find the n closest destinations for a given destination, calculate an n x n distance matrix for n destinations, and perform several other such operations related to distances between two or more destinations.
I have learned a Graph DB will give far better performance compared to a MySQL database. My application is written in PHP.
So my question is: is it possible to use a graph DB with a PHP application? If yes, which open-source option is best, how should this data be stored in the graph DB, and how would it be accessed?
Thanks in advance.
Neo4j is a very solid graph DB with flexible (if a bit complex) licensing. It works with the Blueprints API and should be pretty easy to use from just about any language, including PHP. It also has a REST API, which is about as flexible as it gets, and there is at least one good example of using it from PHP.
Depending on what data you have, there are a number of ways to store it.
If you have “route” data, where your points are already connected to each other via specific paths (i.e. you can’t jump from one point directly to another), then you simply make each point a node and make the connections between points in your routes edges between nodes, with the distances as properties of those edges. This gives you a graph that looks like your classic “traveling salesman” sort of problem, and calculating the distance between two nodes is a matter of running a weighted shortest-path search such as Dijkstra's algorithm (assuming you want the shortest path). A plain breadth-first search only finds the path with the fewest hops, not the smallest total distance.
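To make the route case concrete, here is a minimal sketch of that shortest-path search in Python, outside of any graph DB. The `routes` data and the `shortest_distances` function name are made up for illustration; a graph DB such as Neo4j would normally run this kind of traversal for you.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every reachable node.

    graph maps each node to a dict of {neighbour: edge_distance}.
    Edge distances must be non-negative.
    """
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist

# Hypothetical route data: undirected edges are listed in both directions.
routes = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"A": 4.0, "C": 2.0, "D": 5.0},
    "C": {"A": 1.0, "B": 2.0, "D": 8.0},
    "D": {"B": 5.0, "C": 8.0},
}

distances = shortest_distances(routes, "A")
```

Note that the direct A to B edge (4.0) loses to the detour through C (1.0 + 2.0), which is exactly the case a hop-counting search would get wrong.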
If you can jump from place to place with your data set, then you have a fully connected graph. Obviously this is a lot of data, and it grows quadratically as you add more destinations (n destinations need n(n-1)/2 edges), but a graph DB is probably better at dealing with this than a relational DB is. To store the distances, as you add each node to the graph, you also add an edge to every other existing node with the distance pre-calculated as one of its properties. Then, to retrieve the distance between a pair of nodes, you simply find the edge between them and read its distance property.
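As a rough in-memory sketch of that bookkeeping (not a graph DB, just the same idea in Python): the `CompleteDistanceGraph` class is hypothetical, and it assumes flat x/y coordinates with straight-line distances purely to keep the example short.

```python
import math

class CompleteDistanceGraph:
    """Fully connected graph with a pre-calculated distance on every edge."""

    def __init__(self):
        self.coords = {}  # name -> (x, y)
        self.edges = {}   # frozenset({a, b}) -> distance

    def add_destination(self, name, x, y):
        # Adding a node also adds one edge to every node already present.
        for other, point in self.coords.items():
            self.edges[frozenset((name, other))] = math.dist((x, y), point)
        self.coords[name] = (x, y)

    def distance(self, a, b):
        # Lookup only; nothing is recalculated here.
        return 0.0 if a == b else self.edges[frozenset((a, b))]

graph = CompleteDistanceGraph()
graph.add_destination("P", 0.0, 0.0)
graph.add_destination("Q", 3.0, 4.0)
graph.add_destination("R", 6.0, 8.0)
```

Three destinations already need three edges, and a fourth would add three more, which is the quadratic growth mentioned above.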
However, if you have a large number of fully connected nodes, you would probably be better off just storing the coordinates of those nodes and calculating the distances as needed, optionally caching the results to speed things up.
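Here is a minimal sketch of the coordinates-only approach in Python, covering both the "n closest" lookup and the n x n matrix from the question. It assumes latitude/longitude coordinates and great-circle (haversine) distances; the `DESTINATIONS` data and the function names are made up for illustration, and the cache is just the standard library's `lru_cache`.

```python
import math
from functools import lru_cache

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical destinations as (latitude, longitude) in degrees.
DESTINATIONS = {
    "London": (51.5074, -0.1278),
    "Paris": (48.8566, 2.3522),
    "Berlin": (52.5200, 13.4050),
    "Madrid": (40.4168, -3.7038),
}

@lru_cache(maxsize=None)
def distance_km(a, b):
    """Great-circle distance between two named destinations (haversine formula)."""
    lat1, lon1 = map(math.radians, DESTINATIONS[a])
    lat2, lon2 = map(math.radians, DESTINATIONS[b])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def nearest(name, n):
    """Names of the n destinations closest to the given one."""
    others = (d for d in DESTINATIONS if d != name)
    return sorted(others, key=lambda d: distance_km(name, d))[:n]

def distance_matrix(names):
    """n x n matrix of distances, in the order the names are given."""
    return [[distance_km(a, b) for b in names] for a in names]

closest = nearest("London", 2)
matrix = distance_matrix(["London", "Paris", "Berlin"])
```

Sorting every destination is fine for a few thousand points; beyond that you would likely want some kind of spatial index, which is outside the scope of this sketch.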
Lastly, if you use the Blueprints API and the other tools in that stack, like Gremlin and Rexster, you should be able to swap in/out any compatible graph database, which lets you play around with different implementations that may meet your needs better, like using Titan on top of a Cassandra / Hadoop cluster.