I’m trying to benchmark the scalability of a graph database. So far I have data that is generated in multiple tracks for loading into the database. I have also formulated some complex queries, which run successfully when called explicitly on the generated data.
However, I’m clueless about the next step, which is to write a script for the benchmark. It should invoke the queries and run them against the database application. The data generator is a small program written in Java. Can anyone give me an idea of how to go about this?
How to invoke the databases depends on how they are written and what their APIs are. I'm not sure there is one way that covers them all. Neo4j (http://www.neo4j.org), for example, is written in Java, so it can be invoked embedded from any JVM language, see http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded.html, or via REST as a standalone server (which of course skews the performance numbers, since the HTTP overhead ends up in the measurements), see http://docs.neo4j.org/chunked/snapshot/rest-api.html
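Since the data generator is already in Java, the benchmark driver could be a small Java program as well. Below is a minimal sketch of such a driver, not a definitive implementation. It assumes each query can be wrapped in a Runnable; the class name QueryBenchmark, the methods measure and percentile, and the "placeholder-query" entry are all made up for illustration, and the placeholder body would have to be replaced by a real call into whichever database API (embedded or REST) is being tested.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryBenchmark {

    /** Runs the query `warmup` times untimed, then `runs` times timed; returns latencies in nanoseconds. */
    public static long[] measure(Runnable query, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            query.run(); // untimed runs so the JIT and any caches can settle
        }
        long[] latencies = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            latencies[i] = System.nanoTime() - start;
        }
        return latencies;
    }

    /** Nearest-rank percentile of the samples, with p in (0, 100]. */
    public static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical registry of named queries. Each Runnable should execute
        // one query against the database under test and consume its results.
        Map<String, Runnable> queries = new LinkedHashMap<>();
        queries.put("placeholder-query", () -> Math.sqrt(12345.678)); // stand-in, not a real query

        for (Map.Entry<String, Runnable> entry : queries.entrySet()) {
            long[] ns = measure(entry.getValue(), 5, 20);
            System.out.printf("%s: median=%d ns, p95=%d ns%n",
                    entry.getKey(), percentile(ns, 50), percentile(ns, 95));
        }
    }
}
```

To look at scalability, the same driver could be run once per generated data size and the medians compared across runs. Reporting a median and a high percentile rather than a single run makes the numbers less sensitive to one-off pauses.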
There is an EU project underway to try to set up an independent benchmarking council; however, that project will likely take a few years.
/peter