I’ve been evaluating App Engine for a project, and while trying to
choose between Python and Java I ran into a surprising difference in
datastore query performance: medium to large datastore queries are
more than 3 times slower in Python than in Java.
My question is: is this performance difference for datastore queries
(Python 3x slower than Java) normal, or am I doing something wrong in
my Python code that’s messing with the numbers?
My entity looks like this:
Person
firstname (length 8)
lastname (length 8)
address (20)
city (10)
state (2)
zip (5)
I populate the datastore with 2000 Person records, with each field
exactly the length noted here, all filled with random data and with no
fields indexed (just so the inserts go faster).
I then query 1k Person records from Python (no filters, no ordering):
q = datastore.Query("Person")
objects = list(q.Get(1000))
And 1k Person records from Java (likewise no filters, no ordering):
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query q = new Query("Person");
PreparedQuery pq = ds.prepare(q);
// Force the query to run and return objects so we can be sure
// we've timed a full query.
List<Entity> entityList = new ArrayList<Entity>(pq.asList(withLimit(1000)));
With this code, the Java code returns results in ~200ms; the Python
code takes much longer, averaging >700ms. Both apps are on the same
app id (with different versions), so they use the same datastore and should
be on a level playing field.
All my code is available here, in case I’ve missed any details:
This is an expected difference between Python and Java. Most likely the difference is not in the time it takes to run the query, but in the time it takes to parse the result and fill the receiving data structure.
You can test this by comparing the time it takes to query a single record. Remember to run the test several times and average the results, to account for fluctuations in backend latency.
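As a rough illustration of that kind of test, here is a minimal timing sketch. It is not App Engine code: `average_latency` and `fake_query` are hypothetical names, and `fake_query` is only a stand-in that builds a list of dicts so the harness can run anywhere. In a real test you would swap it for the actual datastore call with a limit of 1 and then 1000, and compare the two averages.

```python
from timeit import default_timer


def average_latency(fn, runs=10):
    """Call fn() `runs` times and return the mean wall-clock time in seconds."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    total = 0.0
    for _ in range(runs):
        start = default_timer()
        fn()
        total += default_timer() - start
    return total / runs


def fake_query(limit):
    # Hypothetical stand-in for the real datastore query (an assumption,
    # not part of any App Engine API). It only builds `limit` small records.
    return [{"firstname": "x" * 8, "lastname": "y" * 8} for _ in range(limit)]


# Average several runs of each case to smooth out fluctuations.
single = average_latency(lambda: fake_query(1))
bulk = average_latency(lambda: fake_query(1000))
print("1 record: %.6f s, 1000 records: %.6f s" % (single, bulk))
```

If the single-record times are close in both runtimes while the 1000-record times diverge, that points to per-entity decoding cost and not the query itself.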
In general, you can expect a statically typed language like Java or Scala to be faster than a dynamically typed language like Ruby or Python.