I’m seeing terrible performance using the GAE datastore on both the dev server and the production server. I have the following simplified model:
from google.appengine.ext import db

class Team(db.Model):
    name = db.StringProperty()
    # + 1 other property
    # home_games from Game
    # away_games from Game

class Game(db.Model):
    date = db.DateProperty()
    year = db.IntegerProperty()
    home_team = db.ReferenceProperty(Team, collection_name='home_games')
    away_team = db.ReferenceProperty(Team, collection_name='away_games')
    # + 4 other properties
    # results from TeamResults

class TeamResults(db.Model):
    game = db.ReferenceProperty(Game, collection_name='results')
    location = db.StringProperty(choices=('home', 'away'))
    score = db.IntegerProperty()
    # + 17 other properties
I only have one index, on Game year and date. Inserting a small dataset of 478 teams and 786 games took about 50 seconds. A simple query:
games = Game.all()
games.filter('year =', 2000)
games.order('date')
for game in games:
    for result in game.results:
        pass  # do something with the result
took about 45 seconds.
I’m moving from SQLite-based data storage, and the above query on a much larger dataset takes a fraction of a second. Is my data just modeled poorly? Is Datastore just this slow?
Edit 1
To give a little more background, I’m inserting data from a user-uploaded file. The file is uploaded into the blobstore, then I use csv.reader to parse it. This happens periodically, and queries are run based on cron jobs.
Your problem is that you insert these records one at a time, so every put() is a separate round trip to the datastore.
Use batch inserts instead; see the data upload documentation: https://developers.google.com/appengine/docs/python/tools/uploadingdata
Alternatively, build a list of entities and save it with a single db.put() call, as described in the documentation:
https://developers.google.com/appengine/docs/python/datastore/entities#Batch_Operations
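As a rough sketch of the second option, the helper below groups entities into fixed-size lists and hands each list to a put function in one call. The names `batches` and `put_in_batches` and the batch size of 100 are illustrative assumptions, not part of the App Engine API. The `put` argument is passed in so the helper can be tried without the SDK.

```python
def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def put_in_batches(entities, put, size=100):
    """Call put(list_of_entities) once per batch.

    `put` is any callable that accepts a list. `size` is an assumed
    value; tune it for your entity size. Returns the number of calls made.
    """
    calls = 0
    for batch in batches(entities, size):
        put(batch)
        calls += 1
    return calls
```

In the upload handler this might be called as `put_in_batches(teams, db.put)`, where `teams` is the list of Team instances built from the csv.reader rows. Because db.put() accepts a list of model instances, 478 teams would then need 5 datastore calls rather than 478.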