I’m developing a high score web service for my game, and it’s running on Google App Engine.
My game has 5 difficulties, so I originally had 5 boards with entries for each (player_login, score and time). If the player submitted a lower score than the previously scored, it got dismissed, so only the highest score is kept for each player.
But to add more fun into this, I’d decided to include daily/weekly/monthly/yearly high score tables. So I’ve created 5 boards for each difficulty, making it 25 boards. When a score is submitted, it’s saved into each board, and the boards are supposed to be cleared on every day/week/month/year.
This happens by a cron job that is invoked and deletes all entries from a specific board.
Here comes the problem: it looks like deleting entries from the datastore is slow. From my test daily cleanups it looks like deleting a single entry takes around 200 ms.
In the worst-case scenario, if the game would be quite popular and would have, say, 100 000 players, and each of them would have an entry in the yearly board, it would take 100 000 * 0.012 seconds = 12 000 seconds (3 hours!!) to clear that board. I think we are allowed to have jobs of up to 30 seconds in App Engine, so this wouldn’t work.
I’m deleting with following code (thanks to Nick Johnson):
q = Score.all(keys_only=True).filter('b = ',boardToClear)
results = q.fetch(500)
while results:
self.response.out.write("deleting one batch;")
db.delete(results)
q = Score.all(keys_only=True).filter('b = ',boardToClear).with_cursor(q.cursor())
results = q.fetch(500)
What do you recommend me to do with this problem?
One approach that comes to my mind is to use a task queue and delete older scores than that are permitted in each board, i.e. which have expired, but in smaller quantities. This way I wouldn’t hit the CPU limit for one task, but the cleanup would not be (nearly) instantaneous, so my 12 000 seconds long cleanup would be split into 1 200 tasks, each roughly 10 seconds long.
But I think that there is something that I’m doing wrong, this kind of operation would be a lot faster when done in relational database. Possibly something is wrong with my approach to the datastore and scoring, because being locked in RDBMS mindset.
First, a couple of small suggestions:
keys_onlyquery and then calldb.delete()on an entire list of keys at once.These may not fundamentally address your problem, though. I think there’s no way to get around the fact that deleting a large number of records (hundreds of thousands, say), will take some time. I’m not sure that this is as big a problem for your use case though, as I can see a couple of techniques that would help.