So I’ve read all the RMDB vs BigTable debates
I tried to model a simple game class using BigTable concepts.
Goals : Provide very fast reads and considerably easy writes
Scenario: I have 500,000 user entities in my User model. My user sees a user statistics at the top of his/her game page (think of a status bar like in Mafia Wars), so everywhere he/she goes in the game, the stats get refreshed.
Since it gets called so frequently, why don’t I model my User around that fact?
Code:
# simple User class for a game
class User(db.Model):
username = db.StringProperty()
total_attack = db.IntegerProperty()
unit_1_amount = db.IntegerProperty()
unit_1_attack = db.IntegerProperty(default=10)
unit_2_amount = db.IntegerProperty()
unit_2_attack = db.IntegerProperty(default=20)
unit_3_amount = db.IntegerProperty()
unit_3_attack = db.IntegerProperty(default=50)
def calculate_total_attack(self):
self.total_attack = self.unit_1_attack * self.unit_1_amount + \
self.unit_2_attack * self.unit_2_amount + \
self.unit_3_attack * self.unit_3_amount + \
here’s how I’m approaching it ( feel free to comment/critique)
Advantages:
1. Everything is in one big table
2. No need to use ReferenceProperty, no MANY-TO-MANY relationships
3. Updates are very easily done : Just get the user entity by keyname
4. It’s easy to transfer queried entity to the templating engine.
Disadvantages:
1. If I have 100 different units with different capabilities (attack,defense,dexterity,magic,etc), then i’ll have a very HUGE table.
2. If I have to change a value of a certain attack unit, then I’m going to have to go through all 500,000 user entities to change every one of them. ( maybe a cron job/task queue will help)
Each entity will have a size of 5-10 kb ( btw how do I check how large is an entity once I’ve uploaded them to the production server? ).
So I’m counting on the fact that disk space at App Engine is cheap, and I need to minimize the amount of datastore API calls. And I’ll try to memcache the entity for a period of time.
In essence, everything here goes against RMDB
Would love to hear your thoughts/ideas/experiences.
First a simple answer to “how do I know how big an entity is?”: Once you’ve got some data in your app on the app engine servers, you can go to your app’s console and click the ‘Datastore statistics’ link. That will give you some basic stats on your entities, like how much space each Kind is using, what property types are using the most disk space, etc. I don’t think you can drill down to the level of one particular User however.
Now here are some thoughts on your design. It is worth it to create a separate table for your Units. Even if you end up with a few hundred units, it will be easy to keep them all in memcache, so looking up the details of each unit will be negligible. It will cost you a few extra API calls to initially populate memcache with a unit’s info the first time it is used, but after that you will be saving a good amount of CPU cycles by not having to fetch the details of each unit from the database,and saving huge amounts of API calls when you need to update a unit (which you have already realized will be very expensive) In addition, each User object will use less disk space if it only needs a reference to a Unit entity rather than holding all the details itself. (Of course this depends on the amount of info you need to store about each unit, but you did mention that eventually you will be storing lots of stats for each unit)
If you do have a separate table for Units, it will also allow you to keep your User object more flexible. Instead of needing a specific field for each unit, you could just have a list of refernces to units. That way, if you add a unit type, you would not have to modify your User kind.