After reading about the GAE Datastore API, I am still unsure if I need to duplicate key names and parents as properties for an entity.
Let’s say there are two kinds of entities: Employee and Division. Each employee has a division as its parent, and is identified by an account name. I use the account name as the key name for employees. But when modeling Employee, I would still keep these two as properties:
division = db.ReferenceProperty(Division)
account_name = db.StringProperty()
Obviously I have to manually keep division consistent with its parent, and account_name with its key name. The reasons I am doing this extra work are:
- I am afraid the GQL/Datastore API may not support parent and key name as well as normal properties. Is there anything I can do with a property that I cannot do with a parent or key name (or are they essentially reference properties)? How do I use key names in GQL queries?
- The meaning of key name and parent is not particularly clear. As the names are not self-descriptive, I have to inform other contributors that we use account name as key name…
But this is really unnecessary work that wastes time and storage space. I cannot shake the SQL mindset: why doesn’t Google just let us define one property to be the key and another to be the parent? Then we could name them and use them as normal properties…
What’s the best practice here?
Keep in mind that in the GAE Datastore you can never change the parent or key_name of an entity once it has been created. These values are permanent for the life of the entity.
If there is even a small chance that the account_name of an Employee could change, then you should not use it as a key_name. If it never changes, it could be a very good key_name, and it will allow you to do cheap gets for Employees using Employee.get_by_key_name() instead of expensive queries.
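To make the point about permanent keys concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Datastore API: Key and ToyStore are hypothetical names made up for illustration, and the account and division names are invented. It only shows that the account name and division can be read back from the key itself, and that a "renamed" key is simply a different key.

```python
from collections import namedtuple

# Toy stand-in for a datastore key: parent key + kind + key name.
# Hypothetical model for illustration only; this is not db.Key.
Key = namedtuple("Key", ["parent", "kind", "name"])


class ToyStore(object):
    """In-memory dict keyed by the full Key."""

    def __init__(self):
        self._entities = {}

    def put(self, key, properties):
        self._entities[key] = dict(properties)

    def get(self, key):
        # Direct lookup by key, loosely analogous to get_by_key_name().
        return self._entities.get(key)

    def query(self, kind, **filters):
        # Filter over entities of a kind: a crude stand-in for a property
        # query (the real Datastore uses indexes, not a scan).
        return [
            (key, props)
            for key, props in self._entities.items()
            if key.kind == kind
            and all(props.get(f) == v for f, v in filters.items())
        ]


store = ToyStore()
division = Key(parent=None, kind="Division", name="engineering")
employee = Key(parent=division, kind="Employee", name="jdoe")
store.put(employee, {"full_name": "J. Doe"})

# The account name and division are already recoverable from the key,
# so duplicating them as properties adds nothing in this model.
assert employee.name == "jdoe"
assert employee.parent.name == "engineering"

# A different key name is a different key: nothing is stored under it.
renamed = employee._replace(name="john.doe")
found_old = store.get(employee)
found_new = store.get(renamed)
```

In this model, "changing" the account name would mean writing a new entity under the new key and deleting the old one, which is why a value that might change makes a poor key name.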
Parent is not meant to be equivalent to a foreign key. A better equivalent to a foreign key is a reference property.
The main reason you use parent is so that the parent and child entities are in the same entity group, which allows you to operate on them both in a single transaction. If you just need a reference to the division from the Employee, then just use a reference property. I suggest getting familiar with how entity groups work, as this is very important in GAE data modeling.
Using parent can also cause write performance issues, as there is a limit to how quickly you can write to a single entity group (approximately one write per second). When deciding whether to use parent or a reference property, think about which entities need to be modified in the same transaction. In many cases you can use cross-group (XG) transactions instead. It is all about which trade-offs you want to make.
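As a rough illustration of the entity group trade-off, the sketch below uses the same kind of toy key (plain Python, not the Datastore API; Key and entity_group_root are hypothetical names, and the employee names are invented). It assumes only that an entity group is identified by the root of an entity's parent chain, and counts how many employee writes would land in each group under the two designs.

```python
from collections import Counter, namedtuple

# Toy stand-in for a datastore key; hypothetical, not db.Key.
Key = namedtuple("Key", ["parent", "kind", "name"])


def entity_group_root(key):
    """Walk up the parent chain; the root key identifies the entity group."""
    while key.parent is not None:
        key = key.parent
    return key


division = Key(None, "Division", "engineering")

# Design A: every employee is a child of the division (one shared group).
children = [Key(division, "Employee", n) for n in ("ann", "bob", "cy")]

# Design B: employees are root entities that would carry a reference
# property pointing at the division (one group per employee).
roots = [Key(None, "Employee", n) for n in ("ann", "bob", "cy")]

# One write per employee, tallied by the entity group it falls into.
writes_a = Counter(entity_group_root(k) for k in children)
writes_b = Counter(entity_group_root(k) for k in roots)
```

With the parent design all three writes fall into the division's group and compete for its write rate. With the reference design each write goes to its own group, at the cost of no longer being able to update an employee and its division in an ordinary single-group transaction.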
So my suggestions are: