On the database side, I gather that a natural primary key is preferable as long as it’s not prohibitively long, which can cause indexing performance problems. But as I’m reading through projects that use sqlalchemy via google code search, I almost always find something like:
class MyClass(Base):
    __tablename__ = 'myclass'
    id = Column(Integer, primary_key=True)
If I have a simple class, like a tag, where I only plan to store one value and require uniqueness anyway, what do I gain from a surrogate primary key when I’m using SQLAlchemy? One of the SQL books I’m reading suggests ORMs are a legitimate use of the ‘antipattern,’ but the ORMs the author envisions sound more like ActiveRecord or Django. This comes up in a few places in my model, but here’s one:
class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)  # should I drop this and add primary_key to Tag.tag?
    tag = Column(Unicode(25), unique=True)
    ....
In my broader relational model, Tag has multiple many-to-many relationships with other objects. If tag is the primary key, a number of intermediate tables will have to store a longer key. Should I pick tag or id for my primary key?
Although ORMs and programming languages make some usages easier than others, I think that choosing a primary key is a database design problem unrelated to the ORM. It is more important to get the database schema right on its own grounds. Databases tend to live longer than the code that accesses them, anyway.
Search SO (and Google) for more general questions on how to choose a primary key, e.g.: https://stackoverflow.com/search?q=primary+key+natural+surrogate+database-design (“Surrogate vs. natural/business keys”, “Relational database design question – Surrogate-key or Natural-key?”, “When not to use surrogate primary keys?”, …)
I assume that the Tag table will not be very large or very dynamic. In this case I would try to use tag as the primary key, unless there are important reasons to add a primary key that is invisible to the end user, e.g.:
poor performance under real-world data (measured, not imagined),
frequent changes of tag names (but then, I’d still use some unique string based on the first-used tag name as the key),
invisible behind-the-scenes merging of tags (but, see previous point),
problems with different collations (comparing international data) in your RDBMS (but, …)
…
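As a rough illustration of the natural-key option, here is a minimal sketch of what the mapping might look like in SQLAlchemy. The Post class, the post_tag association table, and the in-memory SQLite engine are assumptions made up for this example; they are not part of the model in the question. The onupdate="CASCADE" clause is one possible way to ask the database to propagate a tag rename, and it only helps on backends that actually enforce foreign keys.

```python
from sqlalchemy import Column, ForeignKey, Integer, Table, Unicode, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

# Hypothetical association table: it stores the tag text itself,
# so its rows are readable without joining back to the tag table.
post_tag = Table(
    "post_tag",
    Base.metadata,
    Column("post_id", Integer, ForeignKey("post.id"), primary_key=True),
    Column(
        "tag",
        Unicode(25),
        # Ask the database to propagate renames (if it enforces foreign keys).
        ForeignKey("tag.tag", onupdate="CASCADE"),
        primary_key=True,
    ),
)


class Tag(Base):
    __tablename__ = "tag"
    # Natural key: the tag text is the primary key, no separate id column.
    tag = Column(Unicode(25), primary_key=True)


class Post(Base):  # hypothetical class, for illustration only
    __tablename__ = "post"
    # Posts have their own identity, so a surrogate id is kept here.
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(100))
    tags = relationship(Tag, secondary=post_tag, backref="posts")


engine = create_engine("sqlite://")  # in-memory database, assumed for the demo
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

post = Post(title="Choosing keys", tags=[Tag(tag="sqlalchemy"), Tag(tag="design")])
session.add(post)
session.commit()

# Read the association table directly: the tag names are right there.
rows = session.execute(post_tag.select().order_by(post_tag.c.tag)).fetchall()
tags_for_post = [row.tag for row in rows]
```

The ORM does not seem to care which column is the primary key here; the relationship is declared the same way as it would be with an integer id.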
In general I observed that people tend to err in both directions:
by using complex multi-field “natural” keys (where particular fields are themselves opaque numbers), when table rows have their own identity and would benefit from having their own surrogate IDs,
by introducing random numeric codes for everything, instead of using short meaningful strings.
Meaningful primary key values, where possible, will prove useful when browsing the database by hand. You won’t need multiple joins to figure out your data.
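To make the point about joins concrete, here is a small sketch using Python's built-in sqlite3 module. All table names (tag_n, post_tag_n, tag_s, post_tag_s) and the sample rows are invented for this comparison; the schema is only an assumption about how the two variants might look.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Natural-key variant: the association row carries the tag text.
    CREATE TABLE tag_n (tag TEXT PRIMARY KEY);
    CREATE TABLE post_tag_n (post_id INTEGER, tag TEXT REFERENCES tag_n(tag));

    -- Surrogate-key variant: the association row carries an opaque number.
    CREATE TABLE tag_s (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    CREATE TABLE post_tag_s (post_id INTEGER, tag_id INTEGER REFERENCES tag_s(id));

    INSERT INTO tag_n VALUES ('python'), ('sql');
    INSERT INTO post_tag_n VALUES (1, 'python'), (1, 'sql');
    INSERT INTO tag_s VALUES (10, 'python'), (20, 'sql');
    INSERT INTO post_tag_s VALUES (1, 10), (1, 20);
""")

# Natural key: the association table is readable on its own.
natural = conn.execute(
    "SELECT post_id, tag FROM post_tag_n ORDER BY tag"
).fetchall()

# Surrogate key: the same table shows only numbers...
surrogate_raw = conn.execute(
    "SELECT post_id, tag_id FROM post_tag_s ORDER BY tag_id"
).fetchall()

# ...and a join is needed to see which tags they stand for.
surrogate_joined = conn.execute(
    "SELECT pt.post_id, t.tag FROM post_tag_s AS pt "
    "JOIN tag_s AS t ON t.id = pt.tag_id ORDER BY t.tag"
).fetchall()
```

With the natural key, a plain SELECT on the association table already shows the tag names; with the surrogate key, the same information only appears after joining back to the tag table.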