I have a database with two tables that currently have 3 columns each.
Table_A: id, uid, url
Table_B: id, uid, url
The id is the primary key that auto increments by 1 on each new row inserted.
My question is: do I still need a primary key? I will never query the db by id.
The uid column is simply to separate per user so it’s not unique per row.
Table_A and Table_B will be compared by uid often.
I have uid, url indexed. I expect the tables to grow, possibly into the billions of rows, and I don’t want to waste space on an id column.
If you use InnoDB, and don’t declare a primary key column, InnoDB will create one for you, using a 6-byte integer. So the only thing you’re gaining by dropping the id column is possibly trading an 8-byte BIGINT for the 6-byte implicit PK column.
The reason is that InnoDB tables are stored as a B-tree, a clustered index based on the primary key. Every table must have a column it uses to organize this B-tree, even if it’s an implicitly created column.
You can also declare a table with a compound primary key:
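A minimal sketch of such a declaration, using the columns from the question. The column types and the 255-character url length are assumptions, not taken from the question, and this only works if each (uid, url) pair is unique within the table:

```sql
-- Assumed types: adjust uid and url to match the real data.
-- Very long url values may run into InnoDB's index key length limit.
CREATE TABLE Table_A (
  uid INT UNSIGNED NOT NULL,
  url VARCHAR(255) NOT NULL,
  -- The compound primary key doubles as the clustered index,
  -- so no separate id column is needed.
  PRIMARY KEY (uid, url)
) ENGINE=InnoDB;
```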
In this case, the requirement for a primary key is satisfied, and InnoDB creates no implicit column.
Re your comments:
I try not to use MyISAM. MyISAM is more susceptible to data corruption than InnoDB, and usually InnoDB performs better because it caches both data and indexes. It’s true there are some cases where MyISAM can use less disk space, but disk space is cheap and I’d much rather get the benefits of InnoDB.
Regarding indexes, if you have PRIMARY KEY (uid, url), then you automatically have a compound index over those two columns. No need to create an extra index on uid.
But if you have queries that search for url alone, without looking for a specific uid, then you need a separate index on url.
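To illustrate the difference, here is a rough sketch against a table declared with PRIMARY KEY (uid, url). The literal values and the index name idx_url are made up for the example:

```sql
-- Can use the primary key: uid is the leftmost column of the compound key.
SELECT url FROM Table_A WHERE uid = 42;

-- Cannot seek on the primary key: url is not the leftmost column.
SELECT uid FROM Table_A WHERE url = 'http://example.com/';

-- A separate secondary index (hypothetical name) supports lookups by url alone.
ALTER TABLE Table_A ADD INDEX idx_url (url);
```

Running EXPLAIN on each SELECT before and after adding the index should show which key, if any, the optimizer picks.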
I talk more about index design in my presentation: How to Design Indexes, Really