I have a table like this, created with SQLite:
CREATE TABLE Cars (
    POWER    DOUBLE,
    CAPACITY DOUBLE,
    SPEED    DOUBLE,
    TIME     INTEGER NOT NULL,
    TYPE     INTEGER NOT NULL,
    MODEL    INTEGER NOT NULL,
    PRIMARY KEY (TIME, TYPE, MODEL)
);
There are 15 different values of TYPE, and each type has 20 different values of MODEL.
A new record is inserted for every model every 10 seconds.
A small example:
POWER   TIME   TYPE   MODEL
45.6    2588   3      14
46.8    2588   3      15
44.7    2588   3      16
The table is very large, with millions of rows.
My primary key is (TIME, TYPE, MODEL) because that combination uniquely identifies a row.
My application runs a SELECT query many times, and it can take a very long time when the time range is large or when I run it for several models.
For example, I often run this kind of query:
SELECT power, time, type, model
FROM CARS
WHERE type = 3 AND model = 14 AND time BETWEEN 2588 and 13550;
I have experimented with a primary key of (TYPE, MODEL, TIME), which improved performance in some cases, but not over large time ranges.
How can I optimize this retrieval of records, and which primary key would be optimal in this situation?
Insertions and updates are not an issue in terms of performance.
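A common rule of thumb is to order the fields in the primary key by how selective each one is, most selective first.
At first glance, that puts time first, since selecting a specific time returns fewer records than selecting a specific type or model.
However, if most or all of your queries select a range of times, time should go at the end of the primary key. An index can narrow a search with equality conditions on its leading columns, but only the last column it uses can take a range condition; columns after a range column cannot narrow the search any further.
With time first, the BETWEEN condition is the only one the index can use to bound the search, so every index entry in that time range has to be visited and checked against type and model.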
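To make the range-versus-equality argument concrete, here is a small Python simulation. It is a sketch under stated assumptions, not a model of SQLite's internals. A sorted list of key tuples stands in for the B-tree index, and `bisect` plays the role of the index seek. The data shape follows the question (15 types, 20 models, one sample every 10 seconds). The type/model numbering, the start time of 2588, and the time unit are assumptions. The helpers `cost_time_first` and `cost_model_first` are hypothetical names. Each one counts how many index entries a given key order has to walk through for the question's query.

```python
from bisect import bisect_left, bisect_right
from itertools import product

# Hypothetical data: 15 types x 20 models (numbered from 0), one sample
# every 10 seconds starting at TIME = 2588. Real values will differ.
TYPES = range(15)
MODELS = range(20)
TIMES = range(2588, 15000, 10)
INF = float("inf")

# Sorted lists of composite keys stand in for the two index orderings.
# Key order (time, type, model):
time_first = sorted(product(TIMES, TYPES, MODELS))
# Key order (model, type, time):
model_first = sorted((m, ty, t) for t, ty, m in product(TIMES, TYPES, MODELS))


def cost_time_first(ty, model, lo, hi):
    """(time, type, model): only the leading TIME range bounds the seek,
    so every entry in [lo, hi] is visited and then filtered.
    Returns (entries visited, entries matched)."""
    start = bisect_left(time_first, (lo,))
    end = bisect_right(time_first, (hi, INF, INF))
    visited = time_first[start:end]
    matched = sum(1 for _, t2, m2 in visited if t2 == ty and m2 == model)
    return len(visited), matched


def cost_model_first(ty, model, lo, hi):
    """(model, type, time): equality on the two leading columns plus a range
    on the last one selects one contiguous slice; nothing is filtered out.
    Returns (entries visited, entries matched)."""
    start = bisect_left(model_first, (model, ty, lo))
    end = bisect_right(model_first, (model, ty, hi))
    return end - start, end - start


if __name__ == "__main__":
    # The question's query: type = 3 AND model = 14 AND time BETWEEN 2588 AND 13550
    print(cost_time_first(3, 14, 2588, 13550))   # (329100, 1097)
    print(cost_model_first(3, 14, 2588, 13550))  # (1097, 1097)
```

With TIME leading, the seek can only bound the time range. It walks the entries of all 300 type/model combinations in that range and keeps 1 in 300. With the equality columns leading, the matching entries sit next to each other, so the walk covers only them. That is why the gap grows with the width of the time range.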
I suggest changing the primary key to (model, type, time), in that order.
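Below is a minimal sketch of the suggested key using Python's built-in `sqlite3` module. It assumes the question's schema with only the primary key reordered, and `run_demo` is a hypothetical helper name. The sketch loads the three sample rows into an in-memory database and runs the question's query with bound parameters. It also fetches the `EXPLAIN QUERY PLAN` output, so you can check that SQLite searches the key instead of scanning the table. The plan wording differs between SQLite versions. On recent versions it should report a SEARCH on the automatic primary-key index, with equality on MODEL and TYPE and a range on TIME.

```python
import sqlite3

# Hypothetical demo: the question's Cars schema with only the key order changed.
SCHEMA = """
CREATE TABLE Cars (
    POWER    DOUBLE,
    CAPACITY DOUBLE,
    SPEED    DOUBLE,
    TIME     INTEGER NOT NULL,
    TYPE     INTEGER NOT NULL,
    MODEL    INTEGER NOT NULL,
    PRIMARY KEY (MODEL, TYPE, TIME)
)
"""

# The question's query, with the constants turned into bound parameters.
QUERY = """
SELECT power, time, type, model
FROM Cars
WHERE type = ? AND model = ? AND time BETWEEN ? AND ?
"""


def run_demo(params=(3, 14, 2588, 13550)):
    """Return (plan details, result rows) for QUERY on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        # The three sample rows from the question; the other columns stay NULL.
        conn.executemany(
            "INSERT INTO Cars (POWER, TIME, TYPE, MODEL) VALUES (?, ?, ?, ?)",
            [(45.6, 2588, 3, 14), (46.8, 2588, 3, 15), (44.7, 2588, 3, 16)],
        )
        # Column 3 of each EXPLAIN QUERY PLAN row is the human-readable detail;
        # its exact wording varies between SQLite versions.
        plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params)]
        rows = conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()
    return plan, rows


if __name__ == "__main__":
    plan, rows = run_demo()
    for detail in plan:
        print(detail)
    print(rows)  # [(45.6, 2588, 3, 14)]
```

Running the same `EXPLAIN QUERY PLAN` against your existing (TIME, TYPE, MODEL) table is a quick way to compare. There, only the TIME range should appear as an index constraint.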