So here’s my schema (give or take):
cmds.Add(@"CREATE TABLE [Services] ([Id] INTEGER PRIMARY KEY, [AssetId] INTEGER NULL, [Name] TEXT NOT NULL)");
cmds.Add(@"CREATE INDEX [IX_Services_AssetId] ON [Services] ([AssetId])");
cmds.Add(@"CREATE INDEX [IX_Services_Name] ON [Services] ([Name])");
cmds.Add(@"CREATE TABLE [Telemetry] ([Id] INTEGER PRIMARY KEY, [ServiceId] INTEGER NULL, [Name] TEXT NOT NULL)");
cmds.Add(@"CREATE INDEX [IX_Telemetry_ServiceId] ON [Telemetry] ([ServiceId])");
cmds.Add(@"CREATE INDEX [IX_Telemetry_Name] ON [Telemetry] ([Name])");
cmds.Add(@"CREATE TABLE [Events] ([Id] INTEGER PRIMARY KEY, [TelemetryId] INTEGER NOT NULL, [TimestampTicks] INTEGER NOT NULL, [Value] TEXT NOT NULL)");
cmds.Add(@"CREATE INDEX [IX_Events_TelemetryId] ON [Events] ([TelemetryId])");
cmds.Add(@"CREATE INDEX [IX_Events_TimestampTicks] ON [Events] ([TimestampTicks])");
And here are my queries, with their strange timer results:
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 1;
634678974004420000
CPU Time: user 0.296402 sys 0.374402
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 2;
634691940264680000
CPU Time: user 0.062400 sys 0.124801
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = +e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 1;
634678974004420000
CPU Time: user 0.000000 sys 0.000000
sqlite> SELECT MIN(e.TimestampTicks) FROM Events e INNER JOIN Telemetry ss ON ss.ID = +e.TelemetryID INNER JOIN Services s ON s.ID = ss.ServiceID WHERE s.AssetID = 2;
634691940264680000
CPU Time: user 0.265202 sys 0.078001
Now I can understand why adding the ‘+’ might change the time, but why is it so inconsistent with the AssetId change? Is there some other index I should create for these MIN queries? There are 900000 rows in the Events table.
Query plans (the first three lines are the plan with the ‘+’, the last three are the plan without it):
0|0|0|SEARCH TABLE Events AS e USING INDEX IX_Events_TimestampTicks (~1 rows)
0|1|1|SEARCH TABLE Telemetry AS ss USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE Services AS s USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|0|2|SEARCH TABLE Services AS s USING COVERING INDEX IX_Services_AssetId (AssetId=?) (~1 rows)
0|1|1|SEARCH TABLE Telemetry AS ss USING COVERING INDEX IX_Telemetry_ServiceId (ServiceId=?) (~1 rows)
0|2|0|SEARCH TABLE Events AS e USING INDEX IX_Events_TelemetryId (TelemetryId=?) (~1 rows)
EDIT: In summary, given the tables above what indexes would you create if these were the only queries to ever be executed:
SELECT MIN/MAX(e.TimestampTicks) FROM Events e INNER JOIN Telemetry t ON t.ID = e.TelemetryID INNER JOIN Services s ON s.ID = t.ServiceID WHERE s.AssetID = @AssetId;
SELECT e1.* FROM Events e1 INNER JOIN Telemetry t1 ON t1.Id = e1.TelemetryId INNER JOIN Services s1 ON s1.Id = t1.ServiceId WHERE t1.Name = @TelemetryName AND s1.Name = @ServiceName;
SELECT * FROM Events e INNER JOIN Telemetry t ON t.Id = e.TelemetryId INNER JOIN Services s ON s.Id = t.ServiceId WHERE s.AssetId = @AssetId AND e.TimestampTicks >= @StartTimeTicks ORDER BY e.TimestampTicks LIMIT 1000;
SELECT e.Id, e.TelemetryId, e.TimestampTicks, e.Value FROM (
SELECT e2.Id AS [Id], MAX(e2.TimestampTicks) as [TimestampTicks]
FROM Events e2 INNER JOIN Telemetry t ON t.Id = e2.TelemetryId INNER JOIN Services s ON s.Id = t.ServiceId
WHERE s.AssetId = @AssetId AND e2.TimestampTicks <= @StartTimeTicks
GROUP BY e2.TelemetryId) AS grp
INNER JOIN Events e ON grp.Id = e.Id;
Brannon,
Regarding time differences with change of AssetID:
Perhaps you’ve already tried this, but have you run each query several times in succession? The memory caching of BOTH your operating system and sqlite will often make a second query much faster than the first run within a session. I would run a given query four times in a row, and see if the 2nd-4th runs are more consistent in timing.
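A minimal sketch of that repeated-run check, written in Python with the standard sqlite3 module purely for illustration (your application appears to be C#). The sample rows and the helper names build_sample_db and time_runs are made up for this example. It uses an in-memory database, so it will not show operating-system cache effects; point sqlite3.connect at your real database file to see those.

```python
import sqlite3
import time

QUERY = """
SELECT MIN(e.TimestampTicks)
FROM Events e
INNER JOIN Telemetry t ON t.Id = e.TelemetryId
INNER JOIN Services s ON s.Id = t.ServiceId
WHERE s.AssetId = ?
"""

def build_sample_db():
    """Create a small in-memory copy of the schema with made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Services (Id INTEGER PRIMARY KEY, AssetId INTEGER NULL, Name TEXT NOT NULL);
        CREATE INDEX IX_Services_AssetId ON Services (AssetId);
        CREATE TABLE Telemetry (Id INTEGER PRIMARY KEY, ServiceId INTEGER NULL, Name TEXT NOT NULL);
        CREATE INDEX IX_Telemetry_ServiceId ON Telemetry (ServiceId);
        CREATE TABLE Events (Id INTEGER PRIMARY KEY, TelemetryId INTEGER NOT NULL,
                             TimestampTicks INTEGER NOT NULL, Value TEXT NOT NULL);
        CREATE INDEX IX_Events_TelemetryId ON Events (TelemetryId);
        CREATE INDEX IX_Events_TimestampTicks ON Events (TimestampTicks);
    """)
    con.executemany("INSERT INTO Services VALUES (?, ?, ?)",
                    [(1, 1, "svc-a"), (2, 2, "svc-b")])
    con.executemany("INSERT INTO Telemetry VALUES (?, ?, ?)",
                    [(1, 1, "temp"), (2, 2, "temp")])
    # Even-numbered events belong to telemetry 1, odd-numbered to telemetry 2.
    con.executemany(
        "INSERT INTO Events (TelemetryId, TimestampTicks, Value) VALUES (?, ?, ?)",
        [(1 + i % 2, 1000 + i, str(i)) for i in range(10000)],
    )
    con.commit()
    return con

def time_runs(con, asset_id, runs=4):
    """Run the MIN query several times in a row and collect wall-clock timings."""
    timings = []
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = con.execute(QUERY, (asset_id,)).fetchone()[0]
        timings.append(time.perf_counter() - start)
    return result, timings

con = build_sample_db()
for asset_id in (1, 2):
    result, timings = time_runs(con, asset_id)
    print(asset_id, result, ["%.6f" % t for t in timings])
```

If the 2nd through 4th timings settle down for both AssetId values, caching is the likely explanation for at least part of the difference.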
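Regarding use of the “+”:
(For those who may not know: within a SELECT, prefixing a column with the unary “+” operator turns that term into an expression, so sqlite will not use an index on the column to satisfy that term of the WHERE or ON clause. For integer comparisons like the ones here it does not change which rows are returned, although it does remove the column’s type affinity for that comparison, which can matter in other cases. It is a documented technique, not a deprecated one.)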
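One way to see what the “+” does is to compare EXPLAIN QUERY PLAN output and the returned rows for the same query with and without it. The following is a sketch in Python (standard sqlite3 module) against a cut-down Events table with made-up rows; the plan helper is a name invented for this example, and the exact wording of the plan text differs between sqlite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Events (Id INTEGER PRIMARY KEY, TelemetryId INTEGER NOT NULL,
                         TimestampTicks INTEGER NOT NULL, Value TEXT NOT NULL);
    CREATE INDEX IX_Events_TelemetryId ON Events (TelemetryId);
""")
# Made-up data: 300 events spread evenly over TelemetryId 1, 2 and 3.
con.executemany(
    "INSERT INTO Events (TelemetryId, TimestampTicks, Value) VALUES (?, ?, ?)",
    [(1 + i % 3, 1000 + i, str(i)) for i in range(300)],
)

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN joined into one string."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column in both old and new sqlite versions.
    return " | ".join(row[-1] for row in rows)

plain = "SELECT Value FROM Events WHERE TelemetryId = 2 ORDER BY Id"
hinted = "SELECT Value FROM Events WHERE +TelemetryId = 2 ORDER BY Id"

print(plan(plain))    # expected to mention IX_Events_TelemetryId
print(plan(hinted))   # expected to fall back to a table scan

rows_plain = con.execute(plain).fetchall()
rows_hinted = con.execute(hinted).fetchall()
print(rows_plain == rows_hinted)
```

The two statements should return identical rows; only the access path differs.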
Have you run the ANALYZE command? It helps the sqlite optimizer quite a bit when making decisions.
http://sqlite.org/lang_analyze.html
Once your schema is stable and your tables are populated, you may only need to run it once; there is no need to run it every day.
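As a rough illustration (Python, standard sqlite3 module, made-up rows in a cut-down Events table), running ANALYZE and then reading the sqlite_stat1 table shows the statistics the optimizer will consult:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Events (Id INTEGER PRIMARY KEY, TelemetryId INTEGER NOT NULL,
                         TimestampTicks INTEGER NOT NULL, Value TEXT NOT NULL);
    CREATE INDEX IX_Events_TelemetryId ON Events (TelemetryId);
    CREATE INDEX IX_Events_TimestampTicks ON Events (TimestampTicks);
""")
# Made-up data: few distinct TelemetryId values, every TimestampTicks unique.
con.executemany(
    "INSERT INTO Events (TelemetryId, TimestampTicks, Value) VALUES (?, ?, ?)",
    [(1 + i % 3, 1000 + i, str(i)) for i in range(300)],
)

# Gather statistics; sqlite stores them in the sqlite_stat1 table.
con.execute("ANALYZE")

stats = con.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 ORDER BY idx"
).fetchall()
for tbl, idx, stat in stats:
    # stat is a text field: row count, then average rows per distinct key value.
    print(tbl, idx, stat)
```

On the real database, comparing the stat values for IX_Events_TelemetryId and IX_Events_TimestampTicks gives a feel for how selective each index is, which is the information the optimizer lacks before ANALYZE has been run.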
INDEXED BY
INDEXED BY is a feature the SQLite documentation discourages for routine performance tuning, but you might find it helpful in your evaluations, since it forces a specific index to be used.
http://www.sqlite.org/lang_indexedby.html
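For evaluation purposes only, a sketch of forcing the Events lookup through IX_Events_TelemetryId and confirming it in the plan might look like this (Python, standard sqlite3 module, made-up rows; the FORCED constant is a name chosen for this example). If sqlite cannot build a plan that uses the named index, preparing the statement fails with an error instead of silently choosing another index.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Services (Id INTEGER PRIMARY KEY, AssetId INTEGER NULL, Name TEXT NOT NULL);
    CREATE INDEX IX_Services_AssetId ON Services (AssetId);
    CREATE TABLE Telemetry (Id INTEGER PRIMARY KEY, ServiceId INTEGER NULL, Name TEXT NOT NULL);
    CREATE INDEX IX_Telemetry_ServiceId ON Telemetry (ServiceId);
    CREATE TABLE Events (Id INTEGER PRIMARY KEY, TelemetryId INTEGER NOT NULL,
                         TimestampTicks INTEGER NOT NULL, Value TEXT NOT NULL);
    CREATE INDEX IX_Events_TelemetryId ON Events (TelemetryId);
    CREATE INDEX IX_Events_TimestampTicks ON Events (TimestampTicks);
""")
con.executemany("INSERT INTO Services VALUES (?, ?, ?)",
                [(1, 1, "svc-a"), (2, 2, "svc-b")])
con.executemany("INSERT INTO Telemetry VALUES (?, ?, ?)",
                [(1, 1, "temp"), (2, 2, "temp")])
# Made-up data: even-numbered events go to telemetry 1, odd-numbered to telemetry 2.
con.executemany(
    "INSERT INTO Events (TelemetryId, TimestampTicks, Value) VALUES (?, ?, ?)",
    [(1 + i % 2, 1000 + i, str(i)) for i in range(1000)],
)

# INDEXED BY goes right after the table name (and alias) in the FROM clause.
FORCED = """
SELECT MIN(e.TimestampTicks)
FROM Events AS e INDEXED BY IX_Events_TelemetryId
INNER JOIN Telemetry t ON t.Id = e.TelemetryId
INNER JOIN Services s ON s.Id = t.ServiceId
WHERE s.AssetId = ?
"""

plan_rows = con.execute("EXPLAIN QUERY PLAN " + FORCED, (1,)).fetchall()
plan_text = " | ".join(row[-1] for row in plan_rows)
forced_min = con.execute(FORCED, (1,)).fetchone()[0]

print(plan_text)
print(forced_min)
```

Timing the forced variant against the unforced query for each AssetId should show whether the optimizer's own choice of index is the source of the inconsistency.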
I’d be interested to know what you discover,
Donald Griggs, Columbia SC USA