I’ve got a greatest-n-per-group problem I’m trying to solve. I’ve been reading through existing solutions, but none seem to match the quirks I’m dealing with.
Scenario: Let’s say there is an oil company that has a collection of oil wells. Each well has a number of oil tanks associated with it. Every day, someone takes a reading of each well. Occasionally they also take readings of the tanks, but tank readings are much less frequent and may be spread over several days.
All well and tank readings are recorded into a database, organized by date.
CREATE TABLE "wellReadings" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "date" DATETIME,
    "wellName" VARCHAR(10),
    ...
);
CREATE TABLE "tankReadings" (
    "id" INTEGER PRIMARY KEY AUTOINCREMENT,
    "date" DATETIME,
    "well" INT NOT NULL,
    "tankName" VARCHAR(10),
    ...
);
Problem: For any given well reading (in the wellReadings table) on any given date, I want to find the tank readings (in the tankReadings table) for all of the tanks associated with that well, taken on that same date. If a particular tank has no reading on that date, I want the most recent reading before that date.
So far I’ve been trying to use joins and subqueries, but I haven’t been able to narrow the results down to just the most recent tank reading (my test queries keep giving me ALL tank readings that occur on or before the well reading date). A correlated subquery might work, but my DB doesn’t support them (SQLite).
You could always try something like:
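A minimal sketch of one such query, using a self-join as an anti-join: pair each well reading with every tank reading on or before its date, then discard any pair for which a later qualifying reading of the same tank exists. The question's schema elides the column that tankReadings.well refers to, so the `wellId` column below is a hypothetical stand-in, and the sample rows are made up. It is wrapped in Python's sqlite3 module only so it can be run as-is against an in-memory SQLite database.

```python
import sqlite3

# Anti-join: keep a (well reading, tank reading) pair only if no later
# reading of the same tank exists on or before the well reading's date.
QUERY = """
SELECT w.id, t.tankName, t.date
FROM wellReadings AS w
JOIN tankReadings AS t
  ON t.well = w.wellId                 -- wellId is a hypothetical join column
 AND date(t.date) <= date(w.date)      -- same day or earlier
LEFT JOIN tankReadings AS newer
  ON newer.well = t.well
 AND newer.tankName = t.tankName
 AND date(newer.date) <= date(w.date)
 AND newer.date > t.date               -- a more recent reading of the same tank
WHERE newer.id IS NULL                 -- keep rows that have no newer match
ORDER BY w.id, t.tankName
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wellReadings (
    id INTEGER PRIMARY KEY, date DATETIME, wellName VARCHAR(10),
    wellId INTEGER  -- hypothetical: the value tankReadings.well points at
);
CREATE TABLE tankReadings (
    id INTEGER PRIMARY KEY, date DATETIME, well INT NOT NULL, tankName VARCHAR(10)
);
-- Made-up sample data: one well read on two days, two tanks.
INSERT INTO wellReadings VALUES
    (1, '2024-01-05 08:00:00', 'W1', 1),
    (2, '2024-01-10 08:00:00', 'W1', 1);
INSERT INTO tankReadings VALUES
    (1, '2024-01-03 09:00:00', 1, 'A'),
    (2, '2024-01-05 15:00:00', 1, 'A'),
    (3, '2024-01-04 09:00:00', 1, 'B'),
    (4, '2024-01-09 09:00:00', 1, 'B');
""")

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this sample data, the well reading of 5 January picks up tank A's reading from the same day and tank B's from the day before. Note that if two readings of the same tank share an identical timestamp, both come back, so a tie-breaker on id may be needed.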
This may not be the most efficient way to do it, but it ought to work.
P.S. If some wells might have no past tank readings at all, you may want to use a left join instead:
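A rough sketch of that variant, under the same assumptions (a hypothetical `wellId` column as the join key, made-up sample rows, Python's sqlite3 module used only to make it runnable). Turning the first join into a LEFT JOIN keeps well readings that have no tank reading on or before their date, with NULLs in the tank columns.

```python
import sqlite3

# Same anti-join idea, but the first join is a LEFT JOIN so that well
# readings without any earlier tank reading still appear in the result.
QUERY = """
SELECT w.wellName, t.tankName, t.date
FROM wellReadings AS w
LEFT JOIN tankReadings AS t
  ON t.well = w.wellId                 -- wellId is a hypothetical join column
 AND date(t.date) <= date(w.date)
LEFT JOIN tankReadings AS newer
  ON newer.well = t.well
 AND newer.tankName = t.tankName
 AND date(newer.date) <= date(w.date)
 AND newer.date > t.date
WHERE newer.id IS NULL                 -- also true when t itself is NULL
ORDER BY w.id, t.tankName
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wellReadings (
    id INTEGER PRIMARY KEY, date DATETIME, wellName VARCHAR(10),
    wellId INTEGER  -- hypothetical: the value tankReadings.well points at
);
CREATE TABLE tankReadings (
    id INTEGER PRIMARY KEY, date DATETIME, well INT NOT NULL, tankName VARCHAR(10)
);
-- Made-up sample data: well W2 has no tank reading on or before its date.
INSERT INTO wellReadings VALUES
    (1, '2024-01-05 08:00:00', 'W1', 1),
    (2, '2024-01-05 08:30:00', 'W2', 2);
INSERT INTO tankReadings VALUES
    (1, '2024-01-04 09:00:00', 1, 'A'),
    (2, '2024-01-06 09:00:00', 2, 'C');
""")

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Here W2 is still listed, with NULL (None in Python) for the tank name and date; with a plain JOIN that row would be dropped.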