I have a table widget_events that records event_what events occurring to
widget widget_id on date event_when. It’s possible for the same event to
occur multiple times to the same widget on the same day. For this reason,
column event_id is used as primary key to distinguish such rows. Here is
the table declaration:
CREATE TABLE widget_events
(
    event_id    int4 UNIQUE NOT NULL,
    event_when  date NOT NULL,
    event_what  text NOT NULL,
    widget_id   int4 REFERENCES widgets (widget_id) NOT NULL,
    PRIMARY KEY (event_id)
);
The client application processes events in batches, where each batch consists
of all events for one widget on one date. However, the application has no
previous knowledge of which widgets and dates are stored in widget_events.
One possible solution is to start by selecting one random row from
widget_events (using SQL’s LIMIT), and then do another query for all
rows with the same widget_id and event_when. After this batch is
processed, those rows can be deleted from widget_events, and we go back
to the first step. The algorithm stops when the first step returns no
row at all.
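For concreteness, one pass of that loop might look like the sketch below. The literal values 42 and '2024-01-15' are hypothetical stand-ins for whatever the first query returns. Note that LIMIT without ORDER BY returns an arbitrary row rather than a truly random one, which is sufficient here.

```sql
-- Step 1: pick an arbitrary row to identify one batch.
SELECT widget_id, event_when
FROM widget_events
LIMIT 1;

-- Step 2: fetch every row of that batch.
-- 42 and '2024-01-15' are hypothetical values taken from step 1.
SELECT event_id, event_what
FROM widget_events
WHERE widget_id = 42
  AND event_when = DATE '2024-01-15';

-- Step 3: after the batch has been processed, delete it.
DELETE FROM widget_events
WHERE widget_id = 42
  AND event_when = DATE '2024-01-15';
```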
My question is whether there is a faster, more elegant way to do this.
Is it possible in SQL (in particular the SQL understood by PostgreSQL)
to return each distinct batch in a single query?
To select distinct batches:
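A minimal sketch of such a query, assuming only the widget_events table declared in the question (the ORDER BY is optional and is there only to make the output predictable):

```sql
-- One row per batch, i.e. per (widget_id, event_when) pair.
SELECT DISTINCT widget_id, event_when
FROM widget_events
ORDER BY widget_id, event_when;
```

The client can then iterate over this result and fetch the rows of each batch by widget_id and event_when.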
Or you could pick up a single batch in one query, like:
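One way this could be written is as a join against a one-row subquery. This is a sketch under the assumption that any batch will do; the alias names e and pick are illustrative only:

```sql
-- The subquery picks one arbitrary (widget_id, event_when) pair;
-- the join then returns every event belonging to that batch.
SELECT e.*
FROM widget_events AS e
JOIN (
    SELECT widget_id, event_when
    FROM widget_events
    LIMIT 1
) AS pick USING (widget_id, event_when);
```

An empty result means the table holds no more events, which gives the loop its stopping condition.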