My PostgreSQL database contains a table to store instances of a registered entity. This table is populated via spreadsheet upload. A web interface allows an operator to modify the information presented. However, the original data is not modified. All changes are stored in a separate table changes with the columns unique_id, column_name, value and updated_at.
Once changes are made, they are presented to the operator by first querying the original table and then querying the change table (using instance ID and the latest change date, grouped by column name). The two results are merged in PHP and presented on the web interface. This is a rather rigid way of going about the task and I would like to keep all logic within SQL.
I can easily select the latest changes for the table using the following query:
```sql
-- Latest value per (unique_id, column_name), limited to columns of "instances"
SELECT fltr_chg.unique_id, fltr_chg.column_name, chg_val.value
FROM changes AS chg_val
JOIN (
    SELECT chg_rec.unique_id, chg_rec.column_name,
           MAX(chg_rec.updated_at) AS last_update
    FROM information_schema.columns AS source
    JOIN changes AS chg_rec ON source.table_name = 'instances'
                           AND source.column_name = chg_rec.column_name
    GROUP BY chg_rec.unique_id, chg_rec.column_name
) AS fltr_chg ON fltr_chg.unique_id = chg_val.unique_id
             AND fltr_chg.column_name = chg_val.column_name
             -- match the newest timestamp, otherwise every older change is returned too
             AND fltr_chg.last_update = chg_val.updated_at;
```
And selecting the entries from the instances table is just as easy:
```sql
SELECT * FROM instances;
```
Now, if there were only a way of transforming the former result and substituting the resulting values into the latter, based on unique_id and column_name, while still returning the result as a table, the problem would be solved. Is this possible?
I am sure that this is not a rare problem, and most likely some systems keep track of changes to the data in a similar way. How do they apply the changes back to the data, if not through one of the approaches described above (the current one or the one I am looking for)?
Assuming Postgres 9.1 or later.
I simplified / optimized your basic query to retrieve the latest values:
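A minimal sketch of such a query (not necessarily the exact original), assuming the tables instances and changes from the question. It looks up the column names in pg_attribute and uses DISTINCT ON to keep only the newest row per unique_id and column_name:

```sql
-- Sketch: assumes the tables "instances" and "changes" from the question.
-- Latest value per (unique_id, column_name), restricted to column names
-- that actually exist in "instances".
SELECT DISTINCT ON (c.unique_id, c.column_name)
       c.unique_id, c.column_name, c.value
FROM   changes c
JOIN   pg_attribute a ON a.attname = c.column_name
WHERE  a.attrelid = 'instances'::regclass  -- table name resolved to its OID
AND    a.attnum > 0                         -- skip system columns
AND    NOT a.attisdropped                   -- skip dropped columns
ORDER  BY c.unique_id, c.column_name, c.updated_at DESC;  -- newest row first per group
```

DISTINCT ON keeps the first row of each group in ORDER BY order, so sorting by updated_at DESC within each (unique_id, column_name) group selects the latest change.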
I query the PostgreSQL catalog instead of the standard information schema because that is faster. Note the special cast to ::regclass, which resolves the table name to its OID according to the current search_path.

Now, that gives you a table. You want all values for one unique_id in a row. To achieve that, you have basically three options:
1. One subselect (or join) per column. Expensive and unwieldy, but a valid option for only a few columns.
2. A big CASE statement.
3. A pivot function. PostgreSQL provides the crosstab() function in the additional module tablefunc for that.

Basic instructions: Basic pivot table with crosstab()

I completely rewrote the function:
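A sketch of what such a function could look like, assuming tablefunc is installed and that instances has the hypothetical text columns col1, col2 and col3 after unique_id. The function name f_instance_changes is made up for illustration. It uses $1 instead of a parameter name, because SQL functions can only reference parameters by name since Postgres 9.2:

```sql
-- Sketch: requires the tablefunc module (CREATE EXTENSION tablefunc).
-- Hypothetical function name; col1..col3 stand in for the real columns.
CREATE OR REPLACE FUNCTION f_instance_changes(int)
  RETURNS TABLE (unique_id int, col1 text, col2 text, col3 text) AS
$func$
SELECT * FROM crosstab(
   -- 1st parameter: row name, category, value; only the latest change per column
   'SELECT DISTINCT ON (1, 2) unique_id, column_name, value
    FROM   changes
    WHERE  unique_id = ' || $1 || '
    ORDER  BY 1, 2, updated_at DESC'
   -- 2nd parameter: categories = columns of instances except unique_id, in table order
 , $$SELECT attname::text
     FROM   pg_attribute
     WHERE  attrelid = 'instances'::regclass
     AND    attnum > 0
     AND    NOT attisdropped
     AND    attname <> 'unique_id'
     ORDER  BY attnum$$
   ) AS ct (unique_id int, col1 text, col2 text, col3 text)
$func$ LANGUAGE sql;
```

The column definition list (AS ct (...)) still has to be kept in sync with the table by hand.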
I separated the catalog lookup from the value query, because the two-parameter form of crosstab() takes the categories (here: the column names) as a separate query. Missing values (no entry in changes) are substituted with NULL automatically. A perfect match for this use case!

This assumes that attname matches column_name, and it excludes unique_id, which plays a special role as the row name.

Full automation
Addressing your comment: There is a way to supply the column definition list automatically. It’s not for the faint of heart, though.
I use a number of advanced Postgres features here: crosstab(), a plpgsql function with dynamic SQL, composite type handling, advanced dollar quoting, catalog lookup, an aggregate function, a window function, an object identifier type, …

Test environment:
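A hypothetical test setup for trying out the functions in this answer. The columns col1 to col3 and all sample rows are made up for illustration; unique_id comes first, which the crosstab() row name relies on:

```sql
-- Hypothetical test schema and data (col1..col3 and all values are made up).
CREATE EXTENSION IF NOT EXISTS tablefunc;  -- provides crosstab()

CREATE TABLE instances (
   unique_id int PRIMARY KEY  -- first column: used as row name by crosstab()
 , col1      text
 , col2      text
 , col3      text
);

CREATE TABLE changes (
   unique_id   int         NOT NULL REFERENCES instances
 , column_name text        NOT NULL  -- name of the changed column in instances
 , value       text
 , updated_at  timestamptz NOT NULL DEFAULT now()
);

INSERT INTO instances VALUES
   (1, 'a1', 'b1', 'c1')
 , (2, 'a2', 'b2', 'c2');

INSERT INTO changes (unique_id, column_name, value, updated_at) VALUES
   (1, 'col2', 'b1_old', '2013-01-01')
 , (1, 'col2', 'b1_new', '2013-01-02')
 , (1, 'col3', 'c1_new', '2013-01-03')
 , (2, 'col1', 'a2_new', '2013-01-01');
```

With this data, instance 1 has changes for col2 and col3 only, and col2 was changed twice.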
Automated function for one table
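A simplified sketch of such a function, assuming tablefunc is installed and the table layout from the question (the name f_curr_instance is hypothetical, and the sketch does not necessarily use every feature listed above). It builds the column definition list for crosstab() from pg_attribute at call time and runs the query with RETURN QUERY EXECUTE:

```sql
-- Sketch: requires tablefunc. Hypothetical name; table hard-coded as public.instances.
-- Assumes unique_id is the first column (crosstab() returns the row name first).
CREATE OR REPLACE FUNCTION f_curr_instance(_id int)
  RETURNS SETOF public.instances AS
$func$
DECLARE
   _cols text;  -- column definition list, e.g. 'unique_id integer, col1 text, ...'
BEGIN
   SELECT string_agg(quote_ident(attname) || ' ' || format_type(atttypid, atttypmod)
                   , ', ' ORDER BY attnum)
   INTO   _cols
   FROM   pg_attribute
   WHERE  attrelid = 'public.instances'::regclass
   AND    attnum > 0
   AND    NOT attisdropped;

   RETURN QUERY EXECUTE format('SELECT * FROM crosstab(%L, %L) AS ct (%s)'
      -- source query: latest change per column of the requested row
    , 'SELECT DISTINCT ON (1, 2) unique_id, column_name, value
       FROM   changes
       WHERE  unique_id = ' || _id || '
       ORDER  BY 1, 2, updated_at DESC'
      -- category query: remaining column names in table order
    , $q$SELECT attname::text
         FROM   pg_attribute
         WHERE  attrelid = 'public.instances'::regclass
         AND    attnum > 0
         AND    NOT attisdropped
         AND    attname <> 'unique_id'
         ORDER  BY attnum$q$
    , _cols);
END
$func$ LANGUAGE plpgsql;
```

Because the definition list is derived from the catalog, it always matches the row type of public.instances, which is declared as the return type.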
The table instances is hard-coded and schema-qualified to be unambiguous. Note the use of the table type as return type: PostgreSQL automatically registers a row type for every table. This row type is bound to match the return type of the crosstab() function.

This binds the function to the type of the table:
- You cannot DROP the table while the function depends on its row type.
- After ALTER TABLE, the function can break, and you have to recreate it (without changes). I consider this a bug in 9.1: ALTER TABLE shouldn't silently break the function, but raise an error.

This performs very well.
Call:
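A call, assuming the automated function for one table is named f_curr_instance(_id int) (a hypothetical name) and returns SETOF public.instances. It yields one row holding only the latest changed values for that instance:

```sql
-- Hypothetical f_curr_instance(int) returning SETOF public.instances:
-- one row with the latest changed values of instance 1, NULL where unchanged.
SELECT * FROM f_curr_instance(1);
```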
Note how col1 is NULL here.

Use in a query to display an instance with its latest values:
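A sketch of overlaying the changes onto the original row, assuming the automated function is named f_curr_instance(_id int) and the table has the hypothetical columns col1 to col3. COALESCE falls back to the original value wherever no change exists; note that it cannot tell a change that explicitly set a value to NULL apart from no change at all:

```sql
-- Latest values for instance 1: changed value if present, else the original.
-- Hypothetical function and column names.
SELECT i.unique_id
     , COALESCE(c.col1, i.col1) AS col1
     , COALESCE(c.col2, i.col2) AS col2
     , COALESCE(c.col3, i.col3) AS col3
FROM   instances i
LEFT   JOIN f_curr_instance(1) c ON c.unique_id = i.unique_id  -- no LATERAL needed (9.1)
WHERE  i.unique_id = 1;
```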
Full automation for any table
(Added 2016. This is dynamite.)
Requires Postgres 9.1 or later. (It could be made to work with pg 8.4, but I didn't bother to backpatch.)
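A sketch of a polymorphic variant, under these assumptions: the function name f_curr_row is hypothetical, the row-name column of every target table is called unique_id and comes first, and (as in the question) changes has no column identifying the source table, so its rows are assumed to belong to the table passed in. The row type of the first argument selects both the table and the result type:

```sql
-- Sketch: requires tablefunc. Hypothetical name. The first argument only
-- carries a row type, e.g. NULL::public.instances; its value is ignored.
-- Assumes the first column of the target table is the row name unique_id.
CREATE OR REPLACE FUNCTION f_curr_row(_tbl_type anyelement, _id int)
  RETURNS SETOF anyelement AS
$func$
DECLARE
   _tbl  regclass := pg_typeof(_tbl_type)::text::regclass;  -- row type -> table
   _cols text;  -- column definition list built from the catalog
BEGIN
   SELECT string_agg(quote_ident(attname) || ' ' || format_type(atttypid, atttypmod)
                   , ', ' ORDER BY attnum)
   INTO   _cols
   FROM   pg_attribute
   WHERE  attrelid = _tbl
   AND    attnum > 0
   AND    NOT attisdropped;

   RETURN QUERY EXECUTE format('SELECT * FROM crosstab(%L, %L) AS ct (%s)'
      -- source query: latest change per column of the requested row
    , format('SELECT DISTINCT ON (1, 2) unique_id, column_name, value
              FROM   changes
              WHERE  unique_id = %s
              ORDER  BY 1, 2, updated_at DESC', _id)
      -- category query: all columns of the table except the row name
    , format('SELECT attname::text
              FROM   pg_attribute
              WHERE  attrelid = %s
              AND    attnum > 0
              AND    NOT attisdropped
              AND    attname <> %L
              ORDER  BY attnum', _tbl::oid, 'unique_id')
    , _cols);
END
$func$ LANGUAGE plpgsql;
```

Casting the type name from pg_typeof() to regclass works because every table's row type has the same name as the table.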
Call (providing the table type with NULL::public.instances):
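For example, assuming the polymorphic function is named f_curr_row(anyelement, int) (a hypothetical name):

```sql
-- NULL::public.instances only supplies the row type; 1 is the unique_id.
SELECT * FROM f_curr_row(NULL::public.instances, 1);
```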