I have a task to implement a ‘rollback’ (not the usual rollback) function for a batch of entries from different tables. For example:
def rollback(cursor, entries):
    # entries is a dict of such form:
    # {'table_name1': [id1, id2, ...], 'table_name2': [id1, id2, ...], ...}
    ...
I need to delete the entries in each table_name. But because these entries may have relationships between them, it's a bit complex. My idea has several steps:
- Find out all columns from all tables that are nullable.
- Update all entries, setting every nullable column to null. After this step there should be no circular dependencies (otherwise, I think, the entries couldn’t have been inserted into the tables).
- Find their dependencies and do a topological sort.
- Delete one by one.
My questions are:
- Does the idea make sense?
- Has anyone done something similar before? And how?
- How do I query the meta tables for step 3? I’m quite new to PostgreSQL.
Any ideas and suggestions would be appreciated.
(1) and (2) are not right. It’s quite likely that there will be columns defined NOT NULL REFERENCES othertable(othercol); there are in any normal schema.
What I think you need to do is to sort the foreign key dependency graph to find an ordering that allows you to DELETE, table by table, the data you need to remove. Be aware that circular dependencies are possible due to deferred foreign key constraints, so you need to demote or ignore DEFERRABLE INITIALLY DEFERRED constraints; you can temporarily violate those so long as it’s all consistent again at COMMIT time.
Even then you might run into issues. What if a client used SET CONSTRAINTS to make a DEFERRABLE INITIALLY IMMEDIATE constraint DEFERRED during a transaction? You’d then fail to cope with the circular dependency. To handle this your code must SET CONSTRAINTS ALL DEFERRED before proceeding.
You will need to look at the information_schema or the PostgreSQL-specific system catalogs to work out the dependencies. It might be worth a look at the pg_dump source code too, since it tries to order dumped tables to avoid dependency conflicts. You’ll be particularly interested in the pg_constraint catalog, or its information_schema equivalents information_schema.referential_constraints, information_schema.constraint_table_usage and information_schema.constraint_column_usage.
You can use either the information_schema or pg_catalog. Don’t use both. information_schema is SQL-standard and more portable, but can be slow to query and doesn’t have all the information pg_catalog contains. On the flip side, pg_catalog’s schema isn’t guaranteed to remain compatible across major versions (like 9.1 to 9.2), though it generally does, and its use isn’t portable.
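To make the approach above concrete, here is a minimal sketch of the pg_catalog route, not a definitive implementation. It assumes a DB-API cursor such as psycopg2's (which adapts a Python list to an array for ANY(%s)), that every table has a primary key column named id, and that the table names in entries are unqualified and unique across schemas. The helper names deletion_order and quote_ident are made up for illustration. Self-referencing foreign keys are skipped on the assumption that a single DELETE per table removes all affected rows at once.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# One row per foreign key: (referencing table, referenced table, deferrable?)
FK_SQL = """
    SELECT child.relname, parent.relname, con.condeferrable
    FROM pg_catalog.pg_constraint AS con
    JOIN pg_catalog.pg_class AS child ON child.oid = con.conrelid
    JOIN pg_catalog.pg_class AS parent ON parent.oid = con.confrelid
    WHERE con.contype = 'f'
"""


def deletion_order(tables, fk_rows):
    """Order tables so that referencing tables come before referenced ones.

    Raises graphlib.CycleError if non-deferrable constraints form a cycle.
    """
    tables = set(tables)
    sorter = TopologicalSorter({table: set() for table in tables})
    for child, parent, deferrable in fk_rows:
        # Deferrable constraints are checked at COMMIT once deferred, so they
        # impose no ordering; self-references are ignored (assumption above).
        if deferrable or child == parent:
            continue
        if child in tables and parent in tables:
            sorter.add(parent, child)  # child must be deleted before parent
    return list(sorter.static_order())


def quote_ident(name):
    """Quote an identifier by doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def rollback(cursor, entries):
    """Delete the given ids from each table in a dependency-safe order."""
    cursor.execute("SET CONSTRAINTS ALL DEFERRED")
    cursor.execute(FK_SQL)
    for table in deletion_order(entries, cursor.fetchall()):
        ids = list(entries[table])
        if ids:
            cursor.execute(
                "DELETE FROM {} WHERE id = ANY(%s)".format(quote_ident(table)),
                (ids,),
            )
```

The caller is expected to COMMIT afterwards; that is when the deferred constraints are finally checked. Rows in tables outside entries that still reference the deleted rows will make the DELETE or the COMMIT fail, which this sketch does not try to handle.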