Hello,
I am writing a database application that performs many inserts and updates under a fake serialisable isolation level (snapshot isolation).
To avoid lots of network round trips, I batch the inserts and updates into one transaction using PreparedStatements. They should fail very seldom, because the inserts are prechecked and nearly conflict-free with respect to other transactions, so rollbacks don’t occur often.
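As a rough sketch of the batching pattern described here: many inserts are sent as one batch inside a single transaction, and the whole batch is rolled back if any row fails. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; the table `items` and the helper `insert_batch` are made up for illustration, and a PostgreSQL driver or JDBC batch would follow the same shape rather than this exact API.

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert all rows in one transaction; undo everything if one row fails."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.executemany(
                "INSERT INTO items (id, name) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        # A rare conflict (here: a duplicate primary key) rejects the whole batch.
        return False


# Hypothetical schema, in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

ok = insert_batch(conn, [(1, "a"), (2, "b"), (3, "c")])
failed = insert_batch(conn, [(4, "d"), (1, "duplicate")])
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The second batch contains a duplicate key, so none of its rows are kept; the caller would then retry or fall back to smaller batches.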
Having big transactions should be good for the WAL, because it can flush large chunks at once and doesn’t have to flush for every mini transaction.
1.) I can only see positive effects of big transactions, but I often read that they are bad. Why could they be bad in my use case?
2.) Is checking for conflicts expensive when the local snapshots are merged back into the real database? The database would have to compare the write sets of all possibly conflicting (concurrent) transactions. Or does it use some high-speed shortcut? Or is it quite cheap anyway?
[EDIT] It would be interesting if someone could clarify how a snapshot isolation database checks that transactions which overlap on the timeline have disjoint write sets, because that’s what the fake serializable isolation level is all about.
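To make the question concrete, here is a minimal in-memory model of the textbook "first committer wins" rule for snapshot isolation. It is a conceptual sketch only, not PostgreSQL's implementation: as far as I know, PostgreSQL detects such conflicts at write time through row versions and row locks rather than by comparing write sets at commit. The names `Txn` and `SnapshotStore` are invented, and reads are not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    start_ts: int                               # logical time of the snapshot
    writes: dict = field(default_factory=dict)  # private write set


class SnapshotStore:
    def __init__(self):
        self.clock = 0          # logical commit counter
        self.data = {}          # key -> latest committed value
        self.last_commit = {}   # key -> commit timestamp of its last write

    def begin(self):
        return Txn(start_ts=self.clock)

    def commit(self, txn):
        # Conflict if any key in the write set was committed by another
        # transaction after this transaction took its snapshot.
        for key in txn.writes:
            if self.last_commit.get(key, 0) > txn.start_ts:
                return False  # caller must roll back and retry
        self.clock += 1
        for key, value in txn.writes.items():
            self.data[key] = value
            self.last_commit[key] = self.clock
        return True


store = SnapshotStore()
t1, t2, t3 = store.begin(), store.begin(), store.begin()
t1.writes["x"] = 1
t2.writes["x"] = 2   # overlaps t1 in time and in write set
t3.writes["y"] = 3   # overlaps in time, disjoint write set
results = [store.commit(t1), store.commit(t2), store.commit(t3)]
```

In this model the check costs one lookup per written key, so it grows with the size of the committing transaction's write set, not with the number of concurrent transactions.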
The real issues here are twofold. The first possible problem is bloat: large transactions can result in a lot of dead tuples showing up at once. The other possible problem comes from long-running transactions. As long as a long-running transaction is open, vacuum can’t remove dead tuples that its snapshot might still need to see, so the tables can collect lots of dead tuples as well.
I’d say just use check_postgres.pl to check for bloat issues. As long as you don’t see a lot of table bloat after your long transactions, you’re OK.
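If you want a quick look without a monitoring script, the dead-tuple counters in pg_stat_user_tables give a rough indication. The sketch below is an assumption-laden illustration: the 20% threshold is arbitrary, the sample rows are invented, and the counters are statistics estimates rather than a true bloat measurement.

```python
# Query to run against PostgreSQL; these columns exist in pg_stat_user_tables.
DEAD_TUPLE_QUERY = """
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
"""


def dead_tuple_ratio(n_live, n_dead):
    """Fraction of a table's tuples that are dead (0.0 for an empty table)."""
    total = n_live + n_dead
    return n_dead / total if total else 0.0


def tables_to_watch(rows, threshold=0.2):
    """Return names of tables whose dead-tuple ratio exceeds the threshold."""
    return [
        name
        for name, live, dead in rows
        if dead_tuple_ratio(live, dead) > threshold
    ]


# Invented rows in the shape the query would return.
sample_rows = [("orders", 9000, 1000), ("events", 500, 1500), ("empty", 0, 0)]
flagged = tables_to_watch(sample_rows)
```

Running the query before and after one of your big batches, and feeding the rows to `tables_to_watch`, shows whether the dead tuples get cleaned up again once the transaction has finished.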