I’m writing a program to run a batch process over hundreds of thousands of entities of a few related types. I was originally doing this with a single transaction per persist. This seemed very slow, so I tried doing somewhat naive batch updates in the way described in http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html, with longer transactions and occasional flush+clears. I’m running into a ConstraintViolationException for some of my entity types, because I have unique field constraints. However, I’m unsure of how to check for existing instances; I currently have a criteria to list collisions, but it seems to not return entities that I have saveOrUpdated within the same transaction.
A made-up example may help:
Entities: Family, Person, Name
A Family has many Persons (One to Many)
A Person has many Names, and different Persons can have the same Name (Many to Many)
My updates include persisting a Family along with its Persons and Names, but I’m not sure how to dedupe Names (may collide with existing Name in db or another Name in the same update batch). I could just keep track of new entities’ unique constraint fields outside of hibernate, but I thought this is probably not necessary. Is there any built-in way of checking for duplicates both in the db and uncommitted changes? I saw Hibernate batch updates with constraintviolationexception, but I do not savor using exceptions in the normal codepath. Thanks, I appreciate any guidance.
Short answer: no. For batch operations, Hibernate doesn’t keep track of the generated ids, so you’d have to go to the database for each Name, as you’d do a query based on the name, not on the ID, unless you are using some query cache (which would be tricky for your case, I suppose).

What I would suggest is to do this in a two-step (three?) process: first, batch-insert all Name objects. Then, load them all using Hibernate itself, storing them in a Map. Then, just persist the other data, linking the Name to the non-persisted Person. Of course, you’d need as much memory as you have names 🙂 But why are you keeping Name as a separate entity, anyway?
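
To make the Map idea concrete, here is a minimal sketch in plain Java, without any Hibernate calls. It assumes the unique constraint is on a single string field; the class NameRegistry, its methods, and the simplified Name class are hypothetical names made up for illustration, not part of Hibernate. It is a slight variant of the steps above: the map is seeded with the Names already loaded from the database, and each new value is created and queued at most once, so a duplicate inside the same batch resolves to the same instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Name entity; a real one would be a mapped class.
final class Name {
    final String value; // assumed to be the field carrying the unique constraint

    Name(String value) {
        this.value = value;
    }
}

// Hypothetical helper that keeps one canonical Name instance per unique value.
final class NameRegistry {
    private final Map<String, Name> byValue = new HashMap<>();
    private final List<Name> pendingInserts = new ArrayList<>();

    // Seed the map with a Name that already exists in the database.
    void registerExisting(Name persisted) {
        byValue.put(persisted.value, persisted);
    }

    // Return the canonical Name for a value, creating and queueing it only once.
    Name resolve(String value) {
        Name known = byValue.get(value);
        if (known != null) {
            return known;
        }
        Name created = new Name(value);
        byValue.put(value, created);
        pendingInserts.add(created);
        return created;
    }

    // Names created by resolve() that have not been saved yet.
    List<Name> pendingInserts() {
        return pendingInserts;
    }
}
```

In a real batch you would call registerExisting for every Name returned by the initial load, call resolve for each name string while building Persons, and save whatever pendingInserts returns before persisting the Persons. Keep in mind that instances held in the map become detached once the session is cleared, so they may need to be reattached before you link them.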