I’ve been using Hibernate to store a parent-child relationship using @OneToMany with an @JoinColumn for some time, and it has worked great.
But now I’ve reached a point where the total size of the objects is just too big to fit in memory (there are 3 million child records now). The records are all stored in a file, then parsed into Java objects before being persisted with Hibernate.
I’d like to “chunk” or “batch” the records so that I’ll only need to read a fraction of them into memory at a time. My approach is something like “load collection of 10,000 children objects, persist to database (calling ‘update’ on parent obj), empty out children collection to free up RAM, repeat”.
I want this to work like:
Iteration 1: Chunk1 (records 1-10,000) stored
Iteration 2: Chunk2 (records 10,001-20,000) stored
Iteration 3: Chunk3 (records 20,001-30,000) stored
etc
Here’s where I’m having trouble. The collection I’m saving changes with each iteration, which causes Hibernate to delete all the old children before saving the new ones. Instead of getting all my chunks saved, I end up with:
Iteration 1: Chunk1 stored
Iteration 2: Chunk1 objects deleted, Chunk 2 stored
Iteration 3: Chunk2 objects deleted, Chunk 3 stored
etc
So in the end, only my final chunk is saved.
Is there any way to change this behavior? I have read about JDBC batching but that’s not quite what I’m looking for. I’ve also tried storing each Child separately, instead of via an “update” to the parent, but when I do this the Child records are persisted without a pointer to their parent.
Update:
Thanks for the speedy and terrific response. The relationship is not bidirectional — I will try to make it that way. I have legacy code that won’t cooperate with schema changes so am a little constrained.
Thanks
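The main question here is: is your relationship bi-directional? That is, do you have a @ManyToOne on the child side pointing back to the parent? If you do, that relationship needs to be owned by the child side: the child’s @ManyToOne carries the @JoinColumn, and the parent’s @OneToMany refers to it through mappedBy.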
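A minimal sketch of a child-owned mapping follows. The class names, the id fields and the parent_id column name are assumptions for illustration. The column should be whatever your existing @JoinColumn already names, in which case the foreign key stays on the child table and the schema should not need to change. The javax.persistence package is also assumed (newer versions use jakarta.persistence).

```java
// Parent.java
import java.util.HashSet;
import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class Parent {
    @Id
    @GeneratedValue
    private Long id;

    // Inverse side: mappedBy names the field on Child that owns the association,
    // so changes to this collection alone do not write the foreign key.
    @OneToMany(mappedBy = "parent")
    private Set<Child> children = new HashSet<>();
}

// Child.java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;

@Entity
public class Child {
    @Id
    @GeneratedValue
    private Long id;

    // Owning side: the foreign key is written from here.
    @ManyToOne
    @JoinColumn(name = "parent_id") // assumed column name; reuse your existing one
    private Parent parent;

    public void setParent(Parent parent) {
        this.parent = parent;
    }
}
```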
When set up this way, you do NOT need to load all (or any, for that matter) children in your parent’s collection – you can instead load (or create) children, set parent on their end and save them; you can certainly do that in batches.
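The batching itself can be sketched independently of Hibernate. The helper below is a hypothetical ChunkedLoader (not part of any library) that pulls records from an iterator and hands them to a sink in chunks of a fixed size, so only one chunk is referenced at a time. Assuming the child owns the association, the sink would build each Child, call setParent on it, save it, and then flush and clear the session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class ChunkedLoader {

    /**
     * Hands records to the sink in chunks of at most chunkSize and
     * returns the total number of records seen.
     */
    static int loadInChunks(Iterator<String> records, int chunkSize,
                            Consumer<List<String>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunk = new ArrayList<>(chunkSize);
        int total = 0;
        while (records.hasNext()) {
            chunk.add(records.next());
            total++;
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                // Start a fresh list so the previous chunk can be garbage collected.
                chunk = new ArrayList<>(chunkSize);
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(chunk); // final, possibly shorter chunk
        }
        return total;
    }

    public static void main(String[] args) {
        Iterator<String> records = Arrays.asList("a", "b", "c", "d", "e").iterator();
        List<Integer> sizes = new ArrayList<>();
        // With Hibernate the sink would roughly do, per record:
        //   Child c = parse(line); c.setParent(parent); session.save(c);
        // and after each chunk: session.flush(); session.clear();
        // (parse is a hypothetical helper.)
        int total = loadInChunks(records, 2, chunk -> sizes.add(chunk.size()));
        System.out.println(total + " records in chunks of " + sizes);
        // prints "5 records in chunks of [2, 2, 1]"
    }
}
```

For a real file, `Files.lines(path).iterator()` could supply the records lazily. Clearing the session after each chunk matters because otherwise the persistence context keeps every saved Child in memory.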
If your relationship is not bi-directional then, based on your question, it’s owned by the parent – you’ll then need to make it bi-directional as described above (or uni-directional, but opposite to what it is now – see below).
On a somewhat separate note, do you even need a @OneToMany on the parent side here? With 3 million child records I can’t really imagine where it’d be useful.
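If the parent-side collection is dropped, children can still be fetched a page at a time with a query. The sketch below is only illustrative. It assumes Parent and Child entities where Child has an id field and a parent field mapped with @ManyToOne, and it uses the JPA EntityManager rather than a Hibernate Session. ChildQueries and findPage are hypothetical names.

```java
import java.util.List;
import javax.persistence.EntityManager;

final class ChildQueries {

    /** Returns one page of the given parent's children, ordered by id. */
    static List<Child> findPage(EntityManager em, Parent parent,
                                int page, int pageSize) {
        return em.createQuery(
                    "select c from Child c where c.parent = :parent order by c.id",
                    Child.class)
                .setParameter("parent", parent)
                .setFirstResult(page * pageSize) // zero-based offset of the first row
                .setMaxResults(pageSize)
                .getResultList();
    }
}
```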