I have a system which breaks a large task into small tasks using about 30 threads at a time. As each individual thread finishes, it persists its calculated results to the database. What I want to achieve is to have each thread pass its results to a new persistence class that will perform a type of double buffering and data persistence while running in its own thread.
For example, after 100 threads have moved their data to the buffer, the persistence class swaps the buffers and persists all 100 entries to the database. This would allow the use of prepared statements and thus cut way down on the I/O between the program and the database.
Is there a pattern or good example of this type of multithreading double buffering?
I’ve seen this pattern referred to as asynchronous database writing or the write-behind pattern. It’s a typical pattern supported by the distributed cache products (Terracotta, Coherence, GigaSpaces, …) because you don’t want your cache updates to also include writing the change to the underlying database.
The complexity of this pattern depends on your tolerance for lost database updates. Because of the delay between completing the work and writing the result to the database, you can lose the updates due to bugs, power failures, … (you get the picture).
I’d suggest some sort of queue for the completed results to be written to the DB and then process them in batches of 100 (using your example) OR after an amount of time. The reason for also using a time delay is to cope with result sets that aren’t divisible by 100.
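As a rough illustration of the queue-plus-timeout idea, here is a minimal Java sketch. The class name `WriteBehindBuffer`, its constructor parameters, and the `batchWriter` callback are all hypothetical names made up for this example; the callback stands in for whatever actually writes to your database. It assumes a single writer thread and that losing a batch on failure is acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch: workers call submit(); one writer thread flushes batches. */
final class WriteBehindBuffer<T> implements AutoCloseable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int batchSize;
    private final long maxWaitMillis;
    private final Consumer<List<T>> batchWriter; // assumption: performs the actual DB write
    private final Thread writerThread;
    private volatile boolean running = true;

    WriteBehindBuffer(int batchSize, long maxWaitMillis, Consumer<List<T>> batchWriter) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
        this.batchWriter = batchWriter;
        this.writerThread = new Thread(this::drainLoop, "write-behind");
        this.writerThread.start();
    }

    /** Called by worker threads; returns without waiting for the database. */
    void submit(T result) {
        if (!running) throw new IllegalStateException("buffer is closed");
        queue.add(result);
    }

    private void drainLoop() {
        List<T> batch = new ArrayList<>(batchSize);
        try {
            while (running || !queue.isEmpty()) {
                long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
                // Collect until the batch is full or the time limit expires.
                while (batch.size() < batchSize) {
                    long remaining = deadline - System.nanoTime();
                    if (remaining <= 0) break;
                    T item = queue.poll(remaining, TimeUnit.NANOSECONDS);
                    if (item == null) break; // timed out: flush the partial batch
                    batch.add(item);
                }
                flush(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            queue.drainTo(batch); // best effort: write whatever is left in one go
            flush(batch);
        }
    }

    private void flush(List<T> batch) {
        if (batch.isEmpty()) return;
        try {
            batchWriter.accept(new ArrayList<>(batch)); // hand over a copy
        } catch (RuntimeException e) {
            // No retry in this sketch: a failed batch is lost.
            System.err.println("Dropping batch of " + batch.size() + ": " + e);
        } finally {
            batch.clear();
        }
    }

    /** Stops accepting results and waits until queued items are flushed. */
    @Override
    public void close() {
        running = false;
        try {
            writerThread.join(); // returns within roughly maxWaitMillis after the last write
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real system the `batchWriter` could loop over the list calling `PreparedStatement.addBatch()` and finish with `executeBatch()` inside one transaction. Workers should stop calling `submit` before `close` is invoked, since this sketch does not guard against a result arriving during shutdown.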
If you have no requirements for resilience/durability, then you can do all this in the same process. If, however, you can’t tolerate any loss, then you can replace the in-VM queue with a persistent JMS queue (slower but safer).