I have a system with two processes, one of which does a single insert, and the other a bulk insert. Obviously the second process is faster, and I’m working on migrating the first process to a bulk insert mechanism, but I was stumped this morning by a question from a colleague about “why bulk insert would be faster than single inserts”.
So indeed, why is bulk insert faster than single insert?
Also, are there differences between bulk and single inserts in MySQL and HBase, given that their database architectures are completely different? I am using both for my project, and am wondering if there are differences in the bulk and single inserts for these two databases.
As far as I know, this also depends on the HBase configuration. Normally a bulk insert means passing a List of Puts together; in that case the insert (called flushing at the HBase layer) is done automatically when you call table.put. Single inserts might wait for other insert calls so that a batch flush can be done in the middle layer. However, this again depends on the configuration.
Another reason may be the ease of the task: Map and Reduce works more efficiently when it has more jobs at a time. The migration of file chunks is decided once for all inputs, whereas with individual inserts this becomes a crucial point.
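To make the flushing point concrete, here is a rough sketch in plain Python. It is only a toy model of a client-side write buffer, not the HBase client API: the `BufferedWriter` class, the `flush_size` parameter, and `count_round_trips` are hypothetical names made up for illustration. It counts how many flushes (think network round trips) are needed to write the same rows one at a time versus as one list.

```python
class BufferedWriter:
    """Toy client-side write buffer (hypothetical, not the HBase API)."""

    def __init__(self, flush_size):
        self.flush_size = flush_size  # flush once this many rows are buffered
        self.buffer = []
        self.stored = []              # stands in for the server side
        self.round_trips = 0          # one per flush that actually sends rows

    def put(self, rows):
        # Accept a list of rows; a single insert is a list of length one.
        self.buffer.extend(rows)
        if len(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self):
        # Send everything buffered so far in one round trip.
        if self.buffer:
            self.stored.extend(self.buffer)
            self.buffer.clear()
            self.round_trips += 1


def count_round_trips(rows, flush_size, batched):
    """Write all rows and return how many round trips were needed."""
    writer = BufferedWriter(flush_size)
    if batched:
        writer.put(list(rows))        # one call carrying every row
    else:
        for row in rows:
            writer.put([row])         # one call per row
    writer.flush()                    # send whatever is still buffered
    return writer.round_trips


rows = range(1000)
print(count_round_trips(rows, flush_size=1, batched=False))    # 1000
print(count_round_trips(rows, flush_size=1, batched=True))     # 1
print(count_round_trips(rows, flush_size=100, batched=False))  # 10
```

With a flush on every call, single inserts pay one round trip per row, while the batched call pays one in total. Raising the buffer threshold lets single inserts share flushes, which is why the gap depends on the configuration.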