I am trying to implement a simple program in Java that will populate a MySQL database from a CSV source file. For each row in the CSV file, I need to execute the following sequence of SQL statements (example in pseudocode):
execute("INSERT INTO table_1 VALUES(?, ?)");
String id1 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_2 VALUES(?, ?)");
String id2 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_3 VALUES('some value', id1, id2)");
execute("INSERT INTO table_3 VALUES('some value2', id1, id2)");
...
There are three basic problems:
1. The database is not on localhost, so every single INSERT/SELECT incurs network latency; this is the main problem.
2. The CSV file contains millions of rows (around 15 000 000), so the import takes too long.
3. I cannot modify the database structure (add extra tables, disable keys, etc.).
How can I speed up the INSERT/SELECT process? Currently 80% of the execution time is consumed by communication.
I already tried to group the above statements and execute them as a batch, but because of LAST_INSERT_ID it does not work. In all other cases it takes too long (see point 1).
Feed the data into a blackhole
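A minimal sketch of what creating such a table from Java could look like. The table name blackhole, the column names (t1_a, t1_b, t2_a, t2_b, t3_value1, t3_value2) and the VARCHAR types are assumptions made for illustration; the idea is simply one column per CSV field, with the BLACKHOLE storage engine.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class BlackholeSetup {
    // Hypothetical staging table: one column per CSV field.
    // Names and types are assumptions, adjust them to the real CSV layout.
    static final String CREATE_BLACKHOLE_SQL =
          "CREATE TABLE blackhole ("
        + " t1_a VARCHAR(255), t1_b VARCHAR(255),"
        + " t2_a VARCHAR(255), t2_b VARCHAR(255),"
        + " t3_value1 VARCHAR(255), t3_value2 VARCHAR(255)"
        + ") ENGINE = BLACKHOLE";

    // Sends the DDL over an already opened JDBC connection.
    static void createBlackholeTable(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_BLACKHOLE_SQL);
        }
    }
}
```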
Note that the table you insert into uses the BLACKHOLE storage engine, so the data itself is going nowhere. However, you can create a trigger on the blackhole table that passes each row on to the real tables.
And pass it on using a trigger
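Something along these lines, as a sketch only. The trigger name ai_blackhole_each, the blackhole column names and the target column names (col_a, col_b, val, t1_id, t2_id) are all hypothetical; the question does not show the real schema. The trigger repeats the per-row sequence from the question on the server, so LAST_INSERT_ID() never has to travel over the network.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class BlackholeTrigger {
    // AFTER INSERT trigger on the (assumed) blackhole table.
    // All column names are placeholders for the real schema.
    static final String CREATE_TRIGGER_SQL =
          "CREATE TRIGGER ai_blackhole_each AFTER INSERT ON blackhole "
        + "FOR EACH ROW "
        + "BEGIN "
        + "  DECLARE id1 BIGINT UNSIGNED; "
        + "  DECLARE id2 BIGINT UNSIGNED; "
        + "  INSERT INTO table_1 (col_a, col_b) VALUES (NEW.t1_a, NEW.t1_b); "
        + "  SET id1 = LAST_INSERT_ID(); "
        + "  INSERT INTO table_2 (col_a, col_b) VALUES (NEW.t2_a, NEW.t2_b); "
        + "  SET id2 = LAST_INSERT_ID(); "
        + "  INSERT INTO table_3 (val, t1_id, t2_id) VALUES (NEW.t3_value1, id1, id2); "
        + "  INSERT INTO table_3 (val, t1_id, t2_id) VALUES (NEW.t3_value2, id1, id2); "
        + "END";

    // The whole trigger body goes to the server as one statement.
    // DELIMITER is a feature of the mysql command-line client, so it
    // should not be needed when the statement is sent through JDBC.
    static void installTrigger(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_TRIGGER_SQL);
        }
    }
}
```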
Now you can feed the blackhole table with a single insert statement at full speed and even insert multiple rows in one go.
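As a rough sketch of the multi-row insert from the Java side: the helper below builds one INSERT with many value tuples and binds a chunk of CSV rows to it, so a whole chunk costs a single round trip. The class name BlackholeLoader, the six column names and the idea of passing rows as String arrays are assumptions for illustration, not part of the original setup.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BlackholeLoader {
    // Number of columns in the (assumed) blackhole table.
    static final int COLUMNS = 6;

    // Builds "INSERT ... VALUES (?, ...), (?, ...)" with one tuple per row.
    static String multiRowInsertSql(int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be >= 1");
        }
        StringBuilder sql = new StringBuilder(
            "INSERT INTO blackhole (t1_a, t1_b, t2_a, t2_b, t3_value1, t3_value2) VALUES ");
        for (int r = 0; r < rowCount; r++) {
            if (r > 0) {
                sql.append(", ");
            }
            sql.append("(?, ?, ?, ?, ?, ?)");
        }
        return sql.toString();
    }

    // Sends one chunk of already parsed CSV rows in a single statement.
    static void insertChunk(Connection conn, List<String[]> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(multiRowInsertSql(rows.size()))) {
            int param = 1;
            for (String[] row : rows) {
                if (row.length != COLUMNS) {
                    throw new IllegalArgumentException("expected " + COLUMNS + " fields per row");
                }
                for (String field : row) {
                    ps.setString(param++, field);
                }
            }
            ps.executeUpdate();
        }
    }
}
```

Calling insertChunk with, say, a thousand rows at a time keeps each statement at a manageable size; the right chunk size depends on the row width and the server's packet limits, so treat that number as a starting point.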
Disable index updates to speed things up
ALTER TABLE <table> DISABLE KEYS will disable all non-unique key updates and speed up the insert (an auto-increment key is unique, so that’s not affected).
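A small sketch of wrapping the load in ALTER TABLE ... DISABLE KEYS and ENABLE KEYS from Java. The class KeyToggle and the SqlWork callback are hypothetical helpers. As far as I know, DISABLE KEYS only has an effect on MyISAM tables, so treat this as applicable under that assumption.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class KeyToggle {
    // Callback for the actual load, allowed to throw SQLException.
    interface SqlWork {
        void run() throws SQLException;
    }

    // Quotes a table name with backticks, doubling any embedded backtick.
    static String quote(String table) {
        return "`" + table.replace("`", "``") + "`";
    }

    static String disableKeysSql(String table) {
        return "ALTER TABLE " + quote(table) + " DISABLE KEYS";
    }

    static String enableKeysSql(String table) {
        return "ALTER TABLE " + quote(table) + " ENABLE KEYS";
    }

    // Disables non-unique index updates, runs the load, then re-enables
    // them even if the load fails.
    static void withKeysDisabled(Connection conn, String table, SqlWork work)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(disableKeysSql(table));
            try {
                work.run();
            } finally {
                st.execute(enableKeysSql(table));
            }
        }
    }
}
```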
If you have any unique keys and you don’t want MySQL to check for them during the mass insert, make sure you do an ALTER TABLE to eliminate the unique key and add it back afterwards. Note that the ALTER TABLE to put the unique key back in will take a long time.