I have a large CSV file (5.4GB) of data. It’s a table with 6 columns and a lot of rows. I want to import it into MySQL across several tables. Additionally, I have to apply some transformations to the data before import (e.g. parse a cell and insert the parts into columns of several tables). I could write a script that transforms and inserts one row at a time, but it would take weeks to import the data. I know MySQL has LOAD DATA INFILE, but I am not sure how, or whether, I can do the needed transformations in SQL.
Any advice on how to proceed?
In my limited experience you won’t want to use the Django ORM for something like this. It will be far too slow. I would write a Python script to operate on the CSV file, using Python’s csv library. And then use the native MySQL facility LOAD DATA INFILE to load the data.

If the Python script to massage the CSV file is too slow you may consider writing that part in C or C++, assuming you can find a decent CSV library for those languages.
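
To make that concrete, here is a minimal sketch of the approach, not a definitive implementation. It streams the source file once and writes one table-shaped CSV per target table, so memory use stays flat regardless of file size. The layout is an assumption for illustration only: the column at index 1 is taken to be a compound value such as "x:y" that needs splitting, and the function names, file names, and table names are all hypothetical.

```python
import csv


def split_csv(src_path, main_path, parts_path, compound_col=1, sep=":"):
    """Stream src_path once and write two table-shaped CSV files.

    Assumption: the cell at index compound_col holds several values
    joined by sep; each value becomes one row in the parts file.
    """
    rows = 0
    with open(src_path, newline="") as src, \
            open(main_path, "w", newline="") as main_out, \
            open(parts_path, "w", newline="") as parts_out:
        main_writer = csv.writer(main_out)
        parts_writer = csv.writer(parts_out)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            rows += 1  # doubles as a generated key linking the two files
            compound = row[compound_col]
            rest = row[:compound_col] + row[compound_col + 1:]
            main_writer.writerow([rows] + rest)
            for part in compound.split(sep):
                parts_writer.writerow([rows, part])
    return rows


def load_statement(csv_path, table):
    """Build a LOAD DATA statement matching csv.writer's default dialect.

    Assumption: csv_path and table are trusted, hard-coded values; they
    are interpolated directly and must not come from user input.
    """
    return (
        f"LOAD DATA LOCAL INFILE '{csv_path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\r\\n'"
    )
```

You would call split_csv once on the big file, then run the statement from load_statement for each output file through the mysql client or whatever connector you already use. The line terminator is '\r\n' because that is what csv.writer emits by default. If the transformations turn out to be simple expressions, LOAD DATA can also read a field into a user variable and assign columns with a SET clause, which may remove the need for the intermediate files.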