Suppose I have a CSV file with 1M email addresses. I need to iterate through the file and add each entry, for example:
    with open(file) as csv:
        for item in csv:
            # strip the trailing newline that each line read from the file carries
            Email.objects.create(email=item.strip())
Going through the Django ORM like this to create 1M objects and insert them into the DB seems like it would be very slow. Is there a better way, or should I step away from Django for this task and work with the DB directly?
Besides bulk_create, you could put all inserts into one transaction, as long as your DB backend supports it.
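A minimal sketch of the single-transaction approach, assuming a recent Django where `transaction.atomic()` is available (older releases used `commit_on_success` for the same purpose). The function name `import_in_one_transaction` and its `atomic` override are hypothetical, added only so the sketch can be exercised without a configured project; `model` is assumed to be a Django model class with an `email` field, such as the `Email` model from the question.

```python
def import_in_one_transaction(lines, model, atomic=None):
    """Insert one row per non-empty line, all inside a single transaction."""
    if atomic is None:
        # Imported lazily: this needs a configured Django project.
        from django.db import transaction
        atomic = transaction.atomic

    created = 0
    with atomic():
        for line in lines:
            email = line.strip()
            if not email:
                continue  # skip blank lines
            model.objects.create(email=email)
            created += 1
    return created
```

Inside a project this would be called with an open file, roughly `import_in_one_transaction(csv_file, Email)`. Each row is still a separate INSERT statement; the gain comes from committing once instead of once per row.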
bulk_create treats items with the same values as the same item, so passing it two instances with identical field values actually creates one row instead of two.
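For reference, a hedged sketch of calling bulk_create in fixed-size chunks, so that only one chunk of unsaved instances exists at a time. The helper names `chunked` and `bulk_import` and the default `batch_size` of 10000 are assumptions for illustration; `model` is assumed to be a Django model class with an `email` field.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_import(lines, model, batch_size=10000):
    """Call bulk_create once per chunk instead of once for the whole file."""
    # Lazily strip each line and drop the blank ones.
    emails = (line.strip() for line in lines if line.strip())
    total = 0
    for chunk in chunked(emails, batch_size):
        model.objects.bulk_create([model(email=email) for email in chunk])
        total += len(chunk)
    return total
```

The chunk size is a trade-off: larger chunks mean fewer queries but more instances held in memory at once.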
Because it needs more SQL round trips, the transaction solution is still slower than the bulk_create one, but you don't have to create all one million Email() instances in memory (passing a generator to bulk_create does not seem to work here).
Furthermore, you could do it directly at the SQL level.
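As a rough illustration of the SQL-level route, the sketch below uses Python's standard `sqlite3` module and `executemany` inside one transaction. The table name `emails`, its single `email` column, and the function name `load_emails` are assumptions for illustration; a real Django table has its own name and columns, and other databases ship dedicated bulk loaders (PostgreSQL's COPY, for example) that are likely faster still.

```python
import sqlite3


def load_emails(conn, lines):
    """Insert every non-empty line into the (assumed) `emails` table."""
    rows = ((line.strip(),) for line in lines if line.strip())
    # Used as a context manager, the connection commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany("INSERT INTO emails (email) VALUES (?)", rows)


# Demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email TEXT NOT NULL)")
load_emails(conn, ["a@example.com\n", "b@example.com\n", "\n"])
count = conn.execute("SELECT COUNT(*) FROM emails").fetchone()[0]
```

Because `rows` is a generator, the file is streamed rather than loaded into memory, and no model instances are created at all.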