I’m working on processing a large csv file and I found this article about batch import: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/. I tried to do the same, but it seems to have no effect.
Should the instances be viewable in the database after each flushing? Because now there is either 0 or all of the entites when I try to query ‘SELECT COUNT(*) FROM TABLE1’, so it looks like the instances are commited all at once.
Then I also noticed that the import works quickly when importing for the first time to the blank table, but when the table is full and the entity should be either updated or saved as new, the whole process is enormously slow. It’s mainly because of the memory not being cleaned and decreases to 1MB or less and the app gets stuck. So is it because of not flushing the session?
My code for importing is here:
public void saveAll(List<MedicalInstrument> listMedicalInstruments) {
log.info("start saving")
for (int i = 0; i < listMedicalInstruments.size() - 1; i++) {
def medicalInstrument = listMedicalInstruments.get(i)
def persistedMedicalInstrument = MedicalInstrument.findByCode(medicalInstrument.code)
if (persistedMedicalInstrument) {
persistedMedicalInstrument.properties = medicalInstrument.properties
persistedMedicalInstrument.save()
} else {
medicalInstrument.save()
}
if ((i + 1) % 100 == 0) {
cleanUpGorm()
if ((i + 1) % 1000 == 0) {
log.info("saved ${i} entities")
}
}
}
cleanUpGorm()
}
protected void cleanUpGorm() {
log.info("cleanin GORM")
def session = sessionFactory.currentSession
session.flush()
session.clear()
propertyInstanceMap.get().clear()
}
Thank you very much for any help!
Regards,
Lojza
P.S.: my JVM memory has 252.81 MB in total, but it’s only testing environment for me and 3 other people.
From my experience they still won’t necessarily be visible in the database until they are all committed at the end. I’m using Oracle and the only way I’ve been able to get them to commit in batches (so visible in the database) is by creating separate transactions for each batch and close it out after the flush. That, however, resulted in errors at the end of the process on the final flush. I didn’t have time to figure it out, and using the process above I NEVER had issues no matter how large the data load was.
Using this method still helps tremendously – it really does flush and clear the session. You can watch your memory utilization to see that.
As far as updating a table with records do you have indexes on the table? Sometimes the indexes will slow down mass insert/updates like this because the database is trying to keep the index fresh. Maybe disable the index before the import/update and then enable when it is done?