I am trying to import data from a spreadsheet into a database using Java. There are two ways that I could do this: 1) Read and extract the data from the spreadsheets and organize it into data structures, such as ArrayLists, Vectors or maps of different objects, so that I can get rid of redundant entries etc., then write the data structures into the database. 2) Extract the data and put it into the database directly as the cells are read. I think the first way is probably better, but would the second way be faster? Are there any other considerations I should think of?
Thanks.
You would want to use executeBatch() here, which is similar to approach #1. Basically, you read a batch of records from the spreadsheet (e.g. 1000 records), send them to the DB with executeBatch(), and commit the transaction one batch at a time. Then you move on to the next batch, and so on. With this approach you use the database efficiently, save network round trips, and avoid holding a lot of data in memory, which could lead to out-of-memory errors. You should also reuse the same connection and prepared statement objects.
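To make the batching idea concrete, here is a minimal sketch using plain JDBC. The table name (people), its two columns, the String[] row shape and the BatchLoader class are assumptions made up for illustration; adapt them to your own schema and to however you read the spreadsheet.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BatchLoader {
    // Batch size taken from the "1000 records" figure above; tune it for your data.
    static final int BATCH_SIZE = 1000;

    /**
     * Inserts the given rows in batches, committing after each batch.
     * Returns the number of batches that were executed.
     * Each row is assumed to hold two values: name and email (hypothetical columns).
     */
    static int load(Connection conn, Iterable<String[]> rows) throws SQLException {
        // Hypothetical table and columns, for illustration only.
        String sql = "INSERT INTO people (name, email) VALUES (?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit manually, once per batch
        int batches = 0;
        // One PreparedStatement is reused for every row.
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    conn.commit();
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the last, partially filled batch
                ps.executeBatch();
                conn.commit();
                batches++;
            }
        } catch (SQLException e) {
            conn.rollback(); // discard the uncommitted batch
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return batches;
    }
}
```

Because load() accepts any Iterable, the rows can be produced lazily while the spreadsheet is being read, so only the current batch needs to be held by the driver at any time. Note that batches committed before a failure stay in the database; if you need all-or-nothing behaviour, commit once at the end instead.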
Regarding data cleanup, you should definitely sanitize your data before putting it into persistent storage such as a table. You may need to generate reports or use the data in other applications in the future, so having clean, well-structured tables will help you in the long run. For batch applications, performance requirements are usually not as strict as for transactional systems.
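As a rough illustration of what such a cleanup step could look like, the sketch below trims every cell, skips rows whose first cell is blank, and keeps only the first row seen for each key. Treating the first column as a case-insensitive key is purely an assumption for the example, as is the RowCleaner class itself; your own rules for what counts as a redundant entry will likely differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class RowCleaner {
    /**
     * Returns a cleaned copy of the rows: cells trimmed, nulls replaced by "",
     * rows with a blank first cell dropped, and duplicates (same first cell,
     * ignoring case) reduced to their first occurrence. Input order is preserved.
     */
    static List<String[]> clean(List<String[]> rawRows) {
        // LinkedHashMap keeps insertion order, so output order matches the spreadsheet.
        Map<String, String[]> byKey = new LinkedHashMap<>();
        for (String[] raw : rawRows) {
            if (raw == null || raw.length == 0) {
                continue; // nothing usable in this row
            }
            String[] row = new String[raw.length];
            for (int i = 0; i < raw.length; i++) {
                row[i] = raw[i] == null ? "" : raw[i].trim();
            }
            if (row[0].isEmpty()) {
                continue; // assumed rule: a row without a key is skipped
            }
            byKey.putIfAbsent(row[0].toLowerCase(Locale.ROOT), row);
        }
        return new ArrayList<>(byKey.values());
    }
}
```

If you clean one batch at a time, this only removes duplicates within that batch. Catching duplicates across batches needs something extra, for example a set of keys kept for the whole run or a unique constraint on the table.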
You should also use a helper library like Apache POI for reading Excel documents. As far as the data structure is concerned, it will depend on your data, but generally an ArrayList should suffice here.
Another point you might consider is that most ETL tools typically offer these kinds of data loading tasks out of the box. If your situation allows for it, I highly recommend looking at an ETL tool like Kettle to load the data. You may be able to save yourself some time and learn a new tool.
Hope this helps!