Currently I am building a sample application that reads a moderately large Excel sheet (10-15 MB), selects a few columns, and creates one text file per row of the sheet, each containing only the selected columns.
For example, suppose my sheet has 5 columns (Name, empid, email, mobileNo and address) and 50,000 rows. I want to create 50,000 text files, each containing only the Name, mobileNo and email values from one row.
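The following is a minimal sketch of the desired output step only. It assumes the rows are already in memory as String arrays, with a separate header list. The class name `RowFileWriter`, the `row-N.txt` file names and the `column=value` line format are hypothetical choices for illustration and are not part of the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RowFileWriter {

    /**
     * Writes one text file per data row, containing only the selected columns.
     * The "column=value" line format and "row-N.txt" names are hypothetical choices.
     * Returns the number of files written.
     */
    public static int writeRowFiles(List<String> header, List<String[]> rows,
                                    List<String> selected, Path outDir) throws IOException {
        // Resolve the selected column names to indices once, before the row loop.
        int[] indices = new int[selected.size()];
        for (int i = 0; i < selected.size(); i++) {
            int idx = header.indexOf(selected.get(i));
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown column: " + selected.get(i));
            }
            indices[i] = idx;
        }
        Files.createDirectories(outDir);
        int n = 0;
        for (String[] row : rows) {
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < indices.length; i++) {
                // A row may be shorter than the header if its trailing cells were empty.
                String value = indices[i] < row.length ? row[indices[i]] : "";
                lines.add(selected.get(i) + "=" + value);
            }
            n++;
            Files.write(outDir.resolve("row-" + n + ".txt"), lines, StandardCharsets.UTF_8);
        }
        return n;
    }
}
```

Resolving the column names to indices once, outside the loop, keeps the work per row down to a few array reads and one file write.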
Because the sheet is large and can be in either .xlsx or .xls format, I am using Apache POI to read it. But I can't decide which approach to follow after that.
Approach 1: Move the entire sheet into a database such as MySQL. I would have to create a table on the fly from the header columns and dump all the rows into it. Then I could use a SELECT query to fetch only the columns I need and create the text files from the result.
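The "create a table on the fly" step could look roughly like the sketch below. It assumes the header names are used directly as column names and that every column is stored as TEXT, with no type inference. The class and table names are hypothetical. The sketch only builds the SQL strings. Executing them through a JDBC connection is left out, as are the `PreparedStatement` parameter bindings for the inserts. Backtick quoting of identifiers is MySQL-specific.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DynamicTableSql {

    /** Quotes a header name as a MySQL identifier, doubling any embedded backticks. */
    static String quote(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    /** CREATE TABLE with one TEXT column per header (assumption: no type inference). */
    public static String createTable(String table, List<String> header) {
        return "CREATE TABLE " + quote(table) + " ("
                + header.stream().map(h -> quote(h) + " TEXT").collect(Collectors.joining(", "))
                + ")";
    }

    /** Parameterized INSERT meant for a PreparedStatement: one '?' per column. */
    public static String insert(String table, List<String> header) {
        return "INSERT INTO " + quote(table) + " ("
                + header.stream().map(DynamicTableSql::quote).collect(Collectors.joining(", "))
                + ") VALUES ("
                + header.stream().map(h -> "?").collect(Collectors.joining(", "))
                + ")";
    }

    /** SELECT of only the wanted columns. */
    public static String select(String table, List<String> columns) {
        return "SELECT "
                + columns.stream().map(DynamicTableSql::quote).collect(Collectors.joining(", "))
                + " FROM " + quote(table);
    }
}
```

This makes the cost of approach 1 visible. You need DDL generation, identifier quoting, one insert per row and a query before any file gets written.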
Approach 2: I read about the ASM library, which can generate class files on the fly. I was thinking of creating one object per row and adding them all to a list. But retrieving a particular column would involve a lot of iteration, and the list would grow with the number of rows in the sheet, which will be huge.
I can't decide between these approaches, and I'm fairly sure both of them suck 🙁. Any advice on how to proceed would be a great help.
Why don’t you just create an in-memory data structure that holds the contents of the spreadsheet, and work from that?
It could be something as simple as a list of arrays of strings, where each array represents a row. To deal with the column names, use a hashmap that maps column names to column numbers.
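Here is a minimal sketch of that structure. It assumes your POI reading code has already turned each spreadsheet row into a String[], with the first row used as the header. The class name `InMemorySheet` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Rows stored as String[] plus a map from column name to column index. */
public class InMemorySheet {
    private final Map<String, Integer> columnIndex = new HashMap<>();
    private final List<String[]> rows = new ArrayList<>();

    public InMemorySheet(List<String> header) {
        for (int i = 0; i < header.size(); i++) {
            columnIndex.put(header.get(i), i);
        }
    }

    public void addRow(String... cells) {
        rows.add(cells);
    }

    public int rowCount() {
        return rows.size();
    }

    /** Returns the cell at (row, column name), or "" if that row has no such cell. */
    public String get(int row, String column) {
        Integer idx = columnIndex.get(column);
        if (idx == null) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        String[] cells = rows.get(row);
        return idx < cells.length && cells[idx] != null ? cells[idx] : "";
    }

    /** Projects one row onto the requested columns, in the requested order. */
    public List<String> select(int row, List<String> columns) {
        List<String> out = new ArrayList<>(columns.size());
        for (String c : columns) {
            out.add(get(row, c));
        }
        return out;
    }
}
```

Each cell lookup is one hash-map lookup plus one array index. Pulling Name, mobileNo and email out of every row is therefore a single pass over the list, with no per-column searching.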
Your approach 1 is overkill, unless the spreadsheet is too big to hold in memory.
Your approach 2 is unnecessarily complicated. Creating class files on the fly doesn’t achieve anything that can’t be achieved with a simple general-purpose data structure.