All,
I have to redesign an existing logging system used in a web application. The existing system reads records from an Excel sheet, validates each one, writes the error messages for each entry to the database as soon as an error is found, and displays the results for all the records at the end. So,
if I have 2 records in the Excel sheet, R1 and R2, and both fail with 3 validation errors each, an insert query is fired 6 times (once per validation message) and the user sees all 6 messages at the end of the validation process.
This method worked for smaller sets of entries, but for 20,000 records it has obviously become a bottleneck.
As part of my initial redesign, these are the options I would like suggestions on from everyone at SO:
1> Create a custom logger class holding all the information required for logging and, for each record in error, store the record ID as the key and the logger object as the value in a HashMap. When all the records have been processed, perform the database inserts for everything in the HashMap in one shot.
2> Fire SQL inserts periodically, i.e. for X records in total, process Y <= X records at a time, perform the insert operation once for that group, then process the remaining records the same way.
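The two options can be seen as one mechanism with different flush points. Below is a minimal sketch, assuming Java (suggested by the HashMap in option 1). `ValidationError`, `BufferedErrorLog`, and the `sink` callback are hypothetical names used only for illustration; the sink stands in for whatever actually writes to the database. Note that this sketch counts buffered error messages rather than processed records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** One validation failure for one spreadsheet row (hypothetical shape). */
final class ValidationError {
    final String recordId;
    final String message;

    ValidationError(String recordId, String message) {
        this.recordId = recordId;
        this.message = message;
    }
}

/** Buffers errors and hands them to a sink in chunks of at most batchSize. */
final class BufferedErrorLog {
    private final int batchSize;
    private final Consumer<List<ValidationError>> sink;
    private final List<ValidationError> buffer = new ArrayList<>();

    BufferedErrorLog(int batchSize, Consumer<List<ValidationError>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Records one error and flushes once the buffer reaches batchSize. */
    void log(String recordId, String message) {
        buffer.add(new ValidationError(recordId, message));
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes out whatever is buffered; call once more after the last record. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the sink may keep it
        buffer.clear();
    }
}
```

With a `batchSize` larger than the total number of errors and a single `flush()` at the end, this behaves like option 1; with a smaller `batchSize` it behaves like option 2 and keeps memory use bounded.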
We really do not have set criteria at this point, other than definitely improving performance.
Can everyone please provide feedback on what an efficient logging system design would be, and whether there are better approaches than the ones I mentioned above?
I would guess your problems are due to the fact that you are doing row-based operations rather than set-based ones?
A set-based operation would be the quickest way to load the data. If that is not possible, I would go with inserting X records at a time, as it is more scalable; inserting them all at once would require ever-increasing amounts of memory (but would probably be quicker).
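As a rough illustration of "insert X records at a time", here is a sketch using standard JDBC batching (`PreparedStatement.addBatch` and `executeBatch`). It assumes the application is Java with a JDBC connection; the class name `ErrorBatchWriter`, the table `validation_errors`, and its columns are made up for the example. JDBC batching is not a true set-based SQL statement, but it typically cuts the per-row statement overhead; how much depends on the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class ErrorBatchWriter {
    // Hypothetical table and columns; adjust to the real schema.
    private static final String SQL =
        "INSERT INTO validation_errors (record_id, message) VALUES (?, ?)";

    /**
     * Inserts all rows in chunks of chunkSize, one executeBatch() call per chunk.
     * Each row is {recordId, message}. Returns the number of executeBatch() calls.
     * Transaction handling (commit/rollback) is left to the caller.
     */
    static int write(Connection conn, List<String[]> rows, int chunkSize)
            throws SQLException {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        int batchesExecuted = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == chunkSize) {
                    ps.executeBatch(); // send this chunk to the database
                    batchesExecuted++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the final partial chunk
                batchesExecuted++;
            }
        }
        return batchesExecuted;
    }
}
```

Only one chunk of rows is pending in the statement at any time, which is what keeps memory use flat as the number of records grows.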
Good discussion here on Ask Tom: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1583402705463