We have a number of files generated from a test, each containing almost 60,000 lines of data. The requirement is to calculate a number of parameters from the data in these files. There are two ways we could process the data:
- Each file is read line by line and processed to obtain the required parameters.
- The file data is bulk-copied into database tables, and the required parameters are calculated with aggregate functions in a stored procedure.
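To make the two options concrete, here is a minimal sketch in Python, using SQLite as a stand-in for the real database. The file layout (two comma-separated columns, an id and a numeric value), the table name `readings`, and the chosen parameters (count, mean, maximum) are all assumptions for illustration; the actual files and stored procedure will differ.

```python
import csv
import io
import sqlite3


def stats_streaming(lines):
    """Option 1: read line by line, keeping only running totals in memory."""
    count, total, peak = 0, 0.0, float("-inf")
    for row in csv.reader(lines):
        value = float(row[1])  # assumed layout: id,value
        count += 1
        total += value
        peak = max(peak, value)
    if count == 0:
        return 0, None, None
    return count, total / count, peak


def stats_database(lines):
    """Option 2: bulk-load the rows, then let the database aggregate them."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for the real server
    conn.execute("CREATE TABLE readings (sample_id INTEGER, value REAL)")
    rows = ((int(r[0]), float(r[1])) for r in csv.reader(lines))
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT COUNT(*), AVG(value), MAX(value) FROM readings"
    ).fetchone()
    conn.close()
    return result


# Hypothetical three-line file used only to show that both options agree.
sample = "1,2.0\n2,4.0\n3,9.0\n"
streaming_result = stats_streaming(io.StringIO(sample))
database_result = stats_database(io.StringIO(sample))
```

Both functions make a single pass over the input. The difference is where the work happens: the first keeps only running totals in the application, while the second pays for an insert per row but leaves the data queryable afterwards.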
I am trying to work out the overheads of both methods. A database is designed to handle this kind of workload, but I am concerned about overheads that may become a problem as the database grows larger.
Will the growing size affect the retrieval rate from the tables and consequently make the calculations slower? If so, would file processing be the better solution given the database size? Would database partitioning solve the problem for a large database?
If you set up indexes correctly on the columns your calculations filter on, the growing table size should not cause performance issues. Additionally, there is nothing stopping you from loading the files into a table, running the calculations, and then moving the data into an archive table or deleting it altogether.
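A rough sketch of that load, calculate, archive cycle is below, again in Python with SQLite standing in for the real database. The table names `staging` and `archive`, the `file_id` column, and the index name are hypothetical, and the aggregates are placeholders for whatever the parameters really are.

```python
import sqlite3


def process_and_archive(conn, file_id, values):
    """Load one file's values, aggregate them, then move them out of staging."""
    with conn:  # commits on success, rolls back if any step raises
        conn.executemany(
            "INSERT INTO staging (file_id, value) VALUES (?, ?)",
            ((file_id, v) for v in values),
        )
        # The index on file_id lets this lookup avoid scanning other files' rows.
        count, mean = conn.execute(
            "SELECT COUNT(*), AVG(value) FROM staging WHERE file_id = ?",
            (file_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO archive (file_id, value) "
            "SELECT file_id, value FROM staging WHERE file_id = ?",
            (file_id,),
        )
        conn.execute("DELETE FROM staging WHERE file_id = ?", (file_id,))
    return count, mean


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging (file_id TEXT, value REAL);
    CREATE TABLE archive (file_id TEXT, value REAL);
    CREATE INDEX idx_staging_file ON staging (file_id);
    """
)
```

Calling `process_and_archive(conn, "run1", [1.0, 2.0, 3.0])` would return the count and mean for that file and leave `staging` empty, so the working table stays small no matter how many files have been processed.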