I’m working with CSV transaction data files that are 350 MB+ and 1,100,000+ lines each.
How can I run some simple, fast queries on these files from the VBA IDE, save the result as a CSV, and then open that result as a workbook in Excel?
For example, I want to do this:
- Load the CSV into RAM as a table
- Remove all rows where the field called transaction_type is recorded as “failed”
- Save the result as a new CSV
- Open the result as a workbook in Excel
My goal is to make this operation as fast as possible. I think this functionality is provided by the Extensible Storage Engine (ESE), but I’m not sure how to use it from the Excel VBA IDE.
Thanks!
You could treat the folder as a ‘text database’ and query the files with ADO (or DAO). See this article for more information: http://msdn.microsoft.com/en-us/library/ms974559.aspx
With that approach you create a schema.ini file describing the file you wish to query, then query it using standard SQL. You would then simply write the resulting recordset to a file.
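Something along these lines might work as a starting point. It is only a rough sketch, not tested against your data. The folder and file names are made up, and I am assuming the ACE OLE DB provider is installed and that your file has a header row containing a transaction_type column. On older 32-bit setups the provider would be Microsoft.Jet.OLEDB.4.0 instead.

```vba
Option Explicit

' Hypothetical locations and names; change these to match your files.
Private Const DATA_DIR As String = "C:\data\"
Private Const SRC_FILE As String = "transactions.csv"
Private Const OUT_FILE As String = "transactions_ok.csv"

Sub FilterFailedTransactions()
    Dim cn As Object, rs As Object
    Dim f As Integer, i As Long, header As String

    ' Write a minimal schema.ini next to the CSV so the text driver
    ' knows it is comma-delimited with column names in the first row.
    f = FreeFile
    Open DATA_DIR & "schema.ini" For Output As #f
    Print #f, "[" & SRC_FILE & "]"
    Print #f, "Format=CSVDelimited"
    Print #f, "ColNameHeader=True"
    Close #f

    ' The folder is the "database"; each file in it is a "table".
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & DATA_DIR & _
            ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

    ' Keep every row except the failed ones. The IS NULL test keeps
    ' rows with an empty transaction_type, which <> alone would drop.
    Set rs = cn.Execute("SELECT * FROM [" & SRC_FILE & "] " & _
        "WHERE transaction_type <> 'failed' OR transaction_type IS NULL")

    f = FreeFile
    Open DATA_DIR & OUT_FILE For Output As #f

    ' Header line built from the recordset's field names.
    For i = 0 To rs.Fields.Count - 1
        If i > 0 Then header = header & ","
        header = header & rs.Fields(i).Name
    Next i
    Print #f, header

    ' GetString (2 = adClipString) returns up to 10000 rows per call and
    ' advances the cursor. It does not quote fields, so values that
    ' contain commas would need extra handling.
    Do Until rs.EOF
        Print #f, rs.GetString(2, 10000, ",", vbCrLf, "");
    Loop
    Close #f

    rs.Close
    cn.Close

    ' Open the filtered CSV as a workbook.
    Workbooks.Open DATA_DIR & OUT_FILE
End Sub
```

One thing to keep in mind: a worksheet holds at most 1,048,576 rows, so the last step only shows everything if the filtered file ends up under that limit.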