Using VBA, I need to “unpivot” data that is currently held in delimited text files (hundreds of columns by tens of thousands of rows) into a normalized form (four columns by millions of rows). That is, the resulting table will contain one row per source cell, with columns that:
- identify the original table/file;
- identify the cell’s row in the original table;
- identify the cell’s column in the original table;
- contain the value of that cell.
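Whatever host language ends up doing the work, the transformation itself is a single streaming pass: read one source line, emit one output row per field, and move on, so memory use stays proportional to one line rather than to the millions of output rows. The sketch below is in Python rather than VBA, purely to illustrate the algorithm; it uses the standard `csv` module as a stand-in for a quote-aware parser such as TextFieldParser (the `csv` module does not impose a 255-column limit). The function names `unpivot` and `unpivot_to_csv` are hypothetical, and row/column numbers are assumed to be 1-based with no special treatment of a header row.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple

# (source identifier, 1-based row, 1-based column, cell value)
Cell = Tuple[str, int, int, str]


def unpivot(source_id: str, stream: TextIO, delimiter: str = ",") -> Iterator[Cell]:
    """Yield one (source, row, column, value) tuple per cell.

    Only the current line is held in memory, so this scales to very
    large inputs. Quoted fields containing the delimiter are handled
    by csv.reader.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    for row_no, fields in enumerate(reader, start=1):
        for col_no, value in enumerate(fields, start=1):
            yield (source_id, row_no, col_no, value)


def unpivot_to_csv(source_id: str, stream: TextIO, out: TextIO,
                   delimiter: str = ",") -> int:
    """Write the normalized four-column table to `out`; return the cell count."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for cell in unpivot(source_id, stream, delimiter):
        writer.writerow(cell)
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory example; with real files, open them with newline="".
    sample = io.StringIO('a,b,c\n1,"2,5",3\n')
    for cell in unpivot("file1", sample):
        print(cell)
```

To process many files, one would call `unpivot_to_csv` once per source file, passing the file name as `source_id` and appending to a single output file, which could then be bulk-loaded into the target database.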
I would be grateful for any thoughts on how to accomplish this efficiently.
So far, I have considered using ADODB to construct a SELECT INTO ... UNION ... query that builds the output table, but the default text file providers are unfortunately limited to 255 columns (are there any that aren’t?).
Sébastien Lorion has built a terrific Fast CSV Reader, which I would love to use, but I do not know how to call it from within VBA. I don’t think it has been compiled to expose COM interfaces, and I don’t have the tools to recompile it, so I would be grateful for any thoughts. Similarly, Microsoft provides a TextFieldParser class (in the .NET Microsoft.VisualBasic.FileIO namespace), but again I do not know whether or how it can be used from VBA.
Another approach might be to have Excel 2007 or later (which supports up to 16,384 columns per worksheet) open the source file and construct the output table from there, but intuitively that ‘feels’ as though it would incur considerable wasted overhead…
In the end, I decided to build a tiny COM-visible wrapper around TextFieldParser in VB.NET. It’s not ideal, but it’s the best I can come up with at present.