I have a text file that holds a 2-dimensional matrix. It looks like the following.
01 02 03 04 05
06 07 08 09 10
11 12 13 14 15
16 17 18 19 20
As you can see, each row is delimited by a newline and each column is delimited by a space. I need to transpose this matrix in an efficient way, so that the result looks like the following.
01 06 11 16
02 07 12 17
03 08 13 18
04 09 14 19
05 10 15 20
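For a matrix this small, the transposition is easy to do entirely in memory. A minimal Python sketch of that baseline follows; the function name `transpose_text` is made up for illustration, and the approach assumes every row has the same number of columns.

```python
def transpose_text(text: str) -> str:
    """Transpose a whitespace-delimited matrix held entirely in memory."""
    # One list of tokens per non-empty input line.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # zip(*rows) groups the i-th token of every row, i.e. it yields the columns.
    return "\n".join(" ".join(col) for col in zip(*rows))


small = """01 02 03 04 05
06 07 08 09 10
11 12 13 14 15
16 17 18 19 20"""

print(transpose_text(small))
```

This only works while the whole matrix fits in memory, which is exactly the limitation described next.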
In reality, the matrix is 10,000 by 14,000, and the individual elements are doubles/floats. It would be costly, if not impossible, to transpose this file/matrix entirely in memory.
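As a rough size estimate, here is the arithmetic in Python. It assumes 8-byte double-precision values and counts only the raw numbers; per-object overhead and the parsed text itself can push the real footprint well above this figure, depending on the language and data structure.

```python
ROWS, COLS = 10_000, 14_000
BYTES_PER_DOUBLE = 8  # assumption: IEEE 754 double precision

elements = ROWS * COLS
raw_bytes = elements * BYTES_PER_DOUBLE

# Prints: 140,000,000 elements, 1.12 GB as raw doubles
print(f"{elements:,} elements, {raw_bytes / 1e9:.2f} GB as raw doubles")
```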
Does anyone know of a utility API that does something like this, or of an efficient approach?
What I have tried: my naive approach has been to create one temporary file per row of the transposed matrix (that is, per column of the input). So, with 10,000 rows in the transposed matrix, I will have 10,000 temporary files. When I read each line, I tokenize it and append each value to the corresponding file. So with the example above, I will have something like the following.
file-0: 01 06 11 16
file-1: 02 07 12 17
file-2: 03 08 13 18
file-3: 04 09 14 19
file-4: 05 10 15 20
I then read each file back in and append its contents to one output file. I wonder if there’s a smarter way, because I know the file I/O operations will be a pain point.
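For concreteness, the temporary-file approach described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `transpose_via_temp_files` and the zero-based `file-<j>` names are hypothetical, and each temporary file is reopened in append mode for every value so that only one extra file handle is open at a time (keeping thousands of handles open at once usually runs into operating system limits).

```python
import os
import shutil
import tempfile


def transpose_via_temp_files(src_path: str, dst_path: str) -> None:
    """Transpose a space-delimited matrix using one temp file per input column."""
    tmp_dir = tempfile.mkdtemp(prefix="transpose-")
    try:
        n_cols = 0
        with open(src_path) as src:
            for line in src:
                tokens = line.split()
                if not tokens:
                    continue  # skip blank lines
                n_cols = max(n_cols, len(tokens))
                for j, tok in enumerate(tokens):
                    # Append the j-th value of this row to the j-th temp file.
                    # Reopening per value is slow, but never holds many handles.
                    with open(os.path.join(tmp_dir, f"file-{j}"), "a") as col_file:
                        col_file.write(tok + " ")
        # Concatenate the temp files in order: each one is a row of the result.
        with open(dst_path, "w") as dst:
            for j in range(n_cols):
                with open(os.path.join(tmp_dir, f"file-{j}")) as col_file:
                    dst.write(col_file.read().rstrip() + "\n")
    finally:
        shutil.rmtree(tmp_dir)  # always clean up the temp files
```

Only one input line is held in memory during the first pass, and one temporary file at a time during the second. The cost is one open/append/close cycle per matrix element, which is the file I/O pain point mentioned above.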
Solution with minimal memory consumption and extremely low performance: