I am writing a multidimensional array data to a text file. I am doing this row wise. The size of the file keeps growing. what techniques should I follow to get the least possible size for the output file?
Share
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
If your array has many zeros you can use sparse matrix representation: instead of writing the whole matrix to the file, only write the nonzero elements (of course, you need to write each element with its indices, one by one). Suppose you want to write this matrix:
You can write this to the file:
In each line, the first number is the row, the second is the column and the third is the stored value.
If you are writing the file as text, you can switch to binary format: when you write text, you will use a byte for each digit; in binary, you use fixed amount of bytes per number, and wouldn’t have to represent spaces and newlines:
Writing the numbers
100 200 300to a file takes 11 bytes if you use text format. But they may be written using 6 bytes if you write three 16-bit integers. In Python, use “wb” and “rb” modes for opening binary files, then write them as bytes:Or — more efficiently,
Otherwise, then you should probably try compressing the data structure, using standard techniques. Since you tagged your question with
python, you will probably be interested in these Python libraries for data compressionThere is also this nice introduction to data compression, a bit heavy on the theoretical side, in case you would like to know more about it.