Asked: June 8, 2026

I have found a few similar questions here on Stack Overflow, but I believe I could benefit from advice specific to my case.

I must store around 80 thousand lists of real-valued numbers in a file and read them back later.

First, I tried cPickle, but the reading time wasn’t appealing:

>>> stmt = """
with open('pickled-data.dat') as f:
    data = cPickle.load(f)
"""
>>> timeit.timeit(stmt, 'import cPickle', number=1)
3.8195440769195557
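
The timing above does not say which pickle protocol wrote the file. In Python 2, cPickle.dump defaults to protocol 0, a text format that is typically slower to load than the binary protocols. A minimal sketch of a binary-protocol round trip, assuming Python 3 (where the pickle module uses its C implementation automatically, replacing cPickle); the helper names, the sample data, and the temporary path are hypothetical.

```python
import os
import pickle
import tempfile

def save_pickle(path, data):
    # Binary file mode plus the newest protocol this Python supports.
    with open(path, 'wb') as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_pickle(path):
    # The protocol is detected automatically when loading.
    with open(path, 'rb') as f:
        return pickle.load(f)

# Hypothetical sample data standing in for the ~80,000 lists.
data = [[1.5, 2.25], [3.0], [0.1, 0.2, 0.3]]
path = os.path.join(tempfile.mkdtemp(), 'pickled-data.dat')
save_pickle(path, data)
loaded = load_pickle(path)
```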

Then I found out that storing the numbers as plain text allows faster reading (which makes sense, since cPickle has to handle much more general data):

>>> stmt = """
data = []
with open('text-data.dat') as f:
    for line in f:
        data.append([float(x) for x in line.split()])
"""
>>> timeit.timeit(stmt, number=1)
1.712096929550171
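
The reading loop above assumes one list per line with the values separated by whitespace. A minimal sketch of a matching writer, assuming that layout; the function names, sample data, and temporary path are hypothetical, and repr() is used because it keeps enough digits for each float to round-trip exactly.

```python
import os
import tempfile

def write_text(path, data):
    # One list per line, values separated by single spaces.
    with open(path, 'w') as f:
        for lst in data:
            f.write(' '.join(repr(x) for x in lst) + '\n')

def read_text(path):
    # Same loop as the timed snippet above.
    data = []
    with open(path) as f:
        for line in f:
            data.append([float(x) for x in line.split()])
    return data

data = [[1.5, 2.25], [3.0], [0.1, 0.2, 0.3]]
path = os.path.join(tempfile.mkdtemp(), 'text-data.dat')
write_text(path, data)
loaded = read_text(path)
```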

This is a good improvement, but I think I could still optimize it somehow, since programs written in other languages can read similar data from files considerably faster.

Any ideas?

1 Answer

Editorial Team · Answer 1 · 2026-06-08T23:03:04+00:00

If NumPy arrays are workable, numpy.fromfile will likely be the fastest way to read the files (here's a somewhat related question I asked just a couple of days ago).
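
numpy.fromfile reads raw binary with no header, so the lists, which differ in length, need some layout that records where each one ends. A minimal sketch, assuming a hypothetical two-file layout (all values concatenated in one file, the per-list lengths in another) with explicit little-endian 8-byte types; the helper names, file suffixes, and sample data are illustrative only.

```python
import os
import tempfile
import numpy as np

def save_arrays(prefix, data):
    # Hypothetical layout: per-list lengths and all values concatenated,
    # both with explicit little-endian 8-byte dtypes.
    lengths = np.array([len(lst) for lst in data], dtype='<i8')
    values = np.array([x for lst in data for x in lst], dtype='<f8')
    lengths.tofile(prefix + '.len')   # raw bytes, no header
    values.tofile(prefix + '.val')

def load_arrays(prefix):
    # fromfile copies the raw bytes straight into an array; no text parsing.
    lengths = np.fromfile(prefix + '.len', dtype='<i8')
    values = np.fromfile(prefix + '.val', dtype='<f8')
    # Split the flat array at the cumulative list boundaries.
    return np.split(values, np.cumsum(lengths)[:-1])

data = [[1.5, 2.25], [3.0], [0.1, 0.2, 0.3]]
prefix = os.path.join(tempfile.mkdtemp(), 'float-data')
save_arrays(prefix, data)
loaded = load_arrays(prefix)
```

np.split returns views into the single array read from disk; call .tolist() on each piece if plain lists are required. An empty data list is an edge case this sketch does not handle (it yields one empty array instead of none).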

Alternatively, it seems like you could do a little better with struct, though I haven’t tested it:

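import struct

# '<' = little-endian, standard sizes, no alignment padding, so the layout
# does not depend on the platform.
def write_data(f, data):
    # f must be opened in binary mode ('wb').
    f.write(struct.pack('<i', len(data)))  # number of records
    for lst in data:
        # each record: element count, then the values as 4-byte floats
        f.write(struct.pack('<i%df' % len(lst), len(lst), *lst))

def read_data(f):
    # f must be opened in binary mode ('rb').
    def read_record(f):
        nelem = struct.unpack('<i', f.read(4))[0]
        return list(struct.unpack('<%df' % nelem, f.read(nelem * 4)))  # if tuples are OK, remove the `list`.

    nrec = struct.unpack('<i', f.read(4))[0]
    return [read_record(f) for i in range(nrec)]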

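This assumes that storing the data as 4-byte floats is good enough. If you want real double-precision numbers, change the format characters from f to d and change nelem*4 to nelem*8, and make sure every format string starts with '<': with native alignment, packing an i followed by d can insert padding bytes after the count, which the reader would not skip. The '<' prefix also fixes the byte order and the sizes of the types, which avoids the endianness and sizeof portability issues that native formats have.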
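
A sketch of the double-precision variant described above, assuming the same record layout as the struct code (a 4-byte record count, then per record a 4-byte element count followed by the values). The code parameter is a hypothetical addition that switches between 'd' and 'f', and the round trip uses an in-memory io.BytesIO buffer instead of a real file.

```python
import io
import struct

def write_data(f, data, code='d'):
    # '<': little-endian, standard sizes, no alignment padding.
    f.write(struct.pack('<i', len(data)))  # number of records
    for lst in data:
        # record = 4-byte count followed by the values
        f.write(struct.pack('<i%d%s' % (len(lst), code), len(lst), *lst))

def read_data(f, code='d'):
    size = struct.calcsize('<' + code)  # 8 bytes for 'd', 4 for 'f'

    def read_record():
        nelem = struct.unpack('<i', f.read(4))[0]
        return list(struct.unpack('<%d%s' % (nelem, code), f.read(nelem * size)))

    nrec = struct.unpack('<i', f.read(4))[0]
    return [read_record() for _ in range(nrec)]

# Round trip through an in-memory binary buffer.
data = [[1.5, 2.25], [3.0], [0.1, 0.2, 0.3]]
buf = io.BytesIO()
write_data(buf, data)   # 'd': full double precision
buf.seek(0)
loaded = read_data(buf)
```

With the prefix, struct.calcsize('<id') is 12 bytes; the native format 'id' is 16 on typical 64-bit platforms because of alignment padding, which is why a plain f-to-d swap needs the '<'.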