I am creating a very large array. Rather than having this array stored in

Question

Asked: June 15, 2026 (2026-06-15T20:28:48+00:00)

I am creating a very large array. Rather than having this array stored in memory, I want to be able to write it to a file. This needs to be in a format I can later import.

I would use pickle, but it appears pickle is meant for serializing completed data structures.

In the following example, I need a way for the out variable to be a file rather than an in-memory object:

out = []
for x in y:
    z = []
    #get lots of data into z
    out.append(z)


1 Answer

Editorial Team · Answer 1 · 2026-06-15T20:28:49+00:00

Take a look at streaming-pickle.

streaming-pickle allows you to save/load a sequence of Python data structures to/from disk in a streaming (incremental) manner, thus using far less memory than regular pickle.

It’s actually just a single file with three short functions. I added a snippet with an example:

try:
    from cPickle import dumps, loads  # Python 2: faster C implementation
except ImportError:
    from pickle import dumps, loads   # Python 3


def s_dump(iterable_to_pickle, file_obj):
    """ dump contents of an iterable iterable_to_pickle to file_obj, a file
    opened in binary write mode """
    for elt in iterable_to_pickle:
        s_dump_elt(elt, file_obj)

def s_dump_elt(elt_to_pickle, file_obj):
    """ dumps one element to file_obj, a file opened in binary write mode """
    # protocol 0 is the line-oriented text protocol; the binary protocols
    # may contain arbitrary bytes, which would break the separator below
    pickled_elt_str = dumps(elt_to_pickle, 0)
    file_obj.write(pickled_elt_str)
    # record separator is a blank line
    # (since pickled_elt_str might contain its own newlines)
    file_obj.write(b'\n\n')

def s_load(file_obj):
    """ load contents from file_obj, a file opened in binary read mode,
        returning a generator that yields one element at a time """
    cur_elt = []
    for line in file_obj:
        cur_elt.append(line)

        # a blank line marks the end of the current record
        if line == b'\n':
            pickled_elt_str = b''.join(cur_elt)
            elt = loads(pickled_elt_str)
            cur_elt = []
            yield elt

Here’s how you could use it:

from __future__ import print_function
import os

if __name__ == '__main__':
    if os.path.exists('obj.serialized'):
        # load a file 'obj.serialized' from disk and
        # spool through iterable
        with open('obj.serialized', 'rb') as handle:
            _generator = s_load(handle)
            for element in _generator:
                print(element)
    else:
        # or create it first, otherwise
        with open('obj.serialized', 'wb') as handle:
            for i in range(100000):
                s_dump_elt({'i' : i}, handle)
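
For comparison, a similar effect can be sketched with only the standard library, applied to the loop from the question. This is not how streaming-pickle works: it relies on the fact that each pickle.dump call writes a self-delimiting record, so pickle.load can be called repeatedly on the same binary file until it raises EOFError. The names append_record, iter_records, and write_rows, the file name out.pkl, and the stand-in data for z are all hypothetical and chosen only for illustration.

```python
import os
import pickle
import tempfile


def append_record(record, file_obj):
    """Pickle one record onto file_obj, a file opened in binary write mode."""
    pickle.dump(record, file_obj, protocol=pickle.HIGHEST_PROTOCOL)


def iter_records(file_obj):
    """Yield records one at a time from file_obj (binary read mode)."""
    while True:
        try:
            yield pickle.load(file_obj)
        except EOFError:
            # no more pickled records in the file
            return


def write_rows(path, y):
    """Mirror the question's loop, writing each z to disk instead of
    appending it to an in-memory list."""
    with open(path, 'wb') as handle:
        for x in y:
            z = [x, x * 2]  # stand-in for "get lots of data into z"
            append_record(z, handle)


# Small demonstration using a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), 'out.pkl')
write_rows(demo_path, range(5))
with open(demo_path, 'rb') as handle:
    loaded = list(iter_records(handle))
```

Only one z is held in memory at a time while writing, and iter_records reads the rows back lazily, so the full array never needs to exist in memory unless the caller collects it, as the demonstration does with list().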
