Sorry, another newbie query 😐 To build upon the suggestion given here, optimizing, I need to build a dictionary incrementally, i.e. one key: value pair at a time inside a for loop. To be specific, the dictionary would look something like this (N keys, with each value being a list of lists; each inner list has 3 elements):
dic_score = {key1: [[,,], [,,], [,,], ..., [,,]], key2: [[,,], [,,], [,,], ..., [,,]], ..., keyN: [[,,], [,,], [,,], ..., [,,]]}
This dictionary is generated by a nested for loop of the following form:
    for Gnodes in G.nodes():  # Gnodes iterates over 10000 values
        Gvalue = someoperation(Gnodes)
        for Hnodes in H.nodes():  # Hnodes iterates over 10000 values
            Hvalue = someoperation(Hnodes)
            score = SomeOperation(Gvalue, Hvalue)  # some operation on (Gvalue, Hvalue)
            dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])
I then need to sort these lists, but that was already answered here, optimizing (using a generator expression in place of the inner loop is an option).
[Note that the dictionary would contain 10000 keys, with each key associated with a list of 10000 smaller lists.]
Since the loop counts are large, the generated dictionary is huge and I am running out of memory.
How can I write each key: value pair (a list of lists) to a file as soon as it is generated, so that I don’t need to hold the entire dictionary in memory? I then want to be able to read the dictionary back in the same format, i.e. something like dic_score_after_reading[key] should return the list of lists I am looking for.
I am hoping that writing and reading one key: value pair at a time would considerably ease the memory requirements. Is there a better data structure for this? Should I be considering a database, perhaps something like Buzhug, which would give me the flexibility to access and iterate over the lists associated with each key?
I am currently using cPickle to dump the entire dictionary and then reading it back via load(), but cPickle crashes while dumping such a large amount of data in one go.
Apologies, but I am unaware of the best practices for this type of thing. Thanks!
You could look into using the ZODB in combination with the included BTrees implementation. What that gives you is a mapping-like structure that writes individual entries separately to the object store. You’d need to use savepoints or plain transactions to flush data out to the storage, but you can handle huge amounts of data this way.
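
To show the general idea of writing one key at a time, here is a minimal sketch that uses the standard library's shelve module as a stand-in. It is not ZODB or BTrees, just a simpler disk-backed mapping that needs nothing installed. It assumes Python 3. The functions some_operation and score_pair are hypothetical placeholders for the operations in the question, and plain ranges stand in for G.nodes() and H.nodes(). shelve keys must be strings, so each node is converted with str().

```python
import os
import shelve
import tempfile


def some_operation(node):
    # Hypothetical placeholder for the question's someoperation()
    return node * 2


def score_pair(g_value, h_value):
    # Hypothetical placeholder for the scoring step on (Gvalue, Hvalue)
    return abs(g_value - h_value)


def build_scores(path, g_nodes, h_nodes):
    """Store one key's list of [h_node, score, -1] rows per iteration."""
    # The H values do not depend on the outer loop, so compute them once.
    h_values = [(h, some_operation(h)) for h in h_nodes]
    with shelve.open(path) as db:
        for g in g_nodes:
            g_value = some_operation(g)
            rows = [[h, score_pair(g_value, h_value), -1]
                    for h, h_value in h_values]
            # Only this key's rows are held in memory; they are pickled
            # and stored under the key before the next key is built.
            db[str(g)] = rows


# Small stand-in inputs; the real loops run over 10000 nodes each.
path = os.path.join(tempfile.mkdtemp(), "dic_score")
build_scores(path, range(5), range(3))

# Read back a single key without loading the rest of the data.
with shelve.open(path, flag="r") as dic_score_after_reading:
    rows_for_2 = dic_score_after_reading["2"]
    stored_keys = sorted(dic_score_after_reading.keys())
```

Each lookup unpickles only the requested list, so memory use stays proportional to one key's rows rather than the whole dictionary. Sorting a key's list can be done on rows before it is stored, or after it is read back.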