I have code that builds a very large dictionary from pre-calculated data (i.e. with many lookups). It uses four-integer tuples as keys and lists of tuples as values. Each key's list is then extended with append (so that each list can be sorted at a later stage). This process is repeated thousands of times, but the original (base) dictionary stays the same. The base dictionary is roughly the same size as the updates (but it varies).
I tried using deepcopy so that the original dictionary could serve as “a base” for updating. The copying turned out to be an order of magnitude slower than rebuilding the entire dictionary from scratch.
If this is unclear, perhaps this simplified code will make more sense:
import timeit

# Rebuild the base dictionary from scratch on every iteration, then append to it.
print timeit.timeit('''
for iteration in xrange(10):
    base_dictionary = {(month, day, hour): [(value, 'some_data_name' + str(value)) for value in xrange(10)]
                       for month in xrange(5)
                       for day in xrange(5)
                       for hour in xrange(5)
                       }
    for valuenumber in xrange(10):
        for id_set in base_dictionary:
            base_dictionary[id_set].append((valuenumber, 'some_data_name' + str(valuenumber)))
'''
,
'''
''', number=100)
RESULT: 1.30800844321 seconds
print timeit.timeit('''
for iteration in xrange(10):
    new_dictionary = deepcopy(base_dictionary)
    for valuenumber in xrange(10):
        for id_set in new_dictionary:
            new_dictionary[id_set].append((valuenumber, 'some_data_name' + str(valuenumber)))
'''
,
'''
from copy import deepcopy
base_dictionary = {(month, day, hour): [(value, 'some_data_name' + str(value)) for value in xrange(10)]
                   for month in xrange(5)
                   for day in xrange(5)
                   for hour in xrange(5)
                   }
''', number=100)
RESULT: 13.8005886255 seconds
It feels very wasteful to rebuild the same dictionary every time I run an iteration. Is there a way to accelerate this process?
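You need something between a shallow copy and a deep copy. A plain shallow copy of the dictionary would share the lists with the base, so appending to the copy would also modify the base; each list therefore needs its own copy. The tuples inside the lists don’t need to be copied because they are immutable.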
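A minimal sketch of that middle ground, assuming the values are always flat lists of tuples as in the question. The helper name `copy_lists` is made up for illustration, and `range` is used instead of `xrange` so the snippet runs on both Python 2.7 and Python 3:

```python
def copy_lists(base):
    """Return a new dict with a fresh list per key; the tuples are shared."""
    # list(values) makes a one-level copy of each list. The tuples inside
    # are immutable, so sharing them between copies is safe.
    return {key: list(values) for key, values in base.items()}


# Same toy data as in the question.
base_dictionary = {
    (month, day, hour): [(value, 'some_data_name' + str(value)) for value in range(10)]
    for month in range(5)
    for day in range(5)
    for hour in range(5)
}

# Copy once per iteration instead of rebuilding or deep-copying.
new_dictionary = copy_lists(base_dictionary)
for valuenumber in range(10):
    for id_set in new_dictionary:
        new_dictionary[id_set].append((valuenumber, 'some_data_name' + str(valuenumber)))
```

Appending to `new_dictionary` leaves `base_dictionary` untouched, because only the lists are duplicated. This should avoid most of the per-object bookkeeping that deepcopy does, but it is worth timing against your real data before relying on it.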