I’m doing some indexing and memory is sufficient but CPU isn’t. So I have one huge dictionary and then a smaller dictionary I’m merging into the bigger one:
big_dict = {"the" : {"1" : 1, "2" : 1, "3" : 1, "4" : 1, "5" : 1}}
smaller_dict = {"the" : {"6" : 1, "7" : 1}}
#after merging
resulting_dict = {"the" : {"1" : 1, "2" : 1, "3" : 1, "4" : 1, "5" : 1, "6" : 1, "7" : 1}}
My question: for the values in both dicts, should I use a dict (as shown above) or a list (as shown below), given that I'm happy to spend as much memory as it takes to reduce CPU time?
For clarification, using a list would look like:
big_dict = {"the" : [1, 2, 3, 4, 5]}
smaller_dict = {"the" : [6,7]}
#after merging
resulting_dict = {"the" : [1, 2, 3, 4, 5, 6, 7]}
Side note: I'm using a dict nested in a dict rather than a set nested in a dict because json.dumps can't serialize a set. A set isn't key/value pairs; as far as the JSON library is concerned it's just {"a", "series", "of", "keys"}.
Also, after choosing between a dict and a list, how would I go about implementing the most CPU-efficient way of merging them?
I appreciate the help.
Hmmm. I would first go for a dict-of-dicts approach, as Python has one of the most fine-tuned dict implementations, so I highly doubt you can do any better with a dict-of-lists.
As for merging the dicts, this should be enough:
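A minimal sketch of such a merge, assuming both structures are plain dicts of dicts as in the question. The function name merge_dict_of_dicts and the extra "cat" key are only illustrative:

```python
def merge_dict_of_dicts(big, small):
    """Merge `small` into `big` in place and return `big`."""
    for key, inner in small.items():
        # setdefault returns the existing inner dict, or installs a new
        # empty one for keys that `big` has not seen yet.
        # dict.update then copies the entries over in C-level code.
        big.setdefault(key, {}).update(inner)
    return big


big_dict = {"the": {"1": 1, "2": 1, "3": 1, "4": 1, "5": 1}}
smaller_dict = {"the": {"6": 1, "7": 1}, "cat": {"2": 1}}

merge_dict_of_dicts(big_dict, smaller_dict)
```

Because new keys get a fresh inner dict rather than a reference to the one in smaller_dict, later changes to smaller_dict do not leak into big_dict.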
I would probably also experiment with subclassing json.JSONEncoder to be able to serialize set types. This latter method might add some overhead on the serialization side, however, and you will also need to convert the serialized values back to sets upon deserialization, either by subclassing json.JSONDecoder or doing it yourself in an extra step.
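The encoder idea could look roughly like this. It is only a sketch: the names SetEncoder and decode_sets are made up for illustration, sets are written out as JSON lists, and the set elements are assumed to be mutually comparable so they can be sorted for stable output. The conversion back to sets is done here as a plain extra step rather than a json.JSONDecoder subclass:

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder that writes sets as JSON lists."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        return super().default(obj)


def decode_sets(index):
    """Turn each inner list of a loaded index back into a set."""
    return {key: set(values) for key, values in index.items()}


index = {"the": {1, 2, 3}}
text = json.dumps(index, cls=SetEncoder)
restored = decode_sets(json.loads(text))
```

With a dict of sets, merging becomes big.setdefault(key, set()).update(values), at the cost of the extra conversion work on every dump and load.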