I have a Python dictionary of JSON-serialized values.
I want to add to these serialized strings without first calling loads(...) and later calling dumps(...), so I ‘fiddle’ with the serialized values directly.
Currently I have:
```python
from json import dumps

for key, value in my_dict.items():
    # creating JSON of the additional data I want in the JSON string
    extra = dumps({'key1': 3, 'key2': 1}, default=str)
    # cutting the last '}' off the end of 'value' and the leading '{' off
    # 'extra', then joining them with a comma ('extra' keeps its closing '}')
    my_dict[key] = '%s,%s' % (value[:-1], extra[1:])
```
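One weakness of the slice-and-concatenate approach is an empty object: '{}'[:-1] leaves '{', so the result starts with '{,' and is no longer valid JSON. A slightly more defensive version of the same string splice could look like this. It is only a sketch: it assumes every value is a serialized JSON object (not an array or scalar), and append_fields is a hypothetical helper name.

```python
import json


def append_fields(serialized, extra_fields):
    """Splice extra_fields into a serialized JSON object without parsing it.

    Assumes 'serialized' is a JSON object (starts with '{', ends with '}').
    Existing keys are not checked, so duplicate keys can appear.
    """
    extra = json.dumps(extra_fields, default=str)
    inner = extra[1:-1]  # drop the outer braces of the extra object
    if not inner:
        return serialized  # nothing to add
    body = serialized.rstrip()
    if not body.endswith('}'):
        raise ValueError('expected a serialized JSON object')
    head = body[:-1].rstrip()  # everything before the closing brace
    # an empty object such as '{}' must not get a leading comma
    sep = '' if head.endswith('{') else ','
    return head + sep + inner + '}'


my_dict = {'a': '{"x": 1}', 'b': '{}'}  # hypothetical sample data
for key, value in my_dict.items():
    my_dict[key] = append_fields(value, {'key1': 3, 'key2': 1})
```

Like the original trick, this does not detect keys that already exist, so the output can contain duplicate keys; json.loads keeps the last occurrence.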
I am doing this because I consider the loads and dumps round trip a waste, but my current method is not very Pythonic.
Is there a better method?
Note: the ‘extra’ values come from a different source than the initial JSON values, and cannot be inserted at the point where the original data was serialized.
Time differences when using a dict of ~20 JSON blobs:
- fiddling: 0.0005 seconds
- json>py>json: 0.0025 seconds
5 times quicker

And, for fun, with 20,000:
- fiddling: 0.333 seconds
- json>py>json: 0.813 seconds
about 2.4 times quicker (roughly 59% less time)

With 200,000:
- fiddling: 4.5 seconds
- json>py>json: 10.25 seconds
about 2.3 times quicker (roughly 56% less time)
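A comparison like this can be reproduced with a small timeit harness along these lines. The sample blobs, the dict size, and the repeat count are assumptions, so the absolute numbers will not match the ones above.

```python
import json
import timeit

# Hypothetical sample data: 20 small JSON blobs (not the asker's real data).
my_dict = {'item%d' % i: json.dumps({'id': i, 'name': 'n%d' % i}) for i in range(20)}
EXTRA = {'key1': 3, 'key2': 1}


def fiddle(d):
    """String splice, as in the question (assumes non-empty JSON objects)."""
    extra = json.dumps(EXTRA, default=str)
    return {k: '%s,%s' % (v[:-1], extra[1:]) for k, v in d.items()}


def round_trip(d):
    """Parse each blob, update it, and serialize it again."""
    out = {}
    for k, v in d.items():
        obj = json.loads(v)
        obj.update(EXTRA)
        out[k] = json.dumps(obj, default=str)
    return out


if __name__ == '__main__':
    runs = 1000
    for fn in (fiddle, round_trip):
        total = timeit.timeit(lambda: fn(my_dict), number=runs)
        print('%s: %.6f s per call' % (fn.__name__, total / runs))
```

Both functions should produce equivalent JSON for this sample data; only the time taken differs.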
The Pythonic way would be to parse the JSON string, modify the values, and then serialize it again. JSON is very quick to parse, much faster than the standard pickle/unpickle functions, and will probably not slow you down unless you have enormous amounts of data (tens of thousands of lines). Don’t fall into the trap of optimizing prematurely.
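As a minimal sketch of that parse, modify, serialize approach (add_fields is a hypothetical helper name, and the sample dictionary is made up):

```python
import json


def add_fields(serialized, extra_fields):
    """Parse a JSON object, merge in extra_fields, and serialize it again."""
    obj = json.loads(serialized)
    obj.update(extra_fields)  # extra_fields win if a key already exists
    return json.dumps(obj, default=str)


my_dict = {'a': '{"x": 1}', 'b': '{}'}  # hypothetical sample data
my_dict = {k: add_fields(v, {'key1': 3, 'key2': 1}) for k, v in my_dict.items()}
```

Unlike string slicing, this handles empty objects, whitespace, and duplicate keys without any special cases, because the json module does the bookkeeping.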
In any case, you should always write your application in a nice, Pythonic and readable fashion, then (if necessary!) optimize the slow parts of your code later.
Another method of optimization could be to write the relevant code in C, or use a C library for JSON serialization. Take a look at ultrajson, or take a look at this answer, which explains how the third-party simplejson library can be much faster than the standard-library json module you are using.
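If profiling does show JSON handling to be the bottleneck, the library swap can be confined to a single import. This is a sketch assuming ujson and/or simplejson may or may not be installed; the fallback order and the name fast_json are illustrative choices, and whether either library is actually faster for your data is worth measuring.

```python
# Prefer a faster JSON implementation if one is installed, falling back to
# the standard library. ujson and simplejson are third-party packages
# (pip install ujson / pip install simplejson); neither is required here.
try:
    import ujson as fast_json
except ImportError:
    try:
        import simplejson as fast_json
    except ImportError:
        import json as fast_json


def add_fields(serialized, extra_fields):
    """Parse, update, and re-serialize one JSON object."""
    obj = fast_json.loads(serialized)
    obj.update(extra_fields)
    # No default= argument: stick to options all three modules accept.
    return fast_json.dumps(obj)
```

Since all three modules expose loads and dumps, the rest of the code does not need to know which one was imported.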