Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8860869
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 14, 20262026-06-14T15:21:38+00:00 2026-06-14T15:21:38+00:00

( Sometimes our host is wrong; nanoseconds matter ;) I have a Python Twisted

  • 0

(Sometimes our host is wrong; nanoseconds matter 😉

I have a Python Twisted server that talks to some Java servers and profiling shows spending ~30% of its runtime in the JSON encoder/decoder; its job is handling thousands of messages per second.

This talk by youtube raises interesting applicable points:

  • Serialization formats – no matter which one you use, they are all
    expensive. Measure. Don’t use pickle. Not a good choice. Found
    protocol buffers slow. They wrote their own BSON implementation which
    is 10-15 time faster than the one you can download.

  • You have to measure. Vitess swapped out one its protocols for an HTTP
    implementation. Even though it was in C it was slow. So they ripped
    out HTTP and did a direct socket call using python and that was 8%
    cheaper on global CPU. The enveloping for HTTP is really expensive.

  • Measurement. In Python measurement is like reading tea leaves.
    There’s a lot of things in Python that are counter intuitive, like
    the cost of grabage colleciton. Most of chunks of their apps spend
    their time serializing. Profiling serialization is very depending on
    what you are putting in. Serializing ints is very different than
    serializing big blobs.

Anyway, I control both the Python and Java ends of my message-passing API and can pick a different serialisation than JSON.

My messages look like:

  • a variable number of longs; anywhere between 1 and 10K of them
  • and two already-UTF8 text strings; both between 1 and 3KB

Because I am reading them from a socket, I want libraries that can cope gracefully with streams – its irritating if it doesn’t tell me how much of a buffer it consumed, for example.

The other end of this stream is a Java server, of course; I don’t want to pick something that is great for the Python end but moves problems to the Java end e.g. performance or torturous or flaky API.

I will obviously be doing my own profiling. I ask here in the hope you describe approaches I wouldn’t think of e.g. using struct and what the fastest kind of strings/buffers are.

Some simple test code gives surprising results:

import time, random, struct, json, sys, pickle, cPickle, marshal, array

def encode_json_1(*args):
    return json.dumps(args)

def encode_json_2(longs,str1,str2):
    return json.dumps({"longs":longs,"str1":str1,"str2":str2})

def encode_pickle(*args):
    return pickle.dumps(args)

def encode_cPickle(*args):
    return cPickle.dumps(args)

def encode_marshal(*args):
    return marshal.dumps(args)

def encode_struct_1(longs,str1,str2):
    return struct.pack(">iii%dq"%len(longs),len(longs),len(str1),len(str2),*longs)+str1+str2

def decode_struct_1(s):
    i, j, k = struct.unpack(">iii",s[:12])
    assert len(s) == 3*4 + 8*i + j + k, (len(s),3*4 + 8*i + j + k)
    longs = struct.unpack(">%dq"%i,s[12:12+i*8])
    str1 = s[12+i*8:12+i*8+j]
    str2 = s[12+i*8+j:]
    return (longs,str1,str2)

struct_header_2 = struct.Struct(">iii")

def encode_struct_2(longs,str1,str2):
    return "".join((
        struct_header_2.pack(len(longs),len(str1),len(str2)),
        array.array("L",longs).tostring(),
        str1,
        str2))

def decode_struct_2(s):
    i, j, k = struct_header_2.unpack(s[:12])
    assert len(s) == 3*4 + 8*i + j + k, (len(s),3*4 + 8*i + j + k)
    longs = array.array("L")
    longs.fromstring(s[12:12+i*8])
    str1 = s[12+i*8:12+i*8+j]
    str2 = s[12+i*8+j:]
    return (longs,str1,str2)

def encode_ujson(*args):
    return ujson.dumps(args)

def encode_msgpack(*args):
    return msgpacker.pack(args)

def decode_msgpack(s):
    msgunpacker.feed(s)
    return msgunpacker.unpack()

def encode_bson(longs,str1,str2):
    return bson.dumps({"longs":longs,"str1":str1,"str2":str2})

def from_dict(d):
    return [d["longs"],d["str1"],d["str2"]]

tests = [ #(encode,decode,massage_for_check)
    (encode_struct_1,decode_struct_1,None),
    (encode_struct_2,decode_struct_2,None),
    (encode_json_1,json.loads,None),
    (encode_json_2,json.loads,from_dict),
    (encode_pickle,pickle.loads,None),
    (encode_cPickle,cPickle.loads,None),
    (encode_marshal,marshal.loads,None)]

try:
    import ujson
    tests.append((encode_ujson,ujson.loads,None))
except ImportError:
    print "no ujson support installed"

try:
    import msgpack
    msgpacker = msgpack.Packer()
    msgunpacker = msgpack.Unpacker()
    tests.append((encode_msgpack,decode_msgpack,None))
except ImportError:
    print "no msgpack support installed"

try:
    import bson
    tests.append((encode_bson,bson.loads,from_dict))
except ImportError:
    print "no BSON support installed"

longs = [i for i in xrange(10000)]
str1 = "1"*5000
str2 = "2"*5000

random.seed(1)
encode_data = [[
        longs[:random.randint(2,len(longs))],
        str1[:random.randint(2,len(str1))],
        str2[:random.randint(2,len(str2))]] for i in xrange(1000)]

for encoder,decoder,massage_before_check in tests:
    # do the encoding
    start = time.time()
    encoded = [encoder(i,j,k) for i,j,k in encode_data]
    encoding = time.time()
    print encoder.__name__, "encoding took %0.4f,"%(encoding-start),
    sys.stdout.flush()
    # do the decoding
    decoded = [decoder(e) for e in encoded]
    decoding = time.time()
    print "decoding %0.4f"%(decoding-encoding)
    sys.stdout.flush()
    # check it
    if massage_before_check:
        decoded = [massage_before_check(d) for d in decoded]
    for i,((longs_a,str1_a,str2_a),(longs_b,str1_b,str2_b)) in enumerate(zip(encode_data,decoded)):
        assert longs_a == list(longs_b), (i,longs_a,longs_b)
        assert str1_a == str1_b, (i,str1_a,str1_b)
        assert str2_a == str2_b, (i,str2_a,str2_b)

gives:

encode_struct_1 encoding took 0.4486, decoding 0.3313
encode_struct_2 encoding took 0.3202, decoding 0.1082
encode_json_1 encoding took 0.6333, decoding 0.6718
encode_json_2 encoding took 0.5740, decoding 0.8362
encode_pickle encoding took 8.1587, decoding 9.5980
encode_cPickle encoding took 1.1246, decoding 1.4436
encode_marshal encoding took 0.1144, decoding 0.3541
encode_ujson encoding took 0.2768, decoding 0.4773
encode_msgpack encoding took 0.1386, decoding 0.2374
encode_bson encoding took 55.5861, decoding 29.3953

bson, msgpack and ujson all installed via easy_install

I would love to be shown I’m doing it wrong; that I should be using cStringIO interfaces or however else you speed it all up!

There must be a way to serialise this data that is an order of magnitude faster surely?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-14T15:21:40+00:00Added an answer on June 14, 2026 at 3:21 pm

    In the end, we chose to use msgpack.

    If you go with JSON, your choice of library on Python and Java is critical to performance:

    On Java, http://blog.juicehub.com/2012/11/20/benchmarking-web-frameworks-for-games/ says:

    Performance was absolutely atrocious until we swapped out the JSON Lib (json-simple) for Jackon’s ObjectMapper. This brought RPS for 35 to 300+ – a 10x increase

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

We have some EDI files coming in to our BizTalk server that we drop
Our application is sometimes timing out, we have a Java client connecting to a
We've got our servers running on CentOS and our Java backend sometimes has to
While using Team Foundation Server sometimes we have to delete our projects entirely and
Sometimes we receive stack traces from our customer with wrong line numbers. It happens
Our build workflows are a bit involved in the sense they sometimes have to
We have soap services running on our unix box (local network with AFS). Sometimes
I have a problem with our Jetty application server. Since yesterday, we have a
We use ERPConnect in our ASP.NET application but sometimes we get some error messages
Well, we have PHP 5.2.15 on our servers and we really like our language

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.