
The Archive Base

Editorial Team
Asked: May 13, 2026

I currently have a Python application where newline-terminated ASCII strings are being transmitted to me via a TCP/IP socket. I have a high data rate of these strings and I need to parse them as quickly as possible. Currently, the strings are being transmitted as CSV and if the data rate is high enough, my Python application starts to lag behind the input data rate (probably not all that surprising).

The strings look something like this:

chan,2007-07-13T23:24:40.143,0,0188878425-079,0,0,True,S-4001,UNSIGNED_INT,name1,module1,...

I have a corresponding object that will parse these strings and store all of the data into an object. Currently the object looks something like this:

class ChanVal(object):
    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        for key in kwargs:
            setattr(self, key, kwargs[key])

    def parseFromCsv(self, csvString):
        lst = csvString.split(',')
        self.eventTime = lst[1]
        self.eventTimeExact = long(lst[2])
        self.other_clock = lst[3]
        ...

To read the data in from the socket, I’m using a basic “socket.socket(socket.AF_INET,socket.SOCK_STREAM)” (my app is the server socket) and then I’m using the “select.poll()” object from the “select” module to constantly poll the socket for new input using its “poll(…)” method.
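For reference, reading newline-terminated records off a polled socket usually requires a persistent buffer, because recv() can return a partial line or several lines at once. A minimal sketch of that pattern (Python 3; the host, port, and handle() callback are made-up names, and select.poll is Unix-only):

```python
import select
import socket

def split_records(buf):
    """Split complete newline-terminated records out of a byte buffer.

    Returns (records, remainder), where remainder is the trailing
    partial line, if any.
    """
    *records, remainder = buf.split(b"\n")
    return records, remainder

def serve(host="0.0.0.0", port=9000):
    # Hypothetical entry point; adapt host/port and handle() to your app.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    conn, _ = server.accept()

    poller = select.poll()
    poller.register(conn, select.POLLIN)

    buf = b""
    while True:
        if not poller.poll(1000):      # timeout in milliseconds
            continue
        chunk = conn.recv(65536)
        if not chunk:
            break                      # peer closed the connection
        buf += chunk
        records, buf = split_records(buf)
        for record in records:
            handle(record.decode("ascii"))   # hypothetical per-line handler
```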

I have some control over the process sending the data (meaning I can get the sender to change the format), but it would be really convenient if we could speed up the ASCII processing enough to not have to use fixed-width or binary formats for the data.

So far, here are the things I’ve tried that haven’t really made much of a difference:

  1. Using the string “split” method and then indexing the list of results directly (see above), but “split” seems to be really slow.
  2. Using the “reader” object in the “csv” module to parse the strings
  3. Changing the strings being sent to a string format that I can use to directly instantiate an object via “eval” (e.g. sending something like “ChanVal(eventTime='2007-07-13T23:24:40.143',eventTimeExact=0,...)”)
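For what it’s worth, a quick micro-benchmark can confirm where the time goes. In a sketch like this (absolute timings vary by machine), plain split is generally the faster of the two for simple comma-separated lines with no quoting:

```python
import csv
import timeit

# Sample line from the question.
line = "chan,2007-07-13T23:24:40.143,0,0188878425-079,0,0,True,S-4001,UNSIGNED_INT,name1,module1"

def parse_split():
    # Plain string split: no quoting/escaping support, but minimal overhead.
    return line.split(",")

def parse_csv():
    # csv.reader handles quoting and escaping, at the cost of extra
    # machinery per line.
    return next(csv.reader([line]))

n = 100_000
print("split:      %.3fs" % timeit.timeit(parse_split, number=n))
print("csv.reader: %.3fs" % timeit.timeit(parse_csv, number=n))
```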

I’m trying to avoid going to a fixed-width or binary format, though I realize those would probably ultimately be much faster.
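For comparison, a fixed binary format along these lines is what the question is trying to avoid. A hypothetical layout for a few of the fields, using the standard struct module (field widths and order are invented for illustration, not taken from the original data):

```python
import struct

# Hypothetical layout: little-endian, no padding.
#   q   -> 8-byte event time (e.g. ms since epoch)
#   q   -> 8-byte exact-time counter
#   ?   -> 1-byte boolean flag
#   12s -> fixed-width channel name, NUL-padded
RECORD = struct.Struct("<qq?12s")

def pack_record(event_ms, exact, flag, name):
    # struct pads the 12s field with NUL bytes automatically.
    return RECORD.pack(event_ms, exact, flag, name.encode("ascii"))

def unpack_record(blob):
    event_ms, exact, flag, raw = RECORD.unpack(blob)
    return event_ms, exact, flag, raw.rstrip(b"\0").decode("ascii")
```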

Ultimately, I’m open to suggestions on better ways to poll the socket, better ways to format/parse the data (though hopefully we can stick with ASCII) or anything else you can think of.

Thanks!



1 Answer

  1. Editorial Team
    Answered: May 13, 2026 at 8:35 pm

    You can’t make Python faster. But you can make your Python application faster.

    Principle 1: Do Less.

    You can’t do less input parsing overall, but you can do less input parsing in the process that’s also reading the socket and doing everything else with the data.

    Generally, do this.

    Break your application into a pipeline of discrete steps.

    1. Read the socket, break into fields, create a named tuple, write the tuple to a pipe with something like pickle.

    2. Read a pipe (with pickle) to construct the named tuple, do some processing, write to another pipe.

    3. Read a pipe, do some processing, write to a file or something.

    Each of these three processes, connected with OS pipes, runs concurrently. That means the first process is reading the socket and making tuples, while the second process is consuming tuples and doing calculations, while the third process is doing calculations and writing a file.

    This kind of pipeline maximizes what your CPU can do, without too many painful tricks.

    Reading and writing to pipes is trivial, since Linux assures you that sys.stdin and sys.stdout will be pipes when the shell creates the pipeline.

    Before doing anything else, break your program into pipeline stages.

    proc1.py

    import sys
    import cPickle
    from collections import namedtuple

    ChanVal = namedtuple( 'ChanVal', ['eventTime','eventTimeExact', 'other_clock', ... ] )
    for line in socket:
        c = ChanVal( *line.rstrip().split(',') )
        cPickle.dump( c, sys.stdout )
    

    proc2.py

    import sys
    import cPickle
    from collections import namedtuple

    ChanVal = namedtuple( 'ChanVal', ['eventTime','eventTimeExact', 'other_clock', ... ] )
    while True:
        try:
            item = cPickle.load( sys.stdin )
        except EOFError:
            break
        # processing
        cPickle.dump( item, sys.stdout )
    

    This idea of processing namedtuples through a pipeline is very scalable.

    python proc1.py | python proc2.py
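    On Python 3, where cPickle is gone and sys.stdout is text by default, the same two stages can be sketched against binary pipe streams. This is a sketch under assumptions: only the first three fields of the sample line are modeled, and the field names are guesses for illustration.

```python
# Python 3 sketch of the same two pipeline stages.
import pickle
from collections import namedtuple

# Only the first three fields of the sample line are modeled here;
# the names are assumptions, not from the original post.
ChanVal = namedtuple("ChanVal", ["chan", "eventTime", "eventTimeExact"])

def stage1(lines, out):
    # Parse each CSV line into a ChanVal and pickle it to the pipe.
    for line in lines:
        fields = line.rstrip("\n").split(",")[:3]
        pickle.dump(ChanVal(*fields), out)

def stage2(inp, out):
    # Read pickled tuples until EOF, process them, and forward them.
    while True:
        try:
            item = pickle.load(inp)
        except EOFError:
            break
        # ... real processing would go here ...
        pickle.dump(item, out)

# Run as separate processes, e.g.:
#   proc1.py: stage1(sys.stdin, sys.stdout.buffer)
#   proc2.py: stage2(sys.stdin.buffer, sys.stdout.buffer)
```

    Note that pickle streams are binary, so each stage must read and write the .buffer attribute of stdin/stdout rather than the text-mode stream.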
    

© 2021 The Archive Base. All Rights Reserved