I have a Python application that, to be brief, receives data from a remote server, processes it, responds to the server, and occasionally saves the processed data to disk. The problem I’ve encountered is that there is a lot of data to write, and the save process can take upwards of half a minute. This is apparently a blocking operation, so the network I/O is stalled during this time. I’d like the save operation to take place in the background, so to speak, so that the application can continue to communicate with the server reasonably quickly.
I know that I probably need some kind of threading module to accomplish this, but I can’t tell what the differences are between thread, threading, multiprocessing, and the various other options. Does anybody know what I’m looking for?
Since you’re I/O bound, use the threading module.
You should almost never need to use the thread module (renamed _thread in Python 3); it’s a low-level interface, and the threading module is a high-level wrapper around it.
The multiprocessing module is different from the threading module: multiprocessing uses multiple subprocesses to execute a task. It just happens to use the same interface as threading to reduce the learning curve. multiprocessing is typically used when you have CPU-bound calculations and need to avoid the GIL (Global Interpreter Lock) on a multicore CPU.
A somewhat more esoteric alternative to multi-threading is asynchronous I/O using the asyncore module (deprecated since Python 3.6 and removed in 3.12; asyncio is its standard-library successor). Other options include Stackless Python and Twisted.
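To make the threading suggestion concrete, here is a minimal sketch of a background save. It assumes the processed data can be serialized as JSON and that one worker thread is enough. The names save_worker and run_demo, the file name, and the sample data are all made up for illustration; they are not from the question.

```python
import json
import os
import queue
import tempfile
import threading


def save_worker(jobs):
    """Write queued (path, data) jobs to disk until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work, stop the thread
            jobs.task_done()
            break
        path, data = job
        with open(path, "w") as f:
            json.dump(data, f)
        jobs.task_done()


def run_demo():
    """Hypothetical driver: queue one save, keep going, then shut down cleanly."""
    jobs = queue.Queue()
    worker = threading.Thread(target=save_worker, args=(jobs,), daemon=True)
    worker.start()

    # Assumed output location; a real application would use its own path.
    path = os.path.join(tempfile.mkdtemp(), "processed.json")

    # put() returns immediately; the slow write happens in the worker thread.
    jobs.put((path, {"records": [1, 2, 3]}))

    # The main thread is free here to keep talking to the server.

    jobs.put(None)  # ask the worker to exit
    worker.join()   # wait for any pending save to finish

    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    print(run_demo())
```

The main thread only pays for a queue put, so the network loop should not stall on the write. Pass the worker a copy or snapshot of the data if the main thread keeps modifying it afterwards. If the time turns out to go into serializing the data rather than writing it, that part is CPU bound and multiprocessing may be the better fit.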