I plan to program a simple data flow framework, which basically consists of lazy method calls on objects. If I ever consider distributed programming, what is the easiest way to enable that in Python? Is there any transparent solution that doesn't require me to do network programming?
Or for a start, how can I make use of multi-core processors in Python?
Can be anything at all really, so let’s break it down:
Simple Let-Me-Call-That-Function (RPC)
Well, lucky you! Python has one of the greatest implementations of Remote Procedure Calls: RPyC.
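Just run the server (double-click a file; see the tutorial), then open an interpreter, connect to the server, and call functions on the remote side as if they were local.
There is also a lazy (async) version, where the call returns immediately and you collect the result later.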
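RPyC is a third-party package and needs its own server running, so as a rough stand-in the same call-a-remote-function idea can be sketched with the standard library's xmlrpc modules instead. This is not RPyC's API; the `add` function, the `demo` helper, and the loopback address are assumptions made for illustration, and the lazy call is imitated with a `ThreadPoolExecutor` future.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Hypothetical function that the server exposes to remote callers."""
    return a + b


def start_server():
    # Port 0 asks the OS for any free port; a real setup would use a fixed one.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def demo():
    server = start_server()
    host, port = server.server_address[:2]
    try:
        with ServerProxy(f"http://{host}:{port}") as proxy:
            # Eager call: blocks until the reply arrives.
            eager = proxy.add(2, 3)
            # Lazy call: submit() returns a future at once;
            # we only block when the value is actually needed.
            with ThreadPoolExecutor(max_workers=1) as executor:
                future = executor.submit(proxy.add, 20, 22)
                lazy = future.result()
        return eager, lazy
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Here the server runs in a background thread of the same process only to keep the sketch self-contained; in practice it would be a separate process or machine.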
Simple Data Distribution
You have a defined unit of work, say a complex image manipulation.
What you do is roughly create Node(s), which do the actual work (i.e., take an image, do the manipulation, and return the result), someone who collects the results (a Sink), and someone who creates the work (the Distributor).
Take a look at Celery.
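The three roles can be sketched in a single process with threads and queues. This is only an illustration of the Distributor / Node / Sink layout, not how Celery works internally: `manipulate`, `node`, and `run_pipeline` are hypothetical names, and the "image" is just a list of 8-bit pixel values.

```python
import queue
import threading

STOP = None  # sentinel telling a node that no more work is coming


def manipulate(image):
    """Hypothetical unit of work: invert 8-bit pixel values."""
    return [255 - pixel for pixel in image]


def node(tasks, results):
    """Node: take an image, do the manipulation, hand back the result."""
    while True:
        item = tasks.get()
        if item is STOP:
            break
        index, image = item
        results.put((index, manipulate(image)))


def run_pipeline(images, node_count=2):
    tasks, results = queue.Queue(), queue.Queue()
    nodes = [
        threading.Thread(target=node, args=(tasks, results))
        for _ in range(node_count)
    ]
    for worker in nodes:
        worker.start()

    # Distributor: create the work, then send one stop marker per node.
    for index, image in enumerate(images):
        tasks.put((index, image))
    for _ in nodes:
        tasks.put(STOP)
    for worker in nodes:
        worker.join()

    # Sink: collect the results and restore the original order.
    collected = [results.get() for _ in images]
    return [output for _, output in sorted(collected)]
```

With a real task queue, the in-memory queues would be replaced by a broker and the nodes would run on other machines, but the division of labour stays the same.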
If it’s very small scale, or if you just want to play with it, see the Pool object in the multiprocessing package.
There is also a truly lazy version (for example map_async or apply_async), which returns an AsyncResult object that can be inspected for results.
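A minimal sketch of both styles with `multiprocessing.Pool`, assuming a hypothetical `invert` function as the unit of work and lists of pixel values as stand-in images:

```python
from multiprocessing import Pool


def invert(image):
    """Stand-in for a complex image manipulation: invert 8-bit pixels."""
    return [255 - pixel for pixel in image]


def process_all(images, workers=2):
    with Pool(processes=workers) as pool:
        # Eager: map() blocks until every image has been processed.
        eager = pool.map(invert, images)

        # Lazy: map_async() returns an AsyncResult immediately;
        # get() blocks only when the results are actually needed.
        pending = pool.map_async(invert, images)
        lazy = pending.get(timeout=30)
    return eager, lazy


if __name__ == "__main__":
    # The guard matters: on platforms that spawn worker processes,
    # the main module is re-imported in each worker.
    sample = [[0, 128, 255], [10, 20, 30]]
    print(process_all(sample))
```

The worker function has to live at module level so that it can be pickled and sent to the worker processes.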
Complex Data Distribution
Some multi-level more-than-just-fire&forget complex data manipulation, or a multi-step processing use case.
In such a case, you should use a message-queue system such as ZeroMQ or RabbitMQ.
They allow you to send ‘messages’ across multiple servers with great ease.
They save you from the horrors of the TCP land, but they are a bit more complex (some, like RabbitMQ, require a separate process/server for the Broker). However, they give you much more fine-grained control over the flow of data, and help you build a truly scalable application.
Lazy-Anything
While not data distribution per se, it is the hottest trend in web server back-ends: use ‘green’ threads (or events, or coroutines) to hand off IO-heavy tasks, so that the application code can stay busy maxing out the CPU instead of waiting on IO.
I like Eventlet a lot, and gevent is another option.
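Eventlet and gevent are third-party packages, so as a stand-in here is the same idea sketched with coroutines from the standard library's asyncio. The `fetch` coroutine and its delays are assumptions that simulate IO waits; this is not Eventlet's or gevent's API.

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical IO-bound task; sleep() stands in for a network wait."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Both waits overlap, so the total time is roughly the longest delay,
    # not the sum. gather() returns results in the order they were passed.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.1))


def run():
    return asyncio.run(main())


if __name__ == "__main__":
    print(run())
```

While one coroutine is waiting, the event loop runs the others, which is what the green-thread libraries achieve behind ordinary-looking blocking calls.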