I’m new to Python.
Does anyone know what the relationship is between the map() / reduce() functions in Python (and in functional languages) and the MapReduce concept used in distributed computation?
The cloud concept of map/reduce is very similar to the functional map() and reduce(), but adapted to work in parallel. First, each data object is passed through a function that maps it to a new object (usually some sort of dictionary). Then a reduce function is called on pairs of the objects returned by map until there is only one left. That is the result of the map/reduce operation.
One important consideration is that, because of the parallelization, the reduce function must be able to take in objects from the map function as well as objects from prior reduce calls. This makes more sense when you think about how the parallelization goes. Many machines will each reduce their share of the data to a single object, and those objects will then be reduced to a final output. Of course, this may happen in more than one layer if there is a lot of data.
Here’s a simple example of how you might use the map/reduce framework to count words in a list:
The map function would look like this:
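The original snippet is not preserved here, so what follows is only a minimal sketch of what such a map function might look like. The name `map_word` is an assumption chosen for illustration.

```python
def map_word(word):
    """Map a single word to a partial count: a dict with that word seen once."""
    # Hypothetical helper name; any function returning a one-entry dict would do.
    return {word: 1}
```

Returning a dictionary rather than a bare number means the output of the map step has the same shape as the output of the reduce step.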
And the reduce function would look like this:
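Again, the original code is missing, so this is a sketch under assumptions: the name `reduce_counts` is hypothetical, and both arguments are assumed to be dictionaries of word counts. Because it accepts any two such dictionaries, it works on fresh map output and on the results of earlier reduce calls alike.

```python
def reduce_counts(left, right):
    """Merge two partial word-count dicts into a new dict (inputs are not modified)."""
    merged = dict(left)  # copy so the caller's data stays untouched
    for word, count in right.items():
        merged[word] = merged.get(word, 0) + count
    return merged
```

Copying instead of updating in place is a design choice made here so that the same partial result can safely be reused.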
Then you can map/reduce like this:
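A possible sequential version, assuming the illustrative helpers `map_word` and `reduce_counts` (hypothetical names, defined inline so the snippet runs on its own) and a made-up word list:

```python
from functools import reduce  # reduce() lives in functools in Python 3


def map_word(word):
    # Hypothetical map step: one word becomes a one-entry count dict.
    return {word: 1}


def reduce_counts(left, right):
    # Hypothetical reduce step: merge two count dicts into a new one.
    merged = dict(left)
    for word, count in right.items():
        merged[word] = merged.get(word, 0) + count
    return merged


words = ["a", "foo", "bar", "foobar", "foo", "a", "bar", "bar"]  # sample data
mapped = list(map(map_word, words))       # map step: one dict per word
counts = reduce(reduce_counts, mapped)    # reduce step: fold pairs into one dict
print(counts)
```

Here `reduce` folds the list from left to right on a single machine, which is exactly what Python's own map() and reduce() do.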
But you can also do it like this (which is what parallelization would do):
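As a rough sketch of the parallel style, the mapped data can be split into chunks, each chunk reduced separately, and the partial results reduced again. The chunk boundaries, the helper names `map_word` and `reduce_counts`, and the word list are all assumptions for illustration. Everything runs in one process here, whereas a real framework would hand each chunk to a different machine.

```python
from functools import reduce


def map_word(word):
    # Hypothetical map step: one word becomes a one-entry count dict.
    return {word: 1}


def reduce_counts(left, right):
    # Hypothetical reduce step: merge two count dicts into a new one.
    merged = dict(left)
    for word, count in right.items():
        merged[word] = merged.get(word, 0) + count
    return merged


words = ["a", "foo", "bar", "foobar", "foo", "a", "bar", "bar"]  # sample data
mapped = [map_word(w) for w in words]

# Pretend each chunk lives on its own machine.
chunks = [mapped[0:3], mapped[3:6], mapped[6:8]]

# First layer: every "machine" reduces its own chunk to one dict.
partials = [reduce(reduce_counts, chunk) for chunk in chunks]

# Second layer: the partial results are reduced to the final answer.
counts = reduce(reduce_counts, partials)
print(counts)
```

The second layer feeds reduce output back into the same reduce function, which is why that function has to accept both kinds of input.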