The Archive Base


Editorial Team
Asked: June 4, 2026

A few ‘quick questions’: what types/classes of algorithms can be recast in the MapReduce


A few ‘quick questions’:

  • What types/classes of algorithms can be recast in the MapReduce paradigm? (e.g., k-means has an MR implementation)

  • Are there any that can’t be expressed in this way?

  • What characteristics make an algorithm less attractive, or more complex, to recast in the MR paradigm?

Thanks in advance for any help.

Max.


1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 5:44 pm

    I am working through these same questions for a collection of big data algorithms that come from the MPI world. Here is my take.

    The basic pipeline for MR formulations appears to be an expansion followed by a contraction. The map is applied to a large set, possibly creating an even larger set; the reduce then sorts and organizes that set so that it can be aggregated into a consolidated data set, preferably a much smaller one. The number of map and reduce passes you need is a measure of how clever the MR algorithm is.
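    To make the expansion/contraction shape concrete, here is a minimal single-process sketch in plain Python. It is not how a real MR framework is implemented; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count job are assumptions chosen for illustration only.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Expansion: each input record may emit many (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def shuffle(pairs):
    """Group values by key, the step that sits between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Contraction: all values for a key collapse to one aggregate."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    # One (word, 1) pair per word: the data set grows here.
    return [(word.lower(), 1) for word in line.split()]

def count_reducer(key, values):
    # One count per distinct word: the data set shrinks here.
    return sum(values)

lines = ["the map expands the data", "the reduce contracts the data"]
pairs = map_phase(lines, word_mapper)
counts = reduce_phase(shuffle(pairs), count_reducer)
print(len(lines), "lines ->", len(pairs), "pairs ->", len(counts), "counts")
```

    In this toy run, two input lines expand into ten intermediate pairs and contract into six counts. In a real cluster the shuffle is the part that crosses the network.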

    As a general computational approach, you can solve any computational problem with MR. From a practical point of view, however, the resource utilization of MR is skewed in favor of computational problems that have high concurrent I/O requirements. Embarrassingly parallel algorithms like word counting certainly fit that bill, but the class is broader than that. For example, your k-means algorithm is a constrained minimization problem, which nobody would categorize as embarrassingly parallel, yet it still has an efficient MR formulation.
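    As an illustration of why k-means fits, one common way to phrase a single iteration is: the map assigns each point to its nearest centroid, and the reduce averages the points assigned to each centroid. The sketch below runs that in one process with plain Python; the helper names and the sample points are assumptions, and a real job would run one map/reduce pass per iteration with the current centroids shared with every mapper.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )

def kmeans_map(point, centroids):
    # Emit (cluster id, (point, 1)): independent for every point.
    return nearest(point, centroids), (point, 1)

def kmeans_reduce(values):
    # Sum coordinates and counts for one cluster, then divide.
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vec, n in values:
        for d in range(dim):
            total[d] += vec[d]
        count += n
    return tuple(t / count for t in total)

def kmeans_iteration(points, centroids):
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        groups[key].append(value)
    # A centroid that attracts no points keeps its old position.
    return [
        kmeans_reduce(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(5):  # each pass corresponds to one map/reduce job
    centroids = kmeans_iteration(points, centroids)
print(centroids)
```

    The iteration is not embarrassingly parallel as a whole, because every pass depends on the centroids from the previous one, but inside a pass the per-point work is independent and the per-cluster aggregation is a simple sum.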

    My current formal framework characterizes a distributed computer system in terms of five attributes:

    1. processor performance
    2. memory capacity (we can ignore memory performance as it tends to be architected in by the processor designers to support the processor’s performance)
    3. disk storage capacity
    4. network bandwidth performance
    5. network messaging latency

    Disk performance is something I am still struggling to incorporate cleanly: rotational vs. SSD storage technologies have huge performance implications, but only if the SSDs are integrated via PCIe. If they are integrated via SAS or SATA, you hit the interface limit, and rotational storage can easily saturate that interface as well. In that case, only the superb latency of SSDs improves performance, and that benefits only smaller data sets with smaller data records. So for the moment, let’s assume we have a true big data problem and need rotational storage to hold the data cost-effectively.

    MapReduce uses the above list of distributed resources in that expansion/contraction flow: it uses processor+memory+disk to execute the map functions, and then leans heavily on the performance of the network for the reduce function. Adding servers scales the processor+memory+disk resources, but unfortunately network capacity increases only modestly, while network latency gets worse. Since network latency is a very difficult performance characteristic to minimize in a distributed system, MR algorithms are most effective for bandwidth-centric operators: that is, algorithms that have billions of little packets that are independent. The commutative and associative attributes Nishant highlights are a perfect summary for identifying that class of algorithms, because ordering requirements among these packets are greatly simplified and simple queuing operators are therefore sufficient.
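    A small sketch of why commutativity and associativity matter: if the merge operator has both properties, partial results can be combined in any grouping and any arrival order, so no coordination about ordering is needed. The example below is an assumption-laden illustration in plain Python (the `combine` operator on `(sum, count)` pairs and the simulated "workers" are hypothetical), using integers so that the arithmetic is exact.

```python
import random
from functools import reduce

def combine(a, b):
    """Merge two partial (sum, count) aggregates; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])

def partial_aggregate(chunk):
    # What one worker computes locally before anything crosses the network.
    return reduce(combine, ((x, 1) for x in chunk), (0, 0))

def distributed_mean(values, n_workers, seed):
    rng = random.Random(seed)
    shuffled = values[:]
    rng.shuffle(shuffled)                      # arbitrary assignment of records
    chunks = [shuffled[i::n_workers] for i in range(n_workers)]
    partials = [partial_aggregate(c) for c in chunks]
    rng.shuffle(partials)                      # arbitrary arrival order
    total, count = reduce(combine, partials, (0, 0))
    return total / count

def mean_of_means(chunks):
    # Counter-example: averaging per-chunk averages depends on the grouping.
    return sum(sum(c) / len(c) for c in chunks) / len(chunks)

values = list(range(1, 101))
results = {
    distributed_mean(values, n_workers=w, seed=s)
    for w in (1, 3, 7)
    for s in range(5)
}
print(results)                                 # a single value for every split
print(mean_of_means([[1, 2, 3], [10]]))        # differs from the true mean 4.0
```

    Every split and ordering yields the same mean, whereas the mean-of-means version gives an answer that depends on how the records happened to be grouped.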

    I am looking for insights into whether efficient MR algorithms exist for PDE solvers and optimization algorithms, such as integer programming. I found a great graphic from the folks who are doing FutureGrid:
    Domains of Algorithmic Organization


© 2021 The Archive Base. All Rights Reserved