Editorial Team
Asked: May 12, 2026

I wanted to compare the performance characteristics of immutable.Map and mutable.Map in Scala for


I wanted to compare the performance characteristics of immutable.Map and mutable.Map in Scala for a similar operation (namely, merging many maps into a single one; see this question). I have what appear to be similar implementations for both mutable and immutable maps (see below).

As a test, I generated a List containing 1,000,000 single-item Map[Int, Int] and passed this list into the functions I was testing. With sufficient memory, the results were unsurprising: ~1200ms for mutable.Map, ~1800ms for immutable.Map, and ~750ms for an imperative implementation using mutable.Map — not sure what accounts for the huge difference there, but feel free to comment on that, too.

What did surprise me a bit, perhaps because I’m being a bit thick, is that with the default run configuration in IntelliJ 8.1, both mutable implementations hit an OutOfMemoryError, but the immutable collection did not. The immutable test did run to completion, but it did so very slowly — it takes about 28 seconds. When I increased the max JVM memory (to about 200MB, not sure where the threshold is), I got the results above.

Anyway, here’s what I really want to know:

Why do the mutable implementations run out of memory, but the immutable implementation does not? I suspect that the immutable version allows the garbage collector to run and free up memory before the mutable implementations do — and all of those garbage collections explain the slowness of the immutable low-memory run — but I’d like a more detailed explanation than that.

Implementations below. (Note: I don’t claim that these are the best implementations possible. Feel free to suggest improvements.)

  def mergeMaps[A,B](func: (B,B) => B)(listOfMaps: List[Map[A,B]]): Map[A,B] =
    (Map[A,B]() /: (for (m <- listOfMaps; kv <-m) yield kv)) { (acc, kv) =>
      acc + (if (acc.contains(kv._1)) kv._1 -> func(acc(kv._1), kv._2) else kv)
    }

  def mergeMutableMaps[A,B](func: (B,B) => B)(listOfMaps: List[mutable.Map[A,B]]): mutable.Map[A,B] =
    (mutable.Map[A,B]() /: (for (m <- listOfMaps; kv <- m) yield kv)) { (acc, kv) =>
      acc + (if (acc.contains(kv._1)) kv._1 -> func(acc(kv._1), kv._2) else kv)
    }

  def mergeMutableImperative[A,B](func: (B,B) => B)(listOfMaps: List[mutable.Map[A,B]]): mutable.Map[A,B] = {
    val toReturn = mutable.Map[A,B]()
    for (m <- listOfMaps; kv <- m) {
      if (toReturn contains kv._1) {
        toReturn(kv._1) = func(toReturn(kv._1), kv._2)
      } else {
        toReturn(kv._1) = kv._2
      }
    }
    toReturn
  }
1 Answer

Editorial Team
Added an answer on May 12, 2026 at 10:34 am

    Well, it really depends on the actual type of Map you are using. Probably HashMap. Now, mutable structures like that gain performance by pre-allocating the memory they expect to use. You are joining one million maps, so the final map is bound to be somewhat big. Let's see how these key/values get added:

    protected def addEntry(e: Entry) { 
      val h = index(elemHashCode(e.key)) 
      e.next = table(h).asInstanceOf[Entry] 
      table(h) = e 
      tableSize = tableSize + 1 
      if (tableSize > threshold) 
        resize(2 * table.length) 
    } 
    

    See the 2 * in the resize line? The mutable HashMap grows by doubling each time it runs out of space, while the immutable one is pretty conservative in memory usage (though existing keys will usually occupy twice the space when updated).
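
    To get a feel for what doubling implies, here is a rough sketch that replays the growth rule from addEntry. The initial capacity of 16 and the load factor of 0.75 are assumptions for illustration, not values read from the Scala source, and DoublingSketch and capacities are hypothetical names.

```scala
object DoublingSketch {
  // Hypothetical parameters, chosen for illustration only.
  val initialCapacity = 16
  val loadFactor = 0.75

  /** Table lengths visited while inserting `n` entries,
    * doubling whenever the size exceeds capacity * loadFactor. */
  def capacities(n: Int): List[Int] = {
    var capacity = initialCapacity
    var visited = List(capacity)
    var size = 0
    while (size < n) {
      size += 1
      if (size > capacity * loadFactor) {
        capacity *= 2                  // same rule as resize(2 * table.length)
        visited = capacity :: visited
      }
    }
    visited.reverse
  }
}
```

    Under those assumptions, one million insertions finish with a table of 2,097,152 slots, and a resize presumably needs the old and the new table alive at the same time while entries are moved across.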

    Now, as for other performance problems, you are creating a list of keys and values in the first two versions. That means that, before you join any maps, you already have each Tuple2 (the key/value pairs) in memory twice! Plus the overhead of List, which is small, but we are talking about more than one million elements times the overhead.

    You may want to use a projection, which avoids that. Unfortunately, projection is based on Stream, which isn’t very reliable for our purposes on Scala 2.7.x. Still, try this instead:

    for (m <- listOfMaps.projection; kv <- m) yield kv
    

    A Stream doesn't compute a value until it is needed. The garbage collector ought to collect the unused elements as well, as long as you don't keep a reference to the Stream's head, which seems to be the case in your algorithm.
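
    The on-demand behaviour can be observed directly. The sketch below assumes Scala 2.13 or later, where LazyList is the successor of Stream; LazySketch and its counter are hypothetical names introduced only for this illustration.

```scala
object LazySketch {
  // Counts how many elements were actually evaluated.
  var computed = 0

  /** Returns (sum of the first three doubled numbers, number of evaluations). */
  def demo(): (Int, Int) = {
    computed = 0
    // An infinite lazy sequence: nothing is evaluated when it is defined.
    val lazyDoubles = LazyList.from(1).map { n => computed += 1; n * 2 }
    // Only the three requested elements are forced.
    val firstThree = lazyDoubles.take(3).toList
    (firstThree.sum, computed)
  }
}
```

    Although the sequence is conceptually infinite, only the elements that are consumed get evaluated.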

    EDIT

    To complement the above: a for/yield comprehension takes one or more collections and returns a new collection. As often as it makes sense, the returned collection is of the same type as the original collection. So, for example, in the following code, the for-comprehension creates a new list, which is then stored in l2. It is not val l2 = which creates the new list, but the for-comprehension.

    val l = List(1,2,3)
    val l2 = for (e <- l) yield e*2
    

    Now, let’s look at the code being used in the first two algorithms (minus the mutable keyword):

    (Map[A,B]() /: (for (m <- listOfMaps; kv <-m) yield kv)) 
    

    The foldLeft operator, here written with its /: synonym, will be invoked on the object returned by the for-comprehension. Remember that a : at the end of an operator inverts the order of the object and the parameter, so the method is called on the right-hand operand.
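
    As a small illustration of that rule, the sketch below uses :: (which also ends in a colon) next to a plain foldLeft call. FoldSketch is a hypothetical name; foldLeft is spelled out because /: is deprecated in recent Scala versions.

```scala
object FoldSketch {
  val xs = List(1, 2, 3)

  // `(0 /: xs)(_ + _)` in the notation above is the same call as this one.
  val viaFoldLeft: Int = xs.foldLeft(0)(_ + _)

  // Operators ending in ':' are invoked on their right operand:
  // `0 :: xs` is sugar for `xs.::(0)`.
  val consed: List[Int] = 0 :: xs
  val sameConsed: List[Int] = xs.::(0)
}
```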

    Now, let's consider what object this is, on which foldLeft is being called. The first generator in this for-comprehension is m <- listOfMaps. We know that listOfMaps is a collection of type List[X], where X isn't really relevant here. The result of a for-comprehension on a List is always another List. The other generators aren't relevant.

    So, you take this List, get all the key/values inside each Map which is a component of this List, and make a new List with all of that. That’s why you are duplicating everything you have.

    (in fact, it’s even worse than that, because each generator creates a new collection; the collections created by the second generator are just the size of each element of listOfMaps though, and are immediately discarded after use)
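
    To make the copy visible, here is a sketch of how such a comprehension is rewritten into flatMap and map calls. The sample data and the name DesugarSketch are made up for the example.

```scala
object DesugarSketch {
  // Hypothetical sample input standing in for listOfMaps.
  val listOfMaps: List[Map[String, Int]] =
    List(Map("a" -> 1), Map("b" -> 2), Map("a" -> 3))

  // The comprehension from the first two algorithms...
  val viaFor: List[(String, Int)] =
    for (m <- listOfMaps; kv <- m) yield kv

  // ...is rewritten by the compiler into nested flatMap/map calls.
  // The outer flatMap on a List builds a complete new List of pairs.
  val viaFlatMap: List[(String, Int)] =
    listOfMaps.flatMap(m => m.map(kv => kv))
}
```

    Both values are fully materialized lists holding every key/value pair, which is the duplicate described above.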

    The next question — actually, the first one, but it was easier to invert the answer — is how the use of projection helps.

    When you call projection on a List, it returns a new object of type Stream (on Scala 2.7.x). At first you may think this will only make things worse, because you'll now have three copies of the List, instead of a single one. But a Stream is not pre-computed. It is lazily computed.

    What that means is that the resulting object, the Stream, isn’t a copy of the List, but, rather, a function that can be used to compute the Stream when required. Once computed, the result will be kept so that it doesn’t need to be computed again.
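
    The "computed once, then kept" behaviour can be checked with a counter. This sketch again assumes Scala 2.13 or later and uses LazyList in place of the 2.7 Stream; MemoSketch is a hypothetical name.

```scala
object MemoSketch {
  // Counts how often the mapping function runs.
  var evaluations = 0

  /** Returns (evaluations before use, sum of two traversals, evaluations after). */
  def demo(): (Int, Int, Int) = {
    evaluations = 0
    val ll = LazyList(1, 2, 3).map { n => evaluations += 1; n * 10 }
    val before = evaluations   // nothing has been computed yet
    val first = ll.sum         // forces every element
    val second = ll.sum        // reuses the stored elements
    (before, first + second, evaluations)
  }
}
```

    Because ll stays reachable, every computed element stays in memory too, which is why holding on to the head of a lazy sequence defeats the memory saving.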

    Also, map, flatMap and filter of a Stream all return a new Stream, which means you can chain them all together without making a single copy of the List which created them. Since for-comprehensions with yield use these very functions, using a Stream inside them prevents unnecessary copies of data.

    Now, suppose you wrote something like this:

    val kvs = for (m <- listOfMaps.projection; kv <-m) yield kv
    (Map[A,B]() /: kvs) { ... }
    

    In this case you aren’t gaining anything. After assigning the Stream to kvs, the data hasn’t been copied yet. Once the second line is executed, though, kvs will have computed each of its elements, and, therefore, will hold a complete copy of the data.

    Now consider the original form:

    (Map[A,B]() /: (for (m <- listOfMaps.projection; kv <-m) yield kv)) 
    

    In this case, the Stream is used at the same time it is computed. Let’s briefly look at how foldLeft for a Stream is defined:

    override final def foldLeft[B](z: B)(f: (B, A) => B): B = { 
      if (isEmpty) z 
      else tail.foldLeft(f(z, head))(f) 
    } 
    

    If the Stream is empty, just return the accumulator. Otherwise, compute a new accumulator (f(z, head)) and then pass it and the function to the tail of the Stream.

    Once f(z, head) has executed, though, there will be no remaining reference to the head. Or, in other words, nothing anywhere in the program will be pointing to the head of the Stream, and that means the garbage collector can collect it, thus freeing memory.

    The end result is that each element produced by the for-comprehension will exist just briefly, while you use it to compute the accumulator. And this is how you save keeping a copy of your whole data.
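
    On current Scala versions, where as far as I know projection is no longer available, roughly the same effect can be sketched with an Iterator, which hands each pair to the fold and keeps no reference to it afterwards. This is an adaptation under that assumption (Scala 2.13 or later, hypothetical name MergeSketch), not the code from the question.

```scala
object MergeSketch {
  /** Merges the maps pair by pair, combining values of duplicate keys with `func`.
    * The iterator yields one key/value pair at a time, so no intermediate
    * List of all pairs is ever built. */
  def mergeMaps[A, B](func: (B, B) => B)(listOfMaps: List[Map[A, B]]): Map[A, B] =
    listOfMaps.iterator
      .flatMap(_.iterator)
      .foldLeft(Map.empty[A, B]) { case (acc, (k, v)) =>
        val merged = acc.get(k).map(existing => func(existing, v)).getOrElse(v)
        acc.updated(k, merged)
      }
}
```

    For example, MergeSketch.mergeMaps[String, Int](_ + _)(List(Map("a" -> 1), Map("b" -> 2), Map("a" -> 3))) sums the two values stored under "a".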

    Finally, there is the question of why the third algorithm does not benefit from it. Well, the third algorithm does not use yield, so no copy of any data whatsoever is being made. In this case, using projection only adds an indirection layer.
