I am trying to implement a simple wordcount in scala using an immutable map(this

Question

Asked: June 2, 2026 (2026-06-02T01:34:42+00:00)

I am trying to implement a simple word count in Scala using an immutable map (this is intentional), and the way I am trying to accomplish it is as follows:

Create an empty immutable map
Create a scanner that reads through the file.
While the scanner.hasNext() is true:
- Check if the Map contains the word, if it doesn’t contain the word, initialize the count to zero
- Create a new entry with the key=word and the value=count+1
- Update the map
At the end of the iteration, the map is populated with all the values.

My code is as follows:

val wordMap = Map.empty[String,Int]
val input = new java.util.Scanner(new java.io.File("textfile.txt"))
while(input.hasNext()){
  val token = input.next()
  val currentCount = wordMap.getOrElse(token,0) + 1
  val wordMap = wordMap + (token,currentCount)
}

The idea is that wordMap will have all the word counts at the end of the iteration.
Whenever I try to run this snippet, I get the following exception:

recursive value wordMap needs type.

Can somebody point out why I am getting this exception and what can I do to remedy it?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-06-02T01:34:43+00:00

val wordMap = wordMap + (token,currentCount)

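This line declares a new val named wordMap inside the loop body, shadowing the one defined outside the loop. Because the right-hand side then refers to the very value being defined, the compiler reports it as recursive. If you want to update the map on each iteration, define wordMap with var before the loop and then just assign to it:

wordMap = wordMap + (token -> currentCount)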
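
For reference, here is a minimal self-contained sketch of that fix. It reads from an in-memory string rather than textfile.txt so that it runs without a file; that substitution, the object name WordCountVar, and the method names are assumptions made for illustration. The pair is built with -> so that it is passed to + as a single tuple. The second method is one possible variant that drops the var and threads the map through foldLeft, in case the goal is to stay fully immutable.

```scala
import java.util.Scanner

object WordCountVar {
  // The var-based fix: the variable is rebound to a new immutable Map on each step.
  def countWithVar(text: String): Map[String, Int] = {
    var wordMap = Map.empty[String, Int]
    // For a file, use new Scanner(new java.io.File("textfile.txt")) instead.
    val input = new Scanner(text)
    while (input.hasNext()) {
      val token = input.next()
      val currentCount = wordMap.getOrElse(token, 0) + 1
      wordMap = wordMap + (token -> currentCount) // assignment, not a new val
    }
    wordMap
  }

  // Hypothetical alternative with no var: foldLeft carries the map along.
  def countWithFold(text: String): Map[String, Int] =
    text.split("\\s+").filter(_.nonEmpty).foldLeft(Map.empty[String, Int]) { (acc, token) =>
      acc + (token -> (acc.getOrElse(token, 0) + 1))
    }
}
```

Both methods should give the same counts for the same input; the foldLeft version simply makes the "new map per step" idea explicit.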

Though how about this instead?

io.Source.fromFile("textfile.txt")            // read from the file
  .getLines.flatMap{ line =>                  // for each line
     line.split("\\s+")                       // split the line into tokens
       .groupBy(identity).mapValues(_.size)   // count each token in the line
  }                                           // this produces an iterator of token counts
  .toStream                                   // make a Stream so we can groupBy
  .groupBy(_._1).mapValues(_.map(_._2).sum)   // combine all the per-line counts
  .toList

Note that the per-line pre-aggregation is there to try to reduce the memory required. Grouping every token in the entire file at once might need too much memory.

If your file is really massive, I would suggest doing this in parallel (since word counting is trivial to parallelize) using either Scala’s parallel collections or Hadoop (using one of the cool Scala Hadoop wrappers like Scrunch or Scoobi).

EDIT: Detailed explanation:

Ok, first look at the inner part of the flatMap. We take a string, and split it apart on whitespace:

val line = "a b c b"
val tokens = line.split("\\s+") // Array(a, b, c, b)

Now identity is a function that just returns its argument, so if we groupBy(identity), we map each distinct word type to all of its word tokens:

val grouped = tokens.groupBy(identity) // Map(c -> Array(c), a -> Array(a), b -> Array(b, b))

And finally, we want to count up the number of tokens for each type:

val counts = grouped.mapValues(_.size) // Map(c -> 1, a -> 1, b -> 2)

Since we map this over all the lines in the file, we end up with token counts for each line.

So what does flatMap do? Well, it runs the token-counting function over each line, and then combines all the results into one big collection.
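
To make the difference from a plain map concrete, here is a small sketch that works on an in-memory list of lines instead of a file. The object name FlatMapSketch and its method names are made up for illustration, and the per-line counting uses map with a pattern match rather than mapValues.

```scala
object FlatMapSketch {
  // Count the tokens in one line.
  def countLine(line: String): Map[String, Int] =
    line.split("\\s+")
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.length) }

  // map keeps one result per line: a list of per-line count maps.
  def nested(lines: List[String]): List[Map[String, Int]] =
    lines.map(line => countLine(line))

  // flatMap concatenates the per-line results into one flat list of (word, count) pairs.
  def flat(lines: List[String]): List[(String, Int)] =
    lines.flatMap(line => countLine(line))
}
```

With two input lines, nested returns two maps, while flat returns a single list containing every (word, count) pair from both lines.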

Assume the file is:

a b c b
b c d d d
e f c

Then we get:

val countsByLine = 
  io.Source.fromFile("textfile.txt")            // read from the file
    .getLines.flatMap{ line =>                  // for each line
       line.split("\\s+")                       // split the line into tokens
         .groupBy(identity).mapValues(_.size)   // count each token in the line
    }                                           // this produces an iterator of token counts
println(countsByLine.toList) // List((c,1), (a,1), (b,2), (c,1), (d,3), (b,1), (c,1), (e,1), (f,1))

So now we need to combine the counts of each line into one big set of counts. The countsByLine variable is an Iterator, so it doesn’t have a groupBy method. Instead we can convert it to a Stream, which is basically a lazy list. We want laziness because we don’t want to have to read the entire file into memory before we start (though note that groupBy itself still has to traverse the whole stream to build its result). Then the groupBy groups all counts of the same word type together.

val groupedCounts = countsByLine.toStream.groupBy(_._1)
println(groupedCounts.mapValues(_.toList)) // Map(e -> List((e,1)), f -> List((f,1)), a -> List((a,1)), b -> List((b,2), (b,1)), c -> List((c,1), (c,1), (c,1)), d -> List((d,3)))

And finally we can sum up the counts from each line for each word type by grabbing the second item from each tuple (the count), and summing:

val totalCounts = groupedCounts.mapValues(_.map(_._2).sum)
println(totalCounts.toList) // List((e,1), (f,1), (a,1), (b,3), (c,3), (d,3))

And there you have it.
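
As a side note, mapValues and toStream as used above are deprecated as of Scala 2.13. The sketch below is one possible way to get the same totals without them. It swaps the Stream and groupBy step for a foldLeft that merges each line's counts into a running total, so it is a different technique from the pipeline above, not a transcription of it. The object name WordCountPipeline and its method names are assumptions for illustration.

```scala
object WordCountPipeline {
  // Count the tokens within a single line (empty tokens are skipped).
  def countLine(line: String): Map[String, Int] =
    line.split("\\s+").filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.length) }

  // Merge the per-line counts into running totals, one line at a time.
  def totalCounts(lines: Iterator[String]): Map[String, Int] =
    lines.foldLeft(Map.empty[String, Int]) { (totals, line) =>
      countLine(line).foldLeft(totals) { case (acc, (word, n)) =>
        acc + (word -> (acc.getOrElse(word, 0) + n))
      }
    }
}
```

For a real file this could be called as WordCountPipeline.totalCounts(scala.io.Source.fromFile("textfile.txt").getLines()). Only the running totals and the current line's counts are held at any one time.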
