Note: This is an FAQ, asked specifically so that I can answer it myself, as this issue seems to come up fairly often and I want to put it somewhere it can (hopefully) be found easily via a search.
As prompted by a comment on my answer here.
For example:
"abcde" map {_.toUpperCase} //returns a String
"abcde" map {_.toInt} // returns an IndexedSeq[Int]
BitSet(1,2,3,4) map {2*} // returns a BitSet
BitSet(1,2,3,4) map {_.toString} // returns a Set[String]
Looking in the scaladoc, all of these use the map operation inherited from TraversableLike, so how come it’s always able to return the most specific valid collection? This works even for String, which provides map only via an implicit conversion.
Scala collections are clever things…
The internals of the collection library are one of the more advanced topics in the land of Scala. They involve higher-kinded types, inference, variance, implicits, and the CanBuildFrom mechanism, all to make the library incredibly generic, easy to use, and powerful from a user-facing perspective. Understanding it from the point of view of an API designer is not a task for a beginner to take on lightly. On the other hand, it’s incredibly rare that you’ll ever actually need to work with collections at this depth.
So let us begin…
With the release of Scala 2.8, the collection library was completely rewritten to remove duplication. A great many methods were moved to just one place, so that ongoing maintenance and the addition of new collection methods would be far easier, but this also makes the hierarchy harder to understand.
Take List, for example. It inherits from (in turn):
- LinearSeqOptimized
- GenericTraversableTemplate
- LinearSeq
- Seq
- SeqLike
- Iterable
- IterableLike
- Traversable
- TraversableLike
- TraversableOnce
That’s quite a handful! So why this deep hierarchy? Ignoring the XxxLike traits briefly, each tier in the hierarchy adds a little bit of functionality, or provides a more optimised version of inherited functionality (for example, fetching an element by index on a Traversable requires a combination of drop and head operations, which is grossly inefficient on an indexed sequence). Where possible, all functionality is pushed as far up the hierarchy as it can go, maximising the number of subclasses that can use it and removing duplication.
map is just one such example. The method is implemented in TraversableLike (though the XxxLike traits only really exist for library designers, so for most intents and purposes it’s considered a method on Traversable; I’ll come to that part shortly), and it is widely inherited. It’s possible to define an optimised version in a subclass, but it must still conform to the same signature. Consider the uses of map from the question: in each case, the output is of the same type as the input wherever possible. When that’s not possible, superclasses of the input type are checked until one is found that does offer a valid return type. Getting this right took a lot of work, especially when you consider that String isn’t even a collection; it’s just implicitly convertible to one.
So how is it done?
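To see why this needs any special machinery, here is a deliberately naive sketch in which map is implemented once on a base trait. NaiveColl, NaiveSeq and NaiveBits are hypothetical names, not library types. Because the method is written only once, its declared result type can only be the base trait, so mapping over the bit-set-like class loses track of what it was.

```scala
// Hypothetical naive design (not library code): map is written once on the
// base trait, so its declared return type can only be the base trait.
trait NaiveColl[A] {
  def elems: List[A] // backing storage, kept as a List for simplicity

  // Inherited unchanged by every subclass; always builds a NaiveSeq.
  def map[B](f: A => B): NaiveColl[B] = NaiveSeq(elems.map(f))
}

// Two hypothetical subclasses that inherit map as-is.
final case class NaiveSeq[A](elems: List[A]) extends NaiveColl[A]
final case class NaiveBits(elems: List[Int]) extends NaiveColl[Int]

object NaiveDemo {
  val bits: NaiveBits = NaiveBits(List(1, 2, 3, 4))

  // Only NaiveColl[Int] is accepted as the static type here; writing
  // `val doubled: NaiveBits = bits.map(_ * 2)` would not compile.
  val doubled: NaiveColl[Int] = bits.map(_ * 2)
}
```

Overriding map in every subclass would fix the type but reintroduce exactly the duplication the 2.8 rewrite set out to remove.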
One half of the puzzle is the XxxLike traits (I did say I’d get to them…), whose main function is to take a Repr type param (short for “Representation”) so that they know the true subclass actually being operated on. So, for example, TraversableLike is the same as Traversable, but abstracted over the Repr type param. This param is then used by the second half of the puzzle: the CanBuildFrom type class, which captures the source collection type, target element type and target collection type to be used by collection-transforming operations.
It’s easier to explain with an example!
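Before the BitSet example, the Repr half can be seen on its own. This is a minimal sketch, assuming hypothetical names (ToyTraversableLike, ToySeq, ToyBits) and a plain rebuild function in place of the Builder the library actually uses. An operation such as filter, which never changes the element type, can be written once in the Like trait and still return the concrete subclass.

```scala
// Hypothetical sketch of the XxxLike idea (illustrative names, not the
// real library's): the trait is abstracted over Repr, the concrete type.
trait ToyTraversableLike[A, Repr] {
  def elems: List[A]

  // Each concrete collection says how to rebuild itself from elements.
  protected def rebuild(xs: List[A]): Repr

  // Written once, but its result type is Repr: whichever subclass mixes this in.
  def filter(p: A => Boolean): Repr = rebuild(elems.filter(p))
}

// A general-purpose sequence: Repr is ToySeq[A].
final case class ToySeq[A](elems: List[A]) extends ToyTraversableLike[A, ToySeq[A]] {
  protected def rebuild(xs: List[A]): ToySeq[A] = ToySeq(xs)
}

// A specialised collection of Ints: Repr is ToyBits, so filter keeps that type.
final case class ToyBits(elems: List[Int]) extends ToyTraversableLike[Int, ToyBits] {
  protected def rebuild(xs: List[Int]): ToyBits = ToyBits(xs)
}
```

Repr alone is not enough for map, because the element type of the result is only known once the function is supplied; that is the gap CanBuildFrom fills.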
BitSet’s companion object defines an implicit instance of CanBuildFrom with the type CanBuildFrom[BitSet, Int, BitSet].
When compiling BitSet(1,2,3,4) map {2*}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, Int, T]. This is the clever part… The implicit in BitSet matches both of the first two type parameters. The first parameter is Repr, as captured by the XxxLike trait, and the second is the element type produced by the function passed to map (here Int, the result of 2*). The map operation is also parameterised with a result type T, and T is inferred from the third type parameter of the CanBuildFrom instance that was implicitly located: BitSet in this case.
So the first two type parameters to CanBuildFrom are inputs, used for implicit lookup, and the third parameter is an output, used for inference. The CanBuildFrom in BitSet matches the two types BitSet and Int, so the lookup succeeds, and the inferred return type is also BitSet.
When compiling BitSet(1,2,3,4) map {_.toString}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, String, T]. This fails for the implicit in BitSet, so the compiler next tries its superclass, Set, whose companion contains an implicit canBuildFrom of type CanBuildFrom[Coll, A, Set[A]], parameterised by A. This matches because Coll is a type alias for Set[_], and CanBuildFrom is contravariant in its first type parameter, so an instance that accepts any Set as its source also accepts a BitSet. The A will match anything, as canBuildFrom is parameterised with the type A; in this case it’s inferred to be String… thus yielding a return type of Set[String].
So to correctly implement a collection type, you not only need to provide a correct implicit of type CanBuildFrom, but you also need to ensure that the concrete type of that collection is supplied as the Repr param to the correct parent traits (for example, this would be MapLike in the case of subclassing Map).
String is a little more complicated, as it provides map via an implicit conversion. The implicit conversion is to StringOps, which subclasses StringLike[String], which ultimately derives TraversableLike[Char, String], with String being the Repr type param.
There’s also a CanBuildFrom[String, Char, String] in scope, so that the compiler knows that when mapping the elements of a String to Chars, the return type should also be a String. From this point onwards, the same mechanism is used.
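The whole mechanism can be condensed into a small, self-contained model. This is a minimal sketch under several assumptions. Every Mini* name is hypothetical. MiniCanBuildFrom converts a List directly, whereas the real CanBuildFrom hands out a Builder. Instead of relying on lookup in the BitSet and Set companions, both implicits live in the type class’s own companion, with the Set-level fallback inherited from a lower-priority trait (a common idiom for ranking implicits). MiniSetLike[Any, Any] stands in for Set’s Coll. The map method mirrors the shape of the 2.8 signature def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That.

```scala
// Hypothetical miniature of the CanBuildFrom pattern. Simplified: the real
// library's CanBuildFrom hands back a Builder; this one converts a List.
trait MiniCanBuildFrom[-From, -Elem, +To] {
  def fromElems(xs: List[Elem]): To
}

// Plays the role of TraversableLike: A is the element type, Repr the
// concrete collection type. From and Elem drive the implicit lookup;
// That is inferred from whichever instance is found.
trait MiniTraversableLike[+A, +Repr] {
  def elems: List[A]
  def map[B, That](f: A => B)(implicit bf: MiniCanBuildFrom[Repr, B, That]): That =
    bf.fromElems(elems.map(f))
}

// Plays the role of SetLike: a common supertype of every set, whatever its Repr.
abstract class MiniSetLike[+A, +Repr] extends MiniTraversableLike[A, Repr]

final class MiniSet[+A](val elems: List[A]) extends MiniSetLike[A, MiniSet[A]] {
  override def toString: String = elems.mkString("MiniSet(", ", ", ")")
}

final class MiniBitSet(val elems: List[Int]) extends MiniSetLike[Int, MiniBitSet] {
  override def toString: String = elems.mkString("MiniBitSet(", ", ", ")")
}

// Fallback: builds a MiniSet from any set-like source, for any element type.
// Contravariance in From lets MiniBitSet and MiniSet[X] both match.
trait LowPriorityMiniBuilders {
  implicit def setCbf[A]: MiniCanBuildFrom[MiniSetLike[Any, Any], A, MiniSet[A]] =
    new MiniCanBuildFrom[MiniSetLike[Any, Any], A, MiniSet[A]] {
      def fromElems(xs: List[A]): MiniSet[A] = new MiniSet(xs.distinct)
    }
}

// The companion of the type class is always part of the implicit scope.
// When both instances apply, the one defined here beats the inherited fallback.
object MiniCanBuildFrom extends LowPriorityMiniBuilders {
  implicit val bitSetCbf: MiniCanBuildFrom[MiniBitSet, Int, MiniBitSet] =
    new MiniCanBuildFrom[MiniBitSet, Int, MiniBitSet] {
      def fromElems(xs: List[Int]): MiniBitSet = new MiniBitSet(xs.distinct.sorted)
    }
}
```

Mapping a MiniBitSet with _ * 2 finds bitSetCbf, so That is inferred as MiniBitSet. Mapping it with _.toString rules bitSetCbf out (its element type is Int), leaving only the fallback, so That becomes MiniSet[String], which parallels the BitSet and Set behaviour described above.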