I am trying to write a function that converts between Maps with integer keys to the corresponding arrays. I have the base case done, but am trying to write the recursive case (i.e. multi-dimensional arrays: converting Map[Int, Map[Int, X]] to Array[Array[X]]).
This task arose out of the need to construct an array from a stream without knowing how large the array will be in advance, allowing for the possibility of elements to come off the stream in random order and the possibility of duplicate elements coming off the stream.
I have a function that does it:
def toArrayHard[X:ClassManifest](x:scala.collection.Map[Int, X]):Array[X] =
{
if (x.size == 0) new Array(0)
else
{
val max:Int = 1 + x.keys.max
val a:Array[X] = new Array(max)
var i = 0
while (i < max)
{
a(i) = x(i)
i += 1
}
a
}
}
Note, I’m aware that the code fails if the map contains key k but doesn’t contain key i where 0 <= i < k. This is okay for my purposes.
Now I’m looking to do the same for arbitrarily deep multi-dimensional arrays.
For example, converting between Map[Int, Map[Int, X]] to Array[Array[X]]. Unfortunately, I am getting tripped up by the types. Using the above as a base case, here’s what I have so far:
def toArrayHardRec[X:ClassManifest](x:scala.collection.Map[Int, X]):Array[X] =
{
import scala.collection.Map
if (x.size == 0) new Array(0)
else
{
x match
{
case t:Map[Int, Map[Int, Y]] forSome { type Y } =>
{
val f0 = t.mapValues{m => toArrayHardRec[Map[Int, Y]](m)}
toArrayHard(f0)
}
case _ => toArrayHard(x)
}
}
}
This is the error that I get:
‘=>’ expected but ‘forSome’ found.
Since this is an educational pursuit, any feedback is greatly appreciated. Specifically, I would appreciate any code critiques of my awfully java-looking code, existing scala functions that do the same thing, or suggestions for an alternative way to build these arrays.
This is meaningless:
because of erasure, all you can check for is this:
which, in fact, is the only way you won’t get warnings about erasure. Also, an existential type is almost always unnecessary, and, personally, I always find its syntax a bit tricky to get right. Still, this is the one case where using
_to signify existential type is adequate. Note, however, that in my own code (the first version) I can’t use it, because the types won’t match if I do.Next,
That’s not a particular good idea. You must know what type you’ll return to declare it — if the deepness is “arbitrary”, then so is the type. You can get away with a
Array[AnyRef], but it will be painful to use.Still, here is one way. I did away with all the loops, replacing them with map. Note where I take advantage of the fact that a
Mapis also aFunction1, by callingmap m. Note also that I just usemapover an array (created withtoArray) to avoid having to manage map creation.There are two problems. To begin with, if the keys are not contiguous (
Map(0 -> "a", 2 -> "b"), for example), the elements will be out of position. Here’s a way around it, replacing the next to last line of code with:Here I used an empty string to stand for any hole in the arrays.
The other problem is that I assume all maps will be of type
Map[Int, AnyRef]. To fix this, the could would have to be rewritten like this:Here I discard all pairs whose keys aren’t
Intand values aren’tAnyRef. One could further safe-check the code to test if something’s missing and report and error in that case:Finally, about performance. Using
mapthe way I did means a new array will be allocated for eachmap. Some people take that to mean I have to iterate three times instead of one, but since each time I do a different task, it isn’t really doing much more work. However, the fact that I allocate a new array each time definitely has performance impact — both because of the allocation penalty (Java pre-initializes all arrays), and because of cache locality concerns.One way to avoid that is using a
view. When you domap,flatMapandfilterover aview, you get a new view back, with that operation stored for future use (other methods might work that way too — check the documentation, or test when in doubt). When you finally do anapplyon theviewobject, it will apply all the operations necessary to get the specific element you have asked for. It will do so every time youapplyfor that element too, so performance can be both better or worse.Here I’ll start with a
Rangeview, to avoid array allocation, and then convert the view to anArrayat the end. Still,keyswill create a set, imposing some overhead. After this I’ll explain how to avoid that.You should leave
viewto necessary optimizations, however, instead of pro-actively using them. It isn’t necessarily faster than normal collections, and its non-strictness can be tricky. Another alternative to usingviewis using aStreaminstead. AStreamis much like aList, except it only computes it’s elements on demand. That means none of themapoperations will be applied until necessary. To use it, just replace the next to last line to this:The main advantage of a
Streamover aviewis that once a value in aStreamhas been computed, it stays computed. That is also it’s main disadvantage over aview, ironically enough. In this particular case, I thinkviewis faster.Finally, about doing a
maxwithout computing aSet(the result ofkeys) first. AMapis also anIterable, and allIterablehave amaxmethod, so you can simply dom.max. That, however, will compute the max ofTuple2[Int, AnyRef], a type for which there is noOrdering. However,maxreceives an implicit parameter telling it whatOrderingto use, which can be thus provided:This is a bit of a mouthful, so we can define the implicit we need and use it, err, implicitly. 🙂
Note that
maxreturns a tuple,keyandvalue, so we need_1to get thekey. Also, note that we create anOrderingobject at each nesting ofdeMap. Not bad, but it could be made better by defining it elsewhere.Once you have done all that, a
whileloop will still be faster, though I have no idea by how much. Given enough JVM optimizations, they might come pretty close. On the other hand, if you really want speed, you could go all the way to assembler code. It comes down to a balance between how easy and fast it is to write the code, how easy it is to maintain it, and how fast it runs.