Looking for the proper data type (such as IndexedSeq[Double]) to use when designing a domain-specific numerical computing library. For this question, I’m limiting scope to working with 1-Dimensional arrays of Double. The library will define a number functions that are typically applied for each element in the 1D array.
Considerations:
- Prefer immutable data types, such as
VectororIndexedSeq - Want to minimize data conversions
- Reasonably efficient in space and time
- Friendly for other people using the library
- Elegant and clean API
Should I use something higher up the collections hierarchy, such as Seq?
Or is it better to just define the single-element functions and leave the mapping/iterating to the end user?
This seems less efficient (since some computations could be done once per set of calls), but at at the same time a more flexible API, since it would work with any type of collection.
Any recommendations?
If your computations are to do anything remotely computationally intensive, use
Array, either raw or wrapped in your own classes. You can provide a collection-compatible wrapper, but make that an explicit wrapper for interoperability only. Everything other thanArrayis generic and thus boxed and thus comparatively slow and bulky.If you do not use
Array, people will be forced to abandon whatever things you have and just useArrayinstead when performance matters. Maybe that’s okay; maybe you want the computations to be there for convenience not efficiency. In that case, I suggest usingIndexedSeqfor the interface, assuming that you want to let people know that indexing is not outrageously slow (e.g. is notList), and useVectorunder the hood. You will use about 4x more memory thanArray[Double], and be 3-10x slower for most low-effort operations (e.g. multiplication).For example, this:
is about three times slower than this:
If you use the
mapmethod onArray, it’s just as slow as theVector[Double]way; operations onArrayare generic and hence boxed. (And that’s where the majority of the penalty comes from.)