The fact that Haskell’s default String implementation is not efficient both in terms of speed and memory is well known. As far as I know the [] lists in general are implemented in Haskell as singly-linked lists and for most small/simple data types (e.g. Int) it doesn’t seem like a very good idea, but for String it seems like total overkill. Some of the opinions on this matter include:
On simple benchmarks like this, even programs written in interpreted languages such as Python can outperform Haskell code that uses String by an order of magnitude.
Efficient String Implementation in Haskell
Since a String is just [Char], that is a linked list of Char, it means Strings have poor locality of reference, and again means that Strings are fairly large in memory, at a minimum it’s N * (21bits + Mbits) where N is the length of the string and M is the size of a pointer (…). Strings are much less likely to be able to be optimized to loops, etc. by the compiler.
I know that Haskell has ByteStrings (and Arrays) in several nice flavors and that they can do the job nicely, but I would expect the default implementation to be the most efficient one.
TL;DR: Why is Haskell’s default String implementation a singly-linked list even though it is terribly inefficient and rarely used for real world applications (except for the really simple ones)? Are there historical reasons? Is it easier to implement?
Because singly-linked lists support:
and so
Stringas[Char](unicode points) means a string type that fits the language goals (as of 1990), and essentially come “for free” with the list library.In summary, historically the language designers were interested more in well-designed core data types, than the modern problems of text processing, so we have an elegant, easy to understand, easy to teach
Stringtype, that isn’t quite a unicode text chunk, and isn’t a dense, packed, strict data type.