I’m curious how I should go about improving the performance of a Haskell routine that finds the lexicographically minimal cyclic rotation of a string.
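```haskell
import Data.List

-- Rotate a list left by n positions: split at n and swap the two halves.
swapAt n = f . splitAt n where f (a,b) = b++a

-- Try only the rotations that start at an occurrence of the smallest element.
minimumrotation x = minimum $ map (\i -> swapAt i x) $ elemIndices (minimum x) x
```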
I’d imagine that I should use Data.Vector rather than lists because Data.Vector provides in-place operations, probably just manipulating some indices into the original data. I shouldn’t actually need to bother tracking the indices myself to avoid excess copying, right?
I’m curious how the `++` impacts the optimization, though. I’d imagine it produces a lazy string thunk that never does the appending until the string gets read that far. Ergo, the `a` should never actually be appended onto the `b` whenever `minimum` can eliminate that string early, for example because it begins with some very late letter. Is this correct?
`xs ++ ys` adds some overhead in all the list cells from `xs`, but once it reaches the end of `xs` it’s free: it just returns `ys`.
Looking at the definition of `(++)` helps to see why; i.e., it has to “re-build” the entire first list as the result is traversed. This article is very helpful for understanding how to reason about lazy code in this way.
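For reference, the Prelude's list append is defined essentially as in the sketch below. It is copied here under the hypothetical name `append` so that it does not clash with the real `(++)`; `firstOfAppend` and `earlyExit` are likewise made-up names, used only to show that the second argument is never touched when the consumer stops early.

```haskell
-- A local copy of the Prelude's list append, renamed (hypothetically)
-- to `append` to avoid clashing with the real (++).
append :: [a] -> [a] -> [a]
append []     ys = ys                 -- end of the first list: just return ys
append (x:xs) ys = x : append xs ys   -- rebuild one cell of the first list

-- Only the first cell of the result is demanded, so the second
-- argument (here `undefined`) is never forced.
firstOfAppend :: Char
firstOfAppend = head ("zeta" `append` undefined)

-- List comparison is decided at the first differing element, so the
-- appended tail is never forced here either.
earlyExit :: Ordering
earlyExit = compare ("zeta" `append` undefined) "alpha"
```

If either value forced the second argument, evaluating it would throw; instead both return normally, which is the behaviour the question was hoping for.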
The key thing to realise is that appending isn’t done all at once; a new linked list is incrementally built by first walking through all of `xs`, and then putting `ys` where the `[]` would go.
So, you don’t have to worry about reaching the end of `b` and suddenly incurring the one-time cost of “appending” `a` to it; the cost is spread out over all the elements of `b`.
Vectors are a different matter entirely; they’re strict in their structure, so even examining just the first element of `xs V.++ ys` incurs the entire overhead of allocating a new vector and copying `xs` and `ys` to it, just like in a strict language. The same applies to mutable vectors (except that the cost is incurred when you perform the operation, rather than when you force the resulting vector), although I think you’d have to write your own append operation with those anyway. You could represent a bunch of appended (immutable) vectors as `[Vector a]` or similar if this is a problem for you, but that just moves the overhead to when you flatten them back into a single `Vector`, and it sounds like you’re more interested in mutable vectors.
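If the copying done by `V.++` is the concern, one option is to compare rotations through index arithmetic and build the winning rotation only once at the end. The following is a minimal sketch, assuming the `vector` package is available; `compareRotations`, `minRotationIndex` and `minimumRotationV` are hypothetical names introduced here, not part of any library. It keeps the same candidate-filtering idea as the list version and can still be quadratic in the worst case.

```haskell
import qualified Data.Vector as V

-- Hypothetical helper: compare the rotations starting at indices i and j
-- using index arithmetic only, so neither rotation is ever materialised.
compareRotations :: Ord a => V.Vector a -> Int -> Int -> Ordering
compareRotations v i j = go 0
  where
    n = V.length v
    go k
      | k >= n    = EQ
      | otherwise =
          case compare (v V.! ((i + k) `mod` n)) (v V.! ((j + k) `mod` n)) of
            EQ -> go (k + 1)
            r  -> r

-- Start index of a lexicographically minimal rotation. Only positions
-- holding the smallest element are considered as candidates.
minRotationIndex :: Ord a => V.Vector a -> Int
minRotationIndex v
  | V.null v  = 0
  | otherwise = V.foldl1' pick (V.elemIndices (V.minimum v) v)
  where
    pick best i = if compareRotations v i best == LT then i else best

-- Build the rotation once. V.slice shares storage with v without copying;
-- the single V.++ at the end is the only copy.
minimumRotationV :: Ord a => V.Vector a -> V.Vector a
minimumRotationV v = V.slice i (V.length v - i) v V.++ V.slice 0 i v
  where
    i = minRotationIndex v
```

For example, `V.toList (minimumRotationV (V.fromList "banana"))` evaluates to `"abanan"`.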