I’m trying to learn Haskell and I was trying to create a function that takes a list of lists and groups the sublists by equal sums. This is not homework.
import Data.List
let x = [[1,2],[2,1],[5,0],[0,3],[1,9]]
let groups = groupBy (\i j -> sum i == sum j) x
I get this output in GHCi:
[[[1,2],[2,1]],[[5,0]],[[0,3]],[[1,9]]]
I get [[1,2],[2,1]] grouping together, but not with [0,3]. Why is this?
I suspect I need to use map, but I can’t seem to make it work.
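The groupBy function only merges adjacent elements that pass the test, so it preserves the input order and is thus invertible: concatenating its result gives back the original list. That is why [0,3] ends up on its own: [5,0] sits between it and [2,1]. If you’re willing to throw away the ordering information, you could instead collect the lists into buckets keyed by their sums, using a Map from Data.Map.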
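Here is a minimal sketch of such a bucketing function, pieced together from the parts discussed below (bucketBy, its helper go, and the use of elems and foldl'). The answer’s version used Data.Map’s insertWith', which is deprecated and no longer exported by recent releases of the containers package, so this sketch assumes Data.Map.Strict.insertWith in its place. It also checks the invertibility of groupBy mentioned above.

```haskell
import Data.List (foldl', groupBy)
import qualified Data.Map.Strict as Map

-- Put each element into a bucket keyed by eq applied to it,
-- then return the buckets in ascending key order.
bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
bucketBy eq = Map.elems . foldl' go Map.empty
  where
    -- Prepend [l] to the bucket for its key, creating the bucket if needed.
    go m l = Map.insertWith (++) (eq l) [l] m

x :: [[Integer]]
x = [[1,2],[2,1],[5,0],[0,3],[1,9]]

main :: IO ()
main = do
  -- groupBy only merges adjacent runs, so concat undoes it.
  print (concat (groupBy (\i j -> sum i == sum j) x) == x)  -- True
  -- bucketBy collects every list with the same sum, wherever it appears.
  print (bucketBy sum x)  -- [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]]
```

Within each bucket the lists come out in reverse order of appearance, because insertWith (++) puts the new [l] in front of the existing bucket.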
How it works
The application of elems from Data.Map gives a clue for what’s happening.
Mapping
A Map κ α associates values of some key type κ with values of another, possibly distinct, type α. In the example from your question, you start with x, whose type is [[Integer]]. That is, x is a list of integer lists. The type of the resulting partition of x you want is Num τ => [[[τ]]], or a list of groups where each group is a list of integer lists that all have the same sum. The Num τ => bit is a context that constrains the type τ to be an instance of the typeclass Num. Happily for us, Integer is such a type, so we know that the type of the partition is [[[Integer]]]. This typeclass nonsense may seem unnecessarily fussy, but we’ll need the concept again in just a moment. (To give you an idea of what’s going on, the typechecker doesn’t have enough information to decide whether the literal 0, for example, is of type Int or Integer.)
Each sublist contains lists with the same sum. In other words, there exists a mapping from a sum to a list of integer lists. Therefore, the type of the Map used in bucketBy must resemble Map Integer [[Integer]]. For example, with the sum 3 we associate the list containing [1,2], [2,1], and [0,3].
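To make that type concrete, here is a hand-built Map of this shape for the lists in your question. This is only an illustration: the name sumBuckets is hypothetical, and the order inside each bucket is just one possibility.

```haskell
import qualified Data.Map.Strict as Map

-- Keys are sums; values are the lists that have that sum.
sumBuckets :: Map.Map Integer [[Integer]]
sumBuckets = Map.fromList
  [ (3,  [[1,2],[2,1],[0,3]])
  , (5,  [[5,0]])
  , (10, [[1,9]])
  ]

main :: IO ()
main = do
  print (Map.lookup 3 sumBuckets)  -- Just [[1,2],[2,1],[0,3]]
  -- elems returns the values in ascending key order, giving a [[[Integer]]].
  print (Map.elems sumBuckets)     -- [[[1,2],[2,1],[0,3]],[[5,0]],[[1,9]]]
```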
The fold recursion pattern
Folding is a highly general pattern. The left fold, foldl, and its friends in Haskell let you “insert” an operator between the elements of a list, beginning with the zero value at the left end of the list. For example, the sum of [5,3,9,1] expressed as a left fold is foldl (+) 0 [5,3,9,1], or ((((0 + 5) + 3) + 9) + 1).
That is, beginning with a base value of zero, we successively add elements of the list and accumulate the sum.
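A small sketch of the sum-as-fold example, with the names sumFold and sumExplicit made up for illustration. The standard scanl shows the running sum that the fold accumulates at each step.

```haskell
-- Left fold: start from 0 and add each element in turn, left to right.
sumFold :: Integer
sumFold = foldl (+) 0 [5,3,9,1]

-- The same computation with the parentheses written out.
sumExplicit :: Integer
sumExplicit = (((0 + 5) + 3) + 9) + 1

main :: IO ()
main = do
  print (sumFold, sumExplicit)      -- (18,18)
  -- scanl keeps every intermediate accumulator value.
  print (scanl (+) 0 [5,3,9,1])     -- [0,5,8,17,18]
```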
Recall that the definition of bucketBy contains the fold foldl' go empty, whose result is handed to elems. Because elems extracts the values of a Map and we want a result of type [[[Integer]]], the result of the left fold must be of type Map Integer [[Integer]], the zero value for our fold is the empty Map of that type, and go somehow adds each successive value of the list into the map.
Note that foldl' is the strict cousin of foldl, but strictness is beyond the scope of this answer. (See also “Stack overflow” on HaskellWiki.)
Dude, where’s my list?
Given the type of foldl', which specialized to lists is (b -> a -> b) -> b -> [a] -> b, we should have three arguments in the application, but only two are present in the definition of bucketBy. This is because the code is written in point-free style: your list is there implicitly, due to partial application of foldl'.
Think back to the sum-as-fold example above. Without the final argument, foldl (+) 0 has type Num a => [a] -> a (recent versions of GHC generalize the list to any Foldable container).
Partial application allows us to create new functions. Here we defined a function that computes a number from some list of numbers. Hmm, sounds familiar.
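Here is that partial application as a named definition; sumList is a hypothetical name. In the same way, foldl' go empty in bucketBy is a function still waiting for its list.

```haskell
-- Partially applying foldl to an operator and a zero value leaves a
-- function that still expects its list argument.
sumList :: [Integer] -> Integer
sumList = foldl (+) 0

main :: IO ()
main = do
  print (sumList [5,3,9,1])  -- 18
  print (sumList [1,9])      -- 10
```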
The . combinator expresses function composition. Its name is chosen to resemble the notation g∘f commonly seen in mathematics textbooks to mean “do f first and then compute g from the result.” This is exactly what’s happening in the definition of bucketBy: fold the list of values into a Map and then get the values out of the Map.
If ya gotta go, go with a smile
In your comment, you asked about the purpose of m. With an explicit type annotation, we might define go as go :: Map Integer [[Integer]] -> [Integer] -> Map Integer [[Integer]] with the equation go m l = insertWith' (++) (eq l) [l] m. Matching variables with types, m is the Map we’ve accumulated so far, and l is the next Integer list that we want to toss into the appropriate bucket. Recall that eq is an argument to the outer bucketBy.
We can control how a new item goes into the map using insertWith'. (By convention, functions whose names end with a prime are strict variants.) The (++) combinator appends lists. The application eq l determines the appropriate bucket for l.
Had we written l rather than [l], each bucket would be a flat list of numbers and the result would want to be of type [[Integer]], but then we lose the structure of the innermost lists.
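A sketch of a single fold step, with eq fixed to sum so it can stand on its own. The name goSum is hypothetical, and Data.Map.Strict.insertWith is assumed in place of insertWith', which recent containers releases no longer provide.

```haskell
import qualified Data.Map.Strict as Map

-- One fold step: put l into the bucket keyed by its sum.
-- Wrapping l as [l] makes the new value a [[Integer]], the same type as
-- the existing bucket, so (++) can combine them.
goSum :: Map.Map Integer [[Integer]] -> [Integer] -> Map.Map Integer [[Integer]]
goSum m l = Map.insertWith (++) (sum l) [l] m

main :: IO ()
main = do
  let m0 = Map.fromList [(3, [[0,3],[2,1]])]
      m1 = goSum m0 [1,2]   -- sum is 3: joins the existing bucket
      m2 = goSum m1 [1,9]   -- sum is 10: starts a new bucket
  print (Map.toList m2)     -- [(3,[[1,2],[0,3],[2,1]]),(10,[[1,9]])]
```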
We’ve already constrained the type of bucketBy’s result to be [[[α]]], and thus the type of the Map’s elements. Say the next item l to fold is [1,2]. We want to append it, with (++), to some other list of type [[Integer]], but the types don’t match:
*Main> [[0,3],[2,1]] ++ [1,2]
<interactive>:1:21:
    No instance for (Num [t0]) arising from the literal `2'
    Possible fix: add an instance declaration for (Num [t0])
    In the expression: 2
    In the second argument of `(++)', namely `[1, 2]'
    In the expression: [[0, 3], [2, 1]] ++ [1, 2]
Wrapping l as [l] gives (++) two operands of the same type, [[Integer]], and the append typechecks.
Generalizing further
You might stop with an Integer-specific annotation such as bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]] and be perfectly happy because it handles the case from your question.
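A sketch of that Integer-specific version, assuming Data.Map.Strict.insertWith in place of insertWith'. It works for the lists in your question, but the annotation ties both the lists and the sums to Integer.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Only accepts lists of Integer lists and a criterion returning Integer.
bucketBy :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
bucketBy eq = Map.elems . foldl' go Map.empty
  where
    go m l = Map.insertWith (++) (eq l) [l] m

main :: IO ()
main = print (bucketBy sum [[1,2],[2,1],[5,0],[0,3],[1,9]])
-- [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]]
```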
Suppose down the road you have a different list y, defined like x but with type [[Int]]. Even though the definition is very nearly identical to x, bucketBy is of no use with y:
*Main> bucketBy sum y
<interactive>:1:15:
    Couldn't match expected type `Integer' with actual type `Int'
    Expected type: [[Integer]]
      Actual type: [[Int]]
    In the second argument of `bucketBy', namely `y'
    In the expression: bucketBy sum y
Let’s assume you can’t change the type of y for some reason. You might copy and paste to create another function, say bucketByInt, where the only change is replacing Integer with Int in the type annotations.
This would be highly, highly unsatisfying.
Maybe later you have some list of string lists that you want to bucket according to the length of the longest string in each. In this imaginary paradise you could write bucketBy (maximum . map length) [["a","bc"],["d"],["ef","g"],["hijk"]] and get your buckets.
What you want is entirely reasonable: bucket some list of things using the given criterion. But alas:
*Main> bucketBy (maximum . map length) [["a","bc"],["d"],["ef","g"],["hijk"]]
<interactive>:1:26:
    Couldn't match expected type `Integer' with actual type `[a0]'
    Expected type: Integer -> Integer
      Actual type: [a0] -> Int
    In the first argument of `map', namely `length'
    In the second argument of `(.)', namely `map length'
Again, you may be tempted to write bucketByString, but by this point, you’re ready to move away and become a shoe cobbler.
The typechecker is your friend. Go back to your definition of bucketBy that’s specific to Integer lists, simply comment out the type annotation, and ask GHCi for its type with :type bucketBy. Up to the names of the type variables, the inferred type is Ord b => (a -> b) -> [a] -> [[a]].
Now you can apply bucketBy for the different cases above and get the expected results. You were already in paradise but didn’t know it!
Now, in keeping with good style, you provide annotations for the top-level definition of bucketBy to help the poor reader, perhaps yourself. Note that you must provide the Ord constraint due to the use of insertWith', whose type is Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a.
You may want to be really explicit and give an annotation for the inner go, but this requires the scoped type variables extension. Without the extension and with a type annotation of go :: Map b [a] -> a -> Map b [a], the typechecker will fail with errors of the form
Could not deduce (b ~ b1)
  from the context (Ord b)
  bound by the type signature for bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
  at prog.hs:(10,1)-(12,46)
  `b' is a rigid type variable bound by
    the type signature for bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
    at prog.hs:10:1
  `b1' is a rigid type variable bound by
    the type signature for go :: Map b1 [a1] -> a1 -> Map b1 [a1]
    at prog.hs:12:9
In the return type of a call of `eq'
In the second argument of `insertWith'', namely `(eq l)'
In the expression: insertWith' (++) (eq l) [l] m
This is because the typechecker treats the b in the inner type annotation as a distinct and entirely unrelated type b1, even though a human reader plainly sees the intent that they be the same type.
Read the scoped type variables documentation for details.
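A sketch of the fully annotated version using the ScopedTypeVariables extension, assuming Data.Map.Strict.insertWith in place of insertWith'. The explicit forall is what brings a and b into scope for the annotation on go. The list y here is a hypothetical stand-in for the Int-typed list above, and the string example is the one from the error message.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- The explicit forall scopes a and b over the whole definition, so the
-- annotation on go refers to the same a and b as the outer signature.
bucketBy :: forall a b. Ord b => (a -> b) -> [a] -> [[a]]
bucketBy eq = Map.elems . foldl' go Map.empty
  where
    go :: Map.Map b [a] -> a -> Map.Map b [a]
    go m l = Map.insertWith (++) (eq l) [l] m

-- Hypothetical Int-typed list standing in for y.
y :: [[Int]]
y = [[1,2],[2,1],[5,0],[0,3],[1,9]]

main :: IO ()
main = do
  print (bucketBy sum y)
  -- [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]]
  print (bucketBy (maximum . map length) [["a","bc"],["d"],["ef","g"],["hijk"]])
  -- [[["d"]],[["ef","g"],["a","bc"]],[["hijk"]]]
```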
One last small surprise
You may wonder where the outer layer of brackets went. Notice that the type annotation generalized from
([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
to
Ord b => (a -> b) -> [a] -> [[a]]
Note that [Integer] is itself another type, represented here as a.
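To see that nothing was lost, you can instantiate the general signature back to the specific one. In this sketch bucketByInteger is a hypothetical name, and Data.Map.Strict.insertWith again stands in for insertWith'. Choosing a = [Integer] and b = Integer turns Ord b => (a -> b) -> [a] -> [[a]] into ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]].

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- The general version.
bucketBy :: Ord b => (a -> b) -> [a] -> [[a]]
bucketBy eq = Map.elems . foldl' go Map.empty
  where
    go m l = Map.insertWith (++) (eq l) [l] m

-- The same function at a = [Integer], b = Integer: the outer brackets
-- reappear because a itself is a list type.
bucketByInteger :: ([Integer] -> Integer) -> [[Integer]] -> [[[Integer]]]
bucketByInteger = bucketBy

main :: IO ()
main = print (bucketByInteger sum [[1,2],[2,1],[5,0],[0,3],[1,9]])
-- [[[0,3],[2,1],[1,2]],[[5,0]],[[1,9]]]
```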