The following (suboptimal) code generates all the subsets of size N of a given set.
This code works but, as I said, is far from optimal. Using an intermediate list to avoid the O(log(n)) cost of Set.insert doesn’t seem to help, because converting the list back to a Set afterwards is expensive.
Can anybody suggest how to optimize the code?
import qualified Data.Set as Set

subsetsOfSizeN :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeN n s
  | Set.size s < n || n < 0 = error "subsetOfSizeN: wrong parameters"
  | otherwise = doSubsetsOfSizeN n s
  where
    doSubsetsOfSizeN n s
      | n == 0 = Set.singleton Set.empty
      | Set.size s == n = Set.singleton s
      | otherwise =
          case Set.minView s of
            Nothing -> Set.empty
            Just (firstS, restS) ->
              let partialN n = doSubsetsOfSizeN n restS
              in Set.map (Set.insert firstS) (partialN (n-1)) `Set.union` partialN n

Doesn’t seem so terribly bad to me. The number of subsets of size k of a set of size n is n `choose` k, which grows rather fast for k ~ n/2. So creating all the subsets must scale badly.

Hmm, I found using lists to give better performance. Not asymptotically, I think, but by a non-negligible, more-or-less constant factor.
But first, there is an inefficiency in your code that is simple to repair:
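A minimal sketch of that repair, assuming the only change is replacing Set.map with Set.mapMonotonic in the recursive step (the name subsetsOfSizeNMono is made up for this illustration):

```haskell
import qualified Data.Set as Set

-- Hypothetical name; same algorithm as in the question, with Set.map
-- swapped for Set.mapMonotonic.
subsetsOfSizeNMono :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsOfSizeNMono n s
  | Set.size s < n || n < 0 = error "subsetsOfSizeNMono: wrong parameters"
  | otherwise = go n s
  where
    go k t
      | k == 0 = Set.singleton Set.empty
      | Set.size t == k = Set.singleton t
      | otherwise =
          case Set.minView t of
            Nothing -> Set.empty
            Just (firstS, restS) ->
              let partial j = go j restS
              -- firstS is smaller than every element of restS, so inserting
              -- it into each subset keeps the subsets in the same relative
              -- order; that is the precondition of mapMonotonic.
              in Set.mapMonotonic (Set.insert firstS) (partial (k - 1))
                   `Set.union` partial k
```

In GHCi, subsetsOfSizeNMono 2 (Set.fromList [1,2,3]) should print fromList [fromList [1,2],fromList [1,3],fromList [2,3]].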
Note that Set.map must rebuild a tree from scratch. But we know that firstS is always smaller than any element in any of the sets in partialN (n-1), so we can use Set.mapMonotonic, which can reuse the spine of the set.

And that principle is also what makes lists attractive: the subsets are generated in lexicographic order, so instead of Set.fromList we can use the more efficient Set.fromDistinctAscList. Transcribing the algorithm to lists yields a version which, in the few benchmarks I’ve run, is between 1.5 and 2× faster than the amended algorithm using Sets.

And that is in turn, in my criterion benchmarks, nearly twice as fast as dave4420’s.
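The list-based transcription described above might look like the following sketch. It is a reconstruction under assumptions, not the benchmarked code: the names subsetsViaLists and go are invented here, and the helper carries the remaining list length along so it never has to call length.

```haskell
import qualified Data.Set as Set

-- Hypothetical name; builds the subsets as ascending lists and converts
-- them to Sets only once, at the very end.
subsetsViaLists :: Ord a => Int -> Set.Set a -> Set.Set (Set.Set a)
subsetsViaLists n s
  | n < 0 || Set.size s < n = error "subsetsViaLists: wrong parameters"
  | otherwise =
      -- Each inner list is strictly ascending, and the outer list is in
      -- lexicographic order, which is the order of the Ord instance for
      -- Set, so fromDistinctAscList is safe at both levels.
      Set.fromDistinctAscList
        (map Set.fromDistinctAscList (go n (Set.size s) (Set.toAscList s)))
  where
    -- go k l xs: all k-element sublists of xs, where l is the length of xs.
    go 0 _ _ = [[]]
    go k l xs
      | k == l = [xs]
    go k l (x:xs) = map (x :) (go (k - 1) (l - 1) xs) ++ go k (l - 1) xs
    go _ _ [] = []  -- unreachable while k <= l holds; kept for totality
```

The sublists that contain x are emitted before those that do not, which is what keeps the output sorted without any comparison of whole subsets.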