I have a function frequencyBy which I would like to parallelize. Here follows a simple test case:

    import Control.Parallel.Strategies
    import Control.DeepSeq
    import System.Environment

    frequencyBy :: (a -> b -> Bool) -> [a] -> [b] -> [(a,Int)]
    frequencyBy f as bs = map
        (\a -> (a, foldr (\b -> if f a b then (+) 1 else id) 0 bs)) as

    main :: IO ()
    main = do
        x:xs <- getArgs
        let result = frequencyBy (==) [1::Int .. 10000] [1 .. (read x)] `using`
                       parList rdeepseq
        print $ product $ map snd $ result

I would like to run the map in frequencyBy in parallel. I’m trying to achieve this using parList rdeepseq (all the other stuff in main is just to ensure that not everything is optimized away). However, this doesn’t work: two threads do twice as much work as one thread does in the same time. I don’t understand what I’m doing wrong here.
It could be that the overhead is slowing things down, depending on how big x is; if the work you’re doing in each spark is comparable to the time it takes to spawn each spark (and of course there’s scheduling overhead, etc.), then you’ll run into problems.
You could try parListChunk, e.g. parListChunk 64 rdeepseq; you’ll have to experiment to figure out which chunk size to use. While your current strategy creates a spark for every element of the list, parListChunk creates a spark for each chunk of a certain size in the list, and applies the strategy you specify sequentially to each element of that chunk.

By the way, the foldr in frequencyBy is probably slowing things down due to excessive thunk creation; accumulating the count strictly instead should fix that.
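As a rough sketch of both suggestions, the following uses parListChunk with the example chunk size of 64 and replaces the foldr with a strict left fold (foldl'). The foldl' rewrite is only one assumed way to avoid the thunk build-up, and the names frequencyBy' and parFrequencyBy are made up for illustration; the rest follows your test case.

```haskell
import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)
import Data.List (foldl')
import System.Environment (getArgs)

-- Assumed rewrite: count matches with a strict left fold, so the running
-- total is evaluated at every step instead of piling up as thunks.
frequencyBy' :: (a -> b -> Bool) -> [a] -> [b] -> [(a, Int)]
frequencyBy' f as bs = map count as
  where
    count a = (a, foldl' (\n b -> if f a b then n + 1 else n) 0 bs)

-- Hypothetical wrapper: one spark per chunk of `chunkSize` elements;
-- inside a chunk, each element is forced sequentially with rdeepseq.
parFrequencyBy :: NFData a => Int -> (a -> b -> Bool) -> [a] -> [b] -> [(a, Int)]
parFrequencyBy chunkSize f as bs =
  frequencyBy' f as bs `using` parListChunk chunkSize rdeepseq

-- Build with: ghc -O2 -threaded -rtsopts Main.hs
-- Run with:   ./Main 10000 +RTS -N
main :: IO ()
main = do
  x : _ <- getArgs
  -- 64 is only the example chunk size; tune it by experiment.
  let result = parFrequencyBy 64 (==) [1 :: Int .. 10000] [1 .. read x]
  print $ product $ map snd result
```

Note that -threaded is needed at link time for +RTS -N to use more than one core, and -rtsopts lets you pass -N on the command line. Running with +RTS -N -s also prints spark statistics, which can help when comparing chunk sizes.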
Of course, as always, make sure you’re compiling with -O2 and running with +RTS -N.