I’m facing problems to make my code runs in parallel. It is a 3D Delaunay generator using a divide & conquer algorithm named DeWall.
The main function is:
deWall::[SimplexPointer] -> SetSimplexFace -> Box -> StateT DeWallSets IO ([Simplex], [Edge])
deWall p afl box = do
...
...
get >>= recursion box1 box2 p1 p2 sigma edges
...
...
It calls the “recursion” function that might call the dewall function back. And it is here where the parallization opportunity appears. The following code shows the sequential solution.
recursion::Box -> Box -> [SimplexPointer] -> [SimplexPointer] -> [Simplex] -> [Edge] -> DeWallSets -> StateT DeWallSets IO ([Simplex], [Edge])
recursion box1 box2 p1 p2 sigma edges deWallSet
| null afl1 && null afl2 = return (sigma, edges)
| (null) afl1 = do
(s, e) <- deWall p2 afl2 box2
return (s ++ sigma, e ++ edges)
| (null) afl2 = do
(s,e) <- deWall p1 afl1 box1
return (s ++ sigma, e ++ edges)
| otherwise = do
x <- get
liftIO $ do
(s1, e1) <- evalStateT (deWall p1 afl1 box1) x
(s2, e2) <- evalStateT (deWall p2 afl2 box2) x
return (s1 ++ s2 ++ sigma, e1 ++ e2 ++ edges)
where afl1 = aflBox1 deWallSet
afl2 = aflBox2 deWallSet
State and IO monads are used to pipe the state and to generate UID for each tetrahedron found using MVar’s. My first attempt was to add a forkIO but it doesn’t work. It gives a wrong output due a lack of control during the merge part that doesn’t wait for both threads to finish. I don’t know how to make it wait for them.
liftIO $ do
let
s1 = evalStateT (deWall p1 afl1 box1) x
s2 = evalStateT (deWall p2 afl2 box2) x
concatThread var (a1, b1) = takeMVar var >>= \(a2, b2) -> putMVar var (a1 ++ a2, b1 ++ b2)
mv <- newMVar ([],[])
forkIO (s1 >>= concatThread mv)
forkIO (s2 >>= concatThread mv)
takeMVar mv >>= \(s, e) -> return (s ++ sigma, e ++ edges)
So, my next attempt was to use a better parallel strategy “par” and “pseq” which gives the right result but no parallel execution according to threadScope.
liftIO $ do
let
s1 = evalStateT (deWall p1 afl1 box1) x
s2 = evalStateT (deWall p2 afl2 box2) x
conc = liftM2 (\(a1, b1) (a2, b2) -> (a1 ++ a2, b1 ++ b2))
(stotal, etotal) = s1 `par` (s2 `pseq` (s1 `conc` s2))
return (stotal ++ sigma, etotal ++ edges)
What am I doing wrong?
UPDATE: Somehow this problem seems to be related with the presence of IO monads. In an other (old) version with no IO monad, only State monad, the parallel execution runs with 'par' and 'pseq'. The GHC -sstderr gives SPARKS: 1160 (69 converted, 1069 pruned).
recursion::Box -> Box -> [SimplexPointer] -> [SimplexPointer] -> [Simplex] -> [Edge] -> DeWallSets -> State DeWallSets ([Simplex], [Edge])
recursion p1 p2 sigma deWallSet
| null afl1 && null afl2 = return sigma
| (null) afl1 = do
s <- deWall p2 afl2 box2
return (s ++ sigma)
| (null) afl2 = do
s <- deWall p1 afl1 box1
return (s ++ sigma)
| otherwise = do
x <- get
let s1 = evalState (deWall p1 afl1 box1) x
let s2 = evalState (deWall p2 afl2 box2) x
return $ s1 `par` (s2 `pseq` (s1 ++ s2 ++ sigma))
where afl1 = aflBox1 deWallSet
afl2 = aflBox2 deWallSet
Cloud someone explain that?
The easiest way to make this work would be use something like:
This works, but there’s some unnecessary overhead. A better solution is:
Or possible this if the results aren’t being evaluated where you’d like them to be:
(these are answers that I gave to the poster in #haskell that I thought would be useful here as well)
Edit: removed unnecessary evaluate.