The Archive Base

Editorial Team
Asked: June 11, 2026


I have written the following program using Parallel Haskell to find the divisors of 1 billion.

import Control.Parallel

parfindDivisors :: Integer -> [Integer]
-- spark f1 and f2, then return their concatenation
parfindDivisors n = f1 `par` (f2 `par` (f1 ++ f2))
              where f1 = filter g [1..(quot n 4)]              -- divisors in [1, n/4]
                    f2 = filter g [(quot n 4)+1..(quot n 2)]   -- divisors in (n/4, n/2]
                    g z = n `rem` z == 0

main :: IO ()
main = print (parfindDivisors 1000000000)

I’ve compiled the program with ghc -rtsopts -threaded findDivisors.hs and I run it with:
findDivisors.exe +RTS -s -N2 -RTS

I got a 50% speedup compared to the simple sequential version:

findDivisors :: Integer->[Integer]
findDivisors n = filter g [1..(quot n 2)] 
      where  g z = n `rem` z == 0

My processor is an Intel Core 2 Duo (dual-core).
I was wondering whether the code above can be improved, because the statistics the program prints say:
Parallel GC work balance: 1.01 (16940708 / 16772868, ideal 2)
and SPARKS: 2 (1 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)
What do converted, overflowed, dud, GC'd and fizzled mean, and how can they help me improve the running time?


1 Answer

    Editorial Team
    Added an answer on June 11, 2026 at 5:35 pm

    IMO, the Par monad helps with reasoning about parallelism. It's a little higher-level than working with par and pseq directly.
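For comparison, here is a sketch of the question's two-way split written directly with par and pseq, assuming the parallel and deepseq packages are installed; the names parFindDivisors, lo and hi are illustrative. par only evaluates its argument to weak head normal form, so in the question's version the f1 spark just has to find the first divisor (1), and the main thread, which demands f1 through f1 ++ f2, most likely gets there first; that would explain the one fizzled spark. For n = 10^9 the first divisor above n/4 is n/2 itself, so the f2 spark happens to do essentially all of f2's work, which may be where the observed speedup comes from. Wrapping each half in force makes whoever evaluates it compute the whole list, and pseq keeps the main thread busy with the other half in the meantime.

```haskell
import Control.DeepSeq (force)
import Control.Parallel (par, pseq)

-- The question's two-way split, but with each half forced to normal form,
-- so whoever evaluates lo or hi computes the entire filtered list.
parFindDivisors :: Integer -> [Integer]
parFindDivisors n = lo `par` (hi `pseq` (lo ++ hi))
  where
    q   = n `quot` 4
    g z = n `rem` z == 0
    lo  = force (filter g [1 .. q])               -- sparked: divisors in [1, n/4]
    hi  = force (filter g [q + 1 .. n `quot` 2])  -- main thread: divisors in (n/4, n/2]

main :: IO ()
main = print (parFindDivisors 1000000000)
```

Because lo also appears in the result, the spark stays reachable and should not be discarded by the garbage collector. Compile with -threaded and run with +RTS -N2, as in the question.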

    Here’s a rewrite of parfindDivisors using the Par monad. Note that this is essentially the same as your algorithm:

    import Control.Monad.Par
    
    findDivisors :: Integer -> [Integer]
    findDivisors n = runPar $ do
        f0 <- new
        f1 <- new
        fork $ put f0 (filter g [1..(quot n 4)])              -- divisors in [1, n/4]
        fork $ put f1 (filter g [(quot n 4)+1..(quot n 2)])   -- divisors in (n/4, n/2]
        f0' <- get f0
        f1' <- get f1
        return $ f0' ++ f1'
      where g z = n `rem` z == 0
    
    main :: IO ()
    main = print (findDivisors 1000000000)
    

    Compiling that with -O2 -threaded -rtsopts -eventlog and running with +RTS -N2 -s yields the following relevant runtime stats:

      36,000,130,784 bytes allocated in the heap
           3,165,440 bytes copied during GC
              48,464 bytes maximum residency (1 sample(s))
    
                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0     35162 colls, 35161 par    0.39s    0.32s     0.0000s    0.0006s
      Gen  1         1 colls,     1 par    0.00s    0.00s     0.0002s    0.0002s
    
      Parallel GC work balance: 1.32 (205296 / 155521, ideal 2)
    
      MUT     time   42.68s  ( 21.48s elapsed)
      GC      time    0.39s  (  0.32s elapsed)
      Total   time   43.07s  ( 21.80s elapsed)
    
      Alloc rate    843,407,880 bytes per MUT second
    
      Productivity  99.1% of total user, 195.8% of total elapsed
    

    The productivity is very high. To improve the GC work balance slightly, we can increase the size of the GC allocation area; for example, run with +RTS -N2 -s -A128M:

      36,000,131,336 bytes allocated in the heap
              47,088 bytes copied during GC
              49,808 bytes maximum residency (1 sample(s))
    
                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0       135 colls,   134 par    0.19s    0.10s     0.0007s    0.0009s
      Gen  1         1 colls,     1 par    0.00s    0.00s     0.0010s    0.0010s
    
      Parallel GC work balance: 1.62 (2918 / 1801, ideal 2)
    
      MUT     time   42.65s  ( 21.49s elapsed)
      GC      time    0.20s  (  0.10s elapsed)
      Total   time   42.85s  ( 21.59s elapsed)
    
      Alloc rate    843,925,806 bytes per MUT second
    
      Productivity  99.5% of total user, 197.5% of total elapsed
    

    But this is really just nitpicking. The real story comes from ThreadScope:

    [ThreadScope screenshot: lots of utilisation]

    Utilisation is essentially maxed out on both cores, so significant further speedup on a two-core machine is unlikely.

    Some good notes on the Par monad are here.
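The new/fork/put pattern in the Par version above can also be written with spawnP, which, as far as I know, is exported by Control.Monad.Par in the monad-par package: it forks a task that fully evaluates its argument and returns an IVar holding the result. A sketch under that assumption; the names lo and hi are illustrative.

```haskell
import Control.Monad.Par (runPar, spawnP, get)

-- Same two-way split of [1 .. n/2] as the Par version above, written with
-- spawnP: each call forks a task that fully evaluates its argument.
findDivisors :: Integer -> [Integer]
findDivisors n = runPar $ do
    lo  <- spawnP (filter g [1 .. q])
    hi  <- spawnP (filter g [q + 1 .. n `quot` 2])
    lo' <- get lo
    hi' <- get hi
    return (lo' ++ hi')
  where
    q   = n `quot` 4
    g z = n `rem` z == 0

main :: IO ()
main = print (findDivisors 1000000000)
```

The result should be the same list as before; the difference is only that the IVars are created and filled by spawnP instead of by hand.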

    UPDATE

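    A rewrite of the alternative algorithm (trial division only up to the square root of n, recovering the remaining divisors from the quotients) using Par looks something like this: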

    import Control.Monad.Par
    
    findDivisors :: Integer -> [Integer]
    findDivisors n = let sqrtn = floor (sqrt (fromInteger n)) in runPar $ do
        a <- new
        b <- new
        -- large divisors: the quotient n `quot` x for every x <= sqrtn that divides n
        fork $ put a [q | (q, r) <- [quotRem n x | x <- [1..sqrtn]], r == 0]
        firstDivs  <- get a
        -- matching small divisors; x == sqrtn only when n is a perfect square
        fork $ put b [n `quot` x | x <- firstDivs, x /= sqrtn]
        secondDivs <- get b
        return $ firstDivs ++ secondDivs
    

    But you're right that this won't gain anything from parallelism, because the second list can only be computed once firstDivs is available.
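One way around that dependency is to produce each divisor x <= sqrt n together with its partner n `quot` x, so the trial-division range itself can be split into independent slices. Below is a sketch using spawnP from monad-par; the chunk count k and the helpers ranges, slice and pairOf are illustrative assumptions, not part of the original. Like the alternative algorithm above, it includes n itself. For n = 10^9 there are only about 31,623 candidates, so the work per task is small and wall-clock gains may still be hard to see.

```haskell
import Control.Monad.Par (runPar, spawnP, get)
import Data.List (sort)

-- Split the candidates [1 .. r] into k contiguous, roughly equal ranges.
ranges :: Int -> Integer -> [(Integer, Integer)]
ranges k r = [ (start i, start (i + 1) - 1) | i <- [0 .. k' - 1] ]
  where
    k'      = toInteger (max 1 k)
    start i = 1 + (i * r) `quot` k'

-- Each task scans its own slice and emits both members of every divisor
-- pair, so no task has to wait for another task's result.
parDivisors :: Int -> Integer -> [Integer]
parDivisors k n = sort . concat $ runPar $ do
    ivars <- mapM (spawnP . slice) (ranges k r)
    mapM get ivars
  where
    -- floor of sqrt n; adequate for n of the size used here
    r = floor (sqrt (fromInteger n :: Double)) :: Integer
    slice (a, b) = concat [ pairOf x | x <- [a .. b], n `rem` x == 0 ]
    pairOf x
      | x * x == n = [x]           -- perfect square: list the root once
      | otherwise  = [x, n `quot` x]

main :: IO ()
main = print (parDivisors 2 1000000000)
```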

    You can still introduce some parallelism here by using Strategies to evaluate the elements of the list comprehensions in parallel. Something like:

    import Control.Monad.Par
    import Control.Parallel.Strategies (using, parListChunk, rdeepseq)
    
    findDivisors :: Integer -> [Integer]
    findDivisors n = let sqrtn = floor (sqrt (fromInteger n)) in runPar $ do
        a <- new
        b <- new
        -- each list is evaluated in parallel, two elements per spark
        fork $ put a
            ([q | (q, r) <- [quotRem n x | x <- [1..sqrtn]], r == 0] `using` parListChunk 2 rdeepseq)
        firstDivs  <- get a
        fork $ put b
            ([n `quot` x | x <- firstDivs, x /= sqrtn] `using` parListChunk 2 rdeepseq)
        secondDivs <- get b
        return $ firstDivs ++ secondDivs
    

    Running this gives stats like:

           3,388,800 bytes allocated in the heap
              43,656 bytes copied during GC
              68,032 bytes maximum residency (1 sample(s))
    
                                        Tot time (elapsed)  Avg pause  Max pause
      Gen  0         5 colls,     4 par    0.00s    0.00s     0.0000s    0.0001s
      Gen  1         1 colls,     1 par    0.00s    0.00s     0.0002s    0.0002s
    
      Parallel GC work balance: 1.22 (2800 / 2290, ideal 2)
    
                            MUT time (elapsed)       GC time  (elapsed)
      Task  0 (worker) :    0.01s    (  0.01s)       0.00s    (  0.00s)
      Task  1 (worker) :    0.01s    (  0.01s)       0.00s    (  0.00s)
      Task  2 (bound)  :    0.01s    (  0.01s)       0.00s    (  0.00s)
      Task  3 (worker) :    0.01s    (  0.01s)       0.00s    (  0.00s)
    
      SPARKS: 50 (49 converted, 0 overflowed, 0 dud, 0 GC'd, 1 fizzled)
    
      MUT     time    0.01s  (  0.00s elapsed)
      GC      time    0.00s  (  0.00s elapsed)
      Total   time    0.01s  (  0.01s elapsed)
    
      Alloc rate    501,672,834 bytes per MUT second
    
      Productivity  85.0% of total user, 95.2% of total elapsed
    

    Here 49 of the 50 sparks were converted, meaning they were actually run in parallel, but the computations were too small to show any wall-clock gain. Any gain was probably offset by the overhead of scheduling work in the threaded runtime.
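On the SPARKS counters asked about in the question: converted sparks were actually run in parallel; overflowed sparks were dropped because the spark pool was full; dud sparks were created for expressions that were already evaluated when par was called; GC'd sparks were discarded by the garbage collector because nothing else referred to their expression; and fizzled sparks were for expressions that another thread had already evaluated by the time the spark could run. In the Strategies version above, as far as I can tell, the r == 0 test is performed while parListChunk walks the list spine, so the sparks mostly evaluate quotients that are already computed. The sketch below moves the divisibility test into the list elements, so each spark carries a chunk of real work; the chunkSize parameter and the helper names are illustrative assumptions.

```haskell
import Control.Parallel.Strategies (using, parListChunk, rdeepseq)
import Data.List (sort)

-- Divisors of n via trial division up to sqrt n. Each list element holds the
-- divisibility test for one candidate x, so the sparks created by
-- parListChunk evaluate the tests themselves rather than just the spine.
divisorsStrat :: Int -> Integer -> [Integer]
divisorsStrat chunkSize n =
  sort (concat (map slice [1 .. r] `using` parListChunk chunkSize rdeepseq))
  where
    -- floor of sqrt n; adequate for n of the size used here
    r = floor (sqrt (fromInteger n :: Double)) :: Integer
    -- [] if x does not divide n, otherwise x and its partner n `quot` x
    slice x
      | n `rem` x /= 0 = []
      | x * x == n     = [x]
      | otherwise      = [x, n `quot` x]

main :: IO ()
main = print (divisorsStrat 1000 1000000000)
```

With a chunk size of 1000 and about 31,623 candidates, this creates roughly 32 sparks, each testing about 1000 candidates; a coarser chunk size means fewer, larger sparks and less scheduling overhead per unit of work.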



Related Questions

I have written the following C program. The output is 32. Why is this?
i have written the following code class Program { static void Main(string[] args) {
I have written down the following program that uses the quicksort algorithm to sort
I have written the following simple C++ program in order to learn how to
I have written a small example C++ program, using boost::thread. Since it's 215 lines,
Using the ndk I have compiled a code written in C. The program is
I have written a simple C program using gcc compiler in Ubuntu enviroment. The
We have photo client program written using im4j wrapper to invoke ImageMajick to process
I have written following code to attach gesture recogniser to multiple imageviews. [imageview1 setUserInteractionEnabled:YES];
I am trying my hands on WPF MVVM. I have written following code in


© 2021 The Archive Base. All Rights Reserved
