Editorial Team
Asked: June 8, 2026

I have a loop which loads decent-sized files, around 5MB each, and then runs some computations on them. I need to load 500-1000 of them. Seems like an easy job for foreach.

I am doing this but the performance of doSNOW seems to be horrendous.

I found this post and this fellow seems to have had the same issues:

http://statsadventure.blogspot.com/2012/06/performance-with-foreach-dosnow-and.html

So a couple of questions.

  1. Is there an alternative to doSNOW? I realize there is doMC, but I am running Windows.
  2. Is doMC on Linux that much faster than doSNOW?
  3. Is there any way to output to the screen from a worker, so I can at least get some idea of how my job is progressing?

Thank you in advance!

1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 7:08 am

    Multiple threads trying to access different files on the hard disk can lead to very bad performance.

    However, load-balanced parallelization may still lead to an improvement if enough time goes into the calculations: the nodes will drift out of sync, so hard disk requests will come in one after the other instead of all at the same time.

    Here’s a simple example of snow::clusterApply vs the load balanced snow::clusterApplyLB. I use snow instead of parallel as it provides timing and plotting:

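    library (snow)
    ## Linux only: allow this R process to run on any core (skip this line on Windows)
    system(sprintf('taskset -p 0xffffffff %d', Sys.getpid()))
    cl <- makeSOCKcluster (rep ("localhost", 2))
    
    ## six task durations in seconds, in random order
    times <- sample (1:6) / 4
    times
    ## [1] 1.50 0.25 0.75 1.00 0.50 1.25
    
    t <- snow.time (l <- clusterApply (cl, times, function (x) Sys.sleep (x)))
    plot (t, main = "\n\nclusterApply") 
    ## mark the start of every task on its node's timeline
    for (i in 1 : 2)
      points (t$data[[i]][,"send_start"], rep (i, nrow (t$data[[i]])), pch = 20, cex = 2)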
    

    [Figure: snow.time plot for clusterApply]

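    tlb <- snow.time (l <- clusterApplyLB (cl, times, function (x) Sys.sleep (x)))
    plot (tlb, main = "\n\nclusterApplyLB")
    ## with load balancing the nodes need not get the same number of tasks
    for (i in 1 : 2)
      points (tlb$data[[i]][,"send_start"], rep (i, nrow (tlb$data[[i]])), pch = 20, cex = 2)
    
    stopCluster (cl)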
    

    [Figure: snow.time plot for clusterApplyLB]

    The black dots mark the start of a new function call. If the function starts by loading the file, then with clusterApply all nodes will always try to access the hard disk at the same time, because the cluster waits for all nodes to return their results before dealing out the next round of tasks. With clusterApplyLB, the next task is handed out as soon as a node returns its result. Even if the tasks take basically the same time, they will get out of sync rather quickly, and the file loading will no longer happen at exactly the same time.
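The difference in start times can also be worked out by hand. The following is a rough sketch that models the two dispatch schemes without a cluster, using the six durations from the example output above. It assumes zero communication overhead and that ties go to the lowest-numbered node. The variable names (`round_start`, `lb_start`, `free_at`) are mine and not part of snow.

```r
# task durations in seconds, copied from the example output above
times   <- c(1.50, 0.25, 0.75, 1.00, 0.50, 1.25)
n_nodes <- 2

# clusterApply model: tasks are dealt out in rounds of n_nodes, and a round
# starts only after the slowest task of the previous round has finished
round_id    <- (seq_along(times) - 1) %/% n_nodes
round_len   <- as.numeric(tapply(times, round_id, max))
round_start <- c(0, cumsum(round_len))[round_id + 1]

# clusterApplyLB model: each task goes to whichever node becomes idle first
free_at  <- numeric(n_nodes)          # time at which each node is next idle
lb_start <- numeric(length(times))
for (j in seq_along(times)) {
  node           <- which.min(free_at)
  lb_start[j]    <- free_at[node]
  free_at[node]  <- free_at[node] + times[j]
}

print(data.frame(task = seq_along(times), duration = times,
                 clusterApply = round_start, clusterApplyLB = lb_start))
# clusterApply start times:   0 0 1.5 1.5 2.5 2.5
# clusterApplyLB start times: 0 0 0.25 1 1.5 2
```

In this idealized model every clusterApply round begins with both nodes starting at the same instant, whereas with load balancing only the first two tasks coincide.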

    (I don’t know whether this is the actual problem, though)
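If simultaneous disk access does turn out to be the bottleneck, the same load-balanced call could be applied to the file-loading job. Below is a minimal sketch using the parallel package that ships with R, which also runs on Windows. The temporary files and the `process_file` function are hypothetical stand-ins for the real 5MB inputs and computations.

```r
library(parallel)

# hypothetical stand-ins for the real input files: four small temporary .rds files
files <- replicate(4, tempfile(fileext = ".rds"))
for (f in files) saveRDS(rnorm(1000), f)

# hypothetical per-file job: load the file first, then compute on its contents
process_file <- function(path) {
  x <- readRDS(path)
  mean(x)
}

cl  <- makePSOCKcluster(2)                       # two local worker processes
res <- clusterApplyLB(cl, files, process_file)   # next file goes to whichever worker is idle
stopCluster(cl)

unlink(files)                                    # remove the temporary files
```

Whether this actually helps depends on how the time splits between reading and computing, so it is worth timing both variants on a small subset of the files first.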
