The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: May 28, 2026

I am trying to work out how to parallelize some code from data mining

I am trying to work out how to parallelize some code from “Data Mining with R: Learning with Case Studies” so that it runs faster on my MacBook Pro. The code in question is below. It applies six different learners (e.g. svm and nnet, for regression and classification) to the same data (DSs), each with a small number of variants.

The full code is HERE (near the bottom, in the “model evaluation and selection” section).

for(td in TODO) {
  assign(td,
     experimentalComparison(
       DSs,         
       c(
         do.call('variants',
                 c(list('singleModel',learner=td),VARS[[td]],
                   varsRootName=paste('single',td,sep='.'))),
         do.call('variants',
                 c(list('slide',learner=td,
                        relearn.step=c(60,120)),
                   VARS[[td]],
                   varsRootName=paste('slide',td,sep='.'))),
         do.call('variants',
                 c(list('grow',learner=td,
                        relearn.step=c(60,120)),
                   VARS[[td]],
                   varsRootName=paste('grow',td,sep='.')))
         ),
        MCsetts)
     )
  # save the results
  save(list=td,file=paste(td,'Rdata',sep='.'))
}

Most of the parallelization information I find seems more applicable to things like ‘apply’, where the same function is applied to different subsets of the data. This code does the opposite: different functions are applied to the same data.

Would it be better to parallelize the outer for loop, so that the code within it runs for multiple learners at a time, or to parallelize the code inside the loop, so that the different windowing approaches run in parallel for a single learner?

Execution for a single iteration is just over 2 hours on my MacBook, where only 2 cores appear to be doing anything (the other two just sit idle). The actual code from the link is set to 20 iterations, so it would be great to use my idle cores to reduce this.

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 4:05 am

    In the non-parallel case, passing functions into an lapply loop is straightforward.

    lapply(c(mean, sum), function(f) f(1:5))
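
    The same pattern also works when the functions are given by name, which is how the question's loop receives its learners. A minimal sketch, assuming a toy vector fun_names and a small vector x as stand-ins (neither is from the original code):

```r
# Hypothetical stand-ins: function names instead of learner names,
# and a small vector instead of the DSs data sets.
fun_names <- c("mean", "sum", "max")
x <- 1:5

# match.fun() looks up each function by name; sapply() keeps the names,
# so every result is labelled with the function that produced it.
results <- sapply(fun_names, function(nm) match.fun(nm)(x))
print(results)
```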
    

    There are a few different systems for parallel programming with R. This next example uses snow.

    library(snow)
    cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
    clusterApply(cl, c(mean, sum), function(f) f(1:5))
    stopCluster(cl)
    

    You should get the same answer in each case!
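
    Applied to the question, the outer loop over learners looks like the natural unit to hand to a cluster, since each iteration appears to be independent and saves its own result. The following is only a sketch under stated assumptions: it uses the parallel package that ships with R (which offers a snow-style interface), and TODO, DSs and run_learner are toy placeholders for the real learner names, data and loop body.

```r
library(parallel)  # ships with R; provides a snow-style cluster API

# Hypothetical stand-ins for the question's objects.
TODO <- c("svm", "nnet", "earth")   # learner names (illustrative)
DSs  <- 1:10                        # shared data (illustrative)

# Placeholder for the body of the for loop. In the real code this would
# call experimentalComparison() and save() the result for learner td.
run_learner <- function(td, data) {
  list(learner = td, score = sum(data) + nchar(td))
}

cl <- makeCluster(2)                 # two local worker processes
res <- parLapply(cl, TODO, run_learner, data = DSs)
stopCluster(cl)

names(res) <- TODO                   # label each result by its learner
str(res)
```

    With real learners, each worker would also need the required packages and the VARS and MCsetts objects, for example via clusterEvalQ() and clusterExport(). The number of workers passed to makeCluster() could be raised to match the number of cores available.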
