
The Archive Base

Editorial Team
Asked: May 27, 2026

I am using R to analyze genome-wide association study data. I have about 500,000 potential predictor variables (single-nucleotide polymorphisms, or SNPs) and want to test the association between each of them and a continuous outcome (in this case, the low-density lipoprotein (LDL) concentration in the blood).

I have already written a script that does this without problems. To explain briefly, I have a data object called “Data”. Each row corresponds to a particular patient in the study. There are columns for age, gender, body mass index (BMI), and blood LDL concentration. There are also half a million other columns with the SNP data.

I am currently using a for loop to run the linear model half a million times, as shown:

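# Names of the SNP columns: every column except the outcome and the covariates
snpCols <- setdiff(names(Data), c("bloodLDLlevel", "Age", "Gender", "BMI"))

# Pre-allocate the results matrix: p value in column 1, estimate in column 2
results <- matrix(NA_real_, nrow = length(snpCols), ncol = 2,
                  dimnames = list(snpCols, c("p", "estimate")))

# Repeat the loop once per SNP (about half a million times)
for (i in seq_along(snpCols)) {

  # Select the appropriate SNP column as a plain vector
  SNP <- Data[[snpCols[i]]]

  # Linear regression adjusted for age, gender, and BMI
  GenoMod <- lm(bloodLDLlevel ~ SNP + Age + Gender + BMI, data = Data)

  # Save the p value and the estimate of the SNP term; the coefficient
  # row is named after the formula variable, i.e. "SNP"
  coefs <- summary(GenoMod)$coefficients
  results[i, 1] <- coefs["SNP", "Pr(>|t|)"]
  results[i, 2] <- coefs["SNP", "Estimate"]
}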

All of that works fine. However, I would really like to speed up my analysis, so I’ve been experimenting with the multicore, doMC, and foreach packages.

My question is, could someone please help me adapt this code using the foreach scheme?

I am running the script on a Linux server that apparently has 16 cores available. I’ve tried experimenting with the foreach package, but it has actually been slower: the analysis takes longer with foreach than with the plain for loop.

For example, I’ve tried saving the linear model objects as shown:

library(doMC)
registerDoMC()
results <- foreach(i=1:500000) %dopar% { lm(bloodLDLlevel ~ SNP + Age + Gender + BMI, data = Data) }

This takes more than twice as long as the regular for loop. Any advice on how to do this better or more quickly would be appreciated! I understand that the parallel version of lapply might be an option, but I don’t know how to use it either.

All the best,

Alex
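One plausible contributor to the foreach attempt being slower (an editorial observation, not something measured on the asker's data): every iteration returns a complete lm object, which stores residuals, fitted values, the QR decomposition and the model frame, so 500,000 of them are a lot of data to collect from the workers. The attempt also never selects SNP i inside the loop body, so each task fits the same model. The sketch below uses small simulated data (n = 1000 and the variable names y and x are arbitrary assumptions) to compare the size of one full fit with the two numbers the loop actually keeps.

```r
set.seed(1)

# Small simulated example; y, x and n = 1000 are arbitrary choices
n <- 1000
d <- data.frame(y = rnorm(n), x = rnorm(n))
fit <- lm(y ~ x, data = d)

# A full lm object stores residuals, fitted values, the QR decomposition,
# the model frame and more, so its size grows with the number of rows
fullSize <- as.numeric(object.size(fit))

# The loop only needs the p value and the estimate for the predictor
coefs <- summary(fit)$coefficients
smallResult <- c(p = coefs["x", "Pr(>|t|)"], estimate = coefs["x", "Estimate"])
smallSize <- as.numeric(object.size(smallResult))

# How many times larger the full fit is than the two numbers kept
fullSize / smallSize
```

Returning only the needed numbers from each parallel task keeps the amount of data sent back to the main process small, whichever parallel backend is used.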

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team, added an answer on May 27, 2026 at 5:03 pm

    To get you started: on Linux you can use the multicore approach (mclapply) that is now part of the parallel package. Unlike foreach, which needs a parallel backend such as doMC to be registered first, this approach needs no extra setup. Your code can be run on 16 cores simply by doing:

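    library(parallel)
    
    # SNP columns: every column except the outcome and the covariates
    snpCols <- setdiff(names(Data), c("bloodLDLlevel", "Age", "Gender", "BMI"))
    
    mylm <- function(i) {
      SNP <- Data[[snpCols[i]]]
      GenoMod <- lm(bloodLDLlevel ~ SNP + Age + Gender + BMI, data = Data)
      coefs <- summary(GenoMod)$coefficients
      # return the p value and the estimate of the SNP term as a vector
      c(coefs["SNP", "Pr(>|t|)"], coefs["SNP", "Estimate"])
    }
    
    Out <- mclapply(seq_along(snpCols), mylm, mc.cores = 16) # returns a list
    Result <- do.call(rbind, Out) # turn the list into a matrix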
    

    Here you write a function that returns a vector with the two quantities you want, and apply it over the SNP indices. I couldn’t test this, as I don’t have access to the data, but it should work.
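The mclapply approach can be checked end to end on a small simulated data set. This is a sketch under assumptions: the data are random, the SNP column names rs1 to rs20 are hypothetical, and mc.cores is set to 2 instead of 16 (falling back to 1 on Windows, where mclapply cannot fork). It runs the same per-SNP function with mclapply and with plain lapply and compares the two result matrices.

```r
library(parallel)

set.seed(42)
# Hypothetical simulated data using the column names from the question
n <- 300
Data <- data.frame(
  Age = runif(n, 30, 80),
  Gender = factor(sample(c("F", "M"), n, replace = TRUE)),
  BMI = rnorm(n, mean = 27, sd = 4)
)
Data$bloodLDLlevel <- 2 + 0.01 * Data$Age + rnorm(n, sd = 0.5)

# Hypothetical SNP columns rs1..rs20, coded as 0/1/2 minor-allele counts
for (k in 1:20) Data[[paste0("rs", k)]] <- rbinom(n, size = 2, prob = 0.3)

snpCols <- setdiff(names(Data), c("bloodLDLlevel", "Age", "Gender", "BMI"))

# Fit one model per SNP and keep only the p value and the estimate
mylm <- function(i) {
  SNP <- Data[[snpCols[i]]]
  coefs <- summary(lm(bloodLDLlevel ~ SNP + Age + Gender + BMI, data = Data))$coefficients
  c(p = coefs["SNP", "Pr(>|t|)"], estimate = coefs["SNP", "Estimate"])
}

# Forking is not available on Windows, so use a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

parResult <- do.call(rbind, mclapply(seq_along(snpCols), mylm, mc.cores = cores))
seqResult <- do.call(rbind, lapply(seq_along(snpCols), mylm))

all.equal(parResult, seqResult)  # expected: TRUE
```

With the default mc.preschedule = TRUE, mclapply first divides the indices into at most as many jobs as there are cores, so the half-million small model fits are not each started as a separate forked job.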


Related Questions

I have been playing around with using graphs to analyze big data. It's been
We are using Stata to combine and analyze data for all of our agencies
I am using the 'analyze' tool in xcode to check for potential leakages in
I have a large tar.gz file to analyze using a python script. The tar.gz
Using the build and analyze of XCode I saw i have a memory leak
I have installed Sonar and configured it to analyze our (.NET) projects (using Sonar-Runner).
I have a crash dump file that I need to analyze using windbg to
I'd like to analyze a continuous stream of data (accessed over HTTP) using a
I have tried using the leaks tool, and analyze etc to find the leak,
I am using Apple's MyGizmoClass Singleton class for program-wide session variables and loving it!

© 2021 The Archive Base. All Rights Reserved