Editorial Team
Asked: May 28, 2026 (2026-05-28T21:30:35+00:00)

I tried to solve this problem in PERL, but it only works with lesser


I tried to solve this problem in Perl, but my solution only works with small data sets, so I need a solution in R, which I expect to be faster and easier than Perl anyway. I have one file like the one below, with two positions in the genome (first and second columns) and the distance between them (third column):

cg00000029  cg01016459  848
cg00000029  cg02021817  38
cg00000029  cg02851944  13
cg00000029  cg02976952  238
cg00000029  cg03943270  93
cg00000029  cg07396495  604
cg00000029  cg12190057  929

My second file looks like this, with the genome position in the first column and one expression value per sample (1 to 6) in the remaining columns:

TargetID    sample1 sample2 sample3 sample4 sample5 sample6
cg00000029  0.157   0.444   0.466   0.805   0.5489  0.448
cg01016459  0.873   0.930   0.926   0.942   0.932   0.9128  
cg03943270  0.871   0.920   0.926   0.942   0.942   0.942

In fact I have 100 samples. My idea is to get a final file for each sample with the expression values
in place of the cg IDs, plus the distance. For example, for sample 1:

0.157  0.873 848
0.157  0.871  93

For sample 2:

0.444   0.930 848
0.444   0.920   93

In Perl I have no problems when I have only two samples: I load the files into two structures (hashes of arrays) and then compare them using nested foreach loops. But that takes a long time for just 2 samples, so imagine 100! I tried in R, loading the data into 2 data frames and using something like

expression[rownames(expression) %in% rownames(distances),]

The problem is that I need something like a loop or an apply function to iterate over the expression data, looking up the first CpG ID and then the second; if both members of a pair are present in the expression data, I want to output their expression values and the distance.

Any ideas would be welcome

Thanks in advance

1 Answer

    Editorial Team
    Added an answer on May 28, 2026 at 9:30 pm

    If your first data set is in dat:

    dat <- structure(list(V1 = c("cg00000029", "cg00000029", "cg00000029", 
    "cg00000029", "cg00000029", "cg00000029", "cg00000029"), V2 = c("cg01016459", 
    "cg02021817", "cg02851944", "cg02976952", "cg03943270", "cg07396495", 
    "cg12190057"), V3 = c(848L, 38L, 13L, 238L, 93L, 604L, 929L)), .Names = c("V1", 
    "V2", "V3"), class = "data.frame", row.names = c(NA, -7L))
    

    and the second data set is in target:

    target <- structure(list(TargetID = c("cg00000029", "cg01016459", "cg03943270"
    ), sample1 = c(0.157, 0.873, 0.871), sample2 = c(0.444, 0.93, 
    0.92), sample3 = c(0.466, 0.926, 0.926), sample4 = c(0.805, 0.942, 
    0.942), sample5 = c(0.5489, 0.932, 0.942), sample6 = c(0.448, 
    0.9128, 0.942)), .Names = c("TargetID", "sample1", "sample2", 
    "sample3", "sample4", "sample5", "sample6"), class = "data.frame", row.names = c(NA, 
    -3L))
    

    match() will get you what you're looking for. I would use the reshape and plyr packages, specifically melt and dlply, but I'm sure there is an apply version too.

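    library(reshape)  # melt()
    library(plyr)     # dlply() and .()

    # Long format: one row per (TargetID, sample), columns TargetID, variable, value
    target.melt <- melt(target, id.vars = 'TargetID')

    # For one sample's rows (lookup), fetch the expression value of each pair member
    my.func <- function(lookup, df) {
      cg.one <- lookup$value[match(df$V1, lookup$TargetID)]
      cg.two <- lookup$value[match(df$V2, lookup$TargetID)]

      return(list(cgone = cg.one, cgtwo = cg.two, distance = df$V3))
    }

    # Apply my.func to each sample; the result is a list with one element per sample
    out <- dlply(target.melt, .(variable), my.func, df = dat)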
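
    In case it helps to see what match() is doing in the lookups, here is a minimal sketch. The vectors ids, values and query are made up for illustration (the values are the sample1 numbers from the question); match() returns the position of each query ID in the lookup IDs, or NA when the ID is absent, and indexing with NA then gives NA.

    ```r
    # Illustrative lookup table (hypothetical names; values from the question's sample1)
    ids    <- c("cg00000029", "cg01016459", "cg03943270")
    values <- c(0.157, 0.873, 0.871)

    # IDs to look up; "cg02021817" is not present in `ids`
    query <- c("cg01016459", "cg02021817", "cg03943270")

    idx <- match(query, ids)    # position of each query ID in `ids`, NA if absent
    looked_up <- values[idx]    # indexing with NA yields NA

    print(idx)        # 2 NA 3
    print(looked_up)  # 0.873 NA 0.871
    ```

    That NA behaviour is what makes it easy to drop the unmatched pairs afterwards with na.omit().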
    

    There are a bunch of NAs with your data, since the second data set is incomplete, but what you asked for is there:

    > na.omit(as.data.frame(out[[1]]))
      cgone cgtwo distance
    1 0.157 0.873      848
    5 0.157 0.871       93
    > 
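
    For the apply version mentioned above, a base R sketch might look like the following. It is not tested on the full 100-sample data. It assumes the same column names as the question (V1, V2, V3, TargetID) and uses only the first two samples to stay short. The names i1, i2, per_sample and out_dir are illustrative, and the output location (tempdir()) and the file naming are assumptions. The match() calls run once and are reused for every sample, which should scale better than nested loops.

    ```r
    # Data from the question, rebuilt here so the sketch runs on its own
    dat <- data.frame(
      V1 = rep("cg00000029", 7),
      V2 = c("cg01016459", "cg02021817", "cg02851944", "cg02976952",
             "cg03943270", "cg07396495", "cg12190057"),
      V3 = c(848L, 38L, 13L, 238L, 93L, 604L, 929L),
      stringsAsFactors = FALSE
    )
    target <- data.frame(
      TargetID = c("cg00000029", "cg01016459", "cg03943270"),
      sample1 = c(0.157, 0.873, 0.871),
      sample2 = c(0.444, 0.930, 0.920),
      stringsAsFactors = FALSE
    )

    # Row positions in `target` for each pair member; computed once, reused per sample
    i1 <- match(dat$V1, target$TargetID)
    i2 <- match(dat$V2, target$TargetID)

    samples <- setdiff(names(target), "TargetID")
    per_sample <- lapply(samples, function(s) {
      res <- data.frame(cgone = target[[s]][i1],
                        cgtwo = target[[s]][i2],
                        distance = dat$V3)
      na.omit(res)  # drop pairs where either ID has no expression value
    })
    names(per_sample) <- samples

    # Assumed output location and naming: one tab-separated file per sample
    out_dir <- tempdir()
    for (s in samples) {
      write.table(per_sample[[s]], file.path(out_dir, paste0(s, ".txt")),
                  sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
    }

    print(per_sample$sample1)
    ```

    With the real data you would read both files with read.table() and point out_dir at a real directory; per_sample$sample1 should then hold the same two rows as the output above.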
    