The Archive Base

Editorial Team
Asked: May 30, 2026

I have a data frame the first columns of which are a sample ID


I have a data frame the first columns of which are a sample ID number and then a well position, like so:

>df[1:12,1:10]

S    W   V3   V4  
SID1 A01 <NA> <NA>
SID2 A02 <NA> <NA>
SID3 A03 <NA> <NA>
SID4 A01 <NA> <NA>
SID5 A02 <NA> <NA>
SID5 A03 <NA> <NA>

The combination of the S and W columns is unique, and must remain so: some samples have repeated measures but, for downstream analysis reasons (not in R), these cannot be placed on the same row as is usual.

I wish to insert data into the data frame based on the unique combination of these two columns.

The data I am trying to insert is from another data frame and looks like this:

>results[1:12, 1:4]

SampleID   Value   Assay   Well
SID1       0       V3      A01
SID1       0       V4      A01
SID2       1       V3      A02
SID2       2       V4      A02
SID3       0       V3      A03
SID3       1       V4      A03
SID4       0       V3      A01
SID4       0       V4      A01
SID5       1       V3      A02
SID5       2       V4      A02
SID6       0       V3      A03
SID6       1       V4      A03

Currently I am looping through the assay columns (V3 and V4 here; there are about 1000 columns in the real data set) and inserting the data for each column one at a time, based on the unique combination of sample ID, well position and assay. This is slow. I want to vectorise this to make it faster, by inserting all the values for V3 at the same time, based on sample ID and well.

I tried

for(i in levels(results$Assay))
{
  df$V3[(df$V1 %in% results$SampleID)&(df$V2 %in% results$Well)] =
    results$Value[results$Assay==i]
}

This doesn’t work for me. I imagine because of something stupid on my part!
Any ideas?

EDIT:
Actually, Ben’s solution only almost worked. Everything goes fine at first, but the assays are spread out over n files and the samples over y files, so when merge tries to join the two data frames on an assay that has already been merged into df, it adds a new column and appends a “.1” to the name.

Exactly what you’d expect merge to do I suppose. My fault for not explaining that my data is coming from separate files.

to illustrate:

I have 16 files. There are 1536 samples spread out over 4 files, 384 each. There are 160 separate assays, spread out over 4 assay bundles. To run every assay for every sample I end up with 16 files.

So if I can get merge to not add a new column if the column for the current assay is already there, that would be perfect.

All suggestions are welcome,
and sorry for being crap at explaining my data!

Cheers
Davy

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 6:55 pm

    Let’s suppose you have the file names in a vector datafiles, ordered so that files 1-4 hold all assays for samples 1-384, files 5-8 hold all assays for samples 385-768, and so on, and that you want to end up with a data frame that is 1536 rows by 162 columns (160 assays plus SampleID and Well).

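    library(reshape)
    ## read all files into a list of data frames
    ## (assumes each file has a header row naming the columns
    ## SampleID, Value, Assay and Well, which cast() needs below):
    alldata <- lapply(datafiles,read.table,header=TRUE)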
    

    Split into four chunks:

    splitdata <- split(alldata,rep(1:4,each=4))
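
To see what the split call does without reading any files, here is a minimal sketch that uses the integers 1 to 16 as stand-ins for the 16 data frames. The names alldata_demo and splitdata_demo are made up for illustration, and the sketch assumes, as above, that the files are ordered so that each consecutive run of four belongs to the same group of samples.

```r
# Stand-ins for the 16 data frames: just the file indices 1..16
# (hypothetical demo objects, not the real data)
alldata_demo <- as.list(1:16)

# Same call as above: the grouping vector is 1,1,1,1,2,2,2,2,...
splitdata_demo <- split(alldata_demo, rep(1:4, each = 4))

# One list element per group of samples, each holding four "files"
n_chunks <- length(splitdata_demo)
second_chunk <- unlist(splitdata_demo[["2"]])   # file indices 5, 6, 7, 8
```

If your files come in a different order, only the grouping vector passed to split needs to change.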
    

    A function to take a list of n data sets, each containing m assays from k individuals (i.e. each one is k*m rows by 4 columns: SampleID, Well, Assay, Value), and combine them into a single data set that is k rows by n*m+2 columns:

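    mergefun <- function(X) {
        ## cast each long data set to wide form: one row per
        ## SampleID/Well combination, one column per assay
        cdata <- lapply(X,
                        cast,
                        formula=SampleID+Well~Assay,
                        value="Value")
        ## produces data sets of the form
        ##   SampleID Well V3 V4
        ## 1     SID1  A01  0  0
        ## 2     SID2  A02  1  2
        ##  ...
        ## merge the wide data sets one after another on their shared
        ## columns (SampleID and Well); Reduce() takes the function first
        Reduce(merge, cdata)
    }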
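
If you want to check the shape of the result on a toy example before running it on the real files, the sketch below does the same two steps (long to wide, then merge) in base R only. It swaps cast for the base reshape() function so that it runs without the reshape package; long1, long2 and to_wide are made-up names, and the V5/V6 values are invented purely for illustration.

```r
# Two toy "files" for the same samples but different assays.
# long1 uses values from the question; long2 is invented.
long1 <- data.frame(
  SampleID = c("SID1", "SID1", "SID2", "SID2"),
  Well     = c("A01",  "A01",  "A02",  "A02"),
  Assay    = c("V3",   "V4",   "V3",   "V4"),
  Value    = c(0, 0, 1, 2)
)
long2 <- transform(long1,
                   Assay = c("V5", "V6", "V5", "V6"),
                   Value = c(1, 1, 0, 2))

# Base-R stand-in for cast(): one row per SampleID/Well, one column per assay
to_wide <- function(d) {
  w <- reshape(d, idvar = c("SampleID", "Well"), timevar = "Assay",
               v.names = "Value", direction = "wide")
  names(w) <- sub("^Value\\.", "", names(w))  # Value.V3 -> V3
  w
}

# Merge the wide tables on their shared columns (SampleID and Well)
wide <- Reduce(merge, lapply(list(long1, long2), to_wide))
names(wide)   # SampleID, Well, then one column per assay
```

With n = 2 data sets of m = 2 assays for k = 2 samples, this gives k rows and n*m+2 = 6 columns, which is the shape described above.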
    

    Now apply this to each of the chunks:

    merged_data <- lapply(splitdata,mergefun)
    

    Now combine the chunks:

    final <- do.call(rbind,merged_data)
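
Since the question requires the sample ID/well combination to stay unique, it may be worth checking that after stacking the chunks. The following is only a sketch on two tiny hand-made chunks (chunk1, chunk2 and final_demo are hypothetical names); on the real data you would run the same checks on final and expect 1536 rows and 162 columns.

```r
# Two toy chunks that are already in wide form
# (same assays, different samples; values taken from the question)
chunk1 <- data.frame(SampleID = c("SID1", "SID2"), Well = c("A01", "A02"),
                     V3 = c(0, 1), V4 = c(0, 2))
chunk2 <- data.frame(SampleID = c("SID4", "SID5"), Well = c("A01", "A02"),
                     V3 = c(0, 1), V4 = c(0, 2))

# Stack the chunks row-wise, as in the do.call(rbind, ...) step above
final_demo <- do.call(rbind, list(chunk1, chunk2))

# anyDuplicated() returns 0 when no SampleID/Well pair occurs twice
key_dups <- anyDuplicated(final_demo[c("SampleID", "Well")])
dims <- dim(final_demo)
```

Note that rbind needs every chunk to have the same column names, so a chunk with a missing assay will make the call fail rather than silently misalign columns.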
    

    I’m not sure this will work, but it might. You should take the pieces apart and examine what they do separately if it doesn’t work on the first try — I may have screwed up somewhere.
