I have a data frame the first columns of which are a sample ID

Question


Asked: May 30, 2026


I have a data frame whose first two columns are a sample ID number and a well position, like so:

>df[1:12,1:10]

S    W   V3   V4  
SID1 A01 <NA> <NA>
SID2 A02 <NA> <NA>
SID3 A03 <NA> <NA>
SID4 A01 <NA> <NA>
SID5 A02 <NA> <NA>
SID5 A03 <NA> <NA>

The combination of the S and W columns is unique and must remain so: some samples have repeated measures, but for downstream analysis reasons (outside R) these cannot be placed on the same row, as would be usual.

I wish to insert data into the data frame based on the unique combination of these two columns.

The data I am trying to insert is from another data frame and looks like this:

>results[1:12, 1:4]

SampleID   Value    Assay           Well
SID1       0       V3       A01
SID1       0       V4       A01
SID2       1       V3       A02
SID2       2       V4       A02
SID3       0       V3       A03
SID3       1       V4       A03
SID4       0       V3       A01
SID4       0       V4       A01
SID5       1       V3       A02
SID5       2       V4       A02
SID6       0       V3       A03
SID6       1       V4       A03

Currently I am looping through the columns (V3 and V4 here; there are about 1000 columns in the real data set) and inserting the data for each column one at a time, based on the unique combination of sample ID, well position and assay. This is slow. I want to vectorise it so that all the values for V3 are inserted at once, matched on sample ID and well.

I tried

for(i in levels(result$Assay))
{
  df$V3[(df$V1 %in% results$SampleID)&(df$V2 %in% results$Well] 
  = results$Value[results$Assay==i]
}

This doesn’t work for me. I imagine because of something stupid on my part!
Any ideas?

EDIT:
Actually, Ben’s solution only almost worked. Everything goes fine at first, but the assays are spread out over n files and the samples over y files. When merge tries to join in a data frame containing an assay that has already been merged into df, it adds a new column and appends “.1” to its name.

Exactly what you’d expect merge to do I suppose. My fault for not explaining that my data is coming from separate files.

To illustrate:

I have 16 files. There are 1536 samples spread out over 4 files, 384 each. There are 160 separate assays, spread out over 4 assay bundles. To run every assay for every sample I end up with 16 files.

So if I can get merge to not add a new column if the column for the current assay is already there, that would be perfect.

All suggestions are welcome,
and sorry for being crap at explaining my data!

Cheers
Davy

1 Answer

Editorial Team · Answer 1 · 2026-05-30T18:55:58+00:00

Let’s suppose you have the file names in a vector datafiles such that files 1-4 hold the data for all assays for samples 1-384, files 5-8 hold all assays for samples 385-768, and so on, and that you want to end up with a data frame that is 1536 rows by 162 columns (160 assay columns plus SampleID and Well).

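library(reshape)
## read all files into a list of data frames
## (header=TRUE assumes each file starts with the column names
##  SampleID, Well, Assay, Value):
alldata <- lapply(datafiles,read.table,header=TRUE)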

Split into four chunks:

splitdata <- split(alldata,rep(1:4,each=4))
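To check that the grouping does what is intended, here is a small sketch on stand-in data. The names toy_files and toy_chunks are made up for illustration; the list holds plain labels rather than real data frames.

```r
## Hypothetical stand-in for the 16 data frames: just the labels "file1".."file16".
toy_files <- as.list(paste0("file", 1:16))

## rep(1:4, each = 4) is 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4,
## so each consecutive run of four files lands in the same chunk.
toy_chunks <- split(toy_files, rep(1:4, each = 4))

chunk_sizes <- lengths(toy_chunks)        # number of files per chunk
first_chunk <- unlist(toy_chunks[[1]])    # labels of the files in chunk 1
```

This only works as intended if datafiles is ordered so that the four files for one group of samples are adjacent.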

A function to take a list of n data sets, each containing m assays from k individuals (i.e. each one is k*m rows by 4 columns: SampleID, Well, Assay, Value) and combine them into a single data set that is k rows by n*m+2 columns:

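mergefun <- function(X) {
    ## cast each long data set to wide form: one row per
    ## SampleID/Well pair, one column per assay
    cdata <- lapply(X,
                   cast,
                   formula=SampleID+Well~Assay,
                   value="Value")
     ## produces data sets of the form
     ##   SampleID Well V3 V4
     ## 1     SID1  A01  0  0
     ## 2     SID2  A02  1  2
     ##  ...
     ## merge them pairwise on the shared SampleID and Well columns
     ## (Reduce takes the function first, then the list)
     Reduce(merge,cdata)
}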
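The Reduce/merge step can be tried on its own in base R. This is a minimal sketch on two hand-made wide tables; bundle_a and bundle_b are hypothetical names, with values taken from the example in the question.

```r
## Two hypothetical wide tables for the same samples,
## each holding a different assay.
bundle_a <- data.frame(SampleID = c("SID1", "SID2", "SID3"),
                       Well     = c("A01", "A02", "A03"),
                       V3       = c(0, 1, 0))
bundle_b <- data.frame(SampleID = c("SID1", "SID2", "SID3"),
                       Well     = c("A01", "A02", "A03"),
                       V4       = c(0, 2, 1))

## Reduce(f, x) folds the list pairwise: merge(bundle_a, bundle_b),
## then the result with the next element, and so on.
## merge() joins on the column names the two tables share,
## here SampleID and Well.
wide <- Reduce(merge, list(bundle_a, bundle_b))
```

Because merge joins on every shared column name, this only behaves well when the tables in one chunk carry different assays. If the same assay column appears in both tables, merge will try to join on it as well.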

Now apply this to each of the chunks:

merged_data <- lapply(splitdata,mergefun)

Now combine the chunks:

final <- do.call(rbind,merged_data)

I’m not sure this will work, but it might. You should take the pieces apart and examine what they do separately if it doesn’t work on the first try — I may have screwed up somewhere.
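If merge keeps producing duplicated “.1” columns, a different approach is to skip merge and fill a pre-allocated data frame in place, using match() to find the row and column of each result. The sketch below is base R only and has been tried only on toy data copied from the question. fill_results is a hypothetical helper, not a function from any package, and it assumes sample IDs and well names contain no spaces, because the two are pasted together to form a key.

```r
## Toy versions of the question's two tables.
df <- data.frame(S  = c("SID1", "SID2", "SID3", "SID4", "SID5"),
                 W  = c("A01", "A02", "A03", "A01", "A02"),
                 V3 = NA_real_, V4 = NA_real_,
                 stringsAsFactors = FALSE)
results <- data.frame(
  SampleID = rep(c("SID1", "SID2", "SID3", "SID4", "SID5"), each = 2),
  Value    = c(0, 0, 1, 2, 0, 1, 0, 0, 1, 2),
  Assay    = rep(c("V3", "V4"), times = 5),
  Well     = rep(c("A01", "A02", "A03", "A01", "A02"), each = 2),
  stringsAsFactors = FALSE)

## Hypothetical helper: write results$Value into the cells of df
## addressed by (SampleID, Well) for the row and Assay for the column.
fill_results <- function(df, results, key_cols = c("S", "W")) {
  assay_cols <- setdiff(names(df), key_cols)
  ## row of df for each result, matched on the pasted sample/well key
  row_idx <- match(paste(results$SampleID, results$Well),
                   paste(df[[key_cols[1]]], df[[key_cols[2]]]))
  ## assay column for each result
  col_idx <- match(results$Assay, assay_cols)
  ## skip results whose sample/well or assay is not present in df
  ok <- !is.na(row_idx) & !is.na(col_idx)
  vals <- as.matrix(df[assay_cols])
  ## a two-column index matrix assigns all cells in one vectorised step
  vals[cbind(row_idx[ok], col_idx[ok])] <- results$Value[ok]
  df[assay_cols] <- as.data.frame(vals)
  df
}

## Simulate two separate files: one carrying V3, one carrying V4.
filled <- fill_results(df, results[results$Assay == "V3", ])
filled <- fill_results(filled, results[results$Assay == "V4", ])
```

Since the columns already exist and are only written into, calling the helper once per file never adds new columns, whichever assays or samples a file contains. It does assume that all assay columns are numeric.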
