I have a large data frame that I'm working with, the first few lines

Question


Asked: May 26, 2026


I have a large data frame that I'm working with. The first few lines are as follows:

      Assay   Genotype   Sample    Result
1     001        G         1         0
2     001        A         2         1
3     001        G         3         0 
4     001        NA        4         NA
5     002        T         1         0
6     002        G         2         1
7     002        T         3         0 
8     002        T         4         0
9     003        NA        1         N
10    003        G         2         1
11    003        G         3         1 
12    003        T         4         0
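To make the example reproducible, the rows above can be entered in R as follows. This is a sketch under a few assumptions: the name `dat` matches the name used in the answer's code, and Assay and Result are kept as character so that "001" keeps its leading zeros and the literal N in row 9 survives alongside the 0/1 values.

```r
# Sample rows from the question, typed in by hand.
# Assumption: Assay is character (keeps leading zeros) and Result is
# character because the table mixes 0/1 with a literal "N" (row 9).
dat <- data.frame(
  Assay    = rep(c("001", "002", "003"), each = 4),
  Genotype = c("G", "A", "G", NA, "T", "G", "T", "T", NA, "G", "G", "T"),
  Sample   = rep(1:4, times = 3),
  Result   = c("0", "1", "0", NA, "0", "1", "0", "0", "N", "1", "1", "0"),
  stringsAsFactors = FALSE
)
str(dat)
```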

In total I’ll be working with 2000 samples and 168 Assays for each sample. For each sample, I’d like to extract the values in ‘Result’ and combine them into either a list or a data frame that looks something like this:

Sample  Data
   1    00N
   2    111
   3    001
   4    N00

The resulting data frame (or whatever similar data structure is preferred) would thus have 2000 rows and 2 columns. The ‘Data’ column would contain 168 characters, one for each ‘Assay’.

Can somebody help me with this problem?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T04:07:13+00:00

One approach uses the plyr package and the base function paste:

library(plyr)
# dat is your data frame; paste() collapses each sample's Result values into one string
ddply(dat, "Sample", summarize, Data = paste(Result, collapse = ""))

  Sample Data
1      1  00N
2      2  111
3      3  001
4      4 NA00
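If you would rather not depend on plyr, the same idea can be sketched in base R with split() and vapply(). This is a swapped-in alternative, not the approach used above; it assumes the data frame is called dat with the columns from the question (Genotype is left out because it isn't needed here).

```r
# Example data from the question (Assay and Result kept as character).
dat <- data.frame(
  Assay  = rep(c("001", "002", "003"), each = 4),
  Sample = rep(1:4, times = 3),
  Result = c("0", "1", "0", NA, "0", "1", "0", "0", "N", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# Split the Result column by Sample, then paste each piece into one string.
# paste() turns a missing value into the two characters "NA".
pieces <- split(dat$Result, dat$Sample)
out <- data.frame(
  Sample = as.integer(names(pieces)),
  Data   = unname(vapply(pieces, paste, character(1), collapse = "")),
  stringsAsFactors = FALSE
)
out
```

Like the ddply version, this gives "NA00" for sample 4, because the missing value is pasted as "NA".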

EDIT to address the NA vs. N issue:

Probably the easiest way I can think of to change your NA to N is to call gsub on the result of ddply. Note that I'm liberally borrowing the very good point made by @Brian about ordering: sort each sample's results by Assay before pasting, so the characters always appear in assay order. Do that, it's a good tip!

out <- ddply(dat, "Sample", summarize, Data = paste(Result[order(Assay)], collapse = ""))

Then use gsub to replace every "NA" with "N":

out$Data <- gsub("NA", "N", out$Data)

Et voilà:

  Sample Data
1      1  00N
2      2  111
3      3  001
4      4  N00
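An alternative to cleaning up with gsub() afterwards is to replace NA with N before pasting, so every assay contributes exactly one character. The sketch below does that in base R (again a swapped-in technique, not plyr), applies the ordering tip by sorting on Sample and Assay, and checks that each string has one character per assay (168 in the full data). The shuffled row order is an added assumption, there only to show why the sort matters.

```r
# Example data from the question, deliberately shuffled so the rows are
# no longer in Assay order (assumption: real data may arrive unsorted).
dat <- data.frame(
  Assay  = rep(c("001", "002", "003"), each = 4),
  Sample = rep(1:4, times = 3),
  Result = c("0", "1", "0", NA, "0", "1", "0", "0", "N", "1", "1", "0"),
  stringsAsFactors = FALSE
)
dat <- dat[c(9, 2, 7, 4, 1, 12, 5, 10, 3, 8, 11, 6), ]

# Replace missing results with "N" before pasting, so no "NA" strings
# need to be cleaned up afterwards.
dat$Result[is.na(dat$Result)] <- "N"

# Sort by Sample, then Assay, so each string is built in assay order.
dat <- dat[order(dat$Sample, dat$Assay), ]

pieces <- split(dat$Result, dat$Sample)
out <- data.frame(
  Sample = as.integer(names(pieces)),
  Data   = unname(vapply(pieces, paste, character(1), collapse = "")),
  stringsAsFactors = FALSE
)

# Sanity check: each string should have one character per assay.
n_assays <- length(unique(dat$Assay))
stopifnot(all(nchar(out$Data) == n_assays))
out
```

Because Assay is zero-padded ("001", "002", ...), sorting it as character gives the same order as sorting numerically.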
