I have a large data frame that I'm working with; the first few lines are as follows:
Assay Genotype Sample Result
1 001 G 1 0
2 001 A 2 1
3 001 G 3 0
4 001 NA 4 NA
5 002 T 1 0
6 002 G 2 1
7 002 T 3 0
8 002 T 4 0
9 003 NA 1 NA
10 003 G 2 1
11 003 G 3 1
12 003 T 4 0
In total I’ll be working with 2000 samples and 168 Assays for each sample. For each sample, I’d like to extract the data in ‘Result’ to create either a list or data frame that looks something like this:
Sample Data
1 00N
2 111
3 001
4 N00
The resulting data frame (or similar preferred data structure) would thus have 2000 rows and 2 columns. Each ‘Data’ entry would contain 168 characters, one for each ‘Assay’.
Can somebody help me with this problem?
One approach uses the plyr package and the base function paste.
EDIT to address question:
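As a rough sketch of the plyr and paste approach named above (the original code did not survive, so this is a reconstruction rather than the exact answer): the names df and out are placeholders, and the data are the twelve rows from the question, with the missing Result values entered as NA.

```r
library(plyr)

# Example rows from the question; Assay is kept as character so the
# leading zeros are preserved. The names `df` and `out` are placeholders.
df <- data.frame(
  Assay    = rep(c("001", "002", "003"), each = 4),
  Genotype = c("G", "A", "G", NA, "T", "G", "T", "T", NA, "G", "G", "T"),
  Sample   = rep(1:4, times = 3),
  Result   = c(0, 1, 0, NA, 0, 1, 0, 0, NA, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Split by Sample and collapse each sample's Result values into one string.
out <- ddply(df, .(Sample), summarise, Data = paste(Result, collapse = ""))
out
```

At this point a missing value appears in the string as the two characters "NA" (for example "00NA" for sample 1), because paste converts NA to text.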
Probably the easiest way I can think of to change your NA to N is to use gsub on the result of ddply. Note I’m liberally borrowing the very good point provided by @Brian re: ordering. Do that, it’s a good tip!
Then use gsub, et voilà:
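A minimal sketch of those steps, under some assumptions: @Brian's tip is not reproduced here, so sorting by Sample and then Assay is my reading of it, and the names df and out are placeholders. The rows are deliberately entered in reverse Assay order to show why the sort matters.

```r
library(plyr)

# Rows from the question, entered in reverse Assay order on purpose.
df <- data.frame(
  Assay  = rep(c("003", "002", "001"), each = 4),
  Sample = rep(1:4, times = 3),
  Result = c(NA, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, NA),
  stringsAsFactors = FALSE
)

# Assumed ordering step: sort by Sample, then Assay, so every string
# lists the assays in the same order.
df <- df[order(df$Sample, df$Assay), ]

# Collapse each sample's Result values into one string.
out <- ddply(df, .(Sample), summarise, Data = paste(Result, collapse = ""))

# paste() writes a missing value as "NA"; swap it for the single letter "N".
out$Data <- gsub("NA", "N", out$Data, fixed = TRUE)
out
#   Sample Data
# 1      1  00N
# 2      2  111
# 3      3  001
# 4      4  N00
```

Since Result only holds 0, 1, or NA here, replacing the literal text "NA" cannot touch any real value.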