Does anyone have a suggestion on how to extract columns from a data set based on metadata stored in a second data set? Just wondering if there is a relatively straightforward way (e.g. using “colnames” or “subset”). My original data set is quite large with more than 100 columns and more than 30,000 rows. Opening the file and selecting in Excel is a pain.
Here are two example data sets:
set1 <- data.frame(ID = rnorm(5, 5000, 1000), Sample1 = rnorm(5, 50000, 2500),
Sample2 = rnorm(5, 50000, 2500), Sample3 = rnorm(5, 50000, 2500),
Sample4 = rnorm(5, 50000, 2500), Sample5 = rnorm(5, 50000, 2500))
meta.data <- data.frame(Sample_name = c("Sample1", "Sample2", "Sample3",
"Sample4", "Sample5"), Location = c("Loc1", "Loc2", "Loc3", "Loc1", "Loc1"),
Time = c("M0", "M01", "M02", "M02", "M03"),
Conc = c("lo", "hi", "lo", "lo", "lo"))
(1) How could I extract (as a new data set) all samples from Location Loc1 or all samples from Time M02?
(2) How could I extract a row that has a certain ID number and select only those samples within that row that have a Conc “lo”?
Not sure if this is the best way, with a merge possibly being more appropriate, but here is how to do some subsetting:
(1) How could I extract (as a new data set) all samples from Location Loc1…
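One possible approach, as a minimal sketch: look up the matching sample names in meta.data, then use them to index the columns of set1. The helper name extract_samples and its arguments are made up for illustration. It assumes the metadata has a Sample_name column whose values match the column names of the data, and that the data has an ID column worth keeping.

```r
# Hypothetical helper (not from any package): return the ID column plus every
# sample column whose metadata entry in `field` equals `value`.
extract_samples <- function(data, meta, field, value) {
  # Sample names whose metadata matches; as.character() guards against factors
  keep <- as.character(meta$Sample_name[meta[[field]] == value])
  # drop = FALSE keeps the result a data frame even if only one column matches
  data[, c("ID", keep), drop = FALSE]
}
```

With the example data, extract_samples(set1, meta.data, "Location", "Loc1") should give the columns ID, Sample1, Sample4 and Sample5, and extract_samples(set1, meta.data, "Time", "M02") should give ID, Sample3 and Sample4.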
(2) How could I extract a row that has a certain ID number and select only those samples within that row that have a Conc “lo”?
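The same idea can be combined with a row filter. This is only a sketch under the same assumptions as before (a Sample_name column in the metadata and an ID column in the data), and extract_row is a hypothetical name chosen for this example.

```r
# Hypothetical helper: pick the row(s) with a given ID and keep only the
# sample columns whose metadata entry in `field` equals `value`.
extract_row <- function(data, meta, id, field, value) {
  # Sample names whose metadata matches; as.character() guards against factors
  keep <- as.character(meta$Sample_name[meta[[field]] == value])
  # Row filter on ID, column filter on the matching sample names
  data[data$ID == id, c("ID", keep), drop = FALSE]
}
```

With the example data, extract_row(set1, meta.data, set1$ID[1], "Conc", "lo") should return the first row with the columns ID, Sample1, Sample3, Sample4 and Sample5.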
I’ve just used set1$ID[1] as a replacement for a selected ID here, because the IDs in the example are random numbers. Just replace it with something like set1$ID == "idnum1".