Does anyone have a suggestion on how to extract columns from a data set


Asked: June 7, 2026


Does anyone have a suggestion on how to extract columns from a data set based on metadata stored in a second data set? Just wondering if there is a relatively straightforward way (e.g. using “colnames” or “subset”). My original data set is quite large with more than 100 columns and more than 30,000 rows. Opening the file and selecting in Excel is a pain.

Here are two example data sets:

set1 <- data.frame(ID = rnorm(5, 5000, 1000), Sample1 = rnorm(5, 50000, 2500), 
Sample2 = rnorm(5, 50000, 2500), Sample3 = rnorm(5, 50000, 2500), 
Sample4 = rnorm(5, 50000, 2500), Sample5 = rnorm(5, 50000, 2500))

meta.data <- data.frame(Sample_name = c("Sample1", "Sample2", "Sample3", 
"Sample4", "Sample5"), Location = c("Loc1", "Loc2", "Loc3", "Loc1", "Loc1"), 
Time = c("M0", "M01", "M02", "M02", "M03"), 
Conc = c("lo", "hi", "lo", "lo", "lo"))

(1) How could I extract (as a new data set) all samples from Location Loc1 or all samples from Time M02?

(2) How could I extract a row that has a certain ID number and select only those samples within that row that have a Conc “lo”?


1 Answer

Editorial Team · Answer 1 · 2026-06-07T18:46:31+00:00

I'm not sure this is the best way, and a merge may be more appropriate, but here is one way to do the subsetting:

(1) How could I extract (as a new data set) all samples from Location Loc1…

# get the names of all samples from Location Loc1
as.character(meta.data$Sample_name[meta.data$Location == "Loc1"])
# use those names (plus "ID") to select columns of set1
set1[c("ID", as.character(meta.data$Sample_name[meta.data$Location == "Loc1"]))]

        ID  Sample1  Sample4  Sample5
1 3836.499 53304.29 47720.79 49504.96
2 4620.443 49406.93 49123.49 50419.93
3 5614.903 44413.93 50387.27 48652.29
4 6676.880 52732.63 48282.92 53544.17
5 3926.077 52593.59 50204.96 49563.13
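The question also asked about Time M02, which works the same way. Here is a minimal sketch using the example data from the question; the `set.seed()` call and the names `wanted` and `set1_m02` are my own additions for illustration, not part of the original post.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible

set1 <- data.frame(ID = rnorm(5, 5000, 1000), Sample1 = rnorm(5, 50000, 2500),
                   Sample2 = rnorm(5, 50000, 2500), Sample3 = rnorm(5, 50000, 2500),
                   Sample4 = rnorm(5, 50000, 2500), Sample5 = rnorm(5, 50000, 2500))

meta.data <- data.frame(Sample_name = c("Sample1", "Sample2", "Sample3",
                                        "Sample4", "Sample5"),
                        Location = c("Loc1", "Loc2", "Loc3", "Loc1", "Loc1"),
                        Time = c("M0", "M01", "M02", "M02", "M03"),
                        Conc = c("lo", "hi", "lo", "lo", "lo"))

# names of the samples recorded at Time M02
wanted <- as.character(meta.data$Sample_name[meta.data$Time == "M02"])

# keep ID plus those columns; intersect() drops any name that is not in set1
set1_m02 <- set1[c("ID", intersect(wanted, names(set1)))]
```

To match several values at once, swap `==` for `%in%`, for example `meta.data$Time %in% c("M02", "M03")`.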

(2) How could I extract a row that has a certain ID number and select only those samples within that row that have a Conc “lo”?

I've used set1$ID[1] as a stand-in for the selected ID here because the example IDs are random numbers. With real data, replace it with a condition on your own ID, something like set1$ID=="idnum1".

subset(set1,set1$ID==set1$ID[1])[c("ID",as.character(meta.data$Sample_name[meta.data$Conc=="lo"]))]

        ID  Sample1  Sample3  Sample4  Sample5
1 3836.499 53304.29 49706.58 47720.79 49504.96
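For the merge route mentioned at the top of this answer, one possible sketch is to stack the sample columns into long format and join them to the metadata, after which both questions become ordinary row filters. This is only an illustration on the example data; the `set.seed()` call and the names `long`, `merged`, `loc1` and `row_lo` are hypothetical, and on 100+ columns by 30,000 rows the long table will be large.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible

set1 <- data.frame(ID = rnorm(5, 5000, 1000), Sample1 = rnorm(5, 50000, 2500),
                   Sample2 = rnorm(5, 50000, 2500), Sample3 = rnorm(5, 50000, 2500),
                   Sample4 = rnorm(5, 50000, 2500), Sample5 = rnorm(5, 50000, 2500))

meta.data <- data.frame(Sample_name = c("Sample1", "Sample2", "Sample3",
                                        "Sample4", "Sample5"),
                        Location = c("Loc1", "Loc2", "Loc3", "Loc1", "Loc1"),
                        Time = c("M0", "M01", "M02", "M02", "M03"),
                        Conc = c("lo", "hi", "lo", "lo", "lo"))

# stack() turns the sample columns into two columns: values and ind (the column name)
stacked <- stack(set1[-1])

# one row per ID/sample pair; rep() repeats the ID vector once per sample column
long <- data.frame(ID = rep(set1$ID, times = ncol(set1) - 1),
                   value = stacked$values,
                   Sample_name = as.character(stacked$ind))

# attach Location, Time and Conc to every measurement
merged <- merge(long, meta.data, by = "Sample_name")

# (1) all measurements from Location Loc1
loc1 <- subset(merged, Location == "Loc1")

# (2) one ID, only the samples with Conc "lo"
row_lo <- subset(merged, ID == set1$ID[1] & Conc == "lo")
```

The column-subsetting approach above keeps the original wide layout; the merged version trades that for simpler filtering on any metadata field.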
