I have a matrix that I would like to subset and eventually use to make a plot. The data is a list of counts for specific blood markers for each patient in a population. It looks like this:
df <- data.frame(MarkerID=c("Class","A123","A124"),
                 MarkerName=c("","X","Y"),
                 Patient.1=c(0,1,5),
                 Patient.2=c(1,2,6),
                 Patient.3=c(0,3,7),
                 Patient.4=c(1,4,8))
I would like to make a data frame of all of the patients (columns 3-6) that have a class value of zero (1st row) and a second data frame of all of the patients with a class value of 1.
In the past I have used the subset function to select rows based on the values in a column. Is it possible to select a subset of columns based on the values in a row?
I’ve tried this:
x <- subset(data, data[1,] == 0)
However, when I run dim(x), the number of columns is the same as in dim(data) but the number of rows is different. Any ideas on how I can make this return just those columns whose value in row 1 is 0?
Roland,
Yes. Your example df is what the data frame looks like. There are ~30,000 markers and >400 patients in the data frame, so I didn’t post the dput(head(data)). Thanks for the reshaping tip, I’ll give that a try.
Your example code did work for subsetting the columns based on the values in a row. Running
data[,c(TRUE,TRUE,data[1,-(1:2)]==1)]
on my data, I was able to get a data frame with all of the rows and only the columns with the indicated class.
Your data is not arranged in a good way. It would be better to reshape it.
In the absence of input data, this is just a guess:
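A minimal sketch of the column selection, assuming data shaped like the example in the question. The df below is a stand-in for the real data, and class0 and class1 are illustrative names:

```r
# Stand-in for the real data: row 1 holds the class label of each patient,
# the remaining rows hold marker counts.
df <- data.frame(MarkerID   = c("Class", "A123", "A124"),
                 MarkerName = c("", "X", "Y"),
                 Patient.1  = c(0, 1, 5),
                 Patient.2  = c(1, 2, 6),
                 Patient.3  = c(0, 3, 7),
                 Patient.4  = c(1, 4, 8))

# Always keep the two ID columns, then keep each patient column
# whose value in row 1 matches the wanted class.
class0 <- df[, c(TRUE, TRUE, df[1, -(1:2)] == 0)]
class1 <- df[, c(TRUE, TRUE, df[1, -(1:2)] == 1)]

names(class0)
names(class1)
```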
Here c(TRUE,TRUE,df[1,-(1:2)]==0) creates a logical vector, which is TRUE for the first two columns and for those columns that have a 0 in the first row. Then I subset the columns based on this vector.
This would reshape your data into a more common format (for statistical software):
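One possible way to do such a reshape with base R only is sketched here. It is not necessarily the approach originally intended, since a reshaping package would work as well. The example df and the names class_row, markers, long, Patient, Class and Count are assumptions made for illustration:

```r
# Stand-in for the real data (same layout as in the question).
df <- data.frame(MarkerID   = c("Class", "A123", "A124"),
                 MarkerName = c("", "X", "Y"),
                 Patient.1  = c(0, 1, 5),
                 Patient.2  = c(1, 2, 6),
                 Patient.3  = c(0, 3, 7),
                 Patient.4  = c(1, 4, 8))

class_row <- df[1, -(1:2)]   # one-row data frame with the class of each patient
markers   <- df[-1, ]        # marker rows only

# Long format: one row per marker/patient combination.
# unlist() on the count columns goes column by column, so the
# patient and class labels are repeated with each = nrow(markers).
long <- data.frame(
  MarkerID   = rep(markers$MarkerID,   times = ncol(class_row)),
  MarkerName = rep(markers$MarkerName, times = ncol(class_row)),
  Patient    = rep(names(class_row), each = nrow(markers)),
  Class      = rep(unlist(class_row, use.names = FALSE), each = nrow(markers)),
  Count      = unlist(markers[, -(1:2)], use.names = FALSE)
)

long
```

With the class stored as an ordinary column, selecting patients by class becomes a row filter rather than a column filter.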
You could then use subset:
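A short sketch of that step, assuming the data is already in long format with one row per marker and patient. The long data frame and its column names Patient, Class and Count are hypothetical and only serve as an example:

```r
# Hypothetical long-format data: one row per marker/patient pair.
long <- data.frame(
  MarkerID = rep(c("A123", "A124"), times = 4),
  Patient  = rep(paste0("Patient.", 1:4), each = 2),
  Class    = rep(c(0, 1, 0, 1), each = 2),
  Count    = c(1, 5, 2, 6, 3, 7, 4, 8)
)

# subset() filters rows, which is now what is needed
# because the class is a column.
class0 <- subset(long, Class == 0)
class1 <- subset(long, Class == 1)

class0
```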