I’m a newbie to R and I have a dataframe called pedM with 203 rows and 19 columns. I want to write a line of code that deletes the columns that are mostly 0 (say, 98% of the column is 0).
I’m using the dataframe to draw a heatmap of gene expression, and I want to get rid of the columns with outliers that don’t really contribute to the data.
I imagine the line of code I’m looking for is fairly simple, but I can’t wrap my head around the correct way to do it. Just to reiterate, I’m looking for a line that removes any column from a dataframe that is at least 98% zeros (equivalently, at most 2% nonzero values), whichever way is easier.
Thanks in advance.
Vivek
I like to do this in a few steps for clarity. First define a function that checks whether at least 98% of the values in a vector are zero and returns TRUE or FALSE:
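Something along these lines should do for the check. This is only a sketch: the names `mostly_zeros` and `threshold` are placeholders I picked, and ignoring NA values is an assumption about how you want missing data treated.

```r
# Returns TRUE when at least `threshold` of the values in x are zero.
# `mostly_zeros` and `threshold` are illustrative names, not from any package.
mostly_zeros <- function(x, threshold = 0.98) {
  # x == 0 is a logical vector, so its mean is the fraction of zeros.
  # na.rm = TRUE ignores missing values (an assumption; adjust if needed).
  mean(x == 0, na.rm = TRUE) >= threshold
}

mostly_zeros(c(rep(0, 99), 5))     # 99% zeros, gives TRUE
mostly_zeros(c(rep(0, 50), 1:50))  # 50% zeros, gives FALSE
```

With 203 rows, a 0.98 threshold means a column is flagged when it has at most 4 nonzero values.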
Then create a boolean vector with one entry per column using apply, and finally drop the columns you don’t want:
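Here is a rough sketch of the flag-and-drop steps on a small made-up dataframe. The toy `pedM`, its column names, and the names `drop_col` and `pedM_filtered` are all invented for illustration, and the zero check is repeated so the snippet runs on its own.

```r
# Toy stand-in for pedM (the real one has 203 rows and 19 columns).
pedM <- data.frame(
  geneA = c(rep(0, 99), 3),    # 99% zeros -> should be dropped
  geneB = 1:100,               # no zeros  -> kept
  geneC = c(rep(0, 50), 1:50)  # 50% zeros -> kept
)

# Same check as in the first step (hypothetical helper name).
mostly_zeros <- function(x, threshold = 0.98) {
  mean(x == 0, na.rm = TRUE) >= threshold
}

# One TRUE/FALSE per column; the 2 means "apply over columns".
drop_col <- apply(pedM, 2, mostly_zeros)

# Keep only the columns that are not mostly zeros.
# drop = FALSE keeps the result a dataframe even if only one column survives.
pedM_filtered <- pedM[, !drop_col, drop = FALSE]

names(pedM_filtered)  # "geneB" "geneC"
```

Note that apply converts the dataframe to a matrix first, which is fine when every column is numeric. If some columns hold text or factors, `sapply(pedM, mostly_zeros)` checks each column as-is and is probably the safer choice.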