I have a large data frame with 48 columns, and I want to run a function on each row that sets to NA any columns passing a test defined in the function. The test involves looking up a number in another data frame. adply (from plyr) seems a natural fit for this, but I am having trouble getting it to give me the results I want.
Let me elucidate:
Here is an example of the data frame I want to manipulate:
> df
  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA        1.344        324.0
3  3   0.3           NA           NA        0.445        167.0
4  4   0.4        1.455        354.2        0.322        321.2
And here is the small data frame from which the test will be derived:
> tcell
  depth  name
1   0.2 Cell1
2   0.4 Cell2
3   0.6 Cell3
4   0.8 Cell4
The idea is to assign NA to the data points of cells that are deeper than the depth listed in the large data frame. For example, in the 3rd row the depth is 0.3, but there are two data points for Cell2, which sits at 0.4 m depth; these are errors, and I want to set them to NA.
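The rule described above can also be sketched without going row by row: loop over the cells in tcell and, for each one, blank the matching columns in every row whose depth is shallower than the cell. This is only a sketch in base R; the function name mask_deeper_cells is hypothetical, and it assumes every column for a cell is named with the cell name followed by an underscore.

```r
# Example data copied from the question.
df <- data.frame(
  pt = 1:4,
  depth = c(0.1, 0.2, 0.3, 0.4),
  Cell1_avgvel = c(NA, NA, NA, 1.455),
  Cell1_avgdir = c(NA, NA, NA, 354.2),
  Cell2_avgvel = c(NA, 1.344, 0.445, 0.322),
  Cell2_avgdir = c(NA, 324.0, 167.0, 321.2)
)
tcell <- data.frame(
  depth = c(0.2, 0.4, 0.6, 0.8),
  name = c("Cell1", "Cell2", "Cell3", "Cell4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each cell, set its columns to NA in all rows
# where the instrument depth is less than the cell depth.
mask_deeper_cells <- function(data, cells) {
  for (i in seq_len(nrow(cells))) {
    # Assumption: columns are named "<cell name>_<something>".
    cols <- grep(paste0("^", cells$name[i], "_"), names(data))
    rows <- which(data$depth < cells$depth[i])
    if (length(cols) > 0 && length(rows) > 0) {
      data[rows, cols] <- NA
    }
  }
  data
}

masked <- mask_deeper_cells(df, tcell)
print(masked)
```

Looping over the handful of cells rather than over every row keeps the number of iterations small, which may matter for a large data frame.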
I want to write a function which takes in a row at a time and:
1) grabs the instrument depth
2) gets a list of the column names
3) gets indices of cells that are deeper than the instrument depth
4) gets the names of those cells (i.e. Cell1, Cell2, Cell4, etc.)
5) uses a regular expression to find the positions, in the list of column names, of the columns belonging to those cells (e.g. Cell1_avgdir, Cell1_avgvel, etc.)
6) uses those indices to set those column values to NA.
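The six steps above can be sketched as a row function that returns the modified row. This is a minimal sketch in base R so that it runs without plyr; the name depth_na and its cells argument are hypothetical, and the pattern assumes column names start with the cell name followed by an underscore.

```r
# Example data copied from the question.
df <- data.frame(
  pt = 1:4,
  depth = c(0.1, 0.2, 0.3, 0.4),
  Cell1_avgvel = c(NA, NA, NA, 1.455),
  Cell1_avgdir = c(NA, NA, NA, 354.2),
  Cell2_avgvel = c(NA, 1.344, 0.445, 0.322),
  Cell2_avgdir = c(NA, 324.0, 167.0, 321.2)
)
tcell <- data.frame(
  depth = c(0.2, 0.4, 0.6, 0.8),
  name = c("Cell1", "Cell2", "Cell3", "Cell4"),
  stringsAsFactors = FALSE
)

# Hypothetical row function following steps 1 to 6.
depth_na <- function(row, cells) {
  # Steps 1, 3, 4: names of cells deeper than the instrument depth.
  deeper <- cells$name[cells$depth > row$depth]
  if (length(deeper) > 0) {
    # Steps 2, 5: match column names such as "Cell2_avgvel".
    patt <- paste0("^(", paste(deeper, collapse = "|"), ")_")
    cols <- grep(patt, names(row))
    # Step 6: blank the matching columns, keeping them numeric.
    if (length(cols) > 0) row[cols] <- NA_real_
  }
  row  # return the modified row explicitly
}

# Apply to each row and stack the results back together.
rows <- lapply(seq_len(nrow(df)), function(i) depth_na(df[i, ], tcell))
result <- do.call(rbind, rows)
print(result)
```

Because the function returns a one-row data frame, it should also be usable as the function given to adply, with the cell table passed as an extra argument.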
Here’s what I have so far:
depthNA = function(x) {
  depth = x$depth
  nms = names(df)
  ind = as.character(which(depth < tcell$depth))
  c = tcell$name[ind]
  patt = paste(c,collapse="|")
  c_ind = grep(patt,nms)
  x[,c_ind] <- NA
}

adply(df,1,depthNA)
Unfortunately this doesn’t do what I thought it would, and I’m now stuck trying to figure out why.
It gives me this:
  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir V1
1  1   0.1           NA           NA           NA           NA NA
2  2   0.2           NA           NA        1.344        324.0 NA
3  3   0.3           NA           NA        0.445        167.0 NA
4  4   0.4        1.455        354.2        0.322        321.2 NA
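A likely explanation for the extra V1 column, sketched in base R: a function whose last expression is an assignment returns the assigned value (here NA) rather than the modified row, so that NA is all adply has to collect for each row. Separately, subscripting an unnamed vector with character strings looks up names rather than positions and yields NA, so the cell names are never found. The names f, row and cell_names below are illustrative only.

```r
# A function that ends in an assignment returns the assigned value, not x.
f <- function(x) {
  x[, 1] <- NA
}
row <- data.frame(a = 1, b = 2)
out <- f(row)
is.data.frame(out)  # FALSE: the modified row is lost
is.na(out)          # TRUE: only the NA comes back

# Character subscripts look up names, not positions.
cell_names <- c("Cell1", "Cell2", "Cell3", "Cell4")
by_char <- cell_names[as.character(2)]  # NA, because the vector has no names
by_int <- cell_names[2]                 # "Cell2"
```

Ending the function with the row itself and indexing with the integer positions from which() would address both points.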
When what I want is:
  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA           NA           NA
3  3   0.3           NA           NA           NA           NA
4  4   0.4        1.455        354.2        0.322        321.2
Hopefully I have sufficiently explained my problem. Thanks to anyone who can either: 1) fix what I’ve started, or 2) tell me a better way to do it that I’m ignorant of.
-SH
Below is an answer that implements the steps you outline but does not match your expected output. See my comment above about whether that output is right. The answer relies on reshape2 to make joining easier. First, I read your data in with:
Then address your problem:
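The answer's original code is not preserved here, so the following is only a sketch of how a reshape2-based approach might look, not necessarily what the answer did: melt the data to long format, join the cell depths by cell name, blank the values of cells deeper than the instrument, then cast back to wide. The helper names long, cells, cell_depth and wide are assumptions, as is the column naming scheme (cell name, underscore, measurement).

```r
library(reshape2)

# Example data copied from the question.
df <- data.frame(
  pt = 1:4,
  depth = c(0.1, 0.2, 0.3, 0.4),
  Cell1_avgvel = c(NA, NA, NA, 1.455),
  Cell1_avgdir = c(NA, NA, NA, 354.2),
  Cell2_avgvel = c(NA, 1.344, 0.445, 0.322),
  Cell2_avgdir = c(NA, 324.0, 167.0, 321.2)
)
tcell <- data.frame(
  depth = c(0.2, 0.4, 0.6, 0.8),
  name = c("Cell1", "Cell2", "Cell3", "Cell4"),
  stringsAsFactors = FALSE
)

# Long format: one row per (pt, measurement column) pair.
long <- melt(df, id.vars = c("pt", "depth"))

# Assumption: the cell name is the part of the column name before "_".
long$name <- sub("_.*$", "", as.character(long$variable))

# Join the cell depths, renamed so they do not clash with the instrument depth.
cells <- data.frame(name = tcell$name, cell_depth = tcell$depth,
                    stringsAsFactors = FALSE)
long <- merge(long, cells, by = "name", all.x = TRUE)

# Blank values from cells that sit deeper than the instrument.
long$value[which(long$cell_depth > long$depth)] <- NA

# Back to wide format, one row per pt.
wide <- dcast(long, pt + depth ~ variable, value.var = "value")
print(wide)
```

With the example data this sketch blanks the Cell2 values in rows 2 and 3; whether that agrees with the answer's own result cannot be checked from what survives of it.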