I have a large data frame with 48 columns, and I want to run

Asked: June 5, 2026

I have a large data frame with 48 columns, and I want to run a function on each row so that the columns which pass a test defined by the function are set to NA. The test involves looking up a number in another data frame. adply seems like a natural fit for this, but I am having trouble getting it to give me the results I want.

Let me elucidate:

Here is an example of the data frame I want to manipulate:

> df
  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA        1.344        324.0
3  3   0.3           NA           NA        0.445        167.0
4  4   0.4        1.455        354.2        0.322        321.2

And here is the small data frame from which the test will be derived:

> tcell
  depth  name
1   0.2 Cell1
2   0.4 Cell2
3   0.6 Cell3
4   0.8 Cell4

The whole idea is to set to NA the data points of cells that lie deeper than the actual depth listed in the large data frame. For example, in the 3rd row the depth is 0.3, but there are two data points for Cell2, which is at 0.4 m depth; these are therefore errors, and I want to set them to NA.
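The rule above can also be applied without a per-row function: loop over the measurement columns and blank all affected rows of each column at once, which is vectorised over rows. A minimal base-R sketch, assuming every measurement column is named CellN_something as in the sample, and that tcell$name holds the matching CellN labels:

```r
# Sample data from the question
df <- data.frame(
  pt = 1:4,
  depth = c(0.1, 0.2, 0.3, 0.4),
  Cell1_avgvel = c(NA, NA, NA, 1.455),
  Cell1_avgdir = c(NA, NA, NA, 354.2),
  Cell2_avgvel = c(NA, 1.344, 0.445, 0.322),
  Cell2_avgdir = c(NA, 324.0, 167.0, 321.2)
)
tcell <- data.frame(
  depth = c(0.2, 0.4, 0.6, 0.8),
  name = c("Cell1", "Cell2", "Cell3", "Cell4"),
  stringsAsFactors = FALSE
)

# Measurement columns; assumes names like "Cell2_avgvel"
cell_cols <- grep("^Cell[0-9]+_", names(df), value = TRUE)
# Cell label of each column: "Cell2_avgvel" -> "Cell2"
cell_of_col <- sub("_.*$", "", cell_cols)
# Depth of that cell, looked up in tcell (NA if the cell is not listed)
cell_depth <- tcell$depth[match(cell_of_col, as.character(tcell$name))]

for (j in seq_along(cell_cols)) {
  # Rows whose instrument depth is shallower than this column's cell;
  # which() drops NA comparisons, so unlisted cells are left alone
  rows <- which(df$depth < cell_depth[j])
  df[[cell_cols[j]]][rows] <- NA
}
print(df)
```

Only the column loop runs in R code here; each column update is a single vectorised assignment, so this should scale to the 48-column frame without calling a function once per row.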

I want to write a function that takes in one row at a time and:
1) grabs the instrument depth
2) gets a list of the column names
3) gets the indices of the cells that are deeper than the instrument depth
4) gets the names of those cells (e.g. Cell1, Cell2, Cell4, etc.)
5) uses a regular expression to find where, in the list of column names, the columns belonging to those cells are (e.g. Cell1_avgdir, Cell1_avgvel, etc.)
6) uses those indices to set those column values to NA.
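The six steps above can be sketched as a per-row function. This is only a minimal sketch, assuming each call receives a one-row data frame and using the sample df and tcell from this question; the name depthNA_row is hypothetical. Compared with the attempt below, it reads the names from the row itself, indexes tcell$name by position rather than by a character string, anchors the regular expression so that Cell1 cannot also match Cell10, and returns the modified row:

```r
# Sample data from the question
df <- data.frame(
  pt = 1:4,
  depth = c(0.1, 0.2, 0.3, 0.4),
  Cell1_avgvel = c(NA, NA, NA, 1.455),
  Cell1_avgdir = c(NA, NA, NA, 354.2),
  Cell2_avgvel = c(NA, 1.344, 0.445, 0.322),
  Cell2_avgdir = c(NA, 324.0, 167.0, 321.2)
)
tcell <- data.frame(
  depth = c(0.2, 0.4, 0.6, 0.8),
  name = c("Cell1", "Cell2", "Cell3", "Cell4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: takes a one-row data frame, returns it with
# the columns of too-deep cells set to NA
depthNA_row <- function(x, cells = tcell) {
  depth <- x$depth                                # 1) instrument depth
  nms <- names(x)                                 # 2) column names
  deeper <- which(cells$depth > depth)            # 3) positions of deeper cells
  if (length(deeper) == 0) return(x)              # nothing to blank
  cell_names <- as.character(cells$name[deeper])  # 4) e.g. "Cell2", "Cell3"
  # 5) anchored pattern, so "Cell1" cannot also match "Cell10_avgvel"
  patt <- paste0("^(", paste(cell_names, collapse = "|"), ")_")
  c_ind <- grep(patt, nms)
  if (length(c_ind) > 0) x[, c_ind] <- NA         # 6) blank those columns
  x                                               # return the row itself
}

# Apply row by row and stack the results (base R, no plyr needed)
rows <- lapply(seq_len(nrow(df)), function(i) depthNA_row(df[i, , drop = FALSE]))
fixed <- do.call(rbind, rows)
print(fixed)
```

Because the function returns a data frame with the same columns it was given, passing it to plyr as adply(df, 1, depthNA_row) should give the same table without an extra V1 column; the sketch stacks the rows with base rbind so that it runs without plyr.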

Here’s what I have so far:

depthNA = function(x) {
  depth = x$depth
  nms = names(df)
  ind = as.character(which(depth < tcell$depth))
  c = tcell$name[ind]
  patt = paste(c,collapse="|")
  c_ind = grep(patt,nms)
  x[,c_ind] <- NA
}

adply(df,1,depthNA)

Unfortunately this doesn’t do what I thought it would, and I’m now stuck trying to figure out why.

It gives me this:

  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir V1
1  1   0.1           NA           NA           NA           NA NA
2  2   0.2           NA           NA        1.344        324.0 NA
3  3   0.3           NA           NA        0.445        167.0 NA
4  4   0.4        1.455        354.2        0.322        321.2 NA

When what I want is:

  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA           NA           NA
3  3   0.3           NA           NA           NA           NA
4  4   0.4        1.455        354.2        0.322        321.2
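The V1 column can be traced to two things in the function above. First, its last expression is an assignment, and in R an assignment evaluates to the value assigned, so every call returns NA rather than x; adply turns that single value per row into a new column, V1. Second, which() returns positions, but as.character() turns them into strings such as "2", and indexing the unnamed tcell$name with a string looks the element up by name, finds none, and returns NA. The pattern then becomes "NA|NA|...", which matches no column, so nothing is blanked. Ending the function with a bare x and indexing tcell$name with the integer positions from which() addresses both. A small stand-alone demonstration on toy values (not the real data):

```r
# A function whose last expression is an assignment returns the
# assigned value (NA here), not the modified data frame
f <- function(x) {
  x[, 1] <- NA
}
res <- f(data.frame(a = 1:2))
print(res)

# Character indices look elements up by name; this vector has no names,
# so every lookup gives NA
cells <- c("Cell1", "Cell2", "Cell3", "Cell4")
ind <- as.character(which(0.3 < c(0.2, 0.4, 0.6, 0.8)))
picked <- cells[ind]
print(picked)

# The regex built from those NAs matches none of the column names
patt <- paste(picked, collapse = "|")
print(patt)
print(grep(patt, c("pt", "depth", "Cell1_avgvel", "Cell2_avgdir")))
```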

Hopefully I have explained my problem sufficiently. Thanks to anyone who can either 1) fix what I’ve started or 2) tell me a better way to do it that I’m not aware of.

-SH


1 Answer

Editorial Team · Answer 1 · 2026-06-05T05:35:52+00:00

Below is an approach that implements what your ideas outline, although its output does not match yours. See my comment above about whether the output is right or not. It relies on reshape2 to make the joining easier.

First, I read your data in with:

df <- read.table(text = "  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA        1.344        324.0
3  3   0.3           NA           NA        0.445        167.0
4  4   0.4        1.455        354.2        0.322        321.2", header = TRUE)

tcell <- read.table(text = " depth  name
1   0.2 Cell1
2   0.4 Cell2
3   0.6 Cell3
4   0.8 Cell4", header = TRUE)

Then address your problem:

library(reshape2)

#Melt into long format
df.m <- melt(df, id.vars = 1:2)
#Split the column into two new columns based on _
df.m[, c("Cell", "OtherCol")] <- with(df.m, colsplit(variable, "_", c("Cell", "OtherCol")))
#Merge together with tcell
df.m <- merge(df.m, tcell, by.x = "Cell", by.y = "name")
#Add a new column which sets the offending values to NA
df.m <- transform(df.m, newvalue = ifelse(value > depth.y, NA, value))
#Cast back into wide format
dcast(pt + depth.x ~ variable, value.var = "newvalue", data = df.m)

  pt depth.x Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1     0.1           NA           NA           NA           NA
2  2     0.2           NA           NA           NA           NA
3  3     0.3           NA           NA           NA           NA
4  4     0.4           NA           NA        0.322           NA
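In the code above, the transform() line compares each measurement (value) with the cell depth (depth.y), so large velocities and directions are blanked regardless of where the instrument sits. If the intended test is instrument depth versus cell depth, as the question describes, comparing depth.x with depth.y instead should reproduce the table the question asks for. A sketch of that one-line change, rerunning the same reshape2 pipeline on the question's sample data (the as.character() call is a defensive addition, not part of the original answer):

```r
library(reshape2)

# Sample data from the question
df <- data.frame(
  pt = 1:4,
  depth = c(0.1, 0.2, 0.3, 0.4),
  Cell1_avgvel = c(NA, NA, NA, 1.455),
  Cell1_avgdir = c(NA, NA, NA, 354.2),
  Cell2_avgvel = c(NA, 1.344, 0.445, 0.322),
  Cell2_avgdir = c(NA, 324.0, 167.0, 321.2)
)
tcell <- data.frame(
  depth = c(0.2, 0.4, 0.6, 0.8),
  name = c("Cell1", "Cell2", "Cell3", "Cell4"),
  stringsAsFactors = FALSE
)

# Melt into long format, keeping pt and depth as id columns
df.m <- melt(df, id.vars = 1:2)
# Split e.g. "Cell2_avgvel" into Cell = "Cell2" and OtherCol = "avgvel"
df.m[, c("Cell", "OtherCol")] <- with(df.m,
  colsplit(as.character(variable), "_", c("Cell", "OtherCol")))
# Merge with tcell: depth.x is the instrument depth, depth.y the cell depth
df.m <- merge(df.m, tcell, by.x = "Cell", by.y = "name")
# Changed line: blank values where the instrument is shallower than the cell
df.m <- transform(df.m, newvalue = ifelse(depth.x < depth.y, NA, value))
# Cast back into wide format, one row per pt
out <- dcast(pt + depth.x ~ variable, value.var = "newvalue", data = df.m)
print(out)
```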
