I have a data frame df with 2 variables A and B. I would

Asked: May 28, 2026 (2026-05-28T03:38:11+00:00)

I have a data frame df with 2 variables, A and B. I would like to split the rows into groups 1 and 2 based on A, so that mean(df$B[df$group==1]) is as close as possible to mean(df$B[df$group==2]).

Put another way, I would like to find a cut point (cutp) in df$A that minimizes abs(mean(df$B[df$A<cutp])-mean(df$B[df$A>=cutp])).

Any ideas?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T03:38:12+00:00

To find a threshold on A that splits the data into two groups whose means of B are as similar as possible, you can compute both means for every candidate cut point (each observed value of A) and keep the one where the absolute difference between the means is smallest.

# Sample data (seeded so the result is reproducible)
set.seed(1)
n <- 10
d <- data.frame(
  A = rnorm(n),
  B = rnorm(n)
)

# The quantity to minimize
# (You can use a loop instead of sapply.)
# Iterating over d$A directly avoids apply() coercing each row of the
# data frame, which breaks if d also contains non-numeric columns.
d$differences <- sapply(
  d$A,
  # Compute the difference of the means for each value of A used as threshold
  function(u) {
    i <- d$A <= u
    abs(mean(d$B[which(i)]) - mean(d$B[which(!i)]))
  }
)
# The mean of an empty vector is NaN: discard those values
d$differences[!is.finite(d$differences)] <- Inf
# Take the minimum
threshold <- d$A[which.min(d$differences)]
# Build the groups (group 1 is A <= threshold, group 2 is A > threshold)
d$group <- ifelse(d$A <= threshold, "group 1", "group 2")
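The search above recomputes both means for every candidate, which is fine for small data. For larger data frames, one possible variant is to sort by A once and get all the group means from cumulative sums. The sketch below is only an illustration: the function name find_cutpoint is made up here, it assumes A and B contain no missing values and that A has at least two distinct values, and it follows the convention from the question (group 1 is A < cutp, group 2 is A >= cutp) rather than the <= used above.

```r
# Hypothetical helper (not from the original answer): find the cut point in A
# that minimizes the absolute difference between the group means of B.
# Assumes no NA values and at least two distinct values in A.
find_cutpoint <- function(A, B) {
  stopifnot(length(A) == length(B), length(unique(A)) >= 2)
  o <- order(A)
  A <- A[o]
  B <- B[o]
  n <- length(A)
  k <- seq_len(n - 1)                       # size of the left group
  left_sum   <- cumsum(B)[k]
  left_mean  <- left_sum / k
  right_mean <- (sum(B) - left_sum) / (n - k)
  diffs <- abs(left_mean - right_mean)
  # A split between sorted positions k and k + 1 only exists if A[k] < A[k + 1]
  diffs[A[k] == A[k + 1]] <- Inf
  best <- which.min(diffs)                  # first minimum if there are ties
  # Group 1 is A < cutp, group 2 is A >= cutp
  list(cutp = A[best + 1], difference = diffs[best])
}

# Usage on simulated data
set.seed(1)
df <- data.frame(A = rnorm(10), B = rnorm(10))
res <- find_cutpoint(df$A, df$B)
df$group <- ifelse(df$A < res$cutp, 1, 2)
# This should match res$difference up to floating-point rounding
abs(mean(df$B[df$group == 1]) - mean(df$B[df$group == 2]))
```

Sorting costs O(n log n) and the rest is linear, whereas the direct search does O(n) work per candidate. If several cut points give the same minimal difference, which.min returns the first one in sorted order, so the smallest such cut point is reported.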
