I have a data frame df with 2 variables, A and B. I would like to split the rows into groups 1 and 2 based on A, so that mean(df$B[df$group==1]) is as close as possible to mean(df$B[df$group==2]).
To put it another way, I would like to find a cut point (cutp) in df$A that minimizes abs(mean(df$B[df$A<cutp])-mean(df$B[df$A>=cutp])).
Any ideas?
If you want to find a threshold on variable A that splits the data into two groups so that the means of B in those two groups are as close as possible, you can compute these means for every possible cut-point and pick the one where the absolute difference between them is smallest.
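A minimal sketch of that brute-force search, assuming df has numeric columns A and B with no missing values and at least two distinct values of A. The data here are made up for illustration, and the names candidates and gap are mine, not from the question:

```r
# Hypothetical example data; the names df, A, B follow the question
set.seed(1)
df <- data.frame(A = rnorm(100), B = rnorm(100))

# Candidate cut-points: every distinct value of A except the smallest,
# so that both groups (A < cutp and A >= cutp) are non-empty
candidates <- sort(unique(df$A))[-1]

# Absolute difference between the two group means for each candidate
gap <- sapply(candidates, function(cutp) {
  abs(mean(df$B[df$A < cutp]) - mean(df$B[df$A >= cutp]))
})

# Keep the cut-point with the smallest difference and label the groups
cutp <- candidates[which.min(gap)]
df$group <- ifelse(df$A < cutp, 1, 2)
```

Note that the best cut-point may leave very few observations in one of the groups, so it is probably worth checking table(df$group) before using the result.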