First without the details I have data.frame s like that one: val1 val2 val3


Editorial Team

Asked: May 24, 2026 (2026-05-24T17:23:34+00:00)


First without the details

I have data.frames like this one:

  val1 val2 val3 val4 val5
1  1.1    2  1.1  2.1  4.2
2  5.7    5  5.6  4.9  9.9
3  3.1    3  3.2  2.9  5.9
4  9.6    1  9.5  1.0  2.0

and want to find the (nearly) equal columns. The desired result would be something like

[1] "val1" "val2" "val5"

because the column val3 is almost equal to val1, val4 is almost equal to val2 and val5 is different.

Details:

- What does “nearly” equal mean? Any one of the options below would do:
  - the absolute difference of the values is smaller than a fixed number (0.2 for the sample above)
  - the relative difference of the values is smaller than a fixed number (~11% for the sample)
  - other metrics which make sense 😉
- A listing of linearly dependent columns would be even better, though I think that’s way more complicated. It would mean that val5 is also part of the group formed by val2 and val4, since it’s roughly twice their value.
- It does not have to be really fast; O(n^2) would be okay. My frames are only about 12 rows and 300 columns.
- If that is not possible, a list of exactly equal columns would work too; I would then apply the round() function first.

1 Answer

Editorial Team · Answer 1 · 2026-05-24T17:23:35+00:00

It’s not quite well-defined how to choose which columns are equal; for instance, you could have three columns where A and B are “equal” and B and C are “equal” but A and C are not. What to do then? One way around that might be to use hierarchical clustering, maybe like this:

Using the data from Andrie’s answer, first transpose it and make it into a matrix. I’ll also standardize each row (what was a column) by dividing it by its sum, as a start at finding linear combinations; this will group rows that are exact multiples of each other, but not more complex combinations.
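Andrie’s answer is not reproduced on this page. As a stand-in, here is a minimal sketch that rebuilds `d` from the sample frame in the question; this is an assumption, although the column sums printed further down are consistent with it.

```r
# Assumed reconstruction of the sample frame from the question
# (a stand-in for the data in Andrie's answer, which is not shown here).
d <- data.frame(
  val1 = c(1.1, 5.7, 3.1, 9.6),
  val2 = c(2.0, 5.0, 3.0, 1.0),
  val3 = c(1.1, 5.6, 3.2, 9.5),
  val4 = c(2.1, 4.9, 2.9, 1.0),
  val5 = c(4.2, 9.9, 5.9, 2.0)
)
```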

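# transpose so that each original column becomes a row
d <- t(as.matrix(d))
# row sums, i.e. the sums of the original columns
s <- rowSums(d)
# divide each row by its sum
ds <- sweep(d, 1, s, `/`)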

We now make a tree, and for interest, plot it. This uses the default distance function (Euclidean) but others are possible.

tree <- hclust(dist(ds))
plot(tree)

[Figure: plot of tree from hclust]
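As one example of a different distance function, the question’s absolute-difference criterion could be sketched with the "maximum" distance on the unstandardized columns. This is a variation on the approach above, not part of it; the names `d0`, `m`, `tree_abs`, `grp_abs` and `keep` are made up for illustration, and the data is the sample frame from the question.

```r
# Sample frame from the question (rebuilt here so the block runs on its own).
d0 <- data.frame(
  val1 = c(1.1, 5.7, 3.1, 9.6),
  val2 = c(2.0, 5.0, 3.0, 1.0),
  val3 = c(1.1, 5.6, 3.2, 9.5),
  val4 = c(2.1, 4.9, 2.9, 1.0),
  val5 = c(4.2, 9.9, 5.9, 2.0)
)

# One row per original column, no standardization this time.
m <- t(as.matrix(d0))

# "maximum" distance = largest absolute difference between two columns.
# With complete linkage, cutting at height h leaves only groups in which
# every pair of members is within h of each other.
tree_abs <- hclust(dist(m, method = "maximum"), method = "complete")
grp_abs  <- cutree(tree_abs, h = 0.2)

# Keep the first column of each group as its representative.
keep <- names(grp_abs)[!duplicated(grp_abs)]
print(keep)
```

On the question’s sample this keeps val1, val2 and val5, which is the result the question asks for. Because nothing is standardized here, val5 ends up in its own group rather than with val2 and val4.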

We then choose where to cut the tree into groups (this is where you choose how close two columns have to be to be “equal”); I output the groups together with the sums of the values, to see whether any columns are multiples of another.

> grp <- cutree(tree, h=0.1)
> cbind(grp, s)

     grp    s
val1   1 19.5
val2   2 11.0
val3   1 19.4
val4   2 10.9
val5   2 22.0
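To get from the group numbers to a list of column names, as in the question’s desired output, one option is to keep the first member of each group. A minimal sketch, with `grp` typed in by hand from the table above so that it runs on its own; `keep` and `members` are made-up names:

```r
# Group assignment copied from the cutree() output above.
grp <- c(val1 = 1L, val2 = 2L, val3 = 1L, val4 = 2L, val5 = 2L)

# First column of each group serves as its representative.
keep <- names(grp)[!duplicated(grp)]

# All members of each group, as a list keyed by group number.
members <- split(names(grp), grp)

print(keep)
print(members)
```

Here val5 lands in the same group as val2 and val4, because dividing by the sums makes columns that are multiples of each other look alike; that is the “linearly dependent” grouping from the question, so only val1 and val2 are kept.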
