The Archive Base

Editorial Team
Asked: May 24, 2026

First without the details

I have data.frames like that one:

  val1 val2 val3 val4 val5
1  1.1    2  1.1  2.1  4.2
2  5.7    5  5.6  4.9  9.9
3  3.1    3  3.2  2.9  5.9
4  9.6    1  9.5  1.0  2.0

and want to find the (nearly) equal columns. The desired result would be something like

[1] "val1" "val2" "val5"

because the column val3 is almost equal to val1, val4 is almost equal to val2 and val5 is different.

Details:

  • What “nearly equal” means (any one of the options below would do):
    • the absolute difference of the values is smaller than a fixed number (0.2 for the sample above)
    • the relative difference of the values is smaller than a fixed number (~11% for the sample)
    • other metrics which make sense 😉
  • A listing of linearly dependent columns would be even better (but I think that’s way more complicated). That would mean that val5 is also part of the group formed by val2 and val4, since it is roughly twice their value.
  • It does not have to be really fast; O(n^2) would be okay (my frames have only about 12 rows and 300 columns).
  • If that is not possible, a list of exactly equal columns would work too; I would then apply the round() function first.

1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 5:23 pm

    It’s not quite well-defined how to choose which columns are equal; for instance, you could have three columns where A and B are “equal” and B and C are “equal” but A and C are not. What to do then? One way around that might be to use hierarchical clustering, maybe like this:

    Using the data from Andrie’s answer, first transpose it and make it into a matrix, so that each original column becomes a row. I’ll also standardize each row by dividing it by its sum, as a start at finding linear combinations; this will group rows that are exact multiples of each other, but not more complex combinations.

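    d <- t(as.matrix(d))        # transpose: one row per original column
    s <- rowSums(d)             # total of each original column
    ds <- sweep(d, 1, s, `/`)   # divide each row by its total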
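
A self-contained sketch of this step, assuming `d` is the sample data frame from the question. The names `m`, `tot`, `ms`, and `same_shape` are introduced here only for illustration, so that `d` is not overwritten. It also checks the claim about exact multiples: a column and twice that column standardize to the same vector.

```r
# Assumption: the sample data frame from the question.
d <- data.frame(
  val1 = c(1.1, 5.7, 3.1, 9.6),
  val2 = c(2.0, 5.0, 3.0, 1.0),
  val3 = c(1.1, 5.6, 3.2, 9.5),
  val4 = c(2.1, 4.9, 2.9, 1.0),
  val5 = c(4.2, 9.9, 5.9, 2.0)
)

m   <- t(as.matrix(d))         # one row per original column
tot <- rowSums(m)              # total of each original column
ms  <- sweep(m, 1, tot, `/`)   # every row now sums to 1

# An exact multiple standardizes to the same vector as the original.
x <- d$val2
same_shape <- isTRUE(all.equal(x / sum(x), (2 * x) / sum(2 * x)))
```

Dividing by the sum only removes a common scale factor, which is why columns related by an offset or by a combination of several other columns are not detected this way.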
    

    We now make a tree, and for interest, plot it. This uses the default distance function (Euclidean) but others are possible.

    tree <- hclust(dist(ds))
    plot(tree)
    

    [Figure: dendrogram of the tree from hclust]
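
As one example of a different distance function, the sketch below uses the "maximum" (Chebyshev) distance on the unstandardized values, which corresponds to the absolute-difference criterion from the question. This is a variation on the approach above, not part of the original answer; the data frame is assumed to be the question's sample, and `tree_abs`, `grp_abs`, and `keep` are illustrative names.

```r
# Assumption: the sample data frame from the question.
d <- data.frame(
  val1 = c(1.1, 5.7, 3.1, 9.6),
  val2 = c(2.0, 5.0, 3.0, 1.0),
  val3 = c(1.1, 5.6, 3.2, 9.5),
  val4 = c(2.1, 4.9, 2.9, 1.0),
  val5 = c(4.2, 9.9, 5.9, 2.0)
)

m <- t(as.matrix(d))   # one row per original column

# "maximum" distance: the largest absolute difference between two
# columns over all rows of the original data frame.
tree_abs <- hclust(dist(m, method = "maximum"))  # complete linkage (default)

# Cut at the tolerance from the question (0.2).
grp_abs <- cutree(tree_abs, h = 0.2)

# Keep the first column of each group.
keep <- names(grp_abs)[!duplicated(grp_abs)]
keep
# [1] "val1" "val2" "val5"
```

With complete linkage, every pair of columns inside a group cut at height `h` differs by at most `h`, which sidesteps the "A equals B, B equals C, but A does not equal C" problem by never placing A and C in the same group.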

    We then choose where to cut the tree into groups (this is where you choose how close two columns have to be to count as “equal”); I output the groups together with the column sums to see whether any columns are multiples of another.

    > grp <- cutree(tree, h=0.1)
    > cbind(grp, s)
    
         grp    s
    val1   1 19.5
    val2   2 11.0
    val3   1 19.4
    val4   2 10.9
    val5   2 22.0
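
To turn the group numbers into lists of column names, something like the following sketch could be used. It repeats the steps above on the question's sample data (assumed here) so that it runs on its own; `groups` and `reps` are illustrative names.

```r
# Assumption: the sample data frame from the question.
d <- data.frame(
  val1 = c(1.1, 5.7, 3.1, 9.6),
  val2 = c(2.0, 5.0, 3.0, 1.0),
  val3 = c(1.1, 5.6, 3.2, 9.5),
  val4 = c(2.1, 4.9, 2.9, 1.0),
  val5 = c(4.2, 9.9, 5.9, 2.0)
)

m   <- t(as.matrix(d))                # one row per original column
ms  <- sweep(m, 1, rowSums(m), `/`)   # standardize each row by its sum
grp <- cutree(hclust(dist(ms)), h = 0.1)

groups <- split(names(grp), grp)       # members of each group
reps   <- names(grp)[!duplicated(grp)] # first column of each group
```

Because the rows were standardized, val5 lands in the same group as val2 and val4 (it is roughly twice their value), so `reps` contains only val1 and val2 rather than the three names asked for under the absolute-difference criterion.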
    