Editorial Team
Asked: May 26, 2026

I have two sets of data in this form: x | y | z


I have two sets of data in this form:

x   |  y  |  z        x1   |   y1   |  z1
ab1 |  1  |  2        ab1  |   1    |  2
ab1 |  2  |  3        ab1  |   1.8  |  2
ab2 |  2  |  3        ab1  |   1.8  |  2

The number of columns can vary from 1 to 30. The two sets are likely to have different numbers of rows.
The number of rows per set can range from a few hundred to a few million.
A different matching rule will be applied to each column, for example:

x: perfect match
y: +/- 0.1
z: +/- 0.5
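As a rough illustration of the rules above, here is a minimal Java sketch of a row comparison. The class name `RowMatcher`, the representation of a row as a `List` of cells, and the small epsilon used to absorb floating-point rounding are all assumptions made for this example, not part of the original question.

```java
import java.util.List;

/** Hypothetical helper: compares two rows column by column. */
final class RowMatcher {
    // Assumed epsilon so that a difference of exactly 0.1 is not lost to rounding.
    private static final double EPS = 1e-9;

    /** Numeric cells match within the tolerance; any other cell needs a perfect match. */
    static boolean cellMatches(Object a, Object b, double tolerance) {
        if (a instanceof Number && b instanceof Number) {
            double diff = Math.abs(((Number) a).doubleValue() - ((Number) b).doubleValue());
            return diff <= tolerance + EPS;
        }
        return a.equals(b);
    }

    /** Two rows are equivalent only when every column satisfies its own rule. */
    static boolean rowsMatch(List<?> a, List<?> b, double[] tolerances) {
        if (a.size() != b.size() || a.size() != tolerances.length) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!cellMatches(a.get(i), b.get(i), tolerances[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With tolerances `{0.0, 0.1, 0.5}`, the first rows of the two sample sets match, while `ab1 | 2 | 3` does not match `ab1 | 1.8 | 2` because the y values differ by 0.2.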

Two rows are equivalent when all the criteria are satisfied.
My final goal is to find the rows in the first set that have no match in the second set.

The naive algorithm could be:

foreach a in SetA
{
    matched = false
    foreach b in SetB
    {
        if (a matches b)    // every per-column rule holds
        {
            remove b from SetB
            matched = true
            break           // process the next element in SetA
        }
    }
    if (not matched)
        log a is not in SetB
}

At this stage I am not very interested in the efficiency of the algorithm; I am sure I could do better and reduce its complexity.
I am more concerned about the correctness of the result. Let’s try a very simple example.
Two sets of numbers:

A       B
1.6    1.55
1.5    1.45
4      3.2

And two elements are equal if:

b + 0.1 >= a >= b - 0.1

Now, if I run the naive algorithm I will find 2 matches (1.6 with 1.55, then 1.5 with 1.45).
However, the result of the algorithm depends on the order of the two sets. For example:

A       B
1.5    1.55
1.6    1.45
4      3.2

The algorithm will find only one match.
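The order dependence can be reproduced with a short Java sketch of the naive first-fit scan. The class and method names are made up for this illustration, and the tolerance is hard-coded to the +/- 0.1 rule of the example.

```java
import java.util.ArrayList;
import java.util.List;

final class GreedyDemo {
    /** Counts the matches found by the naive scan: each a takes the first remaining b that fits. */
    static int greedyMatches(double[] a, double[] b) {
        List<Double> remaining = new ArrayList<>();
        for (double v : b) {
            remaining.add(v);
        }
        int matches = 0;
        for (double x : a) {
            for (int j = 0; j < remaining.size(); j++) {
                if (Math.abs(x - remaining.get(j)) <= 0.1) {
                    remaining.remove(j); // remove by index: this b is now used up
                    matches++;
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        double[] b = {1.55, 1.45, 3.2};
        System.out.println(greedyMatches(new double[] {1.6, 1.5, 4}, b)); // prints 2
        System.out.println(greedyMatches(new double[] {1.5, 1.6, 4}, b)); // prints 1
    }
}
```

In the second ordering, 1.5 takes 1.55 first, which leaves 1.6 with only 1.45 and 3.2 to choose from, and neither is within 0.1.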

I would like to find the maximum number of matching rows.

I reckon that in real-world data one of the columns will store an id, so the rows that can match more than one candidate will form a much smaller subset of the original set.
I know I could tackle this problem with a post-processing step after the first scan.
However, I don’t want to reinvent the wheel, and I am wondering whether my problem is equivalent to some famous, well-known and already solved problem.
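If an exact-match id column does exist, one possible way to exploit it is to bucket the rows by id so that only rows sharing an id are ever compared. The sketch below assumes rows are `String[]` arrays with the id in column 0; both choices, and the name `IdBuckets`, are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IdBuckets {
    /** Groups row indices by the value of the exact-match column (assumed to be column 0). */
    static Map<String, List<Integer>> groupById(List<String[]> rows) {
        Map<String, List<Integer>> buckets = new HashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            buckets.computeIfAbsent(rows.get(i)[0], k -> new ArrayList<>()).add(i);
        }
        return buckets;
    }
}
```

For the first sample set this gives `ab1 -> [0, 1]` and `ab2 -> [2]`, so a row with id `ab2` would never need to be compared against the `ab1` rows of the other set.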

PS: I have tagged the question also as C++, C# and Java because I am going to use one of these languages to implement it.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 12:05 pm

    It can be cast as a graph theory problem. Let X be a set that contains one node for each row in your first set. Let Y be another set which contains one node for each row in your second set.

    The edges in the graph are defined by: for a given x in X and a given y in Y, there is an edge (x,y) if the row corresponding to x matches the row corresponding to y.

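    Once you have built this graph, which is bipartite because every edge joins a node of X to a node of Y, you can run a maximum bipartite matching algorithm on it. The nodes of X left unmatched correspond to the rows of the first set with no match in the second set.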
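One way to sketch this in Java is the simple augmenting-path approach often called Kuhn's algorithm; the answer does not name a specific algorithm, so this choice is an assumption. The class name `BipartiteMatcher` and the adjacency-list input (for each row of the first set, the indices of the second-set rows it matches) are also hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BipartiteMatcher {
    private final List<List<Integer>> adj; // adj.get(a): rows of the second set that row a matches
    private final int[] matchOfB;          // matchOfB[b]: row of the first set paired with b, or -1

    BipartiteMatcher(List<List<Integer>> adj, int sizeB) {
        this.adj = adj;
        this.matchOfB = new int[sizeB];
        Arrays.fill(matchOfB, -1);
    }

    /** Looks for an augmenting path starting at row a of the first set. */
    private boolean augment(int a, boolean[] visited) {
        for (int b : adj.get(a)) {
            if (visited[b]) {
                continue;
            }
            visited[b] = true;
            // Take b if it is free, or if its current partner can be moved elsewhere.
            if (matchOfB[b] == -1 || augment(matchOfB[b], visited)) {
                matchOfB[b] = a;
                return true;
            }
        }
        return false;
    }

    /** Returns the indices of first-set rows left unmatched by a maximum matching. */
    List<Integer> unmatchedRowsOfA() {
        List<Integer> unmatched = new ArrayList<>();
        for (int a = 0; a < adj.size(); a++) {
            if (!augment(a, new boolean[matchOfB.length])) {
                unmatched.add(a);
            }
        }
        return unmatched;
    }
}
```

For the second ordering in the question (A = 1.5, 1.6, 4 and B = 1.55, 1.45, 3.2) the adjacency lists are `[0, 1]`, `[0]` and `[]`, and the only unmatched row is index 2 (the value 4), so both possible matches are found regardless of order. This version is recursive and runs in O(V·E) time, so for sets with millions of rows an iterative variant or the Hopcroft-Karp algorithm would probably be a better fit.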


© 2021 The Archive Base. All Rights Reserved