Editorial Team
Asked: June 16, 2026

I am having trouble developing a matching algorithm in SQL. I have one table, subjects. Each subject needs to be matched to the same number of rows in the table controls (for the sake of this question, let's say two controls need to be selected for each subject). The location of the selected controls must match the subject's location exactly, and the controls selected should have a value in match_field that is as close as possible to the subject's.

Here is some sample data:

Table subjects:

id   location    match_field
1    1           190
2    2           2000
3    1           100

Table controls:

id   location    match_field
17   1           70
11   1           180
12   1           220
13   1           240
14   1           500
15   1           600
16   1           600
10   2           30
78   2           1840
79   2           2250

Here would be the optimum result from the sample data:

subject_id control_id  location    match_field_diff
1          12          1           30
1          13          1           50
2          78          2           160
2          79          2           250
3          17          1           30
3          11          1           80

It gets tricky, because, for example, control 11 is the closest match to subject 1. However, in the optimum solution control 11 is matched to subject 3.
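To make the difficulty concrete, here is a small Python sketch (not part of the original question; the name `greedy_total` and the dictionaries are made up for illustration). It assumes a naive strategy that walks the location-1 subjects in id order and hands each one its two nearest unused controls, then compares the total difference with the location-1 rows of the optimum above.

```python
# Location-1 sample data from the tables above: id -> match_field
subjects = {1: 190, 3: 100}
controls = {17: 70, 11: 180, 12: 220, 13: 240, 14: 500, 15: 600, 16: 600}

def greedy_total(subjects, controls, per_subject=2):
    """Give each subject, in id order, its nearest still-unused controls."""
    unused = dict(controls)
    total = 0
    picks = {}
    for sid in sorted(subjects):
        value = subjects[sid]
        # Rank the remaining controls by distance (control id breaks ties).
        nearest = sorted(unused, key=lambda cid: (abs(unused[cid] - value), cid))[:per_subject]
        picks[sid] = nearest
        total += sum(abs(unused[cid] - value) for cid in nearest)
        for cid in nearest:
            del unused[cid]
    return total, picks

greedy_cost, greedy_picks = greedy_total(subjects, controls)
optimum_cost = 30 + 50 + 30 + 80  # location-1 rows of the optimum result above
print(greedy_cost, greedy_picks)  # 210 {1: [11, 12], 3: [17, 13]}
print(optimum_cost)               # 190
```

Under these assumptions the greedy pass gives subject 1 its best control (11) but leaves subject 3 with control 13, so the total is 210 instead of 190.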

I believe the Hungarian Algorithm is close to the “correct” solution to this problem. However, there is not an equal number of subjects and controls, nor will all controls be used (I have a few thousand subjects and a few million potential controls).

It is not necessary to obtain the absolute optimum results; a pretty good approximation would be fine with me.

It seems that there should be a nice set-based solution to this problem, but I can’t think of how to do it. Here is some code that assigns an equal number of controls to each subject based on location only:

select * from (
    select   subject.id as subject_id,
             control.id as control_id,
             subject.location,
             -- number every subject/control pair within a location
             row_number() over (
                 partition by subject.location
                 order by subject.id, control.id
             ) as rn,
             count(distinct control.id) over (
                 partition by subject.location
             ) as controls_in_loc
         -- alias the tables so that subject.* and control.* resolve
         from subjects subject
         join controls control on control.location = subject.location
    ) matched
    where mod(rn, controls_in_loc + 1) = 1

However, I can’t figure out how to add the fuzzy matching component. I am using DB2 but can convert an algorithm into DB2 if you are using something else.

Thanks in advance for your help!

Update: I am mostly convinced that SQL is not the right tool for this job. However, just to be sure (and because it is an interesting problem), I am offering a bounty to see if a working SQL solution is possible. It needs to be a set-based solution. It can use iteration (looping over the same query multiple times to achieve the result), but the number of iterations needs to be far less than the number of rows for a large table. It should not loop over each element in the table or use cursors.

1 Answer

    Editorial Team
    Added an answer on June 16, 2026 at 12:20 am

    Although the Hungarian Algorithm would work, a much simpler algorithm can be used in your case. Your implicit cost matrix has a special form: the cost of pairing a subject with a control is the absolute difference of two scalars,

    ABS(SUBJ.match_field-CTRL.match_field)
    

    Therefore, you can prove relatively easily that there is always an optimal assignment {SUBJ_i, CTRL_j} in which, when the pairs are ordered by SUBJ.match_field, the values of CTRL.match_field are ordered as well.

    Proof: Consider an assignment {SUBJ_i, CTRL_j} ordered by SUBJ.match_field that is not ordered by CTRL.match_field. Then it has at least one inversion, i.e. a pair of assignments {SUBJ_i1, CTRL_j1} and {SUBJ_i2, CTRL_j2} such that

    SUBJ_i1.match_field < SUBJ_i2.match_field, but

    CTRL_j1.match_field > CTRL_j2.match_field

    Then you can replace the inverted pair with a non-inverted one

    {SUBJ_i1, CTRL_j2} and {SUBJ_i2, CTRL_j1}

    whose cost is less than or equal to the cost of the inverted pair for all six relative placements of SUBJ.match_field(i1, i2) and CTRL.match_field(j1, j2) (link to Wolfram Alpha). Repeating the swap removes every inversion without increasing the total cost. End of proof.
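The exchange step can also be checked numerically. The following Python sketch is an illustration only (the function name `swap_never_hurts` and the small integer grid are assumptions, not part of the original answer): it enumerates every inverted pair on the grid and confirms that un-inverting it never increases the summed absolute difference.

```python
from itertools import product

def swap_never_hurts(limit=12):
    """Check |s1-c2| + |s2-c1| <= |s1-c1| + |s2-c2| whenever s1 < s2 and c1 > c2."""
    for s1, s2, c1, c2 in product(range(limit), repeat=4):
        if s1 < s2 and c1 > c2:
            inverted = abs(s1 - c1) + abs(s2 - c2)    # pairs (s1, c1), (s2, c2)
            uninverted = abs(s1 - c2) + abs(s2 - c1)  # pairs (s1, c2), (s2, c1)
            if uninverted > inverted:
                return False
    return True

print(swap_never_hurts())
```

A grid check is not a proof, but it covers all six relative placements of the four values many times over.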

    With this observation in hand, it is easy to prove that the dynamic programming algorithm below comes up with the optimal assignment:

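    • Make N duplicates of each subject; order by match_field
    • Order controls by match_field
    • Prepare an empty array assignments of size N * subject.SIZE
    • Prepare an empty 2D array mem of size N * subject.SIZE by control.SIZE for memoization; set all elements to -1
    • Call Recursive_Assign defined in pseudocode below with sp = 0 and cp = 0; it fills mem and returns the optimal total cost
    • Call Fill_Assignments to read the matching off mem
    • The assignments array now contains N assignments for each subject i (numbered in match_field order) at positions between N*i, inclusive, and N*(i+1), exclusive.

    FUNCTION Recursive_Assign
        // subjects contains each original subj repeated N times
        PARAM subjects : array of int[subjectLength]
        PARAM controls: array of int[controlLength]
        PARAM mem : array of int[subjectLength,controlLength]
        PARAM sp : int // current subject position
        PARAM cp : int // current control position
    BEGIN
        IF sp == subjects.Length THEN RETURN 0 ENDIF
        // mem starts at -1, and a cost of 0 is a valid memoized value
        IF mem[sp, cp] >= 0 THEN RETURN mem[sp, cp] ENDIF
        // Option 1: match subject sp with control cp
        int res = ABS(subjects[sp] - controls[cp])
                + Recursive_Assign(subjects, controls, mem, sp + 1, cp + 1)
        // Option 2: skip control cp, if enough controls remain for the remaining subjects
        IF cp+1+subjects.Length-sp <= controls.Length THEN
            int alt = Recursive_Assign(subjects, controls, mem, sp, cp + 1)
            IF alt < res THEN
                res = alt
            ENDIF
        ENDIF
        RETURN (mem[sp, cp] = res)
    END

    PROCEDURE Fill_Assignments
        // run after Recursive_Assign(subjects, controls, mem, 0, 0)
        PARAM subjects, controls, mem // as above
        PARAM assign : array of int[subjectLength]
    BEGIN
        int cp = 0
        FOR sp = 0 TO subjects.Length - 1
            // skip controls until matching control cp attains the memoized optimum
            WHILE mem[sp, cp] != ABS(subjects[sp] - controls[cp])
                    + Recursive_Assign(subjects, controls, mem, sp + 1, cp + 1)
                cp = cp + 1
            ENDWHILE
            assign[sp] = cp
            cp = cp + 1
        ENDFOR
    END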
    

    Here is an implementation of the above pseudocode using C# on ideone.
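For readers without access to the linked code, the recurrence can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the linked C# implementation: the names `assign_in_order`, `best` and `picks` are invented here, `functools.lru_cache` stands in for the mem array, and the matching is read back from the memoized costs after the recursion finishes instead of being written during it. Both input lists are assumed to be sorted already, and Python's recursion limit restricts this form to modest input sizes.

```python
from functools import lru_cache

def assign_in_order(subject_values, control_values):
    """Min-cost order-preserving matching of sorted subjects to sorted controls.

    Returns (total_cost, picks), where picks[i] is the index of the control
    matched to subject i.
    """
    n_subj, n_ctrl = len(subject_values), len(control_values)
    if n_subj > n_ctrl:
        raise ValueError("need at least as many controls as subjects")

    @lru_cache(maxsize=None)
    def best(sp, cp):
        if sp == n_subj:
            return 0
        # Option 1: match subject sp with control cp.
        take = abs(subject_values[sp] - control_values[cp]) + best(sp + 1, cp + 1)
        # Option 2: skip control cp, if enough controls remain afterwards.
        if n_ctrl - (cp + 1) >= n_subj - sp:
            return min(take, best(sp, cp + 1))
        return take

    total = best(0, 0)
    picks, cp = [], 0
    for sp in range(n_subj):
        # Skip controls until taking control cp attains the memoized optimum.
        while best(sp, cp) != abs(subject_values[sp] - control_values[cp]) + best(sp + 1, cp + 1):
            cp += 1
        picks.append(cp)
        cp += 1
    return total, picks

# Location 1 of the sample data with N = 2 copies of each subject.
total, picks = assign_in_order([100, 100, 190, 190], [70, 180, 220, 240, 500, 600, 600])
print(total, picks)  # 190 [0, 1, 2, 3]
```

The four picked indices correspond to controls 17, 11, 12 and 13, which agrees with the location-1 rows of the expected result in the question.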

    This algorithm is ready to be rewritten in SQL. Trying to fit it into the original problem setting (with grouping by location and making multiple copies of each subject) would add an unnecessary layer of complexity to a procedure that is already rather complex, so I am going to simplify things quite a bit by using the table-valued parameters of SQL Server. I am not sure if DB2 provides similar capabilities, but if it does not, you should be able to replace them with temporary tables.

    The stored procedure below is a nearly direct transcription of the above pseudocode into SQL Server’s syntax for stored procedures:

    CREATE TYPE SubjTableType AS TABLE (row int, id int, match_field int)
    CREATE TYPE ControlTableType AS TABLE (row int, id int, match_field int)
    GO
    -- CREATE PROCEDURE must be the first statement in its batch, hence the GO above
    CREATE PROCEDURE RecAssign (
        @subjects SubjTableType READONLY
    ,   @controls ControlTableType READONLY
    ,   @sp int
    ,   @cp int
    ,   @subjCount int
    ,   @ctrlCount int
    ) AS BEGIN
        -- All subjects are matched: nothing left to pay
        IF @sp = @subjCount BEGIN
            RETURN 0
        END
        -- Reuse the memoized cost for this (subject row, control row) state
        IF 1 = (SELECT COUNT(1) FROM #MemoTable WHERE sRow=@sp AND cRow=@cp) BEGIN
            RETURN (SELECT best FROM #MemoTable WHERE sRow=@sp AND cRow=@cp)
        END
        DECLARE @res int, @spNext int, @cpNext int, @prelim int, @alt int, @diff int, @sId int, @cId int
        SET @spNext = @sp + 1
        SET @cpNext = @cp + 1
        SET @sId = (SELECT id FROM @subjects WHERE row = @sp)
        SET @cId = (SELECT id FROM @controls WHERE row = @cp)
        -- Option 1: match subject row @sp with control row @cp
        EXEC @prelim = RecAssign @subjects=@subjects, @controls=@controls, @sp=@spNext, @cp=@cpNext, @subjCount=@subjCount, @ctrlCount=@ctrlCount
        SET @diff = ABS((SELECT match_field FROM @subjects WHERE row=@sp)-(SELECT match_field FROM @controls WHERE row=@cp))
        SET @res = @prelim + @diff
        -- Record the tentative match for this subject row
        IF 1 = (SELECT COUNT(1) FROM #Assignments WHERE sRow=@sp) BEGIN
            UPDATE #Assignments SET cId=@cId, sId=@sId, diff=@diff WHERE sRow=@sp
        END
        ELSE BEGIN
            INSERT INTO #Assignments(sRow, sId, cId, diff) VALUES (@sp, @sId, @cId, @diff)
        END
        -- Option 2: skip control row @cp, if enough controls remain for the remaining subjects
        IF @cp+1+@subjCount-@sp <= @ctrlCount BEGIN
            EXEC @alt = RecAssign @subjects=@subjects, @controls=@controls, @sp=@sp, @cp=@cpNext, @subjCount=@subjCount, @ctrlCount=@ctrlCount
            IF @alt < @res BEGIN
                SET @res = @alt
            END
            ELSE BEGIN
                -- Skipping did not help: restore the match made under option 1
                UPDATE #Assignments SET cId=@cId, sId=@sId, diff=@diff WHERE sRow=@sp
            END
        END
        INSERT INTO #MemoTable (sRow, cRow, best) VALUES (@sp, @cp, @res)
        RETURN @res
    END
    

    Here is how you call this stored procedure:

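    -- The procedure uses a temporary table for memoization:
    CREATE TABLE #MemoTable (sRow int, cRow int, best int)
    -- The assignments end up in this table:
    CREATE TABLE #Assignments (sRow int, sId int, cId int, diff int)

    DECLARE @subj as SubjTableType
    INSERT INTO @subj (row, id, match_field) SELECT ROW_NUMBER() OVER(ORDER BY match_field ASC)-1 AS row, id, match_field FROM subjects
    DECLARE @ctrl as ControlTableType
    INSERT INTO @ctrl (row, id, match_field) SELECT ROW_NUMBER() OVER(ORDER BY match_field ASC)-1 AS row, id, match_field FROM controls
    DECLARE @subjCount int
    SET @subjCount = (SELECT COUNT(1) FROM subjects)
    DECLARE @ctrlCount int
    SET @ctrlCount = (SELECT COUNT(1) FROM controls)
    DECLARE @best int
    EXEC @best = RecAssign
        @subjects=@subj
    ,   @controls=@ctrl
    ,   @sp=0
    ,   @cp=0
    ,   @subjCount=@subjCount
    ,   @ctrlCount=@ctrlCount
    SELECT @best
    -- Rows written during the recursion can be stale when a memoized state is
    -- reused, so rebuild the assignments by walking the memo table.
    DELETE FROM #Assignments
    DECLARE @sp int, @cp int, @diff int, @rest int
    SET @sp = 0
    SET @cp = 0
    WHILE @sp < @subjCount AND @cp < @ctrlCount BEGIN
        SET @diff = ABS((SELECT match_field FROM @subj WHERE row=@sp)-(SELECT match_field FROM @ctrl WHERE row=@cp))
        -- Cost of the remaining subjects; no memo row exists past the last subject
        SET @rest = ISNULL((SELECT best FROM #MemoTable WHERE sRow=@sp+1 AND cRow=@cp+1), 0)
        -- Take control row @cp when doing so attains the memoized optimum
        IF @diff + @rest = (SELECT best FROM #MemoTable WHERE sRow=@sp AND cRow=@cp) BEGIN
            INSERT INTO #Assignments(sRow, sId, cId, diff)
            SELECT @sp, s.id, c.id, @diff FROM @subj s CROSS JOIN @ctrl c WHERE s.row=@sp AND c.row=@cp
            SET @sp = @sp + 1
        END
        SET @cp = @cp + 1
    END
    SELECT sId, cId, diff FROM #Assignments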
    

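    The call above assumes that both subjects and controls have been filtered by location, and that N copies of each subject have been inserted into the table-valued parameter (or the temp table in the case of DB2) before making the call. Note that SQL Server caps nested stored procedure calls at 32 levels, so the recursive form as written only suits small inputs.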
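The preparation described above (filter by location, make N copies of each subject, sort both sides) can be sketched end to end in Python. This is an illustration under assumptions rather than a port of the stored procedure: the function names `match_location` and `match_all` are invented here, and the same recurrence is filled in bottom-up instead of recursively, which avoids any call-depth limit. The cost table has one cell per subject copy and control, so very large locations would need it pruned.

```python
from collections import defaultdict

def match_location(subj, ctrl, n):
    """subj, ctrl: lists of (id, match_field) for one location.

    Returns [(subject_id, control_id, diff)] with n controls per subject.
    """
    s = sorted([row for row in subj for _ in range(n)], key=lambda r: r[1])
    c = sorted(ctrl, key=lambda r: r[1])
    S, C = len(s), len(c)
    if S > C:
        raise ValueError("not enough controls in this location")
    INF = float("inf")
    # best[i][j]: minimum cost of matching subjects i.. using controls j..
    best = [[INF] * (C + 1) for _ in range(S + 1)]
    best[S] = [0] * (C + 1)
    for i in range(S - 1, -1, -1):
        for j in range(C - 1, -1, -1):
            take = abs(s[i][1] - c[j][1]) + best[i + 1][j + 1]
            best[i][j] = min(take, best[i][j + 1])
    # Walk the table from (0, 0), taking a control whenever that is optimal.
    out, i, j = [], 0, 0
    while i < S:
        diff = abs(s[i][1] - c[j][1])
        if best[i][j] == diff + best[i + 1][j + 1]:
            out.append((s[i][0], c[j][0], diff))
            i += 1
        j += 1
    return out

def match_all(subjects, controls, n=2):
    """subjects, controls: lists of (id, location, match_field)."""
    by_loc = defaultdict(lambda: ([], []))
    for sid, loc, val in subjects:
        by_loc[loc][0].append((sid, val))
    for cid, loc, val in controls:
        by_loc[loc][1].append((cid, val))
    rows = []
    for loc, (subj, ctrl) in by_loc.items():
        if subj:  # locations without subjects need no matching
            rows += [(sid, cid, loc, d) for sid, cid, d in match_location(subj, ctrl, n)]
    return sorted(rows, key=lambda r: (r[0], r[3]))

# Sample data from the question: (id, location, match_field)
subjects = [(1, 1, 190), (2, 2, 2000), (3, 1, 100)]
controls = [(17, 1, 70), (11, 1, 180), (12, 1, 220), (13, 1, 240), (14, 1, 500),
            (15, 1, 600), (16, 1, 600), (10, 2, 30), (78, 2, 1840), (79, 2, 2250)]
for row in match_all(subjects, controls):
    print(row)  # (subject_id, control_id, location, match_field_diff)
```

On the sample data this prints the same six rows as the optimum result table in the question.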

    Here is a running demo on sqlfiddle.
