The Archive Base

Editorial Team
Asked: June 17, 2026 (2026-06-17T11:37:05+00:00)

I need to write a function which will compare 2-5 files (well really 2-5

I need to write a function which will compare 2-5 “files” (well, really 2-5 sets of database rows, but it's a similar concept), and I have no idea how to do it. The resulting diff should present the 2-5 files side by side. The output should show added, removed, changed and unchanged rows, with a column for each file.

What algorithm should I use to traverse the rows so as to keep complexity low? The number of rows per file is less than 10,000. I probably won’t need an external merge, as the total data size is in the megabyte range. Simple and readable code would of course also be nice, but it’s not a must.

Edit: the files may be derived from some unknown source, so there is no “original” to which the other 1-4 files can be compared; each file has to be compared with the others in its own right somehow.

Edit 2: I, or rather my colleague, realized that the contents can be sorted, as the output order is irrelevant. This means applying additional domain knowledge to this part of the application, but it also means that the diff complexity is O(N) once the rows are sorted, and that the code is less complicated. This solution is simple, and I’ll disregard any answers to this edit when I close the bounty. However, I’ll answer my own question for future reference.
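
The sorted approach from Edit 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: every row is a tuple whose first element is a unique key, each file is already sorted by that key, and the status names (`unchanged`, `changed`, `missing`) are made up here because there is no "original" file to define added or removed against. The function name `side_by_side_diff` is hypothetical as well.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def side_by_side_diff(files, key=itemgetter(0)):
    """Merge already-sorted row lists and yield one output line per key.

    `files` is a list of row lists, each sorted by `key` with unique keys
    (an assumption of this sketch). Yields (key, status, columns), where
    columns[i] is the row from file i, or None if file i has no such key.
    """
    # Tag every row with its file index so we know which column it fills.
    tagged = [
        [(key(row), idx, row) for row in rows]
        for idx, rows in enumerate(files)
    ]
    # heapq.merge streams the sorted inputs in overall key order.
    merged = heapq.merge(*tagged, key=itemgetter(0))
    for k, group in groupby(merged, key=itemgetter(0)):
        columns = [None] * len(files)
        for _, idx, row in group:
            columns[idx] = row
        if None in columns:
            status = "missing"      # absent from at least one file
        elif all(col == columns[0] for col in columns):
            status = "unchanged"    # identical row in every file
        else:
            status = "changed"      # present everywhere, but rows differ
        yield k, status, columns


if __name__ == "__main__":
    a = [(1, "x"), (2, "y"), (3, "z")]
    b = [(1, "x"), (2, "Y"), (4, "w")]
    c = [(1, "x"), (2, "y"), (3, "z")]
    for k, status, columns in side_by_side_diff([a, b, c]):
        print(k, status, columns)
```

With at most five inputs, the merge does a small constant amount of work per row, so the traversal is effectively linear in the total number of rows; sorting the inputs beforehand, if they are not sorted already, costs O(N log N).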

1 Answer
  1. Editorial Team
    Added an answer on June 17, 2026 at 11:37 am (2026-06-17T11:37:06+00:00)

    If all of the n files (where 2 <= n <= 5 for the example) have to be compared to the others, then it seems to me that the number of pairs to compare will be C(n,2), which can be defined (in Python, for instance) as:

    import math

    def C(n, k):  # "n choose k"; // keeps the result an exact integer in Python 3
        return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    

    Thus, you would have 1, 3, 6 or 10 pairwise comparisons for n = 2, 3, 4, 5 respectively.
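
To make those counts concrete, here is a small sketch that enumerates the pairs with `itertools.combinations` from the standard library. The file labels and the helper name `pairwise_comparisons` are placeholders invented for the example.

```python
from itertools import combinations
from math import comb  # available in Python 3.8+


def pairwise_comparisons(names):
    """Return every unordered pair of inputs that would need a pairwise diff."""
    return list(combinations(names, 2))


files = ["A", "B", "C", "D", "E"]  # hypothetical labels for up to 5 row sets

if __name__ == "__main__":
    for n in range(2, 6):
        pairs = pairwise_comparisons(files[:n])
        # The number of pairs matches C(n, 2) from the formula above.
        assert len(pairs) == comb(n, 2)
        print(n, len(pairs), pairs)
```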

    The time complexity would then be C(n,2) times the complexity of the pairwise diff algorithm that you choose to use. For Myers’ algorithm that is O(ND) in the worst case, where N is the sum of the lengths of the two sequences to be compared, A and B, and D is the size of the minimum edit script for A and B.

    I’m not sure about the environment in which you need this code, but difflib in Python, as an example, can be used to find the differences between all sorts of sequences of hashable items, not just text lines, so it might be useful to you. Note that difflib does not use Myers’ algorithm: its documentation describes SequenceMatcher as an elaboration of the Ratcliff/Obershelp “gestalt pattern matching” approach, with quadratic time in the worst case and linear time in the best case.
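
As a rough illustration of the difflib suggestion, the sketch below runs `difflib.SequenceMatcher` over two lists of row tuples instead of text lines. The helper name `diff_rows` and the sample rows are invented for the example; the only requirement from difflib is that the rows are hashable, which tuples are.

```python
from difflib import SequenceMatcher


def diff_rows(a, b):
    """Describe how to turn row list `a` into row list `b`.

    Returns (tag, rows_from_a, rows_from_b) triples, where tag is one of
    'equal', 'replace', 'delete' or 'insert' as reported by get_opcodes().
    """
    # autojunk=False turns off the popularity heuristic, which is meant
    # for long text inputs rather than database rows.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    old = [(1, "alice"), (2, "bob"), (3, "carol")]
    new = [(1, "alice"), (2, "robert"), (3, "carol"), (4, "dave")]
    for tag, left, right in diff_rows(old, new):
        print(tag, left, right)
```

This covers only a single pair of inputs; for 2-5 files it would have to be applied to each pair and the results combined into columns.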

© 2021 The Archive Base. All Rights Reserved