The Archive Base


Editorial Team
Asked: June 1, 2026

The diff program, in its various incarnations, is reasonably good at computing the difference between two text files and expressing it more compactly than showing both files in their entirety. It shows the difference as a sequence of inserted and deleted chunks of lines (or changed lines in some cases, but that’s equivalent to a deletion followed by an insertion). The same or very similar program or algorithm is used by patch and by source control systems to minimize the storage required to represent the differences between two versions of the same file. The algorithm is discussed here and here.

But it falls down when blocks of text are moved within the file.

Suppose you have the following two files, a.txt and b.txt (imagine that they’re both hundreds of lines long rather than just 6):

a.txt   b.txt
-----   -----
1       4
2       5
3       6
4       1
5       2
6       3

diff a.txt b.txt shows this:

$ diff a.txt b.txt 
1,3d0
< 1
< 2
< 3
6a4,6
> 1
> 2
> 3

The change from a.txt to b.txt can be expressed as “Take the first three lines and move them to the end”, but diff shows the complete contents of the moved chunk of lines twice, missing an opportunity to describe this large change very briefly.
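The cost of the moved block can be checked with a small sketch. It uses Python's difflib rather than the diff program itself, so the edit script it picks may differ in detail from the output above, but it is also limited to insertions and deletions. The helper name lines_shown is made up for this illustration.

```python
import difflib

a = ["1", "2", "3", "4", "5", "6"]  # lines of a.txt
b = ["4", "5", "6", "1", "2", "3"]  # lines of b.txt


def lines_shown(old, new):
    """Count the lines an insert/delete edit script has to spell out."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    shown = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Deleted lines come from `old`, inserted lines from `new`.
            shown += (i2 - i1) + (j2 - j1)
    return shown


print(lines_shown(a, b))  # 6: three lines deleted plus three lines inserted
```

Whichever three-line block the matcher keeps in place, the other block has to be listed once as a deletion and once as an insertion, so the script grows with the size of the moved block.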

Note that diff -e shows the block of text only once, but that’s because it doesn’t show the contents of deleted lines.

Is there a variant of the diff algorithm that (a) retains diff’s ability to represent insertions and deletions, and (b) efficiently represents moved blocks of text without having to show their entire contents?


1 Answer

    Editorial Team
    Added an answer on June 1, 2026 at 3:13 pm

    Since you asked for an algorithm and not an application, take a look at “The String-to-String Correction Problem with Block Moves” by Walter Tichy. There are others, but that’s the original, so you can look for papers that cite it to find more.
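As a rough illustration of the block-move idea, here is a naive sketch of a greedy strategy: walk the target from left to right and, at each position, copy the longest run that also occurs somewhere in the source. This is my reading of the approach, not code from the paper. The function name block_moves and the ("copy", ...) / ("add", ...) step format are invented for this example, and the quadratic search is only there for clarity.

```python
def block_moves(source, target):
    """Greedily cover `target` with blocks copied from `source`.

    Returns a list of steps: ("copy", source_start, length) for a block
    taken from `source`, or ("add", item) for an item not found in `source`.
    """
    script = []
    q = 0  # current position in target
    while q < len(target):
        best_start, best_len = -1, 0
        # Naive search: try every source position as the start of a match.
        for p in range(len(source)):
            length = 0
            while (p + length < len(source)
                   and q + length < len(target)
                   and source[p + length] == target[q + length]):
                length += 1
            if length > best_len:
                best_start, best_len = p, length
        if best_len == 0:
            script.append(("add", target[q]))
            q += 1
        else:
            script.append(("copy", best_start, best_len))
            q += best_len
    return script


# The a.txt / b.txt example from the question: two block copies, no contents shown.
print(block_moves(list("123456"), list("456123")))  # [('copy', 3, 3), ('copy', 0, 3)]
```

Each copy step needs only a start index and a length, so a moved block costs the same to describe no matter how many lines it contains.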

    The paper cites Paul Heckel’s paper “A technique for isolating differences between files” (mentioned in this answer to this question) and mentions this about its algorithm:

    Heckel[3] pointed out similar problems with LCS techniques and proposed a
    linear-time algorithm to detect block moves. The algorithm performs adequately
    if there are few duplicate symbols in the strings. However, the algorithm gives
    poor results otherwise. For example, given the two strings aabb and bbaa,
    Heckel’s algorithm fails to discover any common substring.
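The limitation described in the quote can be reproduced with a simplified sketch of Heckel's technique as it is commonly described: lines that occur exactly once in each file serve as anchors, and matches are then extended to neighbouring identical lines. The function name heckel_matches is hypothetical, and the sketch leaves out refinements of the original paper, such as treating the start and end of the files as anchors.

```python
from collections import Counter


def heckel_matches(old, new):
    """Return a list mapping each index of `new` to an index of `old`, or None."""
    old_count, new_count = Counter(old), Counter(new)
    old_pos = {line: i for i, line in enumerate(old)}
    new_to_old = [None] * len(new)
    old_to_new = [None] * len(old)

    # Anchor lines that occur exactly once in both files.
    for j, line in enumerate(new):
        if old_count[line] == 1 and new_count[line] == 1:
            i = old_pos[line]
            new_to_old[j], old_to_new[i] = i, j

    # Extend matches forward to identical, still unmatched neighbours.
    for j in range(len(new) - 1):
        i = new_to_old[j]
        if (i is not None and i + 1 < len(old)
                and new_to_old[j + 1] is None and old_to_new[i + 1] is None
                and new[j + 1] == old[i + 1]):
            new_to_old[j + 1], old_to_new[i + 1] = i + 1, j + 1

    # Extend matches backward in the same way.
    for j in range(len(new) - 1, 0, -1):
        i = new_to_old[j]
        if (i is not None and i > 0
                and new_to_old[j - 1] is None and old_to_new[i - 1] is None
                and new[j - 1] == old[i - 1]):
            new_to_old[j - 1], old_to_new[i - 1] = i - 1, j - 1

    return new_to_old


print(heckel_matches(list("123456"), list("456123")))  # [3, 4, 5, 0, 1, 2]
print(heckel_matches(list("aabb"), list("bbaa")))      # [None, None, None, None]
```

With all-unique lines every line is anchored and the two moved blocks show up as runs of consecutive indices. With aabb and bbaa no symbol is unique, so there is nothing to anchor on and nothing is matched.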
