The Archive Base

Editorial Team
Asked: June 12, 2026

A description of what I’m going to accomplish: Input 2 (N is not essential)

What I'm trying to accomplish:

  • Take two HTML documents as input (handling N documents is not essential).
  • Standardize (normalize) the HTML formatting.
  • Diff the two documents. External styles are not important, but anything inline in the document is included.
  • Determine the delta at the HTML block-element level.
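
The first three steps can be sketched with the Python standard library alone. This is a minimal illustration, not how DaisyDiff works: the names `Normalizer`, `normalize` and `token_diff` are made up for this example, and "standardizing" is assumed to mean lowercasing tag names, sorting attributes, and collapsing whitespace.

```python
import difflib
from html.parser import HTMLParser


class Normalizer(HTMLParser):
    """Turn HTML into a flat list of canonical tokens (tags and text)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        # Sorting by attribute name makes attribute order irrelevant.
        ordered = sorted(attrs, key=lambda pair: pair[0])
        attr_text = "".join(f' {name}="{value or ""}"' for name, value in ordered)
        self.tokens.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse runs of whitespace and drop whitespace-only text.
        text = " ".join(data.split())
        if text:
            self.tokens.append(text)


def normalize(html):
    """Return the canonical token list for an HTML string."""
    parser = Normalizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def token_diff(html_a, html_b):
    """Return the non-equal opcodes between two normalized documents."""
    a, b = normalize(html_a), normalize(html_b)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

Inline `style` attributes and the contents of `<style>` elements end up in the token list, so inline styling takes part in the diff while external stylesheets do not. The result is a flat sequence diff; it says nothing yet about which block element a change belongs to.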

Expanding the last point:

Imagine two pages of the same site that both share a sidebar with what was probably a common ancestor that has been copy/pasted. Each page has some minor changes to the sidebar. The diff will reveal these changes, then I can “walk up” the DOM to find the first common block element shared by them, or just default to <body>. In this case, I’d like to walk it up and find that, oh, they share a common <div id="sidebar">.
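
The "walk up" step might look like the sketch below. It assumes well-formed markup so that `xml.etree.ElementTree` can parse it (real pages would need a lenient HTML parser), it assumes that "shared" means the same tag and `id` appear in both documents, and `BLOCK_TAGS`, `build_parent_map` and `shared_block_ancestor` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: a hand-picked subset of block-level tags, not an official list.
BLOCK_TAGS = {
    "body", "div", "section", "article", "aside", "nav",
    "header", "footer", "main", "ul", "ol", "table", "p",
}

PAGE_A = """<html><body>
<div id="content"><p>Page one</p></div>
<div id="sidebar"><ul><li>Home</li><li>News</li></ul></div>
</body></html>"""

PAGE_B = """<html><body>
<div id="content"><p>Page two</p></div>
<div id="sidebar"><ul><li>Home</li><li>Blog</li></ul></div>
</body></html>"""


def build_parent_map(root):
    """ElementTree has no parent pointers, so build a child -> parent map."""
    return {child: parent for parent in root.iter() for child in parent}


def shared_block_ancestor(node, root, other_root):
    """Walk up from a changed node to the first block element whose
    (tag, id) pair also exists in the other document; default to <body>."""
    parents = build_parent_map(root)
    shared = {(el.tag, el.get("id")) for el in other_root.iter() if el.get("id")}
    current = node
    while current is not None:
        if current.tag in BLOCK_TAGS and (current.tag, current.get("id")) in shared:
            return current
        current = parents.get(current)
    # No shared block found on the way up: fall back to <body>.
    return next(root.iter("body"), None)
```

With the two sample pages, starting from the changed `<li>News</li>` in `PAGE_A` skips the `<ul>` (it has no `id`) and stops at `<div id="sidebar">`.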

I'm familiar with DaisyDiff, and my application is similar: it lives in the CMS world.

I've also begun playing with Google's diff-match-patch library.

I wanted to ask this fairly open-ended question in the hope of soliciting any advice or guidance that anybody thinks could be helpful. Currently, if you put a gun to my head and said "CODE IT", I'd rewrite DaisyDiff in Python and add in this block-level logic. But I thought maybe there's a better way, and the answers to "Anyone have a diff algorithm for rendered HTML?" made me feel warm and fuzzy.

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 8:43 am

    If you were going to start from scratch, a useful search term would be “tree diff”.

    There’s a pretty awesome blog post here, although I just found it by googling “daisydiff python” so I bet you’ve already seen it. Besides all the interesting theoretical stuff, he mentions the existence of Logilab’s xmldiff, an open-source XML differ written in Python. That might be a decent starting point — maybe less correct than trying to wrap or reimplement DaisyDiff, but probably easier to get up and running quickly.

    There's also html-tree-diff on PyPI, which I found via this Quora link: http://www.quora.com/Is-there-any-good-Python-implementation-of-a-tree-diff-algorithm

    There's some theoretical stuff about tree diffing at "efficient diff algorithm for trees" and "Levenshtein distance" on cstheory.stackexchange.

    BTW, just to clarify, you are talking about diffing two DOM trees, but not necessarily rendering the diff/merge back into any particular HTML, right? (EDIT: Right.) A lot of the similarly-worded questions on here are really asking “how can I color deleted lines red and added lines green” or “how can I make matching paragraphs line up visually”, skipping right over the theoretical hard part of “how do I diff two DOM trees in the first place” and the practical hard part of “how do I parse possibly malformed HTML into a DOM tree even before that”. 🙂
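
To make "tree diff" concrete, here is a deliberately naive sketch that compares two trees position by position and reports the paths where they differ. It is not a tree edit distance algorithm, and it is not what xmldiff or DaisyDiff do; it also sidesteps the malformed-HTML problem by assuming the input already parses with `xml.etree.ElementTree`. The function name `tree_diff` and the path format are invented for this example.

```python
import xml.etree.ElementTree as ET


def tree_diff(a, b, path=None):
    """Return a list of 'path: what changed' strings for two elements.

    Children are paired by position, so an inserted sibling makes every
    later sibling look changed. Real tree-diff algorithms avoid that.
    """
    path = path or f"/{a.tag}"
    if a.tag != b.tag:
        return [f"{path}: tag {a.tag} -> {b.tag}"]

    changes = []
    if a.attrib != b.attrib:
        changes.append(f"{path}: attributes")
    if (a.text or "").strip() != (b.text or "").strip():
        changes.append(f"{path}: text")
    if (a.tail or "").strip() != (b.tail or "").strip():
        changes.append(f"{path}: tail")

    # Iterating an Element yields its direct children.
    for index, (child_a, child_b) in enumerate(zip(a, b)):
        child_path = f"{path}/{child_a.tag}[{index}]"
        changes.extend(tree_diff(child_a, child_b, child_path))

    if len(a) != len(b):
        changes.append(f"{path}: child count {len(a)} -> {len(b)}")
    return changes
```

Each reported path identifies a changed node, which is the starting point for walking up to an enclosing block element such as `<div id="sidebar">`.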

© 2021 The Archive Base. All Rights Reserved