Editorial Team
Asked: May 11, 2026

You may have noticed that we now show an edit summary on Community Wiki

You may have noticed that we now show an edit summary on Community Wiki posts:

community wiki
220 revisions, 48 users

I’d like to also show the user who ‘most owns’ the final content displayed on the page, as a percentage of the remaining text:

community wiki
220 revisions, 48 users
kronoz 87%

Yes, there could be top (n) ‘owners’, but for now I want the top 1.

Assume you have this data structure, a list of user/text pairs ordered chronologically by the time of the post:

    User Id     Post-Text
    -------     ---------
    12          The quick brown fox jumps over the lazy dog.
    27          The quick brown fox jumps, sometimes.
    30          I always see the speedy brown fox jumping over the lazy dog.

Which of these users most ‘owns’ the final text?

I’m looking for a reasonable algorithm — it can be an approximation, it doesn’t have to be perfect — to determine the owner. Ideally expressed as a percentage score.

Note that we need to factor in edits, deletions, and insertions, so the final result feels reasonable and right. You can use any Stack Overflow post with a decent revision history (not just retagging, but frequent post body changes) as a test corpus. Here’s a good one, with 15 revisions from 14 different authors. Who is the ‘owner’?

https://stackoverflow.com/revisions/327973/list

Click ‘view source’ to get the raw text of each revision.

I should warn you that a pure algorithmic solution might end up being a form of the Longest Common Substring Problem. But as I mentioned, approximations and estimates are fine too if they work well.

Solutions in any language are welcome, but I prefer solutions that are

  1. Fairly easy to translate into C#.
  2. Free of dependencies.
  3. Simple rather than efficient.

It is extraordinarily rare for a post on SO to have more than 25 revisions. But it should ‘feel’ accurate, so if you eyeballed the edits you’d agree with the final decision. I encourage you to test your algorithm out on Stack Overflow posts with revision histories and see if you agree with the final output.


I have now deployed the following approximation, which you can see in action for every new saved revision on Community Wiki posts:

  • do a line based diff of every revision where the body text changes
  • sum the insertion and deletion lines for each revision as ‘editcount’
  • each userid gets sum of ‘editcount’ they contributed
  • first revision author gets 2 × ‘editcount’ as initial score, as a primary authorship bonus
  • to determine final ownership percentage: each user’s edited line count total divided by total number of edited lines in all revisions

(There are also some guard clauses for common simple conditions like 1 revision, only 1 author, etcetera. The line-based diff makes it fairly speedy to recalc for all revisions; in a typical case of say 10 revisions it’s ~50ms.)
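The steps above can be sketched in a few lines of Python. This is a minimal sketch, not the deployed code: the function names `edit_count` and `ownership_by_edit_count` are hypothetical, the diff comes from the standard library's `difflib`, and it assumes the denominator is the sum of all scores including the first author's bonus, so that the percentages add up to 100.

```python
import difflib
from collections import defaultdict


def edit_count(old_text, new_text):
    """Count deleted plus inserted lines between two revision bodies."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    count = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            count += (i2 - i1) + (j2 - j1)  # deleted lines + inserted lines
    return count


def ownership_by_edit_count(revisions):
    """revisions: chronological list of (user_id, body_text) pairs.

    Returns {user_id: percentage}. Assumption: the total used as the
    denominator includes the first author's doubled score.
    """
    scores = defaultdict(int)
    previous = ""
    for index, (user, text) in enumerate(revisions):
        if text == previous:
            continue  # body unchanged (e.g. a retag): nothing to score
        count = edit_count(previous, text)
        if index == 0:
            count *= 2  # primary authorship bonus for the first revision
        scores[user] += count
        previous = text
    total = sum(scores.values())
    if total == 0:
        return {}
    return {user: 100.0 * score / total for user, score in scores.items()}


# Made-up four-line post: user 12 writes it, 27 appends a line, 30 changes one line.
example = [
    (12, "a\nb\nc\nd"),
    (27, "a\nb\nc\nd\ne"),
    (30, "a\nx\nc\nd\ne"),
]
shares = ownership_by_edit_count(example)
```

In the made-up example, user 12 scores 8 (4 lines, doubled), user 27 scores 1, and user 30 scores 2 (one line deleted, one inserted), so user 12 holds 8/11 of the total. On the three single-line fox revisions, each edit counts as two lines, so all three users tie, which is the short-post weakness mentioned below.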

This works fairly well in my testing. It does break down a little when you have small 1 or 2 line posts that several people edit, but I think that’s unavoidable. I’m accepting Joel Neely’s answer as the closest in spirit to what I went with, and I upvoted everything else that seemed workable.

1 Answer

  1. Added an answer on May 11, 2026 at 12:13 am

    Saw your tweet earlier. From the display of the 327973 link, it appears you already have a single-step diff in place. Based on that, I’ll focus on the multi-edit composition:

    1. A, the original poster, owns 100% of the post.

    2. When B, a second poster, makes edits such that e.g. 90% of the text is unchanged, the ownership is A:90%, B:10%.

    3. Now C, a third party, changes 50% of the text. (A:45%, B:5%, C:50%)

      In other words, when a poster makes edits such that x% is changed and y = (100-x)% is unchanged, then that poster now owns x% of the text and all previous ownership is multiplied by y%.

      To make it interesting, now suppose…

    4. A makes a 20% edit. Then A owns a ‘new’ 20%, and the residual ownerships are now multiplied by 80%, leaving (A:36%, B:4%, C:40%). The ‘net’ ownership is therefore (A:56%, B:4%, C:40%).
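The four steps above can be checked with a short Python sketch. The helper name `apply_edit` is hypothetical, and shares are kept as fractions of 1 rather than percentages.

```python
def apply_edit(ownership, editor, changed_fraction):
    """Return the ownership after `editor` changes `changed_fraction` (0..1) of the text.

    Every existing share is scaled by the unchanged fraction, then the
    editor is credited with the changed fraction.
    """
    kept = 1.0 - changed_fraction
    updated = {user: share * kept for user, share in ownership.items()}
    updated[editor] = updated.get(editor, 0.0) + changed_fraction
    return updated


# Step 1: A owns everything. Steps 2 to 4: B edits 10%, C edits 50%, A edits 20%.
ownership = {"A": 1.0}
for editor, fraction in [("B", 0.10), ("C", 0.50), ("A", 0.20)]:
    ownership = apply_edit(ownership, editor, fraction)

percentages = {user: round(share * 100) for user, share in ownership.items()}
# percentages is {"A": 56, "B": 4, "C": 40}
```

Because each step scales the old shares by y and adds x with x + y = 1, the shares always sum to 1.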

    Applying this to your specimen (327973) with everything rounded to the nearest percent:

    Version 0: The original post.

    • Paul Oyster: 100%

    Version 1: Your current diff tool shows pure addition of text, so all those characters belong to the second poster.

    • Paul Oyster: 91%
    • onebyone: 9%

    Version 2: The diff shows replacement of a word. The new word belongs to the third poster, and the remaining text belongs to the prior posters.

    • Paul Oyster: 90%
    • onebyone: 9%
    • Blogbeard: 1%

    Version 3: Tag-only edit. Since your question was about the text, I’m ignoring the tags.

    • Paul Oyster: 90%
    • onebyone: 9%
    • Blogbeard: 1%

    Version 4: Addition of text.

    • Paul Oyster: 45%
    • onebyone: 4%
    • Blogbeard: 1%
    • Mark Harrison: 50%
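To run this over a whole revision list, the changed percentage has to come from a diff. The sketch below is one possible way to do that, not the site's diff tool: it assumes a word-level comparison with the standard library's `difflib`, and the names `changed_fraction` and `compose_ownership` are hypothetical.

```python
import difflib


def changed_fraction(old_text, new_text):
    """Fraction of the words in new_text that are not carried over from old_text."""
    old_words, new_words = old_text.split(), new_text.split()
    if not new_words:
        return 0.0  # assumption: an emptied post changes no ownership
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - kept / len(new_words)


def compose_ownership(revisions):
    """revisions: chronological list of (user_id, body_text) pairs.

    Returns {user_id: share}, with shares as fractions of 1.
    """
    ownership = {}
    previous = ""
    for user, text in revisions:
        x = changed_fraction(previous, text)  # 1.0 for the very first revision
        ownership = {u: share * (1.0 - x) for u, share in ownership.items()}
        ownership[user] = ownership.get(user, 0.0) + x
        previous = text
    return ownership


fox = [
    (12, "The quick brown fox jumps over the lazy dog."),
    (27, "The quick brown fox jumps, sometimes."),
    (30, "I always see the speedy brown fox jumping over the lazy dog."),
]
fox_shares = compose_ownership(fox)
```

On the three fox revisions from the question, this gives user 12 about 11%, user 27 about 6%, and user 30 about 83%. User 30 gets credit for "over the lazy dog." even though user 12 wrote it first, because each revision is compared only with its immediate predecessor. A body that is unchanged (a tag-only edit) yields a changed fraction of 0 and leaves all shares as they were.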

    I hope that’s enough to give the sense of this proposal. It does have a couple of limitations, but I’m sliding these in under your statement that an approximation is acceptable. 😉

    1. It bluntly spreads the effect of each change across all prior owners. If A posts, B does a pure addition, and C edits half of what B added, this simplistic approach just applies C’s ownership across the entire post, without trying to parse out which prior ownership was changed the most.

    2. It accounts for additions or changes, but doesn’t give any ownership credit for deletion, because the deleter adds 0% to the remaining text. You can either regard this as a bug or a feature. I chose door number 2.

    Update: A bit more about issue #1 above. I believe that fully tracking the ownership of the part of a post that is edited would require one of two things (The margin of the web page is not big enough for a formal proof ;-):

    • Changing the way text is stored to reflect ownership of individual portions of the text (e.g. A owns words 1-47, B owns words 48-59, A owns words 60-94,…), applying the ‘how much remains’ approach in my proposal to each portion, and updating the portion-ownership data.

    • Considering all versions from first to current (in effect, recomputing the portion-ownership data on the fly).

    So this is a nice example of a trade-off between a quick-and-dirty approximation (at the cost of precision), a change to the entire database (at the cost of space), or every calculation having to look at the entire history (at the cost of time).
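The second option, recomputing portion ownership on the fly from the full history, might look like the sketch below. It is an illustration under assumptions, not a tested production design: ownership is tracked per whitespace-separated word, the diff comes from the standard library's `difflib`, and the names `word_owners` and `ownership_percentages` are hypothetical.

```python
import difflib
from collections import Counter


def word_owners(revisions):
    """Replay all revisions and return one owner per word of the final text."""
    words, owners = [], []
    for user, text in revisions:
        new_words = text.split()
        matcher = difflib.SequenceMatcher(None, words, new_words, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                new_owners.extend(owners[i1:i2])  # surviving words keep their owner
            elif tag in ("replace", "insert"):
                new_owners.extend([user] * (j2 - j1))  # new words belong to the editor
            # "delete" removes words and adds nothing to the new revision
        words, owners = new_words, new_owners
    return owners


def ownership_percentages(revisions):
    """Return {user_id: percentage of the final words that user owns}."""
    owners = word_owners(revisions)
    if not owners:
        return {}
    counts = Counter(owners)
    return {user: 100.0 * n / len(owners) for user, n in counts.items()}


# A posts four words, B appends two, C rewrites one of B's two words.
history = [
    ("A", "one two three four"),
    ("B", "one two three four five six"),
    ("C", "one two three four five SIX"),
]
final_owners = word_owners(history)
```

In this made-up history C's edit only takes ownership away from B, which is the case limitation #1 describes: the final owners are A for four words, B for one, and C for one. Like the simpler proposal, this gives no credit for deletions.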
