The Archive Base Latest Questions

Editorial Team
Asked: May 20, 2026 (2026-05-20T22:30:21+00:00)

Everyone knows, if you want to thread emails you use Jamie Zawinski’s algorithm.

Everyone knows, if you want to thread emails you use Jamie
Zawinski’s algorithm. But it’s a new century, and there’s a
new messaging service.

What’s the best algorithm for threading status updates posted on
Twitter?

Things I’d definitely like it to cope with:

  • The easy part: using in_reply_to_status_id,
    in_reply_to_user_id and in_reply_to_screen_name.
    (Incidentally, finding proper documentation of these values
    would be useful in itself! Such documentation isn’t
    obviously linked to from
    here,
    for example.)

  • Good heuristics for inferring a “reply” relationship from
    messages that mention a user with the @ convention but aren’t
    explicitly in reply to a particular message. These
    “mentions” are now provided in the “entities” element of
    statuses if you request that. These heuristics might take into
    account (a) the time between two status updates, (b) whether
    there are subsequent replies between the two users, etc.
    (Replies that consist of an old-style retweet with an
    additional comment, as mentioned by user85509 below, are just
    an instance of this style of reply.)

  • Conversations that take place between more than two users.

  • Working with a set of tweets given to the algorithm, or all
    tweets on Twitter.

… but perhaps you can think of more.

1 Answer

    Editorial Team
    Added an answer on May 20, 2026 at 10:30 pm (2026-05-20T22:30:22+00:00)

    Since there’s only been one answer, and the bounty deadline is approaching soon, I thought I should add a baseline answer so the bounty isn’t automatically awarded to an answer that doesn’t add much beyond what’s in the question.

    The obvious first step is to take your original set of tweets and follow all in_reply_to_status_id links to build many directed acyclic graphs. These relationships you can be nearly 100% sure about. (You should follow the links even through tweets that aren’t in the original set, adding those to the set of status updates that you’re considering.)
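As a rough illustration of that first step, here is a minimal sketch in Python. It assumes each status update is a dict carrying `in_reply_to_status_id`, and that `fetch_status` is a hypothetical callable (not a real Twitter client method) that returns an update outside the original set, or `None` if it can't be retrieved.

```python
from collections import defaultdict


def build_reply_forest(tweets, fetch_status=None):
    """Link updates into reply trees via in_reply_to_status_id.

    `tweets` maps status id -> dict with an optional
    'in_reply_to_status_id' entry. `fetch_status` is a hypothetical
    callable used to pull in parents missing from the original set.
    Returns (known, children, roots).
    """
    known = dict(tweets)
    pending = list(known)
    children = defaultdict(list)

    while pending:
        status_id = pending.pop()
        parent_id = known[status_id].get("in_reply_to_status_id")
        if parent_id is None:
            continue
        children[parent_id].append(status_id)
        # Follow the link even when the parent is outside the original set.
        if parent_id not in known and fetch_status is not None:
            parent = fetch_status(parent_id)
            if parent is not None:
                known[parent_id] = parent
                pending.append(parent_id)

    # A root has no parent, or a parent that could not be retrieved.
    roots = sorted(
        status_id
        for status_id, tweet in known.items()
        if tweet.get("in_reply_to_status_id") not in known
    )
    ordered_children = {parent: sorted(kids) for parent, kids in children.items()}
    return known, ordered_children, roots
```

Since every update has at most one explicit parent, each connected component here is a tree, and walking `children` from each entry in `roots` yields one conversation per root.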

    Beyond that easy step, one has to deal with the “mentions”. Unlike in email threading, there’s nothing helpful like a subject line that one can match on, so this is inevitably going to be very error-prone. The approach I would take is to create a feature vector for every possible relationship between status IDs that might be represented by mentions in that tweet, and then train a classifier to guess the best option, including a “no reply” option.

    To work out the “every possible relationship” bit, start by considering every status update that mentions one or more other users and doesn’t contain an in_reply_to_status_id. Suppose one such tweet is the following [1]:

    @a @b no it isn't lol  RT @c Yes, absolutely. /cc @stephenfry
    

    … you would create a feature vector for the relationship between this update and every earlier update from the last week (say) in the timelines of @a, @b, @c, and @stephenfry, plus one between this update and a special “no reply” update. Then you have to fill in each feature vector. You can add whatever features you like, but I would at least suggest:

    • The time that elapsed between the two updates – presumably replies are more likely to be to recent updates.
    • How far through the tweet the mention occurs, as a proportion of its words. E.g. if the mention is the first word, the score is 0, which is probably more likely to indicate a reply than a mention later in the update.
    • The number of followers of the mentioned user – celebrities are presumably more likely to be spam-mentioned.
    • The length of the longest common substring between the updates, which might indicate direct quoting.
    • Is the mention preceded by “/cc” or other signifiers that indicate that this isn’t directly a reply to that person?
    • The following / followed ratio for the author of the original update.
    • etc.
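To make the feature list concrete, here is a minimal sketch using only the Python standard library. The dict keys (`text`, `created_at`, `user`, `followers`, `following`) and the function names are assumptions made for illustration, not fields or calls from the Twitter API.

```python
from datetime import datetime
from difflib import SequenceMatcher

CC_MARKERS = {"/cc", "cc"}


def mention_index(words, screen_name):
    """Index of the first word that mentions @screen_name, or None."""
    target = "@" + screen_name.lower()
    for index, word in enumerate(words):
        if word.lower().strip(".,:;!?") == target:
            return index
    return None


def longest_common_substring(a, b):
    """Length of the longest run of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def pair_features(reply, candidate):
    """Features for 'reply is a response to candidate'.

    Both arguments are dicts with hypothetical keys: 'text',
    'created_at' (datetime), 'user' (screen name), 'followers'
    and 'following' (counts for the update's author).
    """
    words = reply["text"].split()
    index = mention_index(words, candidate["user"])
    return {
        # Replies are presumably more likely to target recent updates.
        "seconds_elapsed": (
            reply["created_at"] - candidate["created_at"]
        ).total_seconds(),
        # 0.0 means the mention is the first word of the tweet.
        "mention_position": None if index is None else index / len(words),
        "mentioned_followers": candidate["followers"],
        "longest_common_substring": longest_common_substring(
            reply["text"], candidate["text"]
        ),
        "preceded_by_cc": (
            index is not None
            and index > 0
            and words[index - 1].lower() in CC_MARKERS
        ),
        # Following / followed ratio of the original update's author.
        "following_followed_ratio": (
            candidate["following"] / max(candidate["followers"], 1)
        ),
    }
```

The result is a plain dict so that features can be added or dropped freely; most classifier toolkits would need it flattened into a numeric vector first, with a placeholder for a missing `mention_position`.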

    The more of these one can come up with the better, since the classifier will only use those that turn out to be useful. I’d suggest trying a random forest classifier, which is conveniently implemented in Weka.
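Whatever classifier is used, the surrounding logic stays the same: enumerate the candidate updates, score each one, and fall back to “no reply” when nothing scores higher. The sketch below shows only that wrapper. The `score` callable stands in for a trained classifier’s confidence and is purely an assumption, as are the dict keys and the `timelines` structure.

```python
import re
from datetime import datetime, timedelta

NO_REPLY = None  # stands for the special "no reply" option


def candidate_parents(reply, timelines, window=timedelta(days=7)):
    """Earlier updates by mentioned users, within `window` of the reply.

    `timelines` maps a lower-cased screen name to a list of updates;
    each update is a dict with 'id', 'text' and 'created_at'
    (hypothetical names chosen for this sketch).
    """
    mentioned = {name.lower() for name in re.findall(r"@(\w+)", reply["text"])}
    earliest = reply["created_at"] - window
    return [
        update
        for name in sorted(mentioned)
        for update in timelines.get(name, [])
        if earliest <= update["created_at"] < reply["created_at"]
    ]


def choose_parent(reply, candidates, score):
    """Return the id of the best-scoring candidate, or NO_REPLY.

    `score(reply, candidate)` is a placeholder for a trained model;
    it is called with candidate=NO_REPLY for the "no reply" option.
    """
    best_id, best_score = NO_REPLY, score(reply, NO_REPLY)
    for candidate in candidates:
        candidate_score = score(reply, candidate)
        if candidate_score > best_score:
            best_id, best_score = candidate["id"], candidate_score
    return best_id


def toy_score(reply, candidate):
    """Hand-written stand-in, not a trained model: prefers recent updates."""
    if candidate is NO_REPLY:
        return 0.1
    gap = reply["created_at"] - candidate["created_at"]
    return 1.0 / (1.0 + gap.total_seconds() / 3600)
```

With a real model, `toy_score` would be replaced by something that builds the feature vector for the pair and returns the classifier’s estimated probability that the pair is a genuine reply.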

    Next one needs a training set. This can be small at first: just enough to get a service that identifies conversations up and running. To this basic service, one would add a nice interface that lets users correct mismatched or falsely linked updates. Using this data one can build a bigger training set and a more accurate classifier.

    [1] … which might be typical of the level of discourse on Twitter 😉

© 2021 The Archive Base. All Rights Reserved