Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 168305
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 11, 20262026-05-11T12:25:46+00:00 2026-05-11T12:25:46+00:00

I want to determine the similarity of the content of two news items, similar

  • 0

I want to determine the similarity of the content of two news items, similar to Google news but different in the sense that I want to be able determine what the basic topics are then determine what topics are related.

So if an article was about Saddam Hussein, then the algorithm might recommend something about Donald Rumsfeld’s business dealings in Iraq.

If you can just throw around key words like k-nearest neighbours and a little explanation about why they work (if you can) I will do the rest of the reseach and tweak the algorithm. Just looking for a place to get started, since I know someone out there must have tried something similar before.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. 2026-05-11T12:25:47+00:00Added an answer on May 11, 2026 at 12:25 pm

    First thoughts:

    • toss away noise words (and, you, is, the, some, …).
    • count all other words and sort by quantity.
    • for each word in the two articles, add a score depending on the sum (or product or some other formula) of the quantities.
    • the score represent the similarity.

    It seems to be that an article primarily about Donald Rumsfeld would have those two words quite a bit, which is why I weight them in the article.

    However, there may be an article mentioning Warren Buffet many times with Bill Gates once, and another mentioning both Bill Gates and Microsoft many times. The correlation there would be minimal.

    Based on your comment:

    So if an article was about Saddam Hussein, then the algorithm might recommend something about Donald Rumsfeld’s business dealings in Iraq.

    that wouldn’t be the case unless the Saddam article also mentioned Iraq (or Donald).

    That’s where I’d start and I can see potential holes in the theory already (an article about Bill Gates would match closely with an article about Bill Clinton if their first names are mentioned a lot). This may well be taken care of by all the other words (Microsoft for one Bill, Hillary for the other).

    I’d perhaps give it a test run before trying to introduce word-proximity functionality since that’s going to make it very complicated (maybe unnecessarily).

    One other possible improvement would be maintaining ‘hard’ associations (like always adding the word Afghanistan to articles with Osama bin Laden in them). But again, that requires extra maintenance for possibly dubious value since articles about Osama would almost certainly mention Afghanistan as well.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Ask A Question

Stats

  • Questions 73k
  • Answers 73k
  • Best Answers 0
  • User 1
  • Popular
  • Answers
  • Editorial Team

    How to approach applying for a job at a company ...

    • 7 Answers
  • Editorial Team

    How to handle personal stress caused by utterly incompetent and ...

    • 5 Answers
  • Editorial Team

    What is a programmer’s life like?

    • 5 Answers
  • added an answer They are consistent. Internally, controls that are children of a… May 11, 2026 at 2:00 pm
  • added an answer You get the ding because you left the ESC in… May 11, 2026 at 2:00 pm
  • added an answer The color of the text cursor in an input on… May 11, 2026 at 2:00 pm

Related Questions

Is there a way to determine the device running an application. I want to
I want to determine if a certain window is visible to the user or
I want to programatically create an NSTextView. How can I determine the correct frame
I want to determine whether two different child nodes within an XML document are
I want to be able to pass something into an SQL query to determine
I've seen a few questions here related to determining the similarity of files, but
What I want to do is take a certain stock pattern (defined as a
I have a number of tracks recorded by a GPS, which more formally can
Update I summarized the question and its answers here My objective is to detect

Trending Tags

analytics british company computer developers django employee employer english facebook french google interview javascript language life php programmer programs salary

Top Members

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.