I am writing a software to compare articles. I am looking for an efficient


Asked: June 17, 2026


I am writing software to compare articles, and I am looking for an efficient and accurate algorithm to calculate the difference (variation) between two articles. The variation should depend entirely on words, not letters. I have tried levenshtein(), but its time complexity is O(n*m), which is very expensive on texts as long as an article. I have also tried similar_text(), which is even slower at O(N^3), where N is the length of the longer string. Moreover, both functions work on characters: levenshtein() counts the character insertions, deletions, and substitutions needed to turn one string into the other, and similar_text() counts matching characters. Neither is an accurate way to measure how different two long articles are.
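
For comparison, here is a sketch of a word-level edit distance. It is not one of the functions tried above: it is the same Levenshtein dynamic programme, but run over arrays of words instead of characters, so n and m become word counts rather than character counts. It is still O(n*m), just with much smaller n and m. The function name word_levenshtein() and its tokenizer (lowercase, split on runs of non-word characters) are assumptions made for illustration.

```php
<?php
// Hypothetical word-level Levenshtein distance: the classic dynamic programme,
// but each "symbol" is a whole word rather than a character.
function word_levenshtein(string $a, string $b): int
{
    // Assumed tokenizer: lowercase, split on non-word runs, drop empty tokens.
    $x = preg_split('/\W+/', strtolower($a), -1, PREG_SPLIT_NO_EMPTY);
    $y = preg_split('/\W+/', strtolower($b), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($x);
    $m = count($y);

    // Keep only two rows of the DP table: O(m) memory instead of O(n*m).
    $prev = range(0, $m);
    for ($i = 1; $i <= $n; $i++) {
        $curr = [$i];
        for ($j = 1; $j <= $m; $j++) {
            $cost = ($x[$i - 1] === $y[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,        // delete a word from $a
                $curr[$j - 1] + 1,    // insert a word from $b
                $prev[$j - 1] + $cost // substitute (or keep) a word
            );
        }
        $prev = $curr;
    }
    return $prev[$m];
}

echo word_levenshtein('The cat sat on the mat', 'The cat lay on the rug'), "\n"; // 2
```

Dividing the result by the larger word count would turn it into a rough variation ratio, but the quadratic cost on long articles remains.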

What other options do I have?

EDIT:

I am trying to approximate the variation as a search engine (Google) would see it.
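
How Google actually measures similarity between pages is not public. One technique widely described for near-duplicate detection is w-shingling: take every run of w consecutive words as a "shingle" and compare the two shingle sets with Jaccard similarity, |A ∩ B| / |A ∪ B|. Unlike a plain bag of words, shingles are sensitive to word order. The sketch below is an illustration under assumptions: the function names shingles() and shingle_similarity(), the tokenizer, and the default w = 3 are all chosen here, not taken from any search engine.

```php
<?php
// Collect the distinct w-shingles (runs of $w consecutive words) of a text.
// The tokenizer (lowercase, split on non-word characters) is an assumption.
function shingles(string $text, int $w = 3): array
{
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $set = [];
    for ($i = 0; $i + $w <= count($words); $i++) {
        // Using the joined shingle as an array key collapses duplicates.
        $set[implode(' ', array_slice($words, $i, $w))] = true;
    }
    return $set;
}

// Jaccard similarity |S_A ∩ S_B| / |S_A ∪ S_B| of the two shingle sets, in [0, 1].
function shingle_similarity(string $a, string $b, int $w = 3): float
{
    $sa = shingles($a, $w);
    $sb = shingles($b, $w);
    $union = count($sa + $sb); // the + operator unions arrays by key
    if ($union === 0) {
        return 1.0; // convention: neither text is long enough to form a shingle
    }
    $shared = count(array_intersect_key($sa, $sb));
    return $shared / $union;
}

// In percentage terms, variation would be (1 - similarity) * 100.
echo round(shingle_similarity('the cat sat on the mat', 'the cat sat on a mat'), 4), "\n"; // 0.3333
```

Changing a single word breaks every shingle that contains it, so two sentences that differ in one word already drop to a similarity of 1/3 here; larger w makes the measure stricter, smaller w makes it closer to plain word overlap.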


1 Answer

Editorial Team · 2026-06-17T15:44:14+00:00

In my case, I needed to calculate the variation between two articles, and I found a very simple solution that works well for me. It calculates the similarity as the number of distinct words the two articles have in common, divided by max(number of distinct words in article A, number of distinct words in article B), expressed as a percentage. The variation is then 100 minus the similarity. The code below implements it.
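
Written out, with W_A and W_B standing for the sets of distinct words in each article (notation introduced here for illustration), the measure and a small worked example on two one-sentence "articles", "the cat sat on the mat" and "the cat lay on a rug", look like this:

```latex
\text{similarity} = \frac{|W_A \cap W_B|}{\max(|W_A|, |W_B|)} \times 100,
\qquad
\text{variation} = 100 - \text{similarity}

W_A = \{\text{the}, \text{cat}, \text{sat}, \text{on}, \text{mat}\},
\qquad
W_B = \{\text{the}, \text{cat}, \text{lay}, \text{on}, \text{a}, \text{rug}\}

\text{similarity} = \frac{|\{\text{the}, \text{cat}, \text{on}\}|}{\max(5, 6)} \times 100
= \frac{3}{6} \times 100 = 50,
\qquad
\text{variation} = 100 - 50 = 50
```

The measure ignores word order and how often a word repeats: it compares the two vocabularies, not the sentences.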

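```php
function get_variation($article1, $article2) {
    // Lowercase and split on runs of non-word characters; PREG_SPLIT_NO_EMPTY
    // drops the empty tokens left by leading or trailing punctuation.
    $wordsA = array_unique(preg_split('/\W+/', strtolower($article1), -1, PREG_SPLIT_NO_EMPTY));
    $wordsB = array_unique(preg_split('/\W+/', strtolower($article2), -1, PREG_SPLIT_NO_EMPTY));
    $maxCount = max(count($wordsA), count($wordsB));
    if ($maxCount === 0) {
        return 0.0; // two empty articles: avoid division by zero
    }
    // Similarity: shared distinct words / larger vocabulary, as a percentage.
    $intersection = array_intersect($wordsA, $wordsB);
    $similarity = round(count($intersection) / $maxCount * 100, 2);
    // Variation is the complement of similarity.
    return 100 - $similarity;
}
```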
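
If the comparison itself becomes a bottleneck on large articles, the same measure can be computed with the words stored as array keys. The sketch below is a hypothetical variant named get_variation_hashed(): it keeps the formula above, uses array_flip() to turn each word list into a key set (which also removes duplicates), and counts shared words with array_intersect_key(). It lowercases before splitting, an assumption so that "The" and "the" count as the same word. PHP array keys are hash-based, so this should scale roughly linearly with the number of words, but any speed-up is worth measuring rather than assuming.

```php
<?php
// Hypothetical hash-based variant: variation = 100 - shared distinct words
// divided by the larger vocabulary, as a percentage (rounded to 2 decimals).
function get_variation_hashed(string $article1, string $article2): float
{
    // Words become array keys, so array_flip() also removes duplicates.
    $setA = array_flip(preg_split('/\W+/', strtolower($article1), -1, PREG_SPLIT_NO_EMPTY));
    $setB = array_flip(preg_split('/\W+/', strtolower($article2), -1, PREG_SPLIT_NO_EMPTY));

    $maxCount = max(count($setA), count($setB));
    if ($maxCount === 0) {
        return 0.0; // two empty articles: no variation
    }

    // Count the distinct words present in both articles (key lookups).
    $shared = count(array_intersect_key($setA, $setB));

    return 100 - round($shared / $maxCount * 100, 2);
}

echo get_variation_hashed('the cat sat on the mat', 'the cat lay on a rug'), "\n"; // 50
```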
