The Archive Base

Editorial Team
Asked: June 16, 2026

I just found the similar_text function and was playing around with it, but the percentage output always surprises me. See the examples below.

I tried to find information on the algorithm used, as mentioned in the PHP docs for similar_text():

<?php
$p = 0;
similar_text('aaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//66.666666666667
//Since 5 out of 10 chars match, I would expect a 50% match

similar_text('aaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//40
//5 out of 20: why not 25%?

similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//9.5238095238095
//5 out of 100: why not 5%?


//Example from PHP.net
//Why is turning the strings around changing the result?

similar_text('PHP IS GREAT', 'WITH MYSQL', $p);
echo $p . "<hr>"; //27.272727272727

similar_text('WITH MYSQL', 'PHP IS GREAT', $p);
echo $p . "<hr>"; //18.181818181818

?>

Can anybody explain how this actually works?

Update:

Thanks to the comments, I found that the percentage is actually calculated as the number of similar characters * 200 / (length1 + length2):

Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);

So that explains why the percentages are higher than expected. For a string where 5 out of 95 characters match, it comes out as 10, which I can use.

similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>"; 
//10
//5 out of 95 = 5 * 200 / (5 + 95) = 10
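
The formula can be checked with a few lines of JavaScript. This is only a sketch of the arithmetic in the quoted C line: `similarityPercent` is a made-up helper name, and `sim` is assumed to be 5 for the all-'a' examples (the length of the shorter string). Because the divisor is the sum of both lengths, the result is relative to the average length of the two strings, not to the longer one.

```javascript
// Sketch of the percentage formula from the PHP source:
//   percent = sim * 200.0 / (len1 + len2)
// `similarityPercent` is a hypothetical helper, not a PHP or JS API.
function similarityPercent(sim, len1, len2) {
  // The PHP source returns 0 when both strings are empty,
  // which also avoids a division by zero.
  if (len1 + len2 === 0) {
    return 0;
  }
  return (sim * 200.0) / (len1 + len2);
}

// All-'a' examples from the question; sim is assumed to be 5 in each case.
console.log(similarityPercent(5, 10, 5).toFixed(2));  // "66.67"
console.log(similarityPercent(5, 20, 5).toFixed(2));  // "40.00"
console.log(similarityPercent(5, 100, 5).toFixed(2)); // "9.52"
console.log(similarityPercent(5, 95, 5).toFixed(2));  // "10.00"
```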

But I still can’t figure out why PHP returns a different result when the strings are swapped. The JS code provided by dfsq doesn’t do this. Looking at the PHP source code, I can only find a difference in the following line, but I’m not a C programmer. Some insight into what the difference is would be appreciated.

In JS:

for (l = 0;(p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);

In PHP: (php_similar_str function)

for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);

Source:

/* {{{ proto int similar_text(string str1, string str2 [, float percent])
   Calculates the similarity between two strings */
PHP_FUNCTION(similar_text)
{
  char *t1, *t2;
  zval **percent = NULL;
  int ac = ZEND_NUM_ARGS();
  int sim;
  int t1_len, t2_len;

  if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|Z", &t1, &t1_len, &t2, &t2_len, &percent) == FAILURE) {
    return;
  }

  if (ac > 2) {
    convert_to_double_ex(percent);
  }

  if (t1_len + t2_len == 0) {
    if (ac > 2) {
      Z_DVAL_PP(percent) = 0;
    }

    RETURN_LONG(0);
  }

  sim = php_similar_char(t1, t1_len, t2, t2_len);

  if (ac > 2) {
    Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
  }

  RETURN_LONG(sim);
}
/* }}} */ 


/* {{{ php_similar_str
 */
static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
{
  char *p, *q;
  char *end1 = (char *) txt1 + len1;
  char *end2 = (char *) txt2 + len2;
  int l;

  *max = 0;
  for (p = (char *) txt1; p < end1; p++) {
    for (q = (char *) txt2; q < end2; q++) {
      for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
      if (l > *max) {
        *max = l;
        *pos1 = p - txt1;
        *pos2 = q - txt2;
      }
    }
  }
}
/* }}} */


/* {{{ php_similar_char
 */
static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
{
  int sum;
  int pos1, pos2, max;

  php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);

  if ((sum = max)) {
    if (pos1 && pos2) {
      sum += php_similar_char(txt1, pos1,
                  txt2, pos2);
    }
    if ((pos1 + max < len1) && (pos2 + max < len2)) {
      sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
                  txt2 + pos2 + max, len2 - pos2 - max);
    }
  }

  return sum;
}
/* }}} */

Source in JavaScript: similar_text port to JavaScript

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 7:53 pm

    It would indeed seem the function uses different logic depending on the parameter order. I think there are two things at play.

    First, see this example:

    echo similar_text('test','wert'); // 1
    echo similar_text('wert','test'); // 2
    

    It seems to test “how many times any distinct char in param1 is found in param2”, so the result differs if you swap the params around. This has been reported as a bug, which was closed as “working as expected”.

    Now, the above is the same for both the PHP and JavaScript implementations: parameter order has an impact, so saying that the JS code wouldn’t do this is wrong. This is argued in the bug entry to be intended behaviour.

    Second, what doesn’t seem correct is the MYSQL/PHP word example. There, the JavaScript version gives 3 irrespective of the order of params, whereas PHP gives 3 one way and 2 the other (and because of that, the percentage differs too). Now, the phrases “PHP IS GREAT” and “WITH MYSQL” should have 5 characters in common, irrespective of which way you compare: H, I, S and T, one each, plus one for the space. In order, they share 3 characters, ‘H’, ‘ ’ and ‘S’, so if you take ordering into account, the correct answer should be 3 both ways. I modified the C code to a runnable version and added some output, so one can see what is happening there (codepad link):

    #include <stdio.h>
    
    /* {{{ php_similar_str
     */
    static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
    {
      char *p, *q;
      char *end1 = (char *) txt1 + len1;
      char *end2 = (char *) txt2 + len2;
      int l;
    
      *max = 0;
      for (p = (char *) txt1; p < end1; p++) {
        for (q = (char *) txt2; q < end2; q++) {
          for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
          if (l > *max) {
            *max = l;
            *pos1 = p - txt1;
            *pos2 = q - txt2;
          }
        }
      }
    }
    /* }}} */
    
    
    /* {{{ php_similar_char
     */
    static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
    {
      int sum;
      int pos1, pos2, max;
    
      php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);
    
      if ((sum = max)) {
        if (pos1 && pos2) {
          printf("txt here %s,%s\n", txt1, txt2);
          sum += php_similar_char(txt1, pos1,
                      txt2, pos2);
        }
        if ((pos1 + max < len1) && (pos2 + max < len2)) {
          printf("txt here %s,%s\n", txt1+ pos1 + max, txt2+ pos2 + max);
          sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
                      txt2 + pos2 + max, len2 - pos2 - max);
        }
      }
    
      return sum;
    }
    /* }}} */
    int main(void)
    {
        printf("Found %d similar chars\n",
            php_similar_char("PHP IS GREAT", 12, "WITH MYSQL", 10));
        printf("Found %d similar chars\n",
            php_similar_char("WITH MYSQL", 10,"PHP IS GREAT", 12));
        return 0;
    }
    

    The resulting output:

    txt here PHP IS GREAT,WITH MYSQL
    txt here P IS GREAT, MYSQL
    txt here IS GREAT,MYSQL
    txt here IS GREAT,MYSQL
    txt here  GREAT,QL
    Found 3 similar chars
    txt here WITH MYSQL,PHP IS GREAT
    txt here TH MYSQL,S GREAT
    Found 2 similar chars
    

    So one can see that on the first comparison, the function found ‘H’, ‘ ‘ and ‘S’, but not ‘T’, and got the result of 3. The second comparison found ‘I’ and ‘T’ but not ‘H’, ‘ ‘ or ‘S’, and thus got the result of 2.

    The reason for these results can be seen from the output: the algorithm takes the first character in the first string that the second string also contains (more precisely, the first longest common run), counts it, and then only compares what lies to the left of the match in both strings and what lies to the right of it in both strings. Characters that end up on opposite sides of the match are never compared, and that is what causes the difference when you swap the parameter order.

    What happens there might be intentional or it might not. However, that’s not how the JavaScript version works. If you print out the same things in the JavaScript version, you get this:
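
The behaviour described above can be reproduced with a small sketch. The following is a transliteration into JavaScript of the two C functions quoted in this thread, not the JavaScript port the question refers to. It assumes plain ASCII input (C compares bytes, JavaScript compares UTF-16 code units), and `firstLongestRun` and `similarChars` are names made up for illustration.

```javascript
// Illustrative transliteration of php_similar_str / php_similar_char from the
// C source quoted in this thread. Function names are hypothetical.

// Find the first longest common run of characters between a and b.
function firstLongestRun(a, b) {
  let best = { pos1: 0, pos2: 0, max: 0 };
  for (let p = 0; p < a.length; p++) {
    for (let q = 0; q < b.length; q++) {
      let l = 0;
      while (p + l < a.length && q + l < b.length && a[p + l] === b[q + l]) {
        l++;
      }
      // Strict ">" keeps the first run found; later runs of equal length
      // are ignored. This is where the order of the arguments starts to matter.
      if (l > best.max) {
        best = { pos1: p, pos2: q, max: l };
      }
    }
  }
  return best;
}

function similarChars(a, b) {
  const { pos1, pos2, max } = firstLongestRun(a, b);
  if (max === 0) {
    return 0;
  }
  let sum = max;
  // Left of the match: only searched when both prefixes are non-empty.
  if (pos1 > 0 && pos2 > 0) {
    sum += similarChars(a.slice(0, pos1), b.slice(0, pos2));
  }
  // Right of the match: only searched when both suffixes are non-empty.
  if (pos1 + max < a.length && pos2 + max < b.length) {
    sum += similarChars(a.slice(pos1 + max), b.slice(pos2 + max));
  }
  return sum;
}

console.log(similarChars("PHP IS GREAT", "WITH MYSQL")); // 3
console.log(similarChars("WITH MYSQL", "PHP IS GREAT")); // 2
console.log(similarChars("test", "wert")); // 1
console.log(similarChars("wert", "test")); // 2
```

In the second call the first match is ‘I’, which leaves ‘H’, the space and ‘S’ on opposite sides of the split in the two strings, so they are never compared with each other.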

    txt here: PHP, WIT
    txt here: P IS GREAT,  MYSQL
    txt here: IS GREAT, MYSQL
    txt here: IS, MY
    txt here:  GREAT, QL
    Found 3 similar chars
    txt here: WITH, PHP 
    txt here: W, P
    txt here: TH MYSQL, S GREAT
    Found 3 similar chars
    

    showing that the JavaScript version does it in a different way. It finds ‘H’, ‘ ’ and ‘S’ in the same order in the first comparison, and the same ‘H’, ‘ ’ and ‘S’ in the second one, so in this case the order of params doesn’t matter.

    As the JavaScript is meant to duplicate the code of the PHP function, it needs to behave identically, so I submitted a bug report based on the analysis by @Khez, along with the fix, which has now been merged.
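
For reference, the two order-independent counts mentioned earlier (5 characters in common, 3 of them in the same order) can be checked with a short sketch. Neither function below is what similar_text does: `sharedCharCount` is a plain multiset intersection and `lcsLength` is the classic longest-common-subsequence table. Both names are made up for illustration.

```javascript
// Count characters shared by both strings regardless of position
// (multiset intersection). Hypothetical helper, not part of similar_text.
function sharedCharCount(a, b) {
  const counts = new Map();
  for (const ch of a) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  let shared = 0;
  for (const ch of b) {
    const left = counts.get(ch) || 0;
    if (left > 0) {
      shared++;
      counts.set(ch, left - 1);
    }
  }
  return shared;
}

// Length of the longest common subsequence: characters that appear in the
// same order in both strings, not necessarily adjacent.
function lcsLength(a, b) {
  const rows = a.length + 1;
  const cols = b.length + 1;
  const table = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      table[i][j] = a[i - 1] === b[j - 1]
        ? table[i - 1][j - 1] + 1
        : Math.max(table[i - 1][j], table[i][j - 1]);
    }
  }
  return table[a.length][b.length];
}

console.log(sharedCharCount("PHP IS GREAT", "WITH MYSQL")); // 5
console.log(sharedCharCount("WITH MYSQL", "PHP IS GREAT")); // 5
console.log(lcsLength("PHP IS GREAT", "WITH MYSQL")); // 3
console.log(lcsLength("WITH MYSQL", "PHP IS GREAT")); // 3
```

Both measures give the same value whichever string comes first, which is the property the PHP function lacks for this pair of phrases.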
