In an attempt to rewrite PHP’s similar_text algorithm I have tried a few different approaches. All have been moderately successful but ultimately failed.
First attempt: I tried just rewriting it from the PHP source code. C’s elegant use of pointers makes the same exact implementation seemingly impossible to make in Scala and be clean.
Second attempt: I tried rewriting it from a Java function someone posted on PHP similar_text() in java. Unfortunately that function doesn’t work in Java so nevermind porting it over to Scala.
Third (current) attempt: I’m currently attempting to translate this JavaScript implementation into Scala: http://phpjs.org/functions/similar_text/. I’ve used it before in JavaScript and it seems to function properly. My translation (below) into Scala is not functioning properly. It gets you within 1 or 2 similarity indexes but it is typically not 100% to the results of it’s PHP counterpart.
def similartext(first:String,second:String) : Int = {
if (first == null || second == null) {
0
}
var pos1:Int = 0
var pos2:Int = 0
var max:Int = 0
var sum:Int = 0
var l:Int = 0
val firstLength:Int = first.length
val secondLength:Int = second.length
for (p <- 0 until firstLength) {
for (q <- 0 until secondLength) {
while(p+l < firstLength && q+l < secondLength && (first.charAt(p+l) == second.charAt(q+l))) {
if (l > max) {
println("[" + p + "," + q + "," + l + "]" + first.charAt(p+l) + " | " + second.charAt(q+l))
max = l
pos1 = p
pos2 = q
}
l += 1
}
}
}
sum = max;
if (sum > 0) {
if (pos1 > 0 && pos2 > 0) {
sum += similartext(first.substring(0, pos2), second.substring(0, pos2))
}
if ((pos1 + max < firstLength) && (pos2 + max < secondLength)) {
sum += similartext(first.substring(pos1 + max, (pos1 + max) + (firstLength - pos1 - max)), second.substring(pos2 + max, (pos2 + max) + (secondLength - pos2 - max)))
}
}
sum;
}
Tests:
(Scala)val st = similartext("apple","aple") Yields 3
(PHP)$similar = similar_text("apple","aple"); Yields 4
(Scala)val st = similartext("starbucks","stharducks") Yields 8
(PHP)$similar = similar_text("starbucks","stharducks"); Yields 8
(Scala)val st = similartext("hello earth!","hello world!") Yields 10
(PHP)$similar = similar_text("hello earth!","hello world!"); Yields 8
Does anyone have any ideas on what is going wrong here?
Here’s a hint: look very closely at line 28 of the JavaScript version—particularly the last character of the line. That’s where your implementation differs. (You also don’t reset
lto zero for every pair of indices, but that’s not the most important problem.)Here’s a
var-free Scala version, by the way:This is fairly off-the-cuff—I’m sure you could make it more concise and efficient pretty easily.
And for extra credit: there’s a bug in the JavaScript version—try for example
"aabcd"and"abcabcd"and the result won’t be the same as PHP’s (or mine).