I’m working on an application that will require the Levenshtein algorithm to calculate the

Asked: June 11, 2026 (2026-06-11T09:09:59+00:00)

I’m working on an application that will require the Levenshtein algorithm to calculate the similarity of two strings.

A long time ago I adapted a C# version (which can easily be found floating around on the internet) to VB.NET, and it looks like this:

Public Function Levenshtein1(s1 As String, s2 As String) As Double
    Dim n As Integer = s1.Length
    Dim m As Integer = s2.Length

    Dim d(n, m) As Integer
    Dim cost As Integer
    Dim s1c As Char

    For i = 1 To n
        d(i, 0) = i
    Next
    For j = 1 To m
        d(0, j) = j
    Next

    For i = 1 To n
        s1c = s1(i - 1)

        For j = 1 To m
            If s1c = s2(j - 1) Then
                cost = 0
            Else
                cost = 1
            End If

            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
        Next
    Next

    Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function

Then, trying to tweak it and improve its performance, I ended up with this version:

Public Function Levenshtein2(s1 As String, s2 As String) As Double
    Dim n As Integer = s1.Length
    Dim m As Integer = s2.Length

    Dim d(n, m) As Integer
    Dim s1c As Char
    Dim cost As Integer

    For i = 1 To n
        d(i, 0) = i
        s1c = s1(i - 1)

        For j = 1 To m
            d(0, j) = j

            If s1c = s2(j - 1) Then
                cost = 0
            Else
                cost = 1
            End If

            d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
        Next
    Next

    Return (1.0 - (d(n, m) / Math.Max(n, m))) * 100
End Function

Basically, I thought that the distance array d(,) could be initialized inside the main For loops, instead of requiring two additional loops up front. I really thought this would be a huge improvement… unfortunately, not only does it not improve on the original, it actually runs slower!

I have already tried to analyze both versions by looking at the generated IL code but I just can’t understand it.

So, I was hoping that someone could shed some light on this issue and explain why the second version (even though it has fewer For loops) runs slower than the original.

NOTE: The time difference is about 0.15 nanoseconds. This doesn’t look like much, but when you have to check thousands of millions of strings… the difference becomes quite noticeable.

1 Answer

Editorial Team · Answer 1 · 2026-06-11T09:10:01+00:00

It’s because of this:

    For i = 1 To n
        d(i, 0) = i
        s1c = s1(i - 1)

        For j = 1 To m
            d(0, j) = j 'THIS LINE HERE

In the original, row 0 of the array is initialized once, before the main loops. In the second version, d(0, j) = j sits inside the inner loop, so the whole row is rewritten on every pass of the outer loop: n*m writes instead of m. Each array write has a cost, and you are now paying it n times over. You could change the line to If i = 1 Then d(0, j) = j. However, in my tests that still ends up slightly slower than the original, which again makes sense: the If is evaluated n*m times, and that has a cost too. Moving the initialization out of the nested loops, as in the original version, is a lot cheaper: it becomes an O(m) step (O(n + m) counting both initialization loops). Since the overall algorithm is O(n*m), any work you can move out of the nested loops into a linear step is going to be a win.
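To make the difference concrete, here is a minimal sketch that only counts how often row 0 is written in each layout. The module and function names are hypothetical and not part of the original code. It does not measure time, only the number of assignments.

```vbnet
Module RowZeroWrites
    ' Hypothetical helper: mimics the Levenshtein2 layout, where the
    ' row-0 assignment sits inside the inner loop, and counts those writes.
    Function CountWritesInsideLoop(n As Integer, m As Integer) As Long
        Dim d(n, m) As Integer
        Dim writes As Long = 0

        For i As Integer = 1 To n
            d(i, 0) = i
            For j As Integer = 1 To m
                d(0, j) = j   ' executed on every pass of the outer loop
                writes += 1
            Next
        Next

        Return writes
    End Function

    ' Hypothetical helper: mimics the Levenshtein1 layout, where row 0
    ' is filled once before the main loops.
    Function CountWritesHoisted(n As Integer, m As Integer) As Long
        Dim d(n, m) As Integer
        Dim writes As Long = 0

        For j As Integer = 1 To m
            d(0, j) = j       ' executed m times in total
            writes += 1
        Next

        Return writes
    End Function
End Module
```

For two strings of lengths n = 5 and m = 7, CountWritesInsideLoop returns 35 (n*m) while CountWritesHoisted returns 7 (m). The 28 extra writes are redundant, because they store the same values again.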
