I recently posted a question about optimizing the algorithm to compute the Levenshtein Distance,

Question

0

Asked: May 15, 20262026-05-15T16:52:42+00:00 2026-05-15T16:52:42+00:00

I recently posted a question about optimizing the algorithm to compute the Levenshtein Distance,

0

I recently posted a question about optimizing the algorithm to compute the Levenshtein Distance, and the replies lead me to the Wikipedia article on Levenshtein Distance.

The article mentioned that if there is a bound k on the maximum distance a possible result can be from the given query, then the running time can be reduced from O(mn) to O(kn), m and n being the lengths of the strings. I looked up the algorithm, but I couldn’t really figure out how to implement it. I was hoping to get some leads on that here.

The optimization is #4 under “Possible Improvements”.

The part that confused me was the one that said that we only need to compute a diagonal stripe of width 2k+1, centered on the main diagonal (the main diagonal is defined as coordinates (i,i)).

If someone could offer some help/insight, I would really appreciate it. If needed, I can post the complete description of the algorithm in the book as an answer here.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-15T16:52:43+00:00

I’ve done it a number of times. The way I do it is with a recursive depth-first tree-walk of the game tree of possible changes. There is a budget k of changes, that I use to prune the tree. With that routine in hand, first I run it with k=0, then k=1, then k=2 until I either get a hit or I don’t want to go any higher.

char* a = /* string 1 */;
char* b = /* string 2 */;
int na = strlen(a);
int nb = strlen(b);
bool walk(int ia, int ib, int k){
  /* if the budget is exhausted, prune the search */
  if (k < 0) return false;
  /* if at end of both strings we have a match */
  if (ia == na && ib == nb) return true;
  /* if the first characters match, continue walking with no reduction in budget */
  if (ia < na && ib < nb && a[ia] == b[ib] && walk(ia+1, ib+1, k)) return true;
  /* if the first characters don't match, assume there is a 1-character replacement */
  if (ia < na && ib < nb && a[ia] != b[ib] && walk(ia+1, ib+1, k-1)) return true;
  /* try assuming there is an extra character in a */
  if (ia < na && walk(ia+1, ib, k-1)) return true;
  /* try assuming there is an extra character in b */
  if (ib < nb && walk(ia, ib+1, k-1)) return true;
  /* if none of those worked, I give up */
  return false;
}

Added to explain trie-search:

// definition of trie-node:
struct TNode {
  TNode* pa[128]; // for each possible character, pointer to subnode
};

// simple trie-walk of a node
// key is the input word, answer is the output word,
// i is the character position, and hdis is the hamming distance.
void walk(TNode* p, char key[], char answer[], int i, int hdis){
  // If this is the end of a word in the trie, it is marked as
  // having something non-null under the '\0' entry of the trie.
  if (p->pa[0] != null){
    if (key[i] == '\0') printf("answer = %s, hdis = %d\n", answer, hdis);
  }
  // for every actual subnode of the trie
  for(char c = 1; c < 128; c++){
    // if it is a real subnode
    if (p->pa[c] != null){
      // keep track of the answer word represented by the trie
      answer[i] = c; answer[i+1] = '\0';
      // and walk that subnode
      // If the answer disagrees with the key, increment the hamming distance
      walk(p->pa[c], key, answer, i+1, (answer[i]==key[i] ? hdis : hdis+1));
    }
  }
}
// Note: you have to edit this to handle short keys.
// Simplest is to just append a lot of '\0' bytes to the key.

Now, to limit it to a budget, just refuse to descend if hdis is too large.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I recently posted a question about optimizing the algorithm to compute the Levenshtein Distance,

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply