I am looking for an algorithm, preferably in Python that would help me locate

Question

0

Asked: May 18, 20262026-05-18T07:27:57+00:00 2026-05-18T07:27:57+00:00

I am looking for an algorithm, preferably in Python that would help me locate

0

I am looking for an algorithm, preferably in Python that would help me locate substrings, N characters long, of exisiting strings that are closest to a target string N character long.

Consider the target string, that is, say, 4 characters long, to be:

targetString -> '1111'

Assume this is the string I have available with me ( I will generate substrings of this for “best alignment” matching ):

nonEmptySubStrings -> ['110101']

Substrings of the above that are 4 characters long:

nGramsSubStrings -> ['0101', '1010', '1101']

I want to write/use a “Magic Function” that would select the string closest to targetString :

someMagicFunction -> ['1101']

Some more examples:

nonEmptySubStrings -> ['101011']
nGramsSubStrings -> ['0101', '1010', '1011']

someMagicFunction -> ['1011']

nonEmptySubStrings -> ['10101']
nGramsSubStrings -> ['0101', '1010']

someMagicFunction -> ['0101', '1010']

Is this “Magic Function” a well known substring problem?

I really want to find the min. number of changes in nonEmptySubStrings so that it would have targetString as a substring.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-18T07:27:58+00:00

Base on OP’s comment to question, this is what is desired

import functools

def edit_distance(str1, str2): 
    #implement it here

f = functools.operator(edit_distance, target_string)
return min(f(s) for s in slices(string_))   # use slices from below

This will return the minimum edit distance of any substring to the target string. It will not indicate which string that is or what its index is. It could be easily modified to do
so though.

The naive way, which can be the best way, is

import functools

def diff(str1, str2):
    # However you test the distance gets defined here. e.g. Hamming distance, 
    # Levenshtein distance, etc.


def slices(string_, L):
    for i in xrange(len(string_) - L + 1)):
        yield string_[i:i+L]

best_match = min(slices(string_), key=functools.partial(diff, target_string))

This wont return the index at which the substring occurs though. Of course you didn’t specify that you need it in your question 😉

If you want to get better than this, it will depend on how you’re measuring the distance and will basically boil down to avoiding checking some substrings by infering that you would have to change at least x chars to get a better match than you already have. At that point, you might as well just change x chars by jumping ahead x chars.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am looking for an algorithm, preferably in Python that would help me locate

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply