Reciently I’ve looked through several implementation of bitap algorithm but what all of them do is finding the beginning point of fuzzy match. What I need is to find a match. There’s an example:
Say we have following text: abcdefg
and a pattern: bzde
and we want to find all occurence of a pattern in text with at most 1 error (Edit distance is concidered).
So I need that the algorithm returns: bcde.
Is there a simple (or not simple =) ) way to do it?
The original artical about this algorithm doens’t answer the question.
Thank you for your help.
For a simple start you could approach it with a series of regular expressions where in every expression you replace 1 character with the
.wildcard. Combining those expressions into one using the( | )construct to create one big regular expression.Another way would be to scan the string keeping an errorcount and incrementing the offset you match at when too many errors are encountered.