Paul Masurel offers a great article about the Levenstein automata : [link](http://fulmicoton.com/posts/levenshtein/)↵
↵
He offers an algorithm to compute if the levenstein distance between two strings is less than or equal to D.↵
↵
The naive algorithm is fairly straightforward : ↵
↵
↵
~~~~~↵
def levenshtein(s1, s2, D=2):↵
"""↵
Returns True iff the two string↵
s1 and s2 is lesser or equal to D↵
"""↵
if D == -1:↵
return False↵
if len(s1) < len(s2):↵
return levenshtein(s2, s1)↵
if len(s2) == 0:↵
return len(s1) <= D↵
return (levenshtein(s1[1:], s2[1:], D-1) # substitution\↵
or levenshtein(s1, s2[1:], D-1) # insertion\↵
or levenshtein(s1[1:], s2, D-1) # deletion\↵
or (↵
# character match↵
(s1[0] == s2[0]) and \↵
levenshtein(s1[1:], s2[1:], D)↵
))↵
~~~~~↵
↵
This basically tries all possibilities which is with exponential complexity if we don't save any of the calls.↵
↵
↵
He offers a better algorithm but for the last week I have been trying to understand how it works, here it is : ↵
↵
↵
↵
~~~~~↵
↵
def levenshtein(s1, s2, D=2):↵
"""↵
Returns True iff the edit distance between↵
the two strings s1 and s2 is lesser or↵
equal to D↵
"""↵
if len(s1) == 0:↵
return len(s2) <= D↵
if len(s2) == 0:↵
return len(s1) <= D↵
# assuming s1[0] is NOT used to build s2,↵
if D > 0:↵
if levenshtein(s1[1:], s2, D - 1):↵
# deletion↵
return True↵
if levenshtein(s1[1:], s2[1:], D - 1):↵
# substitution↵
return True↵
# assuming s1[0] is used to build s2↵
for d in range(min(D+1, len(s2))):↵
# d is the position where s1[0]↵
# might be used.↵
# it is also the number of character↵
# that are required to be inserted before↵
# using s1[d].↵
if s1[0] == s2[d]:↵
if levenshtein(s1[1:], s2[d+1:], D - d):↵
return True↵
return False↵
~~~~~↵
↵
It would be great if anyone can shed some light on how this works! Thanks!
↵
He offers an algorithm to compute if the levenstein distance between two strings is less than or equal to D.↵
↵
The naive algorithm is fairly straightforward : ↵
↵
↵
~~~~~↵
def levenshtein(s1, s2, D=2):↵
"""↵
Returns True iff the two string↵
s1 and s2 is lesser or equal to D↵
"""↵
if D == -1:↵
return False↵
if len(s1) < len(s2):↵
return levenshtein(s2, s1)↵
if len(s2) == 0:↵
return len(s1) <= D↵
return (levenshtein(s1[1:], s2[1:], D-1) # substitution\↵
or levenshtein(s1, s2[1:], D-1) # insertion\↵
or levenshtein(s1[1:], s2, D-1) # deletion\↵
or (↵
# character match↵
(s1[0] == s2[0]) and \↵
levenshtein(s1[1:], s2[1:], D)↵
))↵
~~~~~↵
↵
This basically tries all possibilities which is with exponential complexity if we don't save any of the calls.↵
↵
↵
He offers a better algorithm but for the last week I have been trying to understand how it works, here it is : ↵
↵
↵
↵
~~~~~↵
↵
def levenshtein(s1, s2, D=2):↵
"""↵
Returns True iff the edit distance between↵
the two strings s1 and s2 is lesser or↵
equal to D↵
"""↵
if len(s1) == 0:↵
return len(s2) <= D↵
if len(s2) == 0:↵
return len(s1) <= D↵
# assuming s1[0] is NOT used to build s2,↵
if D > 0:↵
if levenshtein(s1[1:], s2, D - 1):↵
# deletion↵
return True↵
if levenshtein(s1[1:], s2[1:], D - 1):↵
# substitution↵
return True↵
# assuming s1[0] is used to build s2↵
for d in range(min(D+1, len(s2))):↵
# d is the position where s1[0]↵
# might be used.↵
# it is also the number of character↵
# that are required to be inserted before↵
# using s1[d].↵
if s1[0] == s2[d]:↵
if levenshtein(s1[1:], s2[d+1:], D - d):↵
return True↵
return False↵
~~~~~↵
↵
It would be great if anyone can shed some light on how this works! Thanks!