intptr's blog

By intptr, history, 5 years ago, In English

Paul Masurel offers a great article about the Levenstein automata : link

He offers an algorithm to compute if the levenstein distance between two strings is less than or equal to D.

The naive algorithm is fairly straightforward :

def levenshtein(s1, s2, D=2):
    """
    Returns True iff the two string
    s1 and s2 is lesser or equal to D
    """
    if D == -1:
        return False
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    if len(s2) == 0:
        return len(s1) <= D
    return (levenshtein(s1[1:], s2[1:], D-1)   # substitution\
        or levenshtein(s1, s2[1:], D-1)       # insertion\
        or levenshtein(s1[1:], s2, D-1)       # deletion\
        or (
            # character match
            (s1[0] == s2[0]) and \
            levenshtein(s1[1:], s2[1:], D)
        ))

This basically tries all possibilities which is with exponential complexity if we don't save any of the calls.

He offers a better algorithm but for the last week I have been trying to understand how it works, here it is :


def levenshtein(s1, s2, D=2): """ Returns True iff the edit distance between the two strings s1 and s2 is lesser or equal to D """ if len(s1) == 0: return len(s2) <= D if len(s2) == 0: return len(s1) <= D # assuming s1[0] is NOT used to build s2, if D > 0: if levenshtein(s1[1:], s2, D - 1): # deletion return True if levenshtein(s1[1:], s2[1:], D - 1): # substitution return True # assuming s1[0] is used to build s2 for d in range(min(D+1, len(s2))): # d is the position where s1[0] # might be used. # it is also the number of character # that are required to be inserted before # using s1[d]. if s1[0] == s2[d]: if levenshtein(s1[1:], s2[d+1:], D - d): return True return False

It would be great if anyone can shed some light on how this works! Thanks!

 
 
 
 
  • Vote: I like it
  • +5
  • Vote: I do not like it

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by intptr (previous revision, new revision, compare).

»
5 months ago, # |
  Vote: I like it 0 Vote: I do not like it

import java.util.Scanner; public class MED {

public static void main(String[] args) {

    Scanner sc = new Scanner(System.in);
    int t = sc.nextInt();
    sc.nextLine();
    for (int i = 0; i < t; i++) {

    String s1 =sc.nextLine();
    String s2 = sc.nextLine();
    System.out.println(minimumEditDistance(s1,s2));
    }
}
private static int minimumEditDistance(String s1,String s2)
{
    int[][] dp = new int[s1.length()+1][s2.length()+1];

    for (int i = 0; i < s1.length(); i++) {
        dp[i][0] = i;
    }
    for (int i = 0; i < s2.length(); i++) {
        dp[0][i] = i;
    }

    for (int i = 1; i <= s1.length(); i++) {
        for (int j = 1; j <= s2.length(); j++) {
            if(s1.charAt(i-1) == s2.charAt(j-1))
                dp[i][j] = dp[i-1][j-1];
            else
                dp[i][j] = min(dp[i-1][j-1],dp[i][j-1],dp[i-1][j]) + 1;
        }
    }
    return dp[s1.length()][s2.length()];
}

private static int min(int a,int b,int c)
{
    int min = Math.min(a, b);
    return Math.min(min, c);
}

}