Day 21 - Python

Minimum Edit Distance

Problem Statement:

You are given two strings, word1 and word2. Your task is to implement a function min_edit_distance(word1, word2) that calculates the minimum number of single-character edits required to change one word into another. The allowed edits are insertions, deletions, or substitutions of a single character. This program implements a modified version of the Levenshtein distance algorithm, which calculates the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into another.

Function Signature:

def min_edit_distance(word1: str, word2: str) -> int:

Input:

Two strings, word1 and word2, each consisting of lowercase English letters.

Output:

An integer representing the minimum number of single-character edits required to change word1 into word2.

Constraints:

1 ≤ length of word1, word2 ≤ 100

Algorithm:

The function min_edit_distance(word1, word2) uses dynamic programming to calculate the minimum edit distance between word1 and word2:

Initialize a 2D matrix dp with dimensions (len(word1) + 1) x (len(word2) + 1).

Initialize the first row and column of dp with values representing the lengths of prefixes of word1 and word2.

Iterate through each cell (i, j) in the matrix, where i represents the index in word1 and j represents the index in word2.

If the characters word1[i - 1] and word2[j - 1] are equal, set dp[i][j] equal to dp[i - 1][j - 1].

Otherwise, set dp[i][j] equal to the minimum of dp[i - 1][j - 1], dp[i - 1][j], and dp[i][j - 1], incremented by 1.

Return the value in the bottom-right cell of the matrix, which represents the minimum edit distance between word1 and word2.

Example:

Sample Input:

kitten

sitting

Sample Output:

3

Explanation:

To change "kitten" into "sitting", you need to perform the following edits:

Replace 'k' with 's'

Replace 'e' with 'i'

Add 'g' at the end.

So, the minimum number of edits required is 3.

Test Cases:

Test Case 1: silent, listen --> 4

Test Case 2: algorithm, logarithm --> 3

Test Case 3: intention, execution --> 5

Test Case 4: house, mouse --> 1

Test Case 5: apple, mango --> 5 [All 5 to be changed]

Test Case 6: abcde, abcde --> 0 [No change]

Test Case 7: abcdef, abcde --> 1 [Deletion]

Test Case 8: abcde, fghij --> 5 [All 5 to be changed]

Test Case 9: "", test --> 4 [4 Insertion]

Test Case 10: test, "" --> 4 [4 Deletion]

Ensure that your solution produces the expected output for each of these test cases to verify its correctness.


SOLUTION:

def min_edit_distance(w1, w2):

    l1, l2 = len(w1), len(w2)

    

    dp = [[0] * (l2 + 1) for _ in range(l1 + 1)]

    

    for i in range(l1 + 1):

        dp[i][0] = i

    for j in range(l2 + 1):

        dp[0][j] = j

    

    for i in range(1, l1 + 1):

        for j in range(1, l2 + 1):

            if w1[i - 1] == w2[j - 1]:

                dp[i][j] = dp[i - 1][j - 1]

            else:

                dp[i][j] = min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]) + 1

    

    return dp[l1][l2]


word1 = input().strip()

word2 = input().strip()


result = min_edit_distance(word1, word2)

print(result)


Insights:

  • The min_edit_distance function utilizes dynamic programming to efficiently compute the minimum edit distance between two given words.
  • It initializes a 2D matrix dp with dimensions (len(w1) + 1) x (len(w2) + 1) to store the minimum edit distances for substrings of w1 and w2.
  • The function sets the values in the first row and column of the matrix to represent the lengths of prefixes of w1 and w2, respectively as Base Cases.
  • It iterates through each cell (i, j) in the matrix, where i represents the index in w1 and j represents the index in w2. For each cell, it computes the minimum edit distance based on three possible operations: substitution, insertion, and deletion.
  • If the characters at positions i - 1 in w1 and j - 1 in w2 are equal, it assigns the value of the top-left diagonal cell to the current cell, indicating no edit is needed.
  • If the characters are not equal, it selects the minimum value among the top-left diagonal cell (substitution), the cell above (deletion), and the cell to the left (insertion), and increments it by 1 to represent the minimum edit distance.
  • The function builds the solution bottom-up, leveraging the optimal substructure property of the problem, where the optimal solution to a problem can be constructed from optimal solutions to its subproblems.
  • It optimizes space complexity by using a 2D matrix of size (len(w1) + 1) x (len(w2) + 1) to store the minimum edit distances, allowing for efficient storage of intermediate results.
  • The time complexity of the function is O(n * m), where n is the length of w1 and m is the length of w2, as it iterates through the entire matrix to fill in the values.
Happy Coding! :)

Comments