You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
Solution: Two Dimensional Dynamic Programming
Remember when a problem is to find the optimal value, DP is a possible solution.
This is a classic DP problem.
dp[i][j] refers to edit distance between X[0...i] and Y[0...j], so
Think about it. For the last step, we can either:
a. insert a letter of Y[len2-1] to X
b. delete a letter X[len1-1] from X
c. substitue X[len1-1] with Y[len2-1]
Basic cases: dp[0][0] = 0; dp[i][0] = i, i delete operations, similarly (empty string to word[0..i]), dp[0][j] = j, j insert operations
2. Optimal functions:
dp[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+0 or 1))
d[i-1][j] last step is to delete X[i] (X[1.i-1] to Y[1..j] is already optimal)
d[i][j-1] last step is to insert Y[j] ( X[1..i] to Y[1..j-1] is already optimal)
d[i-1][j-1] last step is to substitute X[i] with Y[j], if(X[i] == Y[j]) plus 0, else plus 1
这里是中文解释(改进版)
但对于编辑距离的话,当我们要计算d(i,j)时,即计算A(i)到B(j)之间的编辑距离,此时,设A(i)形式是somestr1c;B(i)形如somestr2d的话,
将somestr1变成somestr2的编辑距离已知是d(i-1,j-1)
将somestr1c变成somestr2的编辑距离已知是d(i,j-1) 将somestr1变成somestr2d的编辑距离已知是d(i-1,j) 那么利用这三个变量,就可以递推出d(i,j)了: 如果c==d,显然编辑距离和d(i-1,j-1)是一样的 如果c!=d,情况稍微复杂一点,
如果将c替换成d,编辑距离是somestr1变成somestr2的编辑距离 + 1,也就是d(i-1,j-1) + 1
d(i,j-1) 意味着已经将somestr1c变成somestr2的编辑距离, 现在只要再insert 字母d即可, 即d(i,j-1) + 1
d(i-1,j) 意味着已经将somestr1变成somestr2d了,即已经将somestr1c变成(somestr2d)c了 现在只需把A的最后一个字母c delete即可,d(i-1,j) + 1
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
int minDistance(string word1, string word2) { | |
int len1 = word1.size(); | |
int len2 = word2.size(); | |
if(len1 == 0 || len2 == 0) return len1+len2; | |
vector<vector<int>> dp = vector<vector<int>>(len1+1, vector<int>(len2+1, 0)); | |
for(int i =0; i<=len1; i++){ | |
dp[i][0] = i;//from X[0..i] to emtpy string | |
} | |
for(int i=0; i<=len2; i++){ | |
dp[0][i] = i;//from empty string to Y[0..i] | |
} | |
for(int i=0; i<len1; i++) | |
for(int j=0; j<len2; j++){ | |
dp[i+1][j+1] = min(dp[i][j+1], dp[i+1][j]) + 1; | |
if(word1[i] == word2[j]) | |
dp[i+1][j+1] = min(dp[i+1][j+1], dp[i][j]); | |
else | |
dp[i+1][j+1] = min(dp[i+1][j+1], dp[i][j]+1);// replace X[i] with Y[j] | |
} | |
return dp[len1][len2]; | |
} |
Experience:
DP array, use dp[m][n] to represent the result, so have to assign (m+1)*(n+1).
Using dp[m-1][n-1] is not recommended here.
No comments:
Post a Comment