You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
Solution: Two Dimensional Dynamic Programming
Remember when a problem is to find the optimal value, DP is a possible solution.
This is a classic DP problem.
dp[i][j] refers to edit distance between X[0...i] and Y[0...j], so
Think about it. For the last step, we can either:
a. insert a letter of Y[len2-1] to X
b. delete a letter X[len1-1] from X
c. substitue X[len1-1] with Y[len2-1]
Basic cases: dp[0][0] = 0; dp[i][0] = i, i delete operations, similarly (empty string to word[0..i]), dp[0][j] = j, j insert operations
2. Optimal functions:
dp[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+0 or 1))
d[i-1][j] last step is to delete X[i] (X[1.i-1] to Y[1..j] is already optimal)
d[i][j-1] last step is to insert Y[j] ( X[1..i] to Y[1..j-1] is already optimal)
d[i-1][j-1] last step is to substitute X[i] with Y[j], if(X[i] == Y[j]) plus 0, else plus 1
这里是中文解释(改进版)
但对于编辑距离的话,当我们要计算d(i,j)时,即计算A(i)到B(j)之间的编辑距离,此时,设A(i)形式是somestr1c;B(i)形如somestr2d的话,
将somestr1变成somestr2的编辑距离已知是d(i-1,j-1)
将somestr1c变成somestr2的编辑距离已知是d(i,j-1) 将somestr1变成somestr2d的编辑距离已知是d(i-1,j) 那么利用这三个变量,就可以递推出d(i,j)了: 如果c==d,显然编辑距离和d(i-1,j-1)是一样的 如果c!=d,情况稍微复杂一点,
如果将c替换成d,编辑距离是somestr1变成somestr2的编辑距离 + 1,也就是d(i-1,j-1) + 1
d(i,j-1) 意味着已经将somestr1c变成somestr2的编辑距离, 现在只要再insert 字母d即可, 即d(i,j-1) + 1
d(i-1,j) 意味着已经将somestr1变成somestr2d了,即已经将somestr1c变成(somestr2d)c了 现在只需把A的最后一个字母c delete即可,d(i-1,j) + 1
Experience:
DP array, use dp[m][n] to represent the result, so have to assign (m+1)*(n+1).
Using dp[m-1][n-1] is not recommended here.
No comments:
Post a Comment