Here I will give a few simple methods to implement the Levenshtein distance (Edit Distance) algorithm. This a good algorithm to implement for beginners in java or rather a programming language in general.
To know more about what this algorithm does and how it works, see my previous post here.
The first method takes the input the two string s to be compared, calls the matrixGenerator method and finally this returns the editDistance value.
the editMatrix Generator creates the matrix using the algorithm and generates the matrix.
Finally to get a proper confidence value we send the editDistance value obtained, along with the input to the edSimilarityCal method to get a value between [0,1].
float edSimilarityCal(String string1, String string2, int editDistance) {
float maxLen=string1.length();
if (maxLen < string2.length())
maxLen = string2.length();
if (maxLen == 0.0F)
return 1.0F;
else
return 1.0F - editDistance/maxLen;
}
Well that's about it.
Cheers!
To know more about what this algorithm does and how it works, see my previous post here.
The first method takes the input the two string s to be compared, calls the matrixGenerator method and finally this returns the editDistance value.
int editDistValue(String str1, String str2) {
int editRow = str1.length() + 1; //initialize row
int editCol = str2.length() + 1; //initialize column
int[][] editMatrix = new int[editRow][editCol];
editMatrix = editMatrixGenerator(str1, str2, editMatrix, editRow,
editCol);
int editDistance = 999;
editDistance = editMatrix[editRow - 1][editCol - 1];
return editDistance;
}
int editRow = str1.length() + 1; //initialize row
int editCol = str2.length() + 1; //initialize column
int[][] editMatrix = new int[editRow][editCol];
editMatrix = editMatrixGenerator(str1, str2, editMatrix, editRow,
editCol);
int editDistance = 999;
editDistance = editMatrix[editRow - 1][editCol - 1];
return editDistance;
}
the editMatrix Generator creates the matrix using the algorithm and generates the matrix.
int[][] editMatrixGenerator(String str1, String str2,
int[][] editMatrix, int editRow, int editCol) {
int i, j;
char[] rowString = str1.toCharArray();
char[] colString = str2.toCharArray();
for (i = 0; i < editRow; i++) {
for (j = 0; j < editCol; j++) {
if (i == 0) {
editMatrix[0][j] = j;
}
if (j == 0) {
editMatrix[i][0] = i;
}
if (i != 0 && j != 0) {
if (rowString[i - 1] == colString[j - 1]) {
editMatrix[i][j] = editMatrix[i - 1][j - 1];
} else {
editMatrix[i][j] = (Math.min(editMatrix[i - 1][j - 1],
Math.min(editMatrix[i][j - 1],
editMatrix[i - 1][j]))) + 1;
}
}
}
}
return editMatrix;
}
int[][] editMatrix, int editRow, int editCol) {
int i, j;
char[] rowString = str1.toCharArray();
char[] colString = str2.toCharArray();
for (i = 0; i < editRow; i++) {
for (j = 0; j < editCol; j++) {
if (i == 0) {
editMatrix[0][j] = j;
}
if (j == 0) {
editMatrix[i][0] = i;
}
if (i != 0 && j != 0) {
if (rowString[i - 1] == colString[j - 1]) {
editMatrix[i][j] = editMatrix[i - 1][j - 1];
} else {
editMatrix[i][j] = (Math.min(editMatrix[i - 1][j - 1],
Math.min(editMatrix[i][j - 1],
editMatrix[i - 1][j]))) + 1;
}
}
}
}
return editMatrix;
}
Finally to get a proper confidence value we send the editDistance value obtained, along with the input to the edSimilarityCal method to get a value between [0,1].
float edSimilarityCal(String string1, String string2, int editDistance) {
float maxLen=string1.length();
if (maxLen < string2.length())
maxLen = string2.length();
if (maxLen == 0.0F)
return 1.0F;
else
return 1.0F - editDistance/maxLen;
}
Well that's about it.
Cheers!