Saturday, 27 July 2013

Easy Java Implementation of the Levenshtein distance (Edit Distance) Algorithm

Here I will give a few simple methods to implement the Levenshtein distance (Edit Distance) algorithm. This a good algorithm to implement for beginners in java or rather a programming language in general.
To know more about what this algorithm does and how it works, see my previous post here.

The first method takes the input the two string s to be compared, calls the matrixGenerator method and finally this returns the editDistance value.

int editDistValue(String str1, String str2) {
       
        int editRow = str1.length() + 1; //initialize row
        int editCol = str2.length() + 1; //initialize column

        int[][] editMatrix = new int[editRow][editCol];
        editMatrix = editMatrixGenerator(str1, str2, editMatrix, editRow,
                editCol);

        int editDistance = 999;
        editDistance = editMatrix[editRow - 1][editCol - 1];

        return editDistance;
    }

the editMatrix Generator creates the matrix using the algorithm and generates the matrix.

int[][] editMatrixGenerator(String str1, String str2,
            int[][] editMatrix, int editRow, int editCol) {
       
        int i, j;

        char[] rowString = str1.toCharArray();
        char[] colString = str2.toCharArray();

        for (i = 0; i < editRow; i++) {
         for (j = 0; j < editCol; j++) {
           if (i == 0) {
             editMatrix[0][j] = j;
            }
           if (j == 0) {
             editMatrix[i][0] = i;
            }
           if (i != 0 && j != 0) {
             if (rowString[i - 1] == colString[j - 1]) {
              editMatrix[i][j] = editMatrix[i - 1][j - 1];
             } else {
              editMatrix[i][j] = (Math.min(editMatrix[i - 1][j - 1],
                                   Math.min(editMatrix[i][j - 1],
                                   editMatrix[i - 1][j]))) + 1;
                    }
                }
            }
        }
        return editMatrix;
    }

Finally to get a proper confidence value we send the editDistance value obtained, along with the input to the edSimilarityCal method to get a value between [0,1].

float edSimilarityCal(String string1, String string2, int editDistance) {
       
        float maxLen=string1.length();
        if (maxLen < string2.length())
        maxLen = string2.length();
        if (maxLen == 0.0F)
        return 1.0F;
        else
        return 1.0F - editDistance/maxLen;  
    }


Well that's about it.
Cheers!

No comments:

Post a Comment