Monthly Archives: May 2014
In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question.
In bioinformatics, it can be used to quantify the similarity of macromolecules such as DNA, which can be viewed as strings of the letters A, C, G and T.
Several definitions of edit distance exist, using different sets of string operations. One of the most common variants is called Levenshtein distance, named after the Soviet Russian computer scientist Vladimir Levenshtein. In this version, the allowed operations are the removal or insertion of a single character, or the substitution of one character for another. Levenshtein distance may also simply be called “edit distance”, although several variants exist.
Levenshtein distance algorithm gives shortest path to reach to the destination string but that is not unique. So, The algorithm will give reliable minimum cost path only if all the operations i.e Insertion, Deletion and Substitution have same cost.
In this text I will analyze and provide an algorithm using decision tree and list for getting minimum cost path in a string transformation where all operations have varying cost depending on the character on which the operation is performed.
In the text I will also present source code in Objective-C language for the implementation of the algorithm.