Minimum Cost String Transformation Problem with varying cost of operations

Overview

In computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.

Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question.

In bioinformatics, it can be used to quantify the similarity of macromolecules such as DNA, which can be viewed as strings of the letters A, C, G and T.

Several definitions of edit distance exist, using different sets of string operations. One of the most common variants is called Levenshtein distance, named after the Soviet Russian computer scientist Vladimir Levenshtein. In this version, the allowed operations are the removal or insertion of a single character, or the substitution of one character for another. Levenshtein distance may also simply be called “edit distance”, although several variants exist.

Levenshtein distance algorithm gives shortest path to reach to the destination string but that is not unique. So, The algorithm will give reliable minimum cost path only if all the operations i.e Insertion, Deletion and Substitution have same cost.

In this text I will analyze and provide an algorithm using decision tree and list for getting minimum cost path in a string transformation where all operations have varying cost depending on the character on which the operation is performed.

In the text I will also present source code in Objective-C language for the implementation of the algorithm.

Problem Statement

Values

Let ‘1’ = 1, ‘2’ = 2, … ‘9’ = 9, ‘0’ = 10 and

‘a’ = ‘A’ = 11, ‘b’ = ‘B’ = 12, ‘c’ = ‘C’ = 13 …. so on till … ‘z’ = ‘Z’ = 36

null or empty character = 0

Rules

  • Distance between two characters is the absolute difference of their numeric value as per above table.

  • Normal Distance between two strings is the difference of total numeric value of each string.

  • Minimum Distance between two strings is the numeric cost of changing one string to another, one character at a time.

  • If you change one valid character [0-9, a-z, A-Z] to another, then it will cost the difference in their numeric value (distance between target and source character).

  • Dropping an existing or adding a any new character, would cost the value of dropped or added character (difference between that character and ’empty character’).

Example

Source string : ‘agx1’

Target string : ‘1agx’

It can be changed like : ‘agx1’ => ‘1gx1’=> ‘1ax1’ => ‘1ag1’ => ‘1agx’

or

by dropping and adding ‘1’ : ‘agx1’ => ‘agx’ => ‘1agx’

Note that second path needs less ‘distance’ (2) than the first (66). Similarly there can be many variations.

In this text we will create a algorithm and write a program to change one given string to another, changing one character at a time, with minimum distance.

                  Proposed Algorithm

The proposed algorithm is based on optimized decision tree and list where every node will have 3 children corresponding to operations Insertion , Deletion and Substitution.

The structure of the node will be like follows :

  1. Source string will hold modified source string from the parent.

  2. Target string will be constant and will hold the target value.

  3. Sequence of operations will hold all previous operations performed on parents and operation performed on node.

  4. Cost of operation will hold the cumulative cost of the chain used to reach the node.

  5. Dead end is a bool value which indicates that node need not to be processed further.

Node

The algorithm works as follows :

  1. The source node will be added in to a list. The source node will have original source and target strings. Cost of operation will be set to 0

  1. Next, 3 children will be created to the node after performing Insertion, Deletion and Substitution operations on the parent.

Tree

  1. Add children to the list and remove parent node.

  1. Check siblings if the target string is reached or not.

  1. If target string is reached in a sibling then make the cost as benchmark and compare the value with siblings.

  1. If all siblings have the cost greater than the benchmark sibling, terminate loop and benchmark sibling would give the minimum cost conversion result.

  1. If siblings are not reached till final point and also the cost is less than benchmark sibling, create decision branches for siblings and add to the list removing the corresponding parents until either a new minimum cost node is found in which case it will become the new benchmark node or all nodes in the list have more cost than the benchmark node.

  1. Any node for which dead end flags get’s set will not be added in the list and thus will not be processed. Following are the conditions which leads the node to a dead end.

  • Case of Insertion : If the index value at which source and target strings differ is greater than the index of last character in the target string. In this case target string can not be reached by inserting any character in the source string.

  • Case of Deletion : If the index value at which source and target strings differ is greater than the index of last character in the source string. In this case target string can not be reached by deleting any character in the source string.

  • Case of Substitution : Substitution can only be done if 2 characters are at the same index so, If the index value at which source and target strings differ is greater than the index of last character in the source string or in target string. In this case target string can not be reached by substituting any character in the source string.

 

Source Code in Objective-C

Modal Class for Node

Header file :

@interface ModalNode : NSObject

{

}

@property(nonatomic,retain)NSString* sourceString;

@property(nonatomic,retain)NSString* targetString;

@property(nonatomic,retain)NSString* sequenceOfOperation;

@property(atomic)NSUInteger costOfOperation;

@property(atomic)BOOL reachedDeadEnd;

+(ModalNode *)nodeDataWithNode:(ModalNode *)node;

@end

Implementation file :

#import “ModalNode.h”

@implementation ModalNode

@synthesize sourceString, targetString, sequenceOfOperation, costOfOperation, reachedDeadEnd;

-(id)init

{

if(!self)

{

  self = [superinit];

}

if(self)

{

  //Init

  self.sourceString = @””;

  self.targetString = @””;

  self.sequenceOfOperation = @””;

  self.costOfOperation = 0;

  self.reachedDeadEnd = NO;

 }

return self;

}

-(id)initWithNode:(ModalNode *)node

{

if(!self)

{

self = [superinit];

}

if(self)

{

  self.sourceString = node.sourceString;

  self.targetString = node.targetString;

  self.sequenceOfOperation = node.sequenceOfOperation;

  self.costOfOperation = node.costOfOperation;

 }

return self;

}

+(ModalNode *)nodeDataWithNode:(ModalNode *)node

{

  return [[[ModalNodealloc] initWithNode:node] autorelease];

}

-(void)dealloc

{

  [sourceStringrelease];

  [targetStringrelease];

  [sequenceOfOperationrelease];

  [superdealloc];

}

@end

Implementation of Algorithm

  1. Function for cost Calculation :

//Return cost of character as per problem statement on which operation is performed

– (int)costOfCharacter:(unichar)character

{

  int cost = 0;

  if((character > 47) && (character < 58))

  {

    if(character == 48)

    {

      //value for 0, as per problem statement set it to 10

      cost = 10;

    }

    else

   {

    //Cost of integers

    cost = character – 48;

   }

}

elseif((character > 96) && (character < 123))

{

//Cost of characters

cost = character – 97 + 11;

}

 return cost;

}

  1. Function for Insertion Operation :

– (ModalNode *)insertionOperationOnParent:(ModalNode *)parentNode

{

ModalNode* node = [ModalNodenodeDataWithNode:parentNode];

//Create a mutable string from the source string

NSMutableString* sourceString = [NSMutableStringstringWithString:node.sourceString];

NSString* targetString = node.targetString;

NSInteger index = 0;

//Always run the loop until break statement is encountered

while(YES)

{

if((index < sourceString.length) && (index < targetString.length))

{

//If index is smaller than length of both stings

unichar targetChar = [targetString characterAtIndex:index];

unichar sourceChar = [sourceString characterAtIndex:index];

if(sourceChar != targetChar)

{

//If characters in target and source string on same index are not

//equal, perform insertion of targetcharacter at the index on source //string

node.sequenceOfOperation = [node.sequenceOfOperationstringByAppendingFormat:@”\ninsert at index %ld”,index];

NSMutableString* tempstring = [NSMutableStringstringWithString:node.sourceString];

[tempstring insertString:[NSStringstringWithFormat: @”%C”, targetChar] atIndex:index];

node.sourceString = tempstring;

node.costOfOperation = node.costOfOperation + [selfcostOfCharacter:targetChar];

break;

}

}

elseif(index == sourceString.length)

{

//If index is greater than the index of last character in source string,

//insert all remaining charactersfrom target string in source string

if(index < targetString.length)

{

node.sequenceOfOperation = [node.sequenceOfOperationstringByAppendingFormat:@”\ninsert from index %ld”,index];

for(NSInteger i = index; i < targetString.length; ++i)

{

unichar addedChar = [targetString characterAtIndex:index];

node.costOfOperation = node.costOfOperation + [selfcostOfCharacter:addedChar];

}

}

node.sourceString = node.targetString;

break;

}

elseif(index == targetString.length)

{

//If index is greater than the index of last character in target string,

//then a dead end for insertion isreached i.e target string can not be

//reached by inserting any character in source string now

node.reachedDeadEnd = YES;

break;

}

index = index + 1;

}

return node;

}

  1. Function for Deletion Operation :

– (ModalNode *)deletionOperationOnParent:(ModalNode *)parentNode

{

ModalNode* node = [ModalNodenodeDataWithNode:parentNode];

//Create a mutable string from the source string

NSMutableString* sourceString = [NSMutableStringstringWithString:node.sourceString];

NSString* targetString = node.targetString;

NSInteger index = 0;

//Always run the loop until break statement is encountered

while(YES)

{

if((index < sourceString.length) && (index < targetString.length))

{

//If index is smaller than length of both stings

unichar targetChar = [targetString characterAtIndex:index];

unichar sourceChar = [sourceString characterAtIndex:index];

if(sourceChar != targetChar)

{

//If characters in target and source string on same index are not

//equal, perform deletion of target character at the index on source

//string.

node.sequenceOfOperation = [node.sequenceOfOperationstringByAppendingFormat:@”\nDelete at index %ld”,index];

NSMutableString* tempstring = [NSMutableStringstringWithString:node.sourceString];

[tempstring deleteCharactersInRange:NSMakeRange(index, 1)];

node.sourceString = tempstring;

node.costOfOperation = node.costOfOperation + [selfcostOfCharacter:sourceChar];

break;

}

}

elseif(index == sourceString.length)

{

//If index is greater than the index of last character in source string,

//then a dead end for deletion is reached i.e target string can not be

//reached by deleting any character in source string now

node.reachedDeadEnd = YES;

break;

}

elseif(index == targetString.length)

{

//If index is greater than the index of last character in target string,

//delete all remaining charactersfrom source string

if(index < sourceString.length)

{

for(NSInteger i = index; i < sourceString.length; ++i)

{

unichar deletedChar = [sourceString characterAtIndex:index];

node.costOfOperation = node.costOfOperation + [selfcostOfCharacter:deletedChar];

}

node.sequenceOfOperation = [node.sequenceOfOperationstringByAppendingFormat:@”\nDelete characters from index %ld”,index];

}

node.sourceString = node.targetString;

break;

}

index = index + 1;

}

return node;

}

  1. Function for Substitution Operation :

– (ModalNode *)substitutionOperationOnParent:(ModalNode *)parentNode

{

ModalNode* node = [ModalNodenodeDataWithNode:parentNode];

//Create a mutable string from the source string

NSMutableString* sourceString = [NSMutableStringstringWithString:node.sourceString];

NSString* targetString = node.targetString;

NSInteger index = 0;

//Always run the loop until break statement is encountered

while(YES)

{

if((index < sourceString.length) && (index < targetString.length))

{

//If index is smaller than length of both stings

unichar targetChar = [targetString characterAtIndex:index];

unichar sourceChar = [sourceString characterAtIndex:index];

if(sourceChar != targetChar)

{

//If characters in target and source string on same index are not

//equal, perform substitution of target character at the index on

//source string.

node.sequenceOfOperation = [node.sequenceOfOperationstringByAppendingFormat:@”\nsubstituting at index %ld”,index];

NSMutableString* tempstring = [NSMutableStringstringWithString:node.sourceString];

[tempstring replaceCharactersInRange:NSMakeRange(index, 1) withString:[NSStringstringWithFormat: @”%C”, targetChar]];

node.sourceString = tempstring;

node.costOfOperation = node.costOfOperation + abs([selfcostOfCharacter:targetChar] – [selfcostOfCharacter:sourceChar]);

break;

}

}

elseif((index == targetString.length) || (index == sourceString.length))

{

//Substitution can only be done if 2 characters are at the same index so //if index is greater than theindex of last character in target string

//or in source string, it’s a dead end for substitution i.e target string

//can not be reached by substituting any character. The only way is

//insertion or deletion

node.reachedDeadEnd = YES;

break;

}

index = index + 1;

}

return node;

}

  1. Minimum Cost calculation function:

– (ModalNode *)minimumCostConversion:(NSString *)sourceString targetString:(NSString *)targetString

{

//Create source Node

ModalNode* sourceNode = [[ModalNodealloc] init];

sourceNode.sourceString = sourceString;

sourceNode.targetString = targetString;

if([sourceString isEqualToString:targetString])

{

//If source and target strings are equal, retun source node.

return sourceNode;

}

//Add Source node in the list

NSMutableArray* nodesArray = [[NSMutableArrayalloc] init];

[nodesArray addObject:sourceNode];

//Create final node with dummy data. This node will contain minimum cost node at the end

ModalNode* finalNode = [[ModalNodealloc] init];

//NSUIntegerMax = 18446744073709551615 on 64 bit Mac machine

finalNode.costOfOperation = NSUIntegerMax;

//Process list until it is empty

while(nodesArray.count)

{

//Source node will be the first parent

ModalNode* parentNode = [nodesArray objectAtIndex:0];

if(parentNode.costOfOperation > finalNode.costOfOperation)

{

//If Cost of reaching the node is greater than final node, do not process

//it’s children

[nodesArray removeObjectAtIndex:0];

continue;

}

//Get 3 children of parent based on 3 decisions i.e Insertion,Deletion and

//Substitution

ModalNode* insertNode = [selfinsertionOperationOnParent:parentNode];

ModalNode* deleteNode = [selfdeletionOperationOnParent:parentNode];

ModalNode* substituteNode = [selfsubstitutionOperationOnParent:parentNode];

//Remove parent from the list

[nodesArray removeObjectAtIndex:0];

//Process Insertion branch

if([insertNode.sourceStringisNotEqualTo:targetString])

{

if((!insertNode.reachedDeadEnd) && (finalNode.costOfOperation > insertNode.costOfOperation))

{

//If Insertion operation is not possible on node or the cost is

// greater than cost of previous final node, remove from the list

// without further processing it.

[nodesArray addObject:insertNode];

}

}

else

{

if(finalNode.costOfOperation > insertNode.costOfOperation)

{

//If source to target string conversion is complete and cost of

//operation is also less than previous final node, this node will

//become final.

finalNode = insertNode;

}

}

//Process Delete branch

if([deleteNode.sourceStringisNotEqualTo:targetString])

{

if((!deleteNode.reachedDeadEnd) && (finalNode.costOfOperation > deleteNode.costOfOperation))

{

//If Deletion operation is not possible on node or the cost is

//greater than cost of previous final node, remove from the list

//without further processing it.

[nodesArray addObject:deleteNode];

}

}

else

{

if(finalNode.costOfOperation > deleteNode.costOfOperation)

{

//If source to target string conversion is complete and cost of

//operation is also less than previous final node, this node will

// become final.

finalNode = deleteNode;

}

}

//Process Substitution branch

if([substituteNode.sourceStringisNotEqualTo:targetString])

{

if((!substituteNode.reachedDeadEnd) && (finalNode.costOfOperation > substituteNode.costOfOperation))

{

//If Substitution operation is not possible on node or the cost is

//greater than cost of previous final node, remove from the list

//without further processing it.

[nodesArray addObject:substituteNode];

}

}

else

{

if(finalNode.costOfOperation > substituteNode.costOfOperation)

{

//If source to target string conversion is complete and cost of

//operation is also less than previous final node, this node will

//become final.

finalNode = substituteNode;

}

}

}

//Return node which contains converted data and is reached through minimum cost

//path

return finalNode;

}

Output of Program

otput1

output2

 

Written By: HEM DUTT, Sr. Engineer/Tech Lead (Mac OSX development), Mindfire Solutions

 

About HEM DUTT

Seasoned Mac OS X developer. Expertise in Mac OSX application development. knowledge of MFC and IOS

Posted on May 2, 2014, in General Algorithm, Objective-C and tagged , , . Bookmark the permalink. Leave a comment.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: