Category Archives: Strings

Finding the edit distance between two strings

Given two strings as input, write a program to find the edit distance between them. 
The edit distance is defined as the minimum number of modifications to do on one string to transform it into the other string. A modification can be inserting a character, deleting a character, or replacing an existing character.
For example, consider the two strings “saturday” and “sunday”. The edit distance is 3. There is no way, we can change one string to the other in less than 3 modifications.
The 3 modifications that can be done on “saturday” are
remove ‘a’ –> sturday
remove ‘t’ –> surday
replace ‘r’ with ‘n’ –> sunday

The following is the algorithm based on dynamic programming.

Let us assume string1 is of length N and string2 is of length M. We create a table of size (N+1) X (M+1).

The entry table[i][j] gives the edit distance between prefix of string1 ending with ith character and prefix of string2 ending with jth character.

0 <= i <= N and 0 <= j <= M

We fill this table in a bottom-up manner. This is essentially solving the problem by combining the results of it’s sub-problems. (Dynamic programming). The entries are filled using the following formula

table[0][i] = i
table[i][0] = i

Explanation:
Assume that 0th character is empty, the edit distance between an empty string and a string of length ‘i’ is always ‘i’ because by deleting all the ‘i’ characters, it becomes an empty string.

table[i][j] = table[i-1][j-1] if string1[i] = string2[j]

Explanation:
If string1[i] and string2[j] are equal, we don’t need any modifications. Hence it is as same as table[i-1][j-1]

table[i][j] = Min(table[i-1][j],table[i][j-1],table[i-1][j-1]) + 1

Explanation:
If string1[i] and string2[j] are not equal, we have three choices.

  • Modify the differing character string1[i] to match string2[j]. This requires a cost of table[i-1][j-1] + 1
  • Delete string1[i] and try to match the remaining with string2. This requires a cost of table[i-1][j] + 1
  • Insert string1[i] into string2. This requires  a cost of table[i][j-1] + 1.

Finally table[N][M] contains the required answer.

The time and space complexity for this algorithm is O(n * m).

Following is the C++ implementation of the above algorithm.

Checking if any anagram of a given string is palindrome or not

Given a string, how do we check if any anagram of it can be a palindrome?
For example let us consider the string “AAC”. An anagram of it is “ACA” which is a palindrome. We have to write a method which takes a string and outputs true if we can form a palindrome from any anagram of the given string. Otherwise outputs false.
We can immediately think of a solution where we can generate all anagrams of the given string and check if there is any palindrome exists. But this approach is very lengthy and has exponential time complexity (O(n!)).
The key to solve this problem efficiently is to understand how a palindrome is formed. We all know that a palindrome reads the same when you read from left to right or right to left. 

A palindrome can be broken into two halves. where in the first half is the reverse of second half. In case of odd length of strings, we have to ignore the middle character.

So for every character in the first half, there is a corresponding matching character in the second half. This indicates that all the characters in a string must occur even number of times except for one which appears in the middle of the string.

Hence if we check if there is at most one character with odd occurrences in the string we can say that we can form a palindrome from any anagram.

Here is the Java code which implements this algorithm. For simplicity, I assumed that the input string contains only lowercase English alphabets.

Reversing the words in the string

Given a string, how do we reverse the words inside the string?

For example if the input string is “this is a test” the output should be “test a is this”.

The algorithm is simple. This can be done in two steps. The first step is to reverse the entire string. In the second step we traverse the string to find the individual words separated by spaces and reverse them.

In the following program I have used C++ STL string instead of a C-style null terminated string. To read a string including spaces I have used getline(cin,str) function. I have also used reverse(str.begin(), str.end()) as a helper function from standard template library for simplicity. 

Here is the C++ program.

Removing a given character from the string

Given a string and a character, Write a function to remove all instances of the character and return the modified string.
One easy method is to iterate through each character of the string, Once a restricted character is seen, we move all the characters appearing after it by one position. This approach involves a lot of shift operations and in worst case it takes O(n2) time.

We can be more efficient if we follow the approach given below.

Take two indices; one running index (i) and one result index (r). All the valid characters are copied to beginning of the string and both indices are incremented. If a prohibited character is seen, we skip that character and just increment i. At the end, we re-size the string so that the characters beyond the index ‘r’ are removed.

Let us understand this with a simple example.

Let the input string be ‘PROGRAM’ and we want to remove the letter ‘R’. The following table illustrates the behavior of the algorithm by iteration.

0
1
2
3
4
5
6
P
R
O
G
R
A
M
P
R
O
G
R
A
M
P
R
O
G
R
A
M
P
O
O
G
R
A
M
P
O
G
G
R
A
M
P
O
G
G
R
A
M
P
O
G
A
R
A
M
P
O
G
A
M
A
M

The shaded part indicates the result string that is formed in-place without using the extra space.

Print a string in reverse using recursion

The following function is simply based on recursion mechanism. We write a function printReverse(). Initially we call this function by passing a pointer to the first character of the string. We keep calling this function recursively by incrementing the pointer until we reach the null character (End of the string). If we reach null, the function returns and we print the character at the current pointer.

Since the function calling mechanism uses a stack to keep track the calling sequence. While unwinding the stack, we are able to print the characters in reverse.

For example printReverse(“hello”) function calling sequence looks like the following.
printReverse(“hello”)
  printReverse(“ello”)
    printReverse(“llo”)
      printReverse(“lo”)
        printReverse(“o”)
          printReverse(“”)
            return
        print “O”
      print “l”
    print “l”
  print “e”
print “h”

Here is the C++ code for the same

#include <iostream>

using namespace std;

void printReverse(char *str)
{
if( *str )
{
printReverse(str+1);
cout<<(*str);
}
}
int main()
{
char str[] = "reverse this";
printReverse(str);
return 0;
}

How to check if two strings are anagrams of each other

Anagram of a word is nothing but rearrangement of it’s letters to form another word.

For example ‘listen’ is an anagram of ‘silent’ and ‘enlist’.

We have to write a function to check if two given strings are anagrams are not.

We can solve this problem in two different ways. The first way is to sort the two strings and compare them. If they are equal, they are anagrams; otherwise they are not. The intuition behind this is by sorting the letters in the string, it forms a unique pattern. All anagrams must transform to this unique pattern when they are sorted.


For example when we sort the letters in the string ‘listen’ or ‘silent’ it becomes
‘eilnst’ 

Here is the Java function which implements the first method. Here I have used separate strings to hold the sorted sequence so that the original strings are not modified. If the original strings can be modified, we can avoid using the extra String variables.

 public static boolean areAnagrams(String strFirst, String strSecond )
{
char [] charArray = strFirst.toCharArray();
Arrays.sort(charArray);
String strFirstSort = new String( charArray );
charArray = strSecond.toCharArray();
Arrays.sort(charArray);
String strSecondSort = new String( charArray );

return strFirstSort.equals(strSecondSort);
}


Method#2:
The second method is to count the letters in both the strings and match the counts. If the letter count of both the strings is same, then we can conclude that they are anagrams.

In the following Java implementation we do not count the frequencies of letters in both the strings, but we calculate the counts for just one string.

public static boolean areAnagrams(String strFirst, String strSecond )
{
//Assume ASCII charset of size 256 alphabets
int[] charCount = new int[256];

getCharCount(strFirst,charCount);

for( int i = 0; i < strSecond.length(); i++ )
{
char ch = strSecond.charAt(i);
// If count of ch is more in second string, return false
if( charCount[ch] <= 0 )
return false;
else
charCount[ch]--;
}

for( int i = 0; i < 256; i++ )
{
if( charCount[i] > 0 )
return false;
}
return true;
}
public static void getCharCount(String strInput, int[] charCount)
{
for( int i = 0; i < strInput.length(); i++ )
{
charCount[ strInput.charAt(i)]++;
}
}

Then we iterate through the characters of the second string, we try to decrement the frequency of each character. If we found a zero, that means that particular character has appeared more number of times in the second string, then we can say that they are not anagrams. Otherwise we decrement the count of that character.

Finally we check the count array if they have any non-zero entries. If they are anagrams, we have all zero entries in count array. 

Note: We have considered the space also as a significant character in the above programs. So ‘school master’ can not be anagram of ‘the classroom’  even though they contain same letters. Also the input strings are case-sensitive.

Detele characters from input string given in another string

Given two strings, How do we delete all the characters in the first string which are present in the second string?

Find the character set of the string. Create a Boolean array of size equal to the character set size. If the character set is ASCII, the size is 256. This array will be indexed by the character itself.

For example, array[‘c’] stores whether character ‘c’ is present in the string are not.

Iterate through each character of the second string, update the corresponding Boolean flag in the array to true.

Now loop through all the characters from the input string and retain the only characters not present in the mask.

Following is the C++ program to do it. This code runs in O(n) time and constant space. I have modified the input string itself without creating an auxiliary array.



#include <iostream>
#include <string>

using namespace std;

void removeChars( char * input, char * mask)
{
//initialize all the chars to false
bool charPresent[256] = {false};
int i;
//mark all the mask characters to true.
for( i = 0; mask[i]; ++i)
{
charPresent[mask[i]] = true;
}

int resInd = 0; //to track result string
int ind = 0;

while( input[ind] )
{
//Add the character only if it is not present in mask
if( !charPresent[ input[ind] ] )
{
input[resInd++] = input[ind];
}
ind++;
}
input[resInd] = ''; //Null terminator to end the string
}
int main()
{
char inputStr[] = "This is a test";
char maskStr[] = "aeiou";
removeChars( inputStr, maskStr);
cout<<inputStr;
return 0;
}

Finding the first non repeating character in a string

Given a string as input, how do we find the first non-repeating character?

We have to do this in linear time – O(n) and constant extra space- O(1)

Assuming that the character set is ASCII, the number of characters possible are 256. This will differ if the string contains non-ASCII characters like Unicode.

First we declare a count array of size 256 which will keep track of the count of each character appearing in the given string. We use the character itself as an index into the count array. This will give the O(1) time complexity for accessing the count of that character.

For example we store the count of a character ‘a’ (ASCII value= 98) in count[98]. 

In the first iteration, we iterate over each character of the string and update it’s count in the array.

In the next iteration, we go through each character of the array and checking the count of it from the array. The first character that we encounter with the count 1 will be the required answer.

For example if the given input is “this is a test”

the count array will be 
a – 1
e – 1
h – 1
i – 2
s – 3
t – 3

The first character with count=1 is h.

Here is the Java implementation of the above algorithm. 


/**
* Created with IntelliJ IDEA.
* User: Ravi
* Date: 8/25/13
* Time: 4:22 PM
*/
import java.util.Scanner;
public class Main {
public static void main(String[] args)
{
Scanner reader = new Scanner(System.in);
String input = reader.nextLine();
int ind = getFirstNonRepeating(input);
if( ind != -1)
{
System.out.println(input.charAt(ind));
}
else
{
System.out.println("All characters are repeated");
}
}
public static int getFirstNonRepeating(String input)
{
int[] charCount = new int[256];
int i;
for( i = 0; i < input.length(); i++)
{ 
             //we use the character itself as an index.
             charCount[ input.charAt(i) ]++;
}
for( i = 0; i < input.length(); i++)
{
if( charCount[ input.charAt(i) ] == 1)
return i;
}
return -1;
}
}

How to find the frequecy of a word in a given sentence

Given some text and a word. How do we find the frequency of the word?

In this post we will see a Java program which finds it with just a few lines of code using regular expression matching.

Java provides utilities for matching the regular expressions using java.util.regex package. In our program we utilize two of them namely Pattern, and Matcher. These are the fundamental classes in this package.

In the following code reader is an object of class Scanner which is used for reading the input from the keyboard. The Pattern class does not have a public constructor, we have to create an object of that class using the compile() method by passing the pattern. Similarly the Matcher class also does not have a public constructor, and it’s object should be returned by matcher() method of the Pattern object.

The matcher class provides a method called find() which tries to find the next instance of the pattern in the given text. If it finds the pattern it returns true otherwise it returns false.

Here is the Java code.

import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Main {
public static void main(String[] args)
{
Scanner reader = new Scanner(System.in);
String sentence = reader.nextLine();
String word = reader.nextLine();
Pattern p = Pattern.compile(word);
Matcher m = p.matcher(sentence);
int count = 0;
while( m.find())
count++;
System.out.println("The word '" + word + "' occurred " + count + " time(s)");
}
}

simple pattern matching

Pattern matching algorithms are very important in computer science. In this post we will discuss the simplest algorithm for pattern matching. Given some text and a pattern, we have to find where in the text, the pattern is found.

In this brute-force algorithm, we start matching the pattern from the beginning of the text. If a match is found we return the index. Otherwise we slide the matching window by one character and repeat the same process. For example if we are trying to find the pattern “ababb” in the text “abbabaababb

The following depiction gives the steps involved in this algorithm
   a b b a b a a b a b b
   a b a
     a
       a
         a b a b
           a
             a b
               a b a b b — pattern found

The following is the C++ implementation of the above algorithm. Note that this algorithm finds only the first occurrence of the pattern. This can be modified to find all the occurrences also.


#include <iostream>
#include <cstring>
#define MAXLEN 100
using namespace std;

int findPattern(char *t, char *p)
{
int n = strlen(t); //length of the text
int m = strlen(p); //length of the pattern
int i,j; //loop counters

//outer loop tries to match the pattern from index 0 to (n-m)
for( i = 0 ; i <= n-m ; i++ )
{
//inner loop tries to match jth char in pattern
//with (i+j)th char in text
for( j = 0 ; j < m ; j++)
{
if( t[i+j] != p[j])
break;
}
//pattern found; return the index
if( j == m)
return i;
}
//pattern not found
return -1;
}

int main()
{
char text[MAXLEN];
char pattern[MAXLEN];

cin>>text;
cin>>pattern;
cout<<"pattern found at: "<<findPattern(text,pattern);
return 0;
}