Category Archives: set

Number of characters appearing in all given strings

Given a set of n strings, we have to write a program to find out the number of characters appearing in all the strings.

For example consider the following strings
{“India”, “China”, “United states”}, the letters {a,i,n} appear in all the strings.

Let us think of the following solution.
  • We first add all the characters in the first string to a set. 
  • For each string we do the following.
    • For each letter in the set, we check if it can be found the string.
    • If it is not found, we delete it from the set
  • Finally the set contains only the elements which appear in all the given strings.

Here is the C++ code which implements the above approach. For simplicity this program assumes that all the strings contains lower case alphabets only.

Finding the number of duplicates in a list of numbers

Given a list of numbers, we have to find how many duplicate entries are present in it?

For example consider the list of numbers {5, 1, 9, 5, 2, 1, 6, 4, 5, 8}
It has 3 duplicate numbers.

There are many ways to do this. In this post, I present one of the simplest ways using C++ STL set data structure, and Java HashSet data structure
 
The insert() method of the set returns a pair of type 
pair< set<type>::iterator, bool >
 

The second value in the pair is true if a new element is inserted. It is false if there is already an existing element.

The add() method of HashSet returns true if new element is inserted, and false if no new element is inserted(duplicate)

We can make use of this property to count the number of duplicate elements.

Here is the simple C++ implementation.

Here is the Java Implementation.

Program to find the powerset of a given set

Given a set of numbers, how do we generate its power set?

For example for the set {1,2} the power set is given below.

{  {},  {1}, {2}, {1,2} }

A power set is nothing but all the possible subsets of a given set. If a set contains N elements, it’s power set contains 2N elements. This is evident from the above example.

Here is how our method works.
All the subsets can be identified by an integer. The binary representation of that integer says which elements from the original set present in that subset.

For example let us consider a set of two elements {1,2}, the power set of this can be represented by four integers

0 = 0000 – Empty subset
1 = 0001 – Fist element is present
2 = 0010 – Second element is present
3 = 0011 – First and Second elements are present

In the following C++ program, the outer loop runs from 0 to 2n where n is the size of the set. For each integer, the inner loop checks for set bit positions and accordingly add those numbers to the current subset. 

Since we have to generate all the possible subsets, the complexity of this method is O(2^n).

#include <iostream>
#include <set>
#include <vector>
using namespace std;

//Generates the power set of integers provided in inputSet,
//output in pSet
void generatePowerSet( vector<int> &inputSet, set< set<int> > &pSet )
{
int i = 0;
int n = inputSet.size();

//There will be 2^n subsets; generate them
while ( i < (2 << n) )
{
set<int> subSet;

int j = 0;
int k = i;

while( k&(k-1) ) //Until k contains at-least one set bit
{
//If jth bit is set, add the jth element to the subset
if( k & 1)
{
subSet.insert( inputSet[j] );
}

k = k >> 1;
j++;
}
//Add the subset to the powerset
pSet.insert(subSet);
i++;
}
}
//Utility to print the powerset
void printPowerSet( set< set<int> > &pSet)
{
set< set<int> >::iterator setIter;
set<int>::iterator intIter;

for( setIter = pSet.begin(); setIter != pSet.end(); ++setIter )
{
cout<<"{ ";
for( intIter = setIter->begin(); intIter != setIter->end(); intIter++ )
{
cout<<*intIter<<" ";
}
cout<<"}"<<endl;
}
}
int main()
{
int n; // number of elements
cin>>n;
vector<int> intSet;
int i;
//Read the set elements
for( i = 0; i < n; i++ )
{
int x;
cin>>x;
intSet.push_back(x);
}
set< set<int> > powerSet;
generatePowerSet( intSet, powerSet );
printPowerSet( powerSet );
return 0;
}

Longest Increasing Subsequence problem

Given an array of numbers, we have to find the longest sub sequence of that array which is strictly increasing.

For example in the following array
[8 1 4 2 7 9 5]

{8,9}
{1,4,7,9}
{1,2,7,9}

{1,2,5}
{1,2,7}
{1,2,9}

are some of the increasing sub-sequences. The second and third are the longest increasing sub sequences with length 4. Our job is to find the length of the longest increasing sub sequence.
There are various ways of solving this problem. In this post we will use the following algorithm to find the length of the longest increasing sub sequence.

We take one by one element from the array and insert them into a sorted set data structure. (TreeSet class in Java is one such implementation of sorted set. Please see the program details below)
Once the element is inserted, we will see if it is appended at the end. If it is, we proceed to the next element in the array. If it is not, we will delete the next element to newly inserted element.
At the end, the size of the sorted set is the length of the longest increasing sub sequence.

To understand this further, let us look at the contents of the sorted set in each iteration

Iteration 1: {8} – element added at the end
Iteration 2: {1} – 1 added before 8, so 8 got deleted
Iteration 3: {1,4} – 4 added at the end
Iteration 4: {1,2} – 2 added before 4, 4 got deleted
Iteration 5: {1,2,7} – 7 added at the end
Iteration 6: {1,2,7,9} – 9 added at the end
Iteration 7: {1,2,5,9} – 5 added before 7, 7 got deleted

So the size{1,2,5,9} = 4 is the required answer. However, the sequence may not reflect the longest sequence. It will only indicate the length of it.

The following Java program implements the above algorithm. In this program you can see how the TreeSet data struture is used. This data structure maintains the sorted order when the elements are inserted. It does not allow duplicates to be inserted. The add() method inserts the element if there is no such element exists in the set and it returns true if the element is newly added, otherwise it returns false.

The last() method returns the last element in the set. The higher() method returns the least element which is greater than the given element. All these methods are used in the following program.

 
import java.util.TreeSet;

public class LIS {
public static void main(String[] args)
{
Test1();
Test2();
Test3();
Test4();
Test5();
}
public static void Test1()
{
int [] array = {8,1,4,2,7,9,5};
System.out.println("LIS{8,1,4,2,7,9,5}: " + getLengthOfLIS(array));
}
public static void Test2()
{
int [] array = {1};
System.out.println("LIS{1}: " + getLengthOfLIS(array));
}
public static void Test3()
{
int [] array = {2,1};
System.out.println("LIS{2,1}: " + getLengthOfLIS(array));
}
public static void Test4()
{
int [] array = {1,2,3,4,5};
System.out.println("LIS{1,2,3,4,5}: " + getLengthOfLIS(array));
}
public static void Test5()
{
int [] array = {5,4,3,2,1};
System.out.println("LIS{5,4,3,2,1}: " + getLengthOfLIS(array));
}
public static int getLengthOfLIS(int[] array)
{
TreeSet<Integer> s = new TreeSet<Integer>();
for( int i = 0 ; i < array.length ; i++)
{
//if array[i] is newly added?
if( s.add(array[i]) )
{
//if array[i] is not the last element?
if( array[i] != s.last() )
{
//remove it's next element; s.higher() gives the least element
//which is greater than array[i]
s.remove(s.higher(array[i]));
}
}
}
return s.size();
}
}

This method runs in linear time (O(n)) and O(n) extra space for the sorted set.

Common elements between two sorted arrays

Given two sorted arrays, how do you efficiently find the common elements between those two?

Method#1:
Simple and Brute-force approach
Iterate through one array, and try to find the element in another array using linear search. If it is found, then print it. This approach takes O(n^2) time as for each element in the first array we doing a linear search in the second array which takes O(n). So for n elements it would be O(n^2).

Method#2:
Better approach with binary search
This approach is simliar to the first one except that it uses binary search instead of linear search for finding an element in the second array as the arrays are sorted. Binary search takes O(log n) for each element. So for n elements the time complexity would be O(n log n).

Method#3:
Efficient
This method fully utilizes the fact that both arrays are sorted. Start with two indices pointing to beginning of the arrays. If the elements pointed by these indices are same, then it is a common element, and we advance both the pointers. If one element is smaller, then advance that pointer (in hope of finding the next equal element). Continue this until any one of the two indices reaches their end.

Here is the C++ code which implements the third approach.

#include <iostream>
#include <string>
#include <vector>

using namespace std;

//prints the intersection of two sorted arrays
//duplicates are allowed
void printIntersect(vector<int>& s1, vector<int>& s2)
{
//i and j start 0 i.e first element
int i = 0 , j= 0;

//while either of the two indices reaches end
while ( i < s1.size() && j < s2.size() )
{
//if first array element is lesser, advance that index by one
if( s1[i] < s2[j] )
{
i++;
}
//otherwise advance second index
else if( s1[i] > s2[j] )
{
j++;
}
//both elements are same, print it, and advance both the pointers
else
{
cout<<s1[i]<<" ";
i++;
j++;
}
}
}

int main()
{
vector<int> set1, set2;
int n1,n2; //size of the sets
cin>>n1>>n2;

int i;
//read two arrays
for( i = 0 ; i < n1; i++ )
{
int d;
cin>>d;
set1.push_back(d);
}
for( i = 0 ; i < n2; i++ )
{
int d;
cin>>d;
set2.push_back(d);
}
printIntersect(set1,set2);
return 0;
}

Finding duplicates from given set of numbers

Given a list of numbers. How do we write a program to find the duplicates among them?
A simple algorithm is to compare each possible pair in the list to see if there are any duplicates. But this algorithm runs in O(n2) time.
We can use the following simple algorithm using the set data structure. This data structure supports insert/add, find operations. We read one number at a time and try to find it in the set. If the set already contains that element, we declare that element as a duplicate, otherwise we insert that element into that set. The following is a Java implementation of the same.

 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
import java.io.*;
import java.util.HashSet;
/**
*
* @author Ravi
*/
public class DuplicateNums {

/**
* @param args the command line arguments
*/
public static void main(String[] args) {
try
{
//Read from the standard input
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String input = br.readLine();
String [] strNums = input.split(" ");

//create the hash set to store the unique numbers
HashSet<Integer> hSet = new HashSet<Integer>();

for( int i = 0 ; i < strNums.length ; i++)
{
//try to add the number to the set,
//if add method returns false, the number is a duplicate
if( !hSet.add( Integer.parseInt(strNums[i]) ) )
{
System.out.print(strNums[i] + " ");
}
}
System.out.println();
}
catch(IOException ioe)
{
System.out.println("Exception while reading the input" + ioe.getMessage());
}
catch(NumberFormatException nfe)
{
System.out.println("Invalid Input "+ nfe.getMessage());
}
}
}

We have used HashSet data structure from the Java standard library. This provides O(1) time for insert and find operations in an average case. So the average time complexity of the above program is O(n). Since we are storing the elements in the set, the space complexity is O(n) in the worst case (No duplicates in the list).