Category Archives: Sorting

How does a radix sort work?

Radix sort is an integer sorting algorithm. As it’s name indicates, it sorts a list of numbers based on the digits of the numbers. 

It’s a linear time sorting algorithm which is different from the comparison based sorting algorithms like quick sort or heap sort. Counting sort is also a linear time sorting algorithm which is explained in my previous post.

Let us understand this algorithm using an example. 
Consider the array [235, 10, 784, 69, 51]

We first sort the numbers based on the least significant digit, i.e the digit at 1’s place. It becomes [10, 51, 784, 235, 69]. Observe that the these numbers are now sorted according to it’s last digit ([0,1,4,5,9]).

Next we consider the digits at 10’s place and sort them. It transforms to
[10, 235, 51, 69, 784]. These are sorted according to the last but one digits ([1,3,5,6,8])

Similarly in the next step it becomes [10, 51, 69, 235, 784]

Now the entire array is sorted! We no longer need to sort array because we have at most 3 digits in any number.

Implementation details:
Suppose there are at most D digits in any number, we call the counting sort D times for each place.

Let us briefly know how counting sort works. For more details, you can check my previous post.

It first groups the numbers into buckets. The result of this is a count array of size 10 (assuming the numbers are in the range 0-9). 

Calculate the prefix array of this, we get the sorted positions of given numbers.

Then we use a temporary array to arrange the numbers in sorted order.

An alternative implementation could be to maintain a list of numbers for each bucket, and merging them to a sorted array. But this approach takes extra memory for the list of buckets.

The following code shows  the implementation of the above algorithm.
Exercise: Try to solve this SPOJ problem using radix sort.

Minimum number of chages to make equal groups

Given an array of numbers, how do we find the minimum number of changes(additions/subtractions) to the elements so that each number in the array is same or equal.

For example given the array {1, 2, 3, 4}, the minimum number of changes are 3. i.e we can choose any of the four elements and make the other three elements equal to the chosen element.

Similarly given the array {56, 29, 112, 29}, the minimum number of changes to make is 2. We can choose 29 as the common element and change the other two elements.

This problem is from recently concluded Codechef contest. Click on this link to read the complete problem statement.

The solution is evident from the second example. This is the problem of finding the number of occurrences of a most frequently appearing number (mode) and subtracting it from the total number of elements.

There are at least two different approaches to implement the solution. 
One is to sort the array (takes O(n log n) time) first and find the maximum frequency in O(n) time.
The other is a map based approach to store the frequencies of elements while iterating through all the elements and find the maximum among them. This will take O(n) time bust consumes O(n) extra space.

Below is the C++ implementation of the first strategy. Read my previous post for the implementation of map based method.

Finding a sub list with least max-min difference – Codeforces puzzle

This is a problem from Codeforces. Please follow this link to solve this problem on your own.

The abridged problem statement is as follows.

You are given an list of numbers, and sub-list size as input. The challenge is to find a sub-list such that the difference between the maximum and minimum numbers in that sub-list is minimized.

For example consider the numbers {36, 6, 12, 42, 24, 50}, and we have to find a sub-list of size 4. By choosing {36, 42, 24, 50}, we can have a least difference of 26 i.e 50-24.

The solution is to first sort the numbers in ascending order, and move a sliding window of sub-list size from begin to end while keeping track of the minimum difference between first and last numbers of the sub-list.

Tracing with the above example, after sorting the list becomes {6, 12, 24, 36, 42, 50}
We start with 6, 36, the difference is 30
move on to 12, 42, the difference is 30
move on to 24, 50, the difference is 26
So the minimum difference is 26.

This algorithm runs in O(n*log n) time because we have to sort the list. the actual algorithm takes O(n), linear time.

Here is the C++ code for the same.

Joining an array of numbers to form the biggest number

Given an array numbers, we have to join them in such a way that it forms the biggest number.

For example consider the simple three element array [10,3,2], they can be joined in following ways
1032, 1023, 2103, 2310, 3102, 3210. Among these numbers 3210 is the biggest number.

To solve this problem we definitely have to sort the given numbers using some criteria to form the biggest number. 
Considering the above example, if we simply sort them in descending order, the array becomes [10,3,2] which does not form the biggest number.

Similarly we can also sort numbers by comparing the numbers string comparison. This does not work in cases like [1,10]. Since “10” > “1” the array in sorted as [10,1], but 110 is the biggest number.

We should think of a custom comparison function which guarantees us the biggest number when joined. Let us join the two numbers to be compared in two ways and compare them.

Considering the above example, they become 110 (“1″+”10”), 101( “10”+”1″). Since 110 is higher we place 1 first in sorting order before 10.

Here is the simple python implementation of the above algorithm. Also see my post on geeksforgeeks for C++ implementation of the same algorithm.

Counting triplets with given sum

Given an array of distinct numbers, how do we count the number of triplets with given sum?
For example let us consider the array {5, 2, 7, 8, 4, 3} and the sum is 14, 
There are 3 triplets with the given sum
(2,5,7)
(2,4,8)
(3,4,7)

A brute-force algorithm will take O(n3) time. But this problem can also be solved based on 2 Sum problem. Here is the approach.

  • Sort the array first.
  • Fix the first element in the triplet, and search for a pair of elements whose sum is (Sum – Fixed element)
  • Repeat the above step for all the elements
We are essentially searching for a pair of elements for each fixed element. So the time complexity of this solution would be O(n2).
One variation/follow-up of this problem could be to find out the number of triplets with sum less than or equal to the given value.
Considering the first example again, there are 8 triplets with sum less than or equal to 14.
2 3 4
2 3 5
2 3 7
2 3 8
2 4 5
2 4 7
3 4 5
3 4 7
 
The solution for this is similar to the first problem and uses the trick discussed in my previous post

The code for both is given in the following Java program.


Number of pairs with sum less than or equal to given value

Given an array of numbers, Count the number of pairs with sum less than or equal to K

For example consider the array {5, 1, 12, 3, 6}

pairs with sum <= 10 are {(1,3), (1,5), (1,6), (3,5), (3,6)}
So there are 5 pairs with the given constraint.

In one of the previous posts, we have seen how to find a pair with given sum in the given array. This problem is a variation of that.


Let us look at the algorithm.
  1. First sort the array in ascending order.
  2. Take two indices one(b) pointing to the start of the array, and the other(e) pointing to the last element of the array.
  3. If A[b]+A[e] <= K
    1. A[b] + A[i] <= K for b < i < e. Hence (e-b) elements satisfy the given property.
    2. Increment b and decrement e and repeat step 3
  4. Otherwise decrement e
  5. Repeat step 3 until b and e meet each other.

A[0] A[1] A[2]… A[N]
b–>            <–e

Considering the above example 

{1, 3, 5, 6, 12} let b = 0, e = 3 and K = 10

1  3  5  6  12
b        e
 
If we add 6 or 5 or 3 to 1, the sum is less than 10. At this point, we can increment the pair count in one shot.


Here is Java implementation of the above algorithm. This solution runs in O(n log n).


Finding all Pythagorean triples in an array

Given an array of unique elements, Write a program to print all Pythagorean triples in it.
 

A Pythagorean triple is a set of three numbers such that square of one number is the sum of squares of the other two.
 

For example (3,4,5) is Pythagorean triple because 52 = 32 + 42
 

Consider the array {5, 6, 3, 10, 4, 2, 1 , 8, 7} 
It contains two Pythagorean triples (3,4,5) and (6,8,10)

An obvious solution to this problem could be examine all possible triples in the array and check if it satisfies the above property.
But this solution takes O(n3) time.

We can do better than this if we first square all the elements and sort them in descending order.

Then this problem is simplified to finding three indices in the array i,j,k such that arr[i] = arr[j]+arr[k]

The algorithm is as follows.

For each array[i] 0 <= i < n-2
     index1 <- i+1
     index2 <- n-1
            
     While index1 < index 2
                 
        If array[i] = array[index1] + array[inde2]
           Print Triple
           index1++
           index2–
        If array[i] <  array[index1] + array[inde2]
           index1++
        Otherwise
           index2–

Here is the implementation of the above in C++. It takes O(n log n) for sorting and O(n2) for the actual algorithm. So the effective time complexity for this problem is O(n2)

Solution 2:
Alternatively you can use a Hash set to store all the square values. Then examine all possible pairs in the array to check if sum of their squares exists in the set.
This approach also takes O(n2) time but uses O(n) extra space. 
Note: The hash_set implementation is available only in Visual studio 10 compiler and above. In C++ 11 standard it is mentioned as unordered_set. If we use normal set, the time complexity will become O(n2 log n).

Maximum number of overlapping intervals

Given a set of intervals, how do we find the maximum number of intervals overlapping at any point of time.

For example let us consider the following intervals.

{ (0,2), (3, 7), (4,6), (7,8), (1,5) }

The maximum number of intervals overlapped is 3 during (4,5).

This can be asked in many forms in a programming competition. This CodeChef problem is one such an example. Go and test your knowledge if you have some idea about how to solve this.

Let us look at the solution approaches.

We first consider all the start and end point as a single entity and sort them in ascending order according the following criteria.
  • If the times are different, just sort according to time.
  • If the times are equal, we first place the end of interval.
Then we take a counter and a maximum counter. 
Iterate through all the points
  • If the point is start of interval we increment the counter.
    • Keep track of maximum counter
  • Otherwise we decrement the counter
At the end of this, we have the required result in maximum counter.

Another alternative solution could be to use the merge procedure used in merge sort. The steps are similar to the merge procedure.

  • Sort all the starting points and ending points in ascending order
  • Try to merge them into a single list
  • Use two variables overlap, and max_overlap to keep track of the current overlap and maximum overlap we have seen so far.
  • As long as we take element from starting point of the interval, increment overlap and update max_overlap.
  • If we are taking en element from ending point of the interval, decrement overlap.
At the end of this procedure, we have the answer in max_overlap.

Below is the simple C++ implementation of the above algorithms.

Sorting a linked list using insertion sort

In one of my previous posts, we have seen how a linked list can be sorted using bubble sort approach. In this post we will see how we can sort a linked list using insertion sort algorithm. Like bubble sort this approach will also take O(n2) time.

Here is the strategy for linked list insertion sort. We maintain two lists; one sorted and another unsorted. Initially the sorted list is empty and the unsorted list will be the original list. In each iteration, we remove a node from the unsorted list and add it the sorted list in proper position. I re-used the procedure for inserting data into a sorted linked list from this post.

The following diagram should give you some understanding of the procedure.

Here is the C++ implementation of the above algorithm.

C++ STL Algorithms – Sort – Part-2

In the last post, the basic use of STL sort() method is explained. In this post, I will discuss some more options available with this method.
We have seen in the previous post that sort takes the beginning and ending of the array as arguments.
sort( array.begin(), array.end() );
In addition to those parameters, it can also take a third parameter (predicate) which decides the ordering of the elements. If we do not pass the third parameter, the default ordering is assumed. For integer data types such as int,  float, char etc, ascending order is followed. For strings alphabetical order is followed. To sort an array of integers in descending order, simply call the sort routine like this.

sort( array.begin(), array.end(), greater<int>() );

std::greater is a built in comparator object provided by STL. we have to pass the type of objects to be sorted. similarly we have std::less, std::less_equal, std::greater_equal etc…

So far we are sorting only built in data types such as int, double and library class string. We can also sort objects of user defined classes also.

For example in the following program I have defined a class called Pair, which contains two fields. To sort user defined/custom data types we also have to write the comparison functions to decide the ordering. We can sort Pairs either according to the first value or the second value. I have written two comparator functions and passed these while sorting. 

Here is the code which explains these concepts.