Sorting Techniques and their Importance
Any modern day computing involves sorting the data in one fashion or the other, based on the some sort of parameter such as price, rating, location or date. This places sorting as the first place where we attempt to "reConnect" the software industry with the techniques that are defined by computer science.Simple Sorting involves techniques that takes up a amount of time that is proportional to the square of the number of data elements that the technique is trying to sort. For a reasonably sized data set, simple sorting techniques offer the advantage of simplicity and in-place sorting at the cost of efficiency.
In other words, if there are N number of elements, then it takes $N^2$ amount of time, as a proportion to sort those elements. It is not the literal value of $N^2$ that matters, rather than the rate of its growth relative to the rate of growth of input N.
Translating this to actual computer terms, an average modern day computer has a speed of at least 1 G Hz, which is $10^9$ operations per second. Comparison of two values is one such operation that can be performed in 1/$10^9$
However, since real world comparisons involve more than simple numbers, and involve multiple comparisons (ex. comparing two distinct names ), let us assume that one such comparison involves 10000, or $10^4$ simple operations.
So, 1 comparison = $10^4$ simple operations
A modern PC can perform $10^5$ such comparisons per second.
On such a computer system that takes 1/$10^5$ sec to compare 2 items, to sort a array of 5000 objects , a simple sorting techniques described below require an time proportional to :
$${(5000^2)} x {1}/{10^5} = {(25 x 10^6) x {1}/{10^5} = 250 secs = 4.16 minutes }$$
Since the efficiency is measured as the rate of growth of time consumed, as the input data (number of elements that are being sorted) increases, let us calculate the time it takes to sort a array of 50000 objects , using the simple sorting techniques similar to the calculation above
$${(50000^2)} x {1}/{10^5} = {(25 x 10^8) x {1}/{10^5} = 25000 secs = 416 minutes = 7 hours }$$
As demonstrated above, it quickly becomes evident, that for higher values of N, having an efficiency of $N^2$ is extremely detrimental to performance, unless one has 7 hours to burn !
However, it also demonstrates that for a reasonable size data set simplest of sorting techniques can be effectively employed to achieve the results in a acceptable timely fashion.
Now. what are the sort techniques are the simplest of sort techniques ? They are
- Bubble sort, where the biggest value bubbles to the end of list.
- Selection sort, where the smallest value is selected and placed at the start of the list
- Insertion sort, where the value is rightly inserted in a sorted list to the left of a marker.
One way of comparing these 3 techniques is measuring the the time it took to measure how long these 3 techniques sort 100,000 randomly generated numbers. Here are the results:
Technique
|
Time to sort 100k random values (in msec)
|
Time to sort 100k random values (in sec)
|
Insertion sort
|
1865msec
|
1 sec
|
Selection Sort
|
8892 msec
|
8 sec
|
Bubble Sort
|
20904 msec
|
20 sec
|
Following are the results, for the above 3 techniques when we employ them to sort 100K numbers that are in reverse order:
Technique
|
Time to sort 100k reverse values (in msec)
|
Time to sort 100k reverse values (in sec)
|
Insertion sort
|
3607msec
|
3 sec
|
Selection Sort
|
7464 msec
|
7 sec
|
Bubble Sort
|
12044 msec
|
12 sec
|
Following are the results, for the above 3 techniques when we employ them to sort 100K numbers that are already sorted:
Technique
|
Time to sort 100k sorted values (in msec)
|
Time to sort 100k sorted values (in sec)
|
Insertion sort
|
4ms
|
0 sec
|
Selection Sort
|
2182 msec
|
2 sec
|
Bubble Sort
|
7088 msec
|
7 sec
|
So based on all of the above results, we can see that Insertion sort is consistently better performing among all of the 3 techniques, followed by Selection Sort and bubble sort in that order.