Chapter 3.9
Sorting Algorithm

Knowledge Base > Algorithm



Overview

Sorting algorithm is one of the fundamental problems of computer science, other than searching algorithm. Sorting algorithm always involve with the tradeoff between time and space, which is a situation that more memory used can reduce at the cost of slower program executed. For instance, Insertion Sort uses less space other than the array being sorted than what it costs for Merge Sort. However, Merge Sort performs better asymptotic running time, Θ(n lgn), than Insertion sort, Θ(n2).

Sorting algorithms provides a gentle introduction to a variety of core algorithm concepts, such as computational complexity, divide-and-conquer algorithms, data structures, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and etc.

Stability

A sorting method is said to be stable if it preserves the relative order of items with duplicated keys in the file. That is, a sorting algorithm is stable if whenever two elements with the same key appear to be in the same order of occurrence after the list is sorted. In other words, a unstable sorting algorithm may change the relative order of elements with duplicated keys.

Comparison Based Sort

Most of sorting algorithms nowadays are performed through comparison operations, called Comparison Sort. That is, we performs comparisons between elements to gain the order information about an input sequence , say (a1,a2,,an). For each given two elements, ai and aj, we perform one of the five tests to determine their relative order:

In addition, any comparison sort can be represented as a decision tree, with each internal node being a comparison and each leaf node being an ordering of (a1,a2,,an). Since each permutation leads to another different ordering of (a1,a2,,an), so, there are n! leaf nodes (permutations). From that point of view, the height of decision tree corresponds to the worst-case number of comparison. This implies that the lower bound on the height of all decision tree is a lower bound of running time of any comparison sort algorithm. The graph below shows a decision tree of insertion sort on (a1,a2,,an),

                                          a : a
                                           1  2            >
                               ≤
                a  : a                                                 a : a
                 2   3                                                  1  3
           ≤             >                                        ≤             >

(a1,a2,a3)                    a1 : a3                   (a2,a1,a3)                   a2 : a3
                         ≤            >                                        ≤             >

               (a1,a3,a2)                (a3,a1,a2)                    (a2,a3,a1)                (a3,a2,a1)

Theorem 3.9.0.1. Lower Bound of Comparison Sort
The lower bound on the height of the decision tree is a lower bound of running time of any comparison-based sorting algorithm.
Proof>
Let h be the height of decision tree, and the tree can have at most 2h leaf nodes (in the case of Complete Binary Tree).
From permutational point of view, the decision tree must have at least n! leaf nodes.
Thus,

   n! ≤ 2h
⇒  h ≥ lg(n!)(of equation 1.2.24)

⇒  h = Ω(nlg(n))                         (3.9.1)

Table of Sorting Methods

Method Worst TimeAverage Time# of comparison# of swaps Space Stable Note
Bubble Sort O(n2) O(n2) O(n2) O(n2) O(1) Yes Times are for best variant.
Selection Sort Θ(n2) Θ(n2) Θ(n2) O(n) O(1) No Outperform on small list of huge data set, when in case that writes are more expensive than reads.
Insertion Sort O(n2) O(n2) O(n) O(1) Yes Good for small data set.
Shell Sort No
Merge Sort Θ(n lg n) Θ(n lg n) O(n lg n) N.A Yes Best choice for sorting a sequential access data set, like linked list or tape drive
Quick Sort O(n2) Θ(n lg n) O(n lg n) O(n lg n) No The most practical general-purpose sorting method.
Heap Sort O(n lg n) O(n lg n) O(n lg n) O(n lg n) O(1) No Always runs at O(n lg n) time
Counting Sort O(n + k) O(n + k) N.A N.A O(2n + k) Yes Suitable when the keys have small range.
(LSB) Radix Sort O(n k) O(n k) N.A N.A O(n) Yes Suitable when either keys are small, or with a lexicographic sequence.
Bucket Sort Yes Assume keys to be uniformly distributed.