Knowledge Base > Algorithm > Sorting
Overview
Merge Sort is a O(n⋅lgn) comparison-based sorting algorithm, and is an example of divide and conquer strategy (of method 3.7.3.1).
Algorithm
Pseudocode
|
|
Conceptually, the algorithm recursively splits the list to be sorted into two groups, until we have each sub-list size of length 1, in which case the list itself is returned. Next,merges the two sublists into a sorted sequence.
Loop Invariant
Statement
At the start of each iteration of the loop in lines 12-17, the subarray A[p…k - 1]
contains the k - p smallest elements of L and R in sorted order. Also L[i] and R[j]
are the smallest elements of L and R that have not been copied back into
A.
Initialization
Before the 1st iteration of the lop, we have k = p. So the subarray A[p…k - 1]
contains 0 smallest element of L and R. And since, i = j = 1, both L[i] and R[j] are
the smallest elements of the arrays that have not been copied into A. And both L are
R are sorted.
Maintenance
Suppose L[i] ≤ R[j], then L[i] is the smallest element not copied into A. Since
A[p…k - 1] is k - p smallest elements, L[i] is (k - p + 1)th smallest element. So, line
14 copies L[i] into A[k],A[p…k] contains k - p + 1 smallest elements. Incrementally,
k + i give loop invariant. If L[i] > R[j], then lines 16, 17 perform the appropriate
action to maintain loop invariant.
Termination
When, k = r + 1, A[p…k - 1] = A[p…r] contains k - p smallest element in sorted
order. This gives, k - p = r + 1 - p smallest elements in sorted order. The arrays L
and R together contains n1 + n2 + 2 = r -p + 3 elements, all but the two largest ∞
have been copied back into A.
Thus, prove the correctness of Merge-Sort.
Time Complexity
If the running time of merge sort for a list of length n is T(n), then the recurrence function is defined as follow:
Let us first see how to illustrate this recursive relation in tree model:
The total is cn ⋅ lg n + cn
T(n) = n ⋅ T(1) + cn ⋅ lg n = Θ(n ⋅ lg n)
Next, we apply divide and conquerer method (of method 3.7.3.1) to solve the
recurrence,
![T(n) = 2⋅T (n-)+ cn
2
= 2⋅[2⋅T(n-)+ c⋅ n-]+cn
n 4 2
= 4⋅T (-)+ 2⋅cn
4 n n
= 4⋅(2⋅T (8)+ c⋅-4)+ 2⋅cn
n-
= 8⋅T (8)+ 3⋅cn
= ...
= 2k ⋅T(-n )+ k⋅cn
2k n
= 2lgn ⋅T (-lgn)+ lgn ⋅cn (2k = n ⇒ k = log2n)
= ... 2
= n⋅T (1) +cn ⋅lg(n)
= Θ(n ⋅lgn)](kb120x.png)
Space Complexity
Merge sort does not sort in place, meaning that the memory size of the input must be allocated for the sorted output to be stored in whenever the merge operation is performed. For each level of divide-and-conquer procedure, an extra space of O(2n) is needed, and the height of the recursive tree is lg n. Thus, a total of O(n⋅ lg n) extra space is used. However, sorting in-place is possible but complicated, and will offer little performance gains in practice (even it still runs at Θ(n ⋅ lg n) time).
Stability
Merge Sort is stable, meaning it preserves the input order of equal elements in the sorted output. After comparison, this method put the current element, left[i] or right[j], into a right position, A[k], in sorted list. This preserve the same order of occurrence for two duplicated elements after the list is sorted, because the identical elements in the left sublist has is created earlier than elements in the right sublist. However, if the relational test remove equality relation, the order of occurrence for the same elements is reversed.
Source Code