Putting more thoughts on heap and heapify

23 Feb 2017
In a recent conversation, I was asked what the time complexity of heapify is. I said O(N * log N) confidently. After putting more thought into it, I found it's not.
I read through the source code of Python 3's builtin lib heapq, which provides a handful of methods such as heappush. I found there is a bit more complicated math behind it, and a trade-off that the author made in the implementation.
Just to be clear: in this article, heap means a binary min-heap.
What is the complexity of heapify?
The builtin method heapify converts a given list into a min-heap: the first element is the smallest, and every child has a value no smaller than its parent's. The details of building a heap are very well elaborated in Wikipedia and CLRS.
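To make the array layout concrete, here is a minimal check of the parent/child index relation (my own illustration, not from the library):

```python
# A min-heap stored as a flat list: the children of index i live at
# indices 2*i + 1 and 2*i + 2, so the parent of index i is (i - 1) // 2.
heap = [1, 3, 2, 5, 4]
assert all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))
print(heap[0])  # the smallest element is always at index 0
```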
If you frequently use heaps and, like me, know a little bit about their internal operations, you might think confidently that heapify is an O(N*log N) operation (at least I did). But after an in-depth mathematical analysis, the conclusion is that it can be O(N) or O(N*log N), depending on how the heap is built.
There are two important operations when building the heap: sift_down and sift_up.

sift_down is the operation that moves an element towards the root if its value is smaller than its parent's. Here is the code.
```python
def sift_down(heap, bottom, i):
    # Save the element, shift larger parents down one slot at a time,
    # and place the saved element at its final position. Note we must
    # compare the saved element, not heap[i], which gets overwritten
    # while shifting.
    ele = heap[i]
    while i > bottom:
        parent_of_i = (i - 1) >> 1
        if ele < heap[parent_of_i]:
            heap[i] = heap[parent_of_i]
            i = parent_of_i
        else:
            break
    heap[i] = ele
```
The idea is fairly simple, and the complexity is quite obvious: O(log N). Note that in some textbooks, such as CLRS, the sifting process swaps values instead of shifting them.
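To see this operation in action, we can use heapq's own version of it, `heapq._siftdown`, which has the same signature as the sift_down above (it is a private CPython helper, so this is for illustration only). Appending an element and sifting it is exactly what heappush does:

```python
import heapq

heap = [1, 3, 2, 5, 4]   # a valid min-heap
heap.append(0)           # new element lands at the leaf level
heapq._siftdown(heap, 0, len(heap) - 1)
assert heap[0] == 0      # the new element bubbled up to the root
```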
sift_up is the operation that moves an element towards the leaves if its value is larger than a child's. There are two ideas on how to implement it.
The first one is straightforward, as described above:

- Compare the element's value with its children's (if any).
- Swap with the smaller child if the element's value is larger.
- Repeat until a leaf is reached.
The code will be:
```python
def sift_up(heap, pos):
    while pos < len(heap):
        left, right = pos * 2 + 1, pos * 2 + 2
        smaller = pos
        if left < len(heap) and heap[left] < heap[smaller]:
            smaller = left
        if right < len(heap) and heap[right] < heap[smaller]:
            smaller = right
        if smaller != pos:
            heap[pos], heap[smaller] = heap[smaller], heap[pos]
            pos = smaller
        else:
            break
```
The complexity is O(log N) at first glance, but if we look closely, it depends on which level the element starts at.
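As a quick illustration of the first idea (repeating the sift_up function so the snippet stands alone), an oversized element placed at the root sinks level by level until the heap property holds again:

```python
def sift_up(heap, pos):
    # First idea: compare with both children, swap with the smaller one.
    while pos < len(heap):
        left, right = pos * 2 + 1, pos * 2 + 2
        smaller = pos
        if left < len(heap) and heap[left] < heap[smaller]:
            smaller = left
        if right < len(heap) and heap[right] < heap[smaller]:
            smaller = right
        if smaller == pos:
            break
        heap[pos], heap[smaller] = heap[smaller], heap[pos]
        pos = smaller

heap = [9, 1, 2, 3, 4, 5, 6]  # root violates the min-heap property
sift_up(heap, 0)              # 9 sinks two levels and stops
```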
The second idea is:

- Move the element all the way down to a leaf, always choosing the path of the smaller child.
- Then sift it back towards the root, stopping no higher than its original position.
The code here is copied from the Python 3 library:

```python
def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    childpos = 2*pos + 1
    while childpos < endpos:
        # Follow the path of the smaller child all the way to a leaf.
        rightpos = childpos + 1
        if rightpos < endpos and heap[rightpos] < heap[childpos]:
            childpos = rightpos
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    heap[pos] = newitem
    # _siftdown is Python's name for the sift_down shown earlier.
    _siftdown(heap, startpos, pos)
```
This idea looks cumbersome, since it almost doubles the moving path, but we'll get back to it later.
Anyway, either idea can be used to build a heap, so we now get back to the topic: what is the time complexity of heapify?

The code of heapify is fairly simple now that sift_up and sift_down have been introduced:

```python
def heapify(heap):
    # Only the first half of the list has children; sift each of
    # those nodes, from the last internal node up to the root.
    for i in reversed(range(len(heap)//2)):
        _siftup(heap, i)
```
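As a sanity check, the heapify above can be run against heapq's private `_siftup` helper (a CPython implementation detail, used here only for illustration):

```python
import heapq

def heapify(heap):
    # Sift every internal node, from the last one up to the root.
    for i in reversed(range(len(heap) // 2)):
        heapq._siftup(heap, i)

data = [9, 4, 7, 1, 2, 6, 3]
heapify(data)
# Every parent must be <= both of its children.
assert all(data[(i - 1) // 2] <= data[i] for i in range(1, len(data)))
assert data[0] == 1
```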
People like me would say the time complexity is obviously O(N*log N). But let's do some math here.
Assume the heap has levels 0 (the root) through h (the leaves). At the leaf level, there are 2^h nodes, but heapify does not operate on this level. At the next level up, the number of nodes is 2^(h-1), and each node might sift down 1 level. At the third level from the bottom there are 2^(h-2) nodes, and each might move 2 levels, and so on.
Therefore, we get the total moves for the heap:

T(n) = 1 · 2^(h-1) + 2 · 2^(h-2) + … + h · 2^0 = Σ_{j=1..h} j · 2^(h-j)

This sum looks a bit frustrating, but we can recall some basic knowledge from middle school: the sum of a geometric series whose common ratio is smaller than 1:

Σ_{j=0..∞} x^j = 1 / (1 - x), for |x| < 1

Then we take the derivative on both sides with respect to x:

Σ_{j=1..∞} j · x^(j-1) = 1 / (1 - x)^2

If we put x = 1/2 here:

Σ_{j=1..∞} j · (1/2)^(j-1) = 4

Divide both sides by 2:

Σ_{j=1..∞} j · (1/2)^j = 2

Getting back to our T(n), factor out 2^h and apply the bound above:

T(n) = 2^h · Σ_{j=1..h} j · (1/2)^j < 2^h · 2 = 2^(h+1)

Remember that n = 2^(h+1) - 1 is the total number of heap elements, so 2^(h+1) = n + 1.

Putting them together, we get T(n) < n + 1, so building the heap this way is O(N).
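The bound can also be checked numerically. This small script (my own sketch, not from the article) sums the worst-case sift distances level by level for full heaps of increasing height:

```python
# For a full heap with levels 0..h, the 2^(h-j) nodes sitting j levels
# above the leaves can each sink at most j levels during heapify.
for h in range(1, 20):
    n = 2 ** (h + 1) - 1                                # total elements
    moves = sum(j * 2 ** (h - j) for j in range(1, h + 1))
    assert moves < n + 1                                # T(n) < n + 1
```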
Which sift_up is better?
We have introduced two different sift_up methods above: a faster one and a slower one.
In Python's heapq.py library, the algorithm is implemented in the slower way. The reason behind this is that heappop needs sift_up too. (Not finished.)
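For reference, here is a heappop sketch (simplified from CPython's heapq, with the first-idea sift_up standing in for the library's `_siftup`), showing where the sift happens: the last leaf element replaces the root and is then sifted back down:

```python
def sift_up(heap, pos):
    # First-idea sift_up: two comparisons per level, stops early.
    while pos < len(heap):
        left, right = pos * 2 + 1, pos * 2 + 2
        smaller = pos
        if left < len(heap) and heap[left] < heap[smaller]:
            smaller = left
        if right < len(heap) and heap[right] < heap[smaller]:
            smaller = right
        if smaller == pos:
            break
        heap[pos], heap[smaller] = heap[smaller], heap[pos]
        pos = smaller

def heappop_sketch(heap):
    # Simplified from CPython's heappop: move the last leaf to the
    # root, then sift it back toward the leaves.
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        sift_up(heap, 0)
        return returnitem
    return lastelt

heap = [1, 3, 2, 9, 4, 5, 6]
smallest = heappop_sketch(heap)
```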