# More thoughts on heap and heapify

*23 Feb 2017*

In a recent conversation, I was asked what the time complexity of `heapify` is. I confidently said O(N log N). After giving it more thought, I found it's not.

I read through the source code of Python 3's built-in library `heapq`, which provides a handful of functions such as `heapify`, `heappop`, and `heappush`. I found there is more complicated math behind it than I expected, and a trade-off that the author made in the implementation.

To be clear: in this article, "heap" means a binary min-heap.

### What is the complexity of heapify?

The built-in function `heapify` converts a given list into a min-heap. In a min-heap the first element is the smallest, and every child has a value no smaller than its parent's. The details of building a heap are well elaborated on Wikipedia and in CLRS.
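As a quick illustration of the invariant, using the standard `heapq` module (the sample list is mine):

```python
import heapq

data = [9, 4, 7, 1, 3, 8, 5]
heapq.heapify(data)  # rearranges the list in place into a min-heap

# The first element is now the smallest.
assert data[0] == min(data)

# Every parent is <= both of its children (the min-heap invariant).
n = len(data)
for i in range(n):
    for child in (2 * i + 1, 2 * i + 2):
        if child < n:
            assert data[i] <= data[child]
```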

If you frequently use heaps and, like me, know a little bit about their internals, you might confidently think that `heapify` is an O(N log N) operation (at least I did). But after an in-depth mathematical analysis, the conclusion is that it is actually O(N), not O(N log N).

Let’s start.

There are two important operations when building a heap: `sift_up` and `sift_down`.

**sift_down**

`sift_down` is the operation that moves an element towards the root while its value is smaller than its parent's. Here is the code.

```
def sift_down(heap, bottom, i):
    # Move heap[i] towards the root until its parent is no larger,
    # or until it reaches position `bottom`.
    ele = heap[i]
    while i > bottom:
        parent_of_i = (i - 1) >> 1
        if ele < heap[parent_of_i]:
            # Shift the parent down instead of swapping; the element
            # itself is written exactly once, at the end.
            heap[i] = heap[parent_of_i]
            i = parent_of_i
        else:
            break
    heap[i] = ele
```

The idea is fairly simple, and the complexity is quite obvious: O(log N). Note that in some textbooks, such as CLRS, the sifting process swaps values instead of shifting them.
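For contrast, a swap-based variant might look like this (a sketch, not the heapq code; the function name is mine):

```python
def sift_down_swapping(heap, bottom, i):
    # Swap-based variant: exchanges the element with its parent at
    # every step instead of shifting parents down and writing once.
    while i > bottom:
        parent = (i - 1) >> 1
        if heap[i] < heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break
```

The swapping version performs three assignments per level, while the shifting version above writes the moved element only once; both are O(log N).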

**sift_up**

`sift_up` is the operation that moves an element towards the leaves while its value is larger than a child's. There are two approaches to `sift_up`.

The first one is straightforward:

- Compare the element's value with its children (if any).
- Swap with a child whose value is smaller. (If both children are smaller, choose the smaller one.)
- Repeat until the element reaches a leaf or neither child is smaller.

The code will be:

```
def sift_up(heap, pos):
    # Move heap[pos] towards the leaves until neither child is smaller.
    while pos < len(heap):
        left, right = pos * 2 + 1, pos * 2 + 2
        smaller = pos
        if left < len(heap) and heap[left] < heap[smaller]:
            smaller = left
        if right < len(heap) and heap[right] < heap[smaller]:
            smaller = right
        if smaller != pos:
            heap[pos], heap[smaller] = heap[smaller], heap[pos]
            pos = smaller
        else:
            break
```

The complexity is O(log N) at first glance, but if we look closely, the cost depends on the level at which the element starts.
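To see this concretely, here is the same loop instrumented to count swaps (the counting and the sample heaps are my additions):

```python
def sift_up_counting(heap, pos):
    # Identical logic to sift_up above, but returns the swap count.
    swaps = 0
    n = len(heap)
    while pos < n:
        left, right = pos * 2 + 1, pos * 2 + 2
        smaller = pos
        if left < n and heap[left] < heap[smaller]:
            smaller = left
        if right < n and heap[right] < heap[smaller]:
            smaller = right
        if smaller == pos:
            break
        heap[pos], heap[smaller] = heap[smaller], heap[pos]
        swaps += 1
        pos = smaller
    return swaps

# In a 15-element heap (4 levels), sifting from the root can take up
# to 3 swaps, while a node just above the leaves takes at most 1.
root_case = [15] + list(range(1, 15))   # the root violates the invariant
near_leaf = list(range(15))
near_leaf[6] = 99                       # a node above the leaves violates it
print(sift_up_counting(root_case, 0))   # 3 swaps: travels to the leaf level
print(sift_up_counting(near_leaf, 6))   # 1 swap: only one level to go
```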

The second approach to `sift_up` is:

- Move the element all the way down to a leaf, always choosing the path of the smaller child.
- Then sift it back towards the root, but no higher than its original position.

The code here is copied from the Python 3 library:

```
def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Move the hole at `pos` down to a leaf, following the smaller child.
    childpos = 2 * pos + 1
    while childpos < endpos:
        rightpos = childpos + 1
        if rightpos < endpos and heap[rightpos] < heap[childpos]:
            childpos = rightpos
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2 * pos + 1
    # Put the new item in the hole, then bubble it back towards the root.
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)
```

This approach looks cumbersome, since it almost doubles the length of the moving path, but we'll get back to it later.

Anyway, either approach can be used to build a heap, so let's get back to the topic: **what is the time complexity of heapify**?

With `sift_up` and `sift_down` in hand, `heapify` is fairly simple.

```
def heapify(heap):
    # Only internal nodes (the first len(heap)//2 positions) need sifting.
    for i in reversed(range(len(heap) // 2)):
        _siftup(heap, i)
```

People like me would say the time complexity is obviously O(N log N). But let's do some math here.

Assume the heap has h + 1 levels, with the root at level 0 and the leaves at level h. At the leaf level there are `2^{h}` nodes, but `heapify` does not operate on this level. At the next level up there are `2^{h-1}` nodes, and each might move down 1 level. At the level above that there are `2^{h-2}` nodes, and each might move down 2 levels, and so on.

Therefore, the total number of moves to build the heap is:

T(n) = 1 * 2^{h-1} + 2 * 2^{h-2} + ... + h * 2^{0} = 2^{h} * (1/2 + 2/2^{2} + ... + h/2^{h})

This result looks a bit frustrating, but we can recall some basic knowledge from middle school: the sum of a geometric series whose common ratio is smaller than 1:

1 + x + x^{2} + x^{3} + ... = 1/(1 - x), for |x| < 1

Then we take the derivative of both sides with respect to x:

1 + 2x + 3x^{2} + ... = 1/(1 - x)^{2}

If we put x = 1/2 here:

1 + 2/2 + 3/2^{2} + ... = 4

Dividing both sides by 2:

1/2 + 2/2^{2} + 3/2^{3} + ... = 2

Getting back to our T(n):

T(n) = 2^{h} * (1/2 + 2/2^{2} + ... + h/2^{h}) < 2^{h} * 2 = 2^{h+1}

Remember that n = 2^{h+1} - 1 is the total number of heap elements, so 2^{h+1} = n + 1.

Putting it all together, we get T(n) < n + 1, i.e. T(n) <= n.

Clearly, `heapify` is O(N).
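We can sanity-check this bound empirically by counting every swap made while building a full heap (an instrumented sketch using the first, swap-based `sift_up`; the function name and sample data are mine):

```python
import random

def build_heap_counting(heap):
    # Bottom-up heap construction; every swap moves an element down
    # exactly one level, so `moves` is exactly the T(n) from above.
    moves = 0
    n = len(heap)
    for i in reversed(range(n // 2)):
        pos = i
        while True:
            left, right = 2 * pos + 1, 2 * pos + 2
            smaller = pos
            if left < n and heap[left] < heap[smaller]:
                smaller = left
            if right < n and heap[right] < heap[smaller]:
                smaller = right
            if smaller == pos:
                break
            heap[pos], heap[smaller] = heap[smaller], heap[pos]
            moves += 1
            pos = smaller
    return moves

random.seed(0)
data = [random.random() for _ in range(2 ** 11 - 1)]  # full heap, n = 2047
moves = build_heap_counting(data)
assert moves <= len(data)   # linear in n, far below n * log2(n) ~ 22500
assert all(data[i] <= data[2 * i + 1] and data[i] <= data[2 * i + 2]
           for i in range(len(data) // 2))
```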

### Which sift_up is better?

We introduced two different `sift_up` approaches above: one that looks faster and one that looks slower.

In Python's heapq.py library, the algorithm is implemented in the seemingly slower way. The reason behind this is that `heappop` needs `sift_up` too. (Not finished.)
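To illustrate the connection, here is a self-contained sketch of `heappop`'s shape (paraphrasing the heapq strategy; the function name is mine): the last leaf replaces the root, gets pushed all the way down along the smaller-child path, and is then sifted back up. Since the replacement came from the bottom of the heap, it usually belongs near the bottom again, so going straight to a leaf costs only one comparison per level (between the two children) instead of two (children plus parent).

```python
import heapq

def heappop_sketch(heap):
    # Pop the smallest item: move the last leaf to the root, push it
    # down to a leaf along the smaller-child path, then sift it back
    # up. This mirrors the two-phase strategy in heapq.py.
    last = heap.pop()
    if not heap:
        return last
    smallest, heap[0] = heap[0], last
    pos, end = 0, len(heap)
    child = 1
    while child < end:              # walk the hole down to a leaf
        right = child + 1
        if right < end and heap[right] < heap[child]:
            child = right
        heap[pos] = heap[child]
        pos = child
        child = 2 * pos + 1
    heap[pos] = last
    while pos > 0:                  # sift the moved element back up
        parent = (pos - 1) >> 1
        if heap[pos] < heap[parent]:
            heap[pos], heap[parent] = heap[parent], heap[pos]
            pos = parent
        else:
            break
    return smallest

nums = [5, 1, 9, 3, 7, 2, 8, 6, 4, 0]
heap = nums[:]
heapq.heapify(heap)
assert [heappop_sketch(heap) for _ in range(len(nums))] == sorted(nums)
```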