Lecture 5

Heapsort

(Complete) Binary Trees

This should be a Data Structures review for most people. Still, let's refresh our memories about binary trees. A binary tree is a data structure that is either:
  1. Empty.
    or
  2. A node with two children, the left child and the right child, each of which is a binary tree.
Here is a sample binary tree:
                       1
                      / \
                     /   \
                    /     \
                   /       \
                  2         3
                 / \       / \
                /   \     /   \
               /     \   /     \
              6       7 4       5
             / \     /         /
            9  10   11        8
A node from which a binary tree extends is called the root node for that binary tree. The root of the entire tree above is node #1. Note that "root" is a recursive concept; the root of the right subtree of 1 is 3.

A leaf node is a node with no children (i.e., both children are empty binary trees). Above, 9, 10, 11, 4, and 8 are leaves. An internal node is a node that isn't a leaf. The degree of a node is the number of children that node has; it is either 0, 1 or 2.

The depth of a node is the number of edges in the path from the root to that node. For instance, node 4 is at depth 2; there are two edges from 1 to 4. Node 11 is at depth 3. Node 1 is at depth 0, since there are no edges in a path from 1 to itself.

The height of a tree is the maximum of the depths of all the nodes. So the tree above is of height 3.

All of the nodes with depth d are said to occupy level d. So level 1 above is the set of nodes {2, 3}.
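
To make these definitions concrete, here is a small Python sketch that builds the sample tree above and computes its leaves, height, and levels. The nested-tuple representation and the function names (node, degree, leaves, height, levels) are assumptions chosen for illustration, as is the convention that the empty tree has height -1.

```python
# A tree is either None (the empty tree) or a tuple (value, left, right).
def node(value, left=None, right=None):
    return (value, left, right)

# The sample tree from the figure above.
SAMPLE = node(1,
              node(2,
                   node(6, node(9), node(10)),
                   node(7, node(11))),
              node(3,
                   node(4),
                   node(5, node(8))))

def degree(t):
    """Number of non-empty children of the root of t: 0, 1, or 2."""
    _, left, right = t
    return (left is not None) + (right is not None)

def leaves(t):
    """Values of all leaf nodes, from left to right."""
    if t is None:
        return []
    value, left, right = t
    if degree(t) == 0:
        return [value]
    return leaves(left) + leaves(right)

def height(t):
    """Maximum depth of any node (assumed convention: -1 for the empty tree)."""
    if t is None:
        return -1
    _, left, right = t
    return 1 + max(height(left), height(right))

def levels(t):
    """levels(t)[d] lists the values of the nodes at depth d."""
    result = []
    frontier = [] if t is None else [t]
    while frontier:
        result.append([n[0] for n in frontier])
        # Next level: every non-empty child of the current level, left to right.
        frontier = [c for n in frontier for c in n[1:] if c is not None]
    return result

print(leaves(SAMPLE))     # [9, 10, 11, 4, 8]
print(height(SAMPLE))     # 3
print(levels(SAMPLE)[1])  # [2, 3]
```

The printed values agree with the text: the leaves are 9, 10, 11, 4, and 8, the height is 3, and level 1 is {2, 3}.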

A complete binary tree is a tree in which all leaf nodes are at the same level and all internal nodes have degree 2. How many nodes are in level d of a complete binary tree? Since level 0 has 1 node and with each level the number of nodes doubles, level d has 2^d nodes. How many nodes are in an entire complete binary tree of height h? The sum of the number of nodes at each level:

\sum_{d=0}^{h} 2^d = 2^0 + 2^1 + 2^2 + \cdots + 2^h
This turns out to be 2^{h+1} - 1. A fun visual proof of this is to write the number of nodes at each depth, from 0 to h, in its (h+1)-bit unsigned binary representation:
000000...0001 = 2^0 = 1
000000...0010 = 2^1 = 2
000000...0100 = 2^2 = 4
000000...1000 = 2^3 = 8
...
001000...0000 = 2^{h-2}
010000...0000 = 2^{h-1}
100000...0000 = 2^h
So each binary number has a different bit "on" with all the other bits "off." Adding all of these binary numbers together results in all h+1 bits "on":
111111...1111
which is the highest number that can be represented in (h+1)-bit unsigned arithmetic. If we add one to this, it's like going from 9999 to 10000; we use one more bit (now we're up to h+2 bits) and get:
1000000...0000 = 2^{h+1}
The sum is one less than this, so it is 2^{h+1} - 1.

Since the number of nodes in a complete binary tree of height h is 2^{h+1} - 1, a complete binary tree with n nodes must be of height h = \log_2(n+1) - 1 = O(\ln n).
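
The counting argument can be checked numerically. The Python sketch below sums the level sizes directly, compares the result with the closed form and with the all-ones bit pattern from the visual proof, and inverts the formula to recover the height. The function names are hypothetical and chosen only for this illustration.

```python
def nodes_in_complete_tree(h):
    """Add up the level sizes 2^0 + 2^1 + ... + 2^h."""
    return sum(2 ** d for d in range(h + 1))

def height_of_complete_tree(n):
    """Invert n = 2^(h+1) - 1 using integer arithmetic only."""
    # n + 1 should be the power of two 2^(h+1), which has h + 2 bits.
    h = (n + 1).bit_length() - 2
    if 2 ** (h + 1) - 1 != n:
        raise ValueError("n is not the size of a complete binary tree")
    return h

for h in range(6):
    n = nodes_in_complete_tree(h)
    assert n == 2 ** (h + 1) - 1          # closed form
    assert n == int("1" * (h + 1), 2)     # h+1 bits, all "on"
    assert height_of_complete_tree(n) == h

print(nodes_in_complete_tree(3))    # 15
print(height_of_complete_tree(15))  # 3
```

Using bit_length instead of a floating-point logarithm is a design choice made here to avoid rounding problems for large n.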

Heaps

A binary heap is defined as follows:
  1. It is an "almost" complete binary tree. That is, every level except possibly the bottom one is full, and the bottom level of a heap of height h is filled from left to right and may hold anywhere from 1 to 2^h nodes. This may make some of the nodes at level h-1 into leaf nodes. So a binary heap of height h may have anywhere from 2^h to 2^{h+1} - 1 nodes.
  2. It satisfies the heap property. Each node has some ordered value associated with it, like a real number. The heap property asserts that the value of a parent node must be greater than or equal to the values of both of its children.
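
The size bounds in item 1 also pin down the height of a heap from its node count: a heap with n nodes has the unique height h whose size range contains n. A minimal Python sketch follows, with hypothetical function names; the bit_length shortcut is an implementation choice made here, not something the notes prescribe.

```python
def heap_size_bounds(h):
    """Smallest and largest node counts for a heap of height h."""
    return 2 ** h, 2 ** (h + 1) - 1

def heap_height(n):
    """Height of an almost complete binary tree with n >= 1 nodes."""
    if n < 1:
        raise ValueError("a non-empty heap needs at least one node")
    # n has exactly h + 1 bits when 2^h <= n <= 2^(h+1) - 1.
    return n.bit_length() - 1

lo, hi = heap_size_bounds(3)
print(lo, hi)  # 8 15
print(all(heap_height(n) == 3 for n in range(lo, hi + 1)))  # True
```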
A binary heap is easily represented using an array, let's call it A[1..length(A)]. The size of the heap will be called heap-size(A) (obviously heap-size(A) <= length(A)). We can use the first element of the array, element #1, as the root of the heap. The left child of array element i is at element 2i, and the right child is at element 2i+1. Also, the parent of the element at index i is at index floor(i/2). An element can tell whether it is a left or right child simply by checking whether its index is even or odd, respectively. We define these "algorithms" for finding the indices of parents and children in pseudocode:
	Parent (i)
		return floor (i / 2)

	Left (i)
		return 2 * i

	Right (i)
		return 2 * i + 1
You can tell if an array element i is in the heap simply by checking whether i <= heap-size(A). Note: this scheme doesn't work in general for arbitrary (possibly incomplete) binary trees. The heap-size test alone cannot represent a leaf at a depth less than the height of the tree minus 1, and even if the array marked the missing nodes, the unused slots would waste space.
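
Here is one way the index functions might look in Python. Python lists are 0-based, so this sketch assumes a padding entry at index 0 to keep the 1-based indices used in the text. The array contents and the in_heap helper are illustrative assumptions.

```python
def parent(i):
    return i // 2        # floor(i / 2)

def left(i):
    return 2 * i

def right(i):
    return 2 * i + 1

def in_heap(i, heap_size):
    """True if index i refers to an element of the heap."""
    return 1 <= i <= heap_size

# Index 0 is unused padding so that the root sits at index 1.
A = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]   # illustrative values
heap_size = len(A) - 1                         # 10 elements

print(A[left(2)], A[right(2)])       # 8 7  (the children of 14)
print(A[parent(9)])                  # 8    (the parent of 4)
print(in_heap(left(5), heap_size))   # True: element 10 exists
print(in_heap(right(5), heap_size))  # False: element 11 does not
```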

This representation of an almost complete binary tree is pretty efficient, since moving around the tree involves multiplying and dividing by two, operations that can be done with simple bit shifts, and adding one, another simple instruction. Asymptotically speaking, each of the heap node access functions above consumes O(1) time.
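
The shift-based versions can be sketched as follows. This is only an illustration of the idea: in Python the difference is not a meaningful optimization, and in compiled languages the compiler typically makes this substitution on its own. The "_shift" function names are hypothetical.

```python
def parent_shift(i):
    return i >> 1          # shift right by one bit: floor(i / 2)

def left_shift(i):
    return i << 1          # shift left by one bit: 2 * i

def right_shift(i):
    return (i << 1) | 1    # 2 * i is even, so setting the low bit adds one

# The shift versions agree with the arithmetic versions for positive indices.
for i in range(1, 1001):
    assert parent_shift(i) == i // 2
    assert left_shift(i) == 2 * i
    assert right_shift(i) == 2 * i + 1
```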

Let's look at an algorithm to determine whether an array is really a heap, i.e., all nodes satisfy the heap property:

	Is-Heap (A)
		for i in 2..heap-size(A) do

			/* start at 2: the root, element 1, has no parent */
			/* if heap property is violated, not a heap */

			if (A[i] > A[Parent(i)]) return False
		end for

		/* no violations?  is a heap. */

		return True
If the heap contains n elements, then this algorithm does O(n) comparisons (or array accesses or whatever metric you want to use).
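
A runnable Python sketch of this check is given below. It assumes the 1-based layout with an unused padding entry at index 0, starts at index 2 because the root has no parent, and treats a child that is larger than its parent as the violation. The optional heap_size parameter is an assumption added for illustration.

```python
def is_heap(A, heap_size=None):
    """Check the heap property on A[1..heap_size]; A[0] is unused padding."""
    if heap_size is None:
        heap_size = len(A) - 1
    for i in range(2, heap_size + 1):
        # The parent of element i is element floor(i / 2).
        if A[i] > A[i // 2]:
            return False    # child exceeds its parent: not a heap
    return True             # no violations found

print(is_heap([None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True
print(is_heap([None, 1, 2, 3]))                          # False
print(is_heap([None, 3, 2, 9], heap_size=2))             # True
```

The loop body runs once for each element other than the root, so a heap of n elements costs n - 1 comparisons in the worst case, which is consistent with the O(n) bound.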