A heap data-structure serves a single purpose of returning the node with the maximum (or the minimum) value (often referred to as the key). Since the requirement is simple, it takes only O(1) to locate the element with maximum (or minimum) value and O(logN) to extract this element from the tree. If the heap stores element such that it returns the minimum element, then it is called a min-heap. On the other hand, if the heap stores element such that it returns the maximum element, then it is called a max-heap. On this page, we would focus on min-heap.
An important usage of heaps is in building priority queues, where the requirement is to return the element that has the maximum priority; that would be a max-heap. Priority queues are used in several applications. As an example, if we have a queue of processes that need to be scheduled at the processor, we can use a priority queue to enqueue them. The scheduler can extract the task with the highest priority from the list. Since the operation to extract the highest element is O(logN), the time to find out the next process for scheduling is small.
Another example for usage of priority is that of an event simulator. An event simulator enqueues various tasks and each task has an associated time for its execution. At any given time, the next task that needs to be executed would be the one with the smallest value of associated time. A priority queue can insert each new task based on this time of execution. When the event simulator finishes the current task, it can use the priority queue to extract the task with the smallest time value.
As a significant bonus, we can represent a heap by using a simple array and thus avoid the overhead of using left, right, or parent pointers for navigation that is used with binary trees. The root of the tree is the first element in the array (the element sitting at index 0 of the array). For a given node, sitting at index "i" of the array, its left-child, right-child and its parent can be easily located as follows:
left-child = heap_array[2 * i + 1] right-child = heap_array[2 * i + 2)] parent = heap_array[floor((i - 1)/2)]
In the figure below, the heap_array is [1, 2, 4, 32, 21, 89, 90, 45, 89, 67], then for node 2 (with i as 1), the root would be node 1 (with index as floor((1-1)/2) or 0), left-child would be 4 (with index as 2 * 1 + 1 or 3) and right-child would be 21 (with index as (2 * 1 + 2) or 4).
1 / \ 1 / \ | / \ | (parent node) 2 4 2 / \ / \ (left node) / \ (right node) / \ / \ / \ 32 21 89 90 32 21 / \ / / \ / 45 89 67 ---------------------------------------------- | 1 | 2 | 4 | 32 | 21 | 89 | 90 | 45 | 89 | 67 | Values ---------------------------------------------- 0 1 2 3 4 5 6 7 8 9 Index Figure: Representation of a Heap
Why does this simple representation in the form of an array works for heaps but not for binary trees? The answer lies in the fact that as a heap grows, the elements are added to the lowest depth (let us call it level) of the tree from the left side; as the heap grows, elements get added to the current level till it completes the current level. After that, the next element is added to the level lower than the current level. In the above example, the top level is that of {1}, followed by its two children {2,4} that form the second level. Next, the child nodes of nodes 2 and 4 form the third level {5,21,10,9}, where {5,21} are children of node 2 and {10,9} are children of node 4. Thus, this method of addition ensures that a new level would get added only when the current level is filled.
The above logic does not hold not true for a binary search tree, where it can easily get unbalanced and one side can have a much higher depth (level) than the other. For example, if the elements of the heap_array were to be added in the same sequence as above (1 is inserted first, followed by other elements, and 9 in inserted in the last), the tree would look like as follows. The top level would be once again {1} and it would have the same two children {2,4}, but next, 5 would become a child of 4 and not 2. Next, 21 would become a child of 5, thus, at this point, there would be only one element at second and third levels. If we were to use an array, it would be very inefficient. Hence, on grounds of efficiency, it is not viable to have an array representation for a binary search tree.
Heaps have one simple requirement: value of the parent node should be smaller than values of its left-child node or right-child node. Thus, for the heap_array, the value of the root node (i.e. 1) is smaller than its two child nodes {2,4}. Please note that binary search trees have a different requirement. For a parent node, the right child node must have a higher value than that of the left child. Thus, the value of the parent node lies in between the values of its two child nodes. Thus, various operations for a heap simply try to preserve the property that the value of the parent should be smaller than the values of its two child nodes.
Let us start with the extraction method for the heap -- this method allows us to retrieve the root element, which is the smallest element of the tree.
During extraction, we delete the current value of the root and also decrements the size of the heap. Deleting the root is fairly simple. We copy the value of the last element of the array into the root (which is the first element) and then rebuild the heap so that the basic property of heaps holds true again: value(parent-node) < min(value(left-child), value(right-child)) for a min-heap.
The above process is referred to as heapify. In heapify, we start at a given node (in case of extraction, it is the root node), and then compare if the value of current node is greater than either of its two children. If we find that the value is greater than any one of them, then we swap values of the parent node and its lower child node. Next, we do heapify on the child node for which the value was swapped. We continue this till we reach the bottom of the tree. Since this method involves traversing the depth of the tree, the complexity of extraction is O(logN).
10 22 12 12 / \ / \ / \ / \ / \ / \ / \ / \ 12 14 ---> 12 14 ---> 22 14 ---> 21 14 / \ / \ / \ / / \ / / \ / / \ / \ / \ / / \ / / \ / 32 21 89 22 32 21 89 32 21 89 32 22 89 Extracting Place last Call Extraction Root node 10 node at the heap_heapify_from_top() done! top and call for 22 heap_heapify_from_top() for 22 ---------------------------------- | 10 | 12 | 14 | 32 | 21 | 89 | 22 | ---------------------------------- Figure: Heap Extraction
Deletion of a node follows a similar logic as that of extract. However, instead of calling delete at the top, first we look for the element and locate its index (aka counter). Once we have found the index of the node that we want to delete, we simple follow the extraction method: replace the current node with the last element of the heap, reduce the size of the heap, and then call heapify from the given index of the node.
10 10 10 / \ / \ / \ / \ / \ / \ 12 14 -----> 89 14 -----> 21 14 / \ / / \ / \ / \ / / \ / \ 32 21 89 32 21 32 89 ----------------------------- | 10 | 12 | 14 | 32 | 21 | 89 | ----------------------------- Figure: Heap Deletion (node with value 12)
Insertion works in a manner that is rather opposite to both extraction and deletion. For that, we increase the size of the heap and place the new element at the end of the array; we call realloc() if the current size of the array is not sufficient. We call realloc() to allocate a batch of storage once we need it. In this example, when the size becomes insufficient, we increase the size of the array by SIZE_OF_BLOCK_ALLOCATION storage units. Next, we start to walk from this element towards the root and as we find a parent that is inconsistent with the heap-rule, then we swap the two and continue from there till we reach the root.
Let us illustrate the insertion process using the figure below for a min-heap. We insert a new node with value 2 (shown here as Z). We keep Z at the end of the array and travel upwards towards the root. During this walk, if we find that the parent is more than the Z, then we swap the two.
10 10 Z / \ / \ / \ / \ / \ / \ 12 14 -----> 12 Z -----> 12 10 / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ / \ 32 21 89 Z 32 21 89 14 32 21 89 14 ------------------------------------ | 10 | 12 | 14 | 32 | 21 | 89 | Z(=2)| ------------------------------------ Figure: Heap Insertion
Now that we have some idea about workings of a heap, let us try to code it up! For that, we begin with a simple implementation of a heap that stores integer values. Our implementation builds a min-heap. The implementation uses functions to add elements to the heap (heap_add()), to extract the min element (heap_extract()), to delete elements from the heap (heap_delete()), and to prints elements of a heap (heap_print()).
With that, we provide the implementation below. We would add description of the program at the end of the program.
#include <stdio.h> #include <stdlib.h> #include <math.h> /* Global Variables*/ int *heap_array = NULL; int heap_max_size = 0; int heap_current_size = 0; #define SIZE_OF_BLOCK_ALLOCATION 10 void heap_heapify_from_top (int counter) { int temp_val = 0; int child_counter; int has_left_child = 0; int has_right_child = 0; if ((2 * counter + 1) < heap_current_size) has_left_child = 1; if ((2 * counter + 2) < heap_current_size) has_right_child = 1; /* Now, let us find the lowest of the two children */ if (has_left_child && has_right_child) { if (heap_array[2* counter + 1] < heap_array[2*counter + 2]) child_counter = 2 * counter + 1; else child_counter = 2 * counter + 2; } else if (has_left_child) { child_counter = 2 * counter + 1; } else if (has_right_child) { child_counter = 2 * counter + 2; } else { return; } if (heap_array[counter] > heap_array[child_counter]) { temp_val = heap_array[counter]; heap_array[counter] = heap_array[child_counter]; heap_array[child_counter] = temp_val; heap_heapify_from_top(child_counter); } return; } void heap_extract () { if (heap_current_size == 0) { printf("The heap is empty\n"); return; } printf("Extract: Min node is %d\n", heap_array[0]); heap_array[0] = heap_array[heap_current_size-1]; heap_current_size--; if (heap_current_size != 1) { heap_heapify_from_top(0); } return; } void heap_insert_heapify_from_bottom (int counter) { int parent = (int) floor((double)counter/2); int temp_val; if (counter == 0) { return; } if (heap_array[parent] > heap_array[counter]) { temp_val = heap_array[counter]; heap_array[counter] = heap_array[parent]; heap_array[parent] = temp_val; } heap_insert_heapify_from_bottom(parent); } int heap_add (int value) { if (heap_current_size == heap_max_size) { heap_max_size += SIZE_OF_BLOCK_ALLOCATION; heap_array = (void*)realloc(heap_array, heap_max_size * sizeof(int)); if (!heap_array) { printf("realloc failed\n"); return -1; } } heap_array[heap_current_size] = value; heap_insert_heapify_from_bottom(heap_current_size); heap_current_size++; return 0; } int heap_delete_recursively (int counter, int value) { int child_counter; if ((counter < 0) || (counter >= heap_current_size)) return -1; if (heap_array[counter] == value) { heap_array[counter] = heap_array[heap_current_size-1]; heap_current_size--; heap_heapify_from_top(counter); return 0; } child_counter = 2 * counter + 1; if (child_counter >= heap_current_size) { return -1; } if (heap_delete_recursively(child_counter, value) == 0 ) { return 0; } child_counter = 2 * (counter + 1); if (child_counter >= heap_current_size) { return -1; } if (heap_delete_recursively(child_counter, value) == 0) { return 0; } return -1; } int heap_delete (int value) { return (heap_delete_recursively(0, value)); } void heap_print () { int counter = 0; printf("\nPrinting the Heap: \n"); printf("Heap-size: %d (Max-size: %d)\n", heap_current_size, heap_max_size); for (counter = 0; counter < heap_current_size; counter++) { if (counter == 0) printf("%d(p=NULL) ", heap_array[counter]); else printf("%d(p=%d) ", heap_array[counter], heap_array[(int)floor((double)(counter-1)/2)]); } printf("\n"); } int main (int argc, char *argv[]) { int return_val, min_node, found, i, err; int input_array[] = {80, 5, 64, 10, 44, 94, 15}; int len = sizeof(input_array)/sizeof(input_array[0]); for (i = 0; i < len; i++) { heap_add(input_array[i]); } heap_print(); /* Extract the minimum */ heap_extract(); /* Extract the minimum, again */ heap_extract(); heap_print(); /* Delete a node */ return_val = heap_delete(80); if (return_val == -1) { printf("Not good! The last deletion failed \n"); return -1; } heap_print(); return 0; }
The above file contains with three global variables. The first one is a "int *heap_array" and this represents an array that holds "integer" values. Next, we have heap_max_size, which shows the maximum size of the heap based of the previous malloc. As we keep adding newer elements to the heap, at some point we would exceed the allocated maximum size and at that point, the implementation would realloc the heap -- thus, the value of heap_max_size would grow in bursts. The third global is the heap_current_size member, which stores the current number of elements in the heap. This value would always be less than the heap_max_size. These global variables are initialized in their declaration itself.
The heap_extract() function is the one that extracts the root -- it returns the minimum element. As discussed above the heap_extract() replaces the root node with the last node and then calls heapify from the root (by internally calling the heap_heapify_from_top()() function) and continues towards the bottom as long as it finds a node that does not follow the heap rule.
Next, we can see the add/delete functions: heap_add() and heap_delete(). These two functions add and delete an integer to the heap, respectively. Similar to the heap_extract(), the heap_delete() also relies upon doing a heapify from the top to bottom. However, contrary to these two, the heap_add() follows the approach of doing a heapify starting from the bottom and moving upwards. When adding an element to the heap, the heap_add() also does a malloc(), when it runs out of currently allocated space. To avoid frequent mallocs, this function mallocs SIZE_OF_BLOCK_ALLOCATION element in one shot. The above file defines this macro as 10. This is okay for demonstration purpose. However, this size must be tweaked for production-quality implementations. If we are dealing with a large number of elements (which for most cases, would be the case), then we would need to bump up SIZE_OF_BLOCK_ALLOCATION to an appropriately higher value.
The implementation has a function for printing elements in the heaps. The print functionality prints the heap as an array but with each element, it also prints the value of its parent as well. This format allows us to visualize the tree slightly better!
Towards the end of the above file, we have the main function that triggers various heap operations. It begins by adding elements from an integer array (input_array) using the heap_add() API. After the elements are added to the heap, the main function calls various routines like heap_print(), heap_extract(), and heap_delete() on the array. The heap_extract() prints the minimum value from the heap. The heap_delete() function accepts the value of the element that needs to be deleted and if present in the heap, it deletes that element from the heap.
To compile the above implementation, we use gcc. Here is the output.
user@codingbison $ gcc heap.c -o heap -lm user@codingbison $ ./heap Printing the Heap: Heap-size: 7 (Max-size: 10) 5(p=NULL) 10(p=5) 44(p=5) 15(p=10) 80(p=10) 94(p=44) 64(p=44) Extract: Min node is 5 Extract: Min node is 10 Printing the Heap: Heap-size: 5 (Max-size: 10) 15(p=NULL) 64(p=15) 44(p=15) 94(p=64) 80(p=64) Printing the Heap: Heap-size: 4 (Max-size: 10) 15(p=NULL) 64(p=15) 44(p=15) 94(p=64) user@codingbison $
The earlier implementation provided a demonstration of heap and is a simple start for understanding inner workings of a heap. However, the above implementation would need several additions to make it a more generic production-style software.
Among other things, the above implementation stores only an integer. In most cases this would not be enough and so we provide another implementation that is more generic in nature. Next, the above program is monolithic. A generic implementation should be more modular with a header file, an implementation file, and an application file. Lastly, the heap_array is a global -- when we have multiple applications using it, a global variable would not work.
If you are interested, you can find the complete program for advanced implementation at our following page: Advanced Implementation of Heaps.