CodingBison

A heap data-structure serves a single purpose of returning the node with the maximum (or the minimum) value (often referred to as the key). Since the requirement is simple, it takes only O(1) to locate the element with maximum (or minimum) value and O(logN) to extract this element from the tree. If the heap stores element such that it returns the minimum element, then it is called a min-heap. On the other hand, if the heap stores element such that it returns the maximum element, then it is called a max-heap. On this page, we would focus on min-heap.

An important usage of heaps is in building priority queues, where the requirement is to return the element that has the maximum priority; that would be a max-heap. Priority queues are used in several applications. As an example, if we have a queue of processes that need to be scheduled at the processor, we can use a priority queue to enqueue them. The scheduler can extract the task with the highest priority from the list. Since the operation to extract the highest element is O(logN), the time to find out the next process for scheduling is small.

Another example for usage of priority is that of an event simulator. An event simulator enqueues various tasks and each task has an associated time for its execution. At any given time, the next task that needs to be executed would be the one with the smallest value of associated time. A priority queue can insert each new task based on this time of execution. When the event simulator finishes the current task, it can use the priority queue to extract the task with the smallest time value.

As a significant bonus, we can represent a heap by using a simple array and thus avoid the overhead of using left, right, or parent pointers for navigation that is used with binary trees. The root of the tree is the first element in the array (the element sitting at index 0 of the array). For a given node, sitting at index "i" of the array, its left-child, right-child and its parent can be easily located as follows:

 left-child  = heap_array[2 * i + 1]
 right-child = heap_array[2 * i + 2)] 
 parent      = heap_array[floor((i - 1)/2)]

In the figure below, the heap_array is [1, 2, 4, 32, 21, 89, 90, 45, 89, 67], then for node 2 (with i as 1), the root would be node 1 (with index as floor((1-1)/2) or 0), left-child would be 4 (with index as 2 * 1 + 1 or 3) and right-child would be 21 (with index as (2 * 1 + 2) or 4).

                             1
                           /  \                                 1  
                          /    \                                |    
                         /      \                               | (parent node)
                        2        4                              2   
                       / \      / \               (left node)  / \ (right node)
                      /   \    /   \                          /   \    
                     32    21 89   90                        32    21   
                    / \    / 
                   /   \  /   
                  45  89 67

                          ----------------------------------------------
                         | 1 | 2 | 4 | 32 | 21 | 89 | 90 | 45 | 89 | 67 | Values
                          ----------------------------------------------
                           0   1   2    3    4    5    6   7    8    9   Index

                                  Figure: Representation of a Heap 

Why does this simple representation in the form of an array works for heaps but not for binary trees? The answer lies in the fact that as a heap grows, the elements are added to the lowest depth (let us call it level) of the tree from the left side; as the heap grows, elements get added to the current level till it completes the current level. After that, the next element is added to the level lower than the current level. In the above example, the top level is that of {1}, followed by its two children {2,4} that form the second level. Next, the child nodes of nodes 2 and 4 form the third level {5,21,10,9}, where {5,21} are children of node 2 and {10,9} are children of node 4. Thus, this method of addition ensures that a new level would get added only when the current level is filled.

The above logic does not hold not true for a binary search tree, where it can easily get unbalanced and one side can have a much higher depth (level) than the other. For example, if the elements of the heap_array were to be added in the same sequence as above (1 is inserted first, followed by other elements, and 9 in inserted in the last), the tree would look like as follows. The top level would be once again {1} and it would have the same two children {2,4}, but next, 5 would become a child of 4 and not 2. Next, 21 would become a child of 5, thus, at this point, there would be only one element at second and third levels. If we were to use an array, it would be very inefficient. Hence, on grounds of efficiency, it is not viable to have an array representation for a binary search tree.

Heaps have one simple requirement: value of the parent node should be smaller than values of its left-child node or right-child node. Thus, for the heap_array, the value of the root node (i.e.. 1) is smaller than its two child nodes {2,4}. Please note that binary search trees have a different requirement. For a parent node, the right child node must have a higher value than that of the left child. Thus, the value of the parent node lies in between the values of its two child nodes. Thus, various operations for a heap simply try to preserve the property that the value of the parent should be smaller than the values of its two child nodes.

Basic Operations

Extraction

Let us start with the extraction method for the heap -- this method allows us to retrieve the root element, which is the smallest element of the tree.

During extraction, we delete the current value of the root and also decrements the size of the heap. Deleting the root is fairly simple. We copy the value of the last element of the array into the root (which is the first element) and then rebuild the heap so that the basic property of heaps holds true again: value(parent-node) < min(value(left-child), value(right-child)) for a min-heap.

The above process is referred to as heapify. In heapify, we start at a given node (in case of extraction, it is the root node), and then compare if the value of current node is greater than either of its two children. If we find that the value is greater than any one of them, then we swap values of the parent node and its lower child node. Next, we do heapify on the child node for which the value was swapped. We continue this till we reach the bottom of the tree. Since this method involves traversing the depth of the tree, the complexity of extraction is O(logN).

         10                      22                     12                   12    
        /  \                    /  \                   /  \                 / \
       /    \                  /    \                 /    \               /   \
     12     14      --->     12     14      --->     22     14   --->     21    14
     / \    / \              / \    /                / \    /            / \    / 
    /   \  /   \            /   \  /                /   \  /            /   \  /  
   32  21 89   22          32   21 89              32   21 89          32  22 89  

   Extracting             Place last               Call                      Extraction
   Root node 10           node at the              heap_heapify_from_top()   done!
                          top and call             for 22 
                          heap_heapify_from_top()
                          for 22

                          ----------------------------------
                         | 10 | 12 | 14 | 32 | 21 | 89 | 22 |
                          ----------------------------------

                             Figure: Heap Extraction 

Deletion

Deletion of a node follows a similar logic as that of extract. However, instead of calling delete at the top, first we look for the element and locate its index (aka counter). Once we have found the index of the node that we want to delete, we simple follow the extraction method: replace the current node with the last element of the heap, reduce the size of the heap, and then call heapify from the given index of the node.

                             10                      10                     10   
                            /  \                    /  \                   /  \                 
                           /    \                  /    \                 /    \               
                         12     14    ----->      89     14   ----->     21     14   
                         / \    /                / \                    / \    
                        /   \  /                /   \                  /   \   
                       32  21 89               32   21                32   89            

                                      -----------------------------
                                     | 10 | 12 | 14 | 32 | 21 | 89 |
                                      -----------------------------

                             Figure: Heap Deletion (node with value 12) 

Insertion

Insertion works in a manner that is rather opposite to both extraction and deletion. For that, we increase the size of the heap and place the new element at the end of the array; we call realloc() if the current size of the array is not sufficient. We call realloc() to allocate a batch of storage once we need it. In this example, when the size becomes insufficient, we increase the size of the array by SIZE_OF_BLOCK_ALLOCATION storage units. Next, we start to walk from this element towards the root and as we find a parent that is inconsistent with the heap-rule, then we swap the two and continue from there till we reach the root.

Let us illustrate the insertion process using the figure below for a min-heap. We insert a new node with value 2 (shown here as Z). We keep Z at the end of the array and travel upwards towards the root. During this walk, if we find that the parent is more than the Z, then we swap the two.

                 10                      10                       Z             
                /  \                    /  \                     /  \
               /    \                  /    \                   /    \ 
             12     14    ----->      12     Z     ----->      12     10 
             / \    / \              / \    / \               / \     / \
            /   \  /   \            /   \  /   \             /   \   /   \
           32  21 89    Z          32   21 89   14          32  21   89  14

                          ------------------------------------
                         | 10 | 12 | 14 | 32 | 21 | 89 | Z(=2)|
                          ------------------------------------

                             Figure: Heap Insertion 

Advanced Implementation

Now that we are familiar with the basic steps, let us try to code it up! For that, we begin with an implementation of a min-heap. The implementation consists of three files: a header file that publishes the data-structures and heap APIs (let us call it "heap.h"), a C file that contains the implementation of these APIs (let us call it "heap.c") and an application C file that contains the main routine and invokes heap APIs to do various things (let us call it "main_heap.c"). Having a header file and an implementation-only file allows us to have applications add the header file and thereby, access the heap APIs. This means that we can use the same implementation for multiple applications and thus, reuse the code.

Before we go any further, if you are looking for a simpler implementation to understand the inner workings of a heap data-structure, we would like to recommend our page, where we provide a simpler version of the implementation: Simple Implementation of a Heap.

So, first things first. We provide our header file. Along with the data-structure definition, the header file also publishes the following four APIs (public functions).

 heap_init():    Initializes the global variables 
 heap_add():     Adds an element to the heap 
 heap_extract(): Returns the min element 
 heap_delete():  Deletes an element from the heap 
 #ifndef __HEAP_H___
 #define __HEAP_H___

 #define SIZE_OF_BLOCK_ALLOCATION  10 

 typedef struct heap_ {
     void **heap_array;     /* An array of void pointers */
     int heap_max_size;     /* Maximum allocated size for heap */ 
     int heap_current_size; /* Number of elements in the heap */ 
     int (*compare_func)(const void *, const void *); /* To compare "void *" elements */
 } heap_t;

 int heap_init(heap_t *heap, int (*compareFunc)(const void *, const void *));
 int heap_add(heap_t *heap, void *value); 
 void *heap_extract(heap_t *heap, int *err);
 int heap_delete(heap_t *heap, void *value); 
 #endif /* __HEAP_H___ */

The above header file starts with a heap_t data-structure that encapsulates the heap and various heap-related values. The heap_array is of type "void **heap_array" and thereby, represents an array that holds "void *" pointers. This allows us to hold pointer to any type of data. Next, the data-structure contains heap_max_size, which shows the maximum size of the heap based of the previous malloc. As we keep adding newer elements to the heap, at some point we would exceed the allocated maximum size and at that point, the implementation would realloc the heap. Thus, the value of heap_max_size would grow in bursts. The next member is the heap_current_size member, which stores the current number of elements in the heap. This value would always be less than the heap_max_size.

Lastly, the structure also has a member that stores a comparison callback function; this function allows applications to provide their own based on the type of data they need to store. Since we are storing "void *" as the data-type, we really do not know how to compare two values. Keeping this with the application allows us to provide a more flexible implementation.

For the heap to be a min-heap, the comparison callback function should returns -1 when the first value is lower than the second one, +1 when the first value is greater than the second one, and 0 when they are both equal. For a max-heap, we would need to revert the return values for the first two cases: return +1 when the first value is lower than the second and return -1 when the first value is greater than the second one.

Let us move on and see the file that implements the above APIs.

 #include <stdio.h>
 #include <stdlib.h>
 #include <math.h>
 #include "heap.h"

 int heap_init (heap_t *heap, int (*compare_func)(const void *, const void *)) {
     if (!heap) {
         return -1;
     }

     heap->heap_array = NULL; 
     heap->heap_max_size = heap->heap_current_size = 0; 
     heap->compare_func = compare_func;
     return 0;
 }

 void heap_heapify_from_top (heap_t *heap, int counter) {
     void *temp_val = NULL;
     int child_counter;
     int has_left_child = 0;
     int has_right_child = 0;

     if ((2 * counter + 1) < heap->heap_current_size)
         has_left_child = 1;

     if ((2 * counter + 2) < heap->heap_current_size)
         has_right_child = 1;

     /* Now, let us find the lowest of the two children */
     if (has_left_child && has_right_child) { 
         /* If compare_fun returns negative value, then the left child is smaller */
         if (heap->compare_func(heap->heap_array[2* counter + 1], 
                         heap->heap_array[2*counter + 2]) < 0) {
             child_counter = 2 * counter + 1;
         } else {
             child_counter = 2 * counter + 2;
         }
     } else if (has_left_child) {
         child_counter = 2 * counter + 1;
     } else if (has_right_child) {
         child_counter = 2 * counter + 2;
     } else {
         return; 
     }

     /* If compare_func returns negative value, then the child is smaller */
     if (heap->compare_func(heap->heap_array[child_counter], heap->heap_array[counter]) < 0) { 
         temp_val = heap->heap_array[counter];
         heap->heap_array[counter] = heap->heap_array[child_counter];
         heap->heap_array[child_counter] = temp_val;
         heap_heapify_from_top(heap, child_counter); 
     }
     return;
 }

 void *heap_extract (heap_t *heap, int *err) { 
     void *min_node = NULL;
     *err = 0;

     if (heap->heap_current_size == 0 ) {
         *err = -1;
         return NULL;
     }

     min_node = heap->heap_array[0];

     heap->heap_array[0] = heap->heap_array[heap->heap_current_size-1];
     heap->heap_current_size--;

     if (heap->heap_current_size != 1) {
         heap_heapify_from_top(heap, 0);
     }
     return min_node;
 }

 void heap_insert_heapify_from_bottom (heap_t *heap, int counter) {
     int parent = (int) floor((double)counter/2);
     void *temp_val;

     if (counter == 0) {
         return;
     }

     /* If compare_func returns negative value, then the child is smaller */
     if (heap->compare_func(heap->heap_array[counter], heap->heap_array[parent]) < 0) {
         temp_val = heap->heap_array[counter];
         heap->heap_array[counter] = heap->heap_array[parent];
         heap->heap_array[parent] = temp_val;
     }
     heap_insert_heapify_from_bottom(heap, parent);
 }

 int heap_add (heap_t *heap, void *value) {
     if (heap->heap_current_size == heap->heap_max_size) {
         heap->heap_max_size += SIZE_OF_BLOCK_ALLOCATION;
         heap->heap_array = (void*)realloc(heap->heap_array, 
                         heap->heap_max_size * sizeof(void *));
         if (!heap->heap_array) {
             return -1;
         }
     }
     heap->heap_array[heap->heap_current_size] = value;
     heap_insert_heapify_from_bottom(heap, heap->heap_current_size);
     heap->heap_current_size++;
     return 0;
 }

 int heap_delete_recursively (heap_t *heap, int counter, void *value) {
     int child_counter; 

     if ((counter < 0) || (counter >= heap->heap_current_size))
         return -1;

     if (heap->compare_func(value, heap->heap_array[counter]) == 0) {
         heap->heap_array[counter] = heap->heap_array[heap->heap_current_size-1];
         heap->heap_current_size--;

         heap_heapify_from_top(heap, counter);
         return 0;
     }

     child_counter = 2 * counter + 1;
     if (child_counter >= heap->heap_current_size) {
         return -1;
     }

     if (heap_delete_recursively(heap, child_counter, value) == 0 ) {
         return 0;
     }

     child_counter = 2 * (counter + 1);
     if (child_counter >= heap->heap_current_size) {
         return -1;
     }

     if (heap_delete_recursively(heap, child_counter, value) == 0) {
         return 0;
     }
     return -1;
 }

 int heap_delete (heap_t *heap, void *value) {
     return (heap_delete_recursively(heap, 0, value)); 
 }

The above file holds the main implementation details. As discussed above the heap_extract() replaces the root node with the last node and then calls heapify from the root and continues towards the bottom as long as it finds a node that does not follow the heap rule. Similar to that, the heap_delete() also depends upon doing a heapify from the top to bottom. However, contrary to these two, the heap_add() follows the approach of doing a heapify starting from the bottom and moving upwards.

When adding an element to the heap, the heap_add() also does a malloc(), when it runs out of space. To avoid frequent mallocs, this function mallocs SIZE_OF_BLOCK_ALLOCATION element in one shot. The above file defines this macro as 10. This is okay for demonstration purpose. However, this size must be tweaked for production-quality implementations. If we are dealing with a large number of elements (which for most cases, would be the case), then we would need to bump up SIZE_OF_BLOCK_ALLOCATION to an appropriate higher value.

Lastly, here is the application file that uses the earlier heap implementation.

 #include <stdio.h>
 #include <stdlib.h>
 #include <strings.h>
 #include <math.h>
 #include "heap.h"

 int heap_compare (const void *value1, const void *value2) {
     if ( *(int *)value1 < *(int *)value2 ) {
         return -1;
     } else if ( *(int *)value1 > *(int *)value2 ) {
         return 1;
     } else {
         return 0;
     }
 }

 void heap_print (heap_t *heap) {
     int counter = 0;

     printf("\nPrinting the Heap: \n");
     if (heap == NULL){
         return;
     }

     printf("Heap-size: %d (Max-size: %d)\n", 
             heap->heap_current_size, heap->heap_max_size);
     for (counter = 0; counter < heap->heap_current_size; counter++) {
         if (counter == 0) { 
             printf("%d(p=NULL) ", *(int *)(heap->heap_array[counter]));
         } else {
             printf("%d(p=%d) ", 
                     *(int *)(heap->heap_array[counter]), 
                     *(int *)(heap->heap_array[(int) floor((double)(counter-1)/2)]));
         }
     }
     printf("\n");
 }

 int main (int argc, char *argv[]) {
     heap_t heap; 
     void *min_node = NULL;
     int return_val, found, i, err;
     int input_array[] = {80, 5, 64, 10, 44, 94, 15};
     int len = sizeof(input_array)/sizeof(input_array[0]);

     heap_init(&heap, heap_compare);

     for (i = 0; i < len; i++) {
         heap_add(&heap, (void *)&input_array[i]);
     }
     heap_print(&heap);

     /* Extract the minimum */
     min_node = heap_extract(&heap, &err);
     if (err == 0) {
         printf("The minimum is: %d \n", *(int *)min_node);
     } else {
         printf("Hmm.. No minimum for this tree \n");
     }

     /* Extract the minimum, again */
     min_node = heap_extract(&heap, &err);
     if (err == 0) {
         printf("The minimum is: %d \n", *(int *)min_node);
     } else {
         printf("Hmm.. No minimum for this tree \n");
     }
     heap_print(&heap);

     /* Delete a node */
     return_val = heap_delete(&heap, (void *)&input_array[0]);
     if (return_val == -1) {
         printf("Not good! The last deletion failed \n");
         return -1;
     }
     heap_print(&heap);
     return 0;
 }

The above main file starts with a heap structure (heap_t heap) and then does various heap operations with it. It begins by calling heap_init() and then passing heap_compare() as the comparison function for the heap. The heap_compare() function returns -1 when the first value is lower than the second one, +1 when the first value is greater than the second one, and 0 when they are both equal. This leads to a min-heap.

Next, the main() function adds elements from an integer array (input_array) using the heap_add() API. Please note that when adding, we use the address of the array element (&input_array[i]) -- this is because when storing the values in the heap, we need to pass a pointer and then type-case it to (void *) -- we can typecast any pointer to a (void *).

After the elements are added to the heap, the main function calls various routines like heap_print(), heap_extract(), and heap_delete() on the array. The heap_extract() returns the minimum value from the heap. The heap_delete() function accepts the value of the element that needs to be deleted and if present in the heap, it deletes that element from the heap.

We keep the print function with the application. The reason for that is since the heap stores a "void *", the core implementation of heap would not know how to print the "void *". Accordingly, it would be the responsibility of the application (the above file with the main() function) to provide the print functionality.

To bring all of these together, we pass both "main_heap.c" and "heap.c" file as arguments to gcc. We also pass "-lm" as an option to link maths module with gcc. Therefore, to compile, we can use "gcc main_heap.c heap.c -o heap -lm" and to run, we can use "./heap". Here is the output:

 user@codingbison $ gcc heap.c -o heap -lm 
 user@codingbison $ ./heap
 Printing the Heap: 
 Heap-size: 7 (Max-size: 10)
 5(p=NULL) 10(p=5) 44(p=5) 15(p=10) 80(p=10) 94(p=44) 64(p=44) 
 The minimum is: 5 
 The minimum is: 10 

 Printing the Heap: 
 Heap-size: 5 (Max-size: 10)
 15(p=NULL) 64(p=15) 44(p=15) 94(p=64) 80(p=64) 

 Printing the Heap: 
 Heap-size: 4 (Max-size: 10)
 15(p=NULL) 64(p=15) 44(p=15) 94(p=64) 
 user@codingbison $ 




comments powered by Disqus