CodingBison - Binary Search Trees: Advanced Implementation

A binary search tree is a popular data structure that is well-optimized for various tasks like lookup, insertion, deletion, and traversal. However, unlike hash-tables, where the lookup can be typically of the order of O(1), a binary search tree provides a best lookup of O(logN). O(logN) is still a great lookup time, if we consider the overall purpose for which a binary search tree is designed. A binary search tree guarantees that both the insertion/deletion time and the lookup time would be of the order of O(logN); of course, the assumption is that the binary search tree is fairly balanced.

A binary search tree stores each input as a node. Each node in the tree (excluding the leaf nodes) can have a maximum of two child nodes that are represented by, let us say, "left" and "right" pointers. A binary search tree can also have a pointer to its parent -- this can come in handy when we need to navigate upwards in the tree. In the following figure, for node 20, the "parent" pointer points to node 40, the "left" pointer points to 15, and the "right" pointer points to node 23.

                      40                                  40
                     /  \                                 | 
                    /    \                                | (parent pointer)
                   20    44                              20
                  /  \     \             (left pointer) /  \ (right pointer)
                 /    \     \                          /    \
                15   23     48                        15    23
                     /      / \    
                    /      /   \ 
                   21    46    100

                             Figure: Binary Search Tree

Binary search tree is also sorted -- the right node on the tree is always greater than its parent node and the left node. Hence, a left node is less than its parent node and parent's right node. For example, in the above Figure, nodes sitting on the left sub-tree of the root node (with value 40) are all less than 40. Likewise, nodes sitting on the right sub-tree of the root node are all greater than 40.

Basic Operations

Search

Compared to a linked list, searching becomes faster with a binary search tree since it is by definition sorted. This is because at any point, we have the knowledge that the nodes present on the right side have a higher key than those of nodes present on the left. And so, at each node, we avoid the other potentially "half" of the elements that sit on the other sub-tree. In other words, if we find that the value that we are looking for is greater than the root, then we focus our search on the right sub-tree and avoid all the elements sitting in the left sub-tree. Elimination of half of the nodes at each step reduces the complexity to O(logN) as compared to O(N) complexity for a linear linked list. Put differently, searching for a binary search tree, instead of traversing the entire tree, we only traverse the depth of the tree and that is of the order of logN.

We demonstrate this using two examples in the figure below. In the first example, we search for node 47; since 47 is not present in the tree, we end up traversing the entire depth before we conclude that 47 is not present. The search path taken is: [40,44,48,46]; in the figure, we show the search path taken using double lines. In the second example, we search for a node with value 23. Since 23 is present in the tree and its depth is only two, we are able to locate the node in 3 searches. The search path taken is: [40,20,23].

                          40                                 40               
                         /  \\                             //  \               
                        /    \\                           //    \             
                       20     44                         20      44          
                      / \      \\                       / \\      \         
                     /   \      \\                     /   \\      \        
                    15   23      48                   15    23      48     
                         /      // \                       / \      / \   
                        /      //   \                     /   \    /   \
                       21     46   100                   21   28  46   100

                 Searching 47: Not Found           Searching 23: Found 

                             Figure: Searching a Binary Search Tree

Insertion

Inserting into this tree is a matter of finding the location where one can insert the current value -- the goal being that the new location should still respect the binary search tree logic -- nodes on the right should be greater than the nodes on the left. In the following figure, we show path for insertion of a node with value 28. We show the path for locating the location using double lines. We start at node 40, since 28 is less than 40, we go to the left value of 40, which is 20. Since 28 is greater than 20, we go to the right value of 20, which is 23. Since 28 is still greater than 23, we go the right of 23. But, right value of 23 is NULL, so, we insert 28 as the right child of node with value 23 and make 23 the parent of 28.

                          40                            40               
                         /  \                         //   \               
                        /    \      Inserting 28     //     \             
                       20     44     --------->     20      44          
                      / \      \                   / \\       \         
                     /   \      \                 /   \\       \        
                    15   23      48              15    23       48     
                         /      / \                   / \\     / \   
                        /      /   \                 /   \\   /   \
                       21     46   100              21    28 46   100

                             Figure: Binary Search Tree Insertion

Deletion

Deleting a node from the binary search tree is slightly involved. There are three cases that one needs to consider for deletion. In the first case, we are deleting a node that does not have any child node -- this is the simplest case and we simply need to make current pointer from the parent node to the node being deleted (left or right) NULL. In the second case, we are deleting a node that has one child node -- in this case, we essentially have to splice the node out and make the parent of its node getting deleted, the parent of the child node. In the third case, we have a node that has both children. For such cases, we need to simply replace this node with its successor node.

A successor is the node that has the smallest value which is larger than the value of the given node. If a node has right sub-tree, then we find the smallest element in the right-subtree and that node becomes the successor of the node. If the node has no right sub-tree, then we travel upstream towards the root and find the smallest node that has a value higher than that of the current node. For example, for 101, finding a successor means finding the smallest element in its right sub-tree; this successor node would be 120. For node 97, there is no right sub-tree, so finding its successor means traveling upwards towards the root. 98 is the first node which has a value greater than that of 97; hence, 98 is the successor of node 97. Lastly, since 200 is the largest value, there would not be any successor to the node with value 200!

As we will see in the next function, we need to locate a successor node when deleting an element from the tree. Since we are going to use successor to handle delete when a node has both right an left sub-trees, our implementation retrieves successor node from the right subtree, which would be the minimum node on the right sub-tree. Of course, as an alternative, one can also select the maximum node on the left sub-tree too.

                                              101 
                                             /  \
                                            /    \ 
                                           98    120
                                          /        \
                                         /          \
                                        88          140
                                       / \            \
                                      /   \            \ 
                                     77   97          200 

                             Figure: Finding Successor in a Binary Search Tree

Now that we understand what successor nodes are, let us understand the three deletion cases using the following figure. In the first case, we delete a leaf node; a leaf node is a node that has no left or right child node. This case is simple as we simply remove the node from the tree and make its parent left or right fields (whichever holds) to NULL. In the figure below, deletion of node with value 200 falls in this case. In the second case, we delete a node that has only one child. In this case, we delete the node and make the parent's left or right fields (whichever holds) point to the child node of the current node. In the figure below, deletion of node with value 120 falls in this case. In the third case, we delete a leaf node that has both left and right children. In this case, we find the successor for the node, which would typically lie in the right sub-tree, delete the successor and place its value in the current node. Thus, one of the utilities of finding a successor node is when we try to delete a node that has both left and right nodes. In the figure below, deletion of node with value 88 falls in this case.

                  101                                101 
                 /  \                                /  \
                /    \        Deleting 200          /    \ 
               98    120       ---------->        98     120
              /        \                          /        \
             /          \                        /          \
            88          140                     88          140
           / \            \                    / \            
          /   \            \                  /   \           
         77   97          200                77   97         

                 Case 1: Deleting a leaf node (value 200) 

                  101                                101 
                 /  \                                /  \
                /    \        Deleting 120          /    \ 
               98    120       ---------->         98    140
              /        \                          /        \
             /          \                        /          \
            88          140                     88          200
           / \            \                    / \            
          /   \            \                  /   \           
         77   97          200                77   97         

                 Case 2: Deleting a node with only one child (value 120) 

                  101                                101 
                 /  \                                /  \
                /    \        Deleting 88           /    \ 
               98    120       ---------->         98    120
              /        \                          /        \
             /          \                        /          \
            88          140                     97          140
           / \            \                    /              \              
          /   \            \                  /                \           
         77   97          200                77                200         

                 Case 3: Deleting a node with both children (value 88) 

                             Figure: Deletion in a Binary Search Tree

Traversal

For a binary tree, traversal can be done in three ways: (a) in-order, (b) pre-order, and (c) post-order. In the "in-order" style, we start with the left node first, parent node second, and the right node last. In the "pre-order" style, we start with the parent node first, left node second, and the right node last. In the "post-order" style, we begin with the left node first, the right node second, and the parent node last. Thus, all of these styles are with-respect-to the parent. In the "in-order", the parent sits in the middle, in "pre-order", the parent sits on the left, and in "post-order", the parent sits on the right. Since we go through each node of the tree, the traversal cost O(N) in terms of time-complexity.

Advanced Implementation

Our implementation is added into three files: (a) a header file that publishes the data-structure of a binary search tree node and its APIs, (b) a C file that contains the implementation of the APIs published in the above header file, and (c) another C file that contains the main() routine and uses a binary search tree by invoking its APIs. Before we begin, these are the functions defined in the implementation of binary search tree.

 binary_search_init():        Initializes binary search tree structure 
 binary_search_tree_insert(): Adds a node to the tree
 binary_search_tree_search(): Searches for a node in the tree
 binary_search_tree_delete(): Finds the node that has the passed value and deletes it

BTW, if you are trying to learn the basics of Binary Search Trees and would like to see a simpler implementation, then you can find it here: Simple Implementation of Binary Search Tree.

Let us get started with the header file that we provide below. The provided header file has two data-structures and four APIs. Applications that need to use this implementation of binary search tree would need to include this in the file.

 #ifndef __BINARY_SEARCH_TREE_H___
 #define __BINARY_SEARCH_TREE_H___

 /* Binary Search Tree (bst) node */
 typedef struct bst_node {
     void   *value;
     struct bst_node *left;
     struct bst_node *right;
 } bst_node_t;

 /* Representation of the Binary Search tree */
 typedef struct bst_tree_ {
     bst_node_t *tree_root;
     int (*compare_func)(const void *, const void *);
 } bst_tree_t;

 void binary_search_tree_init(bst_tree_t *bst_tree, 
                 int (*compare_func)(const void *, const void *));

 bst_node_t* binary_search_tree_search(bst_tree_t *bst_tree,
                 bst_node_t *parent, void *value); 

 int binary_search_tree_insert(bst_tree_t *bst_tree, void *value);

 int binary_search_tree_delete(bst_tree_t *bst_tree, void *value);
 #endif /* __BINARY_SEARCH_TREE_H___ */

The "bst_node_t" data-structure holds pointers for the left child and the right child; if needed, applications can add a pointer for parent node. It also has a value member that holds the data for a tree node as "void *" since that allows applications to store any type of data. The next data-structure ("bst_tree_t") houses the root of the binary tree. In addition, it also holds a comparison function that allows users of this implementation to have their own comparison functions. The reason this is needed is because when bst_node holds the value as "void *", the implementation does not know what is being stored in the data-structure. This gives immense flexibility to the application since they can use this to store any type of data. We will see later on in this page, that it is pretty simple to write this comparison function. So, nothing to worry!

Our next file is the C file that provides the actual implementation of the above APIs. Let us provide the file below. We provide its description following that.

 #include <stdio.h>
 #include <stdlib.h>
 #include "binary_search_tree.h"

 void binary_search_tree_init (bst_tree_t *bst_tree, 
             int (*compare_func)(const void *, const void *)) {
     bst_tree->tree_root = NULL;
     bst_tree->compare_func = compare_func;
 }

 bst_node_t* binary_search_tree_search (bst_tree_t *bst_tree, 
                 bst_node_t *parent, void *value) {
     bst_node_t *found_node;
     if (!parent) {
         return NULL;
     }

     /* First, let us check the current node */
     if (bst_tree->compare_func(value, parent->value) == 0) { 
         return parent; 
     }

     /* Start with the left-side nodes of the current parent node */
     if (parent->left != NULL) {
         if (bst_tree->compare_func(value, parent->value) < 0) { 
             found_node = binary_search_tree_search(bst_tree, parent->left, value);
             if (found_node) {
                 return found_node; 
             }
         }
     }

     /* Next, we search the right-side nodes of the current parent node */
     if (parent->right != NULL) {
         if (bst_tree->compare_func(value, parent->value) > 0) { 
             found_node = binary_search_tree_search(bst_tree, parent->right, value);
             if (found_node != NULL) {
                 return found_node; 
             }
         }
     }
     return NULL;
 }

 bst_node_t* binary_search_tree_successor_on_right (bst_node_t *node) {
     while (node->left != NULL) {
         node = node->left;
     }   
     return node;
 }

 /* An internal routine that is not published to outside users */
 void binary_search_tree_insert_recursive (bst_tree_t *bst_tree, 
                 bst_node_t *parent, bst_node_t *new_node) {
     if (!parent || !new_node) {
          return; 
     }

     /* If the key is already present, then return */
     if (bst_tree->compare_func(new_node->value, parent->value) == 0) {
         return;
     }

     /* Start with the left pointer of the current node */
     if (bst_tree->compare_func(new_node->value, parent->value) < 0) {
         if (parent->left == NULL) {
             parent->left = new_node;
             return;
         } else {
             /* We need to insert it as a child of the left node */
             binary_search_tree_insert_recursive(bst_tree, parent->left, new_node);
             return;
         }
     }

     /* Next, the right pointer of the current node */
     if (bst_tree->compare_func(new_node->value, parent->value) > 0) {
         if (parent->right == NULL) {
             parent->right = new_node;
             return;
         } else {
             /* We need to insert it as a child of the left node */
             binary_search_tree_insert_recursive(bst_tree, parent->right, new_node);
             return;
         }
     }
     return;
 }

 int binary_search_tree_insert (bst_tree_t *bst_tree, void *value) {
     bst_node_t *new_node; 
     int inserted;

     new_node = (bst_node_t *)malloc(sizeof(bst_node_t));
     if (new_node == NULL) {
         printf("Malloc Failure \n");
         return -1;
     }
     new_node->value = value;
     new_node->left = NULL;
     new_node->right = NULL;

     if (bst_tree->tree_root == NULL) {
         /* Initialize the tree_root */
         bst_tree->tree_root = new_node;
         return 0;
     } 

     binary_search_tree_insert_recursive(bst_tree, bst_tree->tree_root, new_node); 
     return 0;
 }

 int binary_search_tree_delete (bst_tree_t *bst_tree, void *value) { 
     bst_node_t *node, *successor_node, *child_node, *parent_node;
     void *successor_value;

     node = parent_node = bst_tree->tree_root;
     while (node) {
         if (bst_tree->compare_func(value, node->value) < 0) {
             parent_node = node;
             node = node->left;
         } else if (bst_tree->compare_func(value, node->value) > 0) {
             parent_node = node;
             node = node->right;
         } else {
             break; /* Node is found */
         }
     }

     if (node) { 
         /* If the node has no children */
         if (!node->left && !node->right) {
             printf("Deleting a leaf node \n");
             if (parent_node) {
                 if (parent_node->left == node) { 
                     parent_node->left = NULL;
                 } else {
                     parent_node->right = NULL;
                 }
             } else { 
                 /* This must be the root */
                 bst_tree->tree_root = NULL;
             }
             free(node);
             return 0;
         } 

         /* If the node has only one child node */
         if ((node->left && !node->right)
             || (!node->left && node->right)) {
             printf("Deleting a node with only one child \n");

             if (node->left)  {
                 child_node = node->left;
             } else {
                 child_node = node->right;
             }

             if (parent_node) {
                 if (parent_node->left == node) { 
                     parent_node->left = child_node;
                 } else  {
                     parent_node->right = child_node;
                 }
             } else {
                 /* This must be the root */
                 bst_tree->tree_root = child_node;
             }
             free(node);
             return 0;
         }

         /* If the node has both children */
         if (node->left && node->right) {
             successor_node = binary_search_tree_successor_on_right(node->right);
             printf("Deleting a node with both children\n");
             if (successor_node) {
                 /* 
                  * Since the current node has right child, its successor 
                  * would certainly be found (this is food for thought!).
                  */
                 successor_value = successor_node->value;
                 binary_search_tree_delete(bst_tree, successor_node->value);
                 node->value = successor_value;
             }
             return 0;
         }
     } else {
         printf("Could not find the node to delete\n");
     }
     return -1;
 }

The above implementation starts with binary_search_tree_init() -- this API initializes the passed binary search tree root pointer to NULL and also sets the passed comparison function to the compare_func member of the bst_tree_t data-structure. We would need only one data-structure per binary search tree node. This step is important since during insertion, binary_search_tree_insert() checks if the root is NULL, then it makes the first node as the root of the tree.

The binary_search_tree_insert() inserts a node for the passed value -- its approach is to find the right spot in the tree, where the node can be inserted.

Next, binary_search_tree_delete() find a node that has the passed value and deletes it. Deletion of a node from a binary search tree is little tricky. As explained earlier, we need to handle 3 cases: (a) deletion of node with both left and right child, (b) deletion of node with one child, and (c) deletion of node with no children. For the case, where the node has both children, it relies on binary_search_tree_successor_on_right() to replace the node being deleted.

The next implementation is that of the binary_search_tree_search(), which allows us to search if we have a node in the tree that has the key which is same as the one passed in the "value" parameter. Since we use the binary search tree for searching elements, this is actually the bread and butter API for this data-structure.

In the end, we bring all of the above APIs together by triggering them using a main() function. The function calls binary search tree APIs that are defined in the "binary_search_tree.h" header file and uses them for initialization, insertion, deletion, search, and finding depth. In terms of input, it uses an array to get the list of elements that it needs to add to the tree.

 #include <stdio.h>
 #include <stdlib.h>
 #include "binary_search_tree.h"

 int bst_node_compare (const void *value1, const void *value2) {
     if ( *(int *)value1 < *(int *)value2 ) {
         return -1;
     } else if ( *(int *)value1 > *(int *)value2 ) {
         return 1;
     } else {
         return 0;
     }
 }

 void print_binary_search_tree_for_me (bst_node_t *node, bst_node_t *parent) {
     if (node == NULL) {
         return;
     }   
     print_binary_search_tree_for_me(node->left, node);
     if (parent) {
         printf("%d(p: %d) ", *(int *)node->value, *(int *)parent->value);
     } else {
         printf(" %d(Root) ", *(int *)node->value);
     }   
     print_binary_search_tree_for_me(node->right, node);
 }

 int main (int argc, char *argv[]) {
     bst_tree_t bst_tree;
     bst_node_t *node_found; 
     int input_array[] = {680, 101, 237, 980, 880};
     int len = sizeof(input_array)/sizeof(input_array[0]);
     int ret_val, i;

     binary_search_tree_init(&bst_tree, bst_node_compare);

     for (i = 0; i < len; i++) {
         ret_val = binary_search_tree_insert(&bst_tree, &input_array[i]);
         if (ret_val != 0) {
             printf("Not good! The last insert failed \n");
             return -1;
         }
     }
     print_binary_search_tree_for_me(bst_tree.tree_root, NULL);

     node_found = binary_search_tree_search(&bst_tree, bst_tree.tree_root, &input_array[1]);
     if (node_found) { 
         printf("\nNode found with value %d\n", input_array[1]);
     } else {
         printf("\nNode not found with value %d\n", input_array[1]);
     }

     ret_val = binary_search_tree_delete(&bst_tree, &input_array[1]);
     if (ret_val == -1) {
         printf("Not good! The last deletion failed \n");
         return -1;
     }
     print_binary_search_tree_for_me(bst_tree.tree_root, NULL);

     node_found = binary_search_tree_search(&bst_tree, bst_tree.tree_root, &input_array[1]);
     if (node_found) { 
         printf("\nNode found with value %d\n", input_array[1]);
     } else {
         printf("\nNode not found with value %d\n", input_array[1]);
     }
     return 0;
 }

The above main file starts with malloc of a bst_tree structure, which represents the binary search tree. Next it calls binary_search_tree_init() and passes bst_node_compare() as the comparison function. This function would be used for all the comparison operations for this tree. This function returns -1 when the first value is lower than the second one, +1 when the first value is greater than the second one, and 0 when they are both equal. Pretty simple!!

The above file also contains print_binary_search_tree_for_me() that does an "in-order" walk of the print and prints its elements. Since the implementation of the binary search tree ("binary_search_tree.c") does not know the inner content of the node's data (aka the value member), it cannot print the element appropriately. That is why we have kept the print function with the application (in this case, the "main_binary_search_tree.c" file). The print function prints each node and for each node, it also prints its parent. The root node would have no parent and so the print routine simply appends the "Root" text to the root node.

In the end, we compile the above C files by passing both of them to the gcc command ("main_binary_search_tree.c" and "binary_search_tree.c"). We need to pass both the C files since the API implementation is contained in the "binary_search_tree.c". Failure to include this file would irk gcc and it would complain that the symbols (APIs in the header file) are undefined.

 user@codingbison $ gcc main_binary_search_tree.c binary_search_tree.c -o bst
 user@codingbison $ ./bst
 101(p: 680) 237(p: 101)  680(Root) 880(p: 980) 980(p: 680) 
 Node found with value 101
 Deleting a node with only one child 
 237(p: 680)  680(Root) 880(p: 980) 980(p: 680) 
 Node not found with value 101
 user@codingbison $

In the above output, we attempt to insert elements {680, 101, 237, 980, 880} and then print the binary search tree. From the output, node with value 680 sits at the root of the tree (its parent is NULL). All nodes are (direct/indirect) child nodes of this node.

As good as a binary search tree is, it suffer from one major drawback -- it tends to get unbalanced. If we start with a root that is an "average" value among the data set, then as elements get added, they can continue to sit on the either side of the node. However, if we were to start with the extreme case of elements that are already sorted, then this might pose a problem. As an extreme case, we can end with all elements added along the same line (path), much like a linked list! If we were to do a search for the last element (the largest, let us say) in this unbalanced tree, we would have to search O(N) times instead of the expected O(logN) times. It is for this reason that we need AVL trees, where the tree is periodically balanced (when needed) to ensure that we stay close to O(logN) value instead of O(N). Further, if we were to create a binary search tree using a set of existing data, it might be a good idea to make the data set random instead of keeping it sorted!