CodingBison - Minimum Spanning Tree: Kruskal's Algorithm

Things To Remember

Kruskal's algorithm works even when the input graph is partitioned -- it builds one spanning tree per partition. By contrast, Prim's algorithm builds spanning tree only for the partition that contains the source vertex.

Kruskal's algorithm is a greedy algorithm that builds a minimum spanning tree by adding edges, one at a time. It works by selecting an edge that connects two partitions. At the start, each node is kept into its own set (or partition). At each step, we select an edge that connects two sets. This is different than Prim's algorithm, where we add vertices till we have built the minimum spanning tree. Kruskal's algorithm uses neither bread-first search nor depth-first search.

Compared to Prim's algorithm, Kruskal's algorithm has a few extra steps. The reason why we need these extra steps is that with Kruskal's algorithm, we can have the input graph partitioned. With Prim's algorithm, the input graph needs to be connected else the output minimum spanning tree would contain only the partition that houses the source vertex.

So, if we were to start with the following graph, then Kruskal's algorithm would build two trees. The first would contain the following edges: 680-880, 237-880, 101-237, and 280-101. The second one would contain only one edge: 15-215.

Figure: An undirected partitioned Graph with Weights

While Kruskal's algorithm might have a few additional steps, but one of the simplest things about this algorithm is that it adds all of the graph edges into one (min) heap and keeps extracting the edges from this heap, one by one. The crux of this algorithm is that at every step, it tries to add an edge such that its vertices sit in different sets (partitions). If we do not do this, then we would end up with loops since we would add edges even though their vertices belong to the same partition!

Let us see Kruskal's algorithm in action. For that, we take the above graph, we would need to start by adding all the edges to the heap and keep all of the vertices in their own set. To mimic each set, we place them in a queue and to mimic all all sets, we keep an array of queues.

Figure: Step 1: Add all edges to a Heap and assign each vertex to its own set

With the heap and the set of sets ready to go, we are all set to proceed further. For that at each step, we extract the edge from the heap; we show the selected edge in blue. Next, we look at the set numbers (representing a partition) of the two vertices. At each set number, we have a queue. Once we have two sets, we use a simplified version of union to merge the two sets. Thus, at each step, we would see that the vertices are being moved from one set to another. We keep doing this till we are left with one set for each partition. Since our example has two partitions, the final solution has two sets. Here are the steps.

Figure: Rest Steps: Add edges one-by-one to the Kruskal's Forest

Implementation

Our implementation uses three data-structures: Adjacency List, Heap, and Queue. We use adjacency list to build the graph. We use heap to add graph-edges for kruskal's algorithm to work. Lastly, we use queue to mimic a simple set operation since Kruskal's algorithm works by merging edges belonging to different sets. For these structures, we reuse our implementation provided in our Data-Structure module. Reusing these structures keeps the code simple. We recommend the reader to go through these data-structures first.

The implementation of Adjacency List, Heap, and Queue are provided in terms of both header file and an implementation file. With our Kruskal's algorithm, we simply add the header files ("adjacency_list.h", "heap.h", and "queue.h") so that we can access their APIs. Speaking of APIs, we use the following APIs from adjacency list: adjlist_add_vertex() to add a vertex to the adjacency list, adjlist_add_edge() to add an edge to the adjacency list, adjlist_free() to free the adjacency list, and adjlist_print() to print adjacency list. For heap, we use the following APIs: heap_init() to initialize the heap, heap_add() to add edges to the heap, and heap_extract() to pop the edge with the smallest weight; we pass prim_heap_compare() to the heap_init() API so that we can use prim_heap_compare() to compare two of the Prim's edge nodes. For queue, we use the following APIs: queue_init() to initialize the queue, queue_enqueue() to add an element to the queue, queue_dequeue() to pop an element from the queue, and queue_get_size() to get the size of the queue.

Before we go any further, let us provide the implementation. This implementation (provided below) uses a new data-structure, kruskal_edge_nodes that holds two vertex nodes and its weight. In addition, it also has a handful of global variables.

 #include <stdio.h>
 #include <stdlib.h>
 #include <stdbool.h>
 #include <math.h>
 #include "adjacency_list.h"
 #include "heap.h"
 #include "queue.h"

 /* Represents an edge (has both sides, represented as vnode) and the edge weight. */
 typedef struct kruskal_edge_nodes_ {
      vertex_node *vnodeA; 
      vertex_node *vnodeB; 
      int weight;          /* Weight of the link. */
 } kruskal_edge_nodes;

 queue *q_arr; /* An array of queues, for doing the union */

 int kruskal_heap_compare (const void *val1, const void *val2) {
     kruskal_edge_nodes *node1 = (kruskal_edge_nodes *)val1;
     kruskal_edge_nodes *node2 = (kruskal_edge_nodes *)val2;

     if (node1->weight < node2->weight) {
         return -1;
     } else if (node1->weight > node2->weight) { 
         return 1;
     } else {
         return 0;
     }
 }

 void kruskal_print_all_queues (int len) {
     queue_elem *elem = NULL; 
     queue *sq = NULL;
     int i;

     for (i = 0; i < len; i++) {
         sq = &q_arr[i];
         printf("Printing Queue (size: %d, index: %d): ", 
                     queue_get_size(sq), i);
         if (sq->size == 0) {
             printf("Empty\n");
             continue;
         }
         for (elem = sq->head; elem != NULL; elem = elem->next) { 
             printf("%d  ", *(int *)(elem->data));
         }
         printf("\n");
     }
 }

 bool kruskal_check_if_edge_exists (heap_t *heap, int val1, int val2) {
     kruskal_edge_nodes *node;
     int counter = 0;

     if (heap->heap_current_size == 0)  {
         return false;
     }
     for (counter = 0; counter < heap->heap_current_size; counter++) {
         node = (kruskal_edge_nodes *)heap->heap_array[counter];
         if ((node->vnodeA->val == val1) && (node->vnodeB->val == val2)) {
             return true;
         }
         if ((node->vnodeA->val == val2) && (node->vnodeB->val == val1)) {
             return true;
         }
     }
     return false;
 }

 void kruskal_heap_print (heap_t *heap) {
     kruskal_edge_nodes *node, *parent_node;
     int counter = 0;

     printf("Printing the Heap: \n");
     if ((heap == NULL) || (heap->heap_current_size == 0)) {
         printf("Heap is Empty\n\n");
         return;
     }
     for (counter = 0; counter < heap->heap_current_size; counter++) {
         node = (kruskal_edge_nodes *)heap->heap_array[counter];
         parent_node = (kruskal_edge_nodes *)(heap->heap_array[(int) floor((double)(counter-1)/2)]);
         if (counter == 0) {
             printf("\t[i: %d] %2d-%2d (Weight: %2d Parent: NULL)\n", 
                counter, node->vnodeA->val, node->vnodeB->val, node->weight);
         } else {
             printf("\t[i: %d] %2d-%2d (Weight: %2d Parent: %d-%d)\n", 
                counter, 
                node->vnodeA->val, node->vnodeB->val, node->weight,
                parent_node->vnodeA->val, parent_node->vnodeB->val);
         }
     }
     printf("\n");
 }

 void kruskal_add_all_edges_to_heap (vertex_node *graph_root, heap_t *heap) {
     vertex_node *vnode; 
     edge_node *enode; 
     kruskal_edge_nodes *both_enodes; 

     for (vnode = graph_root; vnode != NULL; vnode= vnode->next_vnode) { 
         for (enode = vnode->list_enode; enode != NULL; enode = enode->next_enode) {
             if (kruskal_check_if_edge_exists(heap, vnode->val, 
                 ((vertex_node *)enode->parent_vnode)->val)) {
                 /* This has already been added */
                 continue;
             }
             both_enodes = (kruskal_edge_nodes *)malloc(sizeof(kruskal_edge_nodes));
             if (!both_enodes) {
                 return;
             }
             both_enodes->vnodeA = vnode;
             both_enodes->vnodeB = enode->parent_vnode;
             both_enodes->weight = enode->weight;
             heap_add(heap, (void *)both_enodes);
         }
     }
 }

 void kruskal_make_set(int set_index, vertex_node * vnode) {
     queue_init(&q_arr[set_index]);
     vnode->misc_index = set_index;
     queue_enqueue(&q_arr[set_index], vnode);
 }

 void kruskal_do_union (kruskal_edge_nodes *both_enodes, int len) {
     vertex_node *vnode_dequeued;
     int q_indx_enqueue, q_indx_dequeue, err;
     int index1, index2, size1, size2;

     index1 = both_enodes->vnodeA->misc_index;
     index2 = both_enodes->vnodeB->misc_index;
     size1 = queue_get_size(&q_arr[index1]);
     size2 = queue_get_size(&q_arr[index2]);

     /* Add elements from the smaller queue to the bigger queue */
     q_indx_enqueue = (size1 >= size2) ? index1 : index2;
     q_indx_dequeue = (size1 >= size2) ? index2 : index1;
     printf("q_indx_enqueue: %d and q_indx_dequeue: %d\n", 
         q_indx_enqueue, q_indx_dequeue);
     while (1) {
         vnode_dequeued = (vertex_node *) queue_dequeue(&q_arr[q_indx_dequeue], &err);
         if (err != QUEUE_ERRNO_SUCCESS) {
             break;
         } 
         vnode_dequeued->misc_index = q_indx_enqueue;
         queue_enqueue(&q_arr[q_indx_enqueue], (void *)vnode_dequeued);
     }
     kruskal_print_all_queues(len);
 }

 void kruskal_run (vertex_node *graph_root, heap_t *heap, int len) {
     kruskal_edge_nodes *both_enodes;
     vertex_node *vnode, *node_higher_set_num, *node_lower_set_num; 
     int err, i;

     for (i = 0, vnode = graph_root; vnode != NULL; vnode = vnode->next_vnode, i++) {
         kruskal_make_set(i, (void *)vnode);
     }
     kruskal_print_all_queues(len);

     /* Add all of these edges (unique) to the heap. */
     kruskal_add_all_edges_to_heap(graph_root, heap); 

     while (heap->heap_current_size) {
         /* Print the Heap */
         kruskal_heap_print(heap); 

         /* Extract the Minimum */
         both_enodes = (kruskal_edge_nodes *)heap_extract(heap, &err); 
         printf("[%s]After extraction: %d-%d (Weight: %d)\n", 
             __FUNCTION__,
             both_enodes->vnodeA->val, both_enodes->vnodeB->val, both_enodes->weight);

         /* If two vertices are not in the same set, then process them. */
         if ((both_enodes->vnodeA) && 
             (both_enodes->vnodeB) && 
             (both_enodes->vnodeA->misc_index != both_enodes->vnodeB->misc_index) ) { 

             printf("[%s]Kruskal's Result: %d-%d (Weight: %d)\n", 
                 __FUNCTION__, both_enodes->vnodeA->val, both_enodes->vnodeB->val, 
                 both_enodes->weight); 
             kruskal_do_union(both_enodes, len);
         } else {
             printf("[%s]Ignore this edge for Kruskal's Result: %d-%d (Weight: %d)\n", 
                 __FUNCTION__, both_enodes->vnodeA->val, both_enodes->vnodeB->val, 
                 both_enodes->weight); 
         }
         free(both_enodes);
     }
 }

 int main () {
     vertex_node *graph_root = NULL; /* Root of the graph */
     heap_t heap; 
     int vertices[] = {101, 237, 680, 280, 880, 15, 215};
     int edges[][3] = {{101, 680, 12}, {101, 237, 10}, {880, 680, 2}, 
                       {101, 280, 8}, {237, 880, 3}, {15, 215, 6}};
     int len_vertices, len_edges, i; 

     /* Initialize the heap. */
     heap_init(&heap, kruskal_heap_compare);

     /* Add the vertices. */
     len_vertices = sizeof(vertices)/sizeof(vertices[0]);
     for (i = 0; i < len_vertices; i++) {
         adjlist_add_vertex(&graph_root, vertices[i], NULL); 
     }

     /* Add the edges. */
     len_edges = sizeof(edges)/sizeof(edges[0]);
     for (i = 0; i < len_edges; i++) {
         adjlist_add_edge(graph_root, edges[i][0], edges[i][1], edges[i][2]);
     }
     adjlist_print(graph_root);

     /* Malloc the queue array. Throw an error, if malloc fails. */
     q_arr = (queue *) malloc(sizeof(queue) * len_vertices);
     if (!q_arr) {
         return -1;
     }

     /* Run the algorithm. */
     kruskal_run(graph_root, &heap, len_vertices);
     printf("Kruskal is all done.\n");

     /* Done with the Adjacency List and queue-array. Free them */
     adjlist_free(&graph_root);
     free(q_arr);
 }

The main function starts by initializing the heap and all the queues sitting in each set. Next, it builds the adjacency list graph by calling its APIs: adjlist_add_vertex() to add vertices from the vertices input array and adjlist_add_edge() to add edges from the edges input array. We have chosen elements in the vertices and edges such that they form the same graph as is shown at the beginning of the page -- a graph with two partitions. Once these three data-structures are ready, it calls kruskal_run() to execute the Kruskal's algorithm.

For adding elements to sets and for doing union of set, we use a simpler approach and use a queue. For each node, we use the misc_index field and store the index in the global q_arr. Thus, if a node belongs to the set 0, then its misc_index would be 0 and that would mean that it is enqueued to the queue sitting in the index 0 of the q_arr. When doing the union, we move the objects from one queue and add it to another. The kruskal_run() algorithm starts by assigning a unique set_index to each vertex and then adding them to their set.

The above method of doing union is not a very efficient approach since at each union, we have to dequeue all of the elements from the shorter queue and enqueue them to the longer queue. We need to do this since we need to re-add all the dequeued elements to the new set and for doing that we update the misc_index field of the vertex to the new queue index. You might be tempted to think that when doing the union, why not take the head of the one queue and make it as the tail of the other queue. This would not work since we would still need to update the misc_index of each vertex from the first queue to point to the new index. Since we want to keep the implementation simple, we chose a simpler implementation of set. There are implementations of set that run faster -- we recommend the reader to give that a try! The reason why we have do the make-set and union is because we would like to avoid having loops in the minimum spanning tree and that can happen because when we add edges to the heap.

Kruskal's algorithm does the simplest thing -- merely add all the edges to the heap (kruskal_add_all_edges_to_heap()). We add kruskal_check_if_edge_exists() to ensure that we do not add the same edge twice. Without this, we would end up adding the same edge twice in the heap since the adjacency list representation maintains the same edge twice.

Once we have the heap, we start extracting the edges one by one. Once extracted, we check if the two vertices of the edge belong to the same set or not. If they do, then we ignore it since they already belong to the same set and adding them to the existing tree would mean a loop. On the other hand, if both the vertices belong to different sets, then we call kruskal_do_union() to merge all elements of the two sets. Once we have extracted all edges from the heap, we are done with the algorithm and the selected edges would form the minimum spanning tree.

Now that we have understood the above code, let us go ahead and compile it. For that we pass all the four files ("kruskals.c", "adjacency_list.c", "heap.c", and "queue.c") to gcc and get the executable (in this case, kruskals). From the output, we can see that we keep extracting edges and then as we do the union, vertices get moved from one queue to another. In the end, we are left with two sets, each identifying the two partitions present in the graph.

 user@codingbison $ gcc kruskals.c adjacency_list.c heap.c queue.c -lm - o kruskals
 user@codingbison $ ./kruskals
 [adjlist_print] Print the Adjacency List
 	Vertex [101][Color: 0]: -- Edge [680 (12)] -- Edge [237 (10)] -- Edge [280 (8)]
 	Vertex [237][Color: 0]: -- Edge [101 (10)] -- Edge [880 (3)]
 	Vertex [680][Color: 0]: -- Edge [101 (12)] -- Edge [880 (2)]
 	Vertex [280][Color: 0]: -- Edge [101 (8)]
 	Vertex [880][Color: 0]: -- Edge [680 (2)] -- Edge [237 (3)]
 	Vertex [ 15][Color: 0]: -- Edge [215 (6)]
 	Vertex [215][Color: 0]: -- Edge [ 15 (6)]
 Printing Queue (size: 1, index: 0): 101  
 Printing Queue (size: 1, index: 1): 237  
 Printing Queue (size: 1, index: 2): 680  
 Printing Queue (size: 1, index: 3): 280  
 Printing Queue (size: 1, index: 4): 880  
 Printing Queue (size: 1, index: 5): 15  
 Printing Queue (size: 1, index: 6): 215  
 Printing the Heap: 
 	[i: 0] 680-880 (Weight:  2 Parent: NULL)
 	[i: 1] 237-880 (Weight:  3 Parent: 680-880)
 	[i: 2] 15-215 (Weight:  6 Parent: 680-880)
 	[i: 3] 101-237 (Weight: 10 Parent: 237-880)
 	[i: 4] 101-680 (Weight: 12 Parent: 237-880)
 	[i: 5] 101-280 (Weight:  8 Parent: 15-215)

 [kruskal_run]After extraction: 680-880 (Weight: 2)
 [kruskal_run]Kruskal's Result: 680-880 (Weight: 2)
 q_indx_enqueue: 2 and q_indx_dequeue: 4
 Printing Queue (size: 1, index: 0): 101  
 Printing Queue (size: 1, index: 1): 237  
 Printing Queue (size: 2, index: 2): 680  880  
 Printing Queue (size: 1, index: 3): 280  
 Printing Queue (size: 0, index: 4): Empty
 Printing Queue (size: 1, index: 5): 15  
 Printing Queue (size: 1, index: 6): 215  
 Printing the Heap: 
 	[i: 0] 237-880 (Weight:  3 Parent: NULL)
 	[i: 1] 101-280 (Weight:  8 Parent: 237-880)
 	[i: 2] 15-215 (Weight:  6 Parent: 237-880)
 	[i: 3] 101-237 (Weight: 10 Parent: 101-280)
 	[i: 4] 101-680 (Weight: 12 Parent: 101-280)

 [kruskal_run]After extraction: 237-880 (Weight: 3)
 [kruskal_run]Kruskal's Result: 237-880 (Weight: 3)
 q_indx_enqueue: 2 and q_indx_dequeue: 1
 Printing Queue (size: 1, index: 0): 101  
 Printing Queue (size: 0, index: 1): Empty
 Printing Queue (size: 3, index: 2): 680  880  237  
 Printing Queue (size: 1, index: 3): 280  
 Printing Queue (size: 0, index: 4): Empty
 Printing Queue (size: 1, index: 5): 15  
 Printing Queue (size: 1, index: 6): 215  
 Printing the Heap: 
 	[i: 0] 15-215 (Weight:  6 Parent: NULL)
 	[i: 1] 101-280 (Weight:  8 Parent: 15-215)
 	[i: 2] 101-680 (Weight: 12 Parent: 15-215)
 	[i: 3] 101-237 (Weight: 10 Parent: 101-280)

 [kruskal_run]After extraction: 15-215 (Weight: 6)
 [kruskal_run]Kruskal's Result: 15-215 (Weight: 6)
 q_indx_enqueue: 5 and q_indx_dequeue: 6
 Printing Queue (size: 1, index: 0): 101  
 Printing Queue (size: 0, index: 1): Empty
 Printing Queue (size: 3, index: 2): 680  880  237  
 Printing Queue (size: 1, index: 3): 280  
 Printing Queue (size: 0, index: 4): Empty
 Printing Queue (size: 2, index: 5): 15  215  
 Printing Queue (size: 0, index: 6): Empty
 Printing the Heap: 
 	[i: 0] 101-280 (Weight:  8 Parent: NULL)
 	[i: 1] 101-237 (Weight: 10 Parent: 101-280)
 	[i: 2] 101-680 (Weight: 12 Parent: 101-280)

 [kruskal_run]After extraction: 101-280 (Weight: 8)
 [kruskal_run]Kruskal's Result: 101-280 (Weight: 8)
 q_indx_enqueue: 0 and q_indx_dequeue: 3
 Printing Queue (size: 2, index: 0): 101  280  
 Printing Queue (size: 0, index: 1): Empty
 Printing Queue (size: 3, index: 2): 680  880  237  
 Printing Queue (size: 0, index: 3): Empty
 Printing Queue (size: 0, index: 4): Empty
 Printing Queue (size: 2, index: 5): 15  215  
 Printing Queue (size: 0, index: 6): Empty
 Printing the Heap: 
 	[i: 0] 101-237 (Weight: 10 Parent: NULL)
 	[i: 1] 101-680 (Weight: 12 Parent: 101-237)

 [kruskal_run]After extraction: 101-237 (Weight: 10)
 [kruskal_run]Kruskal's Result: 101-237 (Weight: 10)
 q_indx_enqueue: 2 and q_indx_dequeue: 0
 Printing Queue (size: 0, index: 0): Empty
 Printing Queue (size: 0, index: 1): Empty
 Printing Queue (size: 5, index: 2): 680  880  237  101  280  
 Printing Queue (size: 0, index: 3): Empty
 Printing Queue (size: 0, index: 4): Empty
 Printing Queue (size: 2, index: 5): 15  215  
 Printing Queue (size: 0, index: 6): Empty
 Printing the Heap: 
 	[i: 0] 101-680 (Weight: 12 Parent: NULL)

 [kruskal_run]After extraction: 101-680 (Weight: 12)
 [kruskal_run]Ignore this edge for Kruskal's Result: 101-680 (Weight: 12)
 Kruskal is all done.