CodingBison

Things To Remember
Kruskal's algorithm works even when the input graph is partitioned (disconnected): it builds one spanning tree per partition. By contrast, Prim's algorithm builds a spanning tree only for the partition that contains the source vertex.

Kruskal's algorithm is a greedy algorithm that builds a minimum spanning tree by adding edges, one at a time. At the start, each vertex is placed in its own set (or partition). At each step, we select the lowest-weight remaining edge that connects two different sets and merge those sets. This differs from Prim's algorithm, where we add vertices until we have built the minimum spanning tree. Kruskal's algorithm uses neither breadth-first search nor depth-first search.

Compared to Prim's algorithm, Kruskal's algorithm has a few extra steps. These steps are needed because Kruskal's algorithm also handles a partitioned input graph. With Prim's algorithm, the input graph needs to be connected; otherwise, the output minimum spanning tree would cover only the partition that houses the source vertex.

So, if we were to start with the following graph, then Kruskal's algorithm would build two trees. The first would contain the following edges: 680-880, 237-880, 101-237, and 280-101. The second one would contain only one edge: 15-215.

Figure: An undirected partitioned Graph with Weights

Although Kruskal's algorithm has a few additional steps, its core is simple: it adds all of the graph edges to one (min) heap and keeps extracting edges from this heap, one by one. The crux of the algorithm is that, at every step, it accepts an edge only if its two vertices sit in different sets (partitions). Without this check, we would end up with loops, since we would add edges even when both vertices already belong to the same partition!

Let us see Kruskal's algorithm in action on the above graph. We start by adding all the edges to the heap and placing each vertex in its own set. To mimic a set, we use a queue, and to mimic the collection of all sets, we keep an array of queues.

Figure: Step 1: Add all edges to a Heap and assign each vertex to its own set

With the heap and the set of sets ready, we can proceed. At each step, we extract the minimum-weight edge from the heap; we show the selected edge in blue. Next, we look at the set numbers (each representing a partition) of its two vertices; each set number indexes a queue. If the set numbers differ, we use a simplified version of union to merge the two sets. Thus, at each step, we see vertices being moved from one set to another. We keep doing this until the heap is empty, at which point we are left with one set for each partition. Since our example has two partitions, the final solution has two sets. Here are the steps.

Figure: Remaining Steps: Add edges one-by-one to Kruskal's Forest

Implementation

Our implementation uses three data structures: an adjacency list, a heap, and a queue. We use the adjacency list to build the graph. We use the heap to hold the graph edges that Kruskal's algorithm processes. Lastly, we use queues to mimic a simple set operation, since Kruskal's algorithm works by merging vertices belonging to different sets. For these structures, we reuse the implementations provided in our Data-Structure module. Reusing them keeps the code simple. We recommend that the reader go through these data structures first.

Before we go any further, let us provide the implementation. This implementation (provided below) uses a new data structure, kruskal_edge_nodes, that holds the two vertex nodes of an edge and the edge's weight. In addition, it has one global variable, q_arr, the array of queues.

```
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <math.h>
#include "heap.h"
#include "queue.h"

/* Represents an edge (has both sides, represented as vnode) and the edge weight. */
typedef struct kruskal_edge_nodes_ {
    vertex_node *vnodeA;
    vertex_node *vnodeB;
    int weight;          /* Weight of the link. */
} kruskal_edge_nodes;

queue *q_arr; /* An array of queues, for doing the union */

int kruskal_heap_compare (const void *val1, const void *val2) {
    kruskal_edge_nodes *node1 = (kruskal_edge_nodes *)val1;
    kruskal_edge_nodes *node2 = (kruskal_edge_nodes *)val2;

    if (node1->weight < node2->weight) {
        return -1;
    } else if (node1->weight > node2->weight) {
        return 1;
    } else {
        return 0;
    }
}

void kruskal_print_all_queues (int len) {
    queue_elem *elem = NULL;
    queue *sq = NULL;
    int i;

    for (i = 0; i < len; i++) {
        sq = &q_arr[i];
        printf("Printing Queue (size: %d, index: %d): ",
               queue_get_size(sq), i);
        if (sq->size == 0) {
            printf("Empty\n");
            continue;
        }
        /* Each queue element holds a vertex_node pointer; print its value. */
        for (elem = sq->head; elem != NULL; elem = elem->next) {
            printf("%d  ", ((vertex_node *)elem->data)->val);
        }
        printf("\n");
    }
}

bool kruskal_check_if_edge_exists (heap_t *heap, int val1, int val2) {
    kruskal_edge_nodes *node;
    int counter = 0;

    if (heap->heap_current_size == 0)  {
        return false;
    }
    /* Linear scan: an undirected edge matches in either direction. */
    for (counter = 0; counter < heap->heap_current_size; counter++) {
        node = (kruskal_edge_nodes *)heap->heap_array[counter];
        if ((node->vnodeA->val == val1) && (node->vnodeB->val == val2)) {
            return true;
        }
        if ((node->vnodeA->val == val2) && (node->vnodeB->val == val1)) {
            return true;
        }
    }
    return false;
}

void kruskal_heap_print (heap_t *heap) {
    kruskal_edge_nodes *node, *parent_node;
    int counter = 0;

    printf("Printing the Heap: \n");
    if ((heap == NULL) || (heap->heap_current_size == 0)) {
        printf("Heap is Empty\n\n");
        return;
    }
    for (counter = 0; counter < heap->heap_current_size; counter++) {
        node = (kruskal_edge_nodes *)heap->heap_array[counter];
        if (counter == 0) {
            /* The root has no parent, so we must not read heap_array[-1]. */
            printf("\t[i: %d] %2d-%2d (Weight: %2d Parent: NULL)\n",
                   counter, node->vnodeA->val, node->vnodeB->val, node->weight);
        } else {
            /* The parent of the element at index i sits at index (i - 1) / 2. */
            parent_node = (kruskal_edge_nodes *)heap->heap_array[(counter - 1) / 2];
            printf("\t[i: %d] %2d-%2d (Weight: %2d Parent: %d-%d)\n",
                   counter,
                   node->vnodeA->val, node->vnodeB->val, node->weight,
                   parent_node->vnodeA->val, parent_node->vnodeB->val);
        }
    }
    printf("\n");
}

void kruskal_add_all_edges_to_heap (vertex_node *graph_root, heap_t *heap) {
    vertex_node *vnode;
    edge_node *enode;
    kruskal_edge_nodes *both_enodes;

    for (vnode = graph_root; vnode != NULL; vnode= vnode->next_vnode) {
        for (enode = vnode->list_enode; enode != NULL; enode = enode->next_enode) {
            /* The adjacency list stores each edge twice; skip the second copy. */
            if (kruskal_check_if_edge_exists(heap, vnode->val,
                        ((vertex_node *)enode->parent_vnode)->val)) {
                continue;
            }
            both_enodes = (kruskal_edge_nodes *)malloc(sizeof(kruskal_edge_nodes));
            if (!both_enodes) {
                return;
            }
            both_enodes->vnodeA = vnode;
            both_enodes->vnodeB = enode->parent_vnode;
            both_enodes->weight = enode->weight;
            /* The call that inserts both_enodes into the heap is missing from
             * this listing; add it here using the insert API from heap.h. */
        }
    }
}

void kruskal_make_set(int set_index, vertex_node * vnode) {
    /* Start a fresh set at set_index and make vnode its only member. */
    queue_init(&q_arr[set_index]);
    vnode->misc_index = set_index;
    queue_enqueue(&q_arr[set_index], vnode);
}

void kruskal_do_union (kruskal_edge_nodes *both_enodes, int len) {
    vertex_node *vnode_dequeued;
    int q_indx_enqueue, q_indx_dequeue, err;
    int index1, index2, size1, size2;

    index1 = both_enodes->vnodeA->misc_index;
    index2 = both_enodes->vnodeB->misc_index;
    size1 = queue_get_size(&q_arr[index1]);
    size2 = queue_get_size(&q_arr[index2]);

    /* Add elements from the smaller queue to the bigger queue */
    q_indx_enqueue = (size1 >= size2) ? index1 : index2;
    q_indx_dequeue = (size1 >= size2) ? index2 : index1;
    printf("q_indx_enqueue: %d and q_indx_dequeue: %d\n",
           q_indx_enqueue, q_indx_dequeue);
    while (1) {
        vnode_dequeued = (vertex_node *) queue_dequeue(&q_arr[q_indx_dequeue], &err);
        if (err != QUEUE_ERRNO_SUCCESS) {
            break;
        }
        /* Every moved vertex must point at its new set. */
        vnode_dequeued->misc_index = q_indx_enqueue;
        queue_enqueue(&q_arr[q_indx_enqueue], (void *)vnode_dequeued);
    }
    kruskal_print_all_queues(len);
}

void kruskal_run (vertex_node *graph_root, heap_t *heap, int len) {
    kruskal_edge_nodes *both_enodes;
    vertex_node *vnode;
    int err, i;

    /* Place every vertex in its own set. */
    for (i = 0, vnode = graph_root; vnode != NULL; vnode = vnode->next_vnode, i++) {
        kruskal_make_set(i, vnode);
    }
    kruskal_print_all_queues(len);

    /* Add all of these edges (unique) to the heap. */
    kruskal_add_all_edges_to_heap(graph_root, heap);

    while (heap->heap_current_size) {
        /* Print the Heap */
        kruskal_heap_print(heap);

        /* Extract the Minimum */
        both_enodes = (kruskal_edge_nodes *)heap_extract(heap, &err);
        if (both_enodes == NULL) {
            break;
        }
        printf("[%s]After extraction: %d-%d (Weight: %d)\n",
               __FUNCTION__,
               both_enodes->vnodeA->val, both_enodes->vnodeB->val, both_enodes->weight);

        /* If two vertices are not in the same set, then process them. */
        if ((both_enodes->vnodeA) &&
            (both_enodes->vnodeB) &&
            (both_enodes->vnodeA->misc_index != both_enodes->vnodeB->misc_index) ) {

            printf("[%s]Kruskal's Result: %d-%d (Weight: %d)\n",
                   __FUNCTION__, both_enodes->vnodeA->val, both_enodes->vnodeB->val,
                   both_enodes->weight);
            kruskal_do_union(both_enodes, len);
        } else {
            printf("[%s]Ignore this edge for Kruskal's Result: %d-%d (Weight: %d)\n",
                   __FUNCTION__, both_enodes->vnodeA->val, both_enodes->vnodeB->val,
                   both_enodes->weight);
        }
        free(both_enodes);
    }
}

int main (void) {
    vertex_node *graph_root = NULL; /* Root of the graph */
    heap_t heap;
    int vertices[] = {101, 237, 680, 280, 880, 15, 215};
    int edges[][3] = {{101, 680, 12}, {101, 237, 10}, {880, 680, 2},
                      {101, 280, 8}, {237, 880, 3}, {15, 215, 6}};
    int len_vertices, len_edges, i;

    /* Initialize the heap. */
    heap_init(&heap, kruskal_heap_compare);

    len_vertices = sizeof(vertices)/sizeof(vertices[0]);
    for (i = 0; i < len_vertices; i++) {
        /* Add vertices[i] to the graph with adjlist_add_vertex(); the call
         * itself is missing from this listing. */
    }

    len_edges = sizeof(edges)/sizeof(edges[0]);
    for (i = 0; i < len_edges; i++) {
        /* Add edges[i] to the graph with adjlist_add_edge(); the call
         * itself is missing from this listing. */
    }

    /* Malloc the queue array. Throw an error, if malloc fails. */
    q_arr = (queue *) malloc(sizeof(queue) * len_vertices);
    if (!q_arr) {
        return -1;
    }

    /* Run the algorithm. */
    kruskal_run(graph_root, &heap, len_vertices);
    printf("Kruskal is all done.\n");

    /* Done with the Adjacency List and queue-array. Free them */
    free(q_arr);
    return 0;
}
```

The main function starts by initializing the heap. Next, it builds the adjacency-list graph by calling its APIs: adjlist_add_vertex() to add vertices from the vertices input array and adjlist_add_edge() to add edges from the edges input array. We have chosen the elements of vertices and edges so that they form the same graph as shown at the beginning of the page: a graph with two partitions. It then allocates q_arr, the array that holds one queue per set. Once these data structures are ready, it calls kruskal_run() to execute Kruskal's algorithm.

For adding elements to sets and for doing the union of sets, we take a simple approach and use queues. Each node stores, in its misc_index field, the index of its queue in the global q_arr. Thus, if a node belongs to set 0, its misc_index is 0, which means that it is enqueued in the queue at index 0 of q_arr. When doing the union, we move the elements from one queue to the other. The kruskal_run() function starts by assigning a unique set_index to each vertex and adding each vertex to its own set.

This method of doing the union is not very efficient, since at each union we have to dequeue all of the elements from the shorter queue and enqueue them to the longer queue. We have to do this because every moved vertex must have its misc_index field updated to the new queue index. You might be tempted to simply attach the head of one queue to the tail of the other. That alone would not work, since we would still need to update the misc_index of each vertex from the first queue to point to the new index. Since we want to keep the implementation simple, we chose this simpler implementation of sets. There are set implementations that run faster, and we recommend that the reader give them a try! The reason we need make-set and union at all is to avoid loops in the minimum spanning tree, which could otherwise occur because every edge of the graph is added to the heap.

Building the heap is the simplest part of Kruskal's algorithm: we merely add all the edges to the heap (kruskal_add_all_edges_to_heap()). We use kruskal_check_if_edge_exists() to ensure that we do not add the same edge twice. Without this check, every edge would land in the heap twice, since the adjacency list stores each undirected edge once under each of its two vertices.

Once we have the heap, we start extracting the edges one by one. For each extracted edge, we check whether its two vertices belong to the same set. If they do, we ignore the edge, since adding it to the existing tree would create a loop. On the other hand, if the two vertices belong to different sets, we call kruskal_do_union() to merge all elements of the two sets. Once we have extracted all edges from the heap, we are done with the algorithm, and the selected edges form the minimum spanning tree (or, for a partitioned graph, one tree per partition).

Now that we have understood the above code, let us go ahead and compile it. For that, we pass all four files ("kruskals.c", "adjacency_list.c", "heap.c", and "queue.c") to gcc and get the executable (in this case, kruskals). From the output, we can see that we keep extracting edges and, as we do each union, vertices move from one queue to another. In the end, we are left with two non-empty sets, one for each of the two partitions present in the graph.

```
user@codingbison $ gcc kruskals.c adjacency_list.c heap.c queue.c -lm -o kruskals
user@codingbison $ ./kruskals
Vertex [101][Color: 0]: -- Edge [680 (12)] -- Edge [237 (10)] -- Edge [280 (8)]
Vertex [237][Color: 0]: -- Edge [101 (10)] -- Edge [880 (3)]
Vertex [680][Color: 0]: -- Edge [101 (12)] -- Edge [880 (2)]
Vertex [280][Color: 0]: -- Edge [101 (8)]
Vertex [880][Color: 0]: -- Edge [680 (2)] -- Edge [237 (3)]
Vertex [ 15][Color: 0]: -- Edge [215 (6)]
Vertex [215][Color: 0]: -- Edge [ 15 (6)]
Printing Queue (size: 1, index: 0): 101
Printing Queue (size: 1, index: 1): 237
Printing Queue (size: 1, index: 2): 680
Printing Queue (size: 1, index: 3): 280
Printing Queue (size: 1, index: 4): 880
Printing Queue (size: 1, index: 5): 15
Printing Queue (size: 1, index: 6): 215
Printing the Heap:
[i: 0] 680-880 (Weight:  2 Parent: NULL)
[i: 1] 237-880 (Weight:  3 Parent: 680-880)
[i: 2] 15-215 (Weight:  6 Parent: 680-880)
[i: 3] 101-237 (Weight: 10 Parent: 237-880)
[i: 4] 101-680 (Weight: 12 Parent: 237-880)
[i: 5] 101-280 (Weight:  8 Parent: 15-215)

[kruskal_run]After extraction: 680-880 (Weight: 2)
[kruskal_run]Kruskal's Result: 680-880 (Weight: 2)
q_indx_enqueue: 2 and q_indx_dequeue: 4
Printing Queue (size: 1, index: 0): 101
Printing Queue (size: 1, index: 1): 237
Printing Queue (size: 2, index: 2): 680  880
Printing Queue (size: 1, index: 3): 280
Printing Queue (size: 0, index: 4): Empty
Printing Queue (size: 1, index: 5): 15
Printing Queue (size: 1, index: 6): 215
Printing the Heap:
[i: 0] 237-880 (Weight:  3 Parent: NULL)
[i: 1] 101-280 (Weight:  8 Parent: 237-880)
[i: 2] 15-215 (Weight:  6 Parent: 237-880)
[i: 3] 101-237 (Weight: 10 Parent: 101-280)
[i: 4] 101-680 (Weight: 12 Parent: 101-280)

[kruskal_run]After extraction: 237-880 (Weight: 3)
[kruskal_run]Kruskal's Result: 237-880 (Weight: 3)
q_indx_enqueue: 2 and q_indx_dequeue: 1
Printing Queue (size: 1, index: 0): 101
Printing Queue (size: 0, index: 1): Empty
Printing Queue (size: 3, index: 2): 680  880  237
Printing Queue (size: 1, index: 3): 280
Printing Queue (size: 0, index: 4): Empty
Printing Queue (size: 1, index: 5): 15
Printing Queue (size: 1, index: 6): 215
Printing the Heap:
[i: 0] 15-215 (Weight:  6 Parent: NULL)
[i: 1] 101-280 (Weight:  8 Parent: 15-215)
[i: 2] 101-680 (Weight: 12 Parent: 15-215)
[i: 3] 101-237 (Weight: 10 Parent: 101-280)

[kruskal_run]After extraction: 15-215 (Weight: 6)
[kruskal_run]Kruskal's Result: 15-215 (Weight: 6)
q_indx_enqueue: 5 and q_indx_dequeue: 6
Printing Queue (size: 1, index: 0): 101
Printing Queue (size: 0, index: 1): Empty
Printing Queue (size: 3, index: 2): 680  880  237
Printing Queue (size: 1, index: 3): 280
Printing Queue (size: 0, index: 4): Empty
Printing Queue (size: 2, index: 5): 15  215
Printing Queue (size: 0, index: 6): Empty
Printing the Heap:
[i: 0] 101-280 (Weight:  8 Parent: NULL)
[i: 1] 101-237 (Weight: 10 Parent: 101-280)
[i: 2] 101-680 (Weight: 12 Parent: 101-280)

[kruskal_run]After extraction: 101-280 (Weight: 8)
[kruskal_run]Kruskal's Result: 101-280 (Weight: 8)
q_indx_enqueue: 0 and q_indx_dequeue: 3
Printing Queue (size: 2, index: 0): 101  280
Printing Queue (size: 0, index: 1): Empty
Printing Queue (size: 3, index: 2): 680  880  237
Printing Queue (size: 0, index: 3): Empty
Printing Queue (size: 0, index: 4): Empty
Printing Queue (size: 2, index: 5): 15  215
Printing Queue (size: 0, index: 6): Empty
Printing the Heap:
[i: 0] 101-237 (Weight: 10 Parent: NULL)
[i: 1] 101-680 (Weight: 12 Parent: 101-237)

[kruskal_run]After extraction: 101-237 (Weight: 10)
[kruskal_run]Kruskal's Result: 101-237 (Weight: 10)
q_indx_enqueue: 2 and q_indx_dequeue: 0
Printing Queue (size: 0, index: 0): Empty
Printing Queue (size: 0, index: 1): Empty
Printing Queue (size: 5, index: 2): 680  880  237  101  280
Printing Queue (size: 0, index: 3): Empty
Printing Queue (size: 0, index: 4): Empty
Printing Queue (size: 2, index: 5): 15  215
Printing Queue (size: 0, index: 6): Empty
Printing the Heap:
[i: 0] 101-680 (Weight: 12 Parent: NULL)

[kruskal_run]After extraction: 101-680 (Weight: 12)
[kruskal_run]Ignore this edge for Kruskal's Result: 101-680 (Weight: 12)
Kruskal is all done.
```