CodingBison - Handling Bits: Introduction

Computers store data in binary format. Thus, if we have a decimal number, then we can represent it in a binary format or in a hex format. A char with value of 1 can be represented as "00000001", where all the bits are zero except the last one. And an integer with value 1 can be represented in binary format as 31 zero bits followed by a single 1 bit. In terms of hex format, an integer with value 1 in a hex format would be "0x00000001" on a 32- bit machine. In this representation, each of the number excluding the initial "0x" represent 4 bits. Thus, we have 8 numbers that represent a total storage of 32 bits.

Manipulate bits allows us a natural way to store information in a compact manner. The constraint here y is that since bit can be either zero or one, the stored data for each bit has to be boolean in nature. Thus, if we have a large number of parameters and if all we care about is whether they are true or false, then using bits to store them would be more efficient.

In terms of decimal value, each of the bit values represent a power of 2. Thus, the last number represents 2 raised to the power of 0 (2^0) or 1, the second to last number represents 2 raised to the power of 1 (2^1) or 1, the third number from the last represents 2 raised to the power of 2 (2^2) or 4, and so on. We assign these values to a bit only if they store 1, else they have no values since they are set to 0. The combined decimal value of all bits is essentially the sum of individual values of these bits. Thus, if we have a binary number with 8 bits as "00000011", then its total sum is sum of values of the bit that store 1. Thus, it would be the sum of the values of the last two bits: (2^1 + 2^0) or (2 + 1) or 3.

Before we go any further, let us look at binary representation and hex representation of a few integer values; we provide these values in the following table. Note that this is for a 32-bit machine, where an integer requires 4 bytes. On a 64-bit system, this representation would require 8 bytes, but the underlying logic would be the same. To elaborate further, let us pick an example. The decimal value 1024 has 11th bit set from the right -- its binary representation reflects the same. For hex representation, we would need to know the decimal value of "0100", which is the 3rd 4-bit set from the right. Since "0100" equals 4, we place the value of 4 in the third 4-bit set from the right.

Table: Binary Representation of Decimal Values
Decimal	Binary Representation	Hex Representation
0	00000000 00000000 00000000 00000000	0x00000000
1	00000000 00000000 00000000 00000001	0x00000001
2	00000000 00000000 00000000 00000010	0x00000002
3	00000000 00000000 00000000 00000011	0x00000003
4	00000000 00000000 00000000 00000100	0x00000004
8	00000000 00000000 00000000 00001000	0x00000008
32	00000000 00000000 00000000 00100000	0x00000020
1024	00000000 00000000 00000100 00000000	0x00000400

One more thing. The above table provides binary representation such that the most significant byte is the first byte -- such systems are also known as big-endian systems. On some systems (x86-processors, for example), it is the other way around; the most significant byte is actually the last byte. Such systems are known as little-endian systems. On a little-endian system, an integer with value of 1 would actually be stored as 0x01000000, instead of 0x00000001. This is because "01" is the least significant byte and hence it gets stored first.

The good news is that common bit-handling operations are independent of machine endianness. So, we do not have to worry about it, well, mostly! The only place where it starts to matter is if we are looking into the memory and there, we would see values placed as per the system endianness.

Let us write a simple example that prints decimal values in binary representation. The example (provided below) prints bit values (0 or 1) for various values of an integer. It does so using the print_binary() function. This function uses a char array, binary_string, where each element of the array represents the corresponding bit value in the decimal numbers.

This function checks the value of each bit and if it is set, it adds the char '1' to a char array, else it adds the char '0'. This function checks each bit by comparing the number against a mask (BIT_MASK) that has all of its bits zero except the left-most bit. The function does a continuous left-shift of the number and each iteration, the left-most bit represents one of the bits of the number. Doing an AND (using &) with the mask checks if the left-most bit in the number is set or not. We cover both masks and left-shifting in the next section.

 #include <stdio.h>

 #define SIZE_OF_INT 32
 #define TOTAL_LEN (SIZE_OF_INT + 3 + 1) /* 3 for white-spaces and 1 for NUL */
 #define BIT_MASK  (1 << 31)

 char binary_string[TOTAL_LEN];

 char *print_binary (int num) {
     int i = 0, bits_since_last = 0;

     while (i <= TOTAL_LEN) {
         binary_string[i++] = '0';
     }
     i = 0;
     while (i < TOTAL_LEN) {
         if (num & BIT_MASK) {
             binary_string[i] = '1';
         }
         bits_since_last++; 
         num = num << 1;
         i++;
         if (bits_since_last == 8) {
             bits_since_last = 0;
             binary_string[i] = ' ';
             i++;
         }
     }
     binary_string[TOTAL_LEN-1] = '\0';
     return binary_string;
 }

 int main() {
     int var_num;

     var_num = 0;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 1;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 16;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 1024;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 255;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));
     return 0;
 }

We would like to briefly mention that the print_binary() prints values into the binary_string starting from left-most bit. We do this because we run the example on a little-endian (x86) system! Once again, for most of the bit-handling tasks -- besides printing bits and looking into bit values in the memory -- we should not bother ourselves with endianness.

When we run the example, we find that for the number 0, we do not have any bits set. No surprises there. For numbers 1, 16, and 1024, we see that only 1 bit is set -- this is because each of these numbers are powers of 2. So, any number which can be expressed as 2^n will essentially have only one bit set in the binary representation. On the other hand, for 255 we see that a total of 8 bits are set. The number 255 belongs to a yet another special class of numbers that can be expressed as (2^n - 1). Such number are 1 less than a power of 2. Some of these numbers are 7, 15, 31, 63, 127, and so on.

 Number:    0 (Bits: 00000000 00000000 00000000 00000000)
 Number:    1 (Bits: 00000000 00000000 00000000 00000001)
 Number:   16 (Bits: 00000000 00000000 00000000 00010000)
 Number: 1024 (Bits: 00000000 00000000 00000100 00000000)
 Number:  255 (Bits: 00000000 00000000 00000000 11111111)

In the next few sections, we present a handful of examples that allow us to do some of the common bit-related tasks like shifting bits and setting/clearing bits. In the end, we also present an example that counts the number of bits set in a storage.

Shifting Bits

One of the common bit operations is shifting bits. Depending upon the direction of the shift, it can be a left-shift or a right-shift. We represent left-shift with "<<" and right-shift with ">>". Shifting a number n times on the left is equivalent to multiplying that number by 2 raised to the power of n (2^n). Shifting a number n times on the right is equivalent to dividing that number by 2^n.

Thus, if we were to left-shift the number 1 by 4, as in "1 << 4", then this would be equivalent to multiplying 1 by 2^4 or 16. On the other hand, if we were to right-shift the number 16 by 4, as in "16 >> 4", then that would be equivalent to dividing 16 by 2^4 or 1.

Besides multiplication and division, yet another use of bit-shifting is in creating bit masks. Bitmasks help us read and write the value of bit stored at any specific location.

Let us now go through a simple example and see how left-shift and right-shift work. The example provided below uses the same print_binary() function provided in the earlier example -- to keep things simple, we put print_binary() implementation in a separate file, print-binary.c. Next, we declare an extern of this function the main file, shift-binary.c. We can pass both the files to gcc and gcc would understand that shift-binary.c file is accessing the print_binary() function defined in the print-binary.c file! If you want to run this example without doing all this, you can always copy the print_binary() function definition in shift-binary.c file here and you should be all set!

 [user@codingbison]$ cat print-binary.c
 #define SIZE_OF_INT 32
 #define TOTAL_LEN (SIZE_OF_INT + 3 + 1) /* 3 for white-spaces and 1 for NUL */
 #define BIT_MASK  (1 << 31)

 char binary_string[TOTAL_LEN];

 char *print_binary (int num) {
     int i = 0, bits_since_last = 0;

     while (i <= TOTAL_LEN) {
         binary_string[i++] = '0';
     }
     i = 0;
     while (i < TOTAL_LEN) {
         if (num & BIT_MASK) {
             binary_string[i] = '1';
         }
         bits_since_last++;
         num = num << 1;
         i++;
         if (bits_since_last == 8) {
             bits_since_last = 0;
             binary_string[i] = ' ';
             i++;
         }
     }
     binary_string[TOTAL_LEN-1] = '\0';
     return binary_string;
 }
 [user@codingbison]$ 
 [user@codingbison]$ cat shift-binary.c 
 #include <stdio.h>

 extern char *print_binary (int num);

 int main() {
     int var_num = 1;

     var_num = 1 << 4;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 1 << 10;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 15 << 8;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 1024 >> 5;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));

     var_num = 1024 >> 10;
     printf("Number: %4d (Bits: %s)\n", var_num, print_binary(var_num));
     return 0;
 }
 [user@codingbison]$

To run this example, we simple compile both the files together. The example shows that when we take 1 and left-shift it by 4, we see that the bit in the first place (from right) moves left by 5 bits; the final decimal value becomes 16. On the other hand, when we right-shift 1024 by 5, then we see that the bit in the 10th place moves right by 5 bits; the final decimal value equals dividing 1024 by 2^5 or 1024 divided by 32 or 32.

 [user@codingbison]$ gcc shift-binary.c print-binary.c -o bit-shift
 [user@codingbison]$ ./bit-shift 
 Number:   16 (Bits: 00000000 00000000 00000000 00010000)
 Number: 1024 (Bits: 00000000 00000000 00000100 00000000)
 Number: 3840 (Bits: 00000000 00000000 00001111 00000000)
 Number:   32 (Bits: 00000000 00000000 00000000 00100000)
 Number:    1 (Bits: 00000000 00000000 00000000 00000001)
 [user@codingbison]$

Setting/Clearing Bits

Yet another common operation for bits is setting, clearing, and checking a given bit. This ability allows us to read, write, and check information on a bit-level. Pretty cool, if you ask me! For these operation, we first need to have a bitmask and then use an appropriate (AND or OR) operation with the bitmask. For setting a bit, we use the OR ("|") operation with the bitmask. For clearing a bit, we use inverse of AND ("~&") with the bitmask. For checking if a bit is set, we use AND ("&") with the bitmask.

Let us understand the idea of setting, clearing, and checking a bit using a storage of one byte. Let us say, we have a number that has a size of one byte and has a value of 8. Its binary representation would be "00010000". Now, the requirement is to set the 3th bit from the right. We can easily do so, if we have a mask such that the bit at the 3th location from the last is 1 and everything else is 0 or "00000100".

If we take the number and OR it with the mask ("00010000") then we would get "00010000 OR 00000100" or "00010100". Since the mask has all bits 0 except the 3d bit, doing an OR with the number results in two things. First, because all other bits, except the 3rd bit, are set to 0, doing an OR has no effect on all those bits. Thus, if the bits present in those locations are 1, then doing an OR with 0 still leaves them as 1. On the other hand, if they are 0, then well, doing an OR with 0 still leaves them as 0. For the 3dd bit, it is a different story -- doing an OR always gives us 1. This is because, if the 3rd bit in the number is 1, then doing an OR of 1 and 1 would give us 1. On the other hand, if the 3rd bit in the number is 0, then doing an OR of 1 and 0 would still give us 1.

Clearing the bit is taking an inverse of the mask and doing an AND operation with the number. Thus, the inverse of our mask "00000100" is "11111011" and when we do an AND with the number, then we get "00010000 AND 11111011" or "00010000". Since the inverse mask has all bits except the 3rd 1 and the 3rd bit is 0, when we do an AND with the number, once again two things happen. For all other bits, doing an AND with 1 yields the same bit since doing "1 AND 0" leads to 0 as well as "1 AND 1" leads to 1. Thus, the other bit remains unchanged. However, when we do an AND with a zero, then the resulting bit is always 0 since both "0 AND 1" and "0 AND 0" lead to 0. Thus, with this approach, the 3rd bit is set to 0.

Checking if a bit is set is doing an AND operation with the mask. Thus, in our example, if we have to check if the 3rd bit is set or not, then we can do by doing an AND with the mask: "00010000 AND 00000100" or "00000000". The resulting value is zero and hence that means the 3rd bit is not set for the number 8. When we do an AND with the mask, then all the bits except the 3rd one are zero and doing an AND with 0 is always 0. So, all the other bits lead to 0. For the 3rd bit, doing an AND with 1 leads to 1 only if the value of the 3rd bit is set to 1. Else, it leads to 0.

With that, let us see our next example that focuses on setting, clearing, and, checking of bits. The example uses two bitmasks, BIT_10 and BIT_20 that focus on the 10th bit and 20th bit from the last, respectively. We start with number 0 and set both of these bits using these masks one by one. Once set, we clear these bits. The example also illustrates how we can check if a certain bit is set or not.

The example provided below uses the print_binary() function provided in the first example. Like the earlier example, we compile this file with print-binary.c file that contains definition of print_binary() function.

 #include <stdio.h>

 #define  BIT_10      (1 << 9)
 #define  BIT_20      (1 << 19)

 extern char *print_binary(int num);

 void check_bit (int num, int mask, char *desc) {
     if (num & mask) {
          printf("%s is set\n", desc);
     } else {
          printf("%s is NOT set\n", desc);
     }
 }

 int main() {
     int var_num = 0;

     printf("BIT_10: %6d (Bits: %s)\n",   BIT_10, print_binary(BIT_10));
     printf("BIT_20: %6d (Bits: %s)\n\n", BIT_20, print_binary(BIT_20));

     var_num = var_num | BIT_10;
     printf("Number: %6d (Bits: %s)\n", var_num, print_binary(var_num));
     check_bit(var_num, BIT_10, "BIT_10");

     var_num = var_num | BIT_20;
     printf("Number: %6d (Bits: %s)\n", var_num, print_binary(var_num));
     check_bit(var_num, BIT_20, "BIT_20");

     var_num = var_num & ~BIT_20;
     printf("Number: %6d (Bits: %s)\n", var_num, print_binary(var_num));
     check_bit(var_num, BIT_20, "BIT_20");

     var_num = var_num & ~BIT_10;
     printf("Number: %6d (Bits: %s)\n", var_num, print_binary(var_num));
     check_bit(var_num, BIT_10, "BIT_10");
     return 0;
 }

Here is the output:

 [user@codingbison]$ gcc example3-set-clear-bits.c print-binary.c -o set-clear-check-bits
 [user@codingbison]$ ./set-clear-check-bits 
 BIT_10:    512 (Bits: 00000000 00000000 00000010 00000000)
 BIT_20: 524288 (Bits: 00000000 00001000 00000000 00000000)

 Number:    512 (Bits: 00000000 00000000 00000010 00000000)
 BIT_10 is set
 Number: 524800 (Bits: 00000000 00001000 00000010 00000000)
 BIT_20 is set
 Number:    512 (Bits: 00000000 00000000 00000010 00000000)
 BIT_20 is NOT set
 Number:      0 (Bits: 00000000 00000000 00000000 00000000)
 BIT_10 is NOT set
 [user@codingbison]$

Setting bits, clearing bits, and checking bits are popularly used for dynamic systems consisting of various states. For such systems, we can use flags to represent these states as boolean attributes. If we were to choose events in the life of a painting, then we can model it using states or flags. As one example, let us say we want to store if a painting in an art gallery is sold or not. We can do so using a single flag -- if the flag is sold, then we set a particular bit. Thus, by checking that particular bit, we can know if the painting is sold or not.

In fact, let us extend this example to understand usage of flags a little bit more. If we were to assume, then we can say, let us say 8 life-events in the lifespan of a painting. We use a single flag to reflect each of these events. We have purposely chosen 8 events since we can represent them using 8 bits or one byte! Moving further, we provide below eight flags that correspond to each of the 8 states. In terms of bits, all of these flags have only a single bit set and the bit keeps shifting from right to left.

 #define PAINTING_WORK_IN_PROGRESS       0x01    /* Bit values: 00000001 */
 #define PAINTING_OIL_PAINT_DRYING       0x02    /* Bit values: 00000010 */
 #define PAINTING_DONE_BEING_LOADED      0x03    /* Bit values: 00000100 */
 #define PAINTING_IN_TRANSIT             0x04    /* Bit values: 00001000 */
 #define PAINTING_IN_STORE_WAREHOUSE     0x10    /* Bit values: 00010000 */
 #define PAINTING_IN_STORE_DISPLAY       0x20    /* Bit values: 00100000 */
 #define PAINTING_IN_GALLERY_DISPLAY     0x40    /* Bit values: 01000000 */
 #define PAINTING_SOLD                   0x80    /* Bit values: 10000000 */

Now, I know that the above states are a little artificial (you never know!). But, the example shows that we can use a single byte to store as many as 8 states -- thus, each bit of a byte ends up representing one state. If the first bit (from left) is set, then we know that the painting is in a sold state. Same holds for other states.

Since our model is dynamic, the painting switches from one state to another. When a painting moves to a new state, we need to do two things: clear the flag corresponding to the old state and set the flag corresponding to the new state. Here is a code snippet of how we would do that when a painting gets sold from a gallery display.

 painting_state = painting_state & ~PAINTING_IN_GALLERY_DISPLAY; 
 painting_state = painting_state | PAINTING_SOLD;

Counting Number of Bits Set

Counting the number of bits set in a storage (let us say an integer) is another common task. Hence, we provide a small example that achieves the task using two different approaches.

Both examples use the same print_binary() function provided in the very first example on this page, to print bits of an integer. In the interests of space, we put print_binary() implementation in a separate file, print-binary.c. Next, we declare this function as an extern in the main file, counting-bits.c. We can pass both the files to gcc and gcc would understand that counting-binary.c file is accessing the print_binary() function defined in the print-binary.c file!

The first example takes the obvious route. It keeps shifting the passed number towards the right by one. And with each shift, it does an AND with 1. Thus, if the last bit is 1, then doing an AND with 1 would lead to 1. In that case, it increases the counter. When it is done shifting the number by SIZE_OF_INT times, it returns the counter. Thus, if we have a number, let us say "00001111", then when as we shift this number, the bit at each position becomes the bit at the first position and doing an AND with 1 (which is "00000001") checks if the last bit is set or not!

 #include <stdio.h>

 #define SIZE_OF_INT   32

 extern char *print_binary(int num);

 int number_of_bits (int num) {
     int i, total_count = 0;
     for (i = 0; i < SIZE_OF_INT; i++) {
         if (num & 1) {
             total_count++;
         }
         num = num >> 1;
     }
     return total_count;
 }

 int main() {
     int var_num; 

     var_num = 7;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = 8;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = 255;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = 256;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = (1 << 20) - 1;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n",
             var_num, print_binary(var_num), number_of_bits(var_num));
     return 0;
 }

 [user@codingbison]$ gcc counting-bits.c print-binary.c -o counting-bits
 [user@codingbison]$ ./counting-bits
 Number:       7 (Bits: 00000000 00000000 00000000 00000111), Bits set:  3
 Number:       8 (Bits: 00000000 00000000 00000000 00001000), Bits set:  1
 Number:     255 (Bits: 00000000 00000000 00000000 11111111), Bits set:  8
 Number:     256 (Bits: 00000000 00000000 00000001 00000000), Bits set:  1
 Number: 1048575 (Bits: 00000000 00001111 11111111 11111111), Bits set: 20
 [user@codingbison]$

The above approach is okay as long as we have to count the number of bits once in a blue moon. But, if we have to do the counting for several million numbers, then this approach will easily become rather expensive. The reason is that counting all the bits for a number with n bits is O(n). One practical way to reduce the time-complexity is to use a lookup table. Our next example does exactly the same.

For that, we build an array that holds values for a set of 4 bits. Thus, given a number between 0 and 15 (4 bits can have a maximum value of 15), we store the number of bits present in each number in this array: the number 1 has 1 bits, number 2 has 1 bits, number 3 has 2 bits, and so on. With that, the bit-counting becomes a lookup operation as long as we check each 4-bits, one at a time.

For each storage, we read the value of 4 bits and then do a lookup to count the number of bits. The example uses a 4 bit mask with all bits set to 1 to extract the value of each 4-bit. Since we are counting the number of bits set in 4 bits in one step, this approach would be roughly 4 times faster than the earlier one. If we wanted to optimize this further, then we can count the number of bits set in 8 bits in one shot. Of course, for a lookup with 8 bits, we would need a bigger lookup table.

Here is the modified example. Since the output is same as above, we do not provide it.

 #include <stdio.h>

 #define SIZE_OF_INT   32
 #define SIZE_OF_BLOCK 4
 #define MASK_4BITS  ((1 << SIZE_OF_BLOCK) - 1)

 extern char *print_binary (int num);

 short arr_bitcount[] = {0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

 int number_of_bits (int num) {
     int i, total_count = 0;
     int total_iterations = SIZE_OF_INT/SIZE_OF_BLOCK;
     unsigned char x;

     for (i = 0; i < total_iterations; i++) {
         x = num & MASK_4BITS;
         total_count += arr_bitcount[x];
         num = num >> SIZE_OF_BLOCK;
     }
     return total_count;
 }

 int main() {
     int var_num; 

     var_num = 7;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = 8;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = 255;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = 256;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));

     var_num = (1 << 20) - 1;
     printf("Number: %7d (Bits: %s), Bits set: %2d\n", 
             var_num, print_binary(var_num), number_of_bits(var_num));
     return 0;
 }