CodingBison

On most platforms, memory is read in words and so accessing data structures becomes faster if they are aligned along a word boundary. The word size refers to the size of a processor register. On a 32 bit processor, the word size is 32 bits and on a 64 bit processor, it is 64 bits.

What this also means is that if we define members of a data structure such that they are not aligned as words, then the compilers might sneak in a few padding bytes so that they becomes word-aligned. Trying to access data structure members that are not aligned means the compiler will have to generate more code and that will likely make your application slower. So, in most cases, what compiler is doing is actually the right thing.

Let us understand this using an example. The following structure contains a char array of 2 bytes, two integer values and a short. Thus, the total storage on a 32-bit machine should be 12 bytes. However, if we were to do a sizeof() for this structure, then the value would actually be 16!

 typedef struct painting_frame {
     char color_code[2];
     int painting_id;
     short width; 
     int height;
 } painting_frame_t;

So, why is the actual size of the structure different from the sum of sizes of its members. Well, this is because the members of this structure are not aligned and hence, the extra bytes are added by the compiler. The definition starts with a field color_code which is a char array of 2 chars. After that, we add an integer painting_id. Since we cannot fit the char array and an integer in the same word (2 + 4 would be 6 byes, which would go beyond a word), the compiler adds a padding of 2 bytes after the color_code and starts painting_id from the next row! After that, we have the field width of type short followed by field height of type integer. And once again, the compiler would add a padding of 2 bytes to the width field.

You could argue that is this so bad? The short answer is that it is. The actual size of all the data structure members is: 2 + 2 + 4 + 4 or 12 bytes. Thus, when the compiler bloats it to 16, the overhead is roughly 33.33%. If the application uses a large amount (an array say) of such data structures, then we would waste 33.33% of all the memory allocated for painting_frame_t. Not exactly efficient.

One way to avoid this overhead is to see if we can shuffle members of the data structure so that they can fit better in a word. If we were to move the width member immediately after color_code, then these two field (color_code and width) can fit in one word! The remaining two members, painting_id and height can have their own words. With that, the size of the data structure would becomes 4 + 4 + 4 or 12 bytes.

 typedef struct painting_frame {
     char color_code[2];
     short width; 
     int painting_id;
     int height;
 } painting_frame_t;

Let us demonstrate this using an example that uses both aligned and non-aligned versions of the structure. This example also uses offsetof() function that returns the offset of a data structure member from the start of the data structure. The offsetof() function is published by the "stddef.h" header file.

 #include <stdio.h>
 #include <stddef.h>

 typedef struct painting_frame_unaligned {
     char color_code[2];
     int painting_id;
     short width; 
     int height;
 } frame_unaligned_t;

 typedef struct painting_frame_aligned {
     char color_code[2];
     short width; 
     int painting_id;
     int height;
 } frame_aligned_t;

 int main() {
     printf("Printing values for frame_unaligned:\n");
     printf("Sizeof: %d\n", sizeof(frame_unaligned_t));
     printf("Offset of color_code: %d\n", offsetof(frame_unaligned_t, color_code));
     printf("Offset of painting_id: %d\n", offsetof(frame_unaligned_t, painting_id));
     printf("Offset of width: %d\n", offsetof(frame_unaligned_t, width));
     printf("Offset of height: %d\n", offsetof(frame_unaligned_t, height));

     printf("\nPrinting values for frame_aligned:\n");
     printf("Sizeof: %d\n", sizeof(frame_aligned_t));
     printf("Offset of color_code: %d\n", offsetof(frame_aligned_t, color_code));
     printf("Offset of width: %d\n", offsetof(frame_aligned_t, width));
     printf("Offset of painting_id: %d\n", offsetof(frame_aligned_t, painting_id));
     printf("Offset of height: %d\n", offsetof(frame_aligned_t, height));

     return 0;
 }

The output confirms our fear! For the first unaligned case, the size is indeed 16 bytes. This is also reflected in the return value of offsetof() for painting_id. The lesson here is that if there is a possibility that we can redefine a structure by shuffling the fields so that they align better, then by all means, we should!

 Printing values for frame_unaligned:
 Sizeof: 16
 Offset of color_code: 0
 Offset of painting_id: 4
 Offset of width: 8
 Offset of height: 12

 Printing values for frame_aligned:
 Sizeof: 12
 Offset of color_code: 0
 Offset of width: 2
 Offset of painting_id: 4
 Offset of height: 8

Yet another way (and rather risky way) to avoid the padding overhead is to force the compiler to not align fields -- we can do this using the packed attribute provided by GCC. Here is how, we can use this attribute on the painting_frame_unaligned data structure.

 #include <stdio.h>
 #include <stddef.h>

 typedef struct __attribute__((__packed__)) painting_frame_unaligned {
     char color_code[2];
     int painting_id; 
     short width; 
     int height; 
 } frame_unaligned_t; 

 int main() {
     printf("Printing values for frame_unaligned:\n");
     printf("Sizeof: %d\n", sizeof(frame_unaligned_t));
     printf("Offset of color_code: %d\n", offsetof(frame_unaligned_t, color_code));
     printf("Offset of painting_id: %d\n", offsetof(frame_unaligned_t, painting_id));
     printf("Offset of width: %d\n", offsetof(frame_unaligned_t, width));
     printf("Offset of height: %d\n", offsetof(frame_unaligned_t, height));

     return 0;
 }

The output shows that the packed attribute does indeed provide a more compact binding. However, this comes at a cost. Since the compilers must generate code to read unaligned members, this will slow down your program. And, if frame_unaligned_t is being used heavily, then this would bring noticeable latency in your application. So, the first thing to avoid memory overhead coming from data structure misalignment is to check if we can manually shuffle the data structure members -- this way, we can avoid the performance penalty paid when using the packed attribute.

 Printing values for frame_unaligned:
 Sizeof: 12
 Offset of color_code: 0
 Offset of painting_id: 2
 Offset of width: 6
 Offset of height: 8

One last note. We should avoid accessing members of a data structure by navigating the memory allocated to a data structure. Clearly, padding bytes has an element of surprise and using raw memory offset may fail since we may not know in advance how many bytes are added. For the case, when we use the packed attribute, it is possible that the value might not be aligned well -- and on some systems, this may generate an alignment fault. To conclude, as long as we stick with using the dot operator (".") or the pointer operator ("->"), we should be covered.





comments powered by Disqus