CodingBison - C String Library: Miscellaneous Functions

C provides a set of library functions for manipulating string and non-string data. These functions use a char pointer ("char *") for string data and a void pointer ("void *") for non-string data. C publishes them via the "string.h" header file. We should include this header file whenever we use these functions.

Please recall that a C string is an array of char types, where each char type requires one byte of storage. Equally important, when C stores a string in an array, it uses '\0' (NUL termination character) as the last character to mark the end of the string.

This section describes some of the miscellaneous functions that the string library provides, such as those for concatenating strings, tokenizing strings, etc.

Overview of Functions

As usual, we begin with the function prototypes for these miscellaneous string-related APIs.

 size_t  strlen(const char *s);
 char    *strcat(char *dest, const char *src);
 char    *strncat(char *dest, const char *src, size_t n);
 char    *strtok(char *dest, const char *delimiter);
 char    *strerror(int errnum);
 void    *memset(void *dest, int c, size_t n);

The first function is strlen() and we have already seen it working in earlier sections! Basically, this function returns the number of characters (that is, the length) of a string, excluding the terminating NUL byte.

The next two functions, strcat() and strncat(), append the source string (src) to the destination string (dest). Whereas strcat() appends the entire string src to the end of string dest, strncat() appends at most the first n characters of string src. If the length of the string src is less than n, then strncat() appends the entire string.

In both cases, the dest buffer should have enough space to accommodate both strings and the NUL character. Hence, for strcat(), the dest buffer should have at least (strlen(dest) + strlen(src) + 1) bytes of space. For strncat(), it should have at least (strlen(dest) + n + 1) bytes of space; if n is greater than the length of src, then at least (strlen(dest) + strlen(src) + 1) bytes are sufficient.

Both of these functions overwrite the NUL character at the end of the dest string with the first character of the src string and, once the copy is done, add a NUL character at the new end. This holds for strncat() as well, whether it stops after n characters or reaches the end of the src string first. Needless to say, this is done to make the combined character array a valid string because C uses the NUL character to mark the end of a string.

Both strcat() and strncat() return the dest pointer after appending. Lastly, the src and dest strings should not overlap; otherwise, the behavior is undefined.

The next function, strtok(), splits the dest string into tokens. The delimiter argument is a set of characters, and any one of them separates two tokens. We need to call strtok() multiple times to retrieve all the tokens. Note that strtok() modifies the dest string: it overwrites the delimiter that follows each token with a NUL character. The function strerror() returns a string that describes the error associated with the error number, errnum. We typically pass the system error number (errno) to this function.

Lastly, memset() copies the character c (converted to an unsigned char) into the first n bytes of the dest buffer. This function returns the dest pointer. We can use memset() on strings as well; a char pointer converts to a void pointer implicitly, so a cast is optional.

Examples

Having described these string-based functions, let us now delve into the fun part: using them! For that, we provide two examples. The first example uses strcat() and strncat() to concatenate two strings. The second example focuses on the remaining functions.

The first example (provided below) has two parts. It starts by calling strcat() to concatenate a string (STR_TO_APPEND1) to another string (str_dest1) -- str_dest1 has enough storage (100 bytes) to hold both of these strings. Next, it uses malloc to create a string buffer big enough to hold two strings and then uses strncat() to append both strings, one after another. We include the "stdlib.h" header file for the malloc() and free() calls.

 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>

 #define STR_TO_APPEND1 "was painted by Leonardo da Vinci"
 #define STR_TO_APPEND2 "was painted by Van Gogh"

 int main () {
     char str_dest1[100] = "Mona Lisa ";
     char str_dest2[] = "Starry Night ";
     char *str_cat, *str_malloc; 

     /* Append to an existing string buffer */
     str_cat = strcat(str_dest1, STR_TO_APPEND1);
     printf("[strcat]  Returned string: %s\n", str_cat);
     printf("[strcat]  str_cat: %p, str_dest1: %p\n\n",
            (void *)str_cat, (void *)str_dest1);

     /* Malloc a string buffer and then append a string to it */
     str_malloc = (char *)malloc(sizeof(char) * 
               (strlen(str_dest2) + strlen(STR_TO_APPEND2) + 1));
     if (!str_malloc) return -1;

     /* malloc() does not initialize memory; start with an empty string */
     str_malloc[0] = '\0';

     str_cat = strncat(str_malloc, str_dest2, strlen(str_dest2));
     str_cat = strncat(str_malloc, STR_TO_APPEND2, strlen(STR_TO_APPEND2));
     printf("[strncat] Returned string: %s\n", str_cat);
     printf("[strncat] str_cat: %p, str_malloc: %p\n",
            (void *)str_cat, (void *)str_malloc);

     /* Free the malloced string */
     free(str_malloc);
     return 0;
 }

At the risk of repeating ourselves, for concatenation operations, the destination buffer must have enough space to accommodate both strings and the NUL character; otherwise, the behavior is undefined.

We provide the output below. Note that when we print the address returned by each of these functions, we find that it is the same as the address of the destination string; in other words, both pointers refer to the same buffer. The actual addresses would vary from one run to another.

 $ gcc strcat.c -o strcat
 $
 $ ./strcat
 [strcat]  Returned string: Mona Lisa was painted by Leonardo da Vinci 
 [strcat]  str_cat: 0xbfbc9454, str_dest1: 0xbfbc9454

 [strncat] Returned string: Starry Night was painted by Van Gogh 
 [strncat] str_cat: 0x9112008, str_malloc: 0x9112008

Our second and last example shows the usage of memset(), strlen(), strtok(), and strerror(). The program begins by setting all the characters of str_a to zero using memset(); after that, str_a prints as an empty string. Next, it uses strtok() to split str_b into tokens. For the strtok() function, we pass a space (" ") as the delimiter and thus, strtok() returns each word of the sentence as a token. Also, it is only in the first call to strtok() that we pass the string as input. Subsequent calls take NULL as input, and strtok() continues from the position that it saved internally during the previous call.

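In the end, the program passes the errno variable to the strerror() function. Library functions and system calls store an error number in errno when they fail, so its value is meaningful only right after a call reports a failure. The errno variable is declared in the "errno.h" header file; its value is zero at program startup.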

 #include <stdio.h>
 #include <string.h>
 #include <errno.h>

 int main () {
     char str_a[] = "The Last Supper by Leonardo da Vinci";
     char str_b[] = "The Last Supper";
     char *str_temp;

     printf("[memset] Before setting to zero, str: %s\n", str_a);
     memset((void *)str_a, 0, strlen(str_a));
     printf("[memset]  After setting to zero, str: %s\n", str_a);

     /* Split the string into tokens */
     printf("\n[strtok] Let us tokenize the string: %s\n", str_b);
     str_temp = strtok(str_b, " ");
     while (str_temp != NULL) {
         printf("[strtok] Returned token: %s\n", str_temp);
         str_temp = strtok(NULL, " ");
     }

     printf("\n[strerror] The error string: %s\n", strerror(errno));
     return 0;
 }

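We provide the output below. Since there is no error, errno is zero and the strerror() call returns the "Success" string. This is the text that the GNU C library uses for error number zero; the wording may differ on other systems.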

 $ gcc strmisc.c -o strmisc
 $ 
 $ ./strmisc 
 [memset] Before setting to zero, str: The Last Supper by Leonardo da Vinci 
 [memset]  After setting to zero, str:  

 [strtok] Let us tokenize the string: The Last Supper
 [strtok] Returned token: The
 [strtok] Returned token: Last
 [strtok] Returned token: Supper

 [strerror] The error string: Success