CodingBison

The C string library provides various functions for handling string and non-string data. These functions are declared in the "string.h" header file. Whereas the string-related functions deal with char pointers (char *), the memory-related functions deal with void pointers (void *).

Let us recall that C represents strings as an array of char types -- each char type requires one byte of storage. When C stores a string in an array, it uses '\0' (NUL termination character) as the last character to mark the end of the string.

In this section, we take a look at functions that help us search for a specific value (character, string, or data) within another value (string or data).

Overview of Functions

As always, we begin with a short overview of search-related functions. The first three functions search string data and the remaining two search non-string data. However, we can use the memory-related functions even on strings by casting strings to void pointers.

 char *strchr(const char* dest, int c);
 char *strrchr(const char* dest, int c);
 char *strstr(const char *dest, const char *substr);
 void *memchr(const void *dest, int c, size_t n); 
 void *memrchr(const void *dest, int c, size_t n); 

The function strchr() returns a pointer to the first occurrence of a given character ('c') in the string dest. strrchr() returns a pointer to the last occurrence of the character; in other words, it is the first occurrence found when the search begins from the end of the string dest. The next function, strstr(), locates the first occurrence of a substring (substr) within the dest string.

Functions memchr() and memrchr() are in some ways counterparts of strchr() and strrchr() with one difference -- they can search non-string data as well. memchr() finds the first occurrence of character c in the first n bytes of the dest buffer; memrchr() is similar to memchr() but it searches from the end of the buffer. Note that memrchr() is a GNU extension rather than a part of standard C, so it may not be available on every platform.

Even though the above functions take "c" as an integer, the value of "c" is converted to a single byte before the search: memchr() and memrchr() read it as an unsigned char, and strchr() and strrchr() read it as a char. Thus, even though an integer has multiple bytes (4 bytes on most platforms), only one byte of "int c" is actually used. Lastly, since these functions compare byte values, and upper-case and lower-case letters have different values (for example, in ASCII), these functions are case-sensitive.

If the lookup fails, then all of the above functions return NULL.

Before we go any further, we would like to mention that GNU Lib C (glibc) provides a memory equivalent of strstr(): memmem(). memmem() finds the first occurrence of a given block of data within a bigger block of data. Since memmem() may not be available in all C libraries, we do not cover it here. Nonetheless, if you are using glibc and code-portability beyond glibc is not an issue for you (at least, in the short-term!), then memmem() may be worth exploring.

Examples

Let us now go through two simple examples and enhance our understanding of these search functions.

The first example demonstrates how we can search for a given character or a given substring, within another string. It uses all of the above functions: strchr(), strrchr(), strstr(), memchr(), and memrchr().

Each of these functions returns a pointer to the location of the first occurrence of the character (or the substring) being searched; for the reverse-based functions, the first occurrence is counted from the end. The program prints the string that starts at the returned location.

 #define _GNU_SOURCE /* Needed for memrchr(), which is a GNU extension */
 #include <stdio.h>
 #include <string.h>

 int main () {
     char str[] = "The Last Supper by Leonardo da Vinci";
     char *str_temp;

     str_temp = strchr(str, 'L'); /* Find a character */
     printf("[strchr]  Returned string: %s\n", str_temp);

     str_temp = strrchr(str, 'L'); /* Start search from the end */
     printf("[strrchr] Returned string: %s\n", str_temp);

     str_temp = strstr(str, "Supper"); /* Find a substring in the string */
     printf("[strstr]  Returned string: %s\n\n", str_temp);

     /* Following calls use memory APIs for the above tasks */
     str_temp = (char *)memchr((void *)str, 'L', strlen(str));
     printf("[memchr]  Returned string: %s\n", str_temp);

     str_temp = (char *)memrchr((void *)str, 'L', strlen(str));
     printf("[memrchr] Returned string: %s\n", str_temp);
     return 0;
 }

Note that if we were to search for a non-existing character (e.g. 'Z') or a non-existing substring (e.g. "Mona"), then these functions would return NULL. Accordingly, the calling function should check the returned value and process it only when it is not NULL. Here is the output.

 $ gcc strsearch.c -o strsearch
 $ 
 $ ./strsearch
 [strchr]  Returned string: Last Supper by Leonardo da Vinci
 [strrchr] Returned string: Leonardo da Vinci
 [strstr]  Returned string: Supper by Leonardo da Vinci

 [memchr]  Returned string: Last Supper by Leonardo da Vinci 
 [memrchr] Returned string: Leonardo da Vinci 

Our discussion of search-related functions would be incomplete if we did not see an example of how memory search works for non-string data! With that goal in mind, let us write our second example, which uses memchr() to search for a character within a data structure.

The example begins with a definition of a simple data structure and then uses memchr() to see if a character is present in the data structure or not. Since the returned value does not necessarily point to a string, we cannot use printf() with "%s" to print it. Bound by this constraint, we choose a workaround and instead print the address of the data structure and the returned value.

 #include <stdio.h>
 #include <string.h>

 #define BUFFER_SIZE 100
 #define STR_PAINTER "Leonardo da Vinci"

 typedef struct painting_frame {
     int painting_id;
     int width;
     int height;
     char painter[BUFFER_SIZE];
 } painting_frame_t;

 int main () {
     painting_frame_t painting = {1001, 40, 100, STR_PAINTER};
     void *void_temp;

     void_temp = memchr((void *)&painting, 'L', sizeof(painting_frame_t));
     /* The %p format expects a void pointer, so we cast the arguments */
     printf("Address of data-structure: %p\n", (void *)&painting);
     printf("Address of 'L' character : %p\n", (void *)painting.painter);
     printf("Returned location for 'L': %p\n", void_temp);

     void_temp = memchr((void *)&painting, 'a', sizeof(painting_frame_t));
     printf("Returned location for 'a': %p\n", void_temp);

     return 0;
 }

When we run this program, we find that memchr() returns a pointer to the location of 'L' -- this location is at an offset of 12 bytes from the address of the painting data structure. This is because 'L' is at the start of the painter field of the structure and, prior to that, the structure has three integer fields. On the system where this is compiled, the size of an integer is 4 bytes and so the offset is 12 bytes. If we run this on a system where an integer requires 8 bytes, then the offset would be 24 bytes. The pointer to the 'a' character is 4 bytes further down than that of 'L', since 'a' is the fifth character of "Leonardo". Here is the output.

 Address of data-structure: 0xbf8ce0ec
 Address of 'L' character : 0xbf8ce0f8
 Returned location for 'L': 0xbf8ce0f8
 Returned location for 'a': 0xbf8ce0fc

Before we conclude, we should note that the painting_frame structure is well-aligned -- each of the int fields is 4 bytes in size and falls on its alignment boundary. Even the painter array has a size of 100, which is a multiple of 4. However, if this were not the case and painting_frame was not aligned (e.g. by adding a "char val[3]" at the start of the structure), then the compiler would add some padding bytes. In that case, the above offsets would also need to take into account the compiler-added alignment padding.

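This example is provided purely for the sake of helping us understand memchr() for non-string data. Note that memchr() scans every byte of the structure, including the bytes of the integer fields; had we searched for 'd' (value 100 in ASCII), it would have matched a byte of the height field rather than the 'd' in the painter name. Before accessing data structure members using raw memory, we should also pay close attention to the alignment of the data structure. For more on data structure alignment, please visit our page on C data structures.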




