CodingBison

Functions are important building blocks of the C language (and of most languages, for that matter). In C, every executable statement sits inside some function. In fact, execution begins in the main() function. All other functions are either called by main() or are called by functions that main() calls, and so on.

Functions offer several key advantages. First, they allow a programmer to keep the program modular -- each function can provide a set of code that typically does one task or a set of closely related tasks. Second, functions promote code reuse. Different parts of a program may need to do the same task and if we have a common function, then all they have to do is call that function. Third, since the common task now sits in one function, all the callers of that function automatically get consistent behavior.

We start by providing a pseudo-code for functions and then discuss its components. The pseudo-code (provided below) contains a function named foo(). The function has been intentionally simplified to help us understand how functions work.

 /* Function declaration */
 int foo (int bar);

 /* Function definition */
 int foo (int bar) {
     int output;

     /* Function Statements */
     ... 
     ...

     /* If needed, return a value */
     return (output);
 }

The pseudo-code starts by declaring the function as "int foo(int bar);". This declaration helps other functions know that there is a function named foo(). The declaration also provides a signature (or the interface) of the function. What follows the declaration is the main body of the function, also known as function definition. A definition holds a set of statements that provide the function logic; these statements are enclosed within a set of curly braces.

A function signature also contains both the list of arguments and the return value of the function. For a function to do anything useful, it often takes a set of input (called arguments or parameters) from the caller function, as in "int bar". Next, the function statements do some work on that input. And when done, the function returns the output -- in many cases, the output is nothing but a value that the function returns (using the return keyword), as in "return (output)". A function can take as many input values as needed but it can return only one value! We can visualize a function to be a black-box -- we pass some values to it and in return, it provides an output.

With that, it is time to get our hands dirty and write an example to see how functions work. This example (provided below) implements a function (get_painting_year()) that retrieves the year in which a painting was painted. This function gets called from the main() function.

 #include <stdio.h>

 /* Paintings by Leonardo da Vinci and Van Gogh */
 #define MONA_LISA            1002
 #define POTATO_EATERS        1003

 #define YEAR_MONA_LISA       1505
 #define YEAR_POTATO_EATERS   1885

 #define MAX_ELEM             2

 /* Function declaration */
 int get_painting_year(int id);

 /* Function definition */
 int get_painting_year (int param_id) {
     int ret_val;

     printf("\t[%s] Passed param: %d\n", __FUNCTION__, param_id);
     switch (param_id) {
     case MONA_LISA:
         ret_val = YEAR_MONA_LISA;
         break;
     case POTATO_EATERS:
         ret_val = YEAR_POTATO_EATERS;
         break;
     default:
         ret_val = -1;
     }
     printf("\t[%s] All done (returning %d)\n", __FUNCTION__, ret_val);
     return ret_val;
 }

 int main () {
     int painting_year, i;
     int arr_id[MAX_ELEM] = {MONA_LISA, POTATO_EATERS};

     printf("[%s] Starting to run..\n", __FUNCTION__);
     /* Loop over all the painting ids */
     for (i=0 ; i < MAX_ELEM; i++) {
         printf("[%s] i: %d  painting id: %d\n", __FUNCTION__, i, arr_id[i]);
         painting_year = get_painting_year(arr_id[i]);
         printf("[%s] painting year: %d\n", __FUNCTION__, painting_year); 
     }
     printf("[%s] All done..\n", __FUNCTION__);
     return 0;
 }

The above example starts with function declaration: "int get_painting_year(int id);". This declaration specifies that get_painting_year() takes an integer as an argument and it returns an integer value.

Next, the example provides the function definition. The function uses a switch statement to find the year for a given painting and then returns this value. If the painting does not belong to the list of cases, then the switch statement falls to the default case, where it returns -1.

Lastly, we have the definition of the main() function. Even though main() is also a function, it does not need a separate declaration. The reason is that the form of main() is fixed by the language, as in "int main(int argc, char *argv[])". Together, argc and argv specify how one can pass command-line arguments to the program: argc is the number of strings contained in argv. We can also omit these arguments entirely if we do not need to pass any command-line options. That is precisely why our definition of main() omits them. More on argc and argv a little later in this section!

The definition of main() contains an array of ids for various paintings. It uses a for loop to go over each element of the array and, for each element, it calls get_painting_year(). Lastly, it prints the value returned from each call. Every C program must have one (and only one) main() function -- it is this function that is run first when we execute the program.

Please note that since the function get_painting_year() is defined before the main() function, it is not strictly necessary to declare it in this case. However, it is good practice to do so since this way, the programmer need not worry about the ordering of functions in the given file (or a set of files).

When we compile and run the above program, the output (provided below) shows that main() gets called first. For each element in the loop, we call get_painting_year() and then print the value returned from it. Please note that __FUNCTION__ is a compiler-provided identifier (a GCC extension; standard C99 spells it __func__) that evaluates to the name of the function in which it appears, much like the macro __LINE__ evaluates to the current line number.

 [main] Starting to run..
 [main] i: 0  painting id: 1002
 	[get_painting_year] Passed param: 1002 
 	[get_painting_year] All done (returning 1505) 
 [main] painting year: 1505
 [main] i: 1  painting id: 1003
 	[get_painting_year] Passed param: 1003 
 	[get_painting_year] All done (returning 1885) 
 [main] painting year: 1885
 [main] All done..

A function may also return void (which means nothing!) instead of an integer or any other type. Here is a rewrite of the earlier program that moves the task of printing the year from main() into the get_painting_year() function. With this, the function does not need to return any value. Accordingly, we update both the declaration and the definition of the function to use void!

 #include <stdio.h>

 /* Paintings by Leonardo da Vinci and Van Gogh */
 #define MONA_LISA            1002
 #define POTATO_EATERS        1003

 #define YEAR_MONA_LISA       1505
 #define YEAR_POTATO_EATERS   1885

 #define MAX_ELEM             2

 /* Function declaration */
 void get_painting_year(int id);

 /* Function definition */
 void get_painting_year(int param_id) {
     int ret_val;

     printf("\t[%s] Passed param: %d\n", __FUNCTION__, param_id);
     switch (param_id) {
     case MONA_LISA:
         ret_val = YEAR_MONA_LISA;
         break;
     case POTATO_EATERS:
         ret_val = YEAR_POTATO_EATERS;
         break;
     default:
         ret_val = -1;
     }
     printf("\t[%s] painting year: %d\n", __FUNCTION__, ret_val); 
     printf("\t[%s] All done..\n", __FUNCTION__);
 }

 int main () {
     int i;
     int arr_id[MAX_ELEM] = {MONA_LISA, POTATO_EATERS};

     printf("[%s] Starting to run..\n", __FUNCTION__);
     /* Loop over all the painting ids */
     for (i=0 ; i < MAX_ELEM; i++) {
         printf("[%s] i: %d  painting id: %d\n", __FUNCTION__, i, arr_id[i]);
         get_painting_year(arr_id[i]);
     }
     printf("[%s] All done..\n", __FUNCTION__);
     return 0;
 }

The output in this case is the same as before, except that it is get_painting_year() that prints the value of the year.

 [main] Starting to run..
 [main] i: 0  painting id: 1002
 	[get_painting_year] Passed param: 1002 
 	[get_painting_year] painting year: 1505
 	[get_painting_year] All done..
 [main] i: 1  painting id: 1003
 	[get_painting_year] Passed param: 1003 
 	[get_painting_year] painting year: 1885
 	[get_painting_year] All done..
 [main] All done..

Functions and Scope

Functions provide an important basis for variable scoping in C.

If we define a variable inside a function, then it can be used only inside that function. On the other hand, if we define a variable outside of all functions (including main()), then its scope is global and it can be used throughout the file. Thus, depending upon where we define a variable (inside a function or outside), its scope can be different.

As flexible and wonderful as global variables sound, they are not without their share of headaches! When we keep a variable global, we risk it getting overwritten by multiple functions. Also, a global variable is more likely to cause name conflicts when we have to merge our source files with files from other code-bases -- yes, this happens in real-life software processes! So, we should use global variables as sparingly as possible.

As an example, in the earlier program, the function get_painting_year() defines the variable ret_val. This variable can be used only within this function. If we try to use it outside of this function, then that would be a compilation error.

Whenever we call the function get_painting_year(), the variable ret_val is created, and once the call returns, the variable goes out of scope; that is, it no longer exists on the stack. In some cases, we would like a variable to remember its value across different invocations.

We can do so by using the "static" keyword, which enables the variable to retain its value across different invocations of the function. Thus, defining it as "static int ret_val;" would mean that, on each call, ret_val starts with the value it had at the end of the previous invocation.

In fact, functions themselves can also be scope-limited. To do so, we can use the same "static" keyword before the function's definition or declaration -- a static function is a function that is visible only within the current source file. Referring to such a function from another file would be an error at build time (typically reported by the linker)!

Thus, if we were to make the get_painting_year() function static, then all we need to do is add the static keyword at the very start of the function. Here is the function signature; for the sake of brevity, we skip the body of the function.

 static void get_painting_year (int param_id) {
     ...
     ...
     ...
 }

We should reiterate the difference in the behavior of the static keyword when used with a function and when used with a variable inside a function. For a function, it means that the function is visible only in the current file, whereas for such a variable, it means that C remembers the value of the variable the next time we call the function.

Recursive Functions

Functions are so versatile that they can call even themselves! Such types of functions are called recursive functions.

A typical use case for such functions is when a function processes a series of data successively. It starts with the first data element of the series and then calls itself to process the next element in the series, and so on. At each call, the "remaining" series is passed as the argument to the next call of the same function.

To show a use-case, let us consider a function that computes the factorial of a number. As we know, the factorial of a number n is n * (n-1) * (n-2) * ... * 3 * 2 * 1. Thus, the factorial of 5 would be 5 * 4 * 3 * 2 * 1 = 120.

Here is a program that does the same. It uses a recursive function, compute_factorial(), to compute the factorial of a passed value. Note that since we pass the number from the command line, the main() function cannot omit its arguments (argc and argv) in its definition.

 #include <stdio.h>
 #include <stdlib.h>   /* atoi() */

 int compute_factorial(int num);

 int compute_factorial(int num) {
     int factorial = 1;

     printf("\t[%s] Passed param: %d\n", __FUNCTION__, num);
     if (num > 1) {   /* stop at 1; also avoids endless recursion for 0 or less */
         factorial = num * compute_factorial(num-1);
     }
     printf("\t[%s] All done (returning %d)\n", 
                     __FUNCTION__, factorial);
     return factorial;
 }

 int main (int argc, char *argv[]) {
     int number_passed, factorial_value;

     printf("[%s] Starting to run..\n", __FUNCTION__);
     if (argc == 2) {
         number_passed = atoi(argv[1]);
         printf("Let us calculate factorial for %d\n", number_passed);
     } else {
         printf("Error! Please enter a number.\n");
         return -1;
     }

     factorial_value = compute_factorial(number_passed);
     printf("[%s] The factorial of %s is %d\n", 
                     __FUNCTION__, argv[1], factorial_value);
     printf("[%s] All done..\n", __FUNCTION__);
     return 0;
 }

When we run this program (let us say the executable output is named factorial) for the value 5, it shows how the function calls itself for all the integers starting from 5 to 1.

 $ gcc factorial.c -o factorial
 $ 
 $ ./factorial 5
 [main] Starting to run..
 Let us calculate factorial for 5
 	[compute_factorial] Passed param: 5 
 	[compute_factorial] Passed param: 4 
 	[compute_factorial] Passed param: 3 
 	[compute_factorial] Passed param: 2 
 	[compute_factorial] Passed param: 1 
 	[compute_factorial] All done (returning 1) 
 	[compute_factorial] All done (returning 2) 
 	[compute_factorial] All done (returning 6) 
 	[compute_factorial] All done (returning 24) 
 	[compute_factorial] All done (returning 120) 
 [main] The factorial of 5 is 120
 [main] All done..

The output reflects the nature of recursive calls. Recursive functions run using the program stack. Thus, in the first call to compute_factorial(), the program pushes (adds) a context onto the stack with the value of the num variable as 5. This context (often called a stack frame) also holds the address to return to, but we will leave it at that! Since 5 is not the base case (the base case for a factorial is 1), the function calls itself again, this time with the value 4, and thus pushes another context onto the stack. It continues to do so with values 3 and 2, till it reaches 1.

And when the call reaches 1, compute_factorial() does not need to call itself since the factorial of 1 is 1, and so it simply returns 1. After this call completes, its context is popped (removed) from the stack. With that, the call-chain in the stack starts to unwind. We go back to the context with the value 2, where the result is computed (as 2 * 1), and then that context is popped as well. In the end, we reach the call that was passed the value 5. This one also pops, and control returns to the main() function.

Inline Functions

C allows us to define inline functions, where the compiler replaces occurrences of functions with the actual code of the function. Thus, inline functions are like macros of functions! We can define an inline function by placing the "inline" keyword before the return type of the function. The "inline" keyword is a request to the compiler and the compiler is free to ignore it.

 inline void print_painting_year (int param_id) {
     ...
     ...
     ...
 }

So, why do we need inline functions? Well, if the function is very small, the compiler can avoid the overhead of a function call by simply substituting the function call with the function body. Thus, expanding inline functions can make things run faster since there is no need to push or pop the return address and the parameter values on the stack. This optimization can be helpful if we have a small function that executes frequently.

However, a word of caution with inline functions. Overzealous use of inlining can lead to bloating of code since every function call would simply be replaced by the actual function code. Also, inline functions are not the friendliest functions when we have to debug the code with GDB. When the compiler substitutes every call to an inline function, the function may not have an address of its own, and so it might be difficult to set GDB breakpoints with the function name. As a workaround, we can build with compiler options that disable inlining (for example, compiling without optimization or using GCC's -fno-inline option), so that the function keeps an actual address.

Passing arguments to the main() function

The main() function can also accept input from the command line. To do so, we can add arguments to the main() function itself: an integer "argc" representing the number (or count) of arguments and "char *argv[]" representing an array of strings.

In the earlier output for the recursive function, we pass one argument to the program, as in "./factorial 5". The first element of the argv[] array is the name of the program itself! Thus, argv[0] would be "./factorial" and argv[1] would be "5". Since the arguments are strings, we need to use the atoi() function to convert the string (the "a" in the name stands for ASCII) to an integer (the "i" in the name), and hence the name, atoi!

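One can also use the argc/argv arguments to pass explicit options to the program. On POSIX systems (such as Linux), we can achieve this using the getopt() function, which is declared in the unistd.h header; it is not part of the C standard library itself.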

We provide below an example that uses the getopt() function; we pass "-l" and "-v" to the program and signal the program to do a specific task. The "-l" option prints paintings by Leonardo da Vinci and the "-v" option prints paintings by Van Gogh. Note that we split the printf string across two lines; the compiler joins adjacent string literals into one.

 #include <stdio.h>
 #include <unistd.h>   /* getopt() */

 void print_paintings_by_leonardo(void); 
 void print_paintings_by_van(void);

 void print_paintings_by_leonardo (void) {
     printf("Paintings by Leonardo da Vinci:\n"
            "Mona Lisa and The Last Supper \n");
     return;
 }

 void print_paintings_by_van (void) {
     printf("Paintings by Van Gogh:\n"
            "Potato Eaters, Cypress, and Starry Nights \n");
     return;
 }

 int main (int argc, char *argv[]) {
     int c;   /* getopt() returns an int; a char may never compare equal to -1 */

     while ((c = getopt(argc, argv, "lv")) != -1) {
         switch (c) {
         case 'l':
             print_paintings_by_leonardo();
             break;
         case 'v':
             print_paintings_by_van(); 
             break;
         default:
             printf("Incorrect option -- please pass l or v\n");
             break;
         }
     }
     return 0;
 }

Here is the output of passing these option flags to the program.

 $ gcc getopt.c -o getopt
 $ 
 $ ./getopt -l
 Paintings by Leonardo da Vinci: 
 Mona Lisa and The Last Supper 
 $ 
 $ ./getopt -v
 Paintings by Van Gogh: 
 Potato Eaters, Cypress, and Starry Nights 
 $ 
 $ ./getopt -d
 ./getopt: invalid option -- 'd'
 Incorrect option -- please pass l or v
 $ 
 $ ./getopt -lv
 Paintings by Leonardo da Vinci: 
 Mona Lisa and The Last Supper 
 Paintings by Van Gogh: 
 Potato Eaters, Cypress, and Starry Nights 
 $ 



