CodingBison

When we run a Python program, the program runs as a process. A process runs in its own context -- meaning that, by default, it can execute only one instruction at a time. If we need a program to run multiple contexts, then we can do so using threads. A program can have multiple threads and each thread can run its own task. The advantage of having multiple threads is that if one thread blocks on some input-output, then the other threads can still continue with their tasks. Thus, having multiple threads allows the process to do tasks concurrently. And, if we have multiple CPUs (cores), then threads can run independently on different CPUs, leading to faster applications.

As an example, let us consider a multi-threaded web-server handling multiple HTTP requests. The web-server can spawn multiple threads, where each thread can handle requests from various clients.

Threads belonging to a process share with each other the address space, file descriptors, etc. belonging to the parent process. Each thread, however, maintains its own stack so that it can execute common lines of code independently. Since threads do not have the overhead of maintaining their own address space, file descriptors, etc., they are also referred to as lightweight processes!

Python threads are implemented on top of native threads like POSIX threads or Windows threads. For this reason, Python does not implement any thread scheduler, thread-preemption mechanism, or thread priorities of its own -- it simply relies on the underlying operating system to provide that support.

Python provides two modules for using threads: thread (renamed _thread in Python 3) and threading. The threading module is written on top of the thread module and offers several additional features. Hence, we focus on the threading module. This page starts with a discussion of creating a single thread, followed by creating multiple threads. Next, it talks about achieving synchronization among threads so that they can access common data without overwriting it. In the end, we also discuss Python's Global Interpreter Lock (GIL).

The threading Module

In Python, the popular way to use multiple threads is to use the threading module. Let us begin by investigating this module and seeing the methods and attributes it supports.

 [user@codingtree]$ python3
 Python 3.2.3 (default, Jun  8 2012, 05:40:06) 
 [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> 
 >>> import threading
 >>> 
 >>> dir(threading)
 ['Barrier', 'BoundedSemaphore', 'BrokenBarrierError', 'Condition', 'Event', 
     'Lock', 'RLock', 'Semaphore', 'TIMEOUT_MAX', 'Thread', 'ThreadError', 
     'Timer', 'WeakSet', '_BoundedSemaphore', '_CRLock', '_Condition', 
     '_DummyThread', '_Event', '_MainThread', '_PyRLock', '_RLock', '_Semaphore', 
     '_Timer', '_VERBOSE', '_Verbose', '__all__', '__builtins__', '__cached__', 
     '__doc__', '__file__', '__name__', '__package__', '_active', 
     '_active_limbo_lock', '_after_fork', '_allocate_lock', '_counter', 
     '_dangling', '_enumerate', '_format_exc', '_get_ident', '_limbo', 
     '_newname', '_pickSomeNonDaemonThread', '_profile_hook', '_shutdown', 
     '_sleep', '_start_new_thread', '_sys', '_time', '_trace_hook', 'activeCount', 
     'active_count', 'currentThread', 'current_thread', 'enumerate', 'local', 
     'setprofile', 'settrace', 'stack_size']
 >>> 

In the above output, some of the commonly used names are: Thread(), current_thread(), and Lock(); a thread object, in turn, provides methods like start() and join(). The Thread() method creates a thread and assigns a target function to it along with function arguments. The start() method starts the thread. The join() method allows a thread (usually the parent thread) to wait for the (child) thread to finish. The current_thread() (or currentThread()) method allows us to get a handle of the thread running in the current context. Lastly, the Lock() method provides a lock that can be used to protect common data shared among multiple threads.

Here is simple pseudo-code that does the bare-minimum task of creating a thread, t, starting it, and making it run the target function, foo(). Even though it is simple code, the example still has two threads. The first thread is the main thread that calls the Thread() method to create the child thread and then waits for it. The second thread is the child thread, t, that runs the target function, foo().

 import threading

 def foo(n):
     ...
     ...

 t = threading.Thread(target=foo, args=(n,))
 t.start()
 t.join()

With that, let us write our first threaded program! In the example (provided below), we use a thread, t, that has the sleepAndRun() function as the target function. The main thread passes a number, n, to this function; the number, n, specifies the number of iterations in sleepAndRun(). The thread mimics the case of an animal that runs and, when it gets tired, takes a small nap. Using my rather limited imagination of wildlife, one such animal could be a turtle running the famous slow-and-steady race!

The example also prints the names of the two threads: the main thread and the t thread. For that, it uses the current_thread() method to get the thread running in the current context and then uses the getName() method of that thread to get its name. For better readability of the output, we add a tab to the print statements in the target function.

 from threading import Thread, current_thread
 import time
 import random

 def sleepAndRun(n):
     tName = current_thread().getName()
     print("\t[%s] Starting the thread" % (tName))
     i = 0
     while i < n:
         x = 5 * random.random()
         print("\t[%s] Let us sleep for %f seconds" % (tName, x))
         time.sleep(x)

         x = 5 * random.random()
         print("\t[%s] Let us run for %f seconds" % (tName, x))
         i += 1
     print("\t[" + tName + "] Finishing the thread")

 if __name__ == '__main__':
     tName = current_thread().getName()
     print("[%s] Starting the thread" % (tName))

     t = Thread(target=sleepAndRun, args=(5,))
     t.start()

     print("[%s] Wait for thread: %s" % (tName, t.getName()))
     t.join()
     print("[%s] Finishing the thread" % (tName))

The output (provided below) shows the workings of the two threads. The main thread (with name "MainThread") starts by spawning the child thread (with name "Thread-1") and then waits for it to finish. The child thread runs the target function and, when it is done, the join() call returns in the main thread.

 [MainThread] Starting the thread
         [Thread-1] Starting the thread
         [Thread-1] Let us sleep for 2.148944 seconds
 [MainThread] Wait for thread: Thread-1
         [Thread-1] Let us run for 4.318185 seconds
         [Thread-1] Let us sleep for 4.438070 seconds
         [Thread-1] Let us run for 2.521535 seconds
         [Thread-1] Let us sleep for 1.906382 seconds
         [Thread-1] Let us run for 0.965319 seconds
         [Thread-1] Let us sleep for 0.044034 seconds
         [Thread-1] Let us run for 1.867500 seconds
         [Thread-1] Let us sleep for 1.964782 seconds
         [Thread-1] Let us run for 3.948760 seconds
         [Thread-1] Finishing the thread
 [MainThread] Finishing the thread

The above example shows that the thread, t, has its own methods like start(), join(), and getName(). Let us use the dir() command to see all of the methods that belong to a Python thread (created using the threading module).

 >>> def foo():
 ...     print("Hello")
 ... 
 >>> t = threading.Thread(target=foo, args=())
 >>> 
 >>> dir(t)
 ['_Thread__exc_info', '_Thread__initialized', '__class__', '__delattr__', '__dict__', 
     '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', 
     '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', 
     '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', 
     '__str__', '__subclasshook__', '__weakref__', '_args', '_block', '_bootstrap', 
     '_bootstrap_inner', '_daemonic', '_delete', '_ident', '_initialized', '_kwargs', 
     '_name', '_note', '_reset_internal_locks', '_set_daemon', '_set_ident', 
     '_started', '_stderr', '_stop', '_stopped', '_target', '_verbose', 'daemon', 
     'getName', 'ident', 'isAlive', 'isDaemon', 'is_alive', 'join', 'name', 'run', 
     'setDaemon', 'setName', 'start']
 >>> 

The thread has several methods and attributes. For example, the setName() and getName() methods set and get the name of a thread.
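As a small sketch, here is how these naming methods can be used; newer code can also use the name attribute or pass name= to the Thread() constructor (the thread names below are our own, chosen for illustration):

```python
import threading

def foo():
    pass

t = threading.Thread(target=foo)
t.setName("Worker-Thread")      # older camelCase API, still supported
print(t.getName())              # Worker-Thread

# Equivalent, attribute-based style: set the name at creation time
# and read it via the name attribute.
t2 = threading.Thread(target=foo, name="Another-Worker")
print(t2.name)                  # Another-Worker
```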

Let us move on and write our next example that uses multiple threads. The example is yet another trivial abstraction of what is known as a salmon run, when salmon migrate from the ocean (North Atlantic and North Pacific) to the upper reaches of rivers to spawn. This migration is not without its risks! When they swim upstream in the river, they are often caught by bears waiting patiently for them.

To model this, the example creates two threads: t_bear and t_salmon. The thread, t_bear, mimics a bear catching a salmon. It does so by decrementing a global variable (salmonCount) since catching a fish decreases their count. The thread, t_salmon, mimics a salmon swimming upstream. It does so by incrementing the same global variable since a new incoming fish (from the ocean) jumping upstream increases the fish count that can be caught by the bears.

Thus, while one thread (t_bear) consumes salmon, the other thread (t_salmon) adds them. Hence, this becomes a producer-consumer problem; one thread produces the salmon and the other thread consumes the salmon. Lastly, to get more readable output, we add a tab to the print statements in both target functions.

 import time
 import random
 from threading import Thread, current_thread

 salmonCount = 0

 def catchSalmon(n):
     global salmonCount
     i = 0
     tName = current_thread().getName()
     print("\t[%13s] Starting the thread" % (tName))
     while i < n:
         print("\t[%13s] Total Salmon: %d" % (tName, salmonCount))
         if (salmonCount <= 0):
             x = 5 * random.random()
             print("\t[%13s] No Salmon yet. Let us wait again" % (tName))
             time.sleep(x)
         else:
             print("\t[%13s] Woohoo.. Caught a Salmon" % (tName))
             salmonCount -= 1
             i += 1
     print("\t[%13s] Finishing the thread" % (tName))

 def jumpUpstream(n):
     global salmonCount
     i = 0
     tName = current_thread().getName()
     print("\t[%13s] Starting the thread " % (tName))
     while i < n:
         x = 5 * random.random()
         time.sleep(x)
         salmonCount += 1
         print("\t[%13s] One Salmon jumped upstream" % (tName))
         print("\t[%13s] Total Salmon: %d" % (tName, salmonCount))
         i += 1
     print("\t[%13s] Finishing the thread" % (tName))

 if __name__ == '__main__':
     tName = current_thread().getName()
     print("[%s] Starting the thread " % (tName))

     t_bear = Thread(target=catchSalmon, args=(3,))
     t_bear.setName("Bear-Thread")
     t_bear.start()

     t_salmon = Thread(target=jumpUpstream, args=(3,))
     t_salmon.setName("Salmon-Thread")
     t_salmon.start()

     print("[%s] Wait for thread: %s" % (tName, t_bear.getName()))
     t_bear.join()
     print("[%s] Wait for thread: %s" % (tName, t_salmon.getName()))
     t_salmon.join()
     print("[%s] Finishing the thread" % (tName))

The output (provided below) shows that when there is no salmon, the t_bear thread simply waits for it (by calling time.sleep()). When the thread wakes up and if the salmonCount is non-zero, then it decrements it by 1. On the other hand, when the t_salmon thread wakes up, it increments the salmonCount by 1. After the completion of n such iterations for both threads, they finish and the join() call returns in the main thread.

 [MainThread] Starting the thread 
         [  Bear-Thread] Starting the thread
         [  Bear-Thread] Total Salmon: 0
         [  Bear-Thread] No Salmon yet. Let us wait again
 [MainThread] Wait for thread: Bear-Thread
         [Salmon-Thread] Starting the thread 
         [Salmon-Thread] One Salmon jumped upstream
         [Salmon-Thread] Total Salmon: 1
         [  Bear-Thread] Total Salmon: 1
         [  Bear-Thread] Woohoo.. Caught a Salmon
         [  Bear-Thread] Total Salmon: 0
         [  Bear-Thread] No Salmon yet. Let us wait again
         [Salmon-Thread] One Salmon jumped upstream
         [Salmon-Thread] Total Salmon: 1
         [Salmon-Thread] One Salmon jumped upstream
         [Salmon-Thread] Total Salmon: 2
         [Salmon-Thread] Finishing the thread
         [  Bear-Thread] Total Salmon: 2
         [  Bear-Thread] Woohoo.. Caught a Salmon
         [  Bear-Thread] Total Salmon: 1
         [  Bear-Thread] Woohoo.. Caught a Salmon
         [  Bear-Thread] Finishing the thread
 [MainThread] Wait for thread: Salmon-Thread
 [MainThread] Finishing the thread

One last note. If you try to kill the program using Ctrl-C, you might be in for a surprise. When using multiple threads in Python, Ctrl-C may take a long time to kill the process. The reason is that the main thread is blocked on the uninterruptible join() call and so it does not get scheduled to receive the Ctrl-C signal. So, if you have to kill a threaded Python program quickly, one way is to use the kill command.
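One workaround worth knowing: instead of a single blocking join(), the main thread can poll with a timeout so that it periodically regains control and can receive KeyboardInterrupt. The sketch below illustrates the pattern (the worker function and its sleep duration are our own placeholders):

```python
from threading import Thread
import time

def worker():
    time.sleep(0.2)   # stand-in for a long-running task

t = Thread(target=worker)
t.daemon = True       # daemon threads do not keep the process alive
t.start()

try:
    # join() with a timeout returns periodically even if the child is
    # still running, so a Ctrl-C can be delivered between calls.
    while t.is_alive():
        t.join(timeout=0.05)
except KeyboardInterrupt:
    print("Interrupted; daemon thread will be abandoned")
```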

Thread Synchronization

When multiple threads work on common data, it is possible that they could overwrite each other's updates. Python can switch thread context based on a couple of criteria, such as the number of bytecode instructions executed, the start of a long-running task, or the execution of embedded C routines. In the above example, the shared variable salmonCount could therefore end up with an incorrect value. This is because updating a variable is not necessarily a single bytecode instruction. If one thread is scheduled off the CPU after reading some value of salmonCount, another thread can update that value in the meantime. When the earlier thread resumes, it is still looking at the stale value. This can lead to loss of data integrity.
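We can see this non-atomicity directly with the standard dis module: a statement like salmonCount -= 1 compiles to several bytecode instructions (load, subtract, store), and a thread switch can happen between any two of them. A small sketch:

```python
import dis

salmonCount = 0

def decrement():
    # The single statement below is NOT atomic: it becomes a load,
    # a subtract, and a store at the bytecode level.
    global salmonCount
    salmonCount -= 1

# The exact opcode names vary across CPython versions, but there is
# always more than one instruction for the update.
instructions = list(dis.get_instructions(decrement))
dis.dis(decrement)
```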

Hence, we need to synchronize threads such that out of many threads, only one thread can access the common data at any given time. One solution is to use a lock that is provided by the Lock() method of the threading module.

Each thread that intends to update the common data must first acquire the lock (using its acquire() method). If the lock is already held by some other thread, then the thread blocks until the other thread releases the lock. Once a thread is done accessing the common data, it releases the lock (using its release() method).
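As an aside, a lock can also be used as a context manager: "with lock:" acquires on entry and releases on exit, even if an exception is raised in between. A minimal sketch of several threads safely incrementing a shared counter (the counter and worker names are our own):

```python
from threading import Thread, Lock

counter = 0
lock = Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # acquire() on entry, release() on exit
            counter += 1

threads = [Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # 40000 -- no updates lost
```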

Since lock.acquire() can potentially block the current thread, a thread can sometimes use a handy shortcut to check if the lock is available. The thread can pass an optional boolean parameter to this method and set it to False; if no parameter is specified, then the default value is True. Thus, lock.acquire(False) acquires the lock if it is available; otherwise, it returns immediately with a value of False.
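A small sketch of the non-blocking form; note that a plain Lock is not reentrant, so even the same thread cannot acquire it twice:

```python
from threading import Lock

lock = Lock()
lock.acquire()                  # the lock is now held

# Non-blocking attempt while the lock is held: returns False at once
# instead of blocking.
busy_attempt = lock.acquire(False)
print(busy_attempt)             # False

lock.release()
free_attempt = lock.acquire(False)   # the lock is free now
print(free_attempt)             # True
lock.release()
```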

With that, let us rewrite the earlier program and use a lock to protect the common salmonCount variable. The output of this program is similar to that of the earlier example. Hence, we omit it.

 import time
 import random
 from threading import Thread, Lock, current_thread

 salmonCount = 0
 lock = Lock()   #Common lock for both threads.

 def catchSalmon(n):
     global salmonCount
     global lock
     i = 0
     tName = current_thread().getName()
     print("\t[%13s] Starting the thread" % (tName))
     while i < n:
         print("\t[%13s] Total Salmon: %d" % (tName, salmonCount))
         if (salmonCount <= 0):
             x = 5 * random.random()
             print("\t[%13s] No Salmon yet. Let us wait again" % (tName))
             time.sleep(x)
         else:
             print("\t[%13s] Woohoo.. Caught a Salmon" % (tName))
             lock.acquire()
             salmonCount -= 1
             lock.release()
             i += 1
     print("\t[%13s] Finishing the thread" % (tName))

 def jumpUpstream(n):
     global salmonCount
     global lock
     i = 0
     tName = current_thread().getName()
     print("\t[%13s] Starting the thread " % (tName))
     while i < n:
         x = 5 * random.random()
         time.sleep(x)
         lock.acquire()
         salmonCount += 1
         lock.release()
         print("\t[%13s] One Salmon jumped upstream" % (tName))
         print("\t[%13s] Total Salmon: %d" % (tName, salmonCount))
         i += 1
     print("\t[%13s] Finishing the thread" % (tName))

 if __name__ == '__main__':
     tName = current_thread().getName()
     print("[%s] Starting the thread " % (tName))

     t_bear = Thread(target=catchSalmon, args=(3,))
     t_bear.setName("Bear-Thread")
     t_bear.start()

     t_salmon = Thread(target=jumpUpstream, args=(3,))
     t_salmon.setName("Salmon-Thread")
     t_salmon.start()

     print("[%s] Wait for thread: %s" % (tName, t_bear.getName()))
     t_bear.join()
     print("[%s] Wait for thread: %s" % (tName, t_salmon.getName()))
     t_salmon.join()
     print("[%s] Finishing the thread" % (tName))

Global Interpreter Lock (GIL)

In Python, at any given time, only one thread can run in the interpreter. This limitation is due to Python's Global Interpreter Lock (GIL). Other languages, like C or Java, do allow multiple threads to run concurrently. In fact, the GIL is specific to the C implementation of Python (CPython). The Java implementation of Python (Jython) supports true threading by using the underlying JVM.

Every thread must acquire the GIL before it can run. And if it blocks, let us say waiting for I/O, then it releases the GIL so that one of the other threads waiting in the ready state can acquire it. Acquiring the GIL is an expensive operation since it involves thread signaling. This behavior can sometimes make multi-threaded programs run slower than their non-threaded counterparts!
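In CPython, the interpreter attempts a GIL hand-off every "switch interval" seconds (5 milliseconds by default). This interval can be inspected and tuned at runtime via the sys module, as the small sketch below shows:

```python
import sys

# The default switch interval is 0.005 seconds in modern CPython.
default_interval = sys.getswitchinterval()
print(default_interval)

# A longer interval means fewer (cheaper) hand-off attempts, at the
# cost of less responsive thread switching.
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())

sys.setswitchinterval(default_interval)   # restore the default
```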

What this also means is that if we have multiple CPUs (cores), the restriction imposed by the GIL may not allow us to take full advantage of them. Threads running on other cores must still acquire the GIL, which can be even more expensive. For this reason, the GIL may prevent us from getting as much performance gain as we would expect from multiple CPUs.

It is understandable to think that if only one thread holds the GIL at a given time, then only one thread gets to access the common data. This is not the case: the GIL protects only the interpreter's internal state. Data in the application space is not protected by the GIL, so we must still protect common application data ourselves.

One way to avoid the limitation provided by GIL is to use the multiprocessing module. This module allows us to use multiple processes similar to that of having multiple threads. Since GIL is per-process, each process acquires its own GIL and continues to run concurrently.




