Generator functions rely on lazy evaluation of data sequences. If the application logic needs to build a large sequence of certain numbers, let us say Fibonacci numbers, then one way is to evaluate all the Fibonacci numbers up front and store them in a sequence. The other approach is to evaluate them one at a time, as and when needed. Generators allow us to do the latter.
Like regular functions, generator functions are also built using the "def" keyword. Unlike regular functions, generators typically use a yield statement to hand a value back to the caller and not a return statement. Note that if design warrants, a generator function can have both yield and return statements (a return simply ends the generator), but that is not a common case. The unique thing about a generator object is that it saves the state of the function at each yield. Every time we ask for the next element of the generator's sequence, the generator resumes from that saved state instead of starting over.
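To make the idea of saved state concrete, here is a minimal sketch built on the Fibonacci numbers mentioned earlier. The function name fibonacci() and its limit parameter are our own illustrative choices. The sketch also shows a bare return statement ending the generator.

```python
def fibonacci(limit):
    """Yield Fibonacci numbers that do not exceed limit (illustrative helper)."""
    a, b = 0, 1
    while True:
        if a > limit:
            return  # A bare return ends the generator.
        yield a     # Pause here; a and b are preserved until the next request.
        a, b = b, a + b

gen = fibonacci(20)
first = next(gen)    # Runs the body up to the first yield.
second = next(gen)   # Resumes right after that yield.
rest = list(gen)     # Drains whatever is left.
print(first, second, rest)   # 0 1 [1, 2, 3, 5, 8, 13]
```

Each call to next() picks up where the previous one stopped, which is why the values of a and b never need to be recomputed.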
Using a generator function helps us be more efficient in terms of memory. Why generate a large sequence when we do not need all the values right away? As long as we consume the values one at a time and do not store them, the memory requirement for a sequence of size n goes from O(n) to O(1). Also, there could be cases where computing the elements of the sequence is expensive. In that case, it once again makes sense to use generators, since we do not want to generate a large list of computationally expensive values if we do not need them immediately.
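A rough way to see the memory difference is to compare the size of a list with the size of a generator object, as in the sketch below. The helper count_up_to() is a hypothetical name. Note that sys.getsizeof() reports only the size of the container object itself (not the integers it refers to), and the exact byte counts vary between Python versions, so the numbers are indicative only.

```python
import sys

def count_up_to(n):
    """Yield 0, 1, ..., n - 1 one at a time (illustrative helper)."""
    i = 0
    while i < n:
        yield i
        i += 1

n = 100000
as_list = list(count_up_to(n))   # All n values are held in memory at once.
as_generator = count_up_to(n)    # Nothing has been computed yet.

list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_generator)
print("list object:      %d bytes" % list_bytes)
print("generator object: %d bytes" % gen_bytes)
```

The size of the generator object stays the same no matter how large n is, whereas the list grows with n.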
However, when using generators, we should tread carefully! One reason not to use a generator is that we need to walk over the generated sequence again and again. A generator can be consumed only once, so every additional pass means creating a new generator and recomputing every value. If the sequence is small and its elements are cheap to compute, then building the whole sequence once is usually the less expensive option when we use it many times, because a generator also pays a small cost to save and resume the function state on every iteration. Equally important, if building the sequence is computationally expensive, then saving the entire sequence is still more efficient if we end up using it multiple times.
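The trade-off above can be sketched as follows. The helper odd_numbers() is a hypothetical name used only for this illustration. A generator object is exhausted after a single pass, so a sequence that is needed repeatedly is better stored once in a list.

```python
def odd_numbers(n):
    """Yield the odd numbers below n (illustrative helper)."""
    for i in range(1, n, 2):
        yield i

gen = odd_numbers(10)
first_pass = list(gen)    # Consumes the generator.
second_pass = list(gen)   # Nothing is left for a second pass.
print(first_pass)         # [1, 3, 5, 7, 9]
print(second_pass)        # []

# If the values are needed more than once, compute them once and keep them.
cached = list(odd_numbers(10))
total_twice = sum(cached) + sum(cached)
print(total_twice)        # 50
```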
With that, let us get our feet wet.
We provide an example that uses both a function and a generator to return odd numbers lying between 0 and n. With the regular function, getOddNumbers(), we see that the function first builds the entire list of odd numbers and then returns it. The generator function, getOddNumbersGenerator(), however, does things differently. Using the yield statement, it yields the current odd number, saves its state, and hands control back to the caller. This way, it avoids the cost of building the entire list. This can be an efficient approach when we are dealing with lists that have, let us say, millions of numbers.
    def getOddNumbers(n):
        i = 0
        listOdd = []
        while i < n:
            i += 1
            if ((i % 2) != 0):
                listOdd.append(i)
        return listOdd

    def getOddNumbersGenerator(n):
        i = 0
        while i < n:
            i += 1
            if ((i % 2) != 0):
                yield i

    if __name__ == '__main__':
        N = 20
        print("First %d odd numbers [regular function]" % (N/2))
        result = getOddNumbers(N)
        print(result)

        print("\nFirst %d odd numbers [generator function]" % (N/2))
        generatorFunc = getOddNumbersGenerator(N)
        for elem in generatorFunc:
            print(elem)
As expected, both approaches yield (pun intended) the same output.
    First 10 odd numbers [regular function]
    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

    First 10 odd numbers [generator function]
    1
    3
    5
    7
    9
    11
    13
    15
    17
    19
So, how does a generator function do this magic? Well, a generator function is a function object like any other regular function. The difference is that, because its body contains a yield statement, calling it does not run the body. Instead, Python returns a generator object that implements the iterator protocol, that is, the __iter__() and __next__() methods. Due to this, every time a loop asks the generator object for the next value, the body runs up to the next yield and hands back that value. We can easily see these methods by using the dir() function on a generator object. Please note that the function itself is not a generator; we get a generator only when we invoke it.
Let us see an example that illustrates this behavior. The example (provided below) prints the attributes of a generator object (generatorFunc) as well as those of the function itself (varFunc). The example also calls the built-in next() function on the generator object in a loop.
    def getOddNumbersGenerator(n):
        i = 0
        while i < n:
            i += 1
            if ((i % 2) != 0):
                yield i

    if __name__ == '__main__':
        N = 20
        generatorFunc = getOddNumbersGenerator(N)
        print("Printing the Generator Object")
        print(generatorFunc)
        print(dir(generatorFunc))
        while True:
            try:
                print(next(generatorFunc))
            except StopIteration:
                print("Done! Reached End of the List")
                break

        print("\nPrinting the Function Object")
        varFunc = getOddNumbersGenerator
        print(varFunc)
        print(dir(varFunc))
When we run the above example, the output shows that the generator object has both __iter__() and __next__() methods. When we run a for loop over the generator object, it is these two methods that allow us to navigate over the elements it produces. To be more precise, the __iter__() method is called once at the beginning of the for loop and returns the iterator (for a generator, this is the generator object itself), and the __next__() method returns the next element of the sequence at each loop iteration. The output also shows that when we reach the end of the sequence, the generator raises a StopIteration exception.
Also, when we simply assign the function to the varFunc variable, we do not get any generator. This is evident when we print the attributes of varFunc: we see neither the __next__ method nor the __iter__ method.
    [user@codingbison]$ python3 generator.py
    Printing the Generator Object
    <generator object getOddNumbersGenerator at 0x9c35554>
    ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'send', 'throw']
    1
    3
    5
    7
    9
    11
    13
    15
    17
    19
    Done! Reached End of the List

    Printing the Function Object
    <function getOddNumbersGenerator at 0xb74b0c6c>
    ['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
    [user@codingbison]$
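The distinction between the function and the generator object can also be checked programmatically, without reading through the dir() listing. The sketch below uses the standard inspect module and then spells out, by hand, roughly what a for loop does behind the scenes. The name odd_numbers() is a hypothetical stand-in for getOddNumbersGenerator().

```python
import inspect

def odd_numbers(n):
    """Hypothetical stand-in for getOddNumbersGenerator()."""
    i = 0
    while i < n:
        i += 1
        if i % 2 != 0:
            yield i

gen = odd_numbers(10)

# The function is a generator function; only the object it returns is a generator.
is_gen_function = inspect.isgeneratorfunction(odd_numbers)   # True
function_is_generator = inspect.isgenerator(odd_numbers)     # False
object_is_generator = inspect.isgenerator(gen)               # True

# iter() calls __iter__(), which for a generator returns the generator itself.
same_object = iter(gen) is gen                               # True

# Roughly what "for value in gen:" does for us.
collected = []
iterator = iter(gen)
while True:
    try:
        value = next(iterator)   # Calls iterator.__next__().
    except StopIteration:
        break                    # The for loop swallows this exception silently.
    collected.append(value)
print(collected)                 # [1, 3, 5, 7, 9]
```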
We have been discussing that a generator function usually deals with application properties that are sequences in nature. But, in theory, they do not have to be. A generator is free to yield just a single value as well. In fact, the following example does exactly that. Instead of yielding a series of numbers, the getOddNumbersGenerator() function yields a single value.
    def getOddNumbersGenerator(n):
        yield None

    if __name__ == '__main__':
        N = 20
        generatorFunc = getOddNumbersGenerator(N)
        i = 0
        while i < N/2:
            i += 1
            try:
                print(next(generatorFunc))
            except StopIteration:
                print("End of the List")
When we run this program, we find that the first iteration prints the None value. Each subsequent iteration of the while loop hits the StopIteration exception, since the generator has nothing left to yield after the first value.
    None
    End of the List
    End of the List
    End of the List
    End of the List
    End of the List
    End of the List
    End of the List
    End of the List
    End of the List
In one of our earlier examples, we saw from the dir() output that a generator object supports several methods. Two of them are send() and close(). The send() method passes a value into the generator: the value becomes the result of the yield expression at which the generator is paused, and the generator can use it to update its internal state. The close() method shuts the generator down; calling next() on the object after this call raises a StopIteration exception.
Accordingly, our next example illustrates the usage of these two methods. For that, we modify our getOddNumbersGenerator() function to accept sent values. To do this, we save the result of the yield expression in a variable. To see if the caller has communicated a value to the generator, we look at this result. If it is not None, then the caller of the generator object has passed a value. If that is the case, then we reset the current counter to the passed value and thereby skip the odd numbers up to and including the passed value. We should also note that the send() call not only sends a value to the generator, but also returns the next value yielded by the generator object. In the end, we call close() to close the generator object.
    def getOddNumbersGenerator(n):
        i = 0
        while i < n:
            i += 1
            if ((i % 2) != 0):
                currVal = (yield i)
                if currVal is not None:
                    #Caller has passed a value, reset the counter.
                    print("Passed Value " + str(currVal))
                    i = currVal

    if __name__ == '__main__':
        N = 20
        generatorFunc = getOddNumbersGenerator(N)
        print("Print elements of the Generator List")
        i = 0
        while i < N/2:
            print(next(generatorFunc))
            i += 1

        print("\nPrint elements of the Generator List (send case)")
        generatorFunc = getOddNumbersGenerator(N)
        print(next(generatorFunc))
        print(next(generatorFunc))
        print(generatorFunc.send(7))
        print(next(generatorFunc))
        print(generatorFunc.send(13))
        print(next(generatorFunc))

        generatorFunc.close() #Close the generator object.
        try:
            print(next(generatorFunc))
        except StopIteration:
            print("End of Generator List")
The output shows that if we do not send any values, then the generator produces the odd values as expected. However, when we send a value, the getOddNumbersGenerator() function resets the counter and continues from there. Also, once we close the generator, calling next() on the object leads to a StopIteration exception.
    Print elements of the Generator List
    1
    3
    5
    7
    9
    11
    13
    15
    17
    19

    Print elements of the Generator List (send case)
    1
    3
    Passed Value 7
    9
    11
    Passed Value 13
    15
    17
    End of Generator List
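Two further details of send() and close() are worth a small sketch. First, a value other than None cannot be sent to a generator that has not yet reached its first yield; Python raises a TypeError in that case, so the generator has to be advanced once with next() before send() is used. Second, close() works by raising a GeneratorExit exception inside the generator at the paused yield, which means a try/finally block in the generator gets a chance to clean up. The function odd_numbers() and the log list below are hypothetical names used only for this illustration.

```python
def odd_numbers(n, log):
    """Hypothetical variant of getOddNumbersGenerator() that records cleanup."""
    i = 0
    try:
        while i < n:
            i += 1
            if i % 2 != 0:
                sent = yield i
                if sent is not None:
                    i = sent   # Caller has passed a value, reset the counter.
    finally:
        # Runs when the generator finishes or is closed after it has started.
        log.append("cleanup ran")

log = []
gen = odd_numbers(20, log)

try:
    gen.send(7)   # Too early: the generator is not paused at a yield yet.
except TypeError:
    log.append("send before first next() failed")

first = next(gen)          # 1, the generator is now paused at its first yield.
after_send = gen.send(7)   # Counter is reset to 7, so the next odd number is 9.
gen.close()                # Raises GeneratorExit inside the generator.
leftover = list(gen)       # A closed generator yields nothing more.

print(first, after_send, leftover)
print(log)
```

If the generator holds a resource such as an open file, placing the release logic in the finally block is one way to make sure close() frees it.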
In addition to using the yield statement, one can also build generators in a style that is similar to that of list comprehensions. The syntax is the same as for a list comprehension, with one difference: the expression is enclosed in parentheses instead of square brackets. Such expressions are known as generator expressions.
List comprehensions typically have the following structure: "[property_of_elem for elem in S]", where given an element, elem, of the sequence, S, we provide some property of the element, property_of_elem. We can also add an if clause towards the end, which then acts as a filter. As stated earlier, when using a comprehension to build a generator, all we need to do is replace the square brackets with parentheses. Thus, while "[property_of_elem for elem in S]" would give us a list, "(property_of_elem for elem in S)" would give us a generator.
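As a brief aside, a generator expression is often passed straight to a function that consumes an iterable, in which case the function call's own parentheses are enough. The sketch below, with variable names chosen only for illustration, sums the odd numbers below N without ever building a list, and shows that a generator expression is evaluated lazily, just like a generator function.

```python
N = 20

# The generator expression is the sole argument, so no extra parentheses are needed.
total = sum(x for x in range(N) if x % 2 != 0)
print(total)          # 100

# Values are computed only when requested.
squares = (x * x for x in range(N) if x % 2 != 0)
first_square = next(squares)    # 1 * 1
second_square = next(squares)   # 3 * 3
print(first_square, second_square)   # 1 9
```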
Here is our last example, which generates the same set of odd numbers using both a list comprehension and a generator expression.
    N = 20

    #Build a list comprehension.
    objList = [x for x in range(N) if x % 2 != 0]
    print(type(objList))
    print("Print elements of the List:")
    for elem in objList:
        print(elem)

    #Build a generator expression.
    objGenerator = (x for x in range(N) if x % 2 != 0)
    print(type(objGenerator))
    print("Print elements of the Generator:")
    for elem in objGenerator:
        print(elem)
The output confirms that the second approach actually gives us a generator instead of a regular list.
    <class 'list'>
    Print elements of the List:
    1
    3
    5
    7
    9
    11
    13
    15
    17
    19
    <class 'generator'>
    Print elements of the Generator:
    1
    3
    5
    7
    9
    11
    13
    15
    17
    19