Chapter 6: Comprehensions and Generators

Many programs are built around processing lists, dictionary key/value pairs, and sets. Python provides a special syntax, called comprehensions, for succinctly iterating through these types and creating derivative data structures. Comprehensions can significantly increase the readability of code performing these common tasks and provide a number of other benefits.

This style of processing is extended to functions with generators, which enable a stream of values to be incrementally returned by a function. The result of a call to a generator function can be used anywhere an iterator is appropriate (e.g., for loops, starred unpacking expressions). Generators can improve performance, reduce memory usage, increase readability, and simplify implementations.

Item 40: Use Comprehensions Instead of map and filter

Python provides compact syntax for deriving a new list from another sequence or iterable. These expressions are called list comprehensions. For example, say that I want to compute the square of each number in a list. Here, I do this by using a simple for loop:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = []
for x in a:
    squares.append(x**2)
print(squares)

>>>
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

With a list comprehension, I can achieve the same outcome in a single line by specifying the expression for my computation along with the input sequence variable to loop over:

squares = [x**2 for x in a]  # List comprehension
print(squares)

>>>
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Unless you’re applying a single-argument function, list comprehensions are clearer than the map built-in function for simple cases. map requires the creation of a lambda function for the computation (see Item 39: “Prefer functools.partial over lambda Expressions for Glue Functions”), which is visually noisy in comparison:

alt = map(lambda x: x**2, a)

Unlike map, list comprehensions let you easily filter items from the input list to remove corresponding outputs from the result. For example, say that I want to compute the squares of the numbers that are divisible by 2. Here, I do this by adding an if clause to the list comprehension after the loop:

even_squares = [x**2 for x in a if x % 2 == 0]
print(even_squares)

>>>
[4, 16, 36, 64, 100]

The filter built-in function can be used along with map to achieve the same outcome, but it is much harder to read due to nesting and boilerplate:

alt = map(lambda x: x**2, filter(lambda x: x % 2 == 0, a))
assert even_squares == list(alt)

Dictionaries and sets have their own equivalents of list comprehensions (called dictionary comprehensions and set comprehensions, respectively). These make it easy to create other types of derivative data structures when writing algorithms:

even_squares_dict = {x: x**2 for x in a if x % 2 == 0}
threes_cubed_set = {x**3 for x in a if x % 3 == 0}
print(even_squares_dict)
print(threes_cubed_set)

>>>
{2: 4, 4: 16, 6: 36, 8: 64, 10: 100}
{216, 729, 27}

Achieving the same outcome is possible with map and filter if you wrap each call with a corresponding constructor. These statements get so long that you have to break them up across multiple lines, which is even noisier and should be avoided:

alt_dict = dict(
    map(
        lambda x: (x, x**2),
        filter(lambda x: x % 2 == 0, a),
    )
)
alt_set = set(
    map(
        lambda x: x**3,
        filter(lambda x: x % 3 == 0, a),
    )
)

However, one benefit of the map and filter built-in functions is that they return iterators that incrementally produce one result at a time. This enables these functions to be composed together efficiently with minimal memory usage (see Item 43: “Consider Generators Instead of Returning Lists” and Item 24: “Consider itertools for Working with Iterators and Generators” for background). List comprehensions, in contrast, materialize the entire result upon evaluation, which consumes much more memory. Luckily, Python also provides a syntax that’s very similar to list comprehensions that can create infinitely long, memory-efficient streams of values (see Item 44: “Consider Generator Expressions for Large List Comprehensions”).
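A short sketch of this incremental behavior, using the same list `a` from above:

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# map and filter compose lazily: no squares are computed until iteration
lazy = map(lambda x: x**2, filter(lambda x: x % 2 == 0, a))

first = next(lazy)  # Processes input only until the first even value
assert first == 4
```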

Things to Remember

  • List comprehensions are clearer than the map and filter built-in functions because they don’t require lambda expressions.

  • List comprehensions allow you to easily skip items from the input list by using if clauses, a behavior that map doesn’t support without help from filter.

  • Dictionaries and sets may also be created using comprehensions.

  • List comprehensions materialize the full result when evaluated, which can use a significant amount of memory compared to an iterator that produces each output incrementally.

Item 41: Avoid More Than Two Control Subexpressions in Comprehensions

Beyond basic usage (see Item 40: “Use Comprehensions Instead of map and filter”), comprehensions also support multiple levels of looping. For example, say that I want to simplify a matrix (a list containing other list instances) into one flat list of all items. Here, I do this with a list comprehension by including two for subexpressions. These subexpressions run in the order provided, from left to right:

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
flat = [x for row in matrix for x in row]
print(flat)

>>>
[1, 2, 3, 4, 5, 6, 7, 8, 9]

This example is simple, readable, and a reasonable usage of multiple loops in a comprehension. Another reasonable usage of multiple loops involves replicating the two-level-deep layout of the input list. For example, say that I want to square the value in each cell of a two-dimensional matrix. This comprehension is noisier because of the extra [] characters, but it’s still relatively easy to read:

squared = [[x**2 for x in row] for row in matrix]
print(squared)

>>>
[[1, 4, 9], [16, 25, 36], [49, 64, 81]]

If this comprehension included another loop, it would get so long that I’d have to split it over multiple lines:

my_lists = [
    [[1, 2, 3], [4, 5, 6]],
    ...
]
flat = [x for sublist1 in my_lists
        for sublist2 in sublist1
        for x in sublist2]

At this point, the multiline comprehension isn’t much shorter than the alternative. Here, I produce the same result using normal loop statements. The indentation of this version makes the looping clearer than the three-level-list comprehension above:

flat = []
for sublist1 in my_lists:
    for sublist2 in sublist1:
        flat.extend(sublist2)

Comprehensions support multiple if conditions. Multiple conditions at the same loop level have an implicit and expression. For example, say that I want to filter a list of numbers to only even values greater than 4. These two list comprehensions are equivalent:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [x for x in a if x > 4 if x % 2 == 0]
c = [x for x in a if x > 4 and x % 2 == 0]

Conditions can be specified at each level of looping after the for subexpression. For example, say that I want to filter a matrix so the only cells remaining are those divisible by 4 in rows that sum to 10 or higher. Expressing this with a list comprehension does not require a lot of code, but it is extremely difficult to read:

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
filtered = [[x for x in row if x % 4 == 0]
            for row in matrix if sum(row) >= 10]
print(filtered)

>>>
[[4], [8]]

Although this example is a bit convoluted, in practice you’ll see situations arise where such comprehensions seem like a good fit. I strongly encourage you to avoid using list, dictionary, or set comprehensions that look like this. The resulting code is very difficult for new readers to understand. The potential for confusion is especially great with a dictionary comprehension since it already needs an extra parameter to represent both the key and the value for each item.

The rule of thumb is to avoid using more than two control subexpressions in a comprehension. This could be two conditions, two loops, or one condition and one loop. As soon as it gets more complicated than that, you should use normal if and for statements and write a helper function (see Item 43: “Consider Generators Instead of Returning Lists”).
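For example, the matrix-filtering logic above could move into a helper function like this (a sketch; the function name is my own):

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

def filter_matrix(matrix):
    # Keep only cells divisible by 4, in rows that sum to 10 or higher
    result = []
    for row in matrix:
        if sum(row) >= 10:
            result.append([x for x in row if x % 4 == 0])
    return result

assert filter_matrix(matrix) == [[4], [8]]
```

Each level of looping and filtering now has its own indented statement, which makes the logic much easier to follow than the equivalent nested comprehension.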

Things to Remember

  • Comprehensions support multiple levels of loops and multiple conditions per loop level.

  • Comprehensions with more than two control subexpressions are very difficult to read and should be avoided.

Item 42: Reduce Repetition in Comprehensions with Assignment Expressions

A common pattern with comprehensions—including list, dictionary, and set variants—is the need to reference the same computation in multiple places. For example, say that I’m writing a program to manage orders for a fastener company. As new orders come in from customers, I need to be able to tell them whether or not I can fulfill their orders. Concretely, imagine that I need to verify that a request is sufficiently in stock and above the minimum threshold for shipping (e.g., in batches of 8), like this:

stock = {
    "nails": 125,
    "screws": 35,
    "wingnuts": 8,
    "washers": 24,
}

order = ["screws", "wingnuts", "clips"]

def get_batches(count, size):
    return count // size

result = {}
for name in order:
    count = stock.get(name, 0)
    batches = get_batches(count, 8)
    if batches:
        result[name] = batches

print(result)

>>>
{'screws': 4, 'wingnuts': 1}

Here, I implement this looping logic more succinctly by using a dictionary comprehension (see Item 40: “Use Comprehensions Instead of map and filter” for best practices):

found = {name: get_batches(stock.get(name, 0), 8)
         for name in order
         if get_batches(stock.get(name, 0), 8)}
print(found)

>>>
{'screws': 4, 'wingnuts': 1}

Although this code is more compact, the problem with it is that the get_batches(stock.get(name, 0), 8) expression is repeated. This hurts readability by adding visual noise and is technically unnecessary. The duplication also increases the likelihood of introducing a bug if the two expressions aren’t kept in sync. For example, here I’ve changed the first get_batches call to have 4 as its second parameter instead of 8, which causes the results to be different:

has_bug = {name: get_batches(stock.get(name, 0), 4)  # Wrong
           for name in order
           if get_batches(stock.get(name, 0), 8)}

print("Expected:", found)
print("Found:   ", has_bug)

>>>
Expected: {'screws': 4, 'wingnuts': 1}
Found:    {'screws': 8, 'wingnuts': 2}

An easy solution to these problems is to use an assignment expression—often called the walrus operator—as part of the comprehension (see Item 8: “Prevent Repetition with Assignment Expressions” for background):

found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}

The assignment expression (batches := get_batches(...)) allows me to look up the value for each order key in the stock dictionary a single time, call get_batches once, and then store its corresponding value in the batches variable. I can then reference that variable elsewhere in the comprehension to construct the dictionary’s contents instead of having to call get_batches a second time. Eliminating the redundant calls to get and get_batches may also improve performance by avoiding unnecessary computations for each item in order.
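One way to verify the reduced number of calls is with a simple counter (a sketch; the module-level counter is my own addition for illustration):

```python
stock = {
    "nails": 125,
    "screws": 35,
    "wingnuts": 8,
    "washers": 24,
}
order = ["screws", "wingnuts", "clips"]

call_count = 0

def get_batches(count, size):
    global call_count
    call_count += 1  # Track how many times this helper runs
    return count // size

found = {name: batches for name in order
         if (batches := get_batches(stock.get(name, 0), 8))}

assert found == {"screws": 4, "wingnuts": 1}
assert call_count == 3  # One call per item in order, not two
```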

It’s valid syntax to define an assignment expression in the value expression for a comprehension. But if you try to reference the variable it defines (tenth) in other parts of the comprehension, you might get an exception at runtime because of the order in which comprehensions are evaluated:

result = {name: (tenth := count // 10)
          for name, count in stock.items() if tenth > 0}

>>>
Traceback ...
NameError: name 'tenth' is not defined

You can fix this example by moving the assignment expression into the condition and then referencing the variable name it defined (tenth) in the comprehension’s value expression:

result = {name: tenth for name, count in stock.items()
          if (tenth := count // 10) > 0}
print(result)

>>>
{'nails': 12, 'screws': 3, 'washers': 2}

When a comprehension uses walrus operators, any corresponding variable names will be leaked into the containing scope (see Item 33: “Know How Closures Interact with Variable Scope and nonlocal” for background):

half = [(squared := last**2)
        for count in stock.values()
        if (last := count // 2) > 10]
print(f"Last item of {half} is {last} ** 2 = {squared}")

>>>
Last item of [3844, 289, 144] is 12 ** 2 = 144

The leakage of these variable names is similar to what happens with a normal for loop:

for count in stock.values():
    last = count // 2
    squared = last**2

print(f"{count} // 2 = {last}; {last} ** 2 = {squared}")

>>>
24 // 2 = 12; 12 ** 2 = 144

However, this leakage behavior can be surprising because when comprehensions don’t use assignment expressions, the loop variable names won’t leak like that (see Item 20: “Never Use for Loop Variables After the Loop Ends” and Item 84: “Beware of Exception Variables Disappearing” for more background):

half = [count // 2 for count in stock.values()]
print(half)   # Works
print(count)  # Exception because loop variable didn't leak

>>>
[62, 17, 4, 12]
Traceback ...
NameError: name 'count' is not defined

Using an assignment expression also works the same way in generator expressions (see Item 44: “Consider Generator Expressions for Large List Comprehensions”). Here, I create an iterator of pairs containing the item name and the current count in stock instead of a dict instance:

found = ((name, batches) for name in order
         if (batches := get_batches(stock.get(name, 0), 8)))
print(next(found))
print(next(found))

>>>
('screws', 4)
('wingnuts', 1)

Things to Remember

  • Assignment expressions make it possible for comprehensions and generator expressions to reuse the value from one condition elsewhere in the same comprehension, which can improve readability and performance.

  • Although it’s possible to use an assignment expression outside of a comprehension or generator expression’s condition, you should avoid doing so because it doesn’t work reliably.

  • In comprehensions, variables from assignment expressions will leak into the enclosing scope; in contrast, comprehension loop variables don’t leak.

Item 43: Consider Generators Instead of Returning Lists

The simplest choice for a function that produces a sequence of results is to return a list of items. For example, say that I want to find the index of every word in a string. Here, I accumulate results in a list by using the append method and return it at the end of the function:

def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == " ":
            result.append(index + 1)
    return result

This works as expected for some sample input:

address = "Four score and seven years ago..."
result = index_words(address)
print(result[:10])

>>>
[0, 5, 11, 15, 21, 27, 31, 35, 43, 51]

There are two problems with the index_words function above.

The first problem is that the code is a bit dense and noisy. Each time a new result is found, I call the append method. The method call’s bulk (result.append) deemphasizes the value being added to the list (index + 1). There is one line for creating the result list and another for returning it. While the function body contains around 130 characters (without whitespace), only around 75 characters are important.

A better way to write this function is by using a generator, which is a function that uses yield expressions to incrementally produce outputs. Here, I define a generator version of the function that achieves the same result as before:

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

When called, a generator function does not actually run but instead immediately returns an iterator. With each call to the next built-in function, the iterator advances the generator to its next yield expression. Each value passed to yield by the generator is returned by the iterator to the caller of next:

it = index_words_iter(address)
print(next(it))
print(next(it))

>>>
0
5

The index_words_iter function is significantly easier to read because all interactions with the result list have been eliminated. Results are passed to yield expressions instead. I can easily convert the iterator returned by the generator to a list by passing it to the list built-in function if necessary (see Item 44: “Consider Generator Expressions for Large List Comprehensions” for how this works):

result = list(index_words_iter(address))
print(result[:10])

>>>
[0, 5, 11, 15, 21, 27, 31, 35, 43, 51]

The second problem with index_words is that it requires all results to be stored in the list before being returned. For huge inputs, this can cause a program to run out of memory and crash.

In contrast, a generator version of this function can easily be adapted to take inputs of arbitrary length due to its bounded memory requirements. For example, here I define a generator that streams input from a file one line at a time and yields outputs one word at a time:

def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == " ":
                yield offset

The working memory for this function is limited to the maximum length of one line of input instead of the entire input file’s contents. Here, I show that running the generator on a file input produces the same results (see Item 24: “Consider itertools for Working with Iterators and Generators” for more about the islice function):

import itertools

with open("address.txt", "r") as f:
    it = index_file(f)
    results = itertools.islice(it, 0, 10)
    print(list(results))

>>>
[0, 5, 11, 15, 21, 27, 31, 35, 43, 51]

The only gotcha with defining generators like this is that the callers must be aware that the iterators returned are stateful and can’t be reused (see Item 21: “Be Defensive when Iterating over Arguments”).
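For instance, a second pass over the same iterator silently produces nothing (a minimal demonstration using the generator from above):

```python
def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == " ":
            yield index + 1

it = index_words_iter("Four score")
assert list(it) == [0, 5]
assert list(it) == []  # The iterator is exhausted; no error is raised
```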

Things to Remember

  • Using generators can be clearer than the alternative of having a function return a list of accumulated results.

  • The iterator returned by a generator produces the set of values passed to yield expressions within the generator function’s body.

  • A generator can produce a sequence of outputs for arbitrarily large inputs because its working memory doesn’t include a materialization of all prior inputs and outputs.

Item 44: Consider Generator Expressions for Large List Comprehensions

One problem with list comprehensions (see Item 40: “Use Comprehensions Instead of map and filter”) is that they create new list instances that may potentially contain an item for each value in their input sequences. This is fine for small inputs, but for large inputs, this behavior might consume significant amounts of memory and cause a program to crash.

For example, say that I want to read a file and return the number of characters on each line. Here, I use a list comprehension to implement this logic:

value = [len(x) for x in open("my_file.txt")]
print(value)

>>>
[100, 57, 15, 1, 12, 75, 5, 86, 89, 11]

This code requires holding the length of every line of the file in memory. If the file is absolutely enormous or perhaps a never-ending network socket, it won’t work. To solve this issue, Python provides generator expressions, which build on the syntax of list comprehensions and the behavior of generators. Generator expressions don’t materialize the whole output sequence when they’re run. Instead, generator expressions evaluate to an iterator that yields one item at a time from the expression.

You create a generator expression by putting list-comprehension-like syntax between () characters. Here, I use a generator expression that is equivalent to the code above. However, the generator expression immediately evaluates to an iterator, doesn’t make any forward progress, and has little memory overhead:

it = (len(x) for x in open("my_file.txt"))
print(it)

>>>
<generator object <genexpr> at 0x104f37510>

The returned iterator can be advanced one step at a time to produce the next output from the generator expression, as needed (using the next built-in function). I can consume as much of the generator expression as I want without risking a blowup in memory usage:

print(next(it))
print(next(it))

>>>
100
57

Another powerful outcome of generator expressions is that they can be composed together. Here, I take the iterator returned by the generator expression above and use it as the input for another generator expression:

roots = ((x, x**0.5) for x in it)

Each time I advance this iterator, it also advances the interior iterator, creating a domino effect of looping, evaluating expressions, and passing around inputs and outputs, all while being as memory efficient as possible:

print(next(roots))

>>>
(15, 3.872983346207417)

Code that chains generators together like this executes very quickly in Python. When you’re looking for a way to compose functionality that’s operating on a large stream of input, generator expressions are the best tool for the job (see Item 23: “Pass Iterators to any and all for Efficient Short-Circuiting Logic” and Item 24: “Consider itertools for Working with Iterators and Generators” for more examples). The only gotcha is that the iterators returned by generator expressions are stateful, so you must be careful not to use such an iterator more than once (see Item 21: “Be Defensive when Iterating over Arguments”).

Things to Remember

  • List comprehensions can cause problems for large inputs by using too much memory.

  • Generator expressions avoid memory issues by producing outputs one at a time as iterators.

  • Generator expressions can be composed by passing the iterator from one generator expression into the for subexpression of another.

  • Generator expressions execute very quickly when chained together and are memory efficient.

Item 45: Compose Multiple Generators with yield from

Generators provide a variety of benefits (see Item 43: “Consider Generators Instead of Returning Lists”) and solutions to common problems (see Item 21: “Be Defensive when Iterating over Arguments”). Generators are so useful that many programs start to look like layers of generators strung together.

For example, say that I have a graphical program that’s using generators to animate the movement of images onscreen. To get the visual effect I’m looking for, I need the images to move quickly at first, pause temporarily, and then continue moving at a slower pace. Here, I define two generators that yield the expected onscreen deltas for each part of this animation:

def move(period, speed):
    for _ in range(period):
        yield speed

def pause(delay):
    for _ in range(delay):
        yield 0

To create the final animation, I need to combine move and pause together to produce a single sequence of onscreen deltas. Here, I do this by calling a generator for each step of the animation, iterating over each generator in turn, and then yielding the deltas from all of them in sequence:

def animate():
    for delta in move(4, 5.0):
        yield delta
    for delta in pause(3):
        yield delta
    for delta in move(2, 3.0):
        yield delta

Now, I can render those deltas onscreen as they’re produced by the single animation generator:

def render(delta):
    print(f"Delta: {delta:.1f}")
    # Move the images onscreen
    ...

def run(func):
    for delta in func():
        render(delta)

run(animate)

>>>
Delta: 5.0
Delta: 5.0
Delta: 5.0
Delta: 5.0
Delta: 0.0
Delta: 0.0
Delta: 0.0
Delta: 3.0
Delta: 3.0

The problem with this code is the repetitive nature of the animate function. The redundancy of the for statements and yield expressions for each generator adds noise and reduces readability. This example includes only three nested generators, and it’s already hurting clarity; a complex animation with a dozen phases or more would be extremely difficult to follow.

The solution to this problem is to use the yield from expression. This advanced generator feature allows you to yield all values from a nested generator before returning control to the parent generator. Here, I reimplement the animation function by using yield from:

def animate_composed():
    yield from move(4, 5.0)
    yield from pause(3)
    yield from move(2, 3.0)

run(animate_composed)

>>>
Delta: 5.0
Delta: 5.0
Delta: 5.0
Delta: 5.0
Delta: 0.0
Delta: 0.0
Delta: 0.0
Delta: 3.0
Delta: 3.0

The result is the same as before, but now the code is clearer and more intuitive. yield from essentially instructs the Python interpreter to do the nested for loop and yield expressions for you, resulting in slightly faster execution as well. If you find yourself composing generators, I strongly encourage you to use yield from when possible.
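A micro-benchmark can show the difference yourself (a sketch; timings vary by machine, so no specific numbers are claimed here):

```python
import timeit

def child():
    for i in range(100_000):
        yield i

def slow():
    # Manual nesting: a for loop with explicit yields
    for item in child():
        yield item

def fast():
    # Composed with yield from
    yield from child()

baseline = timeit.timeit("for _ in slow(): pass", globals=globals(), number=10)
comparison = timeit.timeit("for _ in fast(): pass", globals=globals(), number=10)
print(f"Manual nesting {baseline:.3f}s vs. yield from {comparison:.3f}s")
```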

Things to Remember

  • The yield from expression allows you to compose multiple nested generators together into a single combined generator.

  • yield from eliminates the boilerplate required for manually iterating nested generators and yielding their outputs.

Item 46: Pass Iterators into Generators as Arguments Instead of Calling the send Method

yield expressions provide generator functions with a simple way to produce an iterable series of output values (see Item 43: “Consider Generators Instead of Returning Lists”). However, this channel appears to be unidirectional: There’s no immediately obvious way to simultaneously stream data in and out of a generator as it runs. Having such bidirectional communication could be valuable in a variety of situations.

For example, say that I’m writing a program to transmit signals using a software-defined radio. Here, I use a function to generate an approximation of a sine wave with a given number of points:

import math

def wave(amplitude, steps):
    step_size = 2 * math.pi / steps
    for step in range(steps):
        radians = step * step_size
        fraction = math.sin(radians)
        output = amplitude * fraction
        yield output

Now, I can transmit the wave signal at a single specified amplitude by iterating over the wave generator:

def transmit(output):
    if output is None:
        print("Output is None")
    else:
        print(f"Output: {output:>5.1f}")

def run(it):
    for output in it:
        transmit(output)

run(wave(3.0, 8))

>>>
Output:   0.0
Output:   2.1
Output:   3.0
Output:   2.1
Output:   0.0
Output:  -2.1
Output:  -3.0
Output:  -2.1

This works fine for producing basic waveforms, but it can’t be used to constantly vary the amplitude of the wave based on a separate input (i.e., as required to broadcast AM radio signals). I need a way to modulate the amplitude on each iteration of the generator.

Python generators support the send method, which upgrades yield expressions into a two-way channel. The send method can be used to provide streaming inputs to a generator at the same time it’s yielding outputs. Normally, when iterating a generator, the value of the yield expression is None:

def my_generator():
    received = yield 1
    print(f"{received=}")

it = my_generator()
output = next(it)  # Get first generator output
print(f"{output=}")

try:
    next(it)       # Run generator until it exits
except StopIteration:
    pass

>>>
output=1
received=None

When I call the send method instead of iterating the generator with a for loop or the next built-in function, the supplied parameter becomes the value of the yield expression when the generator is resumed. However, when the generator first starts, a yield expression has not been encountered yet, so the only valid value for the initial call to send is None. (Any other argument would raise an exception at runtime.) Here, I run the same generator as above, but using send instead of next to progress it forward:

it = my_generator()
output = it.send(None)  # Get first generator output
print(f"{output=}")

try:
    it.send("hello!")   # Send value into the generator
except StopIteration:
    pass

>>>
output=1
received='hello!'

I can take advantage of this behavior in order to modulate the amplitude of the sine wave based on an input signal. First, I need to change the wave generator to save the amplitude returned by the yield expression and use it to calculate the next generated output:

def wave_modulating(steps):
    step_size = 2 * math.pi / steps
    amplitude = yield              # Receive initial amplitude
    for step in range(steps):
        radians = step * step_size
        fraction = math.sin(radians)
        output = amplitude * fraction
        amplitude = yield output   # Receive next amplitude

Then, I need to update the run function to stream the modulating amplitude into the wave_modulating generator on each iteration. The first input to send must be None since a yield expression would not have occurred within the generator yet:

def run_modulating(it):
    amplitudes = [None, 7, 7, 7, 2, 2, 2, 2, 10, 10, 10, 10, 10]
    for amplitude in amplitudes:
        output = it.send(amplitude)
        transmit(output)

run_modulating(wave_modulating(12))

>>>
Output is None
Output:   0.0
Output:   3.5
Output:   6.1
Output:   2.0
Output:   1.7
Output:   1.0
Output:   0.0
Output:  -5.0
Output:  -8.7
Output: -10.0
Output:  -8.7
Output:  -5.0

This works; it properly varies the output amplitude based on the input signal. The first output is None, as expected, because a value for amplitude wasn’t received by the generator until after the initial yield expression.

One problem with this code is that it’s difficult for new readers to understand: Using yield on the right side of an assignment statement isn’t intuitive, and it’s hard to see the connection between yield and send without already knowing the details of this advanced generator feature.

Now, imagine that the program’s requirements get more complicated. Instead of using a simple sine wave as my carrier, I need to use a complex waveform consisting of multiple signals in sequence. One way to implement this behavior is by composing multiple generators together with the yield from expression (see Item 45: “Compose Multiple Generators with yield from”). Here, I confirm that this works as expected in the simpler case where the amplitude is fixed:

def complex_wave():
    yield from wave(7.0, 3)
    yield from wave(2.0, 4)
    yield from wave(10.0, 5)

run(complex_wave())

>>>
Output:   0.0
Output:   6.1
Output:  -6.1
Output:   0.0
Output:   2.0
Output:   0.0
Output:  -2.0
Output:   0.0
Output:   9.5
Output:   5.9
Output:  -5.9
Output:  -9.5

Given that the yield from expression handles the simpler case, you may expect it to also work properly along with the generator send method. Here, I try to use yield from to compose multiple calls to the wave_modulating generator (that uses send):

def complex_wave_modulating():
    yield from wave_modulating(3)
    yield from wave_modulating(4)
    yield from wave_modulating(5)

run_modulating(complex_wave_modulating())

>>>
Output is None
Output:   0.0
Output:   6.1
Output:  -6.1
Output is None
Output:   0.0
Output:   2.0
Output:   0.0
Output: -10.0
Output is None
Output:   0.0
Output:   9.5
Output:   5.9

This works to some extent, but the result contains a big surprise: There are many None values in the output! Why does this happen? When each yield from expression finishes iterating over a nested generator, it moves on to the next one. Each nested generator starts with a bare yield expression—one without a value—in order to receive the initial amplitude from a generator send method call. This causes the parent generator to output a None value when it transitions between child generators.
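The effect can be reproduced in isolation (a minimal sketch, separate from the wave example):

```python
def child():
    received = yield  # Bare yield: the parent's caller sees None here
    yield received

def parent():
    yield from child()
    yield from child()

it = parent()
assert it.send(None) is None   # First child's bare yield
assert it.send("a") == "a"
assert next(it) is None        # Transition: second child's bare yield leaks None
assert it.send("b") == "b"
```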

This means that your assumptions about how the yield from and send features behave individually will be broken if you try to use them together. Although it’s possible to work around this None problem by increasing the complexity of the run_modulating function, it’s not worth the trouble. It’s already difficult for new readers of the code to understand how send works. This surprising gotcha with yield from makes it even worse. My advice is to avoid the send method entirely and go with a simpler approach.

The easiest solution is to pass an iterator into the wave function. The iterator should return an input amplitude each time the next built-in function is called on it. This arrangement ensures that each generator is progressed in a cascade as inputs and outputs are processed (see Item 44: “Consider Generator Expressions for Large List Comprehensions” and Item 23: “Pass Iterators to any and all for Efficient Short-Circuiting Logic” for other examples):

def wave_cascading(amplitude_it, steps):
    step_size = 2 * math.pi / steps
    for step in range(steps):
        radians = step * step_size
        fraction = math.sin(radians)
        amplitude = next(amplitude_it)  # Get next input
        output = amplitude * fraction
        yield output

I can pass the same iterator into each of the generator functions that I’m trying to compose together with yield from. Iterators are stateful, and thus each of the nested generators picks up where the previous generator left off (see Item 21: “Be Defensive when Iterating over Arguments” for background):

def complex_wave_cascading(amplitude_it):
    yield from wave_cascading(amplitude_it, 3)
    yield from wave_cascading(amplitude_it, 4)
    yield from wave_cascading(amplitude_it, 5)

Now, I can run the composed generator by simply passing in an iterator from the amplitudes list:

def run_cascading():
    amplitudes = [7, 7, 7, 2, 2, 2, 2, 10, 10, 10, 10, 10]
    it = complex_wave_cascading(iter(amplitudes))  # Iterator
    for amplitude in amplitudes:
        output = next(it)
        transmit(output)

run_cascading()

>>>
Output:   0.0
Output:   6.1
Output:  -6.1
Output:   0.0
Output:   2.0
Output:   0.0
Output:  -2.0
Output:   0.0
Output:   9.5
Output:   5.9
Output:  -5.9
Output:  -9.5

The best part about this approach is that the input iterator can come from anywhere and could be completely dynamic (e.g., implemented using a generator function or composed from other iterators; see Item 24: “Consider itertools for Working with Iterators and Generators”). The only downside is that this code assumes that the input generator is thread safe, which may not be the case. If you need to cross thread boundaries, async functions may be a better fit (see Item 77: “Mix Threads and Coroutines to Ease the Transition to asyncio”).
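To illustrate the dynamic case, here is a sketch where the amplitudes are produced lazily by another generator instead of a pre-built list. It redefines wave_cascading so the snippet stands alone, and the `ramp` input source is hypothetical:

```python
import math

def wave_cascading(amplitude_it, steps):
    step_size = 2 * math.pi / steps
    for step in range(steps):
        radians = step * step_size
        fraction = math.sin(radians)
        amplitude = next(amplitude_it)  # Get next input
        yield amplitude * fraction

# Hypothetical dynamic source: amplitudes computed lazily, never materialized
def ramp(start, step):
    value = start
    while True:
        yield value
        value += step

for output in wave_cascading(ramp(1.0, 1.0), 4):
    print(f"Output: {output:>5.1f}")
```

Because the amplitude source is just an iterator, it could equally be backed by a file, a socket, or a sensor poll, without any change to wave_cascading.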

Things to Remember

  • The send method can be used to inject data into a generator by giving the yield expression a value that can be assigned to a variable.

  • Using send with yield from expressions may cause surprising behavior, such as None values appearing at unexpected times in the generator output.

  • Providing an input iterator to a set of composed generators is a better approach than using the send method, which should be avoided.

Item 47: Manage Iterative State Transitions with a Class Instead of the Generator throw Method

In addition to yield from expressions (see Item 45: “Compose Multiple Generators with yield from”) and the send method (see Item 46: “Pass Iterators into Generators as Arguments Instead of Calling the send Method”), another advanced generator feature is the throw method for re-raising Exception instances within generator functions. The way throw works is simple: When the method is called, the next occurrence of a yield expression inside the generator re-raises the provided Exception instance after its output is received instead of continuing normally. Here, I show a simple example of this behavior in action:

class MyError(Exception):
    pass

def my_generator():
    yield 1
    yield 2
    yield 3

it = my_generator()
print(next(it))                         # Yields 1
print(next(it))                         # Yields 2
print(it.throw(MyError("test error")))  # Raises

>>>
1
2
Traceback ...
MyError: test error

When you call throw, the generator function may catch the injected exception with a standard try/except compound statement that surrounds the last yield expression that was executed (see Item 80: “Take Advantage of Each Block in try/except/else/finally” for more about exception handling):

def my_generator():
    yield 1

    try:
        yield 2
    except MyError:
        print("Got MyError!")
    else:
        yield 3

    yield 4

it = my_generator()
print(next(it))                         # Yields 1
print(next(it))                         # Yields 2
print(it.throw(MyError("test error")))  # Yields 4

>>>
1
2
Got MyError!
4

This functionality provides a two-way communication channel between a generator and its caller that can be useful in certain situations. For example, imagine that I need a timer program that supports sporadic resets. Here, I implement this behavior by defining a generator that relies on Reset exceptions to be raised when the yield expression is evaluated:

class Reset(Exception):
    pass

def timer(period):
    current = period
    while current:
        try:
            yield current
        except Reset:
            print("Resetting")
            current = period
        else:
            current -= 1

Whenever the throw method is called on the generator with a Reset exception, the counter is restarted in the except block. Here, I define a driver function that iterates the timer generator, announces progress at each step, and injects reset events that might be caused by an externally polled input (such as a button):

def check_for_reset():
    # Poll for external event
    ...

def announce(remaining):
    print(f"{remaining} ticks remaining")

def run():
    it = timer(4)
    while True:
        try:
            if check_for_reset():
                current = it.throw(Reset())
            else:
                current = next(it)
        except StopIteration:
            break
        else:
            announce(current)

run()

>>>
4 ticks remaining
3 ticks remaining
2 ticks remaining
Resetting
4 ticks remaining
3 ticks remaining
Resetting
4 ticks remaining
3 ticks remaining
2 ticks remaining
1 ticks remaining

This code works as expected, but it’s much harder to read than necessary. The various levels of nesting required to catch StopIteration exceptions or decide which function to call make the code noisy.

A simpler approach to implementing this functionality is to create a basic class to manage the timer’s state and enable state transitions. Here, I define a class with a tick method to step the timer, a reset method to restart the clock, and the __bool__ special method to check whether the timer has elapsed (see Item 57: “Inherit from collections.abc Classes for Custom Container Types” for background):

class Timer:
    def __init__(self, period):
        self.current = period
        self.period = period

    def reset(self):
        print("Resetting")
        self.current = self.period

    def tick(self):
        before = self.current
        self.current -= 1
        return before

    def __bool__(self):
        return self.current > 0

Now, the run method can use the Timer object as the test expression in the while statement; the code in the loop body is much easier to follow because of the reduction in the levels of nesting:

def run():
    timer = Timer(4)
    while timer:
        if check_for_reset():
            timer.reset()

        announce(timer.tick())

run()

>>>
4 ticks remaining
3 ticks remaining
2 ticks remaining
Resetting
4 ticks remaining
3 ticks remaining
Resetting
4 ticks remaining
3 ticks remaining
2 ticks remaining
1 ticks remaining

The output matches the earlier version using throw, but this implementation is much easier to understand, especially for new readers of the code. I suggest that you avoid using throw entirely and instead use a stateful class if you need this type of exceptional behavior (see Item 89: “Always Pass Resources into Generators and Have Callers Clean Them Up Outside” for another reason). Otherwise, if you really need more advanced cooperation between generator-like functions, it’s worth considering Python’s asynchronous features (see Item 75: “Achieve Highly Concurrent I/O with Coroutines”).
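If for-loop iteration is still desired, the stateful class can implement the iterator protocol itself. This variant goes beyond the code above (the `CountdownTimer` name and its `__next__` method are my own sketch, not from the original example):

```python
class CountdownTimer:
    # Hypothetical variant of Timer that is directly iterable
    def __init__(self, period):
        self.current = period
        self.period = period

    def reset(self):
        self.current = self.period

    def __iter__(self):
        return self              # The object is its own iterator

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # Signal that the timer has elapsed
        before = self.current
        self.current -= 1
        return before

print(list(CountdownTimer(3)))   # [3, 2, 1]
```

Because the state lives in plain attributes, a caller can still invoke reset between iteration steps, just as the run function does with the Timer class above.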

Things to Remember

  • The throw method can be used to re-raise an exception within a generator at the position of the most recently executed yield expression.

  • Using throw harms readability because it requires additional nesting and boilerplate in order to raise and catch exceptions.

  • A better approach is to simply define a stateful class that provides methods for iteration and state transitions.