Programs often need to process sequential data of fixed or dynamic length. As a primarily imperative programming language, Python makes it easy to implement sequential processing using loops. The general pattern is: on each pass through a loop, read data stored in variables, lists, dictionaries, and so on, and carry out corresponding state modifications or I/O operations. Loops in Python feel natural and capable for the most common tasks involving built-in data types, container types, and user-defined classes.
Python also supports iterators, which enable a more functional-style approach to processing arbitrary streams of data. Instead of directly interacting with how the sequential data is stored, you can use iterators, which provide a common abstraction that hides the details. Iterators can make programs more efficient, easier to refactor, and capable of handling arbitrarily sized data. Python also includes functionality to compose iterators together and fully customize their behavior by using generators (see more in Chapter 6, “Comprehensions and Generators”).
Prefer enumerate over range

The range built-in function is useful for loops that iterate over sequences of integers. For example, here I generate a 32-bit random number by flipping a coin for each bit position:
from random import randint
random_bits = 0
for i in range(32):
    if randint(0, 1):
        random_bits |= 1 << i
print(bin(random_bits))
>>>
0b11101000100100000111000010000001
When you have a data structure to iterate over, such as a list of strings, you can loop directly over the sequence:
flavor_list = ["vanilla", "chocolate", "pecan", "strawberry"]
for flavor in flavor_list:
    print(f"{flavor} is delicious")
>>>
vanilla is delicious
chocolate is delicious
pecan is delicious
strawberry is delicious
Often, you’ll want to iterate over a list and also know the index of the current item in the list. For example, say that I want to print the ranking of my favorite ice cream flavors. One way to do this is by using range to generate an offset for each position in the list:
for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print(f"{i + 1}: {flavor}")
>>>
1: vanilla
2: chocolate
3: pecan
4: strawberry
This looks clumsy compared with the other examples of a for statement over flavor_list or range. I have to get the length of the list, and I have to index into it. These multiple steps make the code harder to read.
Python provides the enumerate built-in function to simplify this situation. enumerate wraps any iterator with a lazy generator (see Item 43: “Consider Generators Instead of Returning Lists”). enumerate yields pairs of the loop index and the next value from the given iterator. Here, I manually advance the returned iterator with the next built-in function to demonstrate what it does:
it = enumerate(flavor_list)
print(next(it))
print(next(it))
>>>
(0, 'vanilla')
(1, 'chocolate')
Each pair yielded by enumerate can be succinctly unpacked in a for statement (see Item 5: “Prefer Multiple-Assignment Unpacking over Indexing” for how that works). The resulting code is much clearer:
for i, flavor in enumerate(flavor_list):
    print(f"{i + 1}: {flavor}")
>>>
1: vanilla
2: chocolate
3: pecan
4: strawberry
I can make this even shorter by specifying the number for enumerate to use to begin counting (1 in this case) as the second parameter:
for i, flavor in enumerate(flavor_list, 1):
    print(f"{i}: {flavor}")
enumerate provides concise syntax for looping over an iterator and getting the index of each item from the iterator as you go.
Prefer enumerate instead of looping over a range and indexing into a sequence.
You can supply a second, optional parameter to enumerate that specifies the beginning number for counting (zero is the default).
zip to Process Iterators in Parallel

Often in Python you find yourself with many lists of related objects. List comprehensions make it easy to take a source list and produce another derived list by applying an expression to each item (see Item 40: “Use Comprehensions Instead of map and filter”). For example, here I take a list of names and create a corresponding list of how many characters are in each name:
names = ["Cecilia", "Lise", "Marie"]
counts = [len(n) for n in names]
print(counts)
>>>
[7, 4, 5]
The items in the derived list (counts) are related to the items in the source list (names) by their corresponding positions in the sequences. To access items from both lists in a single loop, I can iterate over the length of the source list (names) and use the offsets generated by range to index into either list. For example, here I use parallel indexing to determine which name is the longest:
longest_name = None
max_count = 0
for i in range(len(names)):
    count = counts[i]
    if count > max_count:
        longest_name = names[i]
        max_count = count
print(longest_name)
>>>
Cecilia
The problem is that this whole for statement is visually noisy. The indexing operations—names[i] and counts[i]—make the code hard to read. Indexing into two arrays by the same loop index i seems redundant. I can use the enumerate built-in function (see Item 17: “Prefer enumerate over range”) to improve this slightly, but it’s still not ideal because of the counts[i] indexing operation:
longest_name = None
max_count = 0
for i, name in enumerate(names): # Changed
    count = counts[i]
    if count > max_count:
        longest_name = name  # Changed
        max_count = count
To make this code clearer, Python provides the zip built-in function. zip wraps two or more iterators with a lazy generator. The zip generator yields tuples containing the next value from each iterator. These tuples can be unpacked directly within a for statement (see Item 5: “Prefer Multiple-Assignment Unpacking over Indexing” for background). By eliminating indexing operations, the resulting code is much cleaner than the code above that separately accesses two lists:
longest_name = None
max_count = 0
for name, count in zip(names, counts): # Changed
    if count > max_count:
        longest_name = name
        max_count = count
zip consumes the iterators it wraps one item at a time, which means it can be used with infinitely long inputs without risk of your program using too much memory and crashing (see Item 43: “Consider Generators Instead of Returning Lists” and Item 44: “Consider Generator Expressions for Large List Comprehensions” for how to create such inputs).
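For instance, zip can safely wrap an unbounded iterator. This sketch pairs itertools.count, which yields integers forever, with a finite list; zip produces only as many tuples as the shorter input:

```python
import itertools

names = ["Cecilia", "Lise", "Marie"]

# itertools.count() yields 0, 1, 2, ... without end, but zip stops
# as soon as the finite names list is exhausted
pairs = list(zip(itertools.count(), names))
print(pairs)
```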
However, it’s important to beware of zip’s behavior when the input iterators have different lengths. For example, say that I add another item to the names list, but I forget to update the counts list. Running zip on the two input lists has an unexpected result:
names.append("Rosalind")
for name, count in zip(names, counts):
    print(name)
>>>
Cecilia
Lise
Marie
The new item for "Rosalind" isn’t in the output. Why not? This is just how zip works. It keeps yielding tuples until any one of the wrapped iterators is exhausted. Its output is only as long as its shortest input. If premature truncation could be a problem for your program, you can pass the strict keyword argument to zip—a new option since Python 3.10—which will cause the returned generator to raise an exception if any of the inputs is exhausted before the others:
for name, count in zip(names, counts, strict=True): # Changed
    print(name)
>>>
Cecilia
Lise
Marie
Traceback ...
ValueError: zip() argument 2 is shorter than argument 1
Alternatively, you can solve this truncation problem by using the zip_longest function from the itertools built-in module to fill in a missing item with a default value (see Item 24: “Consider itertools for Working with Iterators and Generators”).
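As a sketch of that alternative, here zip_longest pads the shorter counts list with a fillvalue instead of truncating the output:

```python
from itertools import zip_longest

names = ["Cecilia", "Lise", "Marie", "Rosalind"]
counts = [7, 4, 5]

# zip_longest keeps yielding until the longest input is exhausted,
# substituting fillvalue for the missing items
pairs = list(zip_longest(names, counts, fillvalue=0))
print(pairs)
```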
The zip built-in function can be used to iterate over multiple iterators in parallel.
zip creates a lazy generator that produces tuples; it can be used on infinitely long inputs.
zip truncates its output silently to the shortest iterator if you supply it with iterators of different lengths.
Pass the strict keyword argument to zip if you want to ensure that silent truncation is not possible and mismatched iterator lengths should result in a runtime error.
Avoid else Blocks After for and while Loops

Python loops have an extra feature that is not available in most other programming languages: You can put an else block immediately after a loop’s repeated interior block:
for i in range(3):
    print("Loop", i)
else:
    print("Else block!")
>>>
Loop 0
Loop 1
Loop 2
Else block!
Surprisingly, the else block runs immediately after the loop finishes. Why is the clause called else? Why not and? In an if/else statement, else means “Do this if the block before this doesn’t happen” (see Item 7: “Consider Conditional Expressions for Simple Inline Logic”). In a try/except statement, except has the same definition: “Do this if trying the block before this failed.”
Similarly, else from try/except/else follows this pattern (see Item 80: “Take Advantage of Each Block in try/except/else/finally”) because it means “Do this if there was no exception to handle.” try/finally is also intuitive because it means “Always do this after trying the block before.”
Given all of the uses of else, except, and finally in Python, a new programmer might assume that the else part of for/else means “Do this if the loop wasn’t completed.” In reality, it does exactly the opposite. Using a break statement in a loop actually skips the else block:
for i in range(3):
    print("Loop", i)
    if i == 1:
        break
else:
    print("Else block!")
>>>
Loop 0
Loop 1
Another surprise is that the else block runs immediately if you loop over an empty sequence:
for x in []:
    print("Never runs")
else:
    print("For else block!")
>>>
For else block!
The else block also runs when a while loop’s condition is false on the first check:
while False:
    print("Never runs")
else:
    print("While else block!")
>>>
While else block!
The rationale for these behaviors is that else blocks after loops are useful when you’re searching for something. For example, say that I want to determine whether two numbers are coprime (that is, their only common divisor is 1). Here, I iterate through every possible common divisor and test the numbers. After every option has been tried, the loop ends. The else block runs when the numbers are coprime because the loop doesn’t encounter a break:
a = 4
b = 9
for i in range(2, min(a, b) + 1):
    print("Testing", i)
    if a % i == 0 and b % i == 0:
        print("Not coprime")
        break
else:
    print("Coprime")
>>>
Testing 2
Testing 3
Testing 4
Coprime
In practice, I wouldn’t write the code this way. Instead, I’d write a helper function to do the calculation. Such a function can be written using either of two common styles.
The first approach is to return early when I match the condition I’m looking for. I only return the default outcome if I fall through the loop:
def coprime(a, b):
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            return False
    return True
assert coprime(4, 9)
assert not coprime(3, 6)
The second way is to have a result variable that indicates whether I’ve found what I’m looking for in the loop. Here, I break out of the loop as soon as I find something and then return that indicator variable:
def coprime_alternate(a, b):
    is_coprime = True
    for i in range(2, min(a, b) + 1):
        if a % i == 0 and b % i == 0:
            is_coprime = False
            break
    return is_coprime
assert coprime_alternate(4, 9)
assert not coprime_alternate(3, 6)
Both of these approaches are much clearer to readers of unfamiliar code. Depending on the situation, either may be a good choice. However, the expressivity you gain from the else block doesn’t outweigh the burden you put on people (including yourself) who want to understand your code in the future. Simple constructs like loops should be self-evident in Python. You should avoid using else blocks after loops entirely.
Python has special syntax that allows else blocks to immediately follow for and while loop interior blocks.
The else block after a loop runs only if the loop body did not encounter a break statement.
Avoid using else blocks after loops because their behavior isn’t intuitive and can be confusing.
for Loop Variables After the Loop Ends

When you are writing a for loop in Python, you might notice that the variable you create for iteration persists after the loop has finished:
for i in range(3):
    print(f"Inside {i=}")
print(f"After {i=}")
>>>
Inside i=0
Inside i=1
Inside i=2
After i=2
It’s possible to use this loop variable assignment behavior to your advantage. For example, here I implement an algorithm that groups elements of the periodic table into categories by searching for their indexes in a list:
categories = ["Hydrogen", "Uranium", "Iron", "Other"]
for i, name in enumerate(categories):
    if name == "Iron":
        break
print(i)
>>>
2
In the case that a given element isn’t found in the list, the last index will be used after iteration is exhausted to group the item into the "Other" catch-all category (index 3 in this case):
for i, name in enumerate(categories):
    if name == "Lithium":
        break
print(i)
>>>
3
The assumption in this algorithm is that either the loop will find a matching item and end early due to a break statement, or the loop will iterate through all the options and fall through. Unfortunately, there’s a third possibility, where the loop never begins because the iterator is initially empty—which can result in a runtime exception:
categories = []
for i, name in enumerate(categories):
    if name == "Lithium":
        break
print(i)
>>>
Traceback ...
NameError: name 'i' is not defined
There are alternative approaches for dealing with a loop that never processes anything (see Item 19: “Avoid else Blocks After for and while Loops”). But the point is the same: You can’t always be sure that a loop variable will exist when you try to access it after the loop, so it’s best to never do this in practice.
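One such alternative, sketched here with a hypothetical find_index helper, is to initialize a result variable before the loop so that it is always bound afterward, even for empty input:

```python
def find_index(categories, target):
    index = None  # Always bound, even if the loop body never runs
    for i, name in enumerate(categories):
        index = i
        if name == target:
            break
    return index

assert find_index(["Hydrogen", "Iron", "Other"], "Iron") == 1
assert find_index(["Hydrogen", "Iron", "Other"], "Lithium") == 2  # Catch-all
assert find_index([], "Lithium") is None  # No NameError for empty input
```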
Fortunately—or perhaps unfortunately—other Python features do not have this problem. The loop variable leakage behavior is not exhibited by list comprehensions or generator expressions (see Item 40: “Use Comprehensions Instead of map and filter” and Item 44: “Consider Generator Expressions for Large List Comprehensions”). If you try to access a comprehension’s inner variables after execution, you’ll find that they’re never present, and thus you can’t inadvertently encounter this pitfall:
my_numbers = [37, 13, 128, 21]
found = [i for i in my_numbers if i % 2 == 0]
print(i) # Always raises
>>>
Traceback ...
NameError: name 'i' is not defined
However, it’s possible for assignment expressions in comprehensions to change this behavior (see Item 42: “Reduce Repetition in Comprehensions with Assignment Expressions”). Exception variables also don’t have this problem of leakage, although they are quirky in their own way (see Item 84: “Beware of Exception Variables Disappearing”).
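For example, an assignment expression target inside a comprehension does leak into the enclosing scope, even though the loop variable itself does not (a small sketch):

```python
my_numbers = [37, 13, 128, 21]

# The loop variable i stays local to the comprehension, but the
# assignment expression's target (last) leaks into this scope
found = [last := i for i in my_numbers if i % 2 == 0]
print(found)
print(last)

try:
    i
except NameError:
    print("i is not defined")
```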
The loop variable from for loops can be accessed in the current scope even after the loop terminates.
for loop variables will not be assigned in the current scope if the loop never did a single iteration.
Generator expressions and list comprehensions do not leak loop variables by default.
Exception handlers do not leak exception instance variables.
Be Defensive when Iterating over Arguments

When a function takes a list of objects as a parameter, it’s often important to iterate over that list multiple times. For example, say that I want to analyze tourism numbers for the U.S. state of Texas. Imagine that the data set is the number of visitors to each city (in millions per year). I’d like to figure out what percentage of overall tourism each city receives.
To do this, I need a normalization function that sums the inputs to determine the total number of tourists per year and then divides each city’s individual visitor count by the total to find that city’s contribution to the whole:
def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
This function works as expected when given a list of visits:
visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)
assert sum(percentages) == 100.0
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
To scale this up, I need to read the data from a file that contains every city in all of Texas. I define a generator to do this because then I can reuse the same function later, when I want to compute tourism numbers for the whole world—a much larger data set with higher memory requirements (see Item 43: “Consider Generators Instead of Returning Lists” for background):
def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)
Surprisingly, calling normalize on the read_visits generator’s return value produces no results:
it = read_visits("my_numbers.txt")
percentages = normalize(it)
print(percentages)
>>>
[]
This behavior occurs because an iterator produces its results only a single time. If you iterate over an iterator or a generator that has already raised a StopIteration exception, you won’t get any results the second time around:
it = read_visits("my_numbers.txt")
print(list(it))
print(list(it)) # Already exhausted
>>>
[15, 35, 80]
[]
Confusingly, an exception won’t be raised when you iterate over an already exhausted iterator. for loops, the list constructor, and many other functions throughout the Python standard library expect the StopIteration exception to be raised during normal operation. These functions can’t tell the difference between an iterator that has no output and an iterator that had output and is now exhausted.
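This sketch shows the StopIteration mechanics directly with the next built-in function, using a plain list in place of the file-backed generator:

```python
it = iter([15, 35, 80])
values = [next(it), next(it), next(it)]
print(values)

# A fourth call raises StopIteration, which for loops and list()
# treat as the normal end-of-iteration signal
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)

# next also accepts a default value to return instead of raising
assert next(it, None) is None
```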
To solve this problem, you can explicitly exhaust an input iterator and keep a copy of its entire contents in a list. You can then iterate over the list version of the data as many times as you need to. Here’s the same function as before, but now it defensively copies the input iterator:
def normalize_copy(numbers):
    numbers_copy = list(numbers)  # Copy the iterator
    total = sum(numbers_copy)
    result = []
    for value in numbers_copy:
        percent = 100 * value / total
        result.append(percent)
    return result
Now the function works correctly on the read_visits generator’s return value:
it = read_visits("my_numbers.txt")
percentages = normalize_copy(it)
print(percentages)
assert sum(percentages) == 100.0
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
The problem with this approach is that the copy of the input iterator’s contents could be extremely large. Copying the iterator could cause the program to run out of memory and crash (see Item 115: “Use tracemalloc to Understand Memory Usage and Leaks” on how to debug this). This potential for scalability issues undermines the reason that I wrote read_visits as a generator in the first place. One way around this is to accept a function that returns a new iterator each time it’s called:
def normalize_func(get_iter):
    total = sum(get_iter())  # New iterator
    result = []
    for value in get_iter():  # New iterator
        percent = 100 * value / total
        result.append(percent)
    return result
To use normalize_func, I can pass in a lambda expression that produces a new generator iterator each time it’s called:
path = "my_numbers.txt"
percentages = normalize_func(lambda: read_visits(path))
print(percentages)
assert sum(percentages) == 100.0
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
Although this works, having to pass a lambda function like this is clumsy. A better way to achieve the same result is to define a new container class that implements the iterator protocol.
The iterator protocol is what Python for loops and related expressions use to traverse the contents of a container type. When Python sees a statement like for x in foo, it actually calls iter(foo) to discover the iterator to loop through. The iter built-in function calls the foo.__iter__ special method in turn. The __iter__ method must return an iterator object (which itself implements the __next__ special method). Then, the for loop repeatedly calls the next built-in function on the iterator object until it’s exhausted (as indicated by a StopIteration exception being raised).
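These steps can be performed manually; this sketch mirrors what a for loop does under the hood:

```python
foo = ["a", "b"]
it = iter(foo)         # Calls foo.__iter__()
assert iter(it) is it  # An iterator's __iter__ returns itself
first = next(it)       # Calls it.__next__()
second = next(it)
print(first, second)

try:
    next(it)           # Exhausted: raises StopIteration
    done = False
except StopIteration:
    done = True
assert done
```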
It sounds complicated, but practically speaking, you can enable all of this behavior for your own classes by implementing the __iter__ method as a generator. Here, I define an iterable container class that reads the file containing tourism data and uses yield to produce one line of data at a time:
class ReadVisits:
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)
This new container type can be passed to the original function without any modifications:
visits = ReadVisits(path)
percentages = normalize(visits) # Changed
print(percentages)
assert sum(percentages) == 100.0
>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]
This works because the call to sum in normalize invokes ReadVisits.__iter__ to allocate a new iterator object. The for loop that normalizes the numbers also calls __iter__ to allocate a second iterator object. Each of those iterators is advanced and exhausted independently, ensuring that each pass sees all of the input data values. The only downside of this approach is that it reads the input data multiple times.
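To see this behavior without involving a file, here is a minimal in-memory stand-in for ReadVisits (a hypothetical Repeatable class), demonstrating that each iter call produces an independent iterator:

```python
class Repeatable:
    # Minimal in-memory stand-in for a container like ReadVisits
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for value in self.data:
            yield value

container = Repeatable([15, 35, 80])
it1 = iter(container)  # Each iter() call invokes __iter__ again
it2 = iter(container)
assert it1 is not it2             # Two independent iterators
assert list(it1) == [15, 35, 80]
assert list(it2) == [15, 35, 80]  # it2 unaffected by exhausting it1
```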
Now that you know how containers like ReadVisits work, you can write your functions and methods to ensure that parameters aren’t just iterators. The protocol states that when an iterator is passed to the iter built-in function, iter returns the iterator itself. In contrast, when a container type is passed to iter, a new iterator object is returned each time. Thus, you can test an input value for this behavior and raise a TypeError to reject arguments that can’t be repeatedly iterated over:
def normalize_defensive(numbers):
    if iter(numbers) is numbers:  # An iterator -- bad!
        raise TypeError("Must supply a container")
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
Alternatively, the collections.abc built-in module defines an Iterator class that can be used in an isinstance test to recognize the potential problem (see Item 57: “Inherit from collections.abc Classes for Custom Container Types”):
from collections.abc import Iterator
def normalize_defensive(numbers):
    if isinstance(numbers, Iterator):  # Another way to check
        raise TypeError("Must supply a container")
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result
The approach of expecting a container is ideal if you don’t want to copy the full input iterator—as in the normalize_copy function above—but you also need to iterate over the input data multiple times. Here, I show how the normalize_defensive function can accept a list, a ReadVisits object, or theoretically any container that follows the iterator protocol:
visits_list = [15, 35, 80]
list_percentages = normalize_defensive(visits_list)
visits_obj = ReadVisits(path)
obj_percentages = normalize_defensive(visits_obj)
assert list_percentages == obj_percentages
assert sum(list_percentages) == 100.0
The normalize_defensive function raises an exception if the input is an iterator rather than a container:
visits = [15, 35, 80]
it = iter(visits)
normalize_defensive(it)
>>>
Traceback ...
TypeError: Must supply a container
The same approach of checking for compliance with the iterator protocol can also be used with asynchronous iterators (see Item 76: “Know How to Port Threaded I/O to asyncio” for an example).
Beware of functions and methods that iterate over input arguments multiple times. If these arguments are iterators, you might see strange behavior and missing values.
Python’s iterator protocol defines how containers and iterators interact with the iter and next built-in functions, for loops, and related expressions.
You can easily define your own iterable container type by implementing the __iter__ method as a generator.
You can detect that a value is an iterator (instead of a container) if calling iter on it produces the same value that you passed in. Alternatively, you can use the isinstance built-in function along with the collections.abc.Iterator class.
There are many gotchas in Python caused by surprising iteration behaviors (see Item 21: “Be Defensive when Iterating over Arguments” for another common situation). For example, if you add a new item to a dictionary while iterating over it, Python will raise a runtime exception:
search_key = "red"
my_dict = {"red": 1, "blue": 2, "green": 3}
for key in my_dict:
    if key == "blue":
        my_dict["yellow"] = 4  # Causes error
>>>
Traceback ...
RuntimeError: dictionary changed size during iteration
A similar error occurs if you delete an item from a dictionary while iterating over it:
for key in my_dict:
    if key == "blue":
        del my_dict["green"]  # Causes error
>>>
Traceback ...
RuntimeError: dictionary changed size during iteration
An error won’t occur if, instead of adding or deleting keys from a dictionary, you only change their associated values—which is surprisingly inconsistent with the behaviors above:
for key in my_dict:
    if key == "blue":
        my_dict["green"] = 4  # Okay
print(my_dict)
>>>
{'red': 1, 'blue': 2, 'green': 4}
Sets work similarly to dictionaries, and if you change their size by adding or removing items during iteration, you will encounter an exception at runtime:
my_set = {"red", "blue", "green"}
for color in my_set:
    if color == "blue":
        my_set.add("yellow")  # Causes error
>>>
Traceback ...
RuntimeError: Set changed size during iteration
However, the behavior of set also seems inconsistent because trying to add an item that already exists in a set won’t cause any problems while you’re iterating over it. Re-adding is allowed because the set’s size didn’t change:
for color in my_set:
    if color == "blue":
        my_set.add("green")  # Okay
print(my_set)
>>>
{'green', 'blue', 'red'}
Much as with dictionaries, and also surprisingly inconsistently, lists can have any existing index overwritten during iteration with no problems:
my_list = [1, 2, 3]
for number in my_list:
    print(number)
    if number == 2:
        my_list[0] = -1  # Okay
print(my_list)
>>>
1
2
3
[-1, 2, 3]
But if you try to insert an element into a list before the current iterator position, your code will get stuck in an infinite loop:
my_list = [1, 2, 3]
for number in my_list:
    print(number)
    if number == 2:
        my_list.insert(0, 4)  # Causes infinite loop
>>>
1
2
2
2
2
2
...
However, appending to a list after the current iterator position is not a problem—the index-based iterator hasn’t gotten that far yet—which is, again, surprisingly inconsistent behavior:
my_list = [1, 2, 3]
for number in my_list:
    print(number)
    if number == 2:
        my_list.append(4)  # Okay this time
print(my_list)
>>>
1
2
3
4
[1, 2, 3, 4]
Looking at each of the examples above, it can be hard to guess whether the code will work in all cases. Modifying containers during iteration can be especially error prone in situations where the modification point changes based on input to the algorithm. In some cases it’ll work, and in others there will be an error. Thus, my advice is to never modify containers while you iterate over them.
If you still need to make modifications during iteration due to the nature of your algorithm, you should simply make a copy of the container you want to iterate and apply modifications to the original (see Item 30: “Know That Function Arguments Can Be Mutated”). For example, with dictionaries I can copy the keys:
my_dict = {"red": 1, "blue": 2, "green": 3}
keys_copy = list(my_dict.keys()) # Copy
for key in keys_copy: # Iterate over copy
    if key == "blue":
        my_dict["green"] = 4  # Modify original dict
print(my_dict)
>>>
{'red': 1, 'blue': 2, 'green': 4}
For lists I can copy the whole container:
my_list = [1, 2, 3]
list_copy = list(my_list) # Copy
for number in list_copy: # Iterate over copy
    print(number)
    if number == 2:
        my_list.insert(0, 4)  # Inserts in original list
print(my_list)
>>>
1
2
3
[4, 1, 2, 3]
And the same approach works for sets:
my_set = {"red", "blue", "green"}
set_copy = set(my_set) # Copy
for color in set_copy: # Iterate over copy
    if color == "blue":
        my_set.add("yellow")  # Add to original set
print(my_set)
>>>
{'yellow', 'green', 'blue', 'red'}
For some extremely large containers, copying might be too slow (see Item 92: “Profile Before Optimizing” to verify your assumptions). One way to deal with poor performance is to stage modifications in a separate container and then merge the changes into the main data structure after iteration. For example, here I modify a separate dictionary and then use the update method to bring the changes into the original dictionary:
my_dict = {"red": 1, "blue": 2, "green": 3}
modifications = {}
for key in my_dict:
    if key == "blue":
        modifications["green"] = 4  # Add to staging
my_dict.update(modifications) # Merge modifications
print(my_dict)
>>>
{'red': 1, 'blue': 2, 'green': 4}
The problem with staging modifications is that they won’t be immediately visible in the original container during iteration. If the logic in the loop relies on modifications being immediately visible, the code won’t work as expected. For example, here the programmer’s intent might have been to cause "yellow" to be in the resulting dictionary, but it won’t be there because the modifications aren’t visible during iteration:
my_dict = {"red": 1, "blue": 2, "green": 3}
modifications = {}
for key in my_dict:
    if key == "blue":
        modifications["green"] = 4
    value = my_dict[key]
    if value == 4:  # This condition is never true
        modifications["yellow"] = 5
my_dict.update(modifications) # Merge modifications
print(my_dict)
>>>
{'red': 1, 'blue': 2, 'green': 4}
This code can be fixed by looking in both the original container (my_dict) and the modifications container (modifications) for the latest value during iteration, essentially treating the staging dictionary as an intermediate cache:
my_dict = {"red": 1, "blue": 2, "green": 3}
modifications = {}
for key in my_dict:
    if key == "blue":
        modifications["green"] = 4
    value = my_dict[key]
    other_value = modifications.get(key)  # Check cache
    if value == 4 or other_value == 4:
        modifications["yellow"] = 5
my_dict.update(modifications) # Merge modifications
print(my_dict)
>>>
{'red': 1, 'blue': 2, 'green': 4, 'yellow': 5}
This type of reconciliation works, but it’s hard to generalize to all situations. When developing an algorithm like this, you’ll need to take your specific constraints into account. This can be quite difficult to get right, especially with all of the edge cases, so I recommend writing automated tests to verify correctness (see Item 109: “Prefer Integration Tests over Unit Tests”). Similarly, you can use microbenchmarks to measure the performance of various approaches and pick the best one (see Item 93: “Optimize Performance-Critical Code Using timeit Microbenchmarks”).
Adding or removing elements from lists, dictionaries, and sets while you’re iterating over them can cause runtime errors that are often hard to predict.
You can iterate over a copy of a container to avoid runtime errors that might be caused by mutation during iteration.
If you need to avoid copying for better performance, you can stage modifications in a second container cache that you later merge into the original.
any and all for Efficient Short-Circuiting Logic

Python is a great language for building programs that do logical reasoning. For example, imagine that I’m trying to analyze the nature of flipping a coin. Here, I define a function that returns a random coin flip outcome, along with a helper that reports True for heads and False for tails:
import random
def flip_coin():
    if random.randint(0, 1) == 0:
        return "Heads"
    else:
        return "Tails"

def flip_is_heads():
    return flip_coin() == "Heads"
If I want to flip a coin 20 times and see if every result is consecutively heads, I can use a simple list comprehension (see Item 40: “Use Comprehensions Instead of map and filter”) and membership test with the in operator (see Item 57: “Inherit from collections.abc Classes for Custom Container Types”):
flips = [flip_is_heads() for _ in range(20)]
all_heads = False not in flips
However, the chance of this sequence of 20 coin flips producing nothing but heads is roughly one in a million—extremely rare. If coin flips were somehow expensive to do, I’d almost always waste a lot of resources on unnecessary work in the list comprehension because it keeps flipping coins even after seeing a tails result. I can improve this situation by using a loop that terminates the sequence of coin flips as soon as a non-heads outcome is seen:
all_heads = True
for _ in range(20):
    if not flip_is_heads():
        all_heads = False
        break
Although this code is more efficient, it’s much longer than the list comprehension from before. To keep the code short while also ending execution early, I can use the all built-in function. all steps through an iterator, checks whether each item is truthy (see Item 7: “Consider Conditional Expressions for Simple Inline Logic” for background), and immediately stops processing if not. all always returns a Boolean value of True or False, which is different from how the and logical operator returns the last value that’s tested:
print("All truthy:")
print(all([1, 2, 3]))
print(1 and 2 and 3)
print("One falsey:")
print(all([1, 0, 3]))
print(1 and 0 and 3)
>>>
All truthy:
True
3
One falsey:
False
0
Using the all built-in function, I can rewrite the coin-flipping loop using a generator expression (see Item 44: “Consider Generator Expressions for Large List Comprehensions”). It will stop doing more coin flips as soon as the flip_is_heads function returns False:
all_heads = all(flip_is_heads() for _ in range(20))
Critically, if I pass a list comprehension instead of a generator expression—note the presence of the surrounding [ and ] square brackets—the code will create a list of 20 coin-flip outcomes before passing them to the all function. The computed result will be the same, but the code’s performance will be far worse:
all_heads = all([flip_is_heads() for _ in range(20)]) # Wrong
Alternatively, I can use a yielding generator function (see Item 43: “Consider Generators Instead of Returning Lists”) or any other type of iterator to achieve similar efficiency:
def repeated_is_heads(count):
    for _ in range(count):
        yield flip_is_heads()  # Generator

all_heads = all(repeated_is_heads(20))
Once repeated_is_heads yields a False value, the all built-in function will stop moving the generator iterator forward and return False. The reference to the generator’s iterator that was passed to all will be thrown away and garbage collected, ensuring that the loop never completes (see Item 89: “Always Pass Resources into Generators and Have Callers Clean Them Up Outside” for details).
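To make this short-circuiting concrete, here’s a small sketch of my own (the counting wrapper is hypothetical instrumentation, not part of the example above) that records how many items all actually pulls from a generator before stopping:

```python
consumed = []

def counting(values):
    # Hypothetical helper: yield each value while recording that
    # `all` pulled it from the generator
    for value in values:
        consumed.append(value)
        yield value

result = all(counting([True, True, False, True, True]))
assert result is False
assert len(consumed) == 3  # The two items after False were never pulled
```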
Sometimes, you’ll have a function that behaves in the opposite way of flip_is_heads, returning False most of the time and True only when a certain condition is met. Here, I define a function that behaves this way:
def flip_is_tails():
    return flip_coin() == "Tails"
To detect consecutive heads with this function, all won’t work. Instead, I can use the any built-in function. any similarly steps through an iterator, but it terminates upon seeing the first truthy value. Like all, any always returns a Boolean value, unlike the or logical operator that it mirrors:
print("All falsey:")
print(any([0, False, None]))
print(0 or False or None)
print("One truthy:")
print(any([None, 3, 0]))
print(None or 3 or 0)
>>>
All falsey:
False
None
One truthy:
True
3
With any, I can use flip_is_tails in a generator expression to compute the same results as before:
all_heads = not any(flip_is_tails() for _ in range(20))
Or I can create a similar generator function:
def repeated_is_tails(count):
    for _ in range(count):
        yield flip_is_tails()

all_heads = not any(repeated_is_tails(20))
When should you choose any vs. all? It depends on what you’re doing and the difficulty of testing the conditions that you care about. If you want to end early with a True value, then use any. If you want to end early with a False value, then use all. Ultimately, these built-in functions are equivalent, as demonstrated by De Morgan’s laws for Boolean logic:
for a in (True, False):
    for b in (True, False):
        assert any([a, b]) == (not all([not a, not b]))
        assert all([a, b]) == (not any([not a, not b]))
One way or another, you should be able to find a way to minimize the amount of work being done by using any or all appropriately. There are also additional built-in modules for operating on iterators and generators in intelligent ways to maximize performance and efficiency (see Item 24: “Consider itertools for Working with Iterators and Generators”).
The all built-in function returns True if all items provided are truthy. It stops processing input and returns False as soon as a falsey item is encountered.
The any built-in function works similarly but with opposite logic: It returns False if all items are falsey and ends early with True as soon as it sees a truthy value.
any and all always return the Boolean values True or False, unlike the or and and logical operators, which return the last item that needed to be tested.
Using list comprehensions with any or all instead of generator expressions undermines the efficiency benefits of these functions.
itertools for Working with Iterators and Generators
The itertools built-in module contains a large number of functions that are useful for organizing and interacting with iterators (see Item 43: “Consider Generators Instead of Returning Lists” and Item 21: “Be Defensive when Iterating over Arguments” for background):
import itertools
Whenever you find yourself dealing with tricky iteration code, it’s worth looking at the itertools documentation again to see if there’s anything in there for you to use (see https://docs.python.org/3/library/itertools.html). The following sections describe the most important functions that you should know in three primary categories.
The itertools built-in module includes a number of functions for linking iterators together.
chain
Use chain to combine multiple iterators into a single sequential iterator. Essentially this flattens the provided input iterators into one iterator of items:
it = itertools.chain([1, 2, 3], [4, 5, 6])
print(list(it))
>>>
[1, 2, 3, 4, 5, 6]
There’s also an alternative version of this function, chain.from_iterable, that consumes an iterator of iterators and produces a single flattened output iterator that includes all of the contents of the iterators:
it1 = [i * 3 for i in ("a", "b", "c")]
it2 = [j * 2 for j in ("x", "y", "z")]
nested_it = [it1, it2]
output_it = itertools.chain.from_iterable(nested_it)
print(list(output_it))
>>>
['aaa', 'bbb', 'ccc', 'xx', 'yy', 'zz']
repeat
Use repeat to output a single value forever, or use the second optional parameter to specify a maximum number of times:
it = itertools.repeat("hello", 3)
print(list(it))
>>>
['hello', 'hello', 'hello']
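The unbounded form of repeat (no count argument) pairs naturally with zip, which stops at the shorter input; here’s a small sketch with made-up data:

```python
import itertools

names = ["a", "b", "c"]
# zip ends when `names` is exhausted, so the infinite repeat is safe here
labeled = list(zip(names, itertools.repeat("flag")))
assert labeled == [("a", "flag"), ("b", "flag"), ("c", "flag")]
```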
cycle
Use cycle to repeat an iterator’s items forever:
it = itertools.cycle([1, 2])
result = [next(it) for _ in range(10)]
print(result)
>>>
[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
tee
Use tee to split a single iterator into the number of parallel iterators specified by the second parameter. The memory usage of this function will grow if the iterators don’t progress at the same speed, since buffering will be required to temporarily store the pending items:
it1, it2, it3 = itertools.tee(["first", "second"], 3)
print(list(it1))
print(list(it2))
print(list(it3))
>>>
['first', 'second']
['first', 'second']
['first', 'second']
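This buffering can be observed by advancing one of the split iterators ahead of the others; a minimal sketch with my own sample data:

```python
import itertools

it1, it2 = itertools.tee(iter(range(5)), 2)
first_two = [next(it1), next(it1)]  # it1 runs ahead of it2
# it2 still sees every item because tee buffered the ones it1 passed
assert first_two == [0, 1]
assert list(it2) == [0, 1, 2, 3, 4]
assert list(it1) == [2, 3, 4]
```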
zip_longest
This variant of the zip built-in function returns a placeholder value when an iterator is exhausted, which may happen if the iterators have different lengths (see Item 18: “Use zip to Process Iterators in Parallel” for how the strict argument can provide similar behavior):
keys = ["one", "two", "three"]
values = [1, 2]
normal = list(zip(keys, values))
print("zip: ", normal)
it = itertools.zip_longest(keys, values, fillvalue="nope")
longest = list(it)
print("zip_longest:", longest)
>>>
zip: [('one', 1), ('two', 2)]
zip_longest: [('one', 1), ('two', 2), ('three', 'nope')]
The itertools built-in module includes a number of functions for filtering items from an iterator.
islice
Use islice to slice an iterator by numerical indexes without copying. You can specify just the end; the start and end; or the start, end, and step size. The behavior of islice is similar to that of standard sequence slicing and striding (see Item 14: “Know How to Slice Sequences” and Item 15: “Avoid Striding and Slicing in a Single Expression”):
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
first_five = itertools.islice(values, 5)
print("First five: ", list(first_five))
middle_odds = itertools.islice(values, 2, 8, 2)
print("Middle odds:", list(middle_odds))
>>>
First five: [1, 2, 3, 4, 5]
Middle odds: [3, 5, 7]
takewhile
takewhile returns items from an iterator until a predicate function returns False for an item, at which point it stops producing output; the first failing item is consumed from the input but not returned (see Item 39: “Prefer functools.partial over lambda Expressions for Glue Functions” for more about defining predicates):
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
less_than_seven = lambda x: x < 7
it = itertools.takewhile(less_than_seven, values)
print(list(it))
>>>
[1, 2, 3, 4, 5, 6]
dropwhile
dropwhile, which is the opposite of takewhile, skips items from an iterator until the predicate function returns False for the first time, at which point all remaining items are returned:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
less_than_seven = lambda x: x < 7
it = itertools.dropwhile(less_than_seven, values)
print(list(it))
>>>
[7, 8, 9, 10]
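One behavior worth noting, demonstrated here with my own sample data: once the predicate fails, dropwhile stops testing entirely, so later items that would satisfy the predicate are still returned:

```python
import itertools

values = [1, 5, 7, 2, 3]
# 1 and 5 are dropped; 7 fails the predicate, so everything from
# there onward is returned, even though 2 and 3 are less than 7
it = itertools.dropwhile(lambda x: x < 7, values)
result = list(it)
assert result == [7, 2, 3]
```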
filterfalse
filterfalse, which is the opposite of the filter built-in function, returns the items from an iterator for which a predicate function returns False:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = lambda x: x % 2 == 0
filter_result = filter(evens, values)
print("Filter: ", list(filter_result))
filter_false_result = itertools.filterfalse(evens, values)
print("Filter false:", list(filter_false_result))
>>>
Filter: [2, 4, 6, 8, 10]
Filter false: [1, 3, 5, 7, 9]
The itertools built-in module includes a number of functions for producing combinations of items from iterators.
batched
Use batched to create an iterator that outputs fixed-size, non-overlapping groups of items from a single input iterator. The second argument is the batch size. This can be especially useful when processing data together for efficiency or satisfying other constraints, like data size limits:
it = itertools.batched([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
print(list(it))
>>>
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
The last group produced by the iterator might be smaller than the specified batch size if the items can’t divide perfectly:
it = itertools.batched([1, 2, 3], 2)
print(list(it))
>>>
[(1, 2), (3,)]
pairwise
Use pairwise when you need to iterate through each pair of adjacent items in the input iterator. The pairs overlap, so each item except for the ends appears twice in the output iterator: once in the first position of a pair and once in the second position. This can be helpful when writing graph-traversal algorithms that need to step through sequential sets of vertexes or endpoints:
route = ["Los Angeles", "Bakersfield", "Modesto", "Sacramento"]
it = itertools.pairwise(route)
print(list(it))
>>>
[('Los Angeles', 'Bakersfield'), ('Bakersfield', 'Modesto'),
➥('Modesto', 'Sacramento')]
accumulate
accumulate folds each item from the iterator into a running value by applying a function that takes two parameters. It outputs the current accumulated result for each input value:
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sum_reduce = itertools.accumulate(values)
print("Sum: ", list(sum_reduce))
def sum_modulo_20(first, second):
    output = first + second
    return output % 20
modulo_reduce = itertools.accumulate(values, sum_modulo_20)
print("Modulo:", list(modulo_reduce))
>>>
Sum: [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
Modulo: [1, 3, 6, 10, 15, 1, 8, 16, 5, 15]
This is essentially the same as the reduce function from the functools built-in module but with outputs yielded one step at a time. By default it sums the inputs if no binary function is specified.
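The correspondence with reduce can be sketched like this: reduce computes only the final value, which equals the last item that accumulate yields (here operator.add stands in for the default summing behavior):

```python
import itertools
from functools import reduce
from operator import add

values = [1, 2, 3, 4, 5]
running = list(itertools.accumulate(values, add))  # every intermediate sum
final = reduce(add, values)                        # only the final sum
assert running == [1, 3, 6, 10, 15]
assert final == running[-1] == 15
```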
product
product returns the Cartesian product of items from one or more iterators, which is a nice alternative to using deeply nested list comprehensions (see Item 41: “Avoid More Than Two Control Subexpressions in Comprehensions” for why to avoid those):
single = itertools.product([1, 2], repeat=2)
print("Single: ", list(single))
multiple = itertools.product([1, 2], ["a", "b"])
print("Multiple:", list(multiple))
>>>
Single: [(1, 1), (1, 2), (2, 1), (2, 2)]
Multiple: [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
permutations
permutations returns the unique ordered permutations of length N—the second argument—with items from an iterator:
it = itertools.permutations([1, 2, 3, 4], 2)
print(list(it))
>>>
[(1, 2),
(1, 3),
(1, 4),
(2, 1),
(2, 3),
(2, 4),
(3, 1),
(3, 2),
(3, 4),
(4, 1),
(4, 2),
(4, 3)]
combinations
combinations returns the unordered combinations of length N—the second argument—with unrepeated items from an iterator:
it = itertools.combinations([1, 2, 3, 4], 2)
print(list(it))
>>>
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
combinations_with_replacement
combinations_with_replacement is the same as combinations, but repeated values are allowed. The difference between this and the permutations function is that this version allows the same input item to appear multiple times in an output group (e.g., see (1, 1) in the output below):
it = itertools.combinations_with_replacement([1, 2, 3, 4], 2)
print(list(it))
>>>
[(1, 1),
(1, 2),
(1, 3),
(1, 4),
(2, 2),
(2, 3),
(2, 4),
(3, 3),
(3, 4),
(4, 4)]
The itertools functions fall into three main categories for working with iterators and generators: linking iterators together, filtering items they output, and producing combinations of items.
There are more advanced functions, additional parameters, and useful recipes available in the official documentation.