The dynamic behavior of Python and its lack of static type checking by default is both a blessing and a curse (see Item 3: “Never Expect Python to Detect Errors at Compile Time” for details). However, large numbers of Python programmers out there say it’s worth using because of the productivity gained from the resulting brevity and simplicity. But most people using Python have at least one horror story about a program encountering a boneheaded error at runtime. One of the worst examples I’ve heard of involved a SyntaxError exception being raised in production as a side effect of a dynamic import (see Item 98: “Lazy-Load Modules with Dynamic Imports to Reduce Startup Time” for an example), resulting in a crashed server process. The programmer I know who was hit by this surprising occurrence has since ruled out using Python ever again.
But I have to wonder: Why wasn’t the code better tested before the program was deployed to production? Compile-time static type safety isn’t everything. You should always test your code, regardless of what language it’s written in. However, I’ll admit that in Python it may be more important to write tests to verify correctness than in other languages. Luckily, the same dynamic features that create these risks also make it extremely easy to write tests for your code and to debug malfunctioning programs. You can use Python’s dynamic nature and easily overridable behaviors to implement tests and ensure that your programs work as expected.
TestCase Subclasses

The canonical way to write tests in Python is to use the unittest built-in module. For example, say I have the following utility function defined that I would like to verify works correctly across a variety of inputs:
# utils.py
def to_str(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode("utf-8")
    else:
        raise TypeError(
            f"Must supply str or bytes, found: {data}")
To define tests, I create a second file named test_utils.py or utils_test.py—the naming scheme is a style choice—that contains tests for each behavior I expect:
# utils_test.py
from unittest import TestCase, main

from utils import to_str

class UtilsTestCase(TestCase):
    def test_to_str_bytes(self):
        self.assertEqual("hello", to_str(b"hello"))

    def test_to_str_str(self):
        self.assertEqual("hello", to_str("hello"))

    def test_failing(self):
        self.assertEqual("incorrect", to_str("hello"))

if __name__ == "__main__":
    main()
Then I run the test file using the Python command line. In this case, two of the test methods pass and one fails, printing out a helpful error message about what went wrong:
$ python3 utils_test.py
F..
===============================================================
FAIL: test_failing (__main__.UtilsTestCase)
---------------------------------------------------------------
Traceback (most recent call last):
File "utils_test.py", line 15, in test_failing
self.assertEqual('incorrect', to_str('hello'))
AssertionError: 'incorrect' != 'hello'
- incorrect
+ hello
---------------------------------------------------------------
Ran 3 tests in 0.002s
FAILED (failures=1)
Tests are organized into TestCase subclasses. Each test case is a method that begins with the word test. If a test method runs without raising any kind of exception (including AssertionError from assert statements; see Item 81: “assert Internal Assumptions and raise Missed Expectations”), the test is considered to have passed successfully. If one test fails, the TestCase subclass continues running the other test methods so you can get a full picture of how all your tests are doing instead of stopping at the first sign of trouble.
If you want to iterate quickly to fix or improve a specific test, you can run only that test method by specifying its path within the test module on the command line:
$ python3 utils_test.py UtilsTestCase.test_to_str_bytes
.
---------------------------------------------------------------
Ran 1 test in 0.000s
OK
You can also invoke the debugger from directly within test methods at specific breakpoints in order to dig more deeply into the cause of failures (see Item 114: “Consider Interactive Debugging with pdb” for how to do that).
The TestCase class provides helper methods for making assertions in your tests, such as assertEqual for verifying equality, assertTrue for verifying Boolean expressions, assertAlmostEqual for when precision is a concern (see Item 113: “Use assertAlmostEqual to Control Precision in Floating Point Tests”), and many more (see https://docs.python.org/3/library/unittest.html for the full list). These are better than the built-in assert statement because they print out all the inputs and outputs to help you understand the exact reason the test is failing. For example, here I have the same test case written with and without a helper assertion method:
# assert_test.py
from unittest import TestCase, main

from utils import to_str

class AssertTestCase(TestCase):
    def test_assert_helper(self):
        expected = 12
        found = 2 * 5
        self.assertEqual(expected, found)

    def test_assert_statement(self):
        expected = 12
        found = 2 * 5
        assert expected == found

if __name__ == "__main__":
    main()
Which of these failure messages seems more helpful to you? Note how the second message doesn’t show the values of expected or found:
$ python3 assert_test.py
FF
===============================================================
FAIL: test_assert_helper (__main__.AssertTestCase)
---------------------------------------------------------------
Traceback (most recent call last):
File "assert_test.py", line 16, in test_assert_helper
self.assertEqual(expected, found)
AssertionError: 12 != 10
===============================================================
FAIL: test_assert_statement (__main__.AssertTestCase)
---------------------------------------------------------------
Traceback (most recent call last):
File "assert_test.py", line 11, in test_assert_statement
assert expected == found
AssertionError
---------------------------------------------------------------
Ran 2 tests in 0.001s
FAILED (failures=2)
There’s also an assertRaises helper method for verifying exceptions, which can be used as a context manager in with statements (see Item 82: “Consider contextlib and with Statements for Reusable try/finally Behavior” for how that works). This appears similar to a try/except statement and makes it abundantly clear where the exception is expected to be raised:
# utils_error_test.py
from unittest import TestCase, main

from utils import to_str

class UtilsErrorTestCase(TestCase):
    def test_to_str_bad(self):
        with self.assertRaises(TypeError):
            to_str(object())

    def test_to_str_bad_encoding(self):
        with self.assertRaises(UnicodeDecodeError):
            to_str(b"\xfa\xfa")

if __name__ == "__main__":
    main()
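The assertRaises context manager also captures the raised exception object, which is useful when you want to verify the error's message or attributes rather than just its type. Here's a brief sketch that reuses the to_str function from utils.py above (inlined so the example is self-contained):

```python
from unittest import TestCase

def to_str(data):  # Same function as in utils.py above
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode("utf-8")
    else:
        raise TypeError(
            f"Must supply str or bytes, found: {data}")

class ErrorMessageTestCase(TestCase):
    def test_type_error_message(self):
        with self.assertRaises(TypeError) as ctx:
            to_str(object())
        # The captured exception is available after the with block
        self.assertIn("Must supply str or bytes", str(ctx.exception))
```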
You can also define your own helper methods with complex logic in TestCase subclasses to make your tests more readable. Just ensure that your method names don’t begin with the word test, or they’ll be run as if they’re test cases. In addition to calling TestCase assertion methods, these custom test helpers often use the fail method to clarify which assumption or invariant wasn’t met. For example, here I define a custom test helper method for verifying the behavior of a generator:
# helper_test.py
from unittest import TestCase, main

def sum_squares(values):
    cumulative = 0
    for value in values:
        cumulative += value**2
        yield cumulative

class HelperTestCase(TestCase):
    def verify_complex_case(self, values, expected):
        expect_it = iter(expected)
        found_it = iter(sum_squares(values))
        test_it = zip(expect_it, found_it, strict=True)
        for i, (expect, found) in enumerate(test_it):
            if found != expect:
                self.fail(f"Index {i} is wrong: {found} != {expect}")

    def test_too_short(self):
        values = [1.1, 2.2]
        expected = [1.1**2]
        self.verify_complex_case(values, expected)

    def test_too_long(self):
        values = [1.1, 2.2]
        expected = [
            1.1**2,
            1.1**2 + 2.2**2,
            0,  # Value doesn't matter
        ]
        self.verify_complex_case(values, expected)

    def test_wrong_results(self):
        values = [1.1, 2.2, 3.3]
        expected = [
            1.1**2,
            1.1**2 + 2.2**2,
            1.1**2 + 2.2**2 + 3.3**2 + 4.4**2,
        ]
        self.verify_complex_case(values, expected)

if __name__ == "__main__":
    main()
The helper method makes the test cases short and readable, and the resulting error messages are easy to understand:
$ python3 helper_test.py
EEF
==============================================================
ERROR: test_too_long (__main__.HelperTestCase.test_too_long)
--------------------------------------------------------------
Traceback (most recent call last):
File "helper_test.py", line 36, in test_too_long
self.verify_complex_case(values, expected)
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
File "helper_test.py", line 20, in verify_complex_case
for i, (expect, found) in enumerate(test_it):
~~~~~~~~~^^^^^^^^^
ValueError: zip() argument 2 is shorter than argument 1
==============================================================
ERROR: test_too_short (__main__.HelperTestCase.test_too_short)
--------------------------------------------------------------
Traceback (most recent call last):
File "helper_test.py", line 27, in test_too_short
self.verify_complex_case(values, expected)
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
File "helper_test.py", line 20, in verify_complex_case
for i, (expect, found) in enumerate(test_it):
~~~~~~~~~^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1
==============================================================
FAIL: test_wrong_results (__main__.HelperTestCase.test_wrong_results)
--------------------------------------------------------------
Traceback (most recent call last):
File "helper_test.py", line 45, in test_wrong_results
self.verify_complex_case(values, expected)
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
File "helper_test.py", line 22, in verify_complex_case
self.fail(f"Index {i} is wrong: {found} != {expect}")
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Index 2 is wrong: 16.939999999999998 != 36.3
--------------------------------------------------------------
Ran 3 tests in 0.001s
FAILED (failures=1, errors=2)
I usually define one TestCase subclass for each set of related tests. Sometimes, I have one TestCase subclass for each function that has many edge cases. Other times, a TestCase subclass spans all functions in a single module. I often create one TestCase subclass for testing each basic class and all of its methods (see Item 109: “Prefer Integration Tests over Unit Tests” for more guidance).
The TestCase class also provides a subTest helper method that enables you to avoid boilerplate by defining multiple tests within a single test method. This is especially helpful for writing data-driven tests, and it allows the test method to continue testing other cases even after one of them fails (similar to the behavior of TestCase with its contained test methods; see Item 110: “Isolate Tests from Each Other with setUp, tearDown, setUpModule, and tearDownModule” for another approach). To show this, here I define an example data-driven test:
# data_driven_test.py
from unittest import TestCase, main

from utils import to_str

class DataDrivenTestCase(TestCase):
    def test_good(self):
        good_cases = [
            (b"my bytes", "my bytes"),
            ("no error", b"no error"),  # This one will fail
            ("other str", "other str"),
            ...
        ]
        for value, expected in good_cases:
            with self.subTest(value):
                self.assertEqual(expected, to_str(value))

    def test_bad(self):
        bad_cases = [
            (object(), TypeError),
            (b"\xfa\xfa", UnicodeDecodeError),
            ...
        ]
        for value, exception in bad_cases:
            with self.subTest(value):
                with self.assertRaises(exception):
                    to_str(value)

if __name__ == "__main__":
    main()
The "no error" test case fails, printing a helpful error message, but all of the other cases are still tested and confirmed to pass:
$ python3 data_driven_test.py
.
===============================================================
FAIL: test_good (__main__.DataDrivenTestCase) [no error]
---------------------------------------------------------------
Traceback (most recent call last):
File "testing/data_driven_test.py", line 18, in test_good
self.assertEqual(expected, to_str(value))
AssertionError: b'no error' != 'no error'
---------------------------------------------------------------
Ran 2 tests in 0.001s
FAILED (failures=1)
At some point, depending on your project’s complexity and testing requirements, you might outgrow unittest and its capabilities. If and when that happens, the pytest (https://pytest.org) open source package and its large number of community plug-ins can be especially useful as an alternative test runner.
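For comparison, here is a minimal sketch of what the earlier utils_test.py might look like in pytest's idiom: plain functions using the bare assert statement, which pytest rewrites so that failures still report the compared values (the to_str function from utils.py above is inlined to keep the example self-contained):

```python
# utils_pytest_test.py — run with: pytest utils_pytest_test.py
def to_str(data):  # Same function as in utils.py above
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode("utf-8")
    else:
        raise TypeError(
            f"Must supply str or bytes, found: {data}")

def test_to_str_bytes():
    assert to_str(b"hello") == "hello"

def test_to_str_str():
    assert to_str("hello") == "hello"
```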
You can create tests by subclassing the TestCase class from the unittest built-in module and defining one method per behavior you’d like to test. A test method on a TestCase class must start with the word test.
Use the various helper methods defined by the TestCase class, such as assertEqual, to confirm expected behaviors in your tests instead of using the built-in assert statement.
Consider writing data-driven tests using the subTest helper method in order to reduce boilerplate.
There are many approaches to software testing that are far broader than Python, including test-driven development, property-based testing, mutation testing, and code and branch coverage reporting. You will find great tools for writing every type and style of automated test imaginable in Python’s built-in and community packages (see Item 116: “Know Where to Find Community-Built Modules”). So the question in Python isn’t whether you can and should write tests, but instead: How much testing is enough, and what exactly should your tests verify?
It’s best to think of tests in Python as an insurance policy on your code. Good tests give you confidence that your code is correct. If you refactor or expand your code, tests that verify behavior—not implementation—make it easy to identify what’s changed. It sounds counterintuitive, but having well-built tests actually makes it easier to modify Python code, not harder.
As in other languages, testing can exercise many different levels of a codebase. Unit tests verify focused pieces of a much larger system. They are useful when you have a lot of edge cases and you need to ensure that everything is handled properly. They are fast to run because they use only a small part of the program. Often they’re built using mocks (see Item 111: “Use Mocks to Test Code with Complex Dependencies”).
Integration tests verify that multiple components work together. They’re often slower to run and harder to write (see Item 110: “Isolate Tests from Each Other with setUp, tearDown, setUpModule, and tearDownModule” for an example). However, integration tests are especially important in Python because you have no guarantee that your subsystems will actually interoperate unless you prove it (see Item 3: “Never Expect Python to Detect Errors at Compile Time”). Statically typed languages can use type information to approximate rough fitting of components, but similarly leveraging types can be much more difficult in dynamic languages (see Item 124: “Consider Static Analysis via typing to Obviate Bugs”) or practically infeasible.
Generally in Python it’s best to write integration tests. But if you notice that some parts of your code also have a lot of boundary conditions to explore, then it might be worth writing unit tests for those behaviors as well. What you don’t want to do is only write unit tests. For example, imagine that I’m building embedded software to control a toaster. Here I define a toaster class that lets me set the “doneness” level, push down the bread, or pop up the toast:
class Toaster:
    def __init__(self, timer):
        self.timer = timer
        self.doneness = 3
        self.hot = False

    def _get_duration(self):
        return max(0.1, min(120, self.doneness * 10))

    def push_down(self):
        if self.hot:
            return
        self.hot = True
        self.timer.countdown(self._get_duration(), self.pop_up)

    def pop_up(self):
        print("Pop!")  # Release the spring
        self.hot = False
        self.timer.end()
The Toaster class relies on a timer that ejects the toast when it’s done. It should be possible to reset the timer any number of times. Here I use the Timer class from the threading built-in module to implement this:
import threading

class ReusableTimer:
    def __init__(self):
        self.timer = None

    def countdown(self, duration, callback):
        self.end()
        self.timer = threading.Timer(duration, callback)
        self.timer.start()

    def end(self):
        if self.timer:
            self.timer.cancel()
With these two classes defined, I can easily exercise the toaster’s functionality to show that it can apply heat to bread and pop it up before burning it:
toaster = Toaster(ReusableTimer())
print("Initially hot: ", toaster.hot)
toaster.doneness = 5
toaster.push_down()
print("After push down:", toaster.hot)
# Time passes
...
print("After time: ", toaster.hot)
>>>
Initially hot: False
After push down: True
Pop!
After time: False
If I wanted to write a unit test for the Toaster class, I might do something like this with the built-in unittest module (see Item 108: “Verify Related Behaviors in TestCase Subclasses”), where I mock out the ReusableTimer class entirely:
from unittest import TestCase
from unittest.mock import Mock

class ToasterUnitTest(TestCase):
    def test_start(self):
        timer = Mock(spec=ReusableTimer)
        toaster = Toaster(timer)
        toaster.push_down()
        self.assertTrue(toaster.hot)
        timer.countdown.assert_called_once_with(
            30, toaster.pop_up)

    def test_end(self):
        timer = Mock(spec=ReusableTimer)
        toaster = Toaster(timer)
        toaster.hot = True
        toaster.pop_up()
        self.assertFalse(toaster.hot)
        timer.end.assert_called_once()

...
>>>
Pop!
..
---------------------------------------------------------------
Ran 2 tests in 0.000s
OK
Writing a unit test for the ReusableTimer class could similarly mock its dependencies:
from unittest import mock

class ReusableTimerUnitTest(TestCase):
    def test_countdown(self):
        my_func = lambda: None
        with mock.patch("threading.Timer"):
            timer = ReusableTimer()
            timer.countdown(0.1, my_func)
            threading.Timer.assert_called_once_with(0.1, my_func)
            timer.timer.start.assert_called_once()

    def test_end(self):
        my_func = lambda: None
        with mock.patch("threading.Timer"):
            timer = ReusableTimer()
            timer.countdown(0.1, my_func)
            timer.end()
            timer.timer.cancel.assert_called_once()

...
>>>
..
---------------------------------------------------------------
Ran 2 tests in 0.001s
OK
These unit tests work, but they require quite a lot of set up and fiddling with mocks. Instead, consider this single integration test that verifies the Toaster and ReusableTimer classes together, without using any mocks:
class ToasterIntegrationTest(TestCase):
    def setUp(self):
        self.timer = ReusableTimer()
        self.toaster = Toaster(self.timer)
        self.toaster.doneness = 0

    def test_wait_finish(self):
        self.assertFalse(self.toaster.hot)
        self.toaster.push_down()
        self.assertTrue(self.toaster.hot)
        self.timer.timer.join()
        self.assertFalse(self.toaster.hot)

    def test_cancel_early(self):
        self.assertFalse(self.toaster.hot)
        self.toaster.push_down()
        self.assertTrue(self.toaster.hot)
        self.toaster.pop_up()
        self.assertFalse(self.toaster.hot)

...
>>>
Pop!
.Pop!
.
---------------------------------------------------------------
Ran 2 tests in 0.108s
OK
This test is clear, concise, and focused on the end-to-end behavior instead of implementation details. Perhaps the only gripe I have with it is that it accesses the internals of the ReusableTimer class in order to properly wait for the threading.Timer instance to finish (using the join method). But this is Python, and having that kind of access for testing is one of the language’s primary benefits.
The earlier unit tests for the Toaster and ReusableTimer classes, respectively, appear redundant and unnecessarily complex in comparison to this single integration test. However, there is one potential benefit that a unit test could bring to this code: testing the boundaries of the doneness setting to make sure it’s never too long or too short:
class DonenessUnitTest(TestCase):
    def setUp(self):
        self.toaster = Toaster(ReusableTimer())

    def test_min(self):
        self.toaster.doneness = 0
        self.assertEqual(0.1, self.toaster._get_duration())

    def test_max(self):
        self.toaster.doneness = 1000
        self.assertEqual(120, self.toaster._get_duration())

...
>>>
..
---------------------------------------------------------------
Ran 2 tests in 0.000s
OK
This is the right balance for the tests you should write in Python: Definitely have integration tests for end-to-end behaviors and maybe have unit tests for intricate edge cases. It’s easy to avoid mocks most of the time and use them only when there’s a compelling reason (see Item 112: “Encapsulate Dependencies to Facilitate Mocking and Testing”). Otherwise, don’t forget that you’ll still need even larger system tests to verify how your Python programs interact with corresponding web clients, API endpoints, mobile applications, databases, and so on.
An integration test verifies the behavior of multiple components together, whereas a unit test verifies only an individual component on its own.
Due to the highly dynamic nature of Python, integration tests are the best way—sometimes the only way—to gain confidence about the correctness of a program.
Unit tests can be used in addition to integration tests for verifying parts of a codebase that have a lot of edge cases or boundary conditions.
setUp, tearDown, setUpModule, and tearDownModule

TestCase subclasses (see Item 108: “Verify Related Behaviors in TestCase Subclasses”) often need to have the test environment set up before test methods can be run; this is sometimes called the test harness. To do this setup, you can override the setUp and tearDown methods of the TestCase parent class. These methods are called before and after each test method, respectively, allowing you to ensure that each test runs in isolation, which is an important best practice of proper testing.
For example, here I define a TestCase subclass that creates a temporary directory before each test and deletes its contents after each test finishes:
# environment_test.py
from pathlib import Path
from tempfile import TemporaryDirectory
from unittest import TestCase, main

class EnvironmentTest(TestCase):
    def setUp(self):
        self.test_dir = TemporaryDirectory()
        self.test_path = Path(self.test_dir.name)

    def tearDown(self):
        self.test_dir.cleanup()

    def test_modify_file(self):
        with open(self.test_path / "data.bin", "w") as f:
            ...

if __name__ == "__main__":
    main()
When programs get complicated, you can use additional tests to verify the end-to-end interactions between your modules instead of only testing code in isolation (see Item 109: “Prefer Integration Tests over Unit Tests”). One common problem is that setting up your test environment for integration tests can be computationally expensive and may require a lot of wall-clock time. For example, you might need to start a database process and wait for it to finish loading indexes before you can run your integration tests. This type of latency makes it impractical to do test preparation and cleanup for every test in the TestCase subclass’s setUp and tearDown methods.
To handle this situation, the unittest module also supports module-level test harness initialization. You can configure expensive resources a single time, and then have all TestCase classes and their test methods run without repeating that initialization. Later, when all tests in the module are finished, the test harness can be torn down a single time. Here I take advantage of this behavior by defining setUpModule and tearDownModule functions within the module that contains the TestCase subclasses:
# integration_test.py
from unittest import TestCase, main

def setUpModule():
    print("* Module setup")

def tearDownModule():
    print("* Module clean-up")

class IntegrationTest(TestCase):
    def setUp(self):
        print("* Test setup")

    def tearDown(self):
        print("* Test clean-up")

    def test_end_to_end1(self):
        print("* Test 1")

    def test_end_to_end2(self):
        print("* Test 2")

if __name__ == "__main__":
    main()
$ python3 integration_test.py
* Module setup
* Test setup
* Test 1
* Test clean-up
.* Test setup
* Test 2
* Test clean-up
.* Module clean-up
---------------------------------------------------------------
Ran 2 tests in 0.000s
OK
The setUpModule function is run by unittest only once, and it happens before any setUp methods are called. Similarly, tearDownModule is run only once, after the last tearDown method has been called.
Use the setUp and tearDown methods of TestCase to make sure your tests are isolated from each other and ensure a clean test environment.
For integration tests, use the setUpModule and tearDownModule module-level functions to manage any test harnesses you need for the entire lifetime of a test module and all the TestCase subclasses that it contains.
Another common need when writing tests (see Item 108: “Verify Related Behaviors in TestCase Subclasses”) is to use mocked functions and classes to simulate behaviors when it’s too difficult or slow to use the real thing. For example, say that I need a program to maintain the feeding schedule for animals at the zoo. Here I define a function to query a database for all the animals of a certain species and then return when they most recently ate:
class DatabaseConnection:
    ...

def get_animals(database, species):
    # Query the Database
    ...
    # Return a list of (name, last_mealtime) tuples
How do I get a DatabaseConnection instance to use for testing this function? Here I try to create one and pass it into the function being tested:
database = DatabaseConnection("localhost", "4444")
get_animals(database, "Meerkat")
>>>
Traceback ...
DatabaseConnectionError: Not connected
There’s no database running, so of course this fails. One solution is to actually stand up a database server and connect to it in the test. However, it’s a lot of work to fully automate starting up a database, configuring its schema, populating it with data, and so on in order to just run a simple unit test. Further, it will probably take a lot of wall-clock time to set up a database server, which would slow down these unit tests and make them harder to maintain (see Item 110: “Isolate Tests from Each Other with setUp, tearDown, setUpModule, and tearDownModule” for one potential solution).
An alternative approach is to mock out the database. A mock lets you provide expected responses for dependent functions, given a set of expected calls. It’s important not to confuse mocks with fakes. A fake would provide most of the behavior of the DatabaseConnection class but with a simpler implementation, such as a basic in-memory, single-threaded database with no persistence.
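To make the distinction concrete, here's a brief sketch of what such a fake might look like: a hypothetical in-memory class (the name and methods are illustrative, not from the real system) that actually implements query behavior rather than merely recording calls the way a mock does:

```python
class FakeDatabaseConnection:
    """A fake: real query behavior, trivially simple implementation."""
    def __init__(self):
        self.rows = []  # In-memory (species, name, last_mealtime) tuples

    def insert(self, species, name, last_mealtime):
        self.rows.append((species, name, last_mealtime))

    def query_animals(self, species):
        # Behaves like a real query, unlike a mock's canned response
        return [(name, when)
                for (s, name, when) in self.rows
                if s == species]
```

A test using this fake exercises genuine lookup logic, whereas a mock only verifies that the right calls were made with the right arguments.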
Python has the unittest.mock built-in module for creating mocks and using them in tests. Here I define a Mock instance that simulates the get_animals function without actually connecting to the database:
from datetime import datetime
from unittest.mock import Mock
mock = Mock(spec=get_animals)
expected = [
    ("Spot", datetime(2024, 6, 5, 11, 15)),
    ("Fluffy", datetime(2024, 6, 5, 12, 30)),
    ("Jojo", datetime(2024, 6, 5, 12, 45)),
]
mock.return_value = expected
The Mock class creates a mock function. The return_value attribute of the mock is the value to return when it is called. The spec argument indicates that the mock should act like the given object, which is a function in this case, and error if it’s used the wrong way. For example, here I try to treat the mock function as if it were a mock object with attributes:
mock.does_not_exist
>>>
Traceback ...
AttributeError: Mock object has no attribute 'does_not_exist'
Once it’s created, I can call the mock, get its return value, and verify that what it returns matches expectations. I use a unique object() value as the database argument because it won’t actually be used by the mock to do anything; all I care about is that the database parameter was correctly plumbed through to any dependent functions that needed a DatabaseConnection instance in order to work:
database = object()
result = mock(database, "Meerkat")
assert result == expected
This verifies that the mock responded correctly, but how do I know if the code that called the mock provided the correct arguments? For this, the Mock class provides the assert_called_once_with method, which verifies that a single call with exactly the given parameters was made:
mock.assert_called_once_with(database, "Meerkat")
If I supply the wrong parameters, an exception is raised, and any TestCase that the assertion is used in fails:
mock.assert_called_once_with(database, "Giraffe")
>>>
Traceback ...
AssertionError: expected call not found.
Expected: mock(<object object at 0x104728900>, 'Giraffe')
Actual: mock(<object object at 0x104728900>, 'Meerkat')
If I actually don’t care about some of the individual parameters, such as exactly which database object was used, then I can indicate that any value is okay for an argument by using the unittest.mock.ANY constant. I can also use the assert_called_with method of Mock to verify that the most recent call to the mock—and there may have been multiple calls in this case—matches my expectations:
from unittest.mock import ANY
mock = Mock(spec=get_animals)
mock("database 1", "Rabbit")
mock("database 2", "Bison")
mock("database 3", "Meerkat")
mock.assert_called_with(ANY, "Meerkat")
ANY is useful in tests when a parameter is not core to the behavior that’s being tested. It’s often worth erring on the side of underspecifying tests by using ANY more liberally instead of overspecifying tests and having to plumb through various test parameter expectations.
The Mock class also makes it easy to mock exceptions being raised:
class MyError(Exception):
    pass
mock = Mock(spec=get_animals)
mock.side_effect = MyError("Whoops! Big problem")
result = mock(database, "Meerkat")
>>>
Traceback ...
MyError: Whoops! Big problem
There are many more features available, so be sure to see the module documentation for the full range of options (https://docs.python.org/3/library/unittest.mock.html).
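One such feature worth knowing: side_effect can also be set to a list, in which case successive calls to the mock return successive values, and any exception instance in the list is raised instead of returned. Here's a minimal sketch:

```python
from unittest.mock import Mock

mock = Mock()
# Each call consumes the next item; exceptions in the list are raised
mock.side_effect = [1, 2, ValueError("third call fails")]

assert mock() == 1
assert mock() == 2
try:
    mock()
except ValueError as e:
    print(f"Raised as expected: {e}")
```

This is handy for simulating a dependency whose results change over time, such as a connection that works twice and then drops.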
Now that I’ve shown the mechanics of how a Mock object works, I can apply it to an actual testing situation to show how to use it effectively in writing tests. Here I define a function to do the rounds of feeding animals at the zoo, given a set of database-interacting functions:
def get_food_period(database, species):
    # Query the Database
    ...
    # Return a time delta

def feed_animal(database, name, when):
    # Write to the Database
    ...

def do_rounds(database, species):
    now = datetime.now()
    feeding_timedelta = get_food_period(database, species)
    animals = get_animals(database, species)
    fed = 0
    for name, last_mealtime in animals:
        if (now - last_mealtime) > feeding_timedelta:
            feed_animal(database, name, now)
            fed += 1
    return fed
The goal of my test is to verify that when do_rounds is run, the right animals got fed, that the latest feeding time was recorded to the database, and that the total number of animals fed returned by the function matches the correct total. In order to do all this, I need to mock out datetime.now so my tests can expect a stable time that isn’t affected by when the program is executed. I need to mock out get_food_period and get_animals to return values that would have come from the database. And I need to mock out feed_animal to accept data that would have been written back to the database.
The question is: Even if I know how to create these mock functions and set expectations, how do I get the do_rounds function that’s being tested to use the mock dependent functions instead of the real versions? One approach is to inject everything as keyword-only arguments (see Item 37: “Enforce Clarity with Keyword-Only and Positional-Only Arguments” for background):
def do_rounds(
    database,
    species,
    *,
    now_func=datetime.now,
    food_func=get_food_period,
    animals_func=get_animals,
    feed_func=feed_animal,
):
    now = now_func()
    feeding_timedelta = food_func(database, species)
    animals = animals_func(database, species)
    fed = 0
    for name, last_mealtime in animals:
        if (now - last_mealtime) > feeding_timedelta:
            feed_func(database, name, now)
            fed += 1
    return fed
To test this function, I need to create all the Mock instances upfront and set their expectations:
from datetime import timedelta

now_func = Mock(spec=datetime.now)
now_func.return_value = datetime(2024, 6, 5, 15, 45)

food_func = Mock(spec=get_food_period)
food_func.return_value = timedelta(hours=3)

animals_func = Mock(spec=get_animals)
animals_func.return_value = [
    ("Spot", datetime(2024, 6, 5, 11, 15)),
    ("Fluffy", datetime(2024, 6, 5, 12, 30)),
    ("Jojo", datetime(2024, 6, 5, 12, 45)),
]

feed_func = Mock(spec=feed_animal)
Then I can run the test by passing the mocks into the do_rounds function to override the defaults:
result = do_rounds(
    database,
    "Meerkat",
    now_func=now_func,
    food_func=food_func,
    animals_func=animals_func,
    feed_func=feed_func,
)
assert result == 2
Finally, I can verify that all the calls to dependent functions matched my expectations:
from unittest.mock import call

food_func.assert_called_once_with(database, "Meerkat")
animals_func.assert_called_once_with(database, "Meerkat")

feed_func.assert_has_calls(
    [
        call(database, "Spot", now_func.return_value),
        call(database, "Fluffy", now_func.return_value),
    ],
    any_order=True,
)
I don’t verify the parameters to the datetime.now mock or how many times it was called because that’s indirectly verified by the return value of the function. For get_food_period and get_animals, I verify a single call with the specified parameters by using assert_called_once_with. For the feed_animal function, I verify that two calls were made—and their order didn’t matter—to write to the database using the unittest.mock.call helper and the assert_has_calls method.
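For reference, the fragments above can be combined into a single runnable script. The stub bodies below are placeholders standing in for the real database code; they exist only so the mocks have signatures to follow via spec:

```python
from datetime import datetime, timedelta
from unittest.mock import Mock, call

# Hypothetical stand-ins for the real database-backed helpers
def get_food_period(database, species): ...
def get_animals(database, species): ...
def feed_animal(database, name, when): ...

def do_rounds(
    database,
    species,
    *,
    now_func=datetime.now,
    food_func=get_food_period,
    animals_func=get_animals,
    feed_func=feed_animal,
):
    now = now_func()
    feeding_timedelta = food_func(database, species)
    fed = 0
    for name, last_mealtime in animals_func(database, species):
        if (now - last_mealtime) > feeding_timedelta:
            feed_func(database, name, now)
            fed += 1
    return fed

now_func = Mock(spec=datetime.now)
now_func.return_value = datetime(2024, 6, 5, 15, 45)
food_func = Mock(spec=get_food_period)
food_func.return_value = timedelta(hours=3)
animals_func = Mock(spec=get_animals)
animals_func.return_value = [
    ("Spot", datetime(2024, 6, 5, 11, 15)),    # Fed: 4.5 hours ago
    ("Fluffy", datetime(2024, 6, 5, 12, 30)),  # Fed: 3.25 hours ago
    ("Jojo", datetime(2024, 6, 5, 12, 45)),    # Not fed: exactly 3 hours
]
feed_func = Mock(spec=feed_animal)

database = object()  # Placeholder; the mocks never inspect it
result = do_rounds(
    database,
    "Meerkat",
    now_func=now_func,
    food_func=food_func,
    animals_func=animals_func,
    feed_func=feed_func,
)
assert result == 2
feed_func.assert_has_calls(
    [
        call(database, "Spot", now_func.return_value),
        call(database, "Fluffy", now_func.return_value),
    ],
    any_order=True,
)
```

Note that Jojo is not fed because the comparison is strictly greater than the feeding period.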
This approach of using keyword-only arguments for injecting mocks works, but it’s quite verbose and requires changing every function you want to test. The unittest.mock.patch family of functions makes injecting mocks easier. It temporarily reassigns an attribute of a module or class, such as the database-accessing functions that I defined above. For example, here I override get_animals to be a mock using patch:
from unittest.mock import patch

print("Outside patch:", get_animals)

with patch("__main__.get_animals"):
    print("Inside patch: ", get_animals)

print("Outside again:", get_animals)
>>>
Outside patch: <function get_animals at 0x104eda160>
Inside patch: <MagicMock name='get_animals' id='4397863264'>
Outside again: <function get_animals at 0x104eda160>
patch works for many modules, classes, and attributes. It can be used in with statements (see Item 82: “Consider contextlib and with Statements for Reusable try/finally Behavior”), as a function decorator (see Item 38: “Define Function Decorators with functools.wraps”), or in the setUp and tearDown methods of TestCase classes (see Item 110: “Isolate Tests from Each Other with setUp, tearDown, setUpModule, and tearDownModule”).
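patch also has siblings with the same save-and-restore lifecycle; for example, patch.object temporarily replaces an attribute on a specific class or object. Here is a small sketch using a hypothetical Database class to show the replacement and automatic restoration:

```python
from unittest.mock import patch

class Database:  # Hypothetical class for illustration
    def get_animals(self, species):
        raise RuntimeError("No real database available in tests")

# Inside the with block, get_animals is a MagicMock; on exit the
# original method is restored automatically, even if an error occurs.
with patch.object(Database, "get_animals") as mock_get:
    mock_get.return_value = [("Spot", None)]
    animals = Database().get_animals("Meerkat")

assert animals == [("Spot", None)]

# Outside the block, the real (failing) implementation is back.
try:
    Database().get_animals("Meerkat")
    restored = False
except RuntimeError:
    restored = True
assert restored
```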
However, patch doesn’t work in all cases. For example, to test do_rounds, I need to mock out the current time returned by the datetime.now class method. Python won’t let me do that because the datetime class is defined in a C-extension module, which can’t be modified in this way:
fake_now = datetime(2024, 6, 5, 15, 45)

with patch("datetime.datetime.now"):
    datetime.now.return_value = fake_now
>>>
Traceback ...
TypeError: cannot set 'now' attribute of immutable type
➥'datetime.datetime'
The above exception was the direct cause of the following
➥exception:
Traceback ...
TypeError: cannot set 'now' attribute of immutable type
➥'datetime.datetime'
To work around this, I can create another helper function to fetch time that can be patched:
def get_do_rounds_time():
    return datetime.now()

def do_rounds(database, species):
    now = get_do_rounds_time()
    ...

with patch("__main__.get_do_rounds_time"):
    ...
Alternatively, I can use a keyword-only argument for the datetime.now mock and use patch for all of the other mocks:
def do_rounds(database, species, *, now_func=datetime.now):
    now = now_func()
    feeding_timedelta = get_food_period(database, species)
    animals = get_animals(database, species)
    fed = 0

    for name, last_mealtime in animals:
        if (now - last_mealtime) > feeding_timedelta:
            feed_animal(database, name, now)
            fed += 1

    return fed
I’m going to go with the latter approach. Now I can use the patch.multiple function to create many mocks and then set their expectations:
from unittest.mock import DEFAULT

with patch.multiple(
    "__main__",
    autospec=True,
    get_food_period=DEFAULT,
    get_animals=DEFAULT,
    feed_animal=DEFAULT,
):
    now_func = Mock(spec=datetime.now)
    now_func.return_value = datetime(2024, 6, 5, 15, 45)
    get_food_period.return_value = timedelta(hours=3)
    get_animals.return_value = [
        ("Spot", datetime(2024, 6, 5, 11, 15)),
        ("Fluffy", datetime(2024, 6, 5, 12, 30)),
        ("Jojo", datetime(2024, 6, 5, 12, 45)),
    ]
The keyword arguments to patch.multiple correspond to names in the __main__ module that I want to override during the test. The DEFAULT value indicates that I want a standard Mock instance to be created for each name. All the generated mocks will adhere to the specification of the objects they are meant to simulate, thanks to the autospec=True parameter.
With the setup ready, I can run the test and verify that the calls were correct inside the with statement that used patch.multiple:
    result = do_rounds(database, "Meerkat", now_func=now_func)
    assert result == 2

    get_food_period.assert_called_once_with(database, "Meerkat")
    get_animals.assert_called_once_with(database, "Meerkat")
    feed_animal.assert_has_calls(
        [
            call(database, "Spot", now_func.return_value),
            call(database, "Fluffy", now_func.return_value),
        ],
        any_order=True,
    )
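When every patched name maps to DEFAULT, patch.multiple used as a context manager also yields a dictionary from each name to its generated mock, so you can set expectations without referencing the patched attributes directly. A minimal sketch, patching a hypothetical helper in the current module:

```python
import sys
from unittest.mock import DEFAULT, patch

def fetch_config():  # Hypothetical helper for illustration
    return "real"

this_module = sys.modules[__name__]

# The with statement yields {name: mock} for each DEFAULT entry.
with patch.multiple(this_module, fetch_config=DEFAULT) as mocks:
    mocks["fetch_config"].return_value = "fake"
    result_inside = fetch_config()

# On exit, the original function is restored.
result_outside = fetch_config()

assert result_inside == "fake"
assert result_outside == "real"
```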
These mocks work as expected, but you can further improve the readability of such tests and reduce boilerplate by refactoring your code to be more testable by design (see Item 112: “Encapsulate Dependencies to Facilitate Mocking and Testing”).
The unittest.mock module provides a way to simulate the behavior of interfaces using the Mock class. Mocks are useful in tests when it’s difficult to set up the dependencies that are required by the code that’s being tested.
When using mocks, it’s important to verify both the behavior of the code being tested and how dependent functions were called by that code, using the Mock.assert_called_once_with family of methods.
Keyword-only arguments and the unittest.mock.patch family of functions can be used to inject mocks into the code being tested.
In the previous item (see Item 111: “Use Mocks to Test Code with Complex Dependencies”), I showed how to use the facilities of the unittest.mock built-in module—including the Mock class and patch family of functions—to write tests that have complex dependencies, such as a database. However, the resulting test code requires a lot of boilerplate, which could make it more difficult for new readers of the code to understand what the tests are trying to verify.
One way to improve these tests is to use a wrapper object to encapsulate the database’s interface instead of passing a DatabaseConnection object to functions as an argument. It’s often worth refactoring your code (see Item 123: “Consider warnings to Refactor and Migrate Usage” for one approach) to use better abstractions because it facilitates creating mocks and writing tests. Here I redefine the various database helper functions from the previous item as methods on a class instead of as independent functions:
class ZooDatabase:
    ...

    def get_animals(self, species):
        ...

    def get_food_period(self, species):
        ...

    def feed_animal(self, name, when):
        ...
Now I can redefine the do_rounds function to call methods on a ZooDatabase object:
from datetime import datetime

def do_rounds(database, species, *, now_func=datetime.now):
    now = now_func()
    feeding_timedelta = database.get_food_period(species)
    animals = database.get_animals(species)
    fed = 0

    for name, last_mealtime in animals:
        if (now - last_mealtime) >= feeding_timedelta:
            database.feed_animal(name, now)
            fed += 1

    return fed
Writing a test for do_rounds is now a lot easier because I no longer need to use unittest.mock.patch to inject the mock into the code being tested. Instead, I can create a Mock instance to represent a ZooDatabase and pass that in as the database parameter. The Mock class returns a mock object for any attribute name that is accessed. Those attributes can be called like methods, which I can then use to set expectations and verify calls. This makes it easy to mock out all the methods of a class:
from unittest.mock import Mock

database = Mock(spec=ZooDatabase)
print(database.feed_animal)
database.feed_animal()
database.feed_animal.assert_any_call()
>>>
<Mock name='mock.feed_animal' id='4386901024'>
I can rewrite the Mock setup code using the ZooDatabase encapsulation:
from datetime import timedelta
from unittest.mock import call

now_func = Mock(spec=datetime.now)
now_func.return_value = datetime(2019, 6, 5, 15, 45)

database = Mock(spec=ZooDatabase)
database.get_food_period.return_value = timedelta(hours=3)
database.get_animals.return_value = [
    ("Spot", datetime(2019, 6, 5, 11, 15)),
    ("Fluffy", datetime(2019, 6, 5, 12, 30)),
    ("Jojo", datetime(2019, 6, 5, 12, 55)),
]
Then I can run the function being tested and verify that all dependent methods were called as expected:
result = do_rounds(database, "Meerkat", now_func=now_func)
assert result == 2

database.get_food_period.assert_called_once_with("Meerkat")
database.get_animals.assert_called_once_with("Meerkat")
database.feed_animal.assert_has_calls(
    [
        call("Spot", now_func.return_value),
        call("Fluffy", now_func.return_value),
    ],
    any_order=True,
)
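The same setup and assertions can also be folded into a TestCase, with the mocks rebuilt in setUp so every test method gets a fresh, isolated set. Here is a condensed sketch that restates the ZooDatabase interface and do_rounds from above as stubs:

```python
from datetime import datetime, timedelta
from unittest import TestCase
from unittest.mock import Mock, call

class ZooDatabase:  # Stub restating the interface above
    def get_animals(self, species): ...
    def get_food_period(self, species): ...
    def feed_animal(self, name, when): ...

def do_rounds(database, species, *, now_func=datetime.now):
    now = now_func()
    feeding_timedelta = database.get_food_period(species)
    fed = 0
    for name, last_mealtime in database.get_animals(species):
        if (now - last_mealtime) >= feeding_timedelta:
            database.feed_animal(name, now)
            fed += 1
    return fed

class DoRoundsTest(TestCase):
    def setUp(self):
        # Fresh mocks before every test method keep tests isolated
        self.now_func = Mock(spec=datetime.now)
        self.now_func.return_value = datetime(2019, 6, 5, 15, 45)
        self.database = Mock(spec=ZooDatabase)
        self.database.get_food_period.return_value = timedelta(hours=3)
        self.database.get_animals.return_value = [
            ("Spot", datetime(2019, 6, 5, 11, 15)),
            ("Fluffy", datetime(2019, 6, 5, 12, 30)),
        ]

    def test_do_rounds(self):
        result = do_rounds(
            self.database,
            "Meerkat",
            now_func=self.now_func,
        )
        self.assertEqual(2, result)
        self.database.feed_animal.assert_has_calls(
            [
                call("Spot", self.now_func.return_value),
                call("Fluffy", self.now_func.return_value),
            ],
            any_order=True,
        )
```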
Using the spec parameter to Mock is especially useful when mocking classes because it ensures that the code under test doesn’t call a misspelled method name by accident. This allows you to avoid a common pitfall where the same bug is present in both the code and the unit test, masking a real error that will later reveal itself in production:
database.bad_method_name()
>>>
Traceback ...
AttributeError: Mock object has no attribute 'bad_method_name'
If I want to test this program end-to-end with a midlevel integration test (see Item 109: “Prefer Integration Tests over Unit Tests”), I still need a way to inject a mock ZooDatabase into the program. I can do this by creating a helper function that acts as a seam for dependency injection. Here I define such a helper function that caches a ZooDatabase object in module scope by using a global statement (see Item 120: “Consider Module-Scoped Code to Configure Deployment Environments” for background):
DATABASE = None

def get_database():
    global DATABASE
    if DATABASE is None:
        DATABASE = ZooDatabase()
    return DATABASE

def main(argv):
    database = get_database()
    species = argv[1]
    count = do_rounds(database, species)
    print(f"Fed {count} {species}(s)")
    return 0
Now I can inject the mock ZooDatabase using patch, run the test, and verify the program’s output. I’m not using a mock datetime.now here; instead, I’m relying on the database records returned by the mock being relative to the current time in order to produce behavior similar to the unit test. This approach is flakier than mocking everything, but it also tests more surface area:
import contextlib
import io
from unittest.mock import patch

with patch("__main__.DATABASE", spec=ZooDatabase):
    now = datetime.now()

    DATABASE.get_food_period.return_value = timedelta(hours=3)
    DATABASE.get_animals.return_value = [
        ("Spot", now - timedelta(minutes=4.5)),
        ("Fluffy", now - timedelta(hours=3.25)),
        ("Jojo", now - timedelta(hours=3)),
    ]

    fake_stdout = io.StringIO()
    with contextlib.redirect_stdout(fake_stdout):
        main(["program name", "Meerkat"])

    found = fake_stdout.getvalue()
    expected = "Fed 2 Meerkat(s)\n"
    assert found == expected
The results match my expectations. Creating this integration test was straightforward because I designed the implementation to make it easier to test.
When unit tests require a lot of repeated boilerplate to set up mocks, one solution may be to encapsulate the functionality of dependencies into classes that are more easily mocked.
The Mock class of the unittest.mock built-in module simulates classes by returning a new mock, which can act as a mock method, for each attribute that is accessed.
For end-to-end tests, it’s valuable to refactor your code to have more helper functions that can act as explicit seams to use for injecting mock dependencies in tests.
assertAlmostEqual to Control Precision in Floating Point Tests

Python’s float type is a double-precision floating point number (following the IEEE 754 standard). This scheme has limitations (see Item 106: “Use decimal when Precision Is Paramount”), but floating point numbers are useful for many purposes and are well supported in Python.
Often, it’s important to test mathematical code for boundary conditions and other potential sources of error (see Item 109: “Prefer Integration Tests over Unit Tests” for details). Unfortunately, writing automated tests involving floating point numbers can be tricky. For example, here I use the unittest built-in module to define a test that tries (and fails) to verify the result of the expression 5 / 3:
import unittest

class MyTestCase(unittest.TestCase):
    def test_equal(self):
        n = 5
        d = 3
        self.assertEqual(1.667, n / d)  # Raises

...
>>>
Traceback ...
AssertionError: 1.667 != 1.6666666666666667
The issue is that in Python the expression 5 / 3 results in a number that can't be represented exactly as a float value (which is evidenced by the repeating 6 after the decimal point). The expected value passed to assertEqual, 1.667, isn’t sufficiently precise to exactly match the calculated result. (They’re different by 0.000333….) Thus, the assertEqual method call fails. I could solve this problem by making the expected result more precise, such as the literal 1.6666666666666667. But in practice, using this level of precision makes numerical tests hard to maintain. The order of operations can produce different results due to rounding behavior. It’s also possible for architectural differences (such as x86 vs. AArch64) to affect the results.
Here I show this rounding problem by reordering a calculation in a way that doesn’t look like it should affect the results but it does (note the last digit):
print(5 / 3 * 0.1)
print(0.1 * 5 / 3)
>>>
0.16666666666666669
0.16666666666666666
To deal with this in automated tests, the assertAlmostEqual helper method in the TestCase class can be used to do approximate comparisons between floating point numbers. It properly deals with infinity and NaN conditions, and minimizes the introduction of error due to rounding. Here I use this method to verify that the numbers are equal when rounded to two decimal places after the decimal point:
class MyTestCase2(unittest.TestCase):
    def test_equal(self):
        ...
        # Changed
        self.assertAlmostEqual(1.667, n / d, places=2)

...
>>>
.
---------------------------------------------------------------
Ran 1 test in 0.000s
OK
The places parameter for assertAlmostEqual works well in verifying numbers with a fractional portion between zero and one. But floating point behavior and repeating decimals might affect larger numbers as well. For example, consider the large difference, in absolute terms, between these two calculations, even though the only change is the addition of 0.001 to one coefficient:
print(1e24 / 1.1e16)
print(1e24 / 1.101e16)
>>>
90909090.9090909
90826521.34423251
The difference between these values is approximately 82,569. Depending on the use case, that margin might matter, or it might not. To enable you to express your tolerance for imprecision, you can provide a delta argument to the assertAlmostEqual helper method. This parameter causes the method to consider the absolute difference between the numbers and raise an AssertionError exception only if it’s larger than the delta provided.
Here I use this option to specify a tolerance of 100,000, which is more than the 82,569 difference, allowing both assertions to pass:
class MyTestCase3(unittest.TestCase):
    def test_equal(self):
        a = 1e24 / 1.1e16
        b = 1e24 / 1.101e16
        self.assertAlmostEqual(90.9e6, a, delta=0.1e6)
        self.assertAlmostEqual(90.9e6, b, delta=0.1e6)
In some situations, you might need to assert the opposite: that two numbers are not close to each other given a tolerance or number of decimal places. The TestCase class also provides the assertNotAlmostEqual method to make this easy. To handle more complex use cases when comparing numbers in test code or outside tests, the math built-in module provides the isclose function, which has similar functionality, and more.
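To illustrate math.isclose on the numbers from the example above: its rel_tol parameter scales the allowed difference with the magnitude of the operands, while abs_tol provides an absolute floor that matters for comparisons near zero:

```python
import math

a = 1e24 / 1.1e16    # Roughly 90,909,090.9
b = 1e24 / 1.101e16  # Roughly 90,826,521.3

# A relative tolerance of 0.1% covers the ~82,569 absolute difference.
assert math.isclose(a, b, rel_tol=1e-3)
assert not math.isclose(a, b)  # Default rel_tol=1e-9 is far too strict

# Relative tolerance collapses at zero, so use abs_tol near-zero values.
assert not math.isclose(1e-10, 0.0)
assert math.isclose(1e-10, 0.0, abs_tol=1e-9)
```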
Due to rounding behavior, floating point numbers, especially their fractional parts, might change as a result of the order of operations applied.
Testing floating point values with assertEqual can lead to flaky tests because this method considers the full precision of the numbers being compared.
The assertAlmostEqual and assertNotAlmostEqual methods allow you to specify places or delta parameters to indicate your tolerance for differences when comparing floating point numbers.
pdb

Everyone encounters bugs in their code while developing programs. Using the print function can help you track down the sources of many issues (see Item 12: “Understand the Difference Between repr and str when Printing Objects”). Writing tests for specific cases that cause trouble is another great way to identify problems (see Item 109: “Prefer Integration Tests over Unit Tests”).
But these tools aren’t enough to find every root cause. When you need something more powerful, it’s time to try Python’s built-in interactive debugger. The debugger lets you inspect the state of a running program, print local variables, and step through execution one statement at a time.
In most other programming languages, you use a debugger by specifying what line of a source file you’d like to stop on and then execute the program. In contrast, with Python, the easiest way to use the debugger is by modifying your program to directly initiate the debugger just before you think you’ll have an issue worth investigating. This means there is no difference between starting a Python program in order to run the debugger and starting it normally.
To initiate the debugger, all you have to do is call the breakpoint built-in function. This is equivalent to importing the pdb built-in module and running its set_trace function:
# always_breakpoint.py
import math

def compute_rmse(observed, ideal):
    total_err_2 = 0
    count = 0
    for got, wanted in zip(observed, ideal):
        err_2 = (got - wanted) ** 2
        breakpoint()  # Start the debugger here
        total_err_2 += err_2
        count += 1

    mean_err = total_err_2 / count
    rmse = math.sqrt(mean_err)
    return rmse

result = compute_rmse(
    [1.8, 1.7, 3.2, 6],
    [2, 1.5, 3, 5],
)
print(result)
As soon as the breakpoint function runs, the program pauses its execution before the line of code immediately following the breakpoint call. The terminal that started the program will turn into a Python debugging shell:
$ python3 always_breakpoint.py
> always_breakpoint.py(12)compute_rmse()
-> total_err_2 += err_2
(Pdb)
At the (Pdb) prompt, you can type in the names of local variables to see their values printed out (or use p <name>). You can see a list of all local variables by calling the locals built-in function. You can import modules, inspect global state, construct new objects, and even modify parts of the running program. Some Python statements and language features aren’t supported in this debugging prompt, but you can access a standard Python REPL with access to program state by using the interact command.
In addition, the debugger has a variety of special commands to control and understand program execution; type help to see the full list. Three very useful commands make inspecting the running program easier:
where: Print the current execution call stack. This lets you figure out where you are in your program and how you arrived at the breakpoint trigger.
up: Move your scope up the execution call stack to the caller of the current function. This allows you to inspect the local variables in higher levels of the program that led to the breakpoint.
down: Move your scope back down the execution call stack one level.
When you’re done inspecting the current state, you can use these five debugger commands to control the program’s execution in different ways:
step: Run the program until the next line of execution in the program and then return control to the debugger prompt. If the next line of execution includes calling a function, the debugger stops within the function that was called.
next: Run the program until the next line of execution in the current function and then return control to the debugger prompt. If the next line of execution includes calling a function, the debugger will not stop until the called function has returned.
return: Run the program until the current function returns and then return control to the debugger prompt.
continue: Continue running the program until the next breakpoint is hit (either through an explicit breakpoint call or when encountering a breakpoint added by a prior debugger command).
quit: Exit the debugger and end the program. Run this command if you’ve found the problem, gone too far, or need to make program modifications and try again.
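Note that breakpoint calls can be disabled without editing the code: per PEP 553, sys.breakpointhook consults the PYTHONBREAKPOINT environment variable every time breakpoint() is called, so setting it to 0 turns each call into a no-op. This is handy when a debugging hook is accidentally left in a program:

```python
import os

# Setting PYTHONBREAKPOINT to "0" makes breakpoint() do nothing;
# the default hook re-reads the environment variable on every call.
os.environ["PYTHONBREAKPOINT"] = "0"

breakpoint()  # Skipped entirely; no debugger prompt appears
print("Execution continued past the disabled breakpoint")
```

The same variable can also name an alternative debugger entry point, such as PYTHONBREAKPOINT=pdb.set_trace.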
The breakpoint function can be called anywhere in a program. If you know that the problem you’re trying to debug happens only under special circumstances, then you can just write plain old Python code to call breakpoint after a specific condition is met. For example, here I start the debugger only if the squared error for a datapoint is more than 1:
# conditional_breakpoint.py
def compute_rmse(observed, ideal):
    ...
    for got, wanted in zip(observed, ideal):
        err_2 = (got - wanted) ** 2
        if err_2 >= 1:  # Start the debugger if True
            breakpoint()
        total_err_2 += err_2
        count += 1
    ...

result = compute_rmse(
    [1.8, 1.7, 3.2, 7],
    [2, 1.5, 3, 5],
)
print(result)
When I run the program and it enters the debugger, I can confirm that the condition was true by inspecting local variables:
$ python3 conditional_breakpoint.py
> conditional_breakpoint.py(14)compute_rmse()
-> total_err_2 += err_2
(Pdb) wanted
5
(Pdb) got
7
(Pdb) err_2
4
Another useful way to reach the debugger prompt is by using postmortem debugging. This enables you to debug a program after it’s already raised an exception and crashed. This is especially helpful when you’re not quite sure where to put the breakpoint function call. Here I have a script that will crash due to the 7j complex number being present in one of the function’s arguments:
# postmortem_breakpoint.py
import math

def compute_rmse(observed, ideal):
    ...

result = compute_rmse(
    [1.8, 1.7, 3.2, 7j],  # Bad input
    [2, 1.5, 3, 5],
)
print(result)
I use the command line python3 -m pdb -c continue <program path> to run the program under control of the pdb module. The continue command tells pdb to get the program started immediately. Once it’s running, the program hits a problem and automatically enters the interactive debugger, at which point I can inspect the program state:
$ python3 -m pdb -c continue postmortem_breakpoint.py
Traceback (most recent call last):
  File "pdb.py", line 1944, in main
    pdb._run(target)
  File "pdb.py", line 1738, in _run
    self.run(target.code)
  File "bdb.py", line 606, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "postmortem_breakpoint.py", line 22, in <module>
    result = compute_rmse(
        [1.8, 1.7, 3.2, 7j],  # Bad input
        [2, 1.5, 3, 5],
    )
  File "postmortem_breakpoint.py", line 17, in compute_rmse
    rmse = math.sqrt(mean_err)
TypeError: must be real number, not complex
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> postmortem_breakpoint.py(17)compute_rmse()
-> rmse = math.sqrt(mean_err)
(Pdb) mean_err
(-5.97-17.5j)
You can also use postmortem debugging after hitting an uncaught exception in the interactive Python interpreter by calling the pm function of the pdb module (which is often done in a single line as import pdb; pdb.pm()):
$ python3
>>> import my_module
>>> my_module.compute_stddev([5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "my_module.py", line 20, in compute_stddev
    variance = compute_variance(data)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "my_module.py", line 15, in compute_variance
    variance = err_2_sum / (len(data) - 1)
               ~~~~~~~~~~^~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero
>>> import pdb; pdb.pm()
> my_module.py(15)compute_variance()
-> variance = err_2_sum / (len(data) - 1)
(Pdb) err_2_sum
0.0
(Pdb) len(data)
1
You can initiate the Python interactive debugger at a point of interest directly in your program by calling the breakpoint built-in function.
pdb shell commands let you precisely control program execution and allow you to alternate between inspecting program state and progressing program execution.
The pdb module can be used to debug exceptions after they happen in independent Python programs (using python -m pdb -c continue <program path>) or the interactive Python interpreter (using import pdb; pdb.pm()).
tracemalloc to Understand Memory Usage and Leaks

Memory management in the default implementation of Python, CPython, uses reference counting. This ensures that as soon as all references to an object have expired, the referenced object is also cleared from memory, freeing up that space for other data. CPython also has a built-in cycle detector to ensure that self-referencing objects are eventually garbage collected.
In theory, this means that most Python developers don’t have to worry about allocating or deallocating memory in their programs. It’s taken care of automatically by the language and the CPython runtime. However, in practice, programs eventually do run out of memory when references that are no longer useful are still being held. Figuring out where a Python program is using or leaking memory can be challenging.
One way to debug memory usage is to ask the gc built-in module to list every object currently known by the garbage collector. Although it’s quite a blunt tool, this approach lets you quickly get a sense of where your program’s memory is being used. Here I define a module that fills up memory by keeping references:
# waste_memory.py
import os

class MyObject:
    def __init__(self):
        self.data = os.urandom(100)

def get_data():
    values = []
    for _ in range(100):
        obj = MyObject()
        values.append(obj)
    return values

def run():
    deep_values = []
    for _ in range(100):
        deep_values.append(get_data())
    return deep_values
Then I run a program that uses the gc built-in module to print out how many objects were created during execution, along with a small sample of allocated objects:
# using_gc.py
import gc

found_objects = gc.get_objects()
print("Before:", len(found_objects))

import waste_memory

hold_reference = waste_memory.run()

found_objects = gc.get_objects()
print("After: ", len(found_objects))
for obj in found_objects[:3]:
    print(repr(obj)[:100])
>>>
Before: 6207
After: 16801
<waste_memory.MyObject object at 0x10390aeb8>
<waste_memory.MyObject object at 0x10390aef0>
<waste_memory.MyObject object at 0x10390af28>
...
The problem with gc.get_objects is that it doesn’t tell you anything about how the objects were allocated. In complicated programs, objects of a specific class could be allocated in many different ways. Knowing the overall number of objects isn’t nearly as important as identifying the code responsible for allocating the objects that are leaking memory.
Python 3.4 introduced a new tracemalloc built-in module for solving this problem. tracemalloc makes it possible to connect an object back to where it was allocated. You use it by taking before and after snapshots of memory usage and comparing them to see what’s changed. Here I use this approach to print out the top three memory usage offenders in a program:
# top_n.py
import tracemalloc

tracemalloc.start(10)                      # Set stack depth
time1 = tracemalloc.take_snapshot()        # Before snapshot

import waste_memory

x = waste_memory.run()                     # Usage to debug
time2 = tracemalloc.take_snapshot()        # After snapshot

stats = time2.compare_to(time1, "lineno")  # Compare snapshots
for stat in stats[:3]:
    print(stat)
>>>
waste_memory.py:5: size=2392 KiB (+2392 KiB), count=29994
➥(+29994), average=82 B
waste_memory.py:10: size=547 KiB (+547 KiB), count=10001
➥(+10001), average=56 B
waste_memory.py:11: size=82.8 KiB (+82.8 KiB), count=100
➥(+100), average=848 B
The size and count labels in the output make it immediately clear which objects are dominating my program’s memory usage and where in the source code they were allocated.
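Snapshots can also be narrowed with the Snapshot.filter_traces method before computing statistics, which helps exclude allocations made by the import system or by tracemalloc itself. A brief sketch (the list comprehension is an arbitrary allocation to attribute):

```python
import tracemalloc

tracemalloc.start()

data = [bytes(100) for _ in range(1000)]  # Allocations to attribute

snapshot = tracemalloc.take_snapshot()
snapshot = snapshot.filter_traces([
    # inclusive=False means "exclude traces matching this pattern"
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
])
stats = snapshot.statistics("lineno")
print(stats[0])  # The list comprehension above likely dominates
```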
The tracemalloc module can also print out the full stack trace of each allocation (up to the number of frames passed to the tracemalloc.start function). Here I print out the stack trace of the biggest source of memory usage in the program:
# with_trace.py
import tracemalloc

tracemalloc.start(10)
time1 = tracemalloc.take_snapshot()

import waste_memory

x = waste_memory.run()
time2 = tracemalloc.take_snapshot()

stats = time2.compare_to(time1, "traceback")
top = stats[0]
print("Biggest offender is:")
print("\n".join(top.traceback.format()))
>>>
Biggest offender is:
  File "with_trace.py", line 11
    x = waste_memory.run()
  File "waste_memory.py", line 20
    deep_values.append(get_data())
  File "waste_memory.py", line 12
    obj = MyObject()
  File "waste_memory.py", line 6
    self.data = os.urandom(100)
A stack trace like this is most valuable for figuring out which particular usage of a common function or class is responsible for memory consumption in a program.
For more advanced memory profiling needs there are also community packages (see Item 116: “Know Where to Find Community-Built Modules”) to consider, such as Memray (https://github.com/bloomberg/memray).
It can be difficult to understand how Python programs use and leak memory.
The gc module can help you understand which objects exist, but it has no information about how they were allocated.
The tracemalloc built-in module provides powerful tools for understanding the sources of memory usage.