Python has language features that help you construct well-defined APIs with clear interface boundaries. The Python community has established best practices to maximize the maintainability of code over time. In addition, some standard tools that ship with Python enable large teams to work together across disparate environments.
Collaborating with others on Python programs requires being deliberate in how you write your code. Even if you’re working on your own, chances are you’ll be using code written by someone else via the standard library or open source packages. It’s important to understand the mechanisms that make it easy to collaborate with other Python programmers.
Python has a central repository of modules (https://pypi.org) that you can install and use in your programs. These modules are built and maintained by people like you: the Python community. When you find yourself facing an unfamiliar challenge, the Python Package Index (PyPI) is a great place to look for code that will get you closer to your goal.
To use the Package Index, you need to use the command-line tool named pip (a recursive acronym for “pip installs packages”). You can run python3 -m pip to ensure that packages are installed for the correct version of Python on your system (see Item 1: “Know Which Version of Python You’re Using”). Using pip to install a new module is simple. For example, here I install the numpy module (see Item 94: “Know When and How to Replace Python with Another Programming Language” for related info):
$ python3 -m pip install numpy
Collecting numpy
Downloading ...
Installing collected packages: numpy
Successfully installed numpy-2.0.0
pip is best used together with the built-in module venv to consistently track sets of packages to install for your projects (see Item 117: “Use Virtual Environments for Isolated and Reproducible Dependencies”). You can also create your own PyPI packages to share with the Python community or host your own private package repositories for use with pip.
Each module in PyPI has its own software license. For most of the packages, especially the popular ones, the licenses are free or open source (see https://opensource.org for details). Such a license typically allows you to include a copy of the module with your program (including for end-user distribution; see Item 125: “Prefer Open Source Projects for Bundling Python Programs over zipimport and zipapp”); when in doubt, talk to a lawyer.
The Python Package Index (PyPI) contains a wealth of common packages that are built and maintained by the Python community.
pip is the command-line tool you can use to install packages from PyPI.
The majority of PyPI modules are free and open source software.
Building larger and more complex programs often leads you to rely on various packages from the Python community (see Item 116: “Know Where to Find Community-Built Modules”). You’ll find yourself running the python3 -m pip command-line tool to install packages like numpy, pandas, and many others.
The problem is that, by default, pip installs new packages in a global location. That causes all Python programs on your system to be affected by these installed modules. In theory, this shouldn’t be an issue. If you install a package and never import it, how could it affect your programs?
The trouble comes from transitive dependencies: the packages that the packages you install depend on. For example, after installing the Sphinx package, you can see what it depends on by asking pip:
$ python3 -m pip show Sphinx
Name: Sphinx
Version: 7.4.6
Summary: Python documentation generator
Location: /usr/local/lib/python3.13/site-packages
Requires: alabaster, babel, docutils, imagesize, Jinja2,
➥ packaging, Pygments, requests, snowballstemmer,
➥ sphinxcontrib-applehelp, sphinxcontrib-devhelp,
➥ sphinxcontrib-htmlhelp, sphinxcontrib-jsmath,
➥ sphinxcontrib-qthelp, sphinxcontrib-serializinghtml
If you install another package like flask, you can see that it, too, depends on the Jinja2 package:
$ python3 -m pip show flask
Name: Flask
Version: 3.0.3
Summary: A simple framework for building complex web applications.
Location: /usr/local/lib/python3.13/site-packages
Requires: blinker, click, itsdangerous, Jinja2, Werkzeug
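Incidentally, the same metadata that pip show prints can be read programmatically with the standard importlib.metadata module (available since Python 3.8). Here's a minimal sketch; pip is queried only because it's nearly always installed, and you can substitute any distribution name such as Sphinx or flask:

```python
import importlib.metadata

# Read the installed version and declared dependencies of a package.
version = importlib.metadata.version("pip")
requires = importlib.metadata.requires("pip")  # may be None if none are declared

print(version)
print(requires)
```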
A dependency conflict can arise as Sphinx and flask diverge over time. Perhaps right now they both require the same version of Jinja2, and everything is fine. But six months or a year from now, Jinja2 may release a new version that makes breaking changes to users of the library. If you update your global version of Jinja2 with python3 -m pip install --upgrade Jinja2, you may find that Sphinx breaks, while flask keeps working.
The cause of such breakage is that Python can have only a single global version of a module installed at a time. If one of your installed packages must use the new version and another package must use the old version, your system isn’t going to work properly; this situation is often called dependency hell.
Such breakage can even happen when package maintainers try their best to preserve API compatibility between releases (see Item 119: “Use Packages to Organize Modules and Provide Stable APIs”). New versions of a library can subtly change behaviors that API-consuming code relies on. Users on a system may upgrade one package to a new version but not others, which could break dependencies. If you’re not careful, there’s a constant risk of the ground moving beneath your feet.
These difficulties are magnified when you collaborate with other developers who do their work on separate computers. It’s best to assume the worst: that the versions of Python and global packages that they have installed on their machines will be slightly different than yours. These differences can cause frustrating situations such as a codebase working perfectly on one programmer’s machine and being completely broken on another’s.
The solution to all of these problems is to use a tool called venv, which provides virtual environments. Since Python 3.4, pip and the venv module have been available by default along with the Python installation (accessible with python -m venv).
venv allows you to create isolated versions of the Python environment. Using venv, you can have many different versions of the same package installed on the same system at the same time without conflicts. This means you can work on many different projects and use many different tools on the same computer.
venv does this by installing explicit versions of packages and their dependencies into completely separate directory structures. This makes it possible to reproduce a Python environment that you know will work with your code. It’s a reliable way to avoid surprising breakages.
venv on the Command Line

Here’s a quick tutorial on how to use venv effectively. Before using the tool, it’s important to note the meaning of the python3 command line on your system. On my computer, python3 is located in the /usr/local/bin directory and evaluates to version 3.13 (see Item 1: “Know Which Version of Python You’re Using”):
$ which python3
/usr/local/bin/python3
$ python3 --version
Python 3.13.0
To demonstrate the setup of my environment, I can test that running a command to import the numpy module doesn’t cause an error. This works because I already have the numpy package installed as a global module:
$ python3 -c 'import numpy'
$
Now I use venv to create a new virtual environment called myproject. Each virtual environment must live in its own unique directory. The result of the command is a tree of directories and files that are used to manage the virtual environment:
$ python3 -m venv myproject
$ cd myproject
$ ls
bin include lib pyvenv.cfg
To start using the virtual environment, I use the source command from my shell on the bin/activate script. activate modifies all my environment variables to match the virtual environment. It also updates my command-line prompt to include the virtual environment name (myproject) to make it extremely clear what I’m working on:
$ source bin/activate
(myproject)$
On Windows the same script is available as:
C:\> myproject\Scripts\activate.bat
(myproject) C:\>
And with PowerShell it is available as:
PS C:\> myproject\Scripts\activate.ps1
(myproject) PS C:\>
After activation, you can see that the path to the python3 command-line tool has moved to within the virtual environment directory:
(myproject)$ which python3
/tmp/myproject/bin/python3
(myproject)$ ls -l /tmp/myproject/bin/python3
... -> /usr/local/bin/python3
This ensures that changes to the outside system will not affect the virtual environment. Even if the outer system upgrades its default python3 to version 3.14, my virtual environment will still explicitly point to version 3.13.
The virtual environment I created with venv starts with no packages installed except for pip and setuptools. Trying to use the numpy package that was installed as a global module in the outside system will fail because it’s unknown to the virtual environment:
(myproject)$ python3 -c 'import numpy'
Traceback (most recent call last):
File "<string>", line 1, in <module>
import numpy
ModuleNotFoundError: No module named 'numpy'
I can use the pip command-line tool to install the numpy module into my virtual environment:
(myproject)$ python3 -m pip install numpy
Collecting numpy
Downloading ...
Installing collected packages: numpy
Successfully installed numpy-2.0.0
Once it’s installed, I can verify that it’s working by using the same test import command:
(myproject)$ python3 -c 'import numpy'
(myproject)$
When I’m done with a virtual environment and want to go back to my default system, I use the deactivate command. This restores my environment to the system defaults, including the location of the python3 command-line tool:
(myproject)$ which python3
/tmp/myproject/bin/python3
(myproject)$ deactivate
$ which python3
/usr/local/bin/python3
If I ever want to work in the myproject environment again, I can just run source bin/activate (or the similar command on Windows) in the directory, as before.
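From inside a running interpreter, you can also tell whether a virtual environment is active by comparing sys.prefix (the active environment) with sys.base_prefix (the underlying installation); the two differ only inside a virtual environment:

```python
import sys

# True when running inside a venv-created virtual environment:
# sys.prefix points at the environment directory, while sys.base_prefix
# still points at the original Python installation.
in_virtualenv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_virtualenv)
```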
Once you are in a virtual environment, you can continue installing packages in it with pip as you need them. Eventually, you might want to copy your environment somewhere else. For example, say that I want to reproduce the development environment from my workstation on a server in a datacenter. Or maybe I want to clone someone else’s environment on my own machine so I can help debug their code. venv makes such tasks easy.
I can use the python3 -m pip freeze command to save all my explicit package dependencies into a file (which, by convention, is named requirements.txt):
(myproject)$ python3 -m pip freeze > requirements.txt
(myproject)$ cat requirements.txt
certifi==2024.7.4
charset-normalizer==3.3.2
idna==3.7
numpy==2.0.0
requests==2.32.3
urllib3==2.2.2
Now imagine that I’d like to have another virtual environment that matches the myproject environment. I can create a new directory as before by using venv and activating it:
$ python3 -m venv otherproject
$ cd otherproject
$ source bin/activate
(otherproject)$
The new environment will have no extra packages installed:
(otherproject)$ python3 -m pip list
Package Version
------- -------
pip     24.1.1
I can install all of the packages from the first environment by running python3 -m pip install on the requirements.txt file that I generated with the python3 -m pip freeze command:
(otherproject)$ python3 -m pip install -r
➥ /tmp/myproject/requirements.txt
This command cranks along for a little while as it retrieves and installs all the packages required to reproduce the first environment. When it’s done, I can list the set of installed packages in the second virtual environment and should see the same list of dependencies found in the first virtual environment:
(otherproject)$ python3 -m pip list
Package            Version
------------------ --------
certifi            2024.7.4
charset-normalizer 3.3.2
idna               3.7
numpy              2.0.0
pip                24.1.1
requests           2.32.3
urllib3            2.2.2
Using a requirements.txt file is ideal for collaborating with others through a revision control system. You can commit changes to your code at the same time you update your list of package dependencies, ensuring that they move in lockstep. However, it’s important to note that the specific version of Python you’re using is not included in the requirements.txt file, so that must be managed separately.
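Since requirements.txt doesn't pin the interpreter itself, one lightweight safeguard is to assert the expected Python version at program startup. This is only a sketch; the (3, 8) minimum here is an assumed project requirement chosen for illustration:

```python
import sys

# Hypothetical guard: fail fast if the interpreter is older than the
# version this project was developed against.
REQUIRED_VERSION = (3, 8)

if sys.version_info < REQUIRED_VERSION:
    raise RuntimeError(
        f"This project requires Python "
        f"{REQUIRED_VERSION[0]}.{REQUIRED_VERSION[1]} or newer; "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
```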
The gotcha with virtual environments is that moving them breaks everything because all the paths, including the python3 command-line tool, are hard-coded to the environment’s install directory. But ultimately this limitation doesn’t matter. The whole purpose of virtual environments is to make it easy to reproduce a setup. Instead of moving a virtual environment directory, just use python3 -m pip freeze on the old one, create a new virtual environment somewhere else, and reinstall everything from the requirements.txt file.
Virtual environments allow you to use pip to install many different versions of the same package on the same machine without conflicts.
Virtual environments are created with python -m venv, enabled with source bin/activate, and disabled with deactivate.
You can dump all the requirements of an environment with python3 -m pip freeze. You can reproduce an environment by running python3 -m pip install -r requirements.txt.
Documentation in Python is extremely important because of the dynamic nature of the language. Python provides built-in support for attaching documentation to blocks of code. Unlike with many other languages, the documentation from a program’s source code is directly accessible as the program runs.
For example, you can add documentation by providing a docstring immediately after the def statement of a function:
def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]

assert palindrome("tacocat")
assert not palindrome("banana")
You can retrieve the docstring from within the Python program itself by accessing the function’s __doc__ special attribute:
print(palindrome.__doc__)
>>>
Return True if the given word is a palindrome.
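The built-in help function and the inspect.getdoc helper both build on __doc__; getdoc additionally normalizes leading indentation, which is handy for multi-line docstrings. A small sketch using the same palindrome function:

```python
import inspect

def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]

# inspect.getdoc returns the docstring with indentation cleaned up.
doc = inspect.getdoc(palindrome)
print(doc)  # Return True if the given word is a palindrome.
```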
You can also use the built-in pydoc module from the command line to run a local web server on your computer that hosts all of the Python documentation that’s accessible to your interpreter, including modules that you’ve written:
$ python3 -m pydoc -p 1234
Server ready at http://localhost:1234/
Server commands: [b]rowser, [q]uit
server> b
Docstrings can be attached to functions, classes, and modules. Making such a connection is part of the process of compiling and running a Python program. Support for docstrings and the __doc__ attribute has three consequences:
The accessibility of documentation makes interactive development easier. You can inspect functions, classes, and modules to see their documentation by using the help built-in function. This makes the Python interactive interpreter and tools like Jupyter (https://jupyter.org) a joy to use while you’re developing algorithms, testing APIs, and writing code snippets.
Using a standard way of defining documentation makes it easy to build tools that convert the text into more appealing formats (like HTML). This has led to excellent documentation-generating tools for the Python community, such as Sphinx (https://www.sphinx-doc.org). It has also enabled services like Read the Docs (https://readthedocs.org) that provide free hosting of beautiful-looking documentation for open source Python projects.
Python’s first-class, accessible, and good-looking documentation encourages people to write more documentation. The members of the Python community have a strong belief in the importance of documentation. There’s an assumption that “good code” also means well-documented code, and so you can expect most open source Python libraries to have decent documentation.
To participate in this excellent culture of documentation, you need to follow a few guidelines when you write docstrings. The full details are discussed online in PEP 257 (https://www.python.org/dev/peps/pep-0257/). Here are some of the best practices you should be sure to follow.
Each module should have a top-level docstring—a string literal that is the first statement in a source file. It should use three double quotes ("""). The goal of this docstring is to introduce the module and its contents.
The first line of the docstring should be a single sentence describing the module’s purpose. The paragraphs that follow should contain the details that all users of the module should know about its operation. The module docstring is also a jumping-off point where you can highlight important classes and functions found in the module.
Here’s an example of a module docstring:
# words.py
#!/usr/bin/env python3
"""Library for finding linguistic patterns in words.

Testing how words relate to each other can be tricky sometimes!
This module provides easy ways to determine when words you've
found have special properties.

Available functions:
- palindrome: Determine if a word is a palindrome.
- check_anagram: Determine if two words are anagrams.
...
"""
...
If the module is a command-line utility, the module docstring is also a great place to put usage information for running the tool.
Each class should have a class-level docstring that largely follows the same pattern as the module-level docstring. The first line is the single-sentence purpose of the class. Paragraphs that follow discuss important details of the class’s operation.
Important public attributes and methods of the class should be highlighted in the class-level docstring. It should also provide guidance to subclasses on how to properly interact with protected attributes (see Item 55: “Prefer Public Attributes over Private Ones”) and the superclass’s methods (see Item 53: “Initialize Parent Classes with super”).
Here’s an example of a class docstring:
class Player:
    """Represents a player of the game.

    Subclasses may override the 'tick' method to provide
    custom animations for the player's movement depending
    on their power level, etc.

    Public attributes:
    - power: Unused power-ups (float between 0 and 1).
    - coins: Coins found during the level (integer).
    """

    ...
Each public function and method should have a docstring that follows the same pattern as the docstrings for modules and classes. The first line is a single-sentence description of what the function does. The paragraphs that follow describe any specific behaviors and the arguments for the function. Any return values should be mentioned, and any exceptions that callers must handle as part of the function’s interface should be explained (see Item 32: “Prefer Raising Exceptions to Returning None” for an example).
Here’s an example of a function docstring:
def find_anagrams(word, dictionary):
    """Find all anagrams for a word.

    This function only runs as fast as the test for
    membership in the 'dictionary' container.

    Args:
        word: String of the target word.
        dictionary: collections.abc.Container with all
            strings that are known to be actual words.

    Returns:
        List of anagrams that were found. Empty if
        none were found.
    """
    ...
There are also some special cases in writing docstrings for functions that are important to know:
If a function has no arguments and a simple return value, a single-sentence description is probably good enough.
If a function doesn’t return anything, it’s better to leave out any mention of the return value than to say “returns None.”
If a function’s interface includes raising exceptions, then your docstring should describe each exception that’s raised and when it’s raised.
If you don’t expect a function to raise an exception during normal operation, don’t mention that fact.
If a function accepts a variable number of arguments (see Item 34: “Reduce Visual Noise with Variable Positional Arguments”) or keyword arguments (see Item 35: “Provide Optional Behavior with Keyword Arguments”), use *args and **kwargs in the documented list of arguments to describe their purpose.
If a function has arguments with default values, those defaults should be mentioned (see Item 36: “Use None and Docstrings to Specify Dynamic Default Arguments”).
If a function is a generator (see Item 43: “Consider Generators Instead of Returning Lists”), its docstring should describe what the generator yields when it’s iterated.
If a function is an asynchronous coroutine (see Item 75: “Achieve Highly Concurrent I/O with Coroutines”), its docstring should explain when it will stop execution.
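For instance, a generator's docstring can describe what it yields rather than what it returns; the stream_words function here is a hypothetical example:

```python
def stream_words(text):
    """Yield each whitespace-separated word from the given text.

    Yields:
        Each word as a string, in order of appearance.
    """
    for word in text.split():
        yield word

print(list(stream_words("to be or not")))  # ['to', 'be', 'or', 'not']
```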
Python now supports type annotations for a variety of purposes (see Item 124: “Consider Static Analysis via typing to Obviate Bugs” for how to use them). The information they contain may be redundant with typical docstrings. For example, here is the function signature for find_anagrams with type annotations applied:
from collections.abc import Container

def find_anagrams(word: str,
                  dictionary: Container[str]) -> list[str]:
    ...
There is no longer a need to specify in the docstring that the word argument is a string, since the type annotation has that information. The same goes for the dictionary argument being a collections.abc.Container. There’s no reason to mention that the return type will be a list, since this fact is clearly annotated as such. And when no anagrams are found, the return value still must be a list, so it’s implied that it will be empty; that doesn’t need to be noted in the docstring. Here I write the same function signature from above along with the docstring that has been shortened accordingly:
def find_anagrams(word: str,
                  dictionary: Container[str]) -> list[str]:
    """Find all anagrams for a word.

    This function only runs as fast as the test for
    membership in the 'dictionary' container.

    Args:
        word: Target word.
        dictionary: All known actual words.

    Returns:
        Anagrams that were found.
    """
    ...
The redundancy between type annotations and docstrings should be similarly avoided for instance fields, class attributes, and methods. It’s best to have type information in only one place so there’s less risk that it will skew from the actual implementation.
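Tools can recover the types from annotations directly, which is why duplicating them in the docstring adds no value. For example, typing.get_type_hints reads them at runtime; the scale function below is a hypothetical example:

```python
import typing

def scale(value: float, factor: float = 2.0) -> float:
    """Scale a value by a factor."""  # no type info repeated here
    return value * factor

hints = typing.get_type_hints(scale)
print(hints)
# {'value': <class 'float'>, 'factor': <class 'float'>, 'return': <class 'float'>}
```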
Write documentation for every module, class, method, and function using docstrings. Keep them up-to-date as your code changes.
For a module, introduce the contents of the module and any important classes or functions that all users should know about.
For a class, document behavior, important attributes, and subclass behavior in the docstring following the class statement.
For a function or method, document every argument, returned value, raised exception, and other behaviors in the docstring following the def statement.
If you’re using type annotations, omit that information from docstrings since it would be redundant to have it in both places.
As the size of a program’s codebase grows, it’s natural for you to reorganize its structure. You’ll split larger functions into smaller functions. You’ll refactor data structures into helper classes (see Item 29: “Compose Classes Instead of Deeply Nesting Dictionaries, Lists, and Tuples” for an example). You’ll separate functionality into various modules that depend on each other.
At some point, you’ll find yourself with so many modules that you need another layer in your program to make it understandable. For this purpose, Python provides packages. Packages are modules that contain other modules.
In most cases, packages are defined by putting an empty file named __init__.py into a directory. Once __init__.py is present, any other Python files in that directory will be available for import, using a path relative to the directory. For example, imagine that I have the following directory structure in a program:
main.py
mypackage/__init__.py
mypackage/models.py
mypackage/utils.py
To import the utils module, I can use the absolute module name that includes the package directory’s name:
# main.py
import mypackage.utils
I can also import a child module name relative to its containing package using the from clause:
# main2.py
from mypackage import utils
This dotted path pattern for the import statement continues when I have package directories nested within other packages (like import mypackage.foo.bar and from mypackage.foo import bar).
The functionality provided by packages has two primary purposes in Python programs.
The first use of packages is to help divide your modules into separate namespaces. They enable you to have many modules with the same filename but different absolute paths that are unique. For example, here’s a program that imports attributes from two modules with the same filename, utils.py:
# main.py
from analysis.utils import log_base2_bucket
from frontend.utils import stringify
bucket = stringify(log_base2_bucket(33))
This approach breaks when the functions, classes, or submodules defined in packages have the same names. For example, say that I want to use the inspect function from both the analysis.utils and the frontend.utils modules. Importing the attributes directly won’t work because the second import statement will overwrite the value of inspect in the current scope.
# main2.py
from analysis.utils import inspect
from frontend.utils import inspect # Overwrites!
The solution is to use the as clause of the import statement to rename whatever I’ve imported for the current scope:
# main3.py
from analysis.utils import inspect as analysis_inspect
from frontend.utils import inspect as frontend_inspect
value = 33
if analysis_inspect(value) == frontend_inspect(value):
print("Inspection equal!")
The as clause can be used to rename anything retrieved with the import statement, including entire modules. This facilitates accessing namespaced code and makes its identity clear when you use it.
Another approach for avoiding imported name conflicts is to always access names by their highest unique module name. For the example above, this means I’d use basic import statements instead of the from clause:
# main4.py
import analysis.utils
import frontend.utils
value = 33
if (analysis.utils.inspect(value) ==
frontend.utils.inspect(value)):
print("Inspection equal!")
This approach allows you to avoid the as clause altogether. It also makes it abundantly clear to new readers of the code where each of the similarly named functions is defined.
The second use of packages in Python is to provide strict, stable APIs for external consumers.
When you’re writing an API for wider consumption, such as an open source package (see Item 116: “Know Where to Find Community-Built Modules” for examples), you’ll want to provide stable functionality that doesn’t change between releases. To ensure that happens, it’s important to hide your internal code organization from external users. This way, you can refactor and improve your package’s internal modules without breaking existing users.
Python can limit the surface area exposed to API consumers by using the __all__ special attribute of a module or package. The value of __all__ is a list of every name to export from the module as part of its public API. When consuming code executes from foo import *—details on this below—only the attributes in foo.__all__ will be imported from foo. If __all__ isn’t present in foo, then only public attributes—those without a leading underscore—are imported (see Item 55: “Prefer Public Attributes over Private Ones” for details about that convention).
For example, say that I want to provide a package for calculating collisions between moving projectiles. Here I define the models module of mypackage to contain the representation of projectiles:
# models.py
__all__ = ["Projectile"]
class Projectile:
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity
I also define a utils module in mypackage to perform operations on the Projectile instances, such as simulating collisions between them:
# utils.py
from .models import Projectile
__all__ = ["simulate_collision"]
def _dot_product(a, b):
    ...

def simulate_collision(a, b):
    ...
Now I’d like to provide all of the public parts of this API as a set of attributes that are available on the mypackage module. This will allow downstream consumers to always import directly from mypackage instead of importing from mypackage.models or mypackage.utils. This ensures that the API consumer’s code will continue to work even if the internal organization of mypackage changes (e.g., if models.py is deleted).
To do this with Python packages, you need to modify the __init__.py file in the mypackage directory. This file is what actually becomes the contents of the mypackage module when it’s imported. Thus, you can specify an explicit API for mypackage by limiting what you import into __init__.py. Since all of my internal modules already specify __all__, I can expose the public interface of mypackage by simply importing everything from the internal modules and updating __all__ accordingly:
# __init__.py
__all__ = []
from .models import *
__all__ += models.__all__
from .utils import *
__all__ += utils.__all__
Here’s a consumer of the API that directly imports from mypackage instead of accessing the inner modules:
# api_consumer.py
from mypackage import *
a = Projectile(1.5, 3)
b = Projectile(4, 1.7)
after_a, after_b = simulate_collision(a, b)
Notably, internal-only functions like mypackage.utils._dot_product will not be available to the API consumer on mypackage because they weren’t present in __all__. Being omitted from __all__ also means that they weren’t imported by the from mypackage import * statement. The internal-only names are effectively hidden.
This whole approach works great when it’s important to provide an explicit, stable API. However, if you’re building an API for use between your own modules, the functionality of __all__ is probably unnecessary and should be avoided. The namespacing provided by packages is usually enough for a team of programmers to collaborate on large amounts of code they control while maintaining reasonable interface boundaries.
Beware of import *
Import statements like from x import y are clear because the source of y is explicitly the x package or module. Wildcard imports like from foo import * can also be useful, especially in interactive Python sessions. However, wildcards make code more difficult to understand:
from foo import * hides the sources of names from new readers of the code. If a module has multiple import * statements, the reader needs to check all of the referenced modules to figure out where a name was defined.
Names from import * statements will overwrite any conflicting names within the containing module. This can lead to strange bugs caused by accidental interactions between your code and names reassigned by successive import * statements.
The safest approach is to avoid import * in your code and explicitly import names with the from x import y style.
Packages in Python are modules that contain other modules. Packages allow you to organize your code into separate, non-conflicting namespaces with unique absolute module names.
Simple packages are defined by adding an __init__.py file to a directory that contains other source files. These files become the child modules of the directory’s package. Package directories may also contain other packages.
You can provide an explicit API for a module by listing its publicly visible names in its __all__ special attribute.
You can hide a package’s internal implementation by only importing public names in the package’s __init__.py file or by naming internal-only members with a leading underscore.
When collaborating within a single team or on a single codebase, using __all__ for explicit APIs is probably unnecessary.
A deployment environment is a configuration in which a program runs. Every program has at least one deployment environment: the production environment. The goal of writing a program in the first place is to put it to work in the production environment and achieve some kind of outcome.
Writing or modifying a program requires being able to run it on the computer you use for developing. The configuration of your development environment might be very different from that of your production environment. For example, you might be using a tiny single-board computer to develop a program that’s meant to run on enormous supercomputers.
Tools like venv (see Item 117: “Use Virtual Environments for Isolated and Reproducible Dependencies”) make it easy to ensure that all environments have the same Python packages installed. The trouble is that production environments often require many external assumptions that are hard to reproduce in development environments.
For example, say that I want to run a program in a web server container and give it access to a database. Every time I want to modify my program’s code, I need to run a server container, the database schema must be set up properly, and my program needs the password for access. This is a very high cost if all I’m trying to do is verify that a one-line change to my program works correctly.
The best way to work around such issues is to override parts of a program at startup time to provide different functionality depending on the deployment environment. For example, I can have two different __main__ files—one for production and one for development:
# dev_main.py
TESTING = True
import db_connection
db = db_connection.Database()
# prod_main.py
TESTING = False
import db_connection
db = db_connection.Database()
The only difference between the two files is the value of the TESTING constant. Other modules in my program can then import the __main__ module and use the value of TESTING to decide how they define their own attributes:
# db_connection.py
import __main__

class TestingDatabase:
    ...

class RealDatabase:
    ...

if __main__.TESTING:
    Database = TestingDatabase
else:
    Database = RealDatabase
The key behavior to notice here is that code running in module scope—not inside a function or method—is just normal Python code (see Item 98: “Lazy-Load Modules with Dynamic Imports to Reduce Startup Time” for details). You can use an if statement at the module level to decide how the module will define names. This makes it easy to tailor modules to your various deployment environments. You can avoid having to reproduce costly assumptions like database configurations when they aren’t needed. You can inject local or fake implementations that ease interactive development, or you can use mocks for writing tests (see Item 111: “Use Mocks to Test Code with Complex Dependencies”).
Note
When your deployment environment configuration gets really complicated, you should consider moving it out of Python constants (like TESTING) and into dedicated configuration files. Tools like the configparser built-in module let you maintain production configurations separately from code, a distinction that’s crucial for collaborating with an operations team.
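As a minimal sketch of that approach: the [environment] section and testing key below are hypothetical names, and the file contents are inlined with read_string for brevity (a real deployment would load a file with parser.read):

```python
import configparser

# Hypothetical config that would normally live in its own file,
# e.g. loaded with parser.read("app.ini")
CONFIG_TEXT = """
[environment]
testing = true
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# getboolean understands true/false, yes/no, on/off, 1/0
TESTING = parser.getboolean("environment", "testing")
assert TESTING is True
```

The operations team can then edit the configuration file without touching Python code at all.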
Module-scoped code can be used for more than dealing with external configurations. For example, if I know that my program must work differently depending on its host platform, I can inspect the sys module before defining top-level constructs in a module:
# db_connection.py
import sys

class Win32Database:
    ...

class PosixDatabase:
    ...

if sys.platform.startswith("win32"):
    Database = Win32Database
else:
    Database = PosixDatabase
Similarly, I could use environment variables from os.environ to guide my module definitions to match other constraints and requirements of the system.
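For example, here's a sketch of the same pattern driven by an environment variable. The APP_TESTING variable name is hypothetical; when it's unset the module falls back to the production implementation:

```python
import os

class TestingDatabase:
    ...

class RealDatabase:
    ...

# Hypothetical environment variable; unset means production
if os.environ.get("APP_TESTING") == "1":
    Database = TestingDatabase
else:
    Database = RealDatabase

# Whichever branch ran, the module exposes a single Database name
assert Database in (TestingDatabase, RealDatabase)
```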
Programs often need to run in multiple deployment environments that each have unique assumptions and configurations.
You can tailor a module’s contents to different deployment environments by using normal Python statements in module scope.
Module contents can be the product of any external condition, including host introspection through the sys and os modules.
Define a Root Exception to Insulate Callers from APIs

When you’re defining a module’s API, the exceptions you raise are just as much a part of your interface as the functions and classes you define (see Item 32: “Prefer Raising Exceptions to Returning None” for an example of why).
Python has a built-in hierarchy of exceptions for the language and standard library (see Item 86: “Understand the Difference Between Exception and BaseException” for background). There’s a draw to using the built-in exception types for reporting errors instead of defining your own new types. For example, I could raise a ValueError exception whenever an invalid parameter is passed to a function in one of my modules:
# my_module.py
def determine_weight(volume, density):
    if density <= 0:
        raise ValueError("Density must be positive")
    ...
In some cases, using ValueError makes sense, but for APIs, it’s much more powerful to define a new hierarchy of exceptions. I can do this by providing a root Exception class in my module and having all other exceptions raised by that module inherit from the root exception:
# my_module.py
class Error(Exception):
    """Base-class for all exceptions raised by this module."""

class InvalidDensityError(Error):
    """There was a problem with a provided density value."""

class InvalidVolumeError(Error):
    """There was a problem with the provided volume value."""

def determine_weight(volume, density):
    if density < 0:
        raise InvalidDensityError("Density must be positive")
    if volume < 0:
        raise InvalidVolumeError("Volume must be positive")
    if volume == 0:
        density / volume
Having a root exception in a module makes it easy for consumers of an API to catch all the exceptions that were raised deliberately. For example, here a consumer of my API makes a function call with a try/except statement that catches my root exception:
try:
    weight = my_module.determine_weight(1, -1)
except my_module.Error:
    logging.exception("Unexpected error")
>>>
Unexpected error
Traceback (most recent call last):
File ".../example.py", line 3, in <module>
weight = my_module.determine_weight(1, -1)
File ".../my_module.py", line 10, in determine_weight
raise InvalidDensityError("Density must be positive")
InvalidDensityError: Density must be positive
The logging.exception function prints the full stack trace of the caught exception so it’s easier to debug in this situation. The try/except also prevents my API’s exceptions from propagating too far upward and breaking the calling program. It insulates the calling code from my API. This insulation has three helpful effects.
First, root exceptions let callers understand when there’s a problem with their usage of an API. If callers are using my API properly, they should catch the various exceptions that I deliberately raise. If they don’t handle such an exception, it will propagate all the way up to the insulating except block that catches my module’s root exception. That block can bring the exception to the attention of the API consumer, providing an opportunity for them to add proper handling of the missed exception type:
try:
    weight = my_module.determine_weight(-1, 1)
except my_module.InvalidDensityError:
    weight = 0
except my_module.Error:
    logging.exception("Bug in the calling code")
>>>
Bug in the calling code
Traceback (most recent call last):
File ".../example.py", line 3, in <module>
weight = my_module.determine_weight(-1, 1)
File ".../my_module.py", line 12, in determine_weight
raise InvalidVolumeError("Volume must be positive")
InvalidVolumeError: Volume must be positive
The second advantage of using root exceptions is that they can help find bugs in an API module’s code. If my code only deliberately raises exceptions that I define within my module’s hierarchy, then all other types of exceptions raised by my module must be the ones that I didn’t intend to raise. These are bugs in my API’s code.
Using the try/except statement above will not insulate API consumers from bugs in my API module’s code. To do that, the caller needs to add another except block that catches Python’s Exception base class (see Item 85: “Beware of Catching the Exception Class” for details). This allows the API consumer to detect when there’s a bug in the API module’s implementation that needs to be fixed.
The output for this example includes both the logging.exception message and the default interpreter output for the exception since it was re-raised:
try:
    weight = my_module.determine_weight(0, 1)
except my_module.InvalidDensityError:
    weight = 0
except my_module.Error:
    logging.exception("Bug in the calling code")
except Exception:
    logging.exception("Bug in the API code!")
    raise  # Re-raise exception to the caller
>>>
Bug in the API code!
Traceback (most recent call last):
File ".../example.py", line 3, in <module>
weight = my_module.determine_weight(0, 1)
File ".../my_module.py", line 14, in determine_weight
density / volume
~~~~~~~~^~~~~~~~
ZeroDivisionError: division by zero
Traceback ...
ZeroDivisionError: division by zero
The third impact of using root exceptions is future-proofing an API. Over time, I might want to expand my API to provide more specific exceptions in certain situations. For example, I could add an Exception subclass that indicates the error condition of supplying negative densities:
# my_module.py
...

class NegativeDensityError(InvalidDensityError):
    """A provided density value was negative."""

...

def determine_weight(volume, density):
    if density < 0:
        raise NegativeDensityError("Density must be positive")
    ...
The calling code would continue to work exactly as before because it already catches InvalidDensityError exceptions (the parent class of NegativeDensityError). In the future, the caller could decide to special-case the new type of exception and change the handling behavior accordingly:
try:
    weight = my_module.determine_weight(1, -1)
except my_module.NegativeDensityError:
    raise ValueError("Must supply non-negative density")
except my_module.InvalidDensityError:
    weight = 0
except my_module.Error:
    logging.exception("Bug in the calling code")
except Exception:
    logging.exception("Bug in the API code!")
    raise
>>>
Traceback ...
NegativeDensityError: Density must be positive
The above exception was the direct cause of the following exception:
Traceback ...
ValueError: Must supply non-negative density
I can take API future-proofing further by providing a broader set of exceptions directly below the root exception. For example, imagine that I have one set of errors related to calculating weights, another related to calculating volume, and a third related to calculating density:
# my_module.py
class Error(Exception):
    """Base-class for all exceptions raised by this module."""

class WeightError(Error):
    """Base-class for weight calculation errors."""

class VolumeError(Error):
    """Base-class for volume calculation errors."""

class DensityError(Error):
    """Base-class for density calculation errors."""

...
Specific exceptions would inherit from these general exceptions. Each intermediate exception acts as its own kind of root exception. This makes it easier to insulate layers of calling code from API code based on broad functionality. This is much better than having all callers catch a long list of very specific Exception subclasses.
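A minimal sketch of this layering, using a hypothetical check_density helper: callers catch the intermediate DensityError without needing to know about every specific subclass beneath it:

```python
class Error(Exception):
    """Base-class for all exceptions raised by this module."""

class DensityError(Error):
    """Base-class for density calculation errors."""

# Hypothetical specific exception below the intermediate root
class NegativeDensityError(DensityError):
    """A provided density value was negative."""

def check_density(density):
    if density < 0:
        raise NegativeDensityError("Density must be positive")

# A caller insulates itself at the intermediate layer; adding more
# DensityError subclasses later won't break this handler
handled = False
try:
    check_density(-1)
except DensityError:
    handled = True

assert handled
assert issubclass(NegativeDensityError, Error)
```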
When a module defines a root exception and only raises its child classes, API consumers have a simple way to isolate themselves from unexpected situations encountered by the module.
Catching root exceptions can help you find bugs in code that consumes an API.
Catching the Python Exception base class can help you find bugs in API implementations.
Intermediate root exceptions let you raise more specific types of exceptions in the future without breaking API consumers.
Inevitably, while you’re collaborating with others, you’ll find a mutual interdependence between modules. This can even happen when you work by yourself on the various parts of a single program.
For example, say that I want my GUI application to show a dialog box for choosing where to save a document. The data displayed by the dialog could be specified through arguments to my event handlers. But the dialog also needs to read global state, such as user preferences, to know how to render properly.
Here I define a dialog that retrieves the default document save location from global preferences:
# dialog.py
import app

class Dialog:
    def __init__(self, save_dir):
        self.save_dir = save_dir

    ...

save_dialog = Dialog(app.prefs.get("save_dir"))

def show():
    ...
The problem is that the app module that contains the prefs object also imports the dialog module in order to show the same dialog on program start:
# app.py
import dialog

class Prefs:
    ...

    def get(self, name):
        ...

prefs = Prefs()
dialog.show()
It’s a circular dependency. If I try to import the app module from my main program like this:
# main.py
import app
I get an exception:
$ python3 main.py
Traceback (most recent call last):
File ".../main.py", line 4, in <module>
import app
File ".../app.py", line 4, in <module>
import dialog
File ".../dialog.py", line 15, in <module>
save_dialog = Dialog(app.prefs.get("save_dir"))
^^^^^^^^^
AttributeError: partially initialized module 'app' has no attribute 'prefs' (most likely due to a circular import)
To understand what’s happening here, you need to know how Python’s import machinery works in general. When a module is imported, here’s what Python actually does, in depth-first order (see https://docs.python.org/3/library/importlib.html for the full details):

1. Searches for the module in locations from sys.path
2. Loads the code from the module and ensures that it compiles
3. Creates a corresponding empty module object
4. Inserts the module into sys.modules
5. Runs the code in the module object to define its contents
The problem with a circular dependency is that the attributes of a module aren’t defined until the code for those attributes has executed (after step 5). But the module can be loaded with the import statement immediately after it’s inserted into sys.modules (after step 4).
In the example above, the app module imports dialog before defining anything. Then the dialog module imports app. Since app still hasn’t finished running—it’s currently importing dialog—the app module is empty (from step 4). The AttributeError exception is raised (during step 5 for dialog) because the code that defines prefs hasn’t run yet (i.e., step 5 for app isn’t complete).
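You can observe the caching step indirectly: once an import completes, the module object lives in sys.modules, and any later import reuses that cached object instead of re-running the module's code. A small sketch using the standard library's json module:

```python
import sys

import json  # first import runs json's module code

# The module object is cached by name in sys.modules
assert "json" in sys.modules
assert sys.modules["json"] is json

# A repeated import is just a cache lookup; it's the same object
import json as json_again
assert json_again is json
```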
The best solution to this problem is to refactor the code so that the prefs data structure is at the bottom of the dependency tree. Then both app and dialog can import the same utility module and avoid any circular dependencies. But such a clear division isn’t always possible or could require too much refactoring to be worth the effort.
There are three other ways to break circular dependencies.
The first approach to the circular imports problem is to change the order of imports. For example, if I import the dialog module toward the bottom of the app module, after the app module’s other contents have run, the AttributeError exception goes away:
# app.py
class Prefs:
    ...

prefs = Prefs()

import dialog  # Moved

dialog.show()
This works because, when the dialog module is loaded late, its recursive import of app finds that app.prefs has already been defined (i.e., step 5 is mostly done for app).
Although this solution avoids the AttributeError exception, it goes against the PEP 8 style guide (see Item 2: “Follow the PEP 8 Style Guide”). The style guide suggests that you always put imports at the top of your Python files. This makes your module’s dependencies clear to new readers of the code. It also ensures that any module you depend on is in scope and available to all the code in your module.
Having imports later in a file can be brittle and can cause small changes in the ordering of your code to break the module entirely. I suggest not using import reordering to solve your circular dependency issues.
A second solution to the circular imports problem is to have modules minimize side effects at import time. For example, I can have my modules only define functions, classes, and constants. I specifically avoid running any functions at import time. Then I have each module provide a configure function that I can call when all other modules have finished importing. The purpose of configure is to prepare each module’s state by accessing the attributes of other modules. I run configure after all modules have been imported (i.e., when step 5 is complete), so all attributes must be defined.
Here I redefine the dialog module to only access the prefs object when configure is called:
# dialog.py
import app

class Dialog:
    ...

save_dialog = Dialog()

def show():
    ...

def configure():
    save_dialog.save_dir = app.prefs.get("save_dir")
I also redefine the app module to not run any activities on import:
# app.py
import dialog

class Prefs:
    ...

prefs = Prefs()

def configure():
    ...
Finally, the main module has three distinct phases of execution—import everything, configure everything, and run the first activity:
# main.py
import app
import dialog
app.configure()
dialog.configure()
dialog.show()
This works well in many situations and enables patterns like dependency injection (see Item 112: “Encapsulate Dependencies to Facilitate Mocking and Testing” for a similar example). But sometimes it can be difficult to structure your code so that an explicit configure step is possible. Having two distinct phases within a module can also make your code harder to read because it separates the definition of objects from their configuration.
The third—and often simplest—solution to the circular imports problem is to use an import statement within a function or method. This is called a dynamic import because the module import happens while the program is running, not while the program is first starting up and initializing its modules.
Here I redefine the dialog module to use a dynamic import. The dialog.show function imports the app module at runtime instead of the dialog module importing app at initialization time:
# dialog.py
class Dialog:
    ...

save_dialog = Dialog()

def show():
    import app  # Dynamic import
    save_dialog.save_dir = app.prefs.get("save_dir")
    ...
The app module can now be the same as it was in the original example. It imports dialog at the top and calls dialog.show at the bottom:
# app.py
import dialog

class Prefs:
    ...

prefs = Prefs()
dialog.show()
This approach has a similar effect to the import, configure, and run steps from before. The difference is that it requires no structural changes to the way the modules are defined and imported. I’m simply delaying the circular import until the moment I must access the other module. At that point, I can be pretty sure that all other modules have already been initialized (as step 5 is complete for everything).
In general, it’s good to avoid dynamic imports like this. The cost of the import statement is not negligible and can be especially bad in tight loops (see Item 98: “Lazy-Load Modules with Dynamic Imports to Reduce Startup Time” for an example). By delaying execution, dynamic imports also set you up for surprising failures at runtime, such as SyntaxError exceptions long after your program has started running (see Item 108: “Verify Related Behaviors in TestCase Subclasses” for how to avoid that). However, these downsides are often better than the alternative of restructuring your entire program.
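If a dynamic import does end up near a hot path, one common mitigation, sketched here with the math module as a stand-in, is to bind the imported module to a module-level variable on first use so later calls skip the import statement entirely:

```python
_math = None  # populated on first use

def circle_area(radius):
    global _math
    if _math is None:
        import math  # pay the dynamic import cost only once
        _math = math
    return _math.pi * radius * radius

assert round(circle_area(1), 2) == 3.14
```

An import statement is cheap after the first time (it's a sys.modules lookup), but this pattern also avoids that lookup on every call.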
Circular dependencies happen when two modules must call into each other at import time. They can cause your program to crash at startup.
The best way to break a circular dependency is to refactor mutual dependencies into a separate module at the bottom of the dependency tree.
Dynamic imports are the simplest solution for breaking a circular dependency between modules while minimizing refactoring and complexity.
Use warnings to Refactor and Migrate Usage

It’s natural for APIs to change in order to satisfy new requirements that meet formerly unanticipated needs. When an API is small and has few upstream or downstream dependencies, making such changes is straightforward. One programmer can often update a small API and all of its callers in a single commit to the source code repository.
However, as a codebase grows, the number of callers of an API can become so large or fragmented across repositories that it’s infeasible or impractical to make API changes in lockstep with updating callers to match. Instead, you need a way to notify and encourage the people you collaborate with to refactor their code and migrate their API usage to the latest forms.
For example, say that I want to provide a module for calculating how far a car will travel at a given average speed and duration. Here I define such a function and assume that speed is in miles per hour and duration is in hours:
def print_distance(speed, duration):
    distance = speed * duration
    print(f"{distance} miles")

print_distance(5, 2.5)
>>>
12.5 miles
Imagine that this works so well that I quickly gather a large number of dependencies on this function. Other programmers that I collaborate with need to calculate and print distances like this all across our shared codebase.
Despite its success, this implementation is error prone because the units for the arguments are implicit. For example, if I wanted to see how far a bullet travels in 3 seconds at 1000 meters per second, I would get the wrong result:
print_distance(1000, 3)
>>>
3000 miles
I can address this problem by expanding the API of print_distance to include optional keyword arguments (see Item 37: “Enforce Clarity with Keyword-Only and Positional-Only Arguments”) for the units of speed, duration, and the computed distance to print out:
CONVERSIONS = {
    "mph": 1.60934 / 3600 * 1000,  # m/s
    "hours": 3600,                 # seconds
    "miles": 1.60934 * 1000,       # m
    "meters": 1,                   # m
    "m/s": 1,                      # m/s
    "seconds": 1,                  # s
}

def convert(value, units):
    rate = CONVERSIONS[units]
    return rate * value

def localize(value, units):
    rate = CONVERSIONS[units]
    return value / rate

def print_distance(
    speed,
    duration,
    *,
    speed_units="mph",
    time_units="hours",
    distance_units="miles",
):
    norm_speed = convert(speed, speed_units)
    norm_duration = convert(duration, time_units)
    norm_distance = norm_speed * norm_duration
    distance = localize(norm_distance, distance_units)
    print(f"{distance} {distance_units}")
Now I can modify the speeding bullet call and produce an accurate result with a unit conversion to miles:
print_distance(
    1000,
    3,
    speed_units="meters",
    time_units="seconds",
)
>>>
1.8641182099494205 miles
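A quick sanity check of the conversion table helps confirm the math: convert normalizes into SI units (meters, seconds, m/s), and localize is its inverse. Here's a condensed, self-contained restatement of the helpers with a check:

```python
CONVERSIONS = {
    "mph": 1.60934 / 3600 * 1000,  # m/s per mph
    "hours": 3600,                 # seconds per hour
    "miles": 1.60934 * 1000,       # meters per mile
    "meters": 1,
    "m/s": 1,
    "seconds": 1,
}

def convert(value, units):
    return CONVERSIONS[units] * value

def localize(value, units):
    return value / CONVERSIONS[units]

# 60 mph is roughly 26.8 m/s
assert abs(convert(60, "mph") - 26.82) < 0.01

# localize inverts convert for any unit
assert abs(localize(convert(3, "miles"), "miles") - 3) < 1e-9
```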
It seems like requiring units to be specified for this function is a much better way to go. Making them explicit reduces the likelihood of errors and is easier for new readers of the code to understand. But how can I migrate all callers of the API over to always specifying units? How do I minimize breakage of any code that’s dependent on print_distance while also encouraging callers to adopt the new units arguments as soon as possible?
For this purpose, Python provides the built-in warnings module. Using warnings is a programmatic way to inform other programmers that their code needs to be modified due to a change to an underlying library that they depend on. While exceptions are primarily for automated error handling by machines (see Item 81: “assert Internal Assumptions and raise Missed Expectations”), warnings are all about communication between humans about what to expect in their collaboration with each other.
I can modify print_distance to issue warnings when the optional keyword arguments for specifying units are not supplied. This way, the arguments can continue being optional temporarily, while providing an explicit notice to people running dependent programs that they should expect breakage in the future if they fail to take action:
import warnings

def print_distance(
    speed,
    duration,
    *,
    speed_units=None,
    time_units=None,
    distance_units=None,
):
    if speed_units is None:
        warnings.warn(
            "speed_units required",
            DeprecationWarning,
        )
        speed_units = "mph"
    if time_units is None:
        warnings.warn(
            "time_units required",
            DeprecationWarning,
        )
        time_units = "hours"
    if distance_units is None:
        warnings.warn(
            "distance_units required",
            DeprecationWarning,
        )
        distance_units = "miles"
    norm_speed = convert(speed, speed_units)
    norm_duration = convert(duration, time_units)
    norm_distance = norm_speed * norm_duration
    distance = localize(norm_distance, distance_units)
    print(f"{distance} {distance_units}")
I can verify that this code issues a warning by calling the function with the same arguments as before and capturing the sys.stderr output from the warnings module:
import contextlib
import io

fake_stderr = io.StringIO()
with contextlib.redirect_stderr(fake_stderr):
    print_distance(
        1000,
        3,
        speed_units="meters",
        time_units="seconds",
    )

print(fake_stderr.getvalue())
>>>
1.8641182099494205 miles
.../example.py:121: DeprecationWarning: distance_units required
warnings.warn(
Adding warnings to this function required quite a lot of repetitive boilerplate that’s hard to read and maintain. Also, the warning message indicates the line where warnings.warn was called, but what I really want to point out is where the call to print_distance was made without soon-to-be-required keyword arguments.
Luckily, the warnings.warn function supports the stacklevel parameter, which makes it possible to report the correct place in the stack as the cause of the warning. stacklevel also makes it easy to write functions that can issue warnings on behalf of other code, reducing boilerplate. Here I define a helper function that warns if an optional argument wasn’t supplied and then provides a default value for it:
def require(name, value, default):
    if value is not None:
        return value
    warnings.warn(
        f"{name} will be required soon, update your code",
        DeprecationWarning,
        stacklevel=3,
    )
    return default

def print_distance(
    speed,
    duration,
    *,
    speed_units=None,
    time_units=None,
    distance_units=None,
):
    speed_units = require(
        "speed_units",
        speed_units,
        "mph",
    )
    time_units = require(
        "time_units",
        time_units,
        "hours",
    )
    distance_units = require(
        "distance_units",
        distance_units,
        "miles",
    )
    norm_speed = convert(speed, speed_units)
    norm_duration = convert(duration, time_units)
    norm_distance = norm_speed * norm_duration
    distance = localize(norm_distance, distance_units)
    print(f"{distance} {distance_units}")
I can verify that this propagates the proper offending line by inspecting the captured output:
import contextlib
import io

fake_stderr = io.StringIO()
with contextlib.redirect_stderr(fake_stderr):
    print_distance(
        1000,
        3,
        speed_units="meters",
        time_units="seconds",
    )

print(fake_stderr.getvalue())
>>>
1.8641182099494205 miles
.../example.py:208: DeprecationWarning: distance_units will be required soon, update your code
print_distance(
The warnings module also lets you configure what should happen when a warning is encountered. One option is to make all warnings become errors, which raises the warning as an exception instead of printing it out to sys.stderr:
warnings.simplefilter("error")
try:
    warnings.warn(
        "This usage is deprecated",
        DeprecationWarning,
    )
except DeprecationWarning:
    pass  # Expected
This exception-raising behavior is especially useful for automated tests in order to detect changes in upstream dependencies and fail tests accordingly. Using such test failures is a great way to make it clear to the people you collaborate with that they will need to update their code. You can use the -W error command-line argument to the Python interpreter or the PYTHONWARNINGS environment variable to apply this policy:
$ python3 -W error example_test.py
Traceback (most recent call last):
File ".../example_test.py", line 6, in <module>
warnings.warn("This might raise an exception!")
UserWarning: This might raise an exception!
Once the people responsible for code that depends on a deprecated API are aware that they’ll need to do a migration, they can tell the warnings module to ignore the error by using the simplefilter and filterwarnings functions (see https://docs.python.org/3/library/warnings for details):
warnings.simplefilter("ignore")
warnings.warn("This will not be printed to stderr")
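Rather than ignoring everything, callers can scope the suppression to just the deprecation category (or a specific module or message pattern) so that other warnings still surface. Here's a sketch that uses catch_warnings to keep the filter change local; note that filterwarnings inserts its entry at the front of the filter list, so the more specific rule wins:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default
    # More specific filter inserted at the front takes precedence
    warnings.filterwarnings("ignore", category=DeprecationWarning)

    warnings.warn("old API", DeprecationWarning)  # suppressed
    warnings.warn("heads up", UserWarning)        # recorded

assert len(caught) == 1
assert caught[0].category is UserWarning
```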
After a program is deployed into production, it doesn’t make sense for warnings to cause errors because they might crash the program at a critical time. Instead, a better approach is to replicate warnings into the logging built-in module. Here I accomplish this by calling the logging.captureWarnings function and configuring the corresponding "py.warnings" logger:
import logging

fake_stderr = io.StringIO()
handler = logging.StreamHandler(fake_stderr)
formatter = logging.Formatter(
    "%(asctime)-15s WARNING] %(message)s")
handler.setFormatter(formatter)

logging.captureWarnings(True)
logger = logging.getLogger("py.warnings")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

warnings.resetwarnings()
warnings.simplefilter("default")
warnings.warn("This will go to the logs output")

print(fake_stderr.getvalue())
>>>
2019-06-11 19:48:19,132 WARNING] .../example.py:227: UserWarning: This will go to the logs output
warnings.warn("This will go to the logs output")
Using logging to capture warnings ensures that any error-reporting systems that my program already has in place will also receive notice of important warnings in production. This can be especially useful if my tests don’t cover every edge case that I might see when the program is undergoing real usage.
API library maintainers should also write unit tests to verify that warnings are generated under the correct circumstances with clear and actionable messages (see Item 108: “Verify Related Behaviors in TestCase Subclasses”). Here I use the warnings.catch_warnings function as a context manager (see Item 82: “Consider contextlib and with Statements for Reusable try/finally Behavior” for background) to wrap a call to the require function that I defined above:
with warnings.catch_warnings(record=True) as found_warnings:
    found = require("my_arg", None, "fake units")
    expected = "fake units"
    assert found == expected
Once I’ve collected the warning messages, I can verify that their number, detail messages, and categories match my expectations:
assert len(found_warnings) == 1
single_warning = found_warnings[0]
assert str(single_warning.message) == (
    "my_arg will be required soon, update your code"
)
assert single_warning.category == DeprecationWarning
The warnings module can be used to notify callers of your API about deprecated usage. Warning messages encourage such callers to fix their code before later changes break their programs.
Raise warnings as errors by using the -W error command-line argument to the Python interpreter. This is especially useful in automated tests to catch potential regressions of dependencies.
In production, you can replicate warnings into the logging module to ensure that your existing error-reporting systems will capture warnings at runtime.
It’s useful to write tests for the warnings that your code generates to make sure they’ll be triggered at the right time in any of your downstream dependencies.
Consider Static Analysis via typing to Obviate Bugs

Providing documentation is a great way to help users of an API understand how to use it properly (see Item 118: “Write Docstrings for Every Function, Class, and Module”), but often it’s not enough, and incorrect usage still causes bugs. Ideally, there would be a programmatic mechanism to verify that callers are using your APIs the right way and that you are using your downstream dependencies correctly as well. Many programming languages address part of this need with compile-time type checking, which can identify and eliminate some categories of bugs.
Historically Python has focused on dynamic features and has not provided compile-time type safety of any kind (see Item 3: “Never Expect Python to Detect Errors at Compile Time”). However, more recently Python has introduced special syntax and the built-in typing module, which allow you to annotate variables, class fields, functions, and methods with type information. These type hints allow for gradual typing, which means a codebase can be progressively updated to specify types as desired.
The benefit of adding type information to a Python program is that you can run static analysis tools to ingest a program’s source code and identify where bugs are most likely to occur. The typing built-in module doesn’t actually implement any of the type checking functionality itself. It merely provides a common library for defining types that can be applied to Python code and consumed by separate tools.
Much as there are multiple distinct implementations of the Python interpreter (e.g., CPython, PyPy; see Item 1: “Know Which Version of Python You’re Using”), there are multiple implementations of static analysis tools for Python that use typing. As of this writing, the most popular tools are mypy (https://github.com/python/mypy), pyright (https://github.com/microsoft/pyright), pyre (https://pyre-check.org), and pytype (https://github.com/google/pytype). For the typing examples in this book, I’ve used mypy with the --strict flag, which enables all the various warnings supported by the tool. Here’s an example of what running the command line looks like:
$ python3 -m mypy --strict example.py
These tools can be used to detect a large number of common errors before a program is ever run, which can provide an added layer of safety in addition to having good tests (see Item 109: “Prefer Integration Tests over Unit Tests”). For example, can you find the bug that causes this simple function to compile fine but throw an exception at runtime?
def subtract(a, b):
    return a - b

subtract(10, "5")
>>>
Traceback ...
TypeError: unsupported operand type(s) for -: 'int' and 'str'
Parameter and variable type annotations are delineated with a colon (such as name: type). Return value types are specified with -> type following the argument list. Using such type annotations and mypy, I can easily spot the bug:
def subtract(a: int, b: int) -> int:  # Function annotation
    return a - b

subtract(10, "5")  # Oops: passed string value
$ python3 -m mypy --strict example.py
.../example.py:4: error: Argument 2 to "subtract" has
➥ incompatible type "str"; expected "int" [arg-type]
Found 1 error in 1 file (checked 1 source file)
Type annotations can also be applied to classes. For example, this class has two bugs in it that will raise exceptions when the program is run:
class Counter:
    def __init__(self):
        self.value = 0

    def add(self, offset):
        value += offset

    def get(self) -> int:
        self.value
The first one happens when I call the add method:
counter = Counter()
counter.add(5)
>>>
Traceback ...
UnboundLocalError: cannot access local variable 'value' where
➥it is not associated with a value
The second bug happens when I call get:
counter = Counter()
found = counter.get()
assert found == 0, found
>>>
Traceback ...
AssertionError: None
Both of these problems are easily found in advance by mypy:
class Counter:
    def __init__(self) -> None:
        self.value: int = 0  # Field / variable annotation

    def add(self, offset: int) -> None:
        value += offset  # Oops: forgot "self."

    def get(self) -> int:
        self.value  # Oops: forgot "return"

counter = Counter()
counter.add(5)
counter.add(3)
assert counter.get() == 8
$ python3 -m mypy --strict example.py
.../example.py:9: error: Name "value" is not defined
➥[name-defined]
.../example.py:11: error: Missing return statement [return]
Found 2 errors in 1 file (checked 1 source file)
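For reference, here is the same class with both bugs fixed; this version passes mypy --strict and behaves correctly at runtime:

```python
class Counter:
    def __init__(self) -> None:
        self.value: int = 0  # Field / variable annotation

    def add(self, offset: int) -> None:
        self.value += offset  # Fixed: "self." restored

    def get(self) -> int:
        return self.value  # Fixed: "return" restored

counter = Counter()
counter.add(5)
counter.add(3)
assert counter.get() == 8
```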
One of the strengths of Python’s dynamism is the ability to write generic functionality that operates on duck types (see Item 25: “Be Cautious when Relying on Dictionary Insertion Ordering” and Item 57: “Inherit from collections.abc Classes for Custom Container Types”). This allows one implementation to accept a wide range of types, saving a lot of duplicative effort and simplifying testing. Here I’ve defined such a generic function for combining values from a list together, but the assert statement on the last line fails for a non-obvious reason:
def combine(func, values):
    assert len(values) > 0
    result = values[0]
    for next_value in values[1:]:
        result = func(result, next_value)
    return result

def add(x, y):
    return x + y

inputs = [1, 2, 3, 4j]
result = combine(add, inputs)
assert result == 10, result  # Fails
>>>
Traceback ...
AssertionError: (6+4j)
I can use the typing module’s support for generics to annotate this function and detect the problem statically:
from collections.abc import Callable
from typing import TypeVar

Value = TypeVar("Value")
Func = Callable[[Value, Value], Value]

def combine(func: Func[Value], values: list[Value]) -> Value:
    assert len(values) > 0
    result = values[0]
    for next_value in values[1:]:
        result = func(result, next_value)
    return result

Real = TypeVar("Real", int, float)

def add(x: Real, y: Real) -> Real:
    return x + y

inputs = [1, 2, 3, 4j]  # Oops: included a complex number
result = combine(add, inputs)
assert result == 10
$ python3 -m mypy --strict example.py
.../example.py:22: error: Argument 1 to "combine" has
➥incompatible type "Callable[[Real, Real], Real]"; expected
➥"Callable[[complex, complex], complex]" [arg-type]
Found 1 error in 1 file (checked 1 source file)
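For reference, removing the complex number from the input list satisfies both the type checker and the runtime assertion (same definitions as above):

```python
from collections.abc import Callable
from typing import TypeVar

Value = TypeVar("Value")
Func = Callable[[Value, Value], Value]

def combine(func: Func[Value], values: list[Value]) -> Value:
    assert len(values) > 0
    result = values[0]
    for next_value in values[1:]:
        result = func(result, next_value)
    return result

Real = TypeVar("Real", int, float)

def add(x: Real, y: Real) -> Real:
    return x + y

inputs = [1, 2, 3, 4]  # Fixed: complex number removed
result = combine(add, inputs)
assert result == 10
```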
Another extremely common error is a None value appearing where you thought you’d have a valid object (see Item 32: “Prefer Raising Exceptions to Returning None”). This problem can affect seemingly simple code, like the following snippet where the last assert statement fails:
def get_or_default(value, default):
    if value is not None:
        return value
    return value

found = get_or_default(3, 5)
assert found == 3

found = get_or_default(None, 5)
assert found == 5, found  # Fails
>>>
Traceback ...
AssertionError: None
The typing module supports optional types, written as type | None, which ensure that programs only interact with values after proper null checks have been performed. This allows mypy to infer that there’s a bug in this code: the value produced by the final return statement must be None, which doesn’t match the int return type required by the function signature:
def get_or_default(value: int | None, default: int) -> int:
    if value is not None:
        return value
    return value  # Oops: should have returned "default"
$ python3 -m mypy --strict example.py
.../example.py:4: error: Incompatible return value type
➥ (got "None", expected "int") [return-value]
Found 1 error in 1 file (checked 1 source file)
A wide variety of other options are available in the typing module. (See https://docs.python.org/3/library/typing.html for all the details.) Notably, exceptions are not included. Unlike Java, which has checked exceptions that are enforced at the API boundary of every method, Python’s type annotations are more similar to those of C#: Exceptions are not considered part of an interface’s definition. Thus, if you want to verify that you’re raising and catching exceptions properly, you need to write tests.
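For example, a minimal unittest sketch for verifying exception behavior might look like this (parse_port is a hypothetical helper invented purely for illustration):

```python
import unittest

def parse_port(text: str) -> int:
    # Hypothetical helper: raises ValueError for out-of-range ports;
    # nothing in the type annotations records this behavior
    port = int(text)
    if not (0 < port < 65536):
        raise ValueError(f"Invalid port: {port}")
    return port

class ParsePortTest(unittest.TestCase):
    def test_valid_input(self) -> None:
        self.assertEqual(parse_port("8080"), 8080)

    def test_raises_on_bad_input(self) -> None:
        # Only this test, not the type checker, verifies the exception
        with self.assertRaises(ValueError):
            parse_port("99999")
```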
One common gotcha in using the typing module occurs when you need to deal with forward references (see Item 122: “Know How to Break Circular Dependencies” for a similar problem). For example, imagine that I have two classes where one holds a reference to the other. Usually the definition order of these classes doesn’t matter because they will both be defined before their instances are created later in the program:
class FirstClass:
    def __init__(self, value):
        self.value = value

class SecondClass:
    def __init__(self, value):
        self.value = value

second = SecondClass(5)
first = FirstClass(second)
If you apply type hints to this program and run mypy, it will say that there are no issues:
class FirstClass:
    def __init__(self, value: SecondClass) -> None:
        self.value = value

class SecondClass:
    def __init__(self, value: int) -> None:
        self.value = value

second = SecondClass(5)
first = FirstClass(second)
$ python3 -m mypy --strict example.py
Success: no issues found in 1 source file
However, if you actually try to run this code, it fails because SecondClass is referenced by the type annotation in the FirstClass.__init__ method’s parameters before it’s actually defined:
class FirstClass:
    def __init__(self, value: SecondClass) -> None:  # Breaks
        self.value = value

class SecondClass:
    def __init__(self, value: int) -> None:
        self.value = value

second = SecondClass(5)
first = FirstClass(second)
>>>
Traceback ...
NameError: name 'SecondClass' is not defined
The recommended workaround that’s supported by these static analysis tools is to use a string as the type annotation that contains the forward reference. The string value is later parsed and evaluated to extract the type information to check:
class FirstClass:
    def __init__(self, value: "SecondClass") -> None:  # OK
        self.value = value

class SecondClass:
    def __init__(self, value: int) -> None:
        self.value = value

second = SecondClass(5)
first = FirstClass(second)
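Another workaround is the postponed evaluation of annotations enabled by from __future__ import annotations (PEP 563), which treats every annotation in the module as a lazily evaluated string, so no quoting is needed:

```python
from __future__ import annotations  # All annotations become lazy strings

class FirstClass:
    def __init__(self, value: SecondClass) -> None:  # OK without quotes
        self.value = value

class SecondClass:
    def __init__(self, value: int) -> None:
        self.value = value

second = SecondClass(5)
first = FirstClass(second)
```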
Now that you’ve seen how to use type hints and their potential benefits, it’s important to be thoughtful about when to use them. Here are some of the best practices to keep in mind:
It’s going to slow you down if you try to use type annotations from the start when writing a new piece of code. A general strategy is to write a first version without annotations, then write tests, and then add type information where it’s most valuable.
Type hints are most important at the boundaries of a codebase, such as an API you provide that many callers (and thus other people) depend on. Type hints complement tests (see Item 108: “Verify Related Behaviors in TestCase Subclasses”) and warnings (see Item 123: “Consider warnings to Refactor and Migrate Usage”) to ensure that your API callers aren’t surprised or broken by your changes.
It can be useful to apply type hints to the most complex and error-prone parts of your codebase that aren’t part of an API. However, it may not be worth striving for 100% coverage in your type annotations because you’ll quickly encounter diminishing returns.
If possible, you should include static analysis as part of your automated build and test system to ensure that every commit to your codebase is vetted for errors. In addition, the configuration used for type checking should be maintained in the repository to ensure that all the people you collaborate with are using the same rules.
It’s important to run the type checker as you add type information to your code. If you don’t, you may nearly finish sprinkling type hints everywhere and then be hit by a huge wall of errors from the type checking tool, which can be disheartening and make you want to abandon type hints altogether.
Finally, it’s important to acknowledge that in many situations, you might not need or want to use type annotations at all. For small to medium-sized programs, ad hoc code, legacy codebases, and prototypes, type hints may require far more effort than they’re worth.
Python has special syntax and the typing built-in module for annotating variables, fields, functions, and methods with type information.
Static type checkers can leverage this type information to help you avoid many common bugs that would otherwise happen at runtime.
There are a variety of best practices for adopting types in your programs, using them in APIs, and making sure they don’t get in the way of your productivity.
zipimport and zipapp

Imagine that you’ve finished building a web application in Python using the flask open source project, and it’s time to ship it to production for real users (see Item 120: “Consider Module-Scoped Code to Configure Deployment Environments” for background). There are a variety of options for doing this with package managers (see Item 116: “Know Where to Find Community-Built Modules”). However, an often easier way is to simply copy the source code and dependencies to a server (or into a container image).
To that end, I’ve pulled together my application and all of its related modules into a directory—similar to the site-packages directory created by tools like pip (see Item 117: “Use Virtual Environments for Isolated and Reproducible Dependencies”):
$ ls flask_deps
Jinja2-3.1.3.dist-info
MarkupSafe-2.1.5.dist-info
blinker
blinker-1.7.0.dist-info
click
click-8.1.7.dist-info
flask
flask-3.0.2.dist-info
itsdangerous
itsdangerous-2.1.2.dist-info
jinja2
markupsafe
myapp.py
werkzeug
werkzeug-3.0.1.dist-info
These dependencies include more than 330 files and 56,000 lines of source code, with an uncompressed size of 5.1 MB. Copying this many relatively small files to another server can be annoyingly slow. Such transfers can also unexpectedly change important details like file permissions. In the past, a common way to work around these pitfalls was to archive a codebase into a zip file before deployment.
To make archives like this easier to work with, Python has the zipimport built-in module. It enables programs to be decompressed and loaded on the fly from zip files that appear in the PYTHONPATH environment variable or sys.path list. Here I create a zip archive of the flask_deps directory and then verify that it’s working correctly when executed directly from a zip file:
$ cd flask_deps
$ zip -r ../flask_deps.zip *
$ cd ..
$ PYTHONPATH=flask_deps.zip python3 -m flask --app=myapp routes
Endpoint Methods Rule
----------- ------- -----------------------
hello_world GET /
static GET /static/<path:filename>
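If you’d rather not shell out to the zip command, the standard library can build an equivalent archive. Here’s a minimal sketch using shutil.make_archive; it creates a temporary stand-in directory with a placeholder myapp.py so that it’s runnable on its own—in practice you would point root_dir at the real dependency directory:

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Stand-in for the flask_deps directory so this sketch is self-contained
deps_dir = Path(tempfile.mkdtemp()) / "flask_deps"
deps_dir.mkdir()
(deps_dir / "myapp.py").write_text("app = None  # placeholder\n")

# Equivalent to "cd flask_deps && zip -r ../flask_deps.zip *": the
# archive entries sit at the root, so the file can go on PYTHONPATH
archive = shutil.make_archive(
    str(deps_dir.parent / "flask_deps"), "zip", root_dir=deps_dir
)

with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())  # myapp.py appears at the archive root
```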
You might expect that there’s a performance penalty in loading Python modules from a zip file due to the CPU overhead of decompression. Here I measure the startup time loading from a zip archive:
$ time PYTHONPATH=flask_deps.zip python3 -m flask --app=myapp
➥routes
...
real 0m0.123s
user 0m0.097s
sys 0m0.022s
And here I measure startup time loading from plain files on disk:
$ time PYTHONPATH=flask_deps python3 -m flask --app=myapp
➥routes
Endpoint Methods Rule
----------- ------- -----------------------
hello_world GET /
static GET /static/<path:filename>
real 0m0.126s
user 0m0.098s
sys 0m0.023s
The performance is nearly identical. There are two main reasons for this. First, modern computers have a huge amount of processing power compared to their I/O capacity and memory bandwidth, so the slowdown from additional decompression is often negligible. Second, large file system caches and SSD (solid-state drive) performance can practically hide I/O delays for relatively small amounts of data (see Item 97: “Rely on Precompiled Bytecode and File System Caching to Improve Startup Time” for details). Although the flask_deps.zip file is 1.6 MB compared to the uncompressed directory size of 5.1 MB, the performance difference is effectively zero.
One conclusion might be that you should always compress your Python programs into zip files as it seems like there would be no downsides. Python even provides the zipapp built-in module for rapidly archiving whole applications because of this benefit. Here I use this tool to create a compressed, single-file executable (with the .pyz suffix) for my web application that’s easy to copy around and interact with:
$ python -m zipapp flask_deps -m "flask.__main__:main" -p
➥'/usr/bin/env python3' -c
$ ./flask_deps.pyz --app myapp routes
Endpoint Methods Rule
----------- ------- -----------------------
hello_world GET /
static GET /static/<path:filename>
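The same kind of archive can also be built programmatically with zipapp.create_archive. This sketch uses a tiny stand-in application (a demo module with a main function, both invented for illustration) rather than the real flask_deps directory so that it’s self-contained:

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Tiny stand-in application so this sketch runs anywhere
app_dir = Path(tempfile.mkdtemp()) / "app"
app_dir.mkdir()
(app_dir / "demo.py").write_text("def main():\n    print('hello from pyz')\n")

pyz = app_dir.parent / "app.pyz"
zipapp.create_archive(
    app_dir,
    target=pyz,
    interpreter="/usr/bin/env python3",  # Like the -p flag
    main="demo:main",                    # Like the -m flag
    compressed=True,                     # Like the -c flag
)

# Run the single-file executable; sys.executable stands in for the shebang
proc = subprocess.run([sys.executable, str(pyz)], capture_output=True, text=True)
print(proc.stdout)
```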
Unfortunately, executing Python code from zip files causes real programs to break in two ways: data file accesses and extension modules.
As an example of the first issue, here I create a zip archive of the Django web framework and try to run a web application that depends on it:
$ python3 -m compileall django
$ zip -r django.zip Django-5.0.3.dist-info django
$ rm -R Django-5.0.3.dist-info django
$ PYTHONPATH=django.zip python3 django_project/manage.py check
Traceback (most recent call last):
...
OSError: No translation files found for default language en-us.
This didn’t work because Django is looking for the translations data file next to the source files. In the Django code excerpt below, the value of the localedir variable is ".../django.zip/django/conf/locale", which is not a directory on the filesystem. When that path is passed to the gettext module to load translations, the files can’t be found by the Django library code, causing the OSError exception shown above:
# trans_real.py
# Copyright (c) Django Software Foundation and
# individual contributors. All rights reserved.

class DjangoTranslation(gettext_module.GNUTranslations):
    ...

    def _init_translation_catalog(self):
        settingsfile = sys.modules[settings.__module__].__file__
        localedir = os.path.join(
            os.path.dirname(settingsfile),
            "locale",
        )
        translation = self._new_gnu_trans(localedir)
        self.merge(translation)

    ...
Python provides the pkgutil built-in module to work around this problem. It intelligently inspects modules to determine how to properly access their data resources even if they’re in zip archives or require a custom module loader. Here I use pkgutil to load the translations file that Django couldn’t find due to the zip archive:
# django_pkgutil.py
import pkgutil

data = pkgutil.get_data(
    "django.conf.locale",
    "en/LC_MESSAGES/django.po",
)
print(data.decode("utf-8"))
>>>
# This file is distributed under the same license as the Django
➥ package.
#
msgid ""
msgstr ""
"Project-Id-Version: Django\n"
"Report-Msgid-Bugs-To: \n"
...
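In modern Python, the importlib.resources built-in module provides similar zip-safe resource access. The sketch below demonstrates the API against a standard-library package so it runs without Django installed; for the Django case you would pass the same package and relative path as in the pkgutil example above:

```python
from importlib import resources

# Zip-safe resource access, equivalent in spirit to pkgutil.get_data;
# demonstrated on the standard library's email package so it runs anywhere
source = resources.files("email").joinpath("__init__.py").read_text()
print(source[:30])
```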
Few projects actually use pkgutil; even an extremely popular project like Django doesn’t use it. Python programs are most commonly executed as files on disk with their original directory structure. In contrast, other languages compile programs into an executable that is placed into a separate build artifacts directory, far from the code. This causes programmers to assume that they can’t access the source tree and need to handle data dependencies more explicitly. With Python code, however, the assumption is that the code is nearby, and thus the data files in the source tree must also be nearby. Don’t expect common packages to work when imported from zip archives.
The second issue is that you can’t import native extension modules (see Item 96: “Consider Extension Modules to Maximize Performance and Ergonomics”) from zip archives due to operating system constraints. Here I show how this breaks for the NumPy package:
$ zip -r ./numpy.zip numpy numpy-1.26.4.dist-info
$ rm -R numpy numpy-1.26.4.dist-info
$ PYTHONPATH=numpy.zip python -c 'import numpy'
Traceback (most recent call last):
...
ModuleNotFoundError: No module named
➥'numpy.core._multiarray_umath'
During handling of the above exception, another exception
➥ occurred:
Traceback (most recent call last):
...
ImportError:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS
➥ ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
...
Extension modules are extremely valuable and popular because they help Python go faster for CPU-intensive tasks (see Item 96: “Consider Extension Modules to Maximize Performance and Ergonomics”). This is ultimately the biggest deal-breaker for both zipimport and zipapps.
Fortunately, the Python community has built a variety of open source solutions that are better at deploying Python applications. The Pex tool (https://github.com/pex-tool/pex) and a derivative project, Shiv (https://github.com/linkedin/shiv), provide similar functionality to zipapp, but these tools automatically work around the problems with data files and native modules. For example, here I use Pex to create a single executable file for the same Django web application from earlier—and this one actually works:
$ pip install -e django_project
$ pex django_project -o myapp.pex
$ ./myapp.pex -m manage check
System check identified no issues (0 silenced).
Another alternative is PyInstaller (https://pyinstaller.org), which goes even further by bundling the Python executable itself so the user doesn’t need anything else installed on their system in order to run an application. Whatever route you decide to take, be sure to read the documentation carefully and experimentally verify that it’s compatible with the modules you need to use and the assumptions they make about their execution environment.
Python has the ability to load modules directly from zip archives, which makes it easier to deploy whole applications as a single file.
Many common open source Python packages break when imported from a zip archive due to reliance on data files and extension modules.
The community has built alternatives to Python’s built-in zipapp module, such as Pex, which provide the deployment benefits of zip archives without the downsides.