The idioms of a programming language are defined by its users. Over the years, the Python community has come to use the adjective Pythonic to describe code that follows a particular style. The Pythonic style isn’t regimented or enforced by the compiler. It has emerged over time through experience using the language and working with others. Python programmers prefer to be explicit, to choose simple over complex, and to maximize readability. (Type import this into your interpreter to read The Zen of Python.)
Programmers familiar with other languages may try to write Python as if it’s C++, Java, or whatever they know best. New programmers may still be getting comfortable with the vast range of concepts that can be expressed in Python. It’s important for you to know the best—the Pythonic—way to do the most common things in Python. These patterns will affect every program you write.
Throughout this book, the majority of example code is for Python 3.13 (released in October 2024). This book does not cover Python 2, although it sometimes mentions older versions of Python 3 to provide background information about how the language has evolved over time.
Many computer operating systems ship with multiple versions of the standard CPython interpreter preinstalled. However, the default meaning of python on the command line may not be clear. On older systems, python is usually an alias for python2.7, but it can sometimes be an alias for even older versions, like python2.6 or python2.5. To find out exactly which version of Python you’re using, you can use the --version flag:
$ python --version
Python 2.7.10
On many systems, Python 2 is no longer installed, and the python command causes an error:
$ python --version
-bash: python: command not found
Python 3 is usually available under the name python3:
$ python3 --version
Python 3.12.3
To use alternative Python runtimes, such as PyPy (https://www.pypy.org), to run Python programs, you need to use their specific commands:
$ pypy3 --version
Python 3.10.14 (75b3de9d9035, May 28 2024, 18:06:40)
[PyPy 7.3.16 with GCC Apple LLVM 15.0.0 (clang-1500.3.9.4)]
You can also figure out the version of Python you’re using at runtime by inspecting values in the sys built-in module:
import sys
print(sys.platform)
print(sys.implementation.name)
print(sys.version_info)
print(sys.version)
>>>
darwin
cpython
sys.version_info(major=3, minor=12, micro=3,
➥releaselevel='final', serial=0)
3.12.3 (main, Apr 9 2024, 08:09:14)
➥[Clang 15.0.0 (clang-1500.3.9.4)]
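Because sys.version_info behaves like a tuple, it can be compared directly against a tuple of numbers. The following is a minimal sketch of a startup guard built on that comparison. The function name require_python, its current parameter, and the (3, 8) minimum are illustrative assumptions, not requirements of this book:

```python
import sys


def require_python(minimum, current=None):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `minimum` is a (major, minor) tuple. `current` defaults to the
    running interpreter's version and exists only to ease testing.
    """
    if current is None:
        current = sys.version_info[:2]
    if tuple(current) < tuple(minimum):
        needed = ".".join(str(part) for part in minimum)
        found = ".".join(str(part) for part in current)
        raise RuntimeError(f"Python {needed}+ required, found {found}")


require_python((3, 8))  # Hypothetical minimum, chosen for illustration
```

Calling a guard like this near the top of a program's entry point turns a confusing failure deep inside the code into a clear message at startup.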
For a long time, the Python core developers and community were actively maintaining support for both Python 2 and Python 3. The versions are different in significant ways and have incompatibilities that made porting difficult. The migration from version 2 to version 3 was an extremely long and painful period that finally came to an end on April 20, 2020, when Python version 2.7.18 was published. This was the final official release of Python 2. For anyone who still needs security patches and bug fixes for Python 2, the only remaining options are to pay a commercial software vendor for support or do it yourself.
Since then, the Python core developers and community have been focused on Python version 3. The functionality of the core language, the standard library, and the ecosystem of packages and tools are constantly being improved. Keeping up with all the changes and innovations that are happening can be overwhelming. One good way to find out about what’s new is to read the release notes (https://docs.python.org/3/whatsnew/index.html), which highlight additions and changes for each version. There are other websites out there that will also notify you when the community packages you rely on are updated (see Item 116: “Know Where to Find Community-Built Modules”).
Python 3 is the most up-to-date and well-supported version of Python, and you should use it for your projects.
Be sure that the command-line executable for running Python on your system is the version you expect it to be.
Python 2 is no longer officially maintained by the core developers.
Python Enhancement Proposal #8, otherwise known as PEP 8, is the style guide for how to format Python code. You are welcome to write Python code any way you want, as long as it has valid syntax. However, using a consistent style makes your code more approachable and easier to read. Sharing a common style with other Python programmers in the larger community facilitates collaboration on projects. But even if you are the only one who will ever read your code, following the style guide will make it easier for you to change things later and can help you avoid many common errors.
PEP 8 provides a wealth of details about how to write clear Python code. It continues to be updated as the Python language evolves. It’s worth reading the whole guide online (https://www.python.org/dev/peps/pep-0008/). Here are a few rules you should be sure to follow.
In Python, whitespace is syntactically significant. Python programmers are especially sensitive to the effects of whitespace on code clarity. Follow these guidelines related to whitespace:
Use spaces instead of tabs for indentation.
Use four spaces for each level of syntactically significant indenting.
Lines should be 79 characters in length or less.
Continuations of long expressions onto additional lines should be indented by four extra spaces from their normal indentation level.
In a file, functions and classes should be separated by two blank lines.
In a class, methods should be separated by one blank line.
In a dictionary, put no whitespace between each key and colon; put a single space before the corresponding value if it fits on the same line.
Put one—and only one—space before and after the = operator in a variable assignment.
For type annotations, ensure that there is no separation between the variable name and the colon, and use a space before the type information.
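Here is a small sketch that applies the whitespace guidelines above in one place. The names CHANNEL_MAX, average_level, and Palette are hypothetical and exist only to give the rules something to format:

```python
CHANNEL_MAX: int = 255  # No space before the colon; one space after it


# Two blank lines separate top-level functions and classes.
def average_level(levels: dict[str, int]) -> float:
    # The long expression is wrapped in parentheses, and each
    # continuation line is indented by four extra spaces.
    total = (
        levels["red"]
        + levels["green"]
        + levels["blue"]
    )
    return total / (3 * CHANNEL_MAX)


class Palette:
    def __init__(self) -> None:
        # No whitespace before each colon; one space before each value.
        self.levels = {"red": 255, "green": 0, "blue": 0}

    # One blank line separates methods within a class.
    def average(self) -> float:
        return average_level(self.levels)
```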
PEP 8 suggests unique styles of naming for different parts in the language. These conventions make it easy to distinguish which type corresponds to each name when reading code. Follow these guidelines related to naming:
Functions, variables, and attributes should be in lowercase_underscore format.
Protected instance attributes should be in _leading_underscore format.
Private instance attributes should be in __double_leading_underscore format.
Classes (including exceptions) should be in CapitalizedWord format.
Module-level constants should be in ALL_CAPS format.
Instance methods in classes should use self, which refers to the object, as the name of the first parameter.
Class methods should use cls, which refers to the class, as the name of the first parameter.
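The naming guidelines above can be seen together in a short sketch. Every name here (DEFAULT_CAPACITY, InventoryError, SnackShelf, and its members) is made up for illustration:

```python
DEFAULT_CAPACITY = 10  # Module-level constant: ALL_CAPS


class InventoryError(Exception):  # Exception class: CapitalizedWord
    pass


class SnackShelf:  # Class: CapitalizedWord
    def __init__(self, capacity=DEFAULT_CAPACITY):
        self.item_count = 0  # Public attribute: lowercase_underscore
        self._capacity = capacity  # Protected: _leading_underscore
        self.__audit_log = []  # Private: __double_leading_underscore

    def add_item(self, name):  # Instance method: first parameter is self
        if self.item_count >= self._capacity:
            raise InventoryError("Shelf is full")
        self.item_count += 1
        self.__audit_log.append(name)

    @classmethod
    def with_double_capacity(cls):  # Class method: first parameter is cls
        return cls(DEFAULT_CAPACITY * 2)
```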
The Zen of Python states: “There should be one—and preferably only one—obvious way to do it.” PEP 8 attempts to codify this style in its guidance for expressions and statements:
Use inline negation (if a is not b) instead of negation of positive expressions (if not a is b).
Don’t check for empty containers or sequences (like [] or "") by comparing the length to zero (if len(somelist) == 0). Use if not somelist and assume that empty values will implicitly evaluate to False.
The same thing goes for non-empty containers or sequences (like [1] or "hi"). The statement if somelist is implicitly True for non-empty values.
Avoid single-line if statements, for and while loops, and except compound statements. Spread these over multiple lines for clarity.
If you can’t fit an expression on one line, surround it with parentheses and add line breaks and indentation to make it easier to read.
Prefer surrounding multiline expressions with parentheses over using the \ line continuation character.
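As a rough illustration of the expression and statement guidelines above, here is a hypothetical function (summarize is not from PEP 8) that uses implicit truthiness, inline negation, multi-line if statements, and a parenthesized multiline expression:

```python
def summarize(somelist, a, b):
    # Rely on implicit truthiness instead of len(somelist) == 0.
    if not somelist:
        return "empty"

    # Use inline negation: "a is not b", not "not a is b".
    if a is not b:
        relation = "different objects"
    else:
        relation = "same object"

    # Wrap a long expression in parentheses rather than using "\".
    message = (
        f"{len(somelist)} item(s); "
        f"a and b are {relation}"
    )
    return message
```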
PEP 8 suggests some guidelines for how to import modules and use them in your code:
Always put import statements (including from x import y) at the top of a file.
Always use absolute names for modules when importing them, not names relative to the current module’s own path. For example, to import the foo module from within the bar package, you should use from bar import foo, not just import foo.
If you must do relative imports, use the explicit syntax from . import foo.
Imports should be in sections in the following order: standard library modules, third-party modules, your own modules. Each subsection should have imports in alphabetical order.
If what you’ve read so far seems like a lot to remember, I have good news: The Python community is coalescing around a common tool for automatic PEP 8 formatting. It’s called black (https://github.com/psf/black), and it’s an official Python Software Foundation project. black provides very few configuration options, which makes it easy for developers working on the same codebase to agree on the style of code. Installing and using black is straightforward:
$ pip install black
$ python -m black example.py
reformatted example.py
All done!
1 file reformatted.
Besides black, there are many other community tools to help you improve your source code automatically. Many IDEs and editors include style-checking tools, auto-formatters, and similar plug-ins. One popular code analyzer is pylint (https://github.com/pylint-dev/pylint); it helps enforce the PEP 8 style guide and detects many other types of common errors in Python programs (see Item 3: “Never Expect Python to Detect Errors at Compile Time” for more examples).
Always follow the Python Enhancement Proposal #8 (PEP 8) style guide when writing Python code.
Sharing a common style with the larger Python community facilitates collaboration with others.
Using a consistent style makes it easier to modify your own code later.
Community tools like black and pylint can automate compliance with PEP 8, making it easy to keep your source code in good style.
When loading a Python program and preparing for execution, the source code is parsed into abstract syntax trees and checked for obvious structural errors. For example, a poorly constructed if statement will raise a SyntaxError exception indicating what’s wrong with the code:
if True  # Bad syntax
    print('hello')
>>>
Traceback ...
SyntaxError: expected ':'
Errors in value literals will also be detected early and raise exceptions:
1.3j5 # Bad number
>>>
Traceback ...
SyntaxError: invalid imaginary literal
Unfortunately, that’s about all the protection you can expect from Python before execution. Anything beyond basic tokenization errors and parse errors will not be flagged as a problem.
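One way to observe this boundary is with the compile built-in function, which parses and compiles source text without running it. In this minimal sketch, the helper name compiles_cleanly and both source strings are my own illustrative assumptions:

```python
source_with_syntax_error = "if True\n    print('hello')\n"
source_with_runtime_error = "total = undefined_name + 1\n"


def compiles_cleanly(source):
    """Return True if `source` parses and compiles without error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles_cleanly(source_with_syntax_error))
print(compiles_cleanly(source_with_runtime_error))
```

This prints False and then True. The second snippet refers to a name that is never defined, but that problem would surface only as a NameError when the code actually runs.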
Even simple functions that seem to have obvious errors will not be reported as having problems before program execution due to the highly dynamic nature of Python. For example, here I define a function where the my_var variable is clearly not assigned before it’s passed to print:
def bad_reference():
    print(my_var)
    my_var = 123
But this won’t raise an exception until the function is executed:
bad_reference()
>>>
Traceback ...
UnboundLocalError: cannot access local variable 'my_var'
➥where it is not associated with a value
This isn’t considered a static error because it’s valid for Python programs to dynamically assign local and global variables. For example, here I define a function that is valid or not depending on the input argument:
def sometimes_ok(x):
    if x:
        my_var = 123
    print(my_var)
This call runs fine:
sometimes_ok(True)
>>>
123
This one causes a runtime exception:
sometimes_ok(False)
>>>
Traceback ...
UnboundLocalError: cannot access local variable 'my_var'
➥where it is not associated with a value
Python also won’t catch math errors upfront. It would seem that this is clearly an error before the program executes:
def bad_math():
    return 1 / 0
But it’s possible for the meaning of the division operator to vary based on the values involved, so checking for errors like this is similarly deferred until runtime:
bad_math()
>>>
Traceback ...
ZeroDivisionError: division by zero
Python also won’t statically detect problems with undefined methods, too many or too few supplied arguments, mismatched return types, and many more seemingly obvious issues. There are community tools that can help you detect some of these errors before execution, such as the flake8 linter (https://github.com/PyCQA/flake8) and type checkers that work with the typing built-in module (see Item 124: “Consider Static Analysis via typing to Obviate Bugs”).
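To make the gap concrete, here is a small sketch of a call with a mismatched argument type. The function double is hypothetical. A static type checker would be expected to report the str argument, but the Python interpreter itself runs the code without complaint:

```python
def double(value: int) -> int:
    return value * 2


# The annotation says int, but nothing enforces it at runtime.
result = double("ab")
print(repr(result))
```

The call succeeds and prints 'abab' because the * operator also works on strings. The annotations are hints for tools and readers, not runtime checks.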
Ultimately, when writing idiomatic Python, you’re going to encounter most errors at runtime. The Python language prioritizes runtime flexibility over compile-time error detection. For this reason, it’s important to check that your assumptions are correct at runtime (see Item 81: “assert Internal Assumptions and raise Missed Expectations”) and verify the correctness of your code with automated tests (see Item 109: “Prefer Integration Tests over Unit Tests”).
Python defers nearly all error checking until runtime, including detection of problems that seem like they should be obvious during program startup.
Community projects like linters and static analysis tools can help catch some of the most common sources of errors before program execution.
Python’s pithy syntax makes it easy to write single-line expressions that implement a lot of logic. For example, say that I want to decode the query string from a website URL. Here each query string parameter represents an integer value:
from urllib.parse import parse_qs
my_values = parse_qs("red=5&blue=0&green=",
                     keep_blank_values=True)
print(repr(my_values))
>>>
{'red': ['5'], 'blue': ['0'], 'green': ['']}
Some query string parameters may have multiple values, some may have single values, some may be present but have blank values, and some may be missing entirely. Using the get method on the result dictionary will return different values in each circumstance:
print("Red: ", my_values.get("red"))
print("Green: ", my_values.get("green"))
print("Opacity:", my_values.get("opacity"))
>>>
Red: ['5']
Green: ['']
Opacity: None
It’d be nice if a default value of 0 were assigned when a parameter isn’t supplied or is blank. I might initially choose to do this with Boolean expressions because it feels like this logic doesn’t merit a whole if statement or helper function quite yet.
Python’s syntax makes this choice all too easy. The trick here is that the empty string, the empty list, and zero all evaluate to False implicitly. Thus, the expressions below will evaluate to the subexpression after the or operator when the first subexpression is False:
# For query string 'red=5&blue=0&green='
red = my_values.get("red", [""])[0] or 0
green = my_values.get("green", [""])[0] or 0
opacity = my_values.get("opacity", [""])[0] or 0
print(f"Red: {red!r}")
print(f"Green: {green!r}")
print(f"Opacity: {opacity!r}")
>>>
Red: '5'
Green: 0
Opacity: 0
The red case works because the key "red" is present in the my_values dictionary. The value retrieved by the get method is a list with one member: the string "5". This item is retrieved by accessing index 0 in the list. Then the or expression determines that the string is not empty and thus is the resulting value of that operation. Finally, the variable red is assigned to the value "5".
The green case works because the value in the my_values dictionary is a list with one member: an empty string. The item at index 0 in the list is retrieved. The or expression determines that the string is empty, and thus its return value should be the right-side argument to the operation, which is 0. Finally, the variable green is assigned to the value 0.
The opacity case works because the value in the my_values dictionary is missing altogether. The behavior of the get method is to return its second argument if the key doesn’t exist in the dictionary (see Item 26: “Prefer get over in and KeyError to Handle Missing Dictionary Keys”). The default value in this case is a list with one member: an empty string. Thus, when opacity isn’t found in the dictionary, this code does exactly the same thing as the green case.
The complex expression with get, [""], [0], and or is difficult to read, and yet it still doesn’t do everything I need. I also want to ensure that all the parameter values are converted to integers so I can immediately use them in mathematical expressions. To do that, I wrap each expression with the int built-in function to parse the string as an integer:
red = int(my_values.get("red", [""])[0] or 0)
This logic is now extremely hard to read. There’s so much visual noise. The code isn’t approachable. A new reader of the code would have to spend too much time picking apart the expression to figure out what it actually does. Even though it’s nice to keep things short, it’s not worth trying to fit this all on one line.
Although Python does support conditional expressions for inline if/else behavior, using them in this situation results in code that’s not much clearer than the Boolean operator example above (see Item 7: “Consider Conditional Expressions for Simple Inline Logic”):
red_str = my_values.get("red", [""])
red = int(red_str[0]) if red_str[0] else 0
Alternatively, I can use a full if statement over multiple lines to implement the same logic. Seeing all of the steps spread out like this makes the dense version seem even more complex:
green_str = my_values.get("green", [""])
if green_str[0]:
    green = int(green_str[0])
else:
    green = 0
Now that this logic is spread across multiple lines, it’s a bit harder to copy and paste for assigning other variables (e.g., red). If I want to reuse this functionality repeatedly—even just two or three times, as in this example—then writing a helper function is the way to go:
def get_first_int(values, key, default=0):
    found = values.get(key, [""])
    if found[0]:
        return int(found[0])
    return default
The calling code is much clearer than the complex expression using or and the two-line version using the conditional expression:
green = get_first_int(my_values, "green")
As soon as your expressions get complicated, it’s time to consider splitting them into smaller pieces—such as intermediate variables—and moving logic into helper functions. What you gain in readability always outweighs what brevity may have afforded you. Avoid letting Python’s pithy syntax for complex expressions get you into a mess like this. Follow the DRY principle: Don’t repeat yourself.
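To show the payoff of the helper, here is a self-contained sketch that applies it to every parameter from the earlier query string. The opacity_or_full variable and its default of 255 are assumptions added only to show the default parameter in use:

```python
from urllib.parse import parse_qs


def get_first_int(values, key, default=0):
    found = values.get(key, [""])
    if found[0]:
        return int(found[0])
    return default


my_values = parse_qs("red=5&blue=0&green=", keep_blank_values=True)

red = get_first_int(my_values, "red")  # Present with a value
blue = get_first_int(my_values, "blue")  # Present with the value "0"
green = get_first_int(my_values, "green")  # Present but blank
opacity = get_first_int(my_values, "opacity")  # Missing entirely

# Hypothetical: a caller-supplied default for a missing parameter.
opacity_or_full = get_first_int(my_values, "opacity", default=255)

print(red, blue, green, opacity, opacity_or_full)
```

This prints 5 0 0 0 255. Each call reads as a single clear step, and every result is already an integer.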
Python’s syntax makes it all too easy to write single-line expressions that are overly complicated and difficult to read.
Move complex expressions into helper functions, especially if you need to use the same logic repeatedly.
Python has a built-in tuple type that can be used to create immutable, ordered sequences of values (see Item 56: “Prefer dataclasses for Creating Immutable Objects” for similar data structures). A tuple can be empty, or it can contain a single item:
no_snack = ()
snack = ("chips",)
A tuple can also include multiple items, as in these key/value pairs from a dictionary:
snack_calories = {
    "chips": 140,
    "popcorn": 80,
    "nuts": 190,
}
items = list(snack_calories.items())
print(items)
>>>
[('chips', 140), ('popcorn', 80), ('nuts', 190)]
The members in tuples can be accessed through numerical indexes and slices, just like in a list:
item = ("Peanut butter", "Jelly")
first_item = item[0] # Index
first_half = item[:1] # Slice
print(first_item)
print(first_half)
>>>
Peanut butter
('Peanut butter',)
Once a tuple is created, you can’t modify it by assigning a new value to an index:
pair = ("Chocolate", "Peanut butter")
pair[0] = "Honey"
>>>
Traceback ...
TypeError: 'tuple' object does not support item assignment
Python also has syntax for unpacking, which allows for assigning multiple values in a single statement. The patterns that you specify in unpacking assignments look a lot like trying to mutate tuples—which isn’t allowed—but they actually work quite differently. For example, if you know that a tuple is a pair, instead of using indexes to access its values, you can assign it to a tuple of two variable names:
item = ("Peanut butter", "Jelly")
first, second = item # Unpacking
print(first, "and", second)
>>>
Peanut butter and Jelly
Unpacking has less visual noise than accessing the tuple’s indexes, and it often requires fewer lines of code. The same pattern-matching syntax of unpacking works when assigning to lists, sequences, and multiple levels of arbitrary iterables within iterables. I don’t recommend doing the following in your code, but it’s important to know that it’s possible and how it works:
favorite_snacks = {
    "salty": ("pretzels", 100),
    "sweet": ("cookies", 180),
    "veggie": ("carrots", 20),
}
((type1, (name1, cals1)),
 (type2, (name2, cals2)),
 (type3, (name3, cals3))) = favorite_snacks.items()
print(f"Favorite {type1} is {name1} with {cals1} calories")
print(f"Favorite {type2} is {name2} with {cals2} calories")
print(f"Favorite {type3} is {name3} with {cals3} calories")
>>>
Favorite salty is pretzels with 100 calories
Favorite sweet is cookies with 180 calories
Favorite veggie is carrots with 20 calories
Newcomers to Python may be surprised to learn that unpacking can even be used to swap values in place without the need to create temporary variables. Here I use typical syntax with indexes to swap the values between two positions in a list as part of an ascending-order sorting algorithm:
def bubble_sort(a):
    for _ in range(len(a)):
        for i in range(1, len(a)):
            if a[i] < a[i - 1]:
                temp = a[i]
                a[i] = a[i - 1]
                a[i - 1] = temp
names = ["pretzels", "carrots", "arugula", "bacon"]
bubble_sort(names)
print(names)
>>>
['arugula', 'bacon', 'carrots', 'pretzels']
However, with unpacking syntax, it’s possible to swap indexes in a single line:
def bubble_sort(a):
    for _ in range(len(a)):
        for i in range(1, len(a)):
            if a[i] < a[i - 1]:
                a[i - 1], a[i] = a[i], a[i - 1]  # Swap
names = ["pretzels", "carrots", "arugula", "bacon"]
bubble_sort(names)
print(names)
>>>
['arugula', 'bacon', 'carrots', 'pretzels']
The way this swap works is that the right side of the assignment (a[i], a[i-1]) is evaluated first, and its values are put into a new temporary, unnamed tuple (such as ("carrots", "pretzels") on the first iteration of the loops). Then the unpacking pattern from the left side of the assignment (a[i-1], a[i]) is used to receive that tuple value and assign it to the variable names a[i-1] and a[i], respectively. This replaces "pretzels" with "carrots" at index 0 and "carrots" with "pretzels" at index 1. Finally, the temporary unnamed tuple silently goes away.
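The steps described above can be spelled out by hand. This sketch only illustrates the evaluation order; the name temp_tuple is hypothetical, since the real tuple is never given a name:

```python
a = ["pretzels", "carrots"]
i = 1

# Step 1: the right side is evaluated into a temporary tuple.
temp_tuple = (a[i], a[i - 1])

# Step 2: the targets on the left side receive the tuple's values,
# in order from left to right.
a[i - 1] = temp_tuple[0]
a[i] = temp_tuple[1]

print(a)
```

This prints ['carrots', 'pretzels'], the same result that the one-line swap produces.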
Another valuable application of unpacking is in the target lists of for loops and similar constructs, such as comprehensions and generator expressions (see Item 40: “Use Comprehensions Instead of map and filter” and Item 44: “Consider Generator Expressions for Large List Comprehensions”). For example, here I iterate over a list of snacks without using unpacking:
snacks = [("bacon", 350), ("donut", 240), ("muffin", 190)]
for i in range(len(snacks)):
    item = snacks[i]
    name = item[0]
    calories = item[1]
    print(f"#{i+1}: {name} has {calories} calories")
>>>
#1: bacon has 350 calories
#2: donut has 240 calories
#3: muffin has 190 calories
This works, but it’s noisy. There are a lot of extra characters required in order to index into the various levels of the snacks structure. Now I achieve the same output by using unpacking along with the enumerate built-in function (see Item 17: “Prefer enumerate over range”):
for rank, (name, calories) in enumerate(snacks, 1):
    print(f"#{rank}: {name} has {calories} calories")
>>>
#1: bacon has 350 calories
#2: donut has 240 calories
#3: muffin has 190 calories
This is the Pythonic way to write this type of loop; it’s short and easy to understand. There’s usually no need to access anything by using indexes.
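The same target-list unpacking carries over to comprehensions. In this short sketch, the 300-calorie threshold and the names light_snacks and ranks are arbitrary choices made for illustration:

```python
snacks = [("bacon", 350), ("donut", 240), ("muffin", 190)]

# Unpack each pair directly in the comprehension's target list.
light_snacks = [name for name, calories in snacks if calories < 300]

# Nested unpacking works too; _ marks a value that goes unused.
ranks = {name: rank for rank, (name, _) in enumerate(snacks, 1)}

print(light_snacks)
print(ranks)
```

This prints ['donut', 'muffin'] followed by {'bacon': 1, 'donut': 2, 'muffin': 3}.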
Python provides additional unpacking functionality for list construction (see Item 16: “Prefer Catch-All Unpacking over Slicing”), function arguments (see Item 34: “Reduce Visual Noise with Variable Positional Arguments”), keyword arguments (see Item 35: “Provide Optional Behavior with Keyword Arguments”), multiple return values (see Item 31: “Return Dedicated Result Objects Instead of Requiring Function Callers to Unpack More Than Three Variables”), structural pattern matching (see Item 9: “Consider match for Destructuring in Flow Control; Avoid When if Statements Are Sufficient”), and more.
Using unpacking wisely will enable you to avoid indexing when possible, resulting in clearer and more Pythonic code. However, these features are not without pitfalls to consider (see Item 6: “Always Surround Single-Element Tuples with Parentheses”). Unpacking also doesn’t work in assignment expressions (see Item 8: “Prevent Repetition with Assignment Expressions”).
Python has special syntax called unpacking for assigning multiple values in a single statement.
Unpacking is generalized in Python and can be applied to any iterable, including many levels of iterables within iterables.
You can reduce visual noise and increase code clarity by using unpacking to avoid explicitly indexing into sequences.
In Python there are four kinds of tuple literal values. The first kind is a comma-separated list of items inside open and close parentheses:
first = (1, 2, 3)
The second kind is just like the first but with an optional trailing comma included, which allows for consistency when going across multiple lines and eases editing:
second = (1, 2, 3,)
second_wrapped = (
    1,
    2,
    3,  # Optional comma
)
The third kind is a comma-separated list of items without any surrounding parentheses:
third = 1, 2, 3
And finally, the fourth kind is just like the third but with an optional trailing comma:
fourth = 1, 2, 3,
Python treats all of these constructions as the same value:
assert first == second == third == fourth
However, there are also three special cases in creating tuples that need to be considered. The first case is the empty tuple, which is merely open and close parentheses:
empty = ()
The second special case is the form of single-element tuples: You must include a trailing comma. If you leave out the trailing comma, then what you have is a parenthesized expression instead of a tuple:
single_with = (1,)
single_without = (1)
assert single_with != single_without
assert single_with[0] == single_without
And the third special case is similar to the second one except without the parentheses:
single_parens = (1,)
single_no_parens = 1,
assert single_parens == single_no_parens
This third special case—a trailing comma with no parentheses—can cause unexpected problems that are hard to diagnose. Consider the following function call from an e-commerce website that has a difficult-to-spot bug:
to_refund = calculate_refund(
    get_order_value(user, order.id),
    get_tax(user.address, order.dest),
    adjust_discount(user) + 0.1),
You might expect that the return type is an integer, a float, or a decimal number containing the amount of money to be refunded to a customer. But, in fact, it’s a tuple!
print(type(to_refund))
>>>
<class 'tuple'>
The problem is the extraneous comma at the end of the final line. Removing the comma fixes the code:
to_refund2 = calculate_refund(
    get_order_value(user, order.id),
    get_tax(user.address, order.dest),
    adjust_discount(user) + 0.1)  # No trailing comma
print(type(to_refund2))
>>>
<class 'int'>
A comma character like this could be inserted into your code by accident, causing a change in behavior that’s hard to track down even upon close inspection. The errant separator could also remain after you edit the items in a tuple, list, set, or function call and forget to remove it. This happens more often than you might expect!
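The e-commerce functions above aren’t defined in this text, so here is a self-contained sketch of the same bug that you can run yourself. The calculate_refund stand-in and its arithmetic are hypothetical:

```python
def calculate_refund(value, tax, discount):
    # Hypothetical stand-in for the real refund calculation.
    return value + tax - discount


with_comma = calculate_refund(
    100,
    8,
    10),  # Trailing comma wraps the result in a tuple

without_comma = calculate_refund(
    100,
    8,
    10)  # No trailing comma

print(repr(with_comma))
print(repr(without_comma))
```

This prints (98,) and then 98. The two calls differ by a single character, yet the first assigns a tuple.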
Another problem with single-element tuples without surrounding parentheses is that they can’t be easily moved from assignments into expressions. For example, if I want to copy the single-element tuple 1, into a list, I have to surround it with parentheses. If I forget to do that, I end up passing more items or arguments to the surrounding form instead of a tuple:
value_a = 1, # No parentheses, right
list_b = [1,] # No parentheses, wrong
list_c = [(1,)] # Parentheses, right
print('A:', value_a)
print('B:', list_b)
print('C:', list_c)
>>>
A: (1,)
B: [1]
C: [(1,)]
A single-element tuple may also be on the left side of an assignment as part of the unpacking syntax (see Item 5: “Prefer Multiple-Assignment Unpacking over Indexing,” Item 31: “Return Dedicated Result Objects Instead of Requiring Function Callers to Unpack More Than Three Variables,” and Item 16: “Prefer Catch-All Unpacking over Slicing”). Surprisingly, all of these assignments are allowed, depending on the value returned, but they produce three different results:
def get_coupon_codes(user):
    ...
    return [['DEAL20']]
...
(a1,), = get_coupon_codes(user)
(a2,) = get_coupon_codes(user)
(a3), = get_coupon_codes(user)
(a4) = get_coupon_codes(user)
a5, = get_coupon_codes(user)
a6 = get_coupon_codes(user)
assert a1 not in (a2, a3, a4, a5, a6)
assert a2 == a3 == a5
assert a4 == a6
Sometimes automatic source code formatting tools (see Item 2: “Follow the PEP 8 Style Guide”) and static analysis tools (see Item 3: “Never Expect Python to Detect Errors at Compile Time”) can make the trailing comma problem more visible. But often it goes unnoticed until a program or test suite starts acting strange. The best way to avoid this situation is to always write single-element tuples with surrounding parentheses, whether they’re on the left or the right of an assignment.
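As a brief sketch of that advice, here the parentheses make both the unpacking and the tuple construction visible at a glance. This version of get_coupon_codes is a hypothetical stand-in that returns a single code:

```python
def get_coupon_codes():
    # Hypothetical stand-in that returns exactly one code.
    return ["DEAL20"]


(code,) = get_coupon_codes()  # Parentheses make the unpacking obvious
single = (code,)  # Parentheses make the tuple obvious

try:
    (only,) = ["DEAL20", "SAVE5"]  # Two items cannot fit one target
except ValueError as e:
    error_message = str(e)

print(code, single)
```

Explicit single-element unpacking also acts as a check on the result: if the function ever returns more or fewer than one item, the assignment raises a ValueError instead of silently continuing.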
Tuple literal values in Python may have optional surrounding parentheses and optional trailing commas, except in a few special cases.
A single-element tuple requires a trailing comma after the one item it contains and may have optional surrounding parentheses.
It’s all too easy to have an extraneous trailing comma at the end of an expression, which changes the meaning of the expression into a single-element tuple that breaks a program.
Python if statements are not expressions. The if block, elif blocks, and else block each can contain a number of additional statements. The whole group of blocks doesn’t evaluate to a single value that can be stored in a variable or passed as a function argument.
Python also supports conditional expressions that let you insert if/elif/else behavior nearly anywhere an expression is allowed. For example, here I use a conditional expression to assign a variable’s value depending on a Boolean test:
i = 3
x = "even" if i % 2 == 0 else "odd"
print(x)
>>>
odd
This expression structure seems convenient, especially for one-of-a-kind uses, and is reminiscent of the ternary operator you might know from C and other languages (e.g., condition ? true_value : false_value). For simple assignments like this, or even in function call argument lists (e.g., my_func(1 if x else 2)), conditional expressions can be a good choice for balancing brevity with flexibility in code.
It’s important to note one key detail about how conditional expressions in Python are different from ternary operators in other languages: In C, the test expression comes first; in Python, the expression to evaluate when the test expression is truthy comes first. For example, you might expect that the following code calls the fail function and raises an exception; instead, the fail function is never executed because the test condition is False:
def fail():
    raise Exception("Oops")
x = fail() if False else 20
print(x)
>>>
20
if clauses in Python comprehensions have similar syntax and behavior for filtering (see Item 40: “Use Comprehensions Instead of map and filter” and Item 44: “Consider Generator Expressions for Large List Comprehensions”). For example, here I use the if clause in a list comprehension to only include even values of x when computing the resulting list:
result = [x / 4 for x in range(10) if x % 2 == 0]
print(result)
>>>
[0.0, 0.5, 1.0, 1.5, 2.0]
The expression to evaluate (x / 4) comes before the if test expression (x % 2 == 0), just like in a conditional expression.
Before conditional expressions were available in Python, people would sometimes use Boolean logic to implement similar behavior (see Item 4: “Write Helper Functions Instead of Complex Expressions” for details). For example, the following expression is equivalent to the conditional expression above:
x = (i % 2 == 0 and "even") or "odd"
This form of logic is quite confusing because you need to know that and returns the first falsey value or the last truthy value, while or returns the first truthy value or the last falsey value (see Item 23: “Pass Iterators to any and all for Efficient Short-Circuiting Logic” for details).
Also, the approach of using Boolean operators doesn’t work if you want to return a falsey value as a result of a truthy condition (e.g., x = (i % 2 == 0 and []) or [1] always evaluates to [1]). It’s all non-obvious and error prone, which is part of why conditional expressions were added to the language in the first place.
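As a small sketch of that pitfall, here I compare the two forms side by side for an even value of i. The variable names via_boolean and via_conditional are only for illustration:

```python
i = 4  # An even number, so the test is truthy

# Boolean-operator form: the empty list is falsey, so `or` skips past it
via_boolean = (i % 2 == 0 and []) or [1]

# Conditional expression: the selected branch is returned as is
via_conditional = [] if i % 2 == 0 else [1]

print(via_boolean)
print(via_conditional)
```

The Boolean form produces [1] even though the test was truthy, while the conditional expression produces the intended empty list.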
Now consider the same logic as a four-line if statement instead of the earlier single-line example:
if i % 2 == 0:
    x = "even"
else:
    x = "odd"
Although this is longer, it can be better for a few reasons. First, if I later want to do more inside each of the condition branches, like printing debugging information, I can without structurally changing the code:
if i % 2 == 0:
    x = "even"
    print("It was even!")  # Added
else:
    x = "odd"
I can also insert additional branches with elif blocks in the same statement:
if i % 2 == 0:
    x = "even"
elif i % 3 == 0:  # Added
    x = "divisible by three"
else:
    x = "odd"
If I really need to achieve brevity and put this logic in a single expression, I can do that by moving it all into a helper function that I call inline:
def number_group(i):
    if i % 2 == 0:
        return "even"
    else:
        return "odd"

x = number_group(i)  # Short call
As an added benefit, the helper function can be reused in multiple places instead of being a one-off, as a conditional expression would be.
Whether you should use conditional expressions, full if statements, or if statements wrapped in helper functions is going to depend on the specific situation.
You should avoid conditional expressions when they must be split over multiple lines. For example, here the function calls I make are so long that the conditional expression must be line-wrapped with surrounding parentheses:
x = (my_long_function_call(1, 2, 3) if i % 2 == 0
     else my_other_long_function_call(4, 5, 6))
This is quite difficult to read. And if you apply an auto-formatter (see Item 2: “Follow the PEP 8 Style Guide”) to this code, the conditional expression will likely be rewritten to use more lines of code than a standard if/else statement anyway:
x = (
    my_long_function_call(1, 2, 3)
    if i % 2 == 0
    else my_other_long_function_call(4, 5, 6)
)
Another Python language feature to compare with conditional expressions is assignment expressions (see Item 8: “Prevent Repetition with Assignment Expressions”), which also allow statement-like behavior in expressions. The critical difference is that assignment expressions must be surrounded by parentheses when they’re used in an ambiguous context; conditional expressions do not require surrounding parentheses, and lacking parentheses can hurt readability.
For example, this if statement with an assignment expression in parentheses is permitted:
x = 2
y = 1
if x and (z := x > y):
    ...
But this if statement without wrapping parentheses is a syntax error:
if x and z := x > y:
    ...
>>>
Traceback ...
SyntaxError: cannot use assignment expressions with expression
With conditional expressions, parentheses aren’t required. It’s difficult to decipher what the original intent of the programmer was since both of these forms are allowed:
if x > y if z else w:  # Ambiguous
    ...
if x > (y if z else w):  # Clear
    ...
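The two forms are not interchangeable. A conditional expression binds more loosely than a comparison, so the unparenthesized form groups the comparison as the true result rather than nesting the conditional inside the comparison. Here is a minimal sketch with sample values chosen only for illustration:

```python
x, y, z, w = 2, 1, False, 5

unparenthesized = x > y if z else w    # Parsed as (x > y) if z else w
test_grouped = (x > y) if z else w     # Same meaning, made explicit
operand_grouped = x > (y if z else w)  # A different expression

print(unparenthesized)
print(test_grouped)
print(operand_grouped)
```

With these values, the first two expressions evaluate to 5 and the last evaluates to False, which is why explicit parentheses are worth the extra characters here.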
Assignment expressions also need surrounding parentheses when used inside a function call argument list:
z = dict(
    your_value=(y := 1),
)
Leaving out the parentheses is a syntax error:
w = dict(
    other_value=y := 1,
)
>>>
Traceback ...
SyntaxError: invalid syntax
Conditional expressions, in contrast, don’t require surrounding parentheses in this context, and the lack of parentheses can make code noisier and hard to read:
v = dict(
    my_value=1 if x else 3,
)
The bottom line: Use your judgment. In many situations, conditional expressions can be valuable and improve clarity. Sometimes they’re better with surrounding parentheses and sometimes not. Conditional expressions can all too easily be overused to write obfuscated code that’s difficult for new readers to understand. When in doubt, choose a normal if statement.
Conditional expressions in Python allow you to put an if statement nearly anywhere an expression would normally go.
The order of the test expression, true result expression, and false result expression in a conditional expression is different than the order with ternary operators in other languages.
Don’t use conditional expressions in places where they increase ambiguity or harm readability for new readers of the code.
Prefer standard if statements and helper functions when it’s unclear whether conditional expressions provide a compelling benefit.
An assignment expression—also known as the walrus operator—is a new syntax feature introduced in Python 3.8 to solve a long-standing problem with the language that can cause code duplication. Whereas normal assignment statements are written a = b and pronounced “a equals b,” these assignments are written a := b and pronounced “a walrus b” (because := looks like a pair of eyeballs and tusks).
Assignment expressions are useful because they enable you to assign variables in places where assignment statements are disallowed, such as in the test expression of an if statement. An assignment expression’s value evaluates to whatever was assigned to the identifier on the left side of the walrus operator.
For example, say that I have a basket of fresh fruit that I’m trying to manage for a juice bar. Here I define the contents of the basket:
fresh_fruit = {
    "apple": 10,
    "banana": 8,
    "lemon": 5,
}
When a customer comes to the counter to order some lemonade, I need to make sure there is at least one lemon in the basket to squeeze. Here I do this by retrieving the count of lemons and then using an if statement to check for a nonzero value:
def make_lemonade(count):
    ...

def out_of_stock():
    ...

count = fresh_fruit.get("lemon", 0)
if count:
    make_lemonade(count)
else:
    out_of_stock()
>>>
Making 5 lemons into lemonade
The problem with this seemingly simple code is that it’s noisier than it needs to be. The count variable is used only within the first block of the if statement. Defining count above the if statement causes it to appear to be more important than it really is—as if all code that follows, including the else block, will need to access the count variable, when that is not the case.
This pattern of fetching a value, checking to see if it’s truthy, and then using it is extremely common in Python. Many programmers try to work around the multiple references to count with a variety of tricks that hurt readability (see Item 4: “Write Helper Functions Instead of Complex Expressions” and Item 7: “Consider Conditional Expressions for Simple Inline Logic”). Luckily, assignment expressions were added to the language to streamline this type of code. Here I rewrite the example above using the walrus operator:
if count := fresh_fruit.get("lemon", 0):
    make_lemonade(count)
else:
    out_of_stock()
Although this is only one line shorter, it’s a lot more readable because it’s now clear that count is only relevant to the first block of the if statement. The assignment expression first assigns a value to the count variable and then evaluates that value in the context of the if statement to determine how to proceed with flow control. This two-step behavior—assign and then evaluate—is the fundamental nature of the walrus operator.
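The assign-and-then-evaluate behavior can be observed directly. In this small sketch, the "mango" key and the status variable are hypothetical, added only to exercise the falsey path. Note that count is still an ordinary variable in the enclosing scope, so it remains bound after the if statement finishes:

```python
fresh_fruit = {"apple": 10, "banana": 8, "lemon": 5}

# The parenthesized expression evaluates to the value that was assigned
print((count := fresh_fruit.get("lemon", 0)))

# A missing key assigns the default, 0, which is falsey
if count := fresh_fruit.get("mango", 0):
    status = "in stock"
else:
    status = "out of stock"

print(status)
print(count)  # The name is still bound after the if statement
```

This should print 5, then out of stock, then 0.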
Lemons are quite potent, so only one is needed for my lemonade recipe, which means a nonzero, truthy check is good enough. If a customer orders a cider, though, I need to make sure that I have at least four apples. Here I do this by fetching the count value from the fresh_fruit dictionary and then using a comparison in the if statement test expression:
def make_cider(count):
    ...

count = fresh_fruit.get("apple", 0)
if count >= 4:
    make_cider(count)
else:
    out_of_stock()
>>>
Making cider with 10 apples
This has the same problem as the lemonade example, where the assignment of count puts distracting emphasis on that variable. Here I improve the clarity of this code by also using the walrus operator:
if (count := fresh_fruit.get("apple", 0)) >= 4:
    make_cider(count)
else:
    out_of_stock()
This works as expected and makes the code one line shorter. It’s important to note how I needed to surround the assignment expression with parentheses to compare it with 4 in the if statement. In the lemonade example, no surrounding parentheses were required because the assignment expression stood on its own as a nonzero, truthy check; it wasn’t a subexpression of a larger expression. As with other expressions, you should avoid surrounding assignment expressions with parentheses when possible to reduce visual noise.
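Leaving out the parentheses in the cider example would not be a syntax error, but it would change the meaning. The walrus operator binds more loosely than comparison operators, so the comparison would run first and its Boolean result would be assigned. Here is a minimal sketch of the difference; the unparenthesized and parenthesized variable names are only for illustration:

```python
fresh_fruit = {"apple": 10, "banana": 8, "lemon": 5}

# Without parentheses, the comparison is evaluated first and its
# Boolean result is what gets assigned to count
if count := fresh_fruit.get("apple", 0) >= 4:
    unparenthesized = count

# With parentheses, count receives the number of apples
if (count := fresh_fruit.get("apple", 0)) >= 4:
    parenthesized = count

print(unparenthesized)
print(parenthesized)
```

The first form leaves count set to True instead of 10, which would silently pass the wrong value along to a function like make_cider.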
Another common variation of this repetitive pattern occurs when I need to assign a variable in the enclosing scope depending on some condition and then reference that variable shortly afterward in a function call. For example, say that a customer orders some banana smoothies. In order to make them, I need to have at least two bananas’ worth of slices, or else an OutOfBananas exception is raised. Here I implement this logic in a typical way:
def slice_bananas(count):
    ...

class OutOfBananas(Exception):
    pass

def make_smoothies(count):
    ...

pieces = 0
count = fresh_fruit.get("banana", 0)
if count >= 2:
    pieces = slice_bananas(count)

try:
    smoothies = make_smoothies(pieces)
except OutOfBananas:
    out_of_stock()
>>>
Slicing 8 bananas
Making smoothies with 32 banana slices
The other common way to do this is to put the pieces = 0 assignment in the else block:
count = fresh_fruit.get("banana", 0)
if count >= 2:
    pieces = slice_bananas(count)
else:
    pieces = 0  # Moved

try:
    smoothies = make_smoothies(pieces)
except OutOfBananas:
    out_of_stock()
This second approach can feel odd because it means that the pieces variable has two different locations—in each block of the if statement—where it can be initially defined. This split definition technically works because of Python’s scoping rules (see Item 33: “Know How Closures Interact with Variable Scope and nonlocal”), but it isn’t easy to read or discover, which is why many people prefer the construct above, where the pieces = 0 assignment is first.
The walrus operator can be used to shorten this example by one line of code. This small change removes any emphasis on the count variable. Now it’s clearer that pieces will be important beyond the if statement:
pieces = 0
if (count := fresh_fruit.get("banana", 0)) >= 2:  # Changed
    pieces = slice_bananas(count)

try:
    smoothies = make_smoothies(pieces)
except OutOfBananas:
    out_of_stock()
Using the walrus operator also improves the readability of splitting the definition of pieces across both parts of the if statement. It’s easier to trace the pieces variable when the count definition no longer precedes the if statement:
if (count := fresh_fruit.get("banana", 0)) >= 2:
    pieces = slice_bananas(count)
else:
    pieces = 0  # Moved

try:
    smoothies = make_smoothies(pieces)
except OutOfBananas:
    out_of_stock()
One frustration that programmers who are new to Python often have is the lack of a flexible switch/case statement. The general style for approximating this type of functionality is to have a deep nesting of multiple if, elif, and else blocks.
For example, imagine that I want to implement a system of precedence so that each customer automatically gets the best juice available and doesn’t have to order. Here I define logic to make it so banana smoothies are served first, followed by apple cider, and then finally lemonade:
count = fresh_fruit.get("banana", 0)
if count >= 2:
    pieces = slice_bananas(count)
    to_enjoy = make_smoothies(pieces)
else:
    count = fresh_fruit.get("apple", 0)
    if count >= 4:
        to_enjoy = make_cider(count)
    else:
        count = fresh_fruit.get("lemon", 0)
        if count:
            to_enjoy = make_lemonade(count)
        else:
            to_enjoy = "Nothing"
Ugly constructs like this are surprisingly common in Python code. Luckily, the walrus operator provides an elegant solution that can feel nearly as versatile as dedicated syntax for switch/case statements:
if (count := fresh_fruit.get("banana", 0)) >= 2:
    pieces = slice_bananas(count)
    to_enjoy = make_smoothies(pieces)
elif (count := fresh_fruit.get("apple", 0)) >= 4:
    to_enjoy = make_cider(count)
elif count := fresh_fruit.get("lemon", 0):
    to_enjoy = make_lemonade(count)
else:
    to_enjoy = "Nothing"
The version that uses assignment expressions is only five lines shorter than the original, but the improvement in readability is vast due to the reduction in nesting and indentation. If you ever see the previous ugly constructs emerge in your code, I suggest that you move them over to using the walrus operator if possible (see Item 9: “Consider match for Destructuring in Flow Control; Avoid When if Statements Are Sufficient” for another approach).
Another common frustration of new Python programmers is the lack of a do/while loop construct. For example, say that I want to bottle juice as new fruit is delivered until there’s no fruit remaining. Here I implement this logic with a while loop:
def pick_fruit():
    ...

def make_juice(fruit, count):
    ...

bottles = []
fresh_fruit = pick_fruit()
while fresh_fruit:
    for fruit, count in fresh_fruit.items():
        batch = make_juice(fruit, count)
        bottles.extend(batch)
    fresh_fruit = pick_fruit()
This is repetitive because it requires two separate fresh_fruit = pick_fruit() calls: one before the loop to set initial conditions and another at the end of the loop to replenish the dictionary of delivered fruit.
A strategy for improving code reuse in this situation is to use the loop-and-a-half idiom. This eliminates the redundant lines, but it also undermines the while loop’s contribution by making it a dumb infinite loop. Now all of the flow control of the loop depends on the conditional break statement:
bottles = []
while True:  # Loop
    fresh_fruit = pick_fruit()
    if not fresh_fruit:  # And a half
        break
    for fruit, count in fresh_fruit.items():
        batch = make_juice(fruit, count)
        bottles.extend(batch)
The walrus operator obviates the need for the loop-and-a-half idiom by allowing the fresh_fruit variable to be reassigned and then conditionally evaluated each time through the while loop. This solution is short and easy to read, and it should be the preferred approach in your code:
bottles = []
while fresh_fruit := pick_fruit():  # Changed
    for fruit, count in fresh_fruit.items():
        batch = make_juice(fruit, count)
        bottles.extend(batch)
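To run this loop end to end, the elided helpers need bodies. The following sketch fills them in with hypothetical stand-ins: a deliveries list that pick_fruit drains one dictionary at a time, and a make_juice that returns one bottle per piece of fruit. These implementations are assumptions for illustration only, not the definitions used elsewhere in this item:

```python
# Hypothetical stand-ins for the elided helpers, for illustration only
deliveries = [
    {"apple": 1, "banana": 3},
    {"lemon": 2},
]

def pick_fruit():
    # Return the next delivery, or an empty (falsey) dict when done
    return deliveries.pop(0) if deliveries else {}

def make_juice(fruit, count):
    # One bottle per piece of fruit
    return [f"{fruit} juice"] * count

bottles = []
while fresh_fruit := pick_fruit():
    for fruit, count in fresh_fruit.items():
        batch = make_juice(fruit, count)
        bottles.extend(batch)

print(len(bottles))
print(fresh_fruit)
```

The loop stops as soon as pick_fruit returns an empty dictionary, so this should report 6 bottles and leave fresh_fruit bound to {}.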
There are many other situations where assignment expressions can be used to eliminate redundancy (see Item 42: “Reduce Repetition in Comprehensions with Assignment Expressions” for an example). In general, when you find yourself repeating the same expression or assignment multiple times within a grouping of lines, it’s time to consider using assignment expressions in order to improve readability.
Assignment expressions use the walrus operator (:=) to both assign and evaluate variable names in a single expression, thus reducing repetition.
When an assignment expression is a subexpression of a larger expression, it must be surrounded with parentheses.
Although switch/case statements and do/while loops are not available in Python, their functionality can be emulated much more clearly by using assignment expressions.
Item 9: Consider match for Destructuring in Flow Control; Avoid When if Statements Are Sufficient
The match statement is a relatively new Python feature, introduced in version 3.10. With so many distinct capabilities, the learning curve for match is steep: It feels like another mini-language embedded within Python, similar to the unique ergonomics of comprehensions (see Item 40: “Use Comprehensions Instead of map and filter” and Item 44: “Consider Generator Expressions for Large List Comprehensions”). At first glance, match statements appear to provide Python with long-sought-after behavior that’s similar to switch statements from other programming languages (see Item 8: “Prevent Repetition with Assignment Expressions” for another approach).
For example, say that I’m writing a vehicle assistant program that reacts to a traffic light’s color. Here I use a simple Python if statement for this purpose:
def take_action(light):
    if light == "red":
        print("Stop")
    elif light == "yellow":
        print("Slow down")
    elif light == "green":
        print("Go!")
    else:
        raise RuntimeError
I can confirm that this function works as expected:
take_action("red")
take_action("yellow")
take_action("green")
>>>
Stop
Slow down
Go!
To use the match statement, I can create case clauses corresponding to each of the if, elif, and else conditions:
def take_match_action(light):
    match light:
        case "red":
            print("Stop")
        case "yellow":
            print("Slow down")
        case "green":
            print("Go!")
        case _:
            raise RuntimeError
Using a match statement seems better than using an if statement because I can remove repeated references to the light variable, and I can leave out the == operator for each conditional branch. However, this code still isn’t ideal because of how it uses string literals for everything. To fix this, what I’d normally do is create a constant at the module level for each light color and modify the code to use them, like this:
# Added these constants
RED = "red"
YELLOW = "yellow"
GREEN = "green"

def take_constant_action(light):
    match light:
        case RED:  # Changed
            print("Stop")
        case YELLOW:  # Changed
            print("Slow down")
        case GREEN:  # Changed
            print("Go!")
        case _:
            raise RuntimeError
>>>
Traceback ...
SyntaxError: name capture 'RED' makes remaining patterns unreachable
Unfortunately, this code has an error—and a cryptic one at that. The issue is that the match statement assumes that simple variable names that come after the case keyword are capture patterns. To demonstrate what this means, here I shorten the match statement to have only a single branch that should match RED:
def take_truncated_action(light):
    match light:
        case RED:
            print("Stop")
Now I call the function by passing GREEN. I expect the match light clause to be evaluated first and the light variable lookup in the current scope to resolve to "green". Next, I expect the case RED clause to be evaluated and the RED variable lookup to resolve to "red". These two values don’t match (i.e., "green" vs. "red"), and so I expect no output:
take_truncated_action(GREEN)
>>>
Stop
Surprisingly, the match statement executed the RED branch. Here I use print to figure out what’s happening:
def take_debug_action(light):
    match light:
        case RED:
            print(f"{RED=}, {light=}")

take_debug_action(GREEN)
>>>
RED='green', light='green'
The case clause didn’t look up the value of RED. Instead, it assigned the value of the light variable to RED! What the match statement is doing is similar to the behavior of unpacking (see Item 5: “Prefer Multiple-Assignment Unpacking over Indexing”). Instead of case RED translating to light == RED, Python determines if the multiple assignment (RED,) = (light,) would execute without an error, similar to this:
def take_unpacking_action(light):
    try:
        (RED,) = (light,)
    except TypeError:
        # Did not match
        ...
    else:
        # Matched
        print(f"{RED=}, {light=}")
The original syntax error above occurred because Python determines at compile time that the assignment (RED,) = (light,) will work for any value of light, and thus the subsequent clauses with case YELLOW and case GREEN are unreachable.
One work-around for this problem is to ensure that a . character is in the case clause’s variable reference. The presence of a dot operator causes Python to look up the attribute and do an equality test instead of treating the variable name as a capture pattern. For example, here I achieve the original intended behavior by using the enum built-in module and the dot operator to access each constant name:
import enum  # Added

class ColorEnum(enum.Enum):  # Added
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"

def take_enum_action(light):
    match light:
        case ColorEnum.RED:  # Changed
            print("Stop")
        case ColorEnum.YELLOW:  # Changed
            print("Slow down")
        case ColorEnum.GREEN:  # Changed
            print("Go!")
        case _:
            raise RuntimeError
Although this code now works as expected, it’s hard to see the benefits of the match version over the simpler if version in the take_action function above. The if version is 9 lines versus 10 lines with match. The if version repeats the light == prefix for each branch, but the match version repeats the ColorEnum. prefix for the constants. Superficially, it seems like a wash. Why did Python add match statements to the language if they’re not a compelling feature?
match Is for Destructuring
Destructuring is a programming language technique for extracting components from a complex nested data structure with minimal syntax. Python programmers use destructuring all the time without even thinking about it. For example, the multiple assignment of index, value to the return value of enumerate in this for loop is a form of destructuring (see Item 17: “Prefer enumerate over range”):
for index, value in enumerate("abc"):
    print(f"index {index} is {value}")
>>>
index 0 is a
index 1 is b
index 2 is c
Python has supported destructuring assignments for deeply nested tuples and lists for a long time (see Item 16: “Prefer Catch-All Unpacking over Slicing”). The match statement extends the language to also support this unpacking-like behavior for dictionaries and user-defined classes solely for the purpose of control flow. The structural pattern matching technique that match enables is especially valuable when your code needs to deal with heterogeneous object graphs and semi-structured data. (Similar idioms in functional-style programming are algebraic data types, sum types, and tagged unions.)
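As a quick reminder of what ordinary destructuring assignment can already do with nested tuples and lists, here is a minimal sketch. The record value and the variable names are sample data made up for this illustration:

```python
record = ("Ada", (1815, 12, 10), ["math", "computing"])

# The assignment target mirrors the nesting of the value being unpacked
name, (year, month, day), [first_field, *other_fields] = record

print(name, year, month, day)
print(first_field, other_fields)
```

Unlike a case clause, this assignment raises an exception when the shape of the value doesn't line up with the target, which is why it can't be used on its own for flow control.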
For example, say that I want to search a binary tree and determine if it contains a given value. I can represent the binary tree as a three-item tuple, where the first index is the value, the second index is the left (lower-value) child, and the third index is the right (higher-value) child. None in the second position or third position indicates the absence of a child node. In the case of a leaf node, I can just put the value inline instead of using another nested tuple. Here I define a nested tree containing five values (7, 9, 10, 11, 13):
my_tree = (10, (7, None, 9), (13, 11, None))
I can implement a recursive function to test whether a tree contains a value by using simple if statements. The tree argument might be None (for an absent child node) or a non-tuple (for a leaf node), so this code needs to ensure that those conditions are handled before unpacking the three-tuple node representation:
def contains(tree, value):
    if not isinstance(tree, tuple):
        return tree == value

    pivot, left, right = tree

    if value < pivot:
        return contains(left, value)
    elif value > pivot:
        return contains(right, value)
    else:
        return value == pivot
This function works as expected when the node values are comparable:
assert contains(my_tree, 9)
assert not contains(my_tree, 14)
Now I can rewrite this function using the match statement:
def contains_match(tree, value):
    match tree:
        case pivot, left, _ if value < pivot:
            return contains_match(left, value)
        case pivot, _, right if value > pivot:
            return contains_match(right, value)
        case (pivot, _, _) | pivot:
            return pivot == value
Using match, the call to isinstance is eliminated, the unpacking assignment can be avoided, the structure of the code (using case clauses) is more regular, the logic is simpler and easier to follow, and the function body is only seven lines of code instead of the nine lines required for the if version. This makes the match statement appear quite compelling (see Item 76: “Know How to Port Threaded I/O to asyncio” for another example).
In this function, the way that match works is each of the case clauses tries to extract the contents of the tree argument by using the given destructuring pattern. After Python determines that the structure matches, it evaluates any subsequent if clauses, which work similarly to if clauses in comprehensions. When the if clause, sometimes called a guard expression, evaluates to True, the indented statements for that case block are executed, and the rest are skipped. If no case clauses match the input value, then the match statement does nothing and falls through.
This code also uses the | pipe operator to add an or pattern to the final case branch. This allows the case clause to match either of the given patterns: (pivot, _, _) or pivot. As you might recall from the traffic light example above that tried to reference the RED constant, the second pattern (pivot) is a capture pattern that will match any value. Thus, when tree is not a tuple with the right structure, the code assumes that it’s a leaf value that should be tested for equality.
Now imagine that my requirements change yet again, and I want to use a class instead of a tuple to represent the nodes in my binary tree (see Item 29: “Compose Classes Instead of Deeply Nesting Dictionaries, Lists, and Tuples” for how to make that choice). Here I define a new class for nodes:
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
I can create another instance of the tree by using this class. Again, I specify leaf nodes simply by providing their value instead of wrapping them in an additional Node object:
obj_tree = Node(
    value=10,
    left=Node(value=7, right=9),
    right=Node(value=13, left=11),
)
Modifying the if statement version of the contains function to handle the Node class is straightforward:
def contains_class(tree, value):
    if not isinstance(tree, Node):
        return tree == value
    elif value < tree.value:
        return contains_class(tree.left, value)
    elif value > tree.value:
        return contains_class(tree.right, value)
    else:
        return tree.value == value
The resulting code is similarly complex to the earlier version that used three-tuples. In some ways the class makes the function better (e.g., accessing object attributes instead of unpacking), and in other ways it makes the function worse (e.g., repetitive tree. prefixes).
I can also adapt the match version of the contains function to use the Node class:
def contains_match_class(tree, value):
    match tree:
        case Node(value=pivot, left=left) if value < pivot:
            return contains_match_class(left, value)
        case Node(value=pivot, right=right) if value > pivot:
            return contains_match_class(right, value)
        case Node(value=pivot) | pivot:
            return pivot == value
The way this works is each case clause implicitly does an isinstance check to test whether the value of tree is a Node object. Then it extracts the object’s attributes using the capture patterns (pivot, left, right), similar to how tuple destructuring works. The capture variables can be used in guard expressions and case blocks to avoid more verbose attribute accesses (e.g., tree.left). The power and clarity provided by match works just as well with objects as it does with nested built-in data structures.
match also excels when the structure of data and its interpretation are decoupled. For example, a deserialized JSON object is merely a nesting of dictionaries, lists, strings, and numbers (see Item 54: “Consider Composing Functionality with Mix-in Classes” for an example). It lacks the clear encapsulation of responsibilities provided by an explicit class hierarchy (see Item 53: “Initialize Parent Classes with super”). But the way in which these basic JSON types are nested—the keys, values, and elements that are present at each level—gives the data semantic meaning that programs can interpret.
For example, imagine that I’m building billing software, and I need to deserialize customer records that are stored as JSON. Some of the records are for customers who are individuals, and other records are for customers that are businesses:
record1 = """{"customer": {"last": "Ross", "first": "Bob"}}"""
record2 = """{"customer": {"entity": "Steve's Painting Co."}}"""
I’d like to take these records and turn them into well-defined Python objects that I can use with my program’s data processing features, UI widgets, and so on (see Item 51: “Prefer dataclasses for Defining Lightweight Classes” for background):
from dataclasses import dataclass

@dataclass
class PersonCustomer:
    first_name: str
    last_name: str

@dataclass
class BusinessCustomer:
    company_name: str
I can use the match statement to interpret the structure and values within the JSON data and map it to the concrete PersonCustomer and BusinessCustomer classes. This uses the match statement’s unique syntax for destructuring dictionary literals with capture patterns:
import json

def deserialize(data):
    record = json.loads(data)
    match record:
        case {"customer": {"last": last_name,
                           "first": first_name}}:
            return PersonCustomer(first_name, last_name)
        case {"customer": {"entity": company_name}}:
            return BusinessCustomer(company_name)
        case _:
            raise ValueError("Unknown record type")
This function works as expected on the records defined above and produces the objects I need:
print("Record1:", deserialize(record1))
print("Record2:", deserialize(record2))
>>>
Record1: PersonCustomer(first_name='Bob', last_name='Ross')
Record2: BusinessCustomer(company_name="Steve's Painting Co.")
These examples give you merely a small taste of what’s possible with match statements. There’s also support for as patterns, positional constructor patterns (with __match_args__ customization), exhaustiveness checking with type annotations (see Item 124: “Consider Static Analysis via typing to Obviate Bugs”), and more. Given the intricacies, it’s best to refer to the official tutorial (https://peps.python.org/pep-0636/) to determine how to leverage match for your specific use case.
Although you can use match statements to replace simple if statements, doing so is error prone. The structural nature of capture patterns in case clauses is unintuitive for Python programmers who aren’t already familiar with the gotchas of match.
match statements provide a concise syntax for combining isinstance checks and destructuring behaviors with flow control. They’re especially useful when processing heterogeneous object graphs and interpreting the semantic meaning of semi-structured data.
case patterns can be used effectively with built-in data structures (e.g., lists, tuples, dictionaries) and user-defined classes, but each type has unique semantics that aren’t immediately obvious.