7

Classes and Interfaces

As an object-oriented programming language, Python supports a full range of features, such as inheritance, polymorphism, and encapsulation. Getting things done in Python often requires writing new classes and defining how they interact through their interfaces and relationships.

Classes and inheritance make it easy to express a Python program’s intended behaviors with objects. They allow you to improve and expand functionality over time. They provide flexibility in an environment of changing requirements. Knowing how to use classes and inheritance well enables you to write maintainable code.

Python is also a multi-paradigm language that encourages functional-style programming. Function objects are first class, meaning they can be passed around like normal variables. Python also allows you to use a mix of object-oriented-style and functional-style features in the same program, which can be even more powerful than each style on its own.

Item 48: Accept Functions Instead of Classes for Simple Interfaces

Many of Python’s built-in APIs allow you to customize behavior by passing in a function. These hooks are used by APIs to call back your code while they execute. For example, the list type’s sort method takes an optional key argument that’s used to determine each index’s value for sorting (see Item 100: “Sort by Complex Criteria Using the key Parameter” for details). Here I sort a list of names based on their lengths by providing the len built-in function as the key hook:

names = ["Socrates", "Archimedes", "Plato", "Aristotle"]
names.sort(key=len)
print(names)

>>>
['Plato', 'Socrates', 'Aristotle', 'Archimedes']

In other languages, you might expect hooks to be defined by an abstract class. In Python, many hooks are just stateless functions with well-documented arguments and return values. Functions are ideal for hooks because they are easier to describe and simpler to implement than classes. Functions work as hooks because Python has first-class functions: Functions and methods can be passed around and referenced like any other value in the language.

For example, say that I want to customize the behavior of the defaultdict class (see Item 27: “Prefer defaultdict over setdefault to Handle Missing Items in Internal State” for background). This data structure allows you to supply a function that will be called with no arguments each time a missing key is accessed. The function must return the default value that the missing key should have in the dictionary. Here I define a hook that logs each time a key is missing and returns 0 for the default value:

def log_missing():
    print("Key added")
    return 0

Given an initial dictionary and a set of desired increments, I can cause the log_missing function to run and print twice (for "red" and "orange"):

from collections import defaultdict

current = {"green": 12, "blue": 3}
increments = [
    ("red", 5),
    ("blue", 17),
    ("orange", 9),
]
result = defaultdict(log_missing, current)
print("Before:", dict(result))
for key, amount in increments:
    result[key] += amount
print("After: ", dict(result))

>>>
Before: {'green': 12, 'blue': 3}
Key added
Key added
After:  {'green': 12, 'blue': 20, 'red': 5, 'orange': 9}

Enabling functions like log_missing to be supplied helps APIs separate side effects from deterministic behavior. For example, say that I now want the default value hook passed to defaultdict to count the total number of missing keys. One way to achieve this is by using a stateful closure (see Item 33: “Know How Closures Interact with Variable Scope and nonlocal” for details). Here I define a helper function that uses such a closure as the default value hook:

def increment_with_report(current, increments):
    added_count = 0

    def missing():
        nonlocal added_count  # Stateful closure
        added_count += 1
        return 0

    result = defaultdict(missing, current)
    for key, amount in increments:
        result[key] += amount

    return result, added_count

Running this function produces the expected result (2), even though the defaultdict instance has no idea that the missing hook maintains state in the added_count closure variable:

result, count = increment_with_report(current, increments)
assert count == 2

The problem with defining a closure for stateful hooks is that it’s harder to read than the stateless function example. Another approach is to define a small class that encapsulates the state you want to track:

class CountMissing:
    def __init__(self):
        self.added = 0

    def missing(self):
        self.added += 1
        return 0

In other languages, you might expect that now defaultdict would have to be modified to accommodate the interface of CountMissing. But in Python, thanks to first-class functions, you can reference the CountMissing.missing method directly on an object and pass it to defaultdict as the default value hook. It’s trivial to have an object instance’s method satisfy a function interface:

counter = CountMissing()
result = defaultdict(counter.missing, current)  # Method ref
for key, amount in increments:
    result[key] += amount
assert counter.added == 2

Using a helper class like this to provide the behavior of a stateful closure is clearer than using the increment_with_report function, as above. However, in isolation, it’s still not immediately obvious what the purpose of the CountMissing class is. Who constructs a CountMissing object? Who calls the missing method? Will the class need other public methods to be added in the future? Until you see its usage with defaultdict, the class is a bit of a mystery.

To clarify this situation, Python classes can define the __call__ special method. __call__ allows an object to be called just like a function. It also causes the callable built-in function to return True for such an instance, just like a normal function or method. All objects that can be executed in this manner are referred to as callables:

class BetterCountMissing:
    def __init__(self):
        self.added = 0

    def __call__(self):
        self.added += 1
        return 0

counter = BetterCountMissing()
assert counter() == 0
assert callable(counter)

Here I use a BetterCountMissing instance as the default value hook for a defaultdict to track the number of missing keys that were added:

counter = BetterCountMissing()
result = defaultdict(counter, current)  # Relies on __call__
for key, amount in increments:
    result[key] += amount
assert counter.added == 2

This is much clearer than the CountMissing.missing example. The __call__ method indicates that a class’s instances will be used somewhere a function argument would also be suitable (like API hooks). It directs new readers of the code to the entry point that’s responsible for the class’s primary behavior. It provides a strong hint that the goal of the class is to act as a stateful closure.

Best of all, defaultdict still has no view into what’s going on when you use __call__. All that defaultdict requires is a callable for the default value hook. Python provides many different ways to satisfy a simple function interface, and you can choose the one that works best for what you need to accomplish.
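To make this concrete, here is a small sketch (the `Counter` class is a stand-in for `BetterCountMissing` above) showing three interchangeable ways to satisfy the same default value hook interface:

```python
from collections import defaultdict

class Counter:
    def __init__(self):
        self.added = 0

    def __call__(self):
        self.added += 1
        return 0

current = {"green": 12, "blue": 3}

# A lambda, a built-in type, and a callable instance all
# satisfy the hook interface expected by defaultdict, because
# each one can be called with no arguments to produce a default.
hooks = [
    lambda: 0,  # Anonymous function
    int,        # Built-in type; int() returns 0
    Counter(),  # Instance with __call__
]
for hook in hooks:
    result = defaultdict(hook, current)
    result["red"] += 5
    assert result["red"] == 5
```

Which option to pick depends on whether the hook needs state: a plain function or lambda suffices for stateless defaults, while a callable instance is clearer when state must be tracked.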

Things to Remember

  • Instead of defining and instantiating classes, you can often simply use functions for simple interfaces between components in Python.

  • References to functions and methods in Python are first class, meaning they can be used in expressions (like any other type).

  • The __call__ special method enables instances of a class to be called like plain Python functions and pass callable checks.

  • When you need a function to maintain state, consider defining a class that provides the __call__ method instead of implementing a stateful closure function.


Item 49: Prefer Object-Oriented Polymorphism over Functions with isinstance Checks

Imagine that I want to create a pocket calculator that receives simple formulas as input and computes the solution. To do this, I would normally tokenize and parse the provided text and create an abstract syntax tree (AST) to represent the operations to perform, similar to what the Python compiler does when it’s loading programs. For example, here I define three AST classes for the addition and multiplication of two integers:

class Integer:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

class Multiply:
    def __init__(self, left, right):
        self.left = left
        self.right = right

For a basic equation like 2 + 9, I can create the AST (bypassing tokenization and parsing) by directly instantiating objects:

tree = Add(
    Integer(2),
    Integer(9),
)

A recursive function can be used to evaluate an AST like this. For each type of operation it might encounter, I need to add another branch to one compound if statement. I can use the isinstance built-in function to direct control flow based on the type of AST object being evaluated (see Item 9: “Consider match for Destructuring in Flow Control; Avoid When if Statements Are Sufficient” for another way to do this):

def evaluate(node):
    if isinstance(node, Integer):
        return node.value
    elif isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    elif isinstance(node, Multiply):
        return evaluate(node.left) * evaluate(node.right)
    else:
        raise NotImplementedError

And indeed, this approach to interpreting the AST—often called tree walking—works as expected:

print(evaluate(tree))

>>>
11

By calling the same evaluate function for every type of node, the system can support arbitrary nesting without additional complexity. For example, here I define an AST for the equation (3 + 5) * (4 + 7) and evaluate it without having to make any other code changes:

tree = Multiply(
    Add(Integer(3), Integer(5)),
    Add(Integer(4), Integer(7)),
)
print(evaluate(tree))

>>>
88

Now, imagine that the number of nodes I need to consider in the tree is significantly more than three. I need to handle subtraction, division, logarithms, and on and on. Mathematics has an enormous surface area, and there could be hundreds of node types. If I need to do everything in this one evaluate function, it’s going to get extremely long. Even if I add helper functions and call them inside the elif blocks, the overall if compound statement would be huge. There must be a better way.

One common approach to solving this problem is object-oriented programming (OOP). Instead of having one function that does everything for all types of objects, you encapsulate the functionality for each type right next to its data (in methods). Then you rely on polymorphism to dynamically dispatch method calls to the right subclass implementation at runtime. This has the same effect as the if compound statement and isinstance checks but does it in a way that doesn’t require defining everything in one gigantic function.

For this pocket calculator example, using OOP would begin with defining a superclass (see Item 53: “Initialize Parent Classes with super” for background) with the methods that should be common among all objects in the AST:

class Node:
    def evaluate(self):
        raise NotImplementedError

Each type of node would need to implement the evaluate method to compute the result that corresponds to the data contained within the object. Here I define this method for integers:

class IntegerNode(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

And here’s the implementation of evaluate for addition and multiplication operations:

class AddNode(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        left = self.left.evaluate()
        right = self.right.evaluate()
        return left + right

class MultiplyNode(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        left = self.left.evaluate()
        right = self.right.evaluate()
        return left * right

Creating objects representing the AST is straightforward, as before, but this time I can directly call the evaluate method on the tree object instead of having to use a separate helper function:

tree = MultiplyNode(
    AddNode(IntegerNode(3), IntegerNode(5)),
    AddNode(IntegerNode(4), IntegerNode(7)),
)
print(tree.evaluate())

>>>
88

The way this works is that the call to tree.evaluate will call the MultiplyNode.evaluate method with the tree instance. Then the AddNode.evaluate method will be called for the left node, which in turn calls IntegerNode.evaluate for values 3 and 5. After that, the AddNode.evaluate method is called for the right node, which similarly calls IntegerNode.evaluate for values 4 and 7. Critically, all of the decisions on which evaluate method implementation to call for each Node subclass occur at runtime. This is the key benefit of object-oriented polymorphism.

Later, I might need to extend the pocket calculator with more features. For example, I could add the ability for the calculator to print the formula that was inputted but with pretty formatting that’s consistent and easy to read. With OOP, I’d accomplish this by adding another abstract method to the superclass and implement it in each of the subclasses. Here I add the pretty method for this new purpose:

class NodeAlt:
    def evaluate(self):
        raise NotImplementedError

    def pretty(self):
        raise NotImplementedError

The implementation for integers is very simple:

class IntegerNodeAlt(NodeAlt):
    ...

    def pretty(self):
        return repr(self.value)

The add and multiply operations descend into their left and right branches to produce a formatted result:

class AddNodeAlt(NodeAlt):
    ...

    def pretty(self):
        left_str = self.left.pretty()
        right_str = self.right.pretty()
        return f"({left_str} + {right_str})"

class MultiplyNodeAlt(NodeAlt):
    ...

    def pretty(self):
        left_str = self.left.pretty()
        right_str = self.right.pretty()
        return f"({left_str} * {right_str})"

Much as with the evaluate method above, I can call the pretty method on the tree root in order to format the whole AST as a string:

tree = MultiplyNodeAlt(
    AddNodeAlt(IntegerNodeAlt(3), IntegerNodeAlt(5)),
    AddNodeAlt(IntegerNodeAlt(4), IntegerNodeAlt(7)),
)
print(tree.pretty())

>>>
((3 + 5) * (4 + 7))

With OOP you can add more and more AST methods and subclasses as the needs of your program grow. It’s not necessary to maintain one enormous function with dozens of isinstance checks. Each of the types can have its own self-contained implementation, which makes the code relatively easy to organize, extend, and test. Python also provides many additional features to make polymorphic code more useful (see Item 52: “Use @classmethod Polymorphism to Construct Objects Generically” and Item 57: “Inherit from collections.abc Classes for Custom Container Types”).
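To make the "easy to test" claim concrete, here is a minimal sketch (the `StubNode` helper is hypothetical, not part of the original example) showing how one node class can be verified in isolation using plain assertions:

```python
class Node:
    def evaluate(self):
        raise NotImplementedError

class IntegerNode(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

class AddNode(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()

# A stub stands in for any subtree, so AddNode's logic can be
# verified without constructing a full AST or touching the
# other node types.
class StubNode(Node):
    def __init__(self, result):
        self.result = result

    def evaluate(self):
        return self.result

node = AddNode(StubNode(3), StubNode(4))
assert node.evaluate() == 7
```

Because each subclass only depends on the `evaluate` interface of its children, any conforming object (including a test stub) can be substituted freely.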

However, it’s also important to understand that OOP has serious limitations when solving certain types of problems, especially in large programs (see Item 50: “Consider functools.singledispatch for Functional-Style Programming Instead of Object-Oriented Polymorphism”).

Things to Remember

  • Python programs can use the isinstance built-in function to alter behavior at runtime based on the type of objects.

  • Polymorphism is an object-oriented programming (OOP) technique for dispatching a method call to the most specific subclass implementation at runtime.

  • Code that uses polymorphism among many classes instead of isinstance checks can be much easier to read, maintain, extend, and test.

Item 50: Consider functools.singledispatch for Functional-Style Programming Instead of Object-Oriented Polymorphism

In the pocket calculator example from Item 49: “Prefer Object-Oriented Polymorphism over Functions with isinstance Checks,” I showed how object-oriented programming (OOP) can make it easier to vary behavior based on the type of an object. At the end, I had a hierarchy of classes with different method implementations, like this:

class NodeAlt:
    def evaluate(self):
        raise NotImplementedError

    def pretty(self):
        raise NotImplementedError

class IntegerNodeAlt(NodeAlt):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

    def pretty(self):
        return repr(self.value)

class AddNodeAlt(NodeAlt):
    ...

class MultiplyNodeAlt(NodeAlt):
    ...

This made it possible to call the recursive methods evaluate and pretty on the root of an abstract syntax tree (AST) that represents the calculation to perform:

tree = MultiplyNodeAlt(
    AddNodeAlt(IntegerNodeAlt(3), IntegerNodeAlt(5)),
    AddNodeAlt(IntegerNodeAlt(4), IntegerNodeAlt(7)),
)
print(tree.evaluate())
print(tree.pretty())

>>>
88
((3 + 5) * (4 + 7))

Now, imagine that instead of there being 2 methods required by the superclass, there were 25 of them: One method might simplify an equation; another would check for undefined variables; yet another could calculate the derivative; still another would produce LaTeX syntax; and so on. In the typical approach to OOP, I would add 25 new methods to each class that contains a node type’s data. This would make the class definition very large, especially considering all of the helper functions and supporting data structures that might be required. With so much code, I’d want to split these node class definitions across multiple modules (e.g., one file per node type) to improve code organization:

class NodeAlt2:
    def evaluate(self):
        ...

    def pretty(self):
        ...

    def solve(self):
        ...

    def error_check(self):
        ...

    def derivative(self):
        ...

    # And 20 more methods...

Unfortunately, this type of module-per-class code organization can cause serious maintainability problems in production systems. The critical issue is that all of the 25 new methods might actually be quite different from one another, even though they are somehow related to a pocket calculator. When you’re editing and debugging code, the view you need is within each of the larger, independent systems (e.g., solving, error checking), but with OOP these systems must be implemented across all of the classes. That means that in practice, for this hypothetical example, the OOP approach could cause you to jump between 25 different files in order to accomplish simple programming tasks. The code appears to be organized along the wrong axis. You’ll almost never need to look at two independent systems for a single class at the same time, but that’s how the source files are laid out.

Making matters worse, OOP code organization also conflates dependencies. For this example, the LaTeX-generating methods might need to import a native library for handling that format, the formula-solving methods might need a heavyweight symbolic math module, and so on. If your code organization is class-centric, that means each module defining a class needs to import all of the dependencies for all of the methods (see Item 98: “Lazy-Load Modules with Dynamic Imports to Reduce Startup Time” for background). This prevents you from creating self-contained, well-factored systems of functionality, thus hampering extensibility, refactoring, and testability. Fortunately, OOP is not the only option.

Single dispatch is a functional-style programming technique in which a program decides which version of a function to call based on the type of one of its arguments. It behaves similarly to polymorphism, but it can also avoid many of OOP’s pitfalls. You can use single dispatch to essentially add methods to a class without modifying it. Python provides the singledispatch decorator in the functools built-in module for this purpose.

To use singledispatch, first I need to define the function that will do the dispatching. Here I create a function for custom object printing:

import functools

@functools.singledispatch
def my_print(value):
    raise NotImplementedError

This initial version of the function will be called as a last resort if no better option is found for the type of the first argument (value). I can specialize the implementation for a particular type by using the dispatching function’s register method as a decorator. Here I add implementations for the int and float built-in types:

@my_print.register(int)
def _(value):
    print("Integer!", value)

@my_print.register(float)
def _(value):
    print("Float!", value)

These functions use the underscore (_) to indicate that their names don’t matter and they won’t be called directly; all dispatching will occur through the my_print function. Here I show this working for the types I’ve registered so far:

my_print(20)
my_print(1.23)

>>>
Integer! 20
Float! 1.23

Going back to the pocket calculator example, I can use singledispatch to implement the evaluate functionality without OOP. First, I define a new dispatching function:

@functools.singledispatch
def my_evaluate(node):
    raise NotImplementedError

Then I add a type-specific implementation for the simple integer data structure:

class Integer:
    def __init__(self, value):
        self.value = value

@my_evaluate.register(Integer)
def _(node):
    return node.value

And I provide similar implementations for the simple operation data structures. Note how none of the data structures define any additional methods:

class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

@my_evaluate.register(Add)
def _(node):
    left = my_evaluate(node.left)
    right = my_evaluate(node.right)
    return left + right

class Multiply:
    def __init__(self, left, right):
        self.left = left
        self.right = right

@my_evaluate.register(Multiply)
def _(node):
    left = my_evaluate(node.left)
    right = my_evaluate(node.right)
    return left * right

These functions work as expected when I call my_evaluate:

tree = Multiply(
    Add(Integer(3), Integer(5)),
    Add(Integer(4), Integer(7)),
)
result = my_evaluate(tree)
print(result)

>>>
88

Now, say that I want to implement equation pretty printing, as in Item 49: “Prefer Object-Oriented Polymorphism over Functions with isinstance Checks,” but without using OOP. I can do this simply by defining another singledispatch function and decorating implementation functions for each type I want to handle:

@functools.singledispatch
def my_pretty(node):
    raise NotImplementedError

@my_pretty.register(Integer)
def _(node):
    return repr(node.value)

@my_pretty.register(Add)
def _(node):
    left_str = my_pretty(node.left)
    right_str = my_pretty(node.right)
    return f"({left_str} + {right_str})"

@my_pretty.register(Multiply)
def _(node):
    left_str = my_pretty(node.left)
    right_str = my_pretty(node.right)
    return f"({left_str} * {right_str})"

And again, this works as expected without using OOP:

print(my_pretty(tree))

>>>
((3 + 5) * (4 + 7))

If I create a new type that is a subclass of a type I’ve already registered, it will immediately work with my_pretty without additional code changes, because singledispatch resolves registered implementations by following the class’s method resolution order, just like inherited method lookup (see Item 53: “Initialize Parent Classes with super”). For example, here I add a subclass of the Integer type and show that it can pretty print:

class PositiveInteger(Integer):
    ...

print(my_pretty(PositiveInteger(1234)))

>>>
1234

The difficulty with singledispatch arises when I create a new class. For example, calling the my_pretty function with a new type of object will raise a NotImplementedError exception because there’s no implementation registered to handle the new type:

class Float:
    def __init__(self, value):
        self.value = value

print(my_pretty(Float(5.678)))

>>>
Traceback ...
NotImplementedError

This is the fundamental trade-off in using function-style single dispatch: When you add a new type to the code, you need to add a corresponding implementation for every dispatch function you want to support. That might require modifying many or all of the independent modules in your program. In contrast, with object-oriented polymorphism, new classes might seem easier to add—just implement the required methods—but adding a new method to the system requires updating every class. Although there’s some friction with either approach, in my view, the burden with single dispatch is lower, and the benefits are numerous.
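Fixing the Float error above illustrates this trade-off directly: the new type must be registered with each dispatch function separately. Here is a self-contained sketch (the dispatchers are redefined so the snippet stands alone):

```python
import functools

@functools.singledispatch
def my_evaluate(node):
    raise NotImplementedError

@functools.singledispatch
def my_pretty(node):
    raise NotImplementedError

class Float:
    def __init__(self, value):
        self.value = value

# Adding a new type means registering an implementation with
# every dispatch function that should support it.
@my_evaluate.register(Float)
def _(node):
    return node.value

@my_pretty.register(Float)
def _(node):
    return repr(node.value)

assert my_evaluate(Float(5.678)) == 5.678
assert my_pretty(Float(5.678)) == "5.678"
```

The registrations can live next to the dispatch functions in their own modules, keeping the Float class itself free of behavior.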

With single dispatch, you can have thousands of data structures and hundreds of behaviors in the program without polluting the class definitions with methods. This allows you to create independent systems of behavior in completely separate modules with no interdependencies on each other and a narrow set of external dependencies. Simple data structures can live at the bottom of your program’s dependency tree and be shared across the whole codebase without high coupling. Using the single dispatch approach like this organizes the code on the correct axis: All of the related behaviors are together instead of spread across countless modules where OOP classes reside. Ultimately, this makes it easier to maintain, debug, extend, refactor, and test your code. That said, OOP can still be a good choice when your classes share common functionality and the larger systems are more interconnected.

The choice between these code structures comes down to how independent the program’s components are and how much common data or behavior they share. You can also mix OOP and single dispatch together to benefit from the best attributes of both styles. For example, you could add utility methods to the simple classes that are common across all of the independent systems.
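A minimal sketch of such a hybrid might look like this (the `is_zero` utility method is hypothetical, added for illustration):

```python
import functools

# Shared, widely-used behavior lives on the class itself...
class Integer:
    def __init__(self, value):
        self.value = value

    def is_zero(self):  # Hypothetical common utility method
        return self.value == 0

# ...while system-specific behavior stays in dispatch
# functions that could live in completely separate modules.
@functools.singledispatch
def my_pretty(node):
    raise NotImplementedError

@my_pretty.register(Integer)
def _(node):
    return repr(node.value)

node = Integer(0)
assert node.is_zero()
assert my_pretty(node) == "0"
```

The class stays small and dependency-free, while each independent system registers only the behavior it owns.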

Things to Remember

  • Object-oriented programming leads to class-centric code organization, which can make it difficult to build and maintain large programs because behavior is spread out across many modules.

  • Single dispatch is an alternative approach for achieving dynamic dispatch using functions instead of method polymorphism, making it possible to bring related functionality closer together in source code.

  • Python’s functools built-in module has a singledispatch decorator that can be used to implement single dispatch behaviors.

  • Programs with highly independent systems that operate on the same underlying data might benefit from the functional style of single dispatch instead of OOP.

Item 51: Prefer dataclasses for Defining Lightweight Classes

It’s easy to start writing Python programs by using built-in data types (such as strings and dictionaries) and defining functions that interact with them. At some point, your code will get complex enough that creating your own types of objects to contain data and encapsulate behavior is warranted (see Item 29: “Compose Classes Instead of Deeply Nesting Dictionaries, Lists, and Tuples” for an example).

However, Python’s vast set of object-oriented capabilities can be overwhelming, especially for beginners. To help make these features more approachable, Python provides a variety of built-in modules (see Item 57: “Inherit from collections.abc Classes for Custom Container Types”), and there are many community-built packages as well (such as attrs and pydantic—see Item 116: “Know Where to Find Community-Built Modules”).

One especially valuable built-in module is dataclasses, which you can use to greatly reduce the amount of repetitive code in class definitions. The cost of using the module is a small performance overhead at import time due to how its implementation uses exec (see Item 91: “Avoid exec and eval Unless You’re Building a Developer Tool”). But it’s well worth it, especially for classes with few or no methods that exist primarily to store data in attributes.

The potential benefits of dataclasses become most clear when you consider how much effort it takes to build each of its features yourself (see Item 56: “Prefer dataclasses for Creating Immutable Objects” for more examples). Understanding how these common object-oriented idioms work under the hood is also important so you can migrate your code away from dataclasses when you inevitably need more flexibility or customization.

Avoiding __init__ Boilerplate

The first thing to do with objects is to create them. The __init__ special method is called to construct an object when a class name is invoked like a function call. For example, here I define a simple class to store an RGB (red, green, blue) color value:

class RGB:
    def __init__(self, red, green, blue):
        self.red = red
        self.green = green
        self.blue = blue

This code is verbose, repeating the name of each attribute three times. It’s also error prone because there are many opportunities to insert typos or accidentally assign an attribute to the wrong argument of __init__:

class BadRGB:
    def __init__(self, green, red, blue):  # Bad: Order swapped
        self.red = red
        self.green = green
        self.bloe = blue                   # Bad: Typo

The dataclasses module includes a class decorator (see Item 66: “Prefer Class Decorators over Metaclasses for Composable Class Extensions”) that provides better default behaviors for simple classes like this. Here I define a new class similar to the one above, but I wrap it in the @dataclass decorator:

from dataclasses import dataclass

@dataclass
class DataclassRGB:
    red: int
    green: int
    blue: int

To use the dataclass decorator, I list each attribute of the object in the class body with its corresponding type hint (see Item 124: “Consider Static Analysis via typing to Obviate Bugs”). I only have to identify each attribute a single time, so I avoid the risk of typos. If I reorder the attributes, I only need to update the callers instead of making sure the class is consistent within itself.

With these type annotations present, I can also use a static type checking tool to detect errors before the program is executed. For example, here I provide the wrong types when constructing an object and modifying it:

from dataclasses import dataclass

@dataclass
class DataclassRGB:
    red: int
    green: int
    blue: int

obj = DataclassRGB(1, "bad", 3)
obj.red = "also bad"

The type checker is able to report these problems without needing more code in the class definition:

>>>
$ python3 -m mypy --strict example.py
.../example.py:9: error: Argument 2 to "DataclassRGB" has
➥incompatible type "str"; expected "int"  [arg-type]
.../example.py:10: error: Incompatible types in assignment
➥(expression has type "str", variable has type "int")
➥[assignment]
Found 2 errors in 1 file (checked 1 source file)

It’s possible to enable the same type checking with a standard class by adding type annotations to the __init__ method’s parameters, but this location is cramped and visually noisy in comparison to the class body:

class RGB:
    def __init__(
        self, red: int, green: int, blue: int
    ) -> None:  # Changed
        self.red = red
        self.green = green
        self.blue = blue

If you don’t want type annotations in your project (see Item 3: “Never Expect Python to Detect Errors at Compile Time”), or if you need your class’s attributes to be totally flexible, you can still use the dataclass decorator. Simply provide the Any type from the built-in typing module for the fields:

from typing import Any

@dataclass
class DataclassRGB:
    red: Any
    green: Any
    blue: Any

Requiring Initialization Arguments to Be Passed by Keyword

Arguments are supplied to __init__ just as they would be for any other function call, meaning both positional arguments and keyword arguments are allowed (see Item 35: “Provide Optional Behavior with Keyword Arguments” for background). For example, here I initialize the RGB class three different ways:

color1 = RGB(red=1, green=2, blue=3)
color2 = RGB(1, 2, 3)
color3 = RGB(1, 2, blue=3)

However, this flexibility is error prone because I can too easily mix up the different color component values. To address this, I can use the * symbol in the argument list to require that arguments to __init__ are always supplied by keyword (see Item 37: “Enforce Clarity with Keyword-Only and Positional-Only Arguments” for details):

class RGB:
    def __init__(self, *, red, green, blue):  # Changed
        self.red = red
        self.green = green
        self.blue = blue

Now using keyword arguments is the only way to create these objects:

color4 = RGB(red=1, green=2, blue=3)

Initializing the class with positional arguments will fail:

RGB(1, 2, 3)

>>>
Traceback ...
TypeError: RGB.__init__() takes 1 positional argument but 4
➥were given

By default, classes wrapped by the dataclass decorator will also accept a mix of positional and keyword arguments. I can achieve the same keyword-only behavior as above by simply passing the kw_only flag to the decorator:

@dataclass(kw_only=True)
class DataclassRGB:
    red: int
    green: int
    blue: int

Now this class must be initialized with keyword arguments:

color5 = DataclassRGB(red=1, green=2, blue=3)

Passing any positional arguments will fail, just like the standard class implementation:

DataclassRGB(1, 2, 3)

>>>
Traceback ...
TypeError: DataclassRGB.__init__() takes 1 positional argument
➥but 4 were given

Providing Default Attribute Values

For classes that are focused on storing data, it can be useful to have default values for some attributes so they don’t need to be specified every time an object is constructed.

For example, say that I want to extend the RGB class to allow for an alpha field to represent the color’s level of transparency on a scale of 0 to 1. By default, I want the color to be opaque with an alpha of 1. Here I achieve this by providing a default value for the corresponding argument in the __init__ constructor:

class RGBA:
    def __init__(self, *, red, green, blue, alpha=1.0):
        self.red = red
        self.green = green
        self.blue = blue
        self.alpha = alpha

Now I can omit the alpha argument, and the default value will be assigned anyway:

color1 = RGBA(red=1, green=2, blue=3)
print(
    color1.red,
    color1.green,
    color1.blue,
    color1.alpha,
)

>>>
1 2 3 1.0

To enable the same behavior with the dataclass decorator, I simply assign a default value to the attribute in the class body:

@dataclass(kw_only=True)
class DataclassRGBA:
    red: int
    green: int
    blue: int
    alpha: float = 1.0

Creating an object with this new constructor will assign the correct default value for the alpha attribute:

color2 = DataclassRGBA(red=1, green=2, blue=3)
print(color2)

>>>
DataclassRGBA(red=1, green=2, blue=3, alpha=1.0)

However, neither of these approaches will work correctly when the default value is mutable (see Item 26: “Prefer get over in and KeyError to Handle Missing Dictionary Keys” for a similar problem and Item 30: “Know That Function Arguments Can Be Mutated” for background). For example, if the default value provided is a list, a single object reference will be shared between all instances of a class, causing weird behaviors like this:

class BadContainer:
    def __init__(self, *, value=[]):
        self.value = value

obj1 = BadContainer()
obj2 = BadContainer()
obj1.value.append(1)
print(obj2.value)  # Should be empty, but isn't

>>>
[1]

For standard classes, you can solve this problem by providing a default value of None in the __init__ method and then dynamically allocating the real default value (see Item 36: “Use None and Docstrings to Specify Dynamic Default Arguments” for background):

class MyContainer:
    def __init__(self, *, value=None):
        if value is None:
            value = []  # Create when not supplied
        self.value = value

Now each object will have a different list allocated by default:

obj1 = MyContainer()
obj2 = MyContainer()
obj1.value.append(1)
assert obj1.value == [1]
assert obj2.value == []

To achieve the same behavior with the dataclass decorator, I can use the field helper function from the dataclasses module. It accepts a default_factory argument that is the function to call in order to allocate a default value for that attribute:

from dataclasses import field

@dataclass
class DataclassContainer:
    value: list = field(default_factory=list)

This similarly fixes the implementation to ensure that each new object has its own separate list instance:

obj1 = DataclassContainer()
obj2 = DataclassContainer()
obj1.value.append(1)
assert obj1.value == [1]
assert obj2.value == []

The dataclasses module provides many other helpful features like this, which are covered in detail in the official documentation (https://docs.python.org/3/library/dataclasses.html).
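One such feature is the frozen flag, which makes instances immutable so attribute assignment raises an exception after construction. Here is a brief sketch (the FrozenRGB name is illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrozenRGB:  # hypothetical name for this sketch
    red: int
    green: int
    blue: int

color = FrozenRGB(1, 2, 3)
try:
    color.red = 10  # attribute assignment is blocked on frozen instances
except Exception as e:
    print(type(e).__name__)  # FrozenInstanceError
```

Because frozen dataclasses also get a __hash__ implementation (when eq is enabled, as it is by default), their instances can be used as dictionary keys and set members.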

Representing Objects as Strings

When you define a new class in Python using the standard approach, even a basic feature like print doesn’t seem to work correctly. Instead of seeing a nice list of attributes and their values, you get the memory address of the object, which is practically useless:

color1 = RGB(red=1, green=2, blue=3)
print(color1)

>>>
<__main__.RGB object at 0x1029a0b90>

To fix this, I can implement the __repr__ special method (see Item 12: “Understand the Difference Between repr and str when Printing Objects” for background). Here I add such a method to a standard Python class using one big format string (see Item 11: “Prefer Interpolated F-Strings over C-Style Format Strings and str.format” for background):

class RGB:
    ...

    def __repr__(self):
        return (
            f"{type(self).__module__}"
            f".{type(self).__name__}("
            f"red={self.red!r}, "
            f"green={self.green!r}, "
            f"blue={self.blue!r})"
        )

Now these objects will look good when they’re printed:

color1 = RGB(red=1, green=2, blue=3)
print(color1)

>>>
__main__.RGB(red=1, green=2, blue=3)

However, there are two problems with implementing __repr__ yourself. First, it’s repetitive and verbose boilerplate that needs to be added to every class. Second, it’s error prone because I could easily forget to add new attributes, misspell attribute names, put attribute names in the wrong order for positional construction, or incorrectly insert separating commas and whitespace.

The dataclass decorator provides an implementation of the __repr__ special method by default, increasing productivity and avoiding these potential bugs:

color2 = DataclassRGB(red=1, green=2, blue=3)
print(color2)

>>>
DataclassRGB(red=1, green=2, blue=3)

Converting Objects into Tuples

To help with equality testing, indexing, and sorting, it can be useful to convert an object into a tuple. To do this with a standard Python class, here I define a new method that packs an object’s attributes together:

class RGB:
    ...

    def _astuple(self):
        return (self.red, self.green, self.blue)

Using this method is simple:

color1 = RGB(1, 2, 3)
print(color1._astuple())

>>>
(1, 2, 3)

The _astuple method also allows me to copy an object by using the return value as positional arguments for the constructor using the * operator (see Item 34: “Reduce Visual Noise with Variable Positional Arguments” and Item 16: “Prefer Catch-All Unpacking over Slicing” for background):

color2 = RGB(*color1._astuple())
print(color2.red, color2.green, color2.blue)

>>>
1 2 3

However, like the __repr__ implementation for standard Python classes, the _astuple method requires error-prone boilerplate with all of the same pitfalls. In contrast, I can use the astuple function from the dataclasses module to achieve the same behavior for any dataclass-decorated class:

from dataclasses import astuple

color3 = DataclassRGB(1, 2, 3)
print(astuple(color3))

>>>
(1, 2, 3)
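As with the hand-written _astuple method, the tuple returned by astuple can be unpacked into positional arguments to copy an object. Here is a sketch with the positional DataclassRGB class restated so the snippet stands alone:

```python
from dataclasses import astuple, dataclass

@dataclass
class DataclassRGB:
    red: int
    green: int
    blue: int

color3 = DataclassRGB(1, 2, 3)
color4 = DataclassRGB(*astuple(color3))  # copy by unpacking the tuple
print(color4)  # DataclassRGB(red=1, green=2, blue=3)
```

The dataclasses module also provides a replace function that copies a dataclass object more directly, optionally overriding specific fields (e.g., replace(color3, blue=9)).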

Converting Objects into Dictionaries

To help with data serialization, it can be useful to convert an object into a dictionary containing its attributes. I can achieve this with a standard Python class by defining a new method:

class RGB:
    ...

    def _asdict(self):
        return dict(
            red=self.red,
            green=self.green,
            blue=self.blue,
        )

The return value of this method can be passed to the dumps function from the json built-in module to produce a serialized representation:

import json

color1 = RGB(red=1, green=2, blue=3)
data = json.dumps(color1._asdict())
print(data)

>>>
{"red": 1, "green": 2, "blue": 3}

The _asdict method also lets you create a copy of an object using a dictionary of keyword arguments with the ** operator, similar to how _astuple works for positional arguments:

color2 = RGB(**color1._asdict())
print(color2)

>>>
__main__.RGB(red=1, green=2, blue=3)

To get the same behavior using the dataclasses module, I can use the asdict function, which avoids all of the boilerplate:

from dataclasses import asdict

color3 = DataclassRGB(red=1, green=2, blue=3)
print(asdict(color3))

>>>
{'red': 1, 'green': 2, 'blue': 3}

The asdict function from dataclasses is also superior to my hand-built _asdict method; it will automatically transform data nested in attributes, including basic container types and other dataclass objects. To achieve the same effect using a standard class requires much more work (see Item 54: “Consider Composing Functionality with Mix-in Classes” for details).
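For example, here is a brief sketch (with hypothetical RGBColor and Palette classes) showing how asdict recurses into dataclass objects stored inside a list attribute:

```python
from dataclasses import asdict, dataclass

@dataclass
class RGBColor:
    red: int
    green: int
    blue: int

@dataclass
class Palette:
    name: str
    colors: list  # holds nested dataclass objects

palette = Palette("warm", [RGBColor(255, 0, 0), RGBColor(255, 165, 0)])
print(asdict(palette))  # nested dataclasses become nested dicts
```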

Checking Whether Objects Are Equivalent

With a standard Python class, two objects that look like they’re equivalent actually aren’t:

color1 = RGB(1, 2, 3)
color2 = RGB(1, 2, 3)
print(color1 == color2)

>>>
False

The reason for this behavior is that the default implementation of the __eq__ special method uses the is operator, which tests whether the two operands have the same identity (i.e., whether they occupy the same location in memory):

assert color1 == color1
assert color1 is color1
assert color1 != color2
assert color1 is not color2

For simple classes, it’s a lot more useful when two objects of the same type with the same attribute values are considered equivalent. Here I implement this behavior for a standard Python class by using the _astuple method:

class RGB:
    ...

    def __eq__(self, other):
        return (
            type(self) == type(other)
            and self._astuple() == other._astuple()
        )

Now the == and != operators work as expected:

color1 = RGB(1, 2, 3)
color2 = RGB(1, 2, 3)
color3 = RGB(5, 6, 7)
assert color1 == color1
assert color1 == color2
assert color1 is not color2
assert color1 != color3

When a class is created using the dataclass decorator, you get this functionality automatically and don’t need to implement __eq__ yourself:

color4 = DataclassRGB(1, 2, 3)
color5 = DataclassRGB(1, 2, 3)
color6 = DataclassRGB(5, 6, 7)
assert color4 == color4
assert color4 == color5
assert color4 is not color5
assert color4 != color6

Enabling Objects to Be Compared

Beyond equivalence, it can be useful to compare two objects to see which one is bigger or smaller. For example, here I define a standard class to represent the size of a planet in the universe and its distance from Earth:

class Planet:
    def __init__(self, distance, size):
        self.distance = distance
        self.size = size

    def __repr__(self):
        return (
            f"{type(self).__module__}"
            f".{type(self).__name__}("
            f"distance={self.distance}, "
            f"size={self.size})"
        )

If I try to sort these planets, an exception will be raised because Python doesn’t know how to order the objects:

far = Planet(10, 5)
near = Planet(1, 2)
data = [far, near]
data.sort()

>>>
Traceback ...
TypeError: '<' not supported between instances of 'Planet' and
➥'Planet'

There are work-arounds for this limitation that are sufficient in many cases (see Item 100: “Sort by Complex Criteria Using the key Parameter”). However, there are other situations in which you need an object to have its own natural ordering (see Item 104: “Know How to Use heapq for Priority Queues” for an example).
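For instance, sorting by a single attribute with the key parameter requires no comparison special methods at all. Here is a brief sketch with the Planet class restated for completeness:

```python
class Planet:
    def __init__(self, distance, size):
        self.distance = distance
        self.size = size

planets = [Planet(10, 5), Planet(1, 2), Planet(4, 9)]
planets.sort(key=lambda p: p.distance)  # order by one attribute
print([p.distance for p in planets])  # [1, 4, 10]
```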

To support this behavior in a standard class, I use the _astuple helper method, as described above, to fill in all of the special methods that Python needs to compare objects:

class Planet:
    ...

    def _astuple(self):
        return (self.distance, self.size)

    def __eq__(self, other):
        return (
            type(self) == type(other)
            and self._astuple() == other._astuple()
        )

    def __lt__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self._astuple() < other._astuple()

    def __le__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self._astuple() <= other._astuple()

    def __gt__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self._astuple() > other._astuple()

    def __ge__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self._astuple() >= other._astuple()

Python will allow comparisons between different types, so I need to return the NotImplemented singleton—which is not the same as the NotImplementedError exception class—to indicate when objects are not comparable.
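When the comparison methods of both operands return NotImplemented, Python falls back to raising TypeError. Here is a minimal sketch (with a hypothetical Distance class) demonstrating that behavior:

```python
class Distance:
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if type(self) != type(other):
            return NotImplemented  # signals "I can't compare these"
        return self.value < other.value

try:
    Distance(1) < 5  # int can't compare with Distance either
except TypeError as e:
    print(type(e).__name__)  # TypeError
```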

Now these objects have a natural ordering given by the value returned from _astuple, and they can be sorted (first by distance from Earth, then by size) without any additional boilerplate:

far = Planet(10, 2)
near = Planet(1, 5)
data = [far, near]
data.sort()
print(data)

>>>
[__main__.Planet(distance=1, size=5), __main__
➥.Planet(distance=10, size=2)]

One alternative that reduces the number of special method implementations needed is the total_ordering class decorator from the functools built-in module. But achieving the same behavior with dataclass is even easier. Simply pass the order flag:

@dataclass(order=True)
class DataclassPlanet:
    distance: float
    size: float

These objects will be comparable using their attributes in the order they’re declared in the class body:

far2 = DataclassPlanet(10, 2)
near2 = DataclassPlanet(1, 5)
assert far2 > near2
assert near2 < far2
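The total_ordering approach mentioned above can be sketched like this: define __eq__ and __lt__ yourself, and the decorator derives the remaining comparison methods (the class and method names here are illustrative):

```python
import functools

@functools.total_ordering
class OrderedPlanet:  # hypothetical name for this sketch
    def __init__(self, distance, size):
        self.distance = distance
        self.size = size

    def _astuple(self):
        return (self.distance, self.size)

    def __eq__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self._astuple() == other._astuple()

    def __lt__(self, other):
        if type(self) != type(other):
            return NotImplemented
        return self._astuple() < other._astuple()

far = OrderedPlanet(10, 2)
near = OrderedPlanet(1, 5)
assert far > near   # __gt__ was supplied by total_ordering
assert near <= far  # __le__ was supplied too
```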

Things to Remember

  • The dataclass decorator from the dataclasses built-in module can be used to define versatile, lightweight classes without the boilerplate typically required by standard Python syntax.

  • Using the dataclasses module can help you avoid pitfalls caused by the verbose and error-prone nature of Python’s standard object-oriented features.

  • The dataclasses module provides additional helper functions for conversions (e.g., asdict, astuple) and advanced attribute behavior (e.g., field).

  • It’s important to know how to implement object-oriented idioms yourself so you can migrate away from the dataclasses module once you need more customization than it allows.

Item 52: Use @classmethod Polymorphism to Construct Objects Generically

In Python, not only do objects support polymorphism, but classes do as well. What does this mean, and what is it good for?

Polymorphism enables multiple classes in a hierarchy to implement their own unique versions of a method. This means that many classes can fulfill the same interface or abstract base class while providing different functionality (see Item 49: “Prefer Object-Oriented Polymorphism over Functions with isinstance Checks” and Item 57: “Inherit from collections.abc Classes for Custom Container Types” for background).

For example, say that I’m writing a map-reduce implementation, and I want a common class to represent the input data. Here I define such a class with a read method that must be defined by subclasses:

class InputData:
    def read(self):
        raise NotImplementedError

I also have a concrete subclass of InputData that reads data from a file on disk:

class PathInputData(InputData):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def read(self):
        with open(self.path) as f:
            return f.read()

I could have any number of InputData subclasses, like PathInputData, and each of them could implement the standard interface for read to return the data to process. Other InputData subclasses could read from the network, decompress data transparently, and so on.

I’d want a similar abstract interface for the map-reduce worker that consumes the input data in a standard way:

class Worker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

Here I define a concrete subclass of Worker that implements the specific map-reduce behavior I want to apply—a simple newline counter:

class LineCountWorker(Worker):
    def map(self):
        data = self.input_data.read()
        self.result = data.count("\n")

    def reduce(self, other):
        self.result += other.result

It might look like this implementation is going great, but I’ve reached the biggest hurdle in all of this: What connects all of these pieces? I have a nice set of classes with reasonable interfaces and abstractions, but that’s only useful once the objects are constructed. What component is responsible for building the objects and orchestrating the map-reduce?

The simplest approach is to manually build and connect the objects with some helper functions. Here I list the contents of a directory and construct a PathInputData instance for each file it contains:

import os

def generate_inputs(data_dir):
    for name in os.listdir(data_dir):
        yield PathInputData(os.path.join(data_dir, name))

Next, I create the LineCountWorker instances by using the InputData instances returned by generate_inputs:

def create_workers(input_list):
    workers = []
    for input_data in input_list:
        workers.append(LineCountWorker(input_data))
    return workers

I execute these Worker instances by fanning out the map step to multiple threads (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism” for background). Then, I call reduce repeatedly to combine the results into one final value:

from threading import Thread

def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    first, *rest = workers
    for worker in rest:
        first.reduce(worker)
    return first.result

Finally, I connect all of the pieces together in a function that runs each step:

def mapreduce(data_dir):
    inputs = generate_inputs(data_dir)
    workers = create_workers(inputs)
    return execute(workers)

Calling this function with a set of test input files works great:

import os
import random

def write_test_files(tmpdir):
    os.makedirs(tmpdir)
    for i in range(100):
        with open(os.path.join(tmpdir, str(i)), "w") as f:
            f.write("\n" * random.randint(0, 100))

tmpdir = "test_inputs"
write_test_files(tmpdir)

result = mapreduce(tmpdir)
print(f"There are {result} lines")

>>>
There are 4360 lines

What’s the problem? The huge issue is that the mapreduce function is not generic at all. If I wanted to write another InputData or Worker subclass, I would also have to rewrite the generate_inputs, create_workers, and mapreduce functions to match.

This problem boils down to needing a generic way to construct objects. In other languages, you’d solve this problem with constructor polymorphism, requiring that each InputData subclass provides a special constructor that can be used generically by the helper methods that orchestrate the map-reduce (similar to the factory pattern). The trouble is that Python only allows for the single constructor method (__init__). It’s unreasonable to require every InputData subclass to have a compatible constructor.

The best way to solve this problem is with class method polymorphism. This is exactly like the instance method polymorphism I used for InputData.read, except that it’s for whole classes instead of their constructed objects.

Let me apply this idea to the map-reduce classes. Here I define a new version of the InputData class, GenericInputData, with a generic @classmethod that’s responsible for creating new instances using a common interface:

class GenericInputData:
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError

I have generate_inputs take a dictionary with a set of configuration parameters that each concrete GenericInputData subclass needs to interpret. Here I use the config argument to find the directory to list for input files:

class PathInputData(GenericInputData):
    ...

    @classmethod
    def generate_inputs(cls, config):
        data_dir = config["data_dir"]
        for name in os.listdir(data_dir):
            yield cls(os.path.join(data_dir, name))

Similarly, I can make the create_workers helper part of the GenericWorker class. Here I use the input_class parameter, which must be a subclass of GenericInputData, to generate the necessary inputs:

class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    def map(self):
        raise NotImplementedError

    def reduce(self, other):
        raise NotImplementedError

    @classmethod
    def create_workers(cls, input_class, config):
        workers = []
        for input_data in input_class.generate_inputs(config):
            workers.append(cls(input_data))
        return workers

Note that the call to input_class.generate_inputs above is the class polymorphism that I’m trying to show. You can also see how create_workers calling cls provides an alternative way to construct GenericWorker objects besides using the __init__ method directly.

The effect on my concrete GenericWorker subclass is nothing more than changing its parent class:

class LineCountWorker(GenericWorker):  # Changed
    ...

Finally, I can rewrite the mapreduce function to be completely generic by calling the create_workers class method:

def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)

Running the new worker on a set of test files produces the same result as the old implementation. The difference is that the mapreduce function requires more parameters so that it can operate generically:

config = {"data_dir": tmpdir}
result = mapreduce(LineCountWorker, PathInputData, config)
print(f"There are {result} lines")

>>>
There are 4360 lines

Now I can write other GenericInputData and GenericWorker subclasses as I wish, without having to rewrite any of the glue code.
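For example, here is a sketch of a hypothetical InMemoryInputData subclass that reads strings from memory instead of disk; the supporting classes are restated so the snippet stands alone, and the same mapreduce glue code works unchanged:

```python
from threading import Thread

class GenericInputData:
    def read(self):
        raise NotImplementedError

    @classmethod
    def generate_inputs(cls, config):
        raise NotImplementedError

class InMemoryInputData(GenericInputData):  # hypothetical new subclass
    def __init__(self, data):
        super().__init__()
        self.data = data

    def read(self):
        return self.data

    @classmethod
    def generate_inputs(cls, config):
        for item in config["items"]:  # one input object per string
            yield cls(item)

class GenericWorker:
    def __init__(self, input_data):
        self.input_data = input_data
        self.result = None

    @classmethod
    def create_workers(cls, input_class, config):
        return [cls(d) for d in input_class.generate_inputs(config)]

class LineCountWorker(GenericWorker):
    def map(self):
        self.result = self.input_data.read().count("\n")

    def reduce(self, other):
        self.result += other.result

def execute(workers):
    threads = [Thread(target=w.map) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    first, *rest = workers
    for worker in rest:
        first.reduce(worker)
    return first.result

def mapreduce(worker_class, input_class, config):
    workers = worker_class.create_workers(input_class, config)
    return execute(workers)

config = {"items": ["a\nb\n", "c\n", ""]}
result = mapreduce(LineCountWorker, InMemoryInputData, config)
print(result)  # 3
```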

Things to Remember

  • Python only supports a single constructor per class: the __init__ method.

  • Use @classmethod to define alternative constructors for your classes.

  • Use class method polymorphism to provide generic ways to build and connect many concrete subclasses.

Item 53: Initialize Parent Classes with super

The old, simple way to initialize a parent class from a child class is to directly call the parent class’s __init__ method with the child instance:

class MyBaseClass:
    def __init__(self, value):
        self.value = value

class MyChildClass(MyBaseClass):
    def __init__(self):
        MyBaseClass.__init__(self, 5)

This approach works fine for basic class hierarchies but breaks in many cases.

If a class is affected by multiple inheritance (something to avoid in general; see Item 54: “Consider Composing Functionality with Mix-in Classes”), calling the superclasses’ __init__ methods directly can lead to unpredictable behavior.

One problem is that the __init__ call order isn’t specified across all subclasses. For example, here I define two parent classes that operate on the instance’s value field:

class TimesTwo:
    def __init__(self):
        self.value *= 2

class PlusFive:
    def __init__(self):
        self.value += 5

This class defines its parent classes in one ordering:

class OneWay(MyBaseClass, TimesTwo, PlusFive):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        TimesTwo.__init__(self)
        PlusFive.__init__(self)

And constructing it produces a result that matches the parent class ordering:

foo = OneWay(5)
print("First ordering value is (5 * 2) + 5 =", foo.value)

>>>
First ordering value is (5 * 2) + 5 = 15

Here’s another class that defines the same parent classes but in a different ordering (PlusFive followed by TimesTwo instead of the other way around):

class AnotherWay(MyBaseClass, PlusFive, TimesTwo):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        TimesTwo.__init__(self)
        PlusFive.__init__(self)

However, I left the calls to the parent class constructors—PlusFive.__init__ and TimesTwo.__init__—in the same order as before, which means this class’s behavior doesn’t match the order of the parent classes in its definition. The conflict here between the ordering of the inheritance base classes and the __init__ calls is hard to spot, which makes this especially difficult for new readers of the code to understand:

bar = AnotherWay(5)
print("Second ordering should be (5 + 5) * 2, but is", bar.value)

>>>
Second ordering should be (5 + 5) * 2, but is 15

Another problem occurs with diamond inheritance. Diamond inheritance happens when a subclass inherits from two separate classes that have the same superclass somewhere in the hierarchy. Diamond inheritance causes the common superclass’s __init__ method to run multiple times, leading to unexpected behavior. For example, here I define two child classes that inherit from MyBaseClass:

class TimesSeven(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value *= 7

class PlusNine(MyBaseClass):
    def __init__(self, value):
        MyBaseClass.__init__(self, value)
        self.value += 9

Then I define a child class that inherits from both of these classes, making MyBaseClass the top of the diamond:

class ThisWay(TimesSeven, PlusNine):
    def __init__(self, value):
        TimesSeven.__init__(self, value)
        PlusNine.__init__(self, value)

foo = ThisWay(5)
print("Should be (5 * 7) + 9 = 44 but is", foo.value)

>>>
Should be (5 * 7) + 9 = 44 but is 14

The call to the second parent class’s constructor, PlusNine.__init__, causes self.value to be reset back to 5 when MyBaseClass.__init__ gets called a second time. That results in the calculation of self.value to be 5 + 9 = 14, completely ignoring the effect of the TimesSeven.__init__ constructor. This behavior is surprising and can be very difficult to debug in more complex cases.

To solve these problems, Python has the super built-in function and standard method resolution order (MRO). super ensures that common superclasses in diamond hierarchies are run only once (see Item 62: “Validate Subclasses with __init_subclass__” for another example). The MRO defines the ordering in which superclasses are initialized, following an algorithm called C3 linearization.

Here I create a diamond-shaped class hierarchy again, but this time I use super to initialize the parent class:

class TimesSevenCorrect(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value *= 7

class PlusNineCorrect(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value += 9

Now the top part of the diamond, MyBaseClass.__init__, is run only a single time. The other parent classes are run in the order specified in the class statement:

class GoodWay(TimesSevenCorrect, PlusNineCorrect):
    def __init__(self, value):
        super().__init__(value)

foo = GoodWay(5)
print("Should be 7 * (5 + 9) = 98 and is", foo.value)

>>>
Should be 7 * (5 + 9) = 98 and is 98

This order might seem backward. Shouldn’t TimesSevenCorrect.__init__ have run first? Shouldn’t the result be (5 * 7) + 9 = 44? The answer is no. This ordering matches what the MRO defines for this class. The MRO ordering is available via a class method called mro or cached in a class attribute called __mro__:

mro_str = "\n".join(repr(cls) for cls in GoodWay.__mro__)
print(mro_str)

>>>
<class '__main__.GoodWay'>
<class '__main__.TimesSevenCorrect'>
<class '__main__.PlusNineCorrect'>
<class '__main__.MyBaseClass'>
<class 'object'>

When I call GoodWay(5), it in turn calls TimesSevenCorrect.__init__, which calls PlusNineCorrect.__init__, which calls MyBaseClass.__init__. Once this reaches the top of the diamond, all of the initialization methods actually do their work in the opposite order from how their __init__ functions were called. MyBaseClass.__init__ assigns value to 5. PlusNineCorrect.__init__ adds 9 to make value equal 14. TimesSevenCorrect.__init__ multiplies it by 7 to make value equal 98.

Besides making multiple inheritance robust, the call to super().__init__ is also much more maintainable than calling MyBaseClass.__init__ directly from within the subclasses. I could later rename MyBaseClass to something else or have TimesSevenCorrect and PlusNineCorrect inherit from another superclass without having to update their __init__ methods to match.

The super function can also be called with two parameters: first the type of the class whose MRO parent view you’re trying to access and then the instance on which to access that view. Using these optional parameters within the constructor looks like this:

class ExplicitTrisect(MyBaseClass):
    def __init__(self, value):
        super(ExplicitTrisect, self).__init__(value)
        self.value /= 3

However, these parameters are not required for object instance initialization. Python’s compiler automatically provides the correct parameters (__class__ and self) for you when super is called with zero arguments within a class definition. This means all three of these usages are equivalent:

class AutomaticTrisect(MyBaseClass):
    def __init__(self, value):
        super(__class__, self).__init__(value)
        self.value /= 3

class ImplicitTrisect(MyBaseClass):
    def __init__(self, value):
        super().__init__(value)
        self.value /= 3

assert ExplicitTrisect(9).value == 3
assert AutomaticTrisect(9).value == 3
assert ImplicitTrisect(9).value == 3

The only time you should provide parameters to super is in situations where you need to access the specific functionality of a superclass’s implementation from a child class (e.g., in order to wrap or reuse functionality).
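For example, here is a minimal sketch (with hypothetical Base, Middle, and Child classes) that uses two-argument super to skip the immediate parent’s implementation and reuse a grandparent’s:

```python
class Base:
    def greet(self):
        return "base"

class Middle(Base):
    def greet(self):
        return "middle"

class Child(Middle):
    def greet(self):
        # super(Middle, self) starts the MRO search *after* Middle,
        # so this reuses Base's implementation, skipping Middle's
        return super(Middle, self).greet() + "!"

print(Child().greet())  # base!
```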

Things to Remember

  • Python’s standard MRO solves the problems of superclass initialization order and diamond inheritance.

  • Use the super built-in function with zero arguments to initialize parent classes and call parent methods.

Item 54: Consider Composing Functionality with Mix-in Classes

Python is an object-oriented language with built-in facilities for making multiple inheritance tractable (see Item 53: “Initialize Parent Classes with super”). However, it’s better to avoid multiple inheritance altogether.

If you find yourself desiring the convenience and encapsulation that come with multiple inheritance but wanting to avoid the potential headaches, consider writing a mix-in instead. A mix-in is a class that defines only a small set of additional methods to provide to its child classes. Mix-in classes don’t define their own instance attributes or require their __init__ constructor to be called.

Writing mix-ins is easy because Python makes it trivial to inspect the current state of any object, regardless of its type. Dynamic inspection means you can write generic functionality just once, in a mix-in, and it can then be applied to many other classes. Mix-ins can be composed and layered to minimize repetitive code and maximize reuse.

For example, say that I want to have the ability to convert a Python object from its in-memory representation to a dictionary that’s ready for serialization. Why not write this functionality generically so I can use it with all of my classes?

Here I define an example mix-in that accomplishes this with a new public method that’s added to any class that inherits from it. The implementation details are straightforward and rely on dynamic attribute access using hasattr, dynamic type inspection with isinstance, and accessing the instance dictionary __dict__:

class ToDictMixin:
    def to_dict(self):
        return self._traverse_dict(self.__dict__)

    def _traverse_dict(self, instance_dict):
        output = {}
        for key, value in instance_dict.items():
            output[key] = self._traverse(key, value)
        return output

    def _traverse(self, key, value):
        if isinstance(value, ToDictMixin):
            return value.to_dict()
        elif isinstance(value, dict):
            return self._traverse_dict(value)
        elif isinstance(value, list):
            return [self._traverse(key, i) for i in value]
        elif hasattr(value, "__dict__"):
            return self._traverse_dict(value.__dict__)
        else:
            return value

Here I define an example class that uses the mix-in to make a dictionary representation of a binary tree:

class BinaryTree(ToDictMixin):
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

Translating a large number of related Python objects into a dictionary becomes easy:

tree = BinaryTree(
    10,
    left=BinaryTree(7, right=BinaryTree(9)),
    right=BinaryTree(13, left=BinaryTree(11)),
)
print(tree.to_dict())

>>>
{'value': 10,
 'left': {'value': 7,
          'left': None,
          'right': {'value': 9, 'left': None, 'right': None}},
 'right': {'value': 13,
           'left': {'value': 11, 'left': None, 'right': None},
           'right': None}}

The best part about mix-ins is that you can make their generic functionality pluggable so behaviors can be overridden when required. For example, here I define a subclass of BinaryTree that holds a reference to its parent. This circular reference would cause the default implementation of ToDictMixin.to_dict to loop forever. The solution is to override the BinaryTreeWithParent._traverse method to only process values that matter, preventing cycles encountered by the mix-in. Here the _traverse override inserts the parent’s numerical value and otherwise defers to the mix-in’s default implementation by using the super built-in function:

class BinaryTreeWithParent(BinaryTree):
    def __init__(
        self,
        value,
        left=None,
        right=None,
        parent=None,
    ):
        super().__init__(value, left=left, right=right)
        self.parent = parent

    def _traverse(self, key, value):
        if (
            isinstance(value, BinaryTreeWithParent)
            and key == "parent"
        ):
            return value.value  # Prevent cycles
        else:
            return super()._traverse(key, value)

Calling BinaryTreeWithParent.to_dict works without issue because the circular referencing properties aren’t followed:

root = BinaryTreeWithParent(10)
root.left = BinaryTreeWithParent(7, parent=root)
root.left.right = BinaryTreeWithParent(9, parent=root.left)
print(root.to_dict())

>>>
{'value': 10,
 'left': {'value': 7,
          'left': None,
          'right': {'value': 9,
                    'left': None,
                    'right': None,
                    'parent': 7},
          'parent': 10},
 'right': None,
 'parent': None}

By defining BinaryTreeWithParent._traverse, I’ve also enabled any class that has an attribute of type BinaryTreeWithParent to automatically work with ToDictMixin:

class NamedSubTree(ToDictMixin):
    def __init__(self, name, tree_with_parent):
        self.name = name
        self.tree_with_parent = tree_with_parent

my_tree = NamedSubTree("foobar", root.left.right)
print(my_tree.to_dict())  # No infinite loop

>>>
{'name': 'foobar',
 'tree_with_parent': {'value': 9,
                      'left': None,
                      'right': None,
                      'parent': 7}}

Mix-ins can also be composed together. For example, say I want a mix-in that provides generic JSON serialization for any class. I can do this by assuming that a class provides a to_dict method (which may or may not be provided by the ToDictMixin class):

import json

class JsonMixin:
    @classmethod
    def from_json(cls, data):
        kwargs = json.loads(data)
        return cls(**kwargs)

    def to_json(self):
        return json.dumps(self.to_dict())

Note how the JsonMixin class defines both instance methods and class methods. Mix-ins let you add either kind of behavior to subclasses (see Item 52: “Use @classmethod Polymorphism to Construct Objects Generically” for similar functionality). In this example, the only requirements of a JsonMixin subclass are providing a to_dict method and taking keyword arguments for the __init__ method (see Item 35: “Provide Optional Behavior with Keyword Arguments” for background).

This mix-in makes it simple to create hierarchies of utility classes that can be serialized to and from JSON with little boilerplate. For example, here I have a hierarchy of classes representing parts of a datacenter topology:

class DatacenterRack(ToDictMixin, JsonMixin):
    def __init__(self, switch=None, machines=None):
        self.switch = Switch(**switch)
        self.machines = [
            Machine(**kwargs) for kwargs in machines]

class Switch(ToDictMixin, JsonMixin):
    def __init__(self, ports=None, speed=None):
        self.ports = ports
        self.speed = speed

class Machine(ToDictMixin, JsonMixin):
    def __init__(self, cores=None, ram=None, disk=None):
        self.cores = cores
        self.ram = ram
        self.disk = disk

Serializing these classes to and from JSON is simple. Here I verify that the data can make a round trip through serialization and deserialization:

serialized = """{
    "switch": {"ports": 5, "speed": 1e9},
    "machines": [
        {"cores": 8, "ram": 32e9, "disk": 5e12},
        {"cores": 4, "ram": 16e9, "disk": 1e12},
        {"cores": 2, "ram": 4e9, "disk": 500e9}
    ]
}"""

deserialized = DatacenterRack.from_json(serialized)
roundtrip = deserialized.to_json()
assert json.loads(serialized) == json.loads(roundtrip)

When you use mix-ins like this, it’s fine if the class you apply JsonMixin to already inherits from JsonMixin higher up in the class hierarchy. The resulting class will behave the same way, thanks to the behavior of super.
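A minimal sketch of that claim, using simplified stand-ins for the mix-ins above (the Base and Derived classes here are hypothetical, and this ToDictMixin omits the recursive traversal for brevity):

```python
import json

class ToDictMixin:
    def to_dict(self):
        return dict(self.__dict__)  # Simplified: no recursive traversal

class JsonMixin:
    @classmethod
    def from_json(cls, data):
        return cls(**json.loads(data))

    def to_json(self):
        return json.dumps(self.to_dict())

class Base(ToDictMixin, JsonMixin):
    def __init__(self, x=None):
        self.x = x

# JsonMixin appears again even though Base already inherits from it;
# the MRO linearizes it to a single entry, so behavior is unchanged
class Derived(Base, JsonMixin):
    def __init__(self, x=None, y=None):
        super().__init__(x)
        self.y = y

obj = Derived.from_json('{"x": 1, "y": 2}')
assert json.loads(obj.to_json()) == {"x": 1, "y": 2}
```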

Things to Remember

  • Avoid using multiple inheritance with instance attributes and __init__ if mix-in classes can achieve the same outcome.

  • Use pluggable behaviors at the instance level to provide per-class customization when mix-in classes may require it.

  • Mix-ins can include instance methods or class methods, depending on your needs.

  • Compose mix-ins to create complex functionality from simple behaviors.

Item 55: Prefer Public Attributes over Private Ones

In Python, there are only two types of visibility for a class’s attributes: public and private:

class MyObject:
    def __init__(self):
        self.public_field = 5
        self.__private_field = 10

    def get_private_field(self):
        return self.__private_field

Public attributes can be accessed by anyone using the dot operator on an object:

foo = MyObject()
assert foo.public_field == 5

Private fields are specified by prefixing an attribute’s name with a double underscore. They can be accessed directly by methods of the containing class:

assert foo.get_private_field() == 10

However, directly accessing private fields from outside the class raises an exception:

foo.__private_field

>>>
Traceback ...
AttributeError: 'MyObject' object has no attribute '__private_field'

Class methods also have access to private attributes because they are declared within the surrounding class block:

class MyOtherObject:
    def __init__(self):
        self.__private_field = 71

    @classmethod
    def get_private_field_of_instance(cls, instance):
        return instance.__private_field

bar = MyOtherObject()
assert MyOtherObject.get_private_field_of_instance(bar) == 71

As you’d expect with private fields, a subclass can’t access its parent class’s private fields:

class MyParentObject:
    def __init__(self):
        self.__private_field = 71

class MyChildObject(MyParentObject):
    def get_private_field(self):
        return self.__private_field

baz = MyChildObject()
baz.get_private_field()

>>>
Traceback ...
AttributeError: 'MyChildObject' object has no attribute '_MyChildObject__private_field'

The private attribute behavior is implemented with a simple transformation of the attribute name. When the Python compiler sees private attribute access in methods like MyChildObject.get_private_field, it translates the __private_field attribute access to use the name _MyChildObject__private_field instead. In the example above, __private_field is only defined in MyParentObject.__init__, which means the private attribute’s real name is _MyParentObject__private_field. Accessing the parent’s private attribute from the child class fails simply because the transformed attribute name doesn’t exist (_MyChildObject__private_field instead of _MyParentObject__private_field).

Knowing this scheme, you can easily access the private attributes of any class—from a subclass or externally—without asking for permission:

assert baz._MyParentObject__private_field == 71

If you look in the object’s attribute dictionary, you can see that private attributes are actually stored with the names as they appear after the transformation:

print(baz.__dict__)

>>>
{'_MyParentObject__private_field': 71}

Why doesn’t the syntax for private attributes actually enforce strict visibility? The simplest answer is one often-quoted motto of Python: “We are all consenting adults here.” What this means is that we don’t need the language to prevent us from doing what we want to do. It’s our individual choice to extend functionality as we wish and to take responsibility for the consequences of such a risk. Python programmers believe that the benefits of being open—permitting unplanned extension of classes by default—outweigh the downsides.

Beyond that, having the ability to hook language features like attribute access (see Item 61: “Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes”) enables you to mess around with the internals of objects whenever you wish. If you can do that, what is the value of Python trying to prevent private attribute access otherwise?

To minimize damage caused by accessing internals unknowingly, Python programmers follow a naming convention defined in the style guide (see Item 2: “Follow the PEP 8 Style Guide”). Any field whose name is prefixed by a single underscore (like _protected_field) is protected by convention, meaning external users of the class should proceed with caution.

However, programmers who are new to Python might consider using private fields to indicate an internal API that shouldn’t be accessed by subclasses or externally:

class MyStringClass:
    def __init__(self, value):
        self.__value = value

    def get_value(self):
        return str(self.__value)

foo = MyStringClass(5)
assert foo.get_value() == "5"

This is the wrong approach. Inevitably someone—maybe even you—will want to subclass your class to add new behavior or to work around deficiencies in existing methods (e.g., the way that MyStringClass.get_value always returns a string). By choosing private attributes, you’re only making subclass overrides and extensions cumbersome and brittle. Your potential subclassers will still access the private fields when they absolutely need to do so:

class MyIntegerSubclass(MyStringClass):
    def get_value(self):
        return int(self._MyStringClass__value)

foo = MyIntegerSubclass("5")
assert foo.get_value() == 5

But if the class hierarchy above you changes, these classes will break because the private attribute references are no longer valid. Here the MyIntegerSubclass class’s immediate parent, MyStringClass, has had another parent class, called MyBaseClass, added:

class MyBaseClass:
    def __init__(self, value):
        self.__value = value

    def get_value(self):
        return self.__value

class MyStringClass(MyBaseClass):
    def get_value(self):
        return str(super().get_value())         # Updated

class MyIntegerSubclass(MyStringClass):
    def get_value(self):
        return int(self._MyStringClass__value)  # Not updated

The __value attribute is now assigned in the MyBaseClass parent class, not the MyStringClass parent. This causes the private variable reference self._MyStringClass__value to break in MyIntegerSubclass:

foo = MyIntegerSubclass(5)
foo.get_value()

>>>
Traceback ...
AttributeError: 'MyIntegerSubclass' object has no attribute '_MyStringClass__value'

In general, it’s better to err on the side of allowing subclasses to do more by using protected attributes. Document each protected field and explain which fields are internal APIs available to subclasses and which should be left alone entirely. This is as much advice to other programmers as it is guidance for your future self on how to extend your own code safely:

class MyStringClass:
    def __init__(self, value):
        # This stores the user-supplied value for the object.
        # It should be coercible to a string. Once assigned in
        # the object it should be treated as immutable.
        self._value = value

    ...

The only time to seriously consider using private attributes is when you’re worried about naming conflicts between subclasses. This problem occurs when a child class unwittingly defines an attribute that was already defined by its parent class:

class ApiClass:
    def __init__(self):
        self._value = 5

    def get(self):
        return self._value

class Child(ApiClass):
    def __init__(self):
        super().__init__()
        self._value = "hello"  # Conflicts

a = Child()
print(f"{a.get()} and {a._value} should be different")

>>>
hello and hello should be different

This is primarily a concern with classes that are part of a public API; the subclasses are out of your control, so you can’t refactor to fix the problem. Such a conflict is especially possible with attribute names that are very common (like value). To reduce the risk of this issue occurring, you can use a private attribute in the parent class to ensure that there are no attribute names that overlap with child classes:

class ApiClass:
    def __init__(self):
        self.__value = 5       # Double underscore

    def get(self):
        return self.__value    # Double underscore

class Child(ApiClass):
    def __init__(self):
        super().__init__()
        self._value = "hello"  # OK!

a = Child()
print(f"{a.get()} and {a._value} are different")

>>>
5 and hello are different

Things to Remember

  • Private attributes aren’t rigorously enforced by the Python compiler.

  • Plan from the beginning to allow subclasses to do more with your internal APIs and attributes instead of choosing to lock them out.

  • Use documentation of protected fields to guide subclasses instead of trying to force access control with private attributes.

  • Only consider using private attributes to avoid naming conflicts with subclasses that are out of your control.

Item 56: Prefer dataclasses for Creating Immutable Objects

Nearly everything in Python can be modified at runtime, which is a fundamental part of the language’s philosophy (see Item 55: “Prefer Public Attributes over Private Ones” and Item 3: “Never Expect Python to Detect Errors at Compile Time”). However, this flexibility often causes problems that are difficult to debug.

One way to reduce the scope of what can go wrong is to not allow changes to objects after they’re created. This requirement forces code to be written in a functional style, where the primary purpose of functions and methods is to consistently map inputs to outputs, kind of like mathematical equations.

A function written in this style is easy to test. You only need to consider the equivalence of arguments and return values instead of worrying about object references and identities. It’s straightforward to reason about and modify a function that doesn’t make mutable state transitions or cause external side effects. And by returning values that can’t be modified later, functions can avoid downstream surprises.
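To illustrate, here is a tiny sketch (the scale function is a made-up example): testing a pure function over immutable inputs reduces to comparing arguments and return values:

```python
# A pure function: the result depends only on its inputs, and the
# immutable tuple argument cannot be modified by the callee
def scale(point, factor):
    return (point[0] * factor, point[1] * factor)

# Tests are simple equivalence checks, with no setup or state to verify
assert scale((3, 4), 2) == (6, 8)
assert scale((3, 4), 2) == scale((3, 4), 2)  # Deterministic
```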

You can benefit from these advantages with your own data types by creating immutable objects. The dataclasses built-in module (see Item 51: “Prefer dataclasses for Defining Lightweight Classes” for background) provides a way to define such classes that is far more productive than using Python’s standard object-oriented features. dataclasses also enables other functionality out of the box, such as the ability to use value objects as keys in dictionaries and members in sets.

Preventing Objects from Being Modified

In Python, all arguments to functions are passed by reference, which, unfortunately, enables a caller’s data to be changed by any callee (see Item 30: “Know That Function Arguments Can Be Mutated” for details). This behavior can cause all kinds of confusing bugs. For example, here I define a standard class that represents the location of a labeled point in two-dimensional space:

class Point:
    def __init__(self, name, x, y):
        self.name = name
        self.x = x
        self.y = y

I can define a well-behaved helper function that calculates the distance between two points and doesn’t modify the inputs:

def distance(left, right):
    return ((left.x - right.x) ** 2 +
            (left.y - right.y) ** 2) ** 0.5

origin1 = Point("source", 0, 0)
point1 = Point("destination", 3, 4)
print(distance(origin1, point1))

>>>
5.0

I can also define a poorly behaved function that overwrites the value of the x attribute for the first parameter:

def bad_distance(left, right):
    left.x = -3
    return distance(left, right)

This modification causes the wrong calculation to be made and permanently changes the state of the origin object so subsequent calculations will also be incorrect:

print(bad_distance(origin1, point1))
print(origin1.x)

>>>
7.211102550927978
-3

I can prevent these types of modifications in a standard class by implementing the __setattr__ and __delattr__ special methods and having them raise an AttributeError exception (see Item 61: “Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes” for background). To set the initial attribute values, I directly assign keys in the __dict__ instance dictionary:

class ImmutablePoint:
    def __init__(self, name, x, y):
        self.__dict__.update(name=name, x=x, y=y)

    def __setattr__(self, key, value):
        raise AttributeError("Immutable object: set not allowed")

    def __delattr__(self, key):
        raise AttributeError("Immutable object: del not allowed")

Now I can do the same distance calculation as before and get the right answer:

origin2 = ImmutablePoint("source", 0, 0)
assert distance(origin2, point1) == 5

But using the poorly behaved function that modifies its inputs will raise an exception:

bad_distance(origin2, point1)

>>>
Traceback ...
AttributeError: Immutable object: set not allowed

To achieve the same behavior with the dataclasses built-in module, all I have to do is pass the frozen flag to the dataclass decorator:

from dataclasses import dataclass

@dataclass(frozen=True)
class DataclassImmutablePoint:
    name: str
    x: float
    y: float

origin3 = DataclassImmutablePoint("origin", 0, 0)
assert distance(origin3, point1) == 5

Trying to modify the attributes of this new dataclass will raise a similar exception at runtime (FrozenInstanceError is a subclass of AttributeError):

bad_distance(origin3, point1)

>>>
Traceback ...
FrozenInstanceError: cannot assign to field 'x'

As an added benefit, the dataclass approach also enables static analysis tools to detect this problem before program execution (see Item 124: “Consider Static Analysis via typing to Obviate Bugs” for details):

from dataclasses import dataclass

@dataclass(frozen=True)
class DataclassImmutablePoint:
    name: str
    x: float
    y: float

origin = DataclassImmutablePoint("origin", 0, 0)
origin.x = -3

>>>
$ python3 -m mypy --strict example.py
.../example.py:10: error: Property "x" defined in "DataclassImmutablePoint" is read-only  [misc]
Found 1 error in 1 file (checked 1 source file)

You can also use the Final and Never annotations from the typing built-in module to make standard classes similarly fail static analysis, but much more code is required:

from typing import Any, Final, Never

class ImmutablePoint:
    name: Final[str]
    x: Final[int]
    y: Final[int]

    def __init__(self, name: str, x: int, y: int) -> None:
        self.name = name
        self.x = x
        self.y = y

    def __setattr__(self, key: str, value: Any) -> None:
        if key in self.__annotations__ and key not in dir(self):
            # Allow the very first assignment to happen
            super().__setattr__(key, value)
        else:
            raise AttributeError("Immutable object")

    def __delattr__(self, key: str) -> Never:
        raise AttributeError("Immutable object")

Creating Copies of Objects with Replaced Attributes

When objects are immutable, a natural question arises: How are you supposed to write code that accomplishes anything when modifications to data structures aren’t possible? For example, here I have another helper function that moves a Point object by a relative amount:

def translate(point, delta_x, delta_y):
    point.x += delta_x
    point.y += delta_y

As expected, it fails when the input object is immutable:

point1 = ImmutablePoint("destination", 5, 3)
translate(point1, 10, 20)

>>>
Traceback ...
AttributeError: Immutable object: set not allowed

One way to work around this limitation is to return a copy of the given argument with updated attribute values:

def translate_copy(point, delta_x, delta_y):
    return ImmutablePoint(
        name=point.name,
        x=point.x + delta_x,
        y=point.y + delta_y,
    )

However, this is error prone because you need to copy all of the attributes that you’re not trying to modify, such as name in this case. Over time, as the class adds, removes, or changes attributes, this copying code might get out of sync and cause mysterious bugs in your program.

To reduce the risk of such errors in a standard class, here I add a method that knows how to create copies of an object with a given set of attribute overrides:

class ImmutablePoint:
    ...

    def _replace(self, **overrides):
        fields = dict(
            name=self.name,
            x=self.x,
            y=self.y,
        )
        fields.update(overrides)
        cls = type(self)
        return cls(**fields)

Now code can rely on the _replace method to ensure that all attributes are properly accounted for. Here I define another version of the translate function that uses this method:

def translate_replace(point, delta_x, delta_y):
    return point._replace(  # Changed
        x=point.x + delta_x,
        y=point.y + delta_y,
    )

Note how the name attribute is no longer mentioned.

But this approach still isn’t ideal. Although I’ve centralized the field copying code to one location inside the class, it’s still possible for the _replace method to get out of sync because it needs to be manually maintained. Further, each class that needs this functionality must define its own _replace method, which leads to more boilerplate code to manage.

To accomplish the same behavior with dataclass, I can simply use the replace helper function from the dataclasses module; no changes to the class definition are required, no custom _replace method needs to be defined, and there’s no chance for the method to get out of sync:

import dataclasses

def translate_dataclass(point, delta_x, delta_y):
    return dataclasses.replace(  # Changed
        point,
        x=point.x + delta_x,
        y=point.y + delta_y,
    )

Using Immutable Objects in Dictionaries and Sets

When you assign the same key to different values in a dict, you expect only the final mapping to be preserved:

my_dict = {}
my_dict["a"] = 123
my_dict["a"] = 456
print(my_dict)

>>>
{'a': 456}

Similarly, when you add a value to a set, you expect that all subsequent additions of the same value will result in no changes to the set because the item is already present:

my_set = set()
my_set.add("b")
my_set.add("b")
print(my_set)

>>>
{'b'}

These stable mapping and deduplication behaviors are critical expectations for how these data structures work. Surprisingly, by default, user-defined objects can’t be used as dictionary keys or set values in the same way the simple values "a" and "b" were in the code above.

For example, say that I want to write a program that simulates the physics of electricity. Here I create a dictionary that maps Point objects to the amount of charge at that location (there could be other dictionaries that map the same Point objects to other quantities like magnetic flux, etc.):

point1 = Point("A", 5, 10)
point2 = Point("B", -7, 4)
charges = {
    point1: 1.5,
    point2: 3.5,
}

Retrieving the value for a given Point in the dictionary seems to work:

print(charges[point1])

>>>
1.5

But if I create another Point object that appears equivalent to the first one—the same coordinates and name—a KeyError exception is raised by dictionary lookup:

point3 = Point("A", 5, 10)
charges[point3]

>>>
Traceback ...
KeyError: <__main__.Point object at 0x100e85eb0>

Upon further inspection, the Point objects aren’t considered equivalent because I haven’t implemented the __eq__ special method for the class:

assert point1 != point3

The default implementation of the == operator for objects is the same as the is operator, which only compares object identities. Here I implement the __eq__ special method so it compares the values of the objects’ attributes instead:

class Point:
    ...

    def __eq__(self, other):
        return (
            type(self) == type(other)
            and self.name == other.name
            and self.x == other.x
            and self.y == other.y
        )

Now, two Point objects that appear equivalent will also be treated as such by the == operator:

point4 = Point("A", 5, 10)
point5 = Point("A", 5, 10)
assert point4 == point5

However, even with these new equivalent objects, the dictionary lookup from earlier still fails:

other_charges = {
    point4: 1.5,
}
other_charges[point5]

>>>
Traceback ...
TypeError: unhashable type: 'Point'

The issue is that the Point class doesn’t implement the __hash__ special method. Python’s implementation of the dictionary type relies on the integer value returned by the __hash__ method to maintain its internal lookup table. In order for dictionaries to work properly, this hash value must be stable and unchanging for individual objects, and it must be the same for equivalent objects. Here I implement the __hash__ method by putting the object’s attributes in a tuple and passing it to the hash built-in function:

class Point:
    ...

    def __hash__(self):
        return hash((self.name, self.x, self.y))

Now the dictionary lookup works as expected:

point6 = Point("A", 5, 10)
point7 = Point("A", 5, 10)

more_charges = {
    point6: 1.5,
}
value = more_charges[point7]
assert value == 1.5

With dataclasses, none of this effort is required in order to use an immutable object as a key in a dictionary. When you provide the frozen flag to the dataclass decorator, you get all these behaviors (e.g., __eq__, __hash__) automatically:

point8 = DataclassImmutablePoint("A", 5, 10)
point9 = DataclassImmutablePoint("A", 5, 10)

easy_charges = {
    point8: 1.5,
}
assert easy_charges[point9] == 1.5

These immutable objects can also be used as values in a set and will properly deduplicate:

my_set = {point8, point9}
assert my_set == {point8}

What about namedtuple?

Before dataclasses was added to the Python standard library (in version 3.7), a good choice for creating immutable objects was the namedtuple function from the collections built-in module. namedtuple provides many of the same benefits as the dataclass decorator using the frozen flag, including:

  • Construction of objects with positional or keyword arguments, with default values applied when attributes are unspecified.

  • Automatic definition of object-oriented special methods (e.g., __init__, __repr__, __eq__, __hash__, __lt__).

  • Built-in helper methods _replace and _asdict, and runtime introspection with the _fields and _field_defaults class attributes.

  • Support for static type checking when using the NamedTuple class from the typing built-in module.

  • Low memory usage by avoiding __dict__ instance dictionaries (i.e., similar to using dataclasses with slots=True).

In addition, all fields of a namedtuple are accessible by positional index, which can be ideal for wrapping sequential data structures like lines from a CSV (comma-separated values) file or rows from database query results; with a dataclass you must call the dataclasses.astuple helper function instead. However, the sequential nature of namedtuple can lead to unintentional usage (i.e., numerical indexing and iteration) that can cause bugs and make it difficult to migrate to a standard class later, especially for external APIs (see Item 119: “Use Packages to Organize Modules and Provide Stable APIs”). If your data structure is sequential, then namedtuple might be a good choice, but otherwise it’s best to go with dataclasses or a standard class (see Item 65: “Consider Class Body Definition Order to Establish Relationships Between Attributes”).
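A short sketch of the namedtuple behaviors listed above (the field names here are illustrative):

```python
from collections import namedtuple

# defaults apply to the rightmost fields: here x and y default to 0
Point = namedtuple("Point", ["name", "x", "y"], defaults=[0, 0])

p = Point("A", 5, 10)
assert p == Point("A", 5, 10)                       # Value equality
assert p[1] == 5                                    # Positional access
assert p._asdict() == {"name": "A", "x": 5, "y": 10}
moved = p._replace(x=-3)                            # Copy with override
assert moved.x == -3 and p.x == 5                   # Original unchanged
assert Point("B") == Point("B", 0, 0)               # Defaults applied
```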

Things to Remember

  • Functional-style code that uses immutable objects is often more robust than imperative-style code that modifies state and causes side effects.

  • The easiest way to make your own immutable objects is by using the dataclasses built-in module; simply apply the dataclass decorator when defining a class and pass the frozen=True argument.

  • The replace helper function from the dataclasses module allows you to create copies of immutable objects with some attributes changed, making it easier to write functional-style code.

  • Immutable objects created with dataclass are comparable for equivalence by value and have stable hashes, which allows them to be used as keys in dictionaries and as values in sets.

Item 57: Inherit from collections.abc Classes for Custom Container Types

Much of programming in Python involves defining classes that contain data and describing how such objects relate to each other. Every Python class is a container of some kind, encapsulating attributes and functionality together. Python also provides built-in container types for managing data: lists, tuples, sets, and dictionaries.

When you’re designing classes for simple use cases like sequences, it’s natural to want to subclass Python’s built-in list type directly. For example, say that I want to create my own custom list type that has additional methods for counting the frequency of its members:

class FrequencyList(list):
    def __init__(self, members):
        super().__init__(members)

    def frequency(self):
        counts = {}
        for item in self:
            counts[item] = counts.get(item, 0) + 1
        return counts

By subclassing list, I get all of list’s standard functionality and preserve the semantics familiar to all Python programmers. I can define additional methods to provide any custom behaviors that I need:

foo = FrequencyList(["a", "b", "a", "c", "b", "a", "d"])
print("Length is", len(foo))
foo.pop()  # Removes "d"
print("After pop:", repr(foo))
print("Frequency:", foo.frequency())

>>>
Length is 7
After pop: ['a', 'b', 'a', 'c', 'b', 'a']
Frequency: {'a': 3, 'b': 2, 'c': 1}

Now, imagine that I need to define an object that feels like a list and allows indexing but isn’t a list subclass. For example, say that I want to provide sequence semantics (like list or tuple; see Item 14: “Know How to Slice Sequences” for background) for a binary tree class:

class BinaryNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

How do you make this class act like a sequence type? Python implements its container behaviors with instance methods that have special names. When you access a sequence item by index:

bar = [1, 2, 3]
bar[0]

it will be interpreted as:

bar.__getitem__(0)

To make the BinaryNode class act like a sequence, you can provide a custom implementation of __getitem__ (often pronounced “dunder getitem” as an abbreviation for “double underscore getitem”) that traverses the object tree depth-first:

class IndexableNode(BinaryNode):
    def _traverse(self):
        if self.left is not None:
            yield from self.left._traverse()
        yield self
        if self.right is not None:
            yield from self.right._traverse()

    def __getitem__(self, index):
        for i, item in enumerate(self._traverse()):
            if i == index:
                return item.value
        raise IndexError(f"Index {index} is out of range")

Here I construct a binary tree with normal object initialization:

tree = IndexableNode(
    10,
    left=IndexableNode(
        5,
        left=IndexableNode(2),
        right=IndexableNode(6, right=IndexableNode(7)),
    ),
    right=IndexableNode(15, left=IndexableNode(11)),
)

But in addition to traversing the tree through the left and right attributes, I can also access it like a list:

print("LRR is", tree.left.right.right.value)
print("Index 0 is", tree[0])
print("Index 1 is", tree[1])
print("11 in the tree?", 11 in tree)
print("17 in the tree?", 17 in tree)
print("Tree is", list(tree))

>>>
LRR is 7
Index 0 is 2
Index 1 is 5
11 in the tree? True
17 in the tree? False
Tree is [2, 5, 6, 7, 10, 11, 15]
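
Note that the in operator and the list constructor work here even though only __getitem__ was defined. When a class provides no __iter__ or __contains__ method, Python falls back to a legacy iteration protocol: it calls __getitem__ with increasing indexes starting at zero until IndexError is raised. A minimal sketch (the Squares class is hypothetical, not part of the tree example):

```python
class Squares:
    # Only __getitem__ is defined; no __iter__ or __contains__
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)
        return index**2

# Python's fallback protocol drives iteration and membership tests
print(list(Squares()))  # [0, 1, 4, 9, 16]
print(9 in Squares())   # True
```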

The problem is that implementing __getitem__ isn’t enough to provide all of the sequence semantics Python expects from a list instance:

len(tree)

>>>
Traceback ...
TypeError: object of type 'IndexableNode' has no len()

The len built-in function requires another special method, __len__, which a custom sequence type must implement:

class SequenceNode(IndexableNode):
    def __len__(self):
        count = 0
        for _ in self._traverse():
            count += 1
        return count

tree = SequenceNode(
    10,
    left=SequenceNode(
        5,
        left=SequenceNode(2),
        right=SequenceNode(6, right=SequenceNode(7)),
    ),
    right=SequenceNode(15, left=SequenceNode(11)),
)

print("Tree length is", len(tree))

>>>
Tree length is 7

Unfortunately, this still isn’t enough for the class to fully act as a valid sequence. Also missing are the count and index methods that a Python programmer would expect to see on a sequence like list or tuple. It turns out that defining your own container types is much harder than it seems.

To avoid this difficulty throughout the Python universe, the collections.abc built-in module defines a set of abstract base classes that provide all of the typical methods for each container type. When you subclass from these abstract base classes and forget to implement required methods, the module tells you something is wrong:

from collections.abc import Sequence

class BadType(Sequence):
    pass

foo = BadType()

>>>
Traceback ...
TypeError: Can't instantiate abstract class BadType without an
➥implementation for abstract methods '__getitem__', '__len__'

When you implement all of the methods required by an abstract base class from collections.abc, as I did above with SequenceNode, it provides all of the additional methods, like index and count, for free:

class BetterNode(SequenceNode, Sequence):
    pass

tree = BetterNode(
    10,
    left=BetterNode(
        5,
        left=BetterNode(2),
        right=BetterNode(6, right=BetterNode(7)),
    ),
    right=BetterNode(15, left=BetterNode(11)),
)

print("Index of 7 is", tree.index(7))
print("Count of 10 is", tree.count(10))

>>>
Index of 7 is 3
Count of 10 is 1

The benefit of using these abstract base classes is even greater for more complex container types such as Set and MutableMapping, which have a large number of special methods that need to be implemented to match Python conventions.
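
For example, implementing only the three abstract methods of collections.abc.Set — __contains__, __iter__, and __len__ — yields the full set interface, including the intersection, union, and subset operators, for free. This is a sketch with a hypothetical list-backed class, not a recommended implementation:

```python
from collections.abc import Set

class ListBackedSet(Set):
    # Hypothetical example: only the three abstract methods are defined
    def __init__(self, members):
        self._members = list(dict.fromkeys(members))  # Deduplicate, keep order

    def __contains__(self, item):
        return item in self._members

    def __iter__(self):
        return iter(self._members)

    def __len__(self):
        return len(self._members)

a = ListBackedSet([1, 2, 3])
b = ListBackedSet([2, 3, 4])
print(sorted(a & b))  # Intersection provided by the Set mixin: [2, 3]
print(sorted(a | b))  # Union also provided for free: [1, 2, 3, 4]
```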

Beyond the collections.abc module, Python also uses a variety of special methods for object comparisons and sorting, which may be provided by container classes and non-container classes alike (see Item 104: “Know How to Use heapq for Priority Queues” and Item 51: “Prefer dataclasses for Defining Lightweight Classes” for examples).
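
As a sketch of those comparison methods (using a hypothetical Version class, not an example from the items referenced above), implementing __eq__ and __lt__ and applying the functools.total_ordering decorator fills in the remaining comparison special methods so that instances can be sorted and compared naturally:

```python
from functools import total_ordering

@total_ordering
class Version:
    # Hypothetical example class for illustrating comparison methods
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

versions = [Version(2, 0), Version(1, 5), Version(1, 10)]
versions.sort()  # Works because __lt__ is defined
print([(v.major, v.minor) for v in versions])  # [(1, 5), (1, 10), (2, 0)]
```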

Things to Remember

  • For simple use cases, it’s fine to inherit directly from Python’s container types (like list or dict) to utilize their fundamental behavior.

  • Beware of the large number of methods required to implement custom container types correctly when not inheriting from a built-in type.

  • To ensure that your custom container classes match the required behaviors, have them inherit from the interfaces defined in collections.abc.