Chapter 8: Metaclasses and Attributes

Metaclasses are often mentioned in lists of Python’s unique features, but few understand what they accomplish in practice. The name metaclass vaguely implies a concept above and beyond a class. Simply put, metaclasses let you intercept Python’s class statement and provide special behavior each time a class is defined.

Similarly mysterious are Python’s built-in features for dynamically customizing attribute accesses. Along with Python’s object-oriented constructs, these facilities provide powerful tools to ease the transition from simple classes to complex ones.

However, with these capabilities come many pitfalls. Dynamic attributes enable you to override objects and cause unexpected side effects. Metaclasses can create extremely bizarre behaviors that are unapproachable to newcomers. It’s important that you follow the rule of least surprise and only use these mechanisms to implement well-understood idioms.

Item 58: Use Plain Attributes Instead of Setter and Getter Methods

Programmers coming to Python from other languages might naturally try to implement explicit getter and setter methods in their classes to access protected attributes (see Item 55: “Prefer Public Attributes over Private Ones” for background):

class OldResistor:
    def __init__(self, ohms):
        self._ohms = ohms

    def get_ohms(self):
        return self._ohms

    def set_ohms(self, ohms):
        self._ohms = ohms

Using these setters and getters is simple, but it’s not Pythonic:

r0 = OldResistor(50e3)
print("Before:", r0.get_ohms())
r0.set_ohms(10e3)
print("After: ", r0.get_ohms())

>>>
Before: 50000.0
After:  10000.0

Such methods are especially clumsy for operations like updating a value in place:

r0.set_ohms(r0.get_ohms() - 4e3)
assert r0.get_ohms() == 6e3

These utility methods do, however, help define the interface for a class, making it easier to encapsulate functionality, validate usage, and define boundaries. Those are important goals when designing a class to ensure that you don’t break callers as the class evolves over time.

In Python, however, you never need to implement explicit setter or getter methods like this. Instead, you should always start your implementations with simple public attributes, as I do here:

class Resistor:
    def __init__(self, ohms):
        self.ohms = ohms
        self.voltage = 0
        self.current = 0

r1 = Resistor(50e3)
r1.ohms = 10e3

These attributes make operations like incrementing in place natural and clear:

r1.ohms += 5e3

Later, if I decide I need special behavior when an attribute is set, I can migrate to the @property decorator (see Item 38: “Define Function Decorators with functools.wraps” for background) and its corresponding setter attribute. Here, I define a new subclass of Resistor that lets me vary current by assigning the voltage property. Note that in order for this code to work properly, the names of both the setter and the getter methods must match the intended property name (voltage):

class VoltageResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)
        self._voltage = 0

    @property
    def voltage(self):
        return self._voltage

    @voltage.setter
    def voltage(self, voltage):
        self._voltage = voltage
        self.current = self._voltage / self.ohms

Now, assigning the voltage property will run the voltage setter method, which in turn will update the current attribute of the object to match:

r2 = VoltageResistance(1e2)
print(f"Before: {r2.current:.2f} amps")
r2.voltage = 10
print(f"After:  {r2.current:.2f} amps")

>>>
Before: 0.00 amps
After:  0.10 amps

Specifying the setter for a property also enables me to perform type checking and validation on values passed to the class. Here I define a class that ensures all resistance values are above zero ohms:

class BoundedResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if ohms <= 0:
            raise ValueError(f"ohms must be > 0; got {ohms}")
        self._ohms = ohms

Assigning an invalid resistance to the attribute now raises an exception:

r3 = BoundedResistance(1e3)
r3.ohms = 0

>>>
Traceback ...
ValueError: ohms must be > 0; got 0

An exception is also raised if I pass an invalid value to the constructor:

BoundedResistance(-5)

>>>
Traceback ...
ValueError: ohms must be > 0; got -5

This happens because BoundedResistance.__init__ calls Resistor.__init__, which assigns self.ohms = -5. That assignment causes the @ohms.setter method from BoundedResistance to be called, and it immediately runs the validation code before object construction has completed.
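A quick way to see this ordering for yourself is a stripped-down sketch (with hypothetical class names) where the subclass's setter prints when it runs. The parent's __init__ assigns through normal attribute lookup, which finds the subclass property first:

```python
class Base:
    def __init__(self, value):
        self.value = value  # Dispatches to the subclass's property setter


class Checked(Base):
    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, value):
        print("Setter called during __init__")
        if value <= 0:
            raise ValueError(f"value must be > 0; got {value}")
        self._value = value


Checked(5)       # Prints the message and stores 5
try:
    Checked(-1)  # Raises before construction completes
except ValueError as e:
    print(e)
```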

I can even use @property to make attributes from parent classes immutable (see Item 56: “Prefer dataclasses for Creating Immutable Objects” for another approach):

class FixedResistance(Resistor):
    def __init__(self, ohms):
        super().__init__(ohms)

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        if hasattr(self, "_ohms"):
            raise AttributeError("Ohms is immutable")
        self._ohms = ohms

Trying to assign to the property after construction raises an exception:

r4 = FixedResistance(1e3)
r4.ohms = 2e3

>>>
Traceback ...
AttributeError: Ohms is immutable

When you use @property methods to implement setters and getters, be sure that the behavior you implement is not surprising. For example, don’t set other attributes in getter property methods:

class MysteriousResistor(Resistor):
    @property
    def ohms(self):
        self.voltage = self._ohms * self.current
        return self._ohms

    @ohms.setter
    def ohms(self, ohms):
        self._ohms = ohms

Setting other attributes in getter property methods leads to extremely bizarre behavior:

r7 = MysteriousResistor(10)
r7.current = 0.1
print(f"Before: {r7.voltage:.2f}")
r7.ohms
print(f"After:  {r7.voltage:.2f}")

>>>
Before: 0.00
After:  1.00

The best policy is to modify only related object state in @property.setter methods. Also be sure to avoid side effects beyond the object itself that the caller may not expect, such as importing modules dynamically, running slow helper functions, doing I/O, or making expensive database queries. Users of a class will expect its attributes to be quick and easy to access, like any other Python object. Use normal methods to do anything more complex or slow.
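One hedged sketch of the alternative: instead of a getter that silently mutates voltage, a hypothetical ClearResistor exposes the derived value through an explicitly named method, so the work (and any cost) is visible at the call site:

```python
class ClearResistor:
    # Hypothetical alternative to MysteriousResistor: attribute access
    # stays side-effect free, and the derived quantity is computed by
    # a plainly named method instead.
    def __init__(self, ohms):
        self.ohms = ohms
        self.current = 0

    def calculate_voltage(self):
        # Anything slow or stateful belongs in a method like this one
        return self.ohms * self.current


r = ClearResistor(10)
r.current = 0.1
print(f"{r.calculate_voltage():.2f}")  # Explicit call, no hidden mutation
```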

The biggest shortcoming of @property is that the methods for an attribute can only be shared by subclasses. Unrelated classes can’t share the same implementation. However, Python also supports descriptors (see Item 60: “Use Descriptors for Reusable @property Methods”) that enable reusable property logic and many other use cases.

Things to Remember

  • Define new class interfaces using simple public attributes and avoid defining setter and getter methods.

  • Use @property to define special behavior when attributes are accessed on your objects.

  • Follow the rule of least surprise and avoid odd side effects in your @property methods.

  • Ensure that @property methods are fast; for slow or complex work—especially involving I/O or causing side effects—use normal methods instead.

Item 59: Consider @property Instead of Refactoring Attributes

The built-in @property decorator makes it easy for simple accesses of an instance’s attributes to act more intelligently (see Item 58: “Use Plain Attributes Instead of Setter and Getter Methods”). One advanced but common use of @property is transitioning what was once a simple numerical attribute into an on-the-fly calculation. This is extremely helpful because it lets you migrate all existing usage of a class to have new behaviors without requiring any of the call sites to be rewritten (which is especially important if there’s calling code that you don’t control). @property also provides an important stopgap for improving interfaces over time.

For example, say that I want to implement a leaky-bucket rate-limiting quota system using plain Python objects. Here, the Bucket class represents how much quota remains and the duration for which the quota will be available:

from datetime import datetime, timedelta

class Bucket:
    def __init__(self, period):
        self.period_delta = timedelta(seconds=period)
        self.reset_time = datetime.now()
        self.quota = 0

    def __repr__(self):
        return f"Bucket(quota={self.quota})"

The leaky-bucket algorithm works by ensuring that, whenever the bucket is filled, the amount of quota does not carry over from one period to the next:

def fill(bucket, amount):
    now = datetime.now()
    if (now - bucket.reset_time) > bucket.period_delta:
        bucket.quota = 0
        bucket.reset_time = now
    bucket.quota += amount

Each time a quota consumer wants to do something, it must first ensure that it can deduct the amount of quota it needs to use:

def deduct(bucket, amount):
    now = datetime.now()
    if (now - bucket.reset_time) > bucket.period_delta:
        return False  # Bucket hasn't been filled this period
    if bucket.quota - amount < 0:
        return False  # Bucket was filled, but not enough
    bucket.quota -= amount
    return True       # Bucket had enough, quota consumed

To use this class, first I fill up the bucket:

bucket = Bucket(60)
fill(bucket, 100)
print(bucket)

>>>
Bucket(quota=100)

Then, I deduct the quota that I need:

if deduct(bucket, 99):
    print("Had 99 quota")
else:
    print("Not enough for 99 quota")

print(bucket)

>>>
Had 99 quota
Bucket(quota=1)

Eventually, I’m prevented from making progress because I try to deduct more quota than is available. In this case, the bucket’s quota level remains unchanged:

if deduct(bucket, 3):
    print("Had 3 quota")
else:
    print("Not enough for 3 quota")

print(bucket)

>>>
Not enough for 3 quota
Bucket(quota=1)

The problem with this implementation is that I never know what quota level the bucket started with. The quota is deducted over the course of the period until it reaches zero. At that point, deduct will always return False until the bucket is refilled. When that happens, it would be useful to know whether callers to deduct are being blocked because the Bucket ran out of quota or because the Bucket never had quota during this period in the first place.
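The ambiguity is easy to reproduce with the definitions above: to deduct, a bucket that was never filled and a bucket whose quota ran out look identical. Here's a self-contained repro (repeating the earlier Bucket, fill, and deduct code):

```python
from datetime import datetime, timedelta


class Bucket:
    def __init__(self, period):
        self.period_delta = timedelta(seconds=period)
        self.reset_time = datetime.now()
        self.quota = 0


def fill(bucket, amount):
    now = datetime.now()
    if (now - bucket.reset_time) > bucket.period_delta:
        bucket.quota = 0
        bucket.reset_time = now
    bucket.quota += amount


def deduct(bucket, amount):
    now = datetime.now()
    if (now - bucket.reset_time) > bucket.period_delta:
        return False  # Bucket hasn't been filled this period
    if bucket.quota - amount < 0:
        return False  # Bucket was filled, but not enough
    bucket.quota -= amount
    return True


never_filled = Bucket(60)
exhausted = Bucket(60)
fill(exhausted, 1)
assert deduct(exhausted, 1)  # Uses up all the quota

# Both calls now fail identically; the caller can't tell the cases apart
print(deduct(never_filled, 1))  # False: never had quota
print(deduct(exhausted, 1))     # False: quota ran out
```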

To fix this, I can change the class to keep track of the max_quota issued in the period and the quota_consumed in the same period:

class NewBucket:
    def __init__(self, period):
        self.period_delta = timedelta(seconds=period)
        self.reset_time = datetime.now()
        self.max_quota = 0
        self.quota_consumed = 0

    def __repr__(self):
        return (
            f"NewBucket(max_quota={self.max_quota}, "
            f"quota_consumed={self.quota_consumed})"
        )

To match the previous interface of the original Bucket class, I use a @property method to compute the current level of quota on-the-fly using these new attributes:

    @property
    def quota(self):
        return self.max_quota - self.quota_consumed

When the quota attribute is assigned, I take special action to be compatible with the current usage of the class by the fill and deduct functions:

    @quota.setter
    def quota(self, amount):
        delta = self.max_quota - amount
        if amount == 0:
            # Quota being reset for a new period
            self.quota_consumed = 0
            self.max_quota = 0
        elif delta < 0:
            # Quota being filled during the period
            self.max_quota = amount + self.quota_consumed
        else:
            # Quota being consumed during the period
            self.quota_consumed = delta

Rerunning the demo code from above produces the same results:

bucket = NewBucket(60)
print("Initial", bucket)
fill(bucket, 100)
print("Filled", bucket)

if deduct(bucket, 99):
    print("Had 99 quota")
else:
    print("Not enough for 99 quota")

print("Now", bucket)

if deduct(bucket, 3):
    print("Had 3 quota")
else:
    print("Not enough for 3 quota")

print("Still", bucket)

>>>
Initial NewBucket(max_quota=0, quota_consumed=0)
Filled NewBucket(max_quota=100, quota_consumed=0)
Had 99 quota
Now NewBucket(max_quota=100, quota_consumed=99)
Not enough for 3 quota
Still NewBucket(max_quota=100, quota_consumed=99)

The best part is that the code using Bucket.quota doesn’t have to change or know that the class has changed. New usage of Bucket can do the right thing and access max_quota and quota_consumed directly.

I especially like @property because it lets you make incremental progress toward a better data model over time. Reading the Bucket example above, you might have thought that fill and deduct should have been implemented as instance methods in the first place. Although you’re probably right, in practice there are many situations in which objects start with poorly defined interfaces or act as dumb data containers (see Item 51: “Prefer dataclasses for Defining Lightweight Classes” for examples). This happens when code grows over time, scope increases, multiple authors contribute without anyone considering long-term hygiene, and so on.

@property is a tool to help you address problems you’ll come across in real-world code. Don’t overuse it. When you find yourself repeatedly extending @property methods, it’s probably time to refactor your class instead of further paving over your code’s poor design.

Things to Remember

  • Use @property to give existing instance attributes new functionality.

  • Make incremental progress toward better data models by using @property.

  • Consider refactoring a class and all call sites when you find yourself using @property too heavily.

Item 60: Use Descriptors for Reusable @property Methods

The big problem with the @property built-in (see Item 58: “Use Plain Attributes Instead of Setter and Getter Methods” and Item 59: “Consider @property Instead of Refactoring Attributes”) is reuse. The methods it decorates can’t be reused for multiple attributes of the same class. They also can’t be reused by unrelated classes.

For example, say that I want a class to validate that the grade received by a student on a homework assignment is a percentage:

class Homework:
    def __init__(self):
        self._grade = 0

    @property
    def grade(self):
        return self._grade

    @grade.setter
    def grade(self, value):
        if not (0 <= value <= 100):
            raise ValueError("Grade must be between 0 and 100")
        self._grade = value

Using @property makes this class easy to use:

galileo = Homework()
galileo.grade = 95

Say that I also want to give the student a grade for an exam, where the exam has multiple subjects, each with a separate grade:

class Exam:
    def __init__(self):
        self._writing_grade = 0
        self._math_grade = 0

    @staticmethod
    def _check_grade(value):
        if not (0 <= value <= 100):
            raise ValueError("Grade must be between 0 and 100")

This quickly gets tedious. For each section of the exam, I need to add a new @property and related validation:

    @property
    def writing_grade(self):
        return self._writing_grade

    @writing_grade.setter
    def writing_grade(self, value):
        self._check_grade(value)
        self._writing_grade = value

    @property
    def math_grade(self):
        return self._math_grade

    @math_grade.setter
    def math_grade(self, value):
        self._check_grade(value)
        self._math_grade = value

Also, this approach is not general. If I want to reuse this percentage validation in other classes beyond homework and exams, I’ll need to write the @property boilerplate and _check_grade method over and over again.

The better way to do this in Python is to use a descriptor. The descriptor protocol defines how attribute access is interpreted by the language. A descriptor class can provide __get__ and __set__ methods that let you reuse the grade validation behavior without any boilerplate. For this purpose, descriptors are also better than mix-ins (see Item 54: “Consider Composing Functionality with Mix-in Classes”) because they let you reuse the same logic for many different attributes in a single class.

Here I define a new class called Exam with class attributes that are Grade instances. The Grade class implements the descriptor protocol:

class Grade:
    def __get__(self, instance, instance_type):
        ...

    def __set__(self, instance, value):
        ...

class Exam:
    # Class attributes
    math_grade = Grade()
    writing_grade = Grade()
    science_grade = Grade()

Before I explain how the Grade class works, it’s important to understand what Python will do when such descriptor attributes are accessed on an Exam instance. When I assign a property:

exam = Exam()
exam.writing_grade = 40

it is interpreted as:

Exam.__dict__["writing_grade"].__set__(exam, 40)

When I retrieve a property:

exam.writing_grade

it is interpreted as:

Exam.__dict__["writing_grade"].__get__(exam, Exam)

What drives this behavior is the __getattribute__ method of object (see Item 61: “Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes” for background). In short, when an Exam instance doesn’t have an attribute named writing_grade, Python falls back to the Exam class’s attribute instead. If this class attribute is an object that has __get__ and __set__ methods, Python assumes that you want to follow the descriptor protocol.
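You can check this translation directly with a toy descriptor (a hypothetical DemoExam class whose descriptor just returns a constant): the ordinary attribute access and the explicit protocol call produce the same result.

```python
class Ten:
    # Toy descriptor for demonstration purposes only
    def __get__(self, instance, instance_type):
        return 10

    def __set__(self, instance, value):
        pass  # This sketch ignores assignment


class DemoExam:
    writing_grade = Ten()


exam = DemoExam()
# Normal attribute access and the explicit protocol call agree
assert exam.writing_grade == 10
assert DemoExam.__dict__["writing_grade"].__get__(exam, DemoExam) == 10
```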

Knowing this behavior and how I used @property for grade validation in the Homework class, here’s a reasonable first attempt at implementing the Grade descriptor:

class Grade:
    def __init__(self):
        self._value = 0

    def __get__(self, instance, instance_type):
        return self._value

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError("Grade must be between 0 and 100")
        self._value = value

Unfortunately, this is wrong and results in broken behavior. Accessing multiple attributes on a single Exam instance works as expected:

class Exam:
    math_grade = Grade()
    writing_grade = Grade()
    science_grade = Grade()

first_exam = Exam()
first_exam.writing_grade = 82
first_exam.science_grade = 99
print("Writing", first_exam.writing_grade)
print("Science", first_exam.science_grade)

>>>
Writing 82
Science 99

But accessing these attributes on multiple Exam instances results in surprising behavior:

second_exam = Exam()
second_exam.writing_grade = 75
print(f"Second {second_exam.writing_grade} is right")
print(f"First  {first_exam.writing_grade} is wrong; "
      f"should be 82")

>>>
Second 75 is right
First  75 is wrong; should be 82

The problem is that a single Grade instance is shared across all Exam instances for the class attribute writing_grade. The Grade instance for this attribute is constructed once in the program lifetime, when the Exam class is first defined, not each time an Exam instance is created.

To solve this, I need the Grade class to keep track of its value for each unique Exam instance. I can do this by saving the per-instance state in a dictionary:

class DictGrade:
    def __init__(self):
        self._values = {}

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return self._values.get(instance, 0)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError("Grade must be between 0 and 100")
        self._values[instance] = value

This implementation is simple and works well, but there’s still one gotcha: It leaks memory. The _values dictionary holds a reference to every instance of Exam ever passed to __set__ over the lifetime of the program. This causes instances to never have their reference count go to zero, preventing cleanup by the garbage collector (see Item 115: “Use tracemalloc to Understand Memory Usage and Leaks” for how to detect this type of problem).
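One common mitigation, if you can't rely on the approach shown next, is the weakref built-in module's WeakKeyDictionary, which drops each entry as soon as the instance it keys on is garbage collected. This is a sketch of that alternative (with hypothetical WeakGrade and WeakExam names), not the approach the rest of this item uses:

```python
import gc
import weakref


class WeakGrade:
    def __init__(self):
        # Keys are held weakly, so instances can still be collected
        self._values = weakref.WeakKeyDictionary()

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return self._values.get(instance, 0)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError("Grade must be between 0 and 100")
        self._values[instance] = value


class WeakExam:
    writing_grade = WeakGrade()


exam = WeakExam()
exam.writing_grade = 82
print(len(WeakExam.__dict__["writing_grade"]._values))  # 1 entry

del exam
gc.collect()  # In CPython the entry disappears with the instance
print(len(WeakExam.__dict__["writing_grade"]._values))  # 0 entries
```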

Instead, you should rely on Python’s __set_name__ special method for descriptors (see Item 64: “Annotate Class Attributes with __set_name__” for background). This method is called on each descriptor instance after a class is defined. Critically, the name of the class attribute assigned to the descriptor instance is supplied by Python. This allows you to compute a string to use for the per-object attribute name (in this case, a protected field that starts with "_"):

class NamedGrade:
    def __set_name__(self, owner, name):
        self.internal_name = "_" + name

I can call setattr and getattr on the object with the internal_name from the descriptor to store and retrieve the corresponding attribute data:

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError("Grade must be between 0 and 100")
        setattr(instance, self.internal_name, value)

Now I can define a new class with this improved descriptor and see how the attribute data for the descriptor resides inside the object’s instance dictionary (__dict__):

class NamedExam:
    math_grade = NamedGrade()
    writing_grade = NamedGrade()
    science_grade = NamedGrade()

first_exam = NamedExam()
first_exam.math_grade = 78
first_exam.writing_grade = 89
first_exam.science_grade = 94
print(first_exam.__dict__)

>>>
{'_math_grade': 78, '_writing_grade': 89, '_science_grade': 94}

Unlike the earlier implementation, this won’t leak memory because when a NamedExam object is garbage collected, all of its attribute data, including values assigned by descriptors, will be freed too.
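Two behaviors are worth confirming with this approach: instances no longer share state, and validation still applies. Here's a self-contained check (repeating the NamedGrade definition from above, with a single grade attribute for brevity):

```python
class NamedGrade:
    def __set_name__(self, owner, name):
        self.internal_name = "_" + name

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name)

    def __set__(self, instance, value):
        if not (0 <= value <= 100):
            raise ValueError("Grade must be between 0 and 100")
        setattr(instance, self.internal_name, value)


class NamedExam:
    writing_grade = NamedGrade()


first = NamedExam()
second = NamedExam()
first.writing_grade = 82
second.writing_grade = 75
assert first.writing_grade == 82   # No shared state across instances
assert second.writing_grade == 75

try:
    first.writing_grade = 150      # Validation is still enforced
except ValueError:
    print("Rejected out-of-range grade")
```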

Things to Remember

  • Reuse the behavior and validation of @property methods by defining your own descriptor classes.

  • Use __set_name__ along with setattr and getattr to store the data needed by descriptors in object instance dictionaries in order to avoid memory leaks.

  • Don’t get bogged down trying to understand exactly how __getattribute__ uses the descriptor protocol for getting and setting attributes.

Item 61: Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes

Python’s object base class provides hooks that make it easy to write generic code for gluing systems together. For example, say that I want to represent the records in a database as Python objects. I assume that the database has its schema already defined elsewhere. In most languages, I’d need to explicitly specify in code how the database schema maps to classes and objects in my program. However, in Python, I can do this object-relational mapping generically at runtime, so no boilerplate is required.

How is that possible? Plain instance attributes, @property methods, and descriptors can’t do this because they all need to be defined in advance. Python enables this dynamic behavior with the __getattr__ special method. If a class defines __getattr__, that method is called every time an attribute can’t be found in an object’s instance dictionary. Here, I define a __getattr__ hook that will insert an attribute into the object’s instance dictionary to prove that it ran:

class LazyRecord:
    def __init__(self):
        self.exists = 5

    def __getattr__(self, name):
        value = f"Value for {name}"
        setattr(self, name, value)
        return value

When I access the missing object attribute foo, for example, Python calls the __getattr__ method above, which mutates the instance dictionary __dict__:

data = LazyRecord()
print("Before:", data.__dict__)
print("foo:   ", data.foo)
print("After: ", data.__dict__)

>>>
Before: {'exists': 5}
foo:    Value for foo
After:  {'exists': 5, 'foo': 'Value for foo'}

I can add logging to LazyRecord to show when __getattr__ is actually called. Note how in this implementation I call super().__getattr__() to use the superclass’s implementation of __getattr__ in order to fetch the real property value and avoid infinite recursion (see Item 53: “Initialize Parent Classes with super” for background):

class LoggingLazyRecord(LazyRecord):
    def __getattr__(self, name):
        print(
            f"* Called __getattr__({name!r}), "
            f"populating instance dictionary"
        )
        result = super().__getattr__(name)
        print(f"* Returning {result!r}")
        return result

data = LoggingLazyRecord()
print("exists:     ", data.exists)
print("First foo:  ", data.foo)
print("Second foo: ", data.foo)

>>>
exists:      5
* Called __getattr__('foo'), populating instance dictionary
* Returning 'Value for foo'
First foo:   Value for foo
Second foo:  Value for foo

The exists attribute is present in the instance dictionary, so __getattr__ is never called for it. The foo attribute is not in the instance dictionary initially, so __getattr__ is called the first time. But the call to __getattr__ for foo also does a setattr, which populates foo in the instance dictionary. This is why the second time I access foo, it doesn’t log a call to __getattr__.

This behavior is especially helpful for use cases like lazily accessing schemaless data. __getattr__ runs once to do the hard work of loading a property; all subsequent accesses retrieve the existing result.
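For example, a hypothetical record might defer an expensive computation until the attribute is first touched; because the result is cached in __dict__, the work happens only once. A minimal sketch that counts how often the slow path runs:

```python
class LazyReport:
    def __init__(self):
        self.compute_calls = 0

    def __getattr__(self, name):
        if name != "summary":
            raise AttributeError(name)
        self.compute_calls += 1     # Stand-in for slow work (I/O, etc.)
        value = "expensive summary"
        setattr(self, name, value)  # Cache in the instance dictionary
        return value


report = LazyReport()
print(report.summary)        # Triggers __getattr__ once
print(report.summary)        # Served from __dict__, no recomputation
print(report.compute_calls)  # 1
```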

Now imagine that I also want transactions in this database system. The next time the user accesses a dynamic attribute, I want to know whether the corresponding record in the database is still valid and whether the transaction is still open. The __getattr__ hook won’t get called every time the attribute is accessed because it will use the object’s instance dictionary as the fast path for existing attributes.

To enable this more advanced use case, Python has another object hook called __getattribute__. This special method is called every time an attribute is accessed on an object, even in cases where it does exist in the attribute dictionary. This enables me to do things like check global transaction state on every property access. It’s important to note that such an operation can incur significant overhead and negatively impact performance, but sometimes it’s worth it. Here, I define ValidatingRecord to log each time __getattribute__ is called:

class ValidatingRecord:
    def __init__(self):
        self.exists = 5

    def __getattribute__(self, name):
        print(f"* Called __getattribute__({name!r})")
        try:
            value = super().__getattribute__(name)
            print(f"* Found {name!r}, returning {value!r}")
            return value
        except AttributeError:
            value = f"Value for {name}"
            print(f"* Setting {name!r} to {value!r}")
            setattr(self, name, value)
            return value

data = ValidatingRecord()
print("exists:     ", data.exists)
print("First foo:  ", data.foo)
print("Second foo: ", data.foo)

>>>
* Called __getattribute__('exists')
* Found 'exists', returning 5
exists:      5
* Called __getattribute__('foo')
* Setting 'foo' to 'Value for foo'
First foo:   Value for foo
* Called __getattribute__('foo')
* Found 'foo', returning 'Value for foo'
Second foo:  Value for foo

In the event that a dynamically accessed property shouldn’t exist, you can raise an AttributeError to cause Python’s standard missing property behavior for both __getattr__ and __getattribute__:

class MissingPropertyRecord:
    def __getattr__(self, name):
        if name == "bad_name":
            raise AttributeError(f"{name} is missing")
        ...

data = MissingPropertyRecord()
data.bad_name

>>>
Traceback ...
AttributeError: bad_name is missing

Python code implementing generic functionality often relies on the hasattr built-in function to determine when properties exist and the getattr built-in function to retrieve property values. These functions also look in the instance dictionary for an attribute name before calling __getattr__:

data = LoggingLazyRecord()  # Implements __getattr__
print("Before:         ", data.__dict__)
print("Has first foo:  ", hasattr(data, "foo"))
print("After:          ", data.__dict__)
print("Has second foo: ", hasattr(data, "foo"))

>>>
Before:          {'exists': 5}
* Called __getattr__('foo'), populating instance dictionary
* Returning 'Value for foo'
Has first foo:   True
After:           {'exists': 5, 'foo': 'Value for foo'}
Has second foo:  True

In the example above, __getattr__ is called only once (for the first hasattr call). In contrast, classes that implement __getattribute__ have that method called each time hasattr or getattr is used with an instance:

data = ValidatingRecord()  # Implements __getattribute__
print("Has first foo:  ", hasattr(data, "foo"))
print("Has second foo: ", hasattr(data, "foo"))

>>>
* Called __getattribute__('foo')
* Setting 'foo' to 'Value for foo'
Has first foo:   True
* Called __getattribute__('foo')
* Found 'foo', returning 'Value for foo'
Has second foo:  True

Now, say that I want to lazily push data back to the database when values are assigned to my Python object. I can do this with __setattr__, a similar object hook that lets you intercept arbitrary attribute assignments. Unlike when retrieving an attribute with __getattr__ and __getattribute__, there’s no need for two separate methods. The __setattr__ method is always called every time an attribute is assigned on an instance (either directly or through the setattr built-in function):

class SavingRecord:
    def __setattr__(self, name, value):
        # Save some data for the record
        ...
        super().__setattr__(name, value)

Here, I define a logging subclass of SavingRecord. Its __setattr__ method is always called on each attribute assignment:

class LoggingSavingRecord(SavingRecord):
    def __setattr__(self, name, value):
        print(f"* Called __setattr__({name!r}, {value!r})")
        super().__setattr__(name, value)

data = LoggingSavingRecord()
print("Before: ", data.__dict__)
data.foo = 5
print("After:  ", data.__dict__)
data.foo = 7
print("Finally:", data.__dict__)

>>>
Before:  {}
* Called __setattr__('foo', 5)
After:   {'foo': 5}
* Called __setattr__('foo', 7)
Finally: {'foo': 7}

The problem with __getattribute__ and __setattr__ is that they’re called on every attribute access for an object—even when you might not want that to happen. For example, say that attribute accesses on my object should actually look up keys in an associated dictionary:

class BrokenDictionaryRecord:
    def __init__(self, data):
        self._data = data

    def __getattribute__(self, name):
        print(f"* Called __getattribute__({name!r})")
        return self._data[name]

This requires accessing self._data from the __getattribute__ method. However, if I actually try to do that, Python will recurse until it reaches its stack limit, and then the program will crash:

data = BrokenDictionaryRecord({"foo": 3})
data.foo

>>>
* Called __getattribute__('foo')
* Called __getattribute__('_data')
* Called __getattribute__('_data')
* Called __getattribute__('_data')
...
Traceback ...
RecursionError: maximum recursion depth exceeded while calling a Python object

The problem is that __getattribute__ accesses self._data, which causes __getattribute__ to run again, which accesses self._data again, and so on. The solution is to use the super().__getattribute__ method to fetch values from the instance attribute dictionary; this avoids the accidental recursion:

class DictionaryRecord:
    def __init__(self, data):
        self._data = data

    def __getattribute__(self, name):
        print(f"* Called __getattribute__({name!r})")
        data_dict = super().__getattribute__("_data")
        return data_dict[name]

data = DictionaryRecord({"foo": 3})
print("foo: ", data.foo)

>>>
* Called __getattribute__('foo')
foo:  3

__setattr__ methods that modify attributes on an object similarly need to use super().__setattr__.
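For example, a dictionary-backed record that intercepts writes must assign its own _data attribute through super().__setattr__, or the hook would recurse the same way. A hedged sketch (hypothetical class name) combining both fixes:

```python
class DictionaryWriteRecord:
    def __init__(self, data):
        # Bypass this class's __setattr__ hook for the backing dict itself
        super().__setattr__("_data", dict(data))

    def __setattr__(self, name, value):
        data_dict = super().__getattribute__("_data")
        data_dict[name] = value  # Store in the backing dict instead

    def __getattr__(self, name):
        data_dict = super().__getattribute__("_data")
        try:
            return data_dict[name]
        except KeyError:
            raise AttributeError(name)


record = DictionaryWriteRecord({"foo": 3})
record.bar = 7     # Goes into _data, not the instance dictionary
print(record.foo)  # 3
print(record.bar)  # 7
```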

Things to Remember

  • Use __getattr__ and __setattr__ to lazily load and save attributes for an object.

  • Understand that __getattr__ only gets called when accessing a missing attribute, whereas __getattribute__ gets called every time any attribute is accessed.

  • Avoid infinite recursion in __getattribute__ and __setattr__ method implementations by calling super().__getattribute__ and super().__setattr__ to access object attributes.

Item 62: Validate Subclasses with __init_subclass__

One of the simplest applications of metaclasses is verifying that a class was defined correctly. When you’re building a complex class hierarchy, you may want to enforce style, require overriding methods, or have strict relationships between class attributes. Metaclasses enable these use cases by providing a reliable way to run your validation code each time a new subclass is defined.

Often a class’s validation code runs in the __init__ method when an object of the class’s type is constructed at runtime (see Item 58: “Use Plain Attributes Instead of Setter and Getter Methods” for an example). Using metaclasses for validation can raise errors much earlier, such as when the module containing the class is first imported at program startup.

Before I get into how to define a metaclass for validating subclasses, it’s important to understand what a metaclass does for standard objects. A metaclass is defined by inheriting from type. A class indicates its metaclass with the metaclass keyword argument in its inheritance argument list. In the typical case, a metaclass has its __new__ method called with the contents of any associated class statements when they occur. Here, I use a basic metaclass to inspect a class’s information before the type is actually constructed:

class Meta(type):
    def __new__(meta, name, bases, class_dict):
        print(f"* Running {meta}.__new__ for {name}")
        print("Bases:", bases)
        print(class_dict)
        return type.__new__(meta, name, bases, class_dict)

class MyClass(metaclass=Meta):
    stuff = 123

    def foo(self):
        pass

class MySubclass(MyClass):
    other = 567

    def bar(self):
        pass

>>>
* Running <class '__main__.Meta'>.__new__ for MyClass
Bases: ()
{'__module__': '__main__',
 '__qualname__': 'MyClass',
 'stuff': 123,
 'foo': <function MyClass.foo at 0x104a63a60>}
* Running <class '__main__.Meta'>.__new__ for MySubclass
Bases: (<class '__main__.MyClass'>,)
{'__module__': '__main__',
 '__qualname__': 'MySubclass',
 'other': 567,
 'bar': <function MySubclass.bar at 0x104a63b00>}

The metaclass has access to the name of the class, the parent classes it inherits from (bases), and all of the class attributes that were defined in the class’s body. All classes inherit from object, so it’s not explicitly listed in the tuple of base classes.

I can add functionality to the Meta.__new__ method in order to validate all of the parameters of an associated subclass before it’s defined. For example, say that I want to represent any type of multisided polygon. I can do this by defining a special validating metaclass and using it in the base class of my polygon class hierarchy. Note that it’s important not to apply the same validation to the base class:

class ValidatePolygon(type):
    def __new__(meta, name, bases, class_dict):
        # Only validate subclasses of the Polygon class
        if bases:
            if class_dict["sides"] < 3:
                raise ValueError("Polygons need 3+ sides")
        return type.__new__(meta, name, bases, class_dict)

class Polygon(metaclass=ValidatePolygon):
    sides = None  # Must be specified by subclasses

    @classmethod
    def interior_angles(cls):
        return (cls.sides - 2) * 180

class Triangle(Polygon):
    sides = 3

class Rectangle(Polygon):
    sides = 4

class Nonagon(Polygon):
    sides = 9

assert Triangle.interior_angles() == 180
assert Rectangle.interior_angles() == 360
assert Nonagon.interior_angles() == 1260

If I try to define a polygon with fewer than three sides, the validation logic will cause the class statement to fail immediately after the class statement body. This means the program will not even be able to start running when I define such a class (unless it’s defined in a dynamically imported module; see Item 98: “Lazy-Load Modules with Dynamic Imports to Reduce Startup Time” for how this can happen):

print("Before class")

class Line(Polygon):
    print("Before sides")
    sides = 2
    print("After sides")

print("After class")

>>>
Before class
Before sides
After sides
Traceback ...
ValueError: Polygons need 3+ sides

This seems like quite a lot of machinery in order to get Python to accomplish such a basic task. Luckily, Python 3.6 introduced simplified syntax—the __init_subclass__ special class method—for achieving the same behavior and avoiding metaclasses entirely. Here, I use this mechanism to provide the same level of validation as before:

class BetterPolygon:
    sides = None  # Must be specified by subclasses

    def __init_subclass__(cls):
        super().__init_subclass__()
        if cls.sides < 3:
            raise ValueError("Polygons need 3+ sides")

    @classmethod
    def interior_angles(cls):
        return (cls.sides - 2) * 180

class Hexagon(BetterPolygon):
    sides = 6

assert Hexagon.interior_angles() == 720

The code is much shorter now, and the ValidatePolygon metaclass is gone entirely. It’s also easier to follow since I can access the sides attribute directly on the cls instance in __init_subclass__ instead of having to go into the class’s dictionary with class_dict["sides"]. If I define an invalid subclass of BetterPolygon, the same exception as before is raised:

print("Before class")

class Point(BetterPolygon):
    sides = 1

print("After class")

>>>
Before class
Traceback ...
ValueError: Polygons need 3+ sides

Another problem with the standard Python metaclass machinery is that you can only specify a single metaclass per class definition. Here, I define a second metaclass that I’d like to use for validating the fill color used for a region (not necessarily just polygons):

class ValidateFilled(type):
    def __new__(meta, name, bases, class_dict):
        # Only validate subclasses of the Filled class
        if bases:
            if class_dict["color"] not in ("red", "green"):
                raise ValueError("Fill color must be supported")
        return type.__new__(meta, name, bases, class_dict)

class Filled(metaclass=ValidateFilled):
    color = None  # Must be specified by subclasses

When I try to use the Polygon metaclass and Filled metaclass together, I get a cryptic error message:

class RedPentagon(Filled, Polygon):
    color = "blue"
    sides = 5

>>>
Traceback ...
TypeError: metaclass conflict: the metaclass of a derived class
➥must be a (non-strict) subclass of the metaclasses of all its
➥bases

It’s possible to fix this by creating a complex hierarchy of metaclass type definitions to layer validation:

class ValidatePolygon(type):
    def __new__(meta, name, bases, class_dict):
        # Only validate non-root classes
        if not class_dict.get("is_root"):
            if class_dict["sides"] < 3:
                raise ValueError("Polygons need 3+ sides")
        return type.__new__(meta, name, bases, class_dict)

class Polygon(metaclass=ValidatePolygon):
    is_root = True
    sides = None  # Must be specified by subclasses

class ValidateFilledPolygon(ValidatePolygon):
    def __new__(meta, name, bases, class_dict):
        # Only validate non-root classes
        if not class_dict.get("is_root"):
            if class_dict["color"] not in ("red", "green"):
                raise ValueError("Fill color must be supported")
        return super().__new__(meta, name, bases, class_dict)

class FilledPolygon(Polygon, metaclass=ValidateFilledPolygon):
    is_root = True
    color = None  # Must be specified by subclasses

This requires every FilledPolygon instance to be a Polygon instance:

class GreenPentagon(FilledPolygon):
    color = "green"
    sides = 5

greenie = GreenPentagon()
assert isinstance(greenie, Polygon)

Validation works for colors:

class OrangePentagon(FilledPolygon):
    color = "orange"
    sides = 5

>>>
Traceback ...
ValueError: Fill color must be supported

Validation also works for number of sides:

class RedLine(FilledPolygon):
    color = "red"
    sides = 2

>>>
Traceback ...
ValueError: Polygons need 3+ sides

However, this approach ruins composability, which is often the purpose of class validation like this (similar to mix-ins; see Item 54: “Consider Composing Functionality with Mix-in Classes”). If I want to apply the color validation logic from ValidateFilledPolygon to another hierarchy of classes, I’ll have to duplicate all of the logic again, which reduces code reuse and increases boilerplate.

The __init_subclass__ special class method can also be used to solve this problem. It can be defined by multiple levels of a class hierarchy as long as the super built-in function is used to call any parent or sibling __init_subclass__ definitions. Here, I define a class to represent a region’s fill color that can be composed with the BetterPolygon class from before:

class Filled:
    color = None  # Must be specified by subclasses

    def __init_subclass__(cls):
        super().__init_subclass__()
        if cls.color not in ("red", "green", "blue"):
            raise ValueError("Fills need a valid color")

I can inherit from both classes to define a new class. Both classes call super().__init_subclass__(), causing their corresponding validation logic to run when the subclass is created:

class RedTriangle(Filled, BetterPolygon):
    color = "red"
    sides = 3

ruddy = RedTriangle()
assert isinstance(ruddy, Filled)
assert isinstance(ruddy, BetterPolygon)

If I specify the number of sides incorrectly, I get a validation error:

print("Before class")

class BlueLine(Filled, BetterPolygon):
    color = "blue"
    sides = 2

print("After class")

>>>
Before class
Traceback ...
ValueError: Polygons need 3+ sides

If I specify the color incorrectly, I also get a validation error:

print("Before class")

class BeigeSquare(Filled, BetterPolygon):
    color = "beige"
    sides = 4

print("After class")

>>>
Before class
Traceback ...
ValueError: Fills need a valid color

You can even use __init_subclass__ in complex cases like multiple inheritance and diamond inheritance (see Item 53: “Initialize Parent Classes with super” for background). Here, I define a basic diamond hierarchy to show this in action:

class Top:
    def __init_subclass__(cls):
        super().__init_subclass__()
        print(f"Top for {cls}")

class Left(Top):
    def __init_subclass__(cls):
        super().__init_subclass__()
        print(f"Left for {cls}")

class Right(Top):
    def __init_subclass__(cls):
        super().__init_subclass__()
        print(f"Right for {cls}")

class Bottom(Left, Right):
    def __init_subclass__(cls):
        super().__init_subclass__()
        print(f"Bottom for {cls}")

>>>
Top for <class '__main__.Left'>
Top for <class '__main__.Right'>
Top for <class '__main__.Bottom'>
Right for <class '__main__.Bottom'>
Left for <class '__main__.Bottom'>

As expected, Top.__init_subclass__ is called only a single time for each class, even though there are two paths to it for the Bottom class through its Left and Right parent classes.

Things to Remember

  • The __new__ method of metaclasses is run after the class statement’s entire body has been processed.

  • Metaclasses can be used to inspect or modify a class after it’s defined but before it’s created, but they’re often more heavyweight than you need.

  • Use __init_subclass__ to ensure that subclasses are well formed at the time they are defined, before objects of their type are constructed.

  • Be sure to call super().__init_subclass__ from within your class’s __init_subclass__ definition to enable composable validation in multiple layers of classes and multiple inheritance.

Item 63: Register Class Existence with __init_subclass__

Another common use of metaclasses (see Item 62: “Validate Subclasses with __init_subclass__” for background) is to automatically register types in a program. Registration is useful for doing reverse lookups, where you need to map an identifier back to a corresponding class.

For example, say that I want to implement my own serialized representation of a Python object using JSON. I need a way to turn an object into a JSON string. Here, I do this generically by defining a base class that records the constructor parameters and turns them into a JSON dictionary (see Item 54: “Consider Composing Functionality with Mix-in Classes” for another approach):

import json

class Serializable:
    def __init__(self, *args):
        self.args = args

    def serialize(self):
        return json.dumps({"args": self.args})

This class makes it easy to serialize simple data structures to a string, like this one:

class Point2D(Serializable):
    def __init__(self, x, y):
        super().__init__(x, y)
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point2D({self.x}, {self.y})"

point = Point2D(5, 3)
print("Object:    ", point)
print("Serialized:", point.serialize())

>>>
Object:     Point2D(5, 3)
Serialized: {"args": [5, 3]}

Now I need to deserialize this JSON string and construct the Point2D object it represents. Here I define another class that can deserialize the data from its Serializable parent class (see Item 52: “Use @classmethod Polymorphism to Construct Objects Generically” for background):

class Deserializable(Serializable):
    @classmethod
    def deserialize(cls, json_data):
        params = json.loads(json_data)
        return cls(*params["args"])

Using Deserializable as a parent class makes it easy to serialize and deserialize simple objects in a generic way:

class BetterPoint2D(Deserializable):
    def __init__(self, x, y):
        super().__init__(x, y)
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point2D({self.x}, {self.y})"

before = BetterPoint2D(5, 3)
print("Before:    ", before)
data = before.serialize()
print("Serialized:", data)
after = BetterPoint2D.deserialize(data)
print("After:     ", after)

>>>
Before:     Point2D(5, 3)
Serialized: {"args": [5, 3]}
After:      Point2D(5, 3)

The problem with this approach is that it works only if you know the intended type of the serialized data ahead of time (e.g., Point2D, BetterPoint2D). Ideally, you’d have a large number of classes serializing to JSON and one common function that could deserialize any of them back to a corresponding Python object (see Item 50: “Consider functools.singledispatch for Functional-Style Programming Instead of Object-Oriented Polymorphism” for a similar example).

To do this, I can include the serialized object’s class name in the JSON data:

class BetterSerializable:
    def __init__(self, *args):
        self.args = args

    def serialize(self):
        return json.dumps(
            {
                "class": self.__class__.__name__,
                "args": self.args,
            }
        )

    def __repr__(self):
        name = self.__class__.__name__
        args_str = ", ".join(str(x) for x in self.args)
        return f"{name}({args_str})"

Then I can maintain a mapping of class names back to constructors for those objects. The general deserialize function works for any classes passed to register_class:

REGISTRY = {}

def register_class(target_class):
    REGISTRY[target_class.__name__] = target_class

def deserialize(data):
    params = json.loads(data)
    name = params["class"]
    target_class = REGISTRY[name]
    return target_class(*params["args"])

To ensure that deserialize always works properly, I must call register_class for every class I might want to deserialize in the future:

class EvenBetterPoint2D(BetterSerializable):
    def __init__(self, x, y):
        super().__init__(x, y)
        self.x = x
        self.y = y

register_class(EvenBetterPoint2D)

Now I can deserialize an arbitrary JSON string without having to know which class it contains:

before = EvenBetterPoint2D(5, 3)
print("Before:    ", before)
data = before.serialize()
print("Serialized:", data)
after = deserialize(data)
print("After:     ", after)

>>>
Before:     EvenBetterPoint2D(5, 3)
Serialized: {"class": "EvenBetterPoint2D", "args": [5, 3]}
After:      EvenBetterPoint2D(5, 3)

The problem with this approach is that it’s possible to forget to call register_class:

class Point3D(BetterSerializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x = x
        self.y = y
        self.z = z

# Forgot to call register_class! Whoops!

The code breaks at runtime when I try to deserialize an object of a class I forgot to register:

point = Point3D(5, 9, -4)
data = point.serialize()
deserialize(data)

>>>
Traceback ...
KeyError: 'Point3D'

Even though I chose to subclass BetterSerializable, I don’t actually get all of its features if I forget to call register_class after the class statement body. This approach is error prone and especially challenging for beginners to debug. The same omission can happen with class decorators (see Item 66: “Prefer Class Decorators over Metaclasses for Composable Class Extensions” for when those are appropriate).
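As a point of comparison, a class decorator version of registration might look like the following sketch (the register helper and SimpleSerializable stand-in are hypothetical, modeled on the REGISTRY machinery above); note that the @register line is just as easy to forget as the register_class call:

```python
import json

REGISTRY = {}

def register(target_class):
    # Decorator form of register_class; returns the class unchanged
    REGISTRY[target_class.__name__] = target_class
    return target_class

class SimpleSerializable:
    # Minimal stand-in for the BetterSerializable class above
    def __init__(self, *args):
        self.args = args

    def serialize(self):
        return json.dumps(
            {"class": self.__class__.__name__, "args": self.args}
        )

def deserialize(data):
    params = json.loads(data)
    return REGISTRY[params["class"]](*params["args"])

@register
class DecoratedPoint(SimpleSerializable):
    pass

point = DecoratedPoint(2, 4)
copy = deserialize(point.serialize())
assert copy.args == (2, 4)
```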

What if I could somehow act on the programmer’s intent to use BetterSerializable and ensure that register_class is called in all cases? Metaclasses enable this by intercepting the class statement when subclasses are defined. Here I use a metaclass and corresponding superclass to register any child classes immediately after their class statements end:

class Meta(type):
    def __new__(meta, name, bases, class_dict):
        cls = type.__new__(meta, name, bases, class_dict)
        register_class(cls)
        return cls

class RegisteredSerializable(BetterSerializable, metaclass=Meta):
    pass

When I define a subclass of RegisteredSerializable, I can be confident that the call to register_class happens and deserialize will always work as expected:

class Vector3D(RegisteredSerializable):
    def __init__(self, x, y, z):
        super().__init__(x, y, z)
        self.x, self.y, self.z = x, y, z

before = Vector3D(10, -7, 3)
print("Before:    ", before)
data = before.serialize()
print("Serialized:", data)
print("After:     ", deserialize(data))

>>>
Before:     Vector3D(10, -7, 3)
Serialized: {"class": "Vector3D", "args": [10, -7, 3]}
After:      Vector3D(10, -7, 3)

An even better approach is to use the __init_subclass__ special class method. This simplified syntax, introduced in Python 3.6, reduces the visual noise of applying custom logic when a class is defined. It also makes it more approachable to beginners who may be confused by the complexity of metaclass syntax. Here I implement a new superclass to automatically call register_class and a subclass that uses it:

class BetterRegisteredSerializable(BetterSerializable):
    def __init_subclass__(cls):
        super().__init_subclass__()
        register_class(cls)

class Vector1D(BetterRegisteredSerializable):
    def __init__(self, magnitude):
        super().__init__(magnitude)
        self.magnitude = magnitude

Serialization and deserialization work as expected for this new class:

before = Vector1D(6)
print("Before:    ", before)
data = before.serialize()
print("Serialized:", data)
print("After:     ", deserialize(data))

>>>
Before:     Vector1D(6)
Serialized: {"class": "Vector1D", "args": [6]}
After:      Vector1D(6)

By using __init_subclass__ (or metaclasses) for class registration, you can ensure that you’ll never miss registering a class as long as the inheritance tree is right. This works well for serialization, as I’ve shown, and also applies to database object-relational mappings (ORMs), extensible plug-in systems, and callback hooks.
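For example, the same pattern can back a simple plug-in system. In this sketch (the Plugin base class and PLUGINS mapping are illustrative names, not from the text above), each plug-in is registered under its name attribute the moment its class statement finishes:

```python
PLUGINS = {}

class Plugin:
    name = None  # Identifier; must be specified by subclasses

    def __init_subclass__(cls):
        super().__init_subclass__()
        PLUGINS[cls.name] = cls  # Register automatically on definition

class CsvExporter(Plugin):
    name = "csv"

class JsonExporter(Plugin):
    name = "json"

assert PLUGINS["csv"] is CsvExporter
assert PLUGINS["json"] is JsonExporter
```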

Things to Remember

  • Class registration is a helpful pattern for building modular Python programs.

  • Metaclasses let you run registration code automatically each time your base class is subclassed in a program.

  • Using metaclasses for class registration helps you avoid errors by ensuring that you never miss a registration call.

  • Prefer __init_subclass__ over standard metaclass machinery because it’s clearer and easier for beginners to understand.

Item 64: Annotate Class Attributes with __set_name__

One more useful feature enabled by metaclasses (see Item 62: “Validate Subclasses with __init_subclass__” for background) is the ability to modify or annotate properties after a class is defined but before the class is actually used. This approach is commonly used with descriptors (see Item 60: “Use Descriptors for Reusable @property Methods” for details) to give these attributes more introspection into how they’re being used within their containing class.

For example, say that I want to define a new class that represents a row in a customer database. I’d like to have a corresponding property on the class for each column in the database table. Here I define a descriptor class to connect attributes to column names:

class Field:
    def __init__(self, column_name):
        self.column_name = column_name
        self.internal_name = "_" + self.column_name

I can use the column name to save all of the per-instance state directly in the instance dictionary as protected fields by using the setattr built-in function, and later I can load state with getattr (see Item 61: “Use __getattr__, __getattribute__, and __setattr__ for Lazy Attributes” for background):

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name, "")

    def __set__(self, instance, value):
        setattr(instance, self.internal_name, value)

Defining the class representing a row requires supplying the database table’s column name for each descriptor attribute:

class Customer:
    # Class attributes
    first_name = Field("first_name")
    last_name = Field("last_name")
    prefix = Field("prefix")
    suffix = Field("suffix")

Using the row class is simple. Here, the Field descriptors modify the instance dictionary __dict__ as expected:

cust = Customer()
print(f"Before: {cust.first_name!r} {cust.__dict__}")
cust.first_name = "Euclid"
print(f"After:  {cust.first_name!r} {cust.__dict__}")

>>>
Before: '' {}
After:  'Euclid' {'_first_name': 'Euclid'}

But the code for this class definition seems redundant. I already declared the name of the field for the class on the left (first_name =). Why do I also have to pass a string containing the same information to the Field constructor (Field("first_name")) on the right?

class Customer:
    # Left side is redundant with right side
    first_name = Field("first_name")
    ...

The problem is that the order of evaluation for the Customer class definition is the opposite of how it reads from left to right. First, the Field constructor is called as Field("first_name"). Then, the return value of that is assigned to the Customer.first_name class attribute. There’s no way for a Field instance to know upfront which class attribute it will be assigned to.

To eliminate this redundancy, I can use a metaclass. A metaclass lets you hook the class statement directly and take action as soon as a class body is finished. In this case, I can use the metaclass to assign Field.column_name and Field.internal_name on the descriptor automatically instead of manually specifying the field name multiple times:

class Meta(type):
    def __new__(meta, name, bases, class_dict):
        for key, value in class_dict.items():
            if isinstance(value, Field):
                value.column_name = key
                value.internal_name = "_" + key
        cls = type.__new__(meta, name, bases, class_dict)
        return cls

Here I define a base class that uses the metaclass. All classes representing database rows should inherit from this class to ensure that they use the metaclass:

class DatabaseRow(metaclass=Meta):
    pass

To work with the metaclass, the field descriptor is largely unchanged. The only difference is that it no longer requires any arguments to be passed to its constructor. Instead, its attributes are set by the Meta.__new__ method above:

class Field:
    def __init__(self):
        # These will be assigned by the metaclass.
        self.column_name = None
        self.internal_name = None

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name, "")

    def __set__(self, instance, value):
        setattr(instance, self.internal_name, value)

When I use the metaclass, the new DatabaseRow base class, and the new Field descriptor, the class definition for a database row no longer has the redundancy from before:

class BetterCustomer(DatabaseRow):
    first_name = Field()
    last_name = Field()
    prefix = Field()
    suffix = Field()

The behavior of the new class is identical to the behavior of the old one:

cust = BetterCustomer()
print(f"Before: {cust.first_name!r} {cust.__dict__}")
cust.first_name = "Euler"
print(f"After:  {cust.first_name!r} {cust.__dict__}")

>>>
Before: '' {}
After:  'Euler' {'_first_name': 'Euler'}

The trouble with this approach is that you can’t use the Field class for properties unless you also inherit from DatabaseRow. If you somehow forget to subclass DatabaseRow, or if you don’t want to due to other structural requirements of the class hierarchy, the code will break:

class BrokenCustomer:  # Missing inheritance
    first_name = Field()
    last_name = Field()
    prefix = Field()
    suffix = Field()

cust = BrokenCustomer()
cust.first_name = "Mersenne"

>>>
Traceback ...
TypeError: attribute name must be string, not 'NoneType'

The solution to this problem is to use the __set_name__ special method for descriptors. This method, introduced in Python 3.6, is called on every descriptor instance when its containing class is defined. It receives as parameters the owning class that contains the descriptor instance and the attribute name to which the descriptor instance was assigned. Here I avoid defining a metaclass entirely and move what the Meta.__new__ method from above was doing into the __set_name__ method:

class Field:
    def __init__(self):
        self.column_name = None
        self.internal_name = None

    def __set_name__(self, owner, column_name):
        # Called on class creation for each descriptor
        self.column_name = column_name
        self.internal_name = "_" + column_name

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name, "")

    def __set__(self, instance, value):
        setattr(instance, self.internal_name, value)

Now, I can get the benefits of the Field descriptor without having to inherit from a specific parent class or having to use a metaclass:

class FixedCustomer:  # No parent class
    first_name = Field()
    last_name = Field()
    prefix = Field()
    suffix = Field()

cust = FixedCustomer()
print(f"Before: {cust.first_name!r} {cust.__dict__}")
cust.first_name = "Mersenne"
print(f"After:  {cust.first_name!r} {cust.__dict__}")

>>>
Before: '' {}
After:  'Mersenne' {'_first_name': 'Mersenne'}

Things to Remember

  • A metaclass enables you to modify a class’s attributes before the class is fully defined.

  • Descriptors and metaclasses make a powerful combination for declarative behavior and runtime introspection.

  • Define __set_name__ on your descriptor classes to allow them to take into account their surrounding class and its property names.

Item 65: Consider Class Body Definition Order to Establish Relationships Between Attributes

The purpose of many classes defined in Python programs is to represent external data that is created and maintained elsewhere. For example, say that I have a CSV (comma-separated values) file containing a list of freight deliveries where each row includes the destination city, method of travel, and shipment weight. Here, I read in this data using the csv built-in module:

import csv


with open("packages.csv") as f:
    for row in csv.reader(f):
        print(row)

>>>
['Sydney', 'truck', '25']
['Melbourne', 'boat', '6']
['Brisbane', 'plane', '12']
['Perth', 'road train', '90']
['Adelaide', 'truck', '17']
...

I can define a new class to store this data and a helper function that creates an object, given a CSV row (see Item 52: “Use @classmethod Polymorphism to Construct Objects Generically” for background):

class Delivery:
    def __init__(self, destination, method, weight):
        self.destination = destination
        self.method = method
        self.weight = weight

    @classmethod
    def from_row(cls, row):
        return cls(row[0], row[1], row[2])

This works as expected when provided a list of values, one for each column:

row1 = ["Sydney", "truck", "25"]
obj1 = Delivery.from_row(row1)
print(obj1.__dict__)

>>>
{'destination': 'Sydney', 'method': 'truck', 'weight': '25'}

If more columns are added to the CSV file or if the columns are reordered, with a small amount of effort, I can make corresponding adjustments to the __init__ and from_row methods to maintain compatibility with the file format. Now imagine that there are many kinds of CSV files that I want to process, each with different numbers of columns and types of cell values. It would be better if I could more efficiently define a new class for each CSV file without much boilerplate.

Here I try to accomplish this by implementing a base class that uses the fields class attribute to map CSV columns (in the order they appear in the file) to object attribute names (see Item 64: “Annotate Class Attributes with __set_name__” for another approach):

class RowMapper:
    fields = ()  # Must be in CSV column order

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            if key not in type(self).fields:
                raise TypeError(f"Invalid field: {key}")
            setattr(self, key, value)

    @classmethod
    def from_row(cls, row):
        if len(row) != len(cls.fields):
            raise ValueError("Wrong number of fields")
        kwargs = dict(pair for pair in zip(cls.fields, row))
        return cls(**kwargs)

Now I can create a concrete child class for the freight CSV file format:

class DeliveryMapper(RowMapper):
    fields = ("destination", "method", "weight")

obj2 = DeliveryMapper.from_row(row1)
assert obj2.destination == "Sydney"
assert obj2.method == "truck"
assert obj2.weight == "25"

If I had another CSV format to support—say, for moving-van logistics—I could quickly create another child by providing the column names:

class MovingMapper(RowMapper):
    fields = ("source", "destination", "square_feet")

Although this works, it’s not Pythonic. The attributes are specified using strings instead of variable names, which makes the code difficult to read and flummoxes tools (see Item 124: “Consider Static Analysis via typing to Obviate Bugs” and Item 3: “Never Expect Python to Detect Errors at Compile Time”). More importantly, the fields tuple feels redundant with the body of the class: It’s a list of attributes nested inside a list of attributes.

What would be better is if I could put the names of the CSV columns in the body of the class, like this:

class BetterMovingMapper:
    source = ...
    destination = ...
    square_feet = ...

It turns out this is possible using three features of Python together (see Item 51: “Prefer dataclasses for Defining Lightweight Classes” for another approach). The first feature is the __init_subclass__ special class method, which allows you to run code when a subclass is defined (see Item 62: “Validate Subclasses with __init_subclass__”). The second feature is how Python class attributes can be inspected at runtime using the __dict__ instance dictionary of a class object (see Item 54: “Consider Composing Functionality with Mix-in Classes”). The third feature is how Python dictionaries preserve the insertion order of key/value pairs (see Item 25: “Be Cautious when Relying on Dictionary Insertion Ordering”).

Here I create a class that finds child attributes assigned to ... and stores their names in the fields class attribute for the RowMapper parent class to use:

class BetterRowMapper(RowMapper):
    def __init_subclass__(cls):
        fields = []
        for key, value in cls.__dict__.items():
            if value is Ellipsis:
                fields.append(key)
        cls.fields = tuple(fields)

Now I can declare a concrete class like before, but using the class body with ellipses to indicate the columns of the CSV file:

class BetterDeliveryMapper(BetterRowMapper):
    destination = ...
    method = ...
    weight = ...


obj3 = BetterDeliveryMapper.from_row(row1)
assert obj3.destination == "Sydney"
assert obj3.method == "truck"
assert obj3.weight == "25"

If the order of the columns in the CSV file changes, I can just change the attribute definition order to compensate. For example, here I move the destination field to the end:

class ReorderedDeliveryMapper(BetterRowMapper):
    method = ...
    weight = ...
    destination = ...  # Moved

row4 = ["road train", "90", "Perth"]  # Different order
obj4 = ReorderedDeliveryMapper.from_row(row4)
print(obj4.__dict__)

>>>
{'method': 'road train', 'weight': '90', 'destination': 'Perth'}

In a real program, I would use a descriptor class instead of ellipses when declaring the fields to enable use cases like attribute validation and data conversion (see Item 60: “Use Descriptors for Reusable @property Methods” for background). For example, say that I want the weight column to be parsed into a floating point number instead of remaining as a string.

Here I implement a descriptor class that intercepts attribute accesses and converts assigned values as needed:

class Field:
    def __init__(self):
        self.internal_name = None

    def __set_name__(self, owner, column_name):
        self.internal_name = "_" + column_name

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name, "")

    def __set__(self, instance, value):
        adjusted_value = self.convert(value)
        setattr(instance, self.internal_name, adjusted_value)

    def convert(self, value):
        raise NotImplementedError

I can implement two concrete Field subclasses—one for strings and another for floating point numbers:

class StringField(Field):
    def convert(self, value):
        if not isinstance(value, str):
            raise ValueError
        return value

class FloatField(Field):
    def convert(self, value):
        return float(value)

Another new base class for representing CSV files can look for Field instances instead of Ellipsis instances to discover the ordered CSV columns and populate the fields class attribute accordingly:

class DescriptorRowMapper(RowMapper):
    def __init_subclass__(cls):
        fields = []
        for key, value in cls.__dict__.items():
            if isinstance(value, Field):  # Changed
                fields.append(key)
        cls.fields = tuple(fields)

Now I can declare a concrete subclass for my specific CSV format, and the weight field will be converted to a floating point number, as expected:

class ConvertingDeliveryMapper(DescriptorRowMapper):
    destination = StringField()
    method = StringField()
    weight = FloatField()

obj5 = ConvertingDeliveryMapper.from_row(row1)
assert obj5.destination == "Sydney"
assert obj5.method == "truck"
assert obj5.weight == 25.0  # Number, not string
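
A side benefit is that the descriptor validates every assignment, not only values loaded from a CSV row. Here is a standalone sanity check—it repeats the `Field` and `StringField` definitions from above so it runs on its own, and the `Record` owner class is hypothetical:

```python
class Field:
    def __init__(self):
        self.internal_name = None

    def __set_name__(self, owner, column_name):
        self.internal_name = "_" + column_name

    def __get__(self, instance, instance_type):
        if instance is None:
            return self
        return getattr(instance, self.internal_name, "")

    def __set__(self, instance, value):
        setattr(instance, self.internal_name, self.convert(value))

    def convert(self, value):
        raise NotImplementedError


class StringField(Field):
    def convert(self, value):
        if not isinstance(value, str):
            raise ValueError
        return value


class Record:  # Hypothetical owner class for illustration
    name = StringField()


record = Record()
record.name = "Sydney"       # Accepted by convert
try:
    record.name = 12345      # Rejected: not a string
except ValueError:
    print("Validation rejected the assignment")
```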

Inspecting class attributes can also be used to discover methods. In a completely different example from the CSV use case above, imagine that I want to create a class that describes a sequential workflow of methods that need to run in definition order, like this:

class HypotheticalWorkflow:
    def start_engine(self):
        ...

    def release_brake(self):
        ...

    def run(self):
        # Runs `start_engine` then `release_brake`
        ...

I can make this work by first creating a simple function decorator (see Item 38: “Define Function Decorators with functools.wraps”) that indicates which methods should be considered for the workflow:

def step(func):
    func._is_step = True
    return func
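
The decorator is purely a marker: It leaves the function's behavior untouched and only tags it for later discovery. A quick sanity check (with hypothetical functions):

```python
def step(func):
    func._is_step = True  # Tag the function for discovery
    return func

@step
def tagged():
    return "ran"

def untagged():
    return "ran"

assert tagged() == "ran"            # Behavior is unchanged
assert hasattr(tagged, "_is_step")  # Marker is present
assert not hasattr(untagged, "_is_step")
```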

A new base class can then look for callable class attributes (see Item 48: “Accept Functions Instead of Classes for Simple Interfaces” for background) with the _is_step attribute present to discover which methods should be included in the workflow and the order in which they should be called:

class Workflow:
    def __init_subclass__(cls):
        steps = []
        for key, value in cls.__dict__.items():
            if callable(value) and hasattr(value, "_is_step"):
                steps.append(key)
        cls.steps = tuple(steps)

The run method only needs to iterate through the list of steps and call the methods in the saved sequence. No other boilerplate is required:

    def run(self):
        for step_name in type(self).steps:
            func = getattr(self, step_name)
            func()

Putting it together, here I define a simple workflow for starting a car, which includes a helper method that should be ignored by the base class:

class MyWorkflow(Workflow):
    @step
    def start_engine(self):
        print("Engine is on!")
        ...

    def my_helper_function(self):
        raise RuntimeError("Should not be called")

    @step
    def release_brake(self):
        print("Brake is off!")
        ...

    ...

The workflow runs successfully and doesn’t call the bad method:

workflow = MyWorkflow()
workflow.run()

>>>
Engine is on!
Brake is off!
...

Things to Remember

  • You can examine the attributes and methods defined in a class body at runtime by inspecting the corresponding class object’s __dict__ instance dictionary.

  • The definition order of class bodies is preserved in a class object’s __dict__, enabling code to consider the relative positions of a class’s attributes and methods. This is especially useful for use cases like mapping object fields to CSV column indexes.

  • Descriptors and method decorators can be used to further enhance the power of using the definition order of class bodies to control program behavior.

Item 66: Prefer Class Decorators over Metaclasses for Composable Class Extensions

Although metaclasses allow you to customize class creation in multiple ways (see Item 62: “Validate Subclasses with __init_subclass__” and Item 63: “Register Class Existence with __init_subclass__”), they still fall short of handling every situation that may arise.

For example, say that I want to decorate all of the methods of a class with a helper function that prints arguments, return values, and any exceptions that were raised. Here, I define such a debugging decorator (see Item 38: “Define Function Decorators with functools.wraps” for background):

from functools import wraps

def trace_func(func):
    if hasattr(func, "tracing"):  # Only decorate once
        return func

    @wraps(func)
    def wrapper(*args, **kwargs):
        args_repr = repr(args)
        kwargs_repr = repr(kwargs)
        result = None
        try:
            result = func(*args, **kwargs)
            return result
        except Exception as e:
            result = e
            raise
        finally:
            print(
                f"{func.__name__}"
                f"({args_repr}, {kwargs_repr}) -> "
                f"{result!r}"
            )

    wrapper.tracing = True
    return wrapper
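
The decorator works on plain functions too, not just methods. Here is a self-contained sketch—it repeats trace_func from above so it runs on its own, and the divide function is hypothetical:

```python
from functools import wraps

def trace_func(func):
    if hasattr(func, "tracing"):  # Only decorate once
        return func

    @wraps(func)
    def wrapper(*args, **kwargs):
        args_repr = repr(args)
        kwargs_repr = repr(kwargs)
        result = None
        try:
            result = func(*args, **kwargs)
            return result
        except Exception as e:
            result = e
            raise
        finally:
            print(
                f"{func.__name__}"
                f"({args_repr}, {kwargs_repr}) -> "
                f"{result!r}"
            )

    wrapper.tracing = True
    return wrapper

@trace_func
def divide(a, b):
    return a / b

value = divide(10, 2)  # Prints: divide((10, 2), {}) -> 5.0
try:
    divide(1, 0)       # Prints the exception repr, then re-raises
except ZeroDivisionError:
    pass
```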

I can apply this decorator to various special methods in my new dict subclass (see Item 57: “Inherit from collections.abc Classes for Custom Container Types”):

class TraceDict(dict):
    @trace_func
    def __init__(self, *args, **kwargs):
        return super().__init__(*args, **kwargs)

    @trace_func
    def __setitem__(self, *args, **kwargs):
        return super().__setitem__(*args, **kwargs)

    @trace_func
    def __getitem__(self, *args, **kwargs):
        return super().__getitem__(*args, **kwargs)

    ...

And I can verify that these methods are decorated by interacting with an instance of the class:

trace_dict = TraceDict([("hi", 1)])
trace_dict["there"] = 2
trace_dict["hi"]
try:
    trace_dict["does not exist"]
except KeyError:
    pass  # Expected

>>>
__init__(({}, [('hi', 1)]), {}) -> None
__setitem__(({'hi': 1}, 'there', 2), {}) -> None
__getitem__(({'hi': 1, 'there': 2}, 'hi'), {}) -> 1
__getitem__(({'hi': 1, 'there': 2}, 'does not exist'), {}) -> KeyError('does not exist')

The problem with this code is that I had to redefine all the methods that I wanted to decorate with @trace_func. This is redundant boilerplate that’s hard to read and error prone. Further, if a new method is later added to the dict superclass, it won’t be decorated unless I also define it in TraceDict.

One way to solve this problem is to use a metaclass to automatically decorate all methods of a class. Here I implement this behavior by wrapping each function or method in the new type with the trace_func decorator:

import types

TRACE_TYPES = (
    types.MethodType,
    types.FunctionType,
    types.BuiltinFunctionType,
    types.BuiltinMethodType,
    types.MethodDescriptorType,
    types.ClassMethodDescriptorType,
    types.WrapperDescriptorType,
)

IGNORE_METHODS = (
    "__repr__",
    "__str__",
)

class TraceMeta(type):
    def __new__(meta, name, bases, class_dict):
        klass = super().__new__(meta, name, bases, class_dict)

        for key in dir(klass):
            if key in IGNORE_METHODS:
                continue

            value = getattr(klass, key)
            if not isinstance(value, TRACE_TYPES):
                continue

            wrapped = trace_func(value)
            setattr(klass, key, wrapped)

        return klass

Now I can declare my dict subclass by using the TraceMeta metaclass and verify that it works as expected:

class TraceDict(dict, metaclass=TraceMeta):
    pass

trace_dict = TraceDict([("hi", 1)])
trace_dict["there"] = 2
trace_dict["hi"]
try:
    trace_dict["does not exist"]
except KeyError:
    pass  # Expected

>>>
__new__((<class '__main__.TraceDict'>, [('hi', 1)]), {}) -> {}
__init__(({}, [('hi', 1)]), {}) -> None
__setitem__(({'hi': 1}, 'there', 2), {}) -> None
__getitem__(({'hi': 1, 'there': 2}, 'hi'), {}) -> 1
__getitem__(({'hi': 1, 'there': 2}, 'does not exist'), {}) -> KeyError('does not exist')

This works, and it even prints out a call to __new__ that was missing from my earlier implementation. What happens if I try to use TraceMeta when a superclass has already specified a metaclass?

class OtherMeta(type):
    pass

class SimpleDict(dict, metaclass=OtherMeta):
    pass

class ChildTraceDict(SimpleDict, metaclass=TraceMeta):
    pass

>>>
Traceback ...
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

This fails because TraceMeta does not inherit from OtherMeta. In theory, I can use metaclass inheritance to solve this problem by having OtherMeta inherit from TraceMeta:

class TraceMeta(type):
    ...

class OtherMeta(TraceMeta):
    pass

class SimpleDict(dict, metaclass=OtherMeta):
    pass

class ChildTraceDict(SimpleDict, metaclass=TraceMeta):
    pass

trace_dict = ChildTraceDict([("hi", 1)])
trace_dict["there"] = 2
trace_dict["hi"]
try:
    trace_dict["does not exist"]
except KeyError:
    pass  # Expected

>>>
__init_subclass__((), {}) -> None
__new__((<class '__main__.ChildTraceDict'>, [('hi', 1)]), {}) -> {}
__init__(({}, [('hi', 1)]), {}) -> None
__setitem__(({'hi': 1}, 'there', 2), {}) -> None
__getitem__(({'hi': 1, 'there': 2}, 'hi'), {}) -> 1
__getitem__(({'hi': 1, 'there': 2}, 'does not exist'), {}) -> KeyError('does not exist')

But this won’t work if the metaclass is from a library that I can’t modify or if I want to use multiple utility metaclasses like TraceMeta at the same time. The metaclass approach puts too many constraints on the class that’s being modified.
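
Another workaround is to define a combined metaclass that inherits from both and use that for the child class—but this still requires access to both metaclasses and extra boilerplate for every pairing. A minimal sketch with hypothetical metaclasses:

```python
class MetaA(type):
    pass

class MetaB(type):
    pass

# The derived class's metaclass must be a subclass of the
# metaclasses of all its bases; a combined metaclass satisfies
# this rule
class CombinedMeta(MetaA, MetaB):
    pass

class Base(metaclass=MetaA):
    pass

class Child(Base, metaclass=CombinedMeta):  # No conflict
    pass

assert type(Child) is CombinedMeta
```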

To solve this problem, Python supports class decorators. Class decorators work just like function decorators: They’re applied with the @ symbol prefixing a function before the class declaration. The function is expected to modify or re-create the class accordingly and then return it, like this:

def my_class_decorator(klass):
    klass.extra_param = "hello"
    return klass

@my_class_decorator
class MyClass:
    pass

print(MyClass)
print(MyClass.extra_param)

>>>
<class '__main__.MyClass'>
hello

I can implement a class decorator to apply the trace_func function decorator to all methods of a class by moving the core of the TraceMeta.__new__ method above into a stand-alone function. This implementation is much shorter than the metaclass version:

def trace(klass):
    for key in dir(klass):
        if key in IGNORE_METHODS:
            continue

        value = getattr(klass, key)
        if not isinstance(value, TRACE_TYPES):
            continue

        wrapped = trace_func(value)
        setattr(klass, key, wrapped)

    return klass

I can apply this decorator to my dict subclass to get the same behavior that I get by using the metaclass approach above:

@trace
class DecoratedTraceDict(dict):
    pass

trace_dict = DecoratedTraceDict([("hi", 1)])
trace_dict["there"] = 2
trace_dict["hi"]
try:
    trace_dict["does not exist"]
except KeyError:
    pass  # Expected

>>>
__new__((<class '__main__.DecoratedTraceDict'>, [('hi', 1)]), {}) -> {}
__init__(({}, [('hi', 1)]), {}) -> None
__setitem__(({'hi': 1}, 'there', 2), {}) -> None
__getitem__(({'hi': 1, 'there': 2}, 'hi'), {}) -> 1
__getitem__(({'hi': 1, 'there': 2}, 'does not exist'), {}) -> KeyError('does not exist')

Class decorators also work when the class being decorated already has a metaclass:

class OtherMeta(type):
    pass

@trace
class HasMetaTraceDict(dict, metaclass=OtherMeta):
    pass

trace_dict = HasMetaTraceDict([("hi", 1)])
trace_dict["there"] = 2
trace_dict["hi"]
try:
    trace_dict["does not exist"]
except KeyError:
    pass  # Expected

>>>
__new__((<class '__main__.HasMetaTraceDict'>, [('hi', 1)]), {}) -> {}
__init__(({}, [('hi', 1)]), {}) -> None
__setitem__(({'hi': 1}, 'there', 2), {}) -> None
__getitem__(({'hi': 1, 'there': 2}, 'hi'), {}) -> 1
__getitem__(({'hi': 1, 'there': 2}, 'does not exist'), {}) -> KeyError('does not exist')
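
And unlike metaclasses, class decorators compose: Multiple independent decorators can be stacked on one class without any shared inheritance hierarchy. A sketch with hypothetical decorators:

```python
def add_greeting(klass):
    klass.greeting = "hello"
    return klass

def add_farewell(klass):
    klass.farewell = "goodbye"
    return klass

# Decorators apply bottom-up; each receives the class and
# returns it with an extra attribute attached
@add_greeting
@add_farewell
class Composed:
    pass

assert Composed.greeting == "hello"
assert Composed.farewell == "goodbye"
```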

When you’re looking for composable ways to extend classes, class decorators are the best tool for the job. (See Item 104: “Know How to Use heapq for Priority Queues” for an example class decorator called functools.total_ordering.)
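
For example, functools.total_ordering from the standard library fills in the remaining rich comparison methods when a class defines __eq__ and one ordering method; the Version class here is a hypothetical illustration:

```python
import functools

@functools.total_ordering
class Version:
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        return self.number == other.number

    def __lt__(self, other):
        return self.number < other.number

assert Version(1) < Version(2)       # Uses __lt__ directly
assert Version(2) >= Version(1)      # Supplied by the decorator
```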

Things to Remember

  • A class decorator is a simple function that receives a class object as a parameter and returns either a new class or a modified version of the original class.

  • Class decorators are useful when you want to modify every method or attribute of a class with minimal boilerplate.

  • Metaclasses can’t be composed together easily, while many class decorators can be used to extend the same class without conflicts.