Chapter 6. Classes and Objects: Beyond the Basics

This chapter assumes you are familiar with the basics of object-oriented programming (OOP) in Python: creating classes, defining methods, and simple inheritance. You will build on that knowledge in this chapter.

As with any object-oriented language, it’s useful to learn about design patterns—reusable solutions to common problems involving classes and objects. A lot has been written about design patterns. And while much of it applies to Python, it tends to apply differently.

That is because many design-pattern books and articles are written for languages like Java, C++, and C#. But as a language, Python is different. Its dynamic typing, first-class functions, and other features all mean the “standard” design patterns work differently.

So let’s learn what Pythonic OOP is really about.

Properties

Python objects have attributes. “Attribute” is a general term meaning “whatever is to the right of the dot” in an expression like x.y or z.f(). Member variables and methods are two kinds of attributes. But Python has another kind of attribute called properties.

A property is a hybrid: a cross between a method and a member variable. The idea is to create an attribute that acts like a member variable from the outside, but reading or writing to this attribute triggers method calls internally.

You’ll set this up with a special decorator called @property. A simple example:

class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    @property
    def fullname(self):
        return self.firstname + " " + self.lastname

After instantiating this class, you can access fullname as if it were a member variable:

>>> joe = Person("Joe", "Smith")
>>> joe.fullname
'Joe Smith'

Look carefully for the actual member variables here. There are two, firstname and lastname, set in the constructor. This class also has a method called fullname. But after creating the instance, we reference joe.fullname as a member variable; we don’t call joe.fullname() as a method. However, when you read the value of joe.fullname, the fullname() method is invoked.

This is all due to the @property decorator. When applied to a method, this decorator makes it inaccessible as a method. You must access it like a member variable. In fact, if you try to call it as a method, you get an error:

>>> joe.fullname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

As defined above, fullname is read-only. We cannot modify it:

>>> joe.fullname = "Joseph Smith"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

In other words, Python properties are read-only by default. Another way of saying this is that @property automatically defines a getter, but not a setter.¹ If you want fullname to be writable, here is how you define its setter:

class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    @property
    def fullname(self):
        return self.firstname + " " + self.lastname

    @fullname.setter
    def fullname(self, value):
        self.firstname, self.lastname = value.split(" ", 1)

This lets you assign to joe.fullname:

>>> joe = Person("Joe", "Smith")
>>> joe.firstname
'Joe'
>>> joe.lastname
'Smith'
>>> joe.fullname = "Joseph Smith"
>>> joe.firstname
'Joseph'
>>> joe.lastname
'Smith'

So we have two methods named fullname(). The first one, decorated with @property, is dispatched (invoked) when you read the value of joe.fullname. The second one, decorated with @fullname.setter, is dispatched when you assign to joe.fullname. Python picks which to run, depending on whether you are getting or setting.

The first time I saw this, I had many questions. “Wait, why is this fullname method defined twice? And why is the second decorator named @fullname, and why does it have a setter attribute? How on earth does this even work?!”

The code is actually designed to work this way. The @property line, followed by def fullname, must come first. Those two lines create the property to begin with, and also create the getter. By “create the property”, I mean that an object named fullname exists in the namespace of the class, and it has an attribute named setter. This fullname.setter is a decorator that is applied to the next def fullname, christening it as the setter for the fullname property.
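One way to see what those two decorators are doing is to spell out the same class without decorator syntax. This is only a sketch for illustration: the helper names _get_fullname and _set_fullname are made up here, and the decorator form remains the usual way to write it.

```python
class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    # Hypothetical helper names, used only in this sketch.
    def _get_fullname(self):
        return self.firstname + " " + self.lastname

    def _set_fullname(self, value):
        self.firstname, self.lastname = value.split(" ", 1)

    # Roughly what "@property" does: create the property object,
    # with the getter attached.
    fullname = property(_get_fullname)
    # Roughly what "@fullname.setter" does: build a new property
    # that keeps the getter and adds the setter.
    fullname = fullname.setter(_set_fullname)

joe = Person("Joe", "Smith")
joe.fullname = "Joseph Smith"
print(type(Person.fullname).__name__)  # property
print(joe.firstname)                   # Joseph
```

Looking up fullname on the class (rather than on an instance) gives you the property object itself, which is where the setter attribute lives.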

It’s okay to not fully understand how this works. A full explanation relies on understanding not only decorators, but also Python’s descriptor protocol, which is beyond the scope of this chapter. Fortunately, you don’t have to understand how it works in order to use it.

What you see here with the Person class is one way properties are useful: they are magic attributes which act like member variables, but their value is derived from other member variables. Because the value is computed on demand rather than stored, the object holds no redundant copy that could fall out of sync, yet you can still access the attribute like it is a member variable. You’ll see a situation where that is extremely useful later.

Property Patterns

Properties enable a useful collection of design patterns. One—as mentioned—is creating read-only member variables. In the first version of Person, the fullname “member variable” is a dynamic attribute; it does not exist on its own, but instead calculates its value at runtime.

It’s also common to have the property backed by a single, non-public member variable. That pattern looks like this:

class Coupon:
    def __init__(self, amount):
        self._amount = amount
    @property
    def amount(self):
        return self._amount

This allows the class itself to modify the value internally, but prevent outside code from doing so:

>>> coupon = Coupon(1.25)
>>> coupon.amount
1.25
>>> coupon.amount = 1.50
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

In Python, prefixing a member variable or method by a single underscore signals it is protected; it should only be accessed internally, inside methods of that class or its subclasses.² This property pattern says, “You can read the value of this attribute, but you cannot change it”.
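Here is a small sketch of the “class can modify it, outside code cannot” idea. The halve() method is hypothetical, added only to show internal code rebinding the underscore attribute.

```python
class Coupon:
    def __init__(self, amount):
        self._amount = amount

    @property
    def amount(self):
        return self._amount

    def halve(self):
        # Hypothetical method, for illustration only. Code inside
        # the class may still change the non-public attribute.
        self._amount = self._amount / 2

coupon = Coupon(1.50)
coupon.halve()
print(coupon.amount)  # 0.75

try:
    coupon.amount = 5.00
except AttributeError:
    print("amount is read-only from the outside")
```

Keep in mind the underscore is a convention, not an enforcement mechanism: nothing stops outside code from assigning to coupon._amount directly.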

Validation

There is another pattern between “regular member variable” and “read-only”: the value can be changed, but you must validate it first. Suppose you and I are developing some software that manages live events. We write a Ticket class, representing tickets sold to attendees:

class Ticket:
    def __init__(self, price):
        self.price = price
    # And some other methods...

One day, we find a bug in our web UI that lets shifty customers adjust the price to a negative value. So we end up paying them to go to the concert. Not good!

The first priority is, of course, to fix the bug in the UI. But how do we modify our code to prevent this from ever happening again? Before reading further, look at the Ticket class and ponder—how could you use properties to make this kind of bug impossible in the future?

The answer: verify the new price is non-negative in the setter:

# Version 1...
class Ticket:
    def __init__(self, price):
        self._price = price
    @property
    def price(self):
        return self._price
    @price.setter
    def price(self, new_price):
        # Only allow non-negative prices.
        if new_price < 0:
            raise ValueError("Nice try")
        self._price = new_price

This lets the price be adjusted, but only to sensible values:

>>> t = Ticket(42)
>>> t.price = 24 # This is allowed.
>>> print(t.price)
24
>>> t.price = -1 # This is NOT.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in price
ValueError: Nice try

However, there’s a defect in this new Ticket class. Can you spot it? (And how to fix it?)

The problem is that while we can’t change the price to a negative value, this first version lets us create a ticket with a negative price to begin with. That’s because we wrote self._price = price in the constructor. The solution is to use the setter in the constructor instead:

# Final version, with modified constructor. (Constructor
# is different; code for getter & setter is the same.)
class Ticket:
    def __init__(self, price):
        # instead of "self._price = price"
        self.price = price
    @property
    def price(self):
        return self._price
    @price.setter
    def price(self, new_price):
        # Only allow non-negative prices.
        if new_price < 0:
            raise ValueError("Nice try")
        self._price = new_price

Yes, you can reference self.price in methods of the class. When we write self.price = price, Python translates this to calling the price setter—that is, the second price() method. This final version of Ticket centralizes all reads and writes of self._price in the property—a useful encapsulation technique. The idea is to centralize any special behavior for that member variable in the getter and setter, even for the class’s internal code. In practice, sometimes other methods will need to violate this rule; if so, you simply reference self._price and move on. But as much as you can, only use the protected (underscore) member variable in the getter and setter, and that will naturally tend to boost the quality of your code.
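To check that the constructor really is protected now, you can try creating tickets with a few different prices. This sketch repeats the final Ticket class so it runs on its own; the loop and its sample prices are just for demonstration.

```python
class Ticket:
    def __init__(self, price):
        # Assigning to self.price goes through the setter below.
        self.price = price

    @property
    def price(self):
        return self._price

    @price.setter
    def price(self, new_price):
        # Only allow non-negative prices.
        if new_price < 0:
            raise ValueError("Nice try")
        self._price = new_price

for candidate in (42, 0, -5):
    try:
        ticket = Ticket(candidate)
        print(f"Created ticket with price {ticket.price}")
    except ValueError as error:
        print(f"Rejected {candidate}: {error}")
# Created ticket with price 42
# Created ticket with price 0
# Rejected -5: Nice try
```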

Properties and Refactoring

Imagine writing a simple money class:

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents
    # And some other methods...

Suppose you put this class in a library many developers use: people on your current team, perhaps developers on different teams. Maybe you release it as open source, so developers around the world use and rely on this class.

One day you realize that many of Money’s methods—which do calculations on the money amount—could be simpler and more straightforward if they operated on the total number of cents, rather than dollars and cents separately. So you refactor the internal state:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

This creates a major maintainability problem. Do you spot it?

Here’s the trouble: your original Money has member variables named dollars and cents. And since many developers are using these variables, changing to total_cents breaks all their code!

money = Money(27, 12)
message = "I have {:d} dollars and {:d} cents."
# This line breaks, because there are no longer
# dollars or cents attributes.
print(message.format(money.dollars, money.cents))

If no one but you uses this class, there’s no real problem—you can just refactor your own code. But otherwise, coordinating this change with everyone’s different codebases is a nightmare. It becomes a barrier to improving your own code.

So, what do you do? Can you think of a way to handle this situation?

You get out of this mess using properties. You want two things to happen:

The class uses total_cents internally.
All code using dollars and cents continues to work, without modification.

You’ll do this by replacing dollars and cents with total_cents internally, but also creating getters and setters for these attributes. Take a look:

class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents
    # Getter and setter for dollars...
    @property
    def dollars(self):
        # // is integer division
        return self.total_cents // 100
    @dollars.setter
    def dollars(self, new_dollars):
        self.total_cents = 100 * new_dollars + self.cents
    # And for cents.
    @property
    def cents(self):
        return self.total_cents % 100
    @cents.setter
    def cents(self, new_cents):
        self.total_cents = 100 * self.dollars + new_cents

Now, I can get and set dollars and cents all day:

>>> money = Money(27, 12)
>>> money.total_cents
2712
>>> money.cents
12
>>> money.dollars = 35
>>> money.total_cents
3512
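It is worth confirming that the client code which broke earlier now runs unchanged. The sketch below repeats the property-based Money so it is self-contained; the last two lines are an extra experiment of mine, showing a side effect of storing only total_cents.

```python
class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

    @property
    def dollars(self):
        return self.total_cents // 100

    @dollars.setter
    def dollars(self, new_dollars):
        self.total_cents = 100 * new_dollars + self.cents

    @property
    def cents(self):
        return self.total_cents % 100

    @cents.setter
    def cents(self, new_cents):
        self.total_cents = 100 * self.dollars + new_cents

# The same client code as before the refactoring.
money = Money(27, 12)
message = "I have {:d} dollars and {:d} cents."
print(message.format(money.dollars, money.cents))
# I have 27 dollars and 12 cents.

# Side effect of the new representation: extra cents carry over.
money.cents = 250
print(money.dollars, money.cents)  # 29 50
```

That carry-over is a small behavior change compared to the original class, which would have stored 250 in cents as-is. Whether that matters depends on how callers were using the attribute.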

Python’s way of doing properties brings many benefits. In languages like Java, the following story can play out:

A newbie developer starts writing Java classes. They want to expose some state, so they create public member variables.
They use this class everywhere. Other developers use it too.
One day, the developer decides to change the name or type of that member variable, or even delete it entirely (like what we did with Money).
But that would break everyone’s code. So they can’t.

This is not a problem for Java developers in practice, because they quickly learn to make all their variables private by default—proactively creating getters and setters for every publicly exposed chunk of data. They realize this boilerplate is far less painful than the alternative, because if everyone must use the public getters and setters to begin with, you always have the freedom to make internal changes later.

This works well enough. But it is distracting, and just enough trouble that there’s always the temptation to make that member variable public, and be done with it.

In Python, we have the best of both worlds. You can freely create member variables—which are public by default—and refactor them as properties if and when you ever need to. No one using your code even has to know.

The Factory Patterns

There are several design patterns with the word “factory” in their names. Their unifying idea is providing a handy, simplified way to create useful, potentially complex objects. The two most important forms are:

Where the object’s type is fixed, but we want to have several different ways to create it. This is called the Simple Factory Pattern.
Where the factory dynamically chooses one of several different types. This is called the Factory Method Pattern.

Let’s look at how you do these in Python.

Alternative Constructors: The Simple Factory

Imagine a simple Money class, suitable for currencies which have dollars and cents:

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

We looked at this in the previous section, changing what member variables it has. But let’s roll back, and focus instead on the constructor’s interface. This constructor is convenient when we have the dollars and cents as separate integer variables. But there are many other ways to specify an amount of money. Perhaps you’re modeling a giant jar of pennies:

# Emptying the penny jar...
total_pennies = 3274
dollars = total_pennies // 100
cents = total_pennies % 100
total_cash = Money(dollars, cents)

Suppose your code repeatedly splits pennies into dollars and cents, over and over. And you’re tired of re-re-typing this calculation, plus there is a chance you could make a mistake eventually. You could change the constructor, but that means refactoring all Money-creating code, and perhaps a lot of code fits the current constructor better anyway. Some languages let you define several constructors, but Python makes you pick one.

In this case, you can usefully create a factory function. A factory function takes the data you have, uses that to calculate what the class constructor needs, then returns the instance. For example:

# Factory function taking a single argument, returning
# an appropriate Money instance.
def money_from_pennies(total_cents):
    dollars = total_cents // 100
    cents = total_cents % 100
    return Money(dollars, cents)

Imagine that, in the same codebase, you also need to parse strings like "$140.75". Here’s another factory function for that:

# Another factory, creating Money from a string amount.
import re
def money_from_string(amount):
    match = re.search(
        r'^\$(?P<dollars>\d+)\.(?P<cents>\d\d)$', amount)
    if match is None:
        raise ValueError(f"Invalid amount: {amount}")
    dollars = int(match.group('dollars'))
    cents = int(match.group('cents'))
    return Money(dollars, cents)

These are effectively alternate constructors: callables we can use with different arguments, which are used to create the final instance. But this approach has problems. First, it’s awkward to have them as separate functions, defined outside of the class. But more importantly: what happens if you subclass Money? Suddenly money_from_string() and money_from_pennies() are worthless, because they are hard-coded to use Money.
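To make the subclassing problem concrete, here is a minimal sketch. TipMoney is a hypothetical subclass used only for demonstration.

```python
class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

def money_from_pennies(total_cents):
    dollars = total_cents // 100
    cents = total_cents % 100
    # Hard-coded to Money, no matter what the caller wanted.
    return Money(dollars, cents)

class TipMoney(Money):
    pass

tip = money_from_pennies(475)
print(type(tip).__name__)         # Money
print(isinstance(tip, TipMoney))  # False
```

There is no way to ask the factory function for a TipMoney, short of writing a second function or adding a class parameter.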

Python solves these problems with a flexible and powerful feature: the classmethod decorator. Use it like this:

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents
    @classmethod
    def from_pennies(cls, total_cents):
        dollars = total_cents // 100
        cents = total_cents % 100
        return cls(dollars, cents)

The function money_from_pennies() is now a method of the Money class, called from_pennies(). But it has a new argument: cls. When applied to a method definition, classmethod modifies how that method is invoked and interpreted. The first argument is not self, which would be an instance of the class. The first argument is now the class itself. In the method body, self isn’t mentioned at all; instead, cls is a variable holding the current class object—Money in this case. So the last line is creating a new instance of Money:

>>> piggie_bank_cash = Money.from_pennies(3217)
>>> type(piggie_bank_cash)
<class '__main__.Money'>
>>> piggie_bank_cash.dollars
32
>>> piggie_bank_cash.cents
17

Notice from_pennies() is invoked on the class itself, not an instance of the class. This is already nicer code organization. But now it works with inheritance:

>>> class TipMoney(Money):
...     pass
...
>>> tip = TipMoney.from_pennies(475)
>>> type(tip)
<class '__main__.TipMoney'>

This is the real benefit of class methods. You define it once on the base class, and all subclasses can leverage it, substituting their own type for cls. This makes class methods perfect for the simple factory in Python. The final line returns an instance of cls, using its regular constructor. And cls refers to whatever the current class is: Money, TipMoney, or some other subclass.

For the record, here’s how to translate money_from_string():

class Money:
    # ...
    @classmethod
    def from_string(cls, amount):
        match = re.search(
            r'^\$(?P<dollars>\d+)\.(?P<cents>\d\d)$', amount)
        if match is None:
            raise ValueError(f"Invalid amount: {amount}")
        dollars = int(match.group('dollars'))
        cents = int(match.group('cents'))
        return cls(dollars, cents)

Class methods are a superior way to implement factories in Python. If we subclass Money, that subclass will have from_pennies() and from_string() methods that create objects of that subclass, without any extra work on our part. And if we change the name of the Money class, we only have to change it in one place, not three.

This form of the factory pattern is called Simple Factory, a name I don’t love. I prefer to call it Alternate Constructor. Especially in the context of Python, it describes well what @classmethod is most useful for. And it suggests a general principle for designing your classes. Look at this complete code of the Money class, and I’ll explain:

import re
class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents
    @classmethod
    def from_pennies(cls, total_cents):
        dollars = total_cents // 100
        cents = total_cents % 100
        return cls(dollars, cents)
    @classmethod
    def from_string(cls, amount):
        match = re.search(
            r'^\$(?P<dollars>\d+)\.(?P<cents>\d\d)$', amount)
        if match is None:
            raise ValueError(f"Invalid amount: {amount}")
        dollars = int(match.group('dollars'))
        cents = int(match.group('cents'))
        return cls(dollars, cents)

You can think of this class as having several constructors. As a general rule, you’ll want to make __init__() the most generic one, and implement the others as class methods. Sometimes, that means one of the class methods will be used more often than __init__().
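As a quick illustration of “several constructors”, this sketch builds the same amount three ways. It repeats a condensed Money so it runs on its own, and TipMoney is again a hypothetical subclass.

```python
import re

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

    @classmethod
    def from_pennies(cls, total_cents):
        return cls(total_cents // 100, total_cents % 100)

    @classmethod
    def from_string(cls, amount):
        match = re.search(
            r'^\$(?P<dollars>\d+)\.(?P<cents>\d\d)$', amount)
        if match is None:
            raise ValueError(f"Invalid amount: {amount}")
        return cls(int(match.group('dollars')),
                   int(match.group('cents')))

class TipMoney(Money):
    pass

amounts = [
    Money(140, 75),                   # the generic constructor
    Money.from_pennies(14075),        # alternate constructor
    TipMoney.from_string("$140.75"),  # inherited by the subclass
]
for amount in amounts:
    print(type(amount).__name__, amount.dollars, amount.cents)
# Money 140 75
# Money 140 75
# TipMoney 140 75
```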

When using a new class, most developers’ intuition will be to reach for the default constructor first, without thinking to check the provided class methods—if they even know about that feature of Python in the first place. You may need to educate your teammates. (Hint: Good examples in the class’s code docs go a long way.)

Dynamic Type: The Factory Method Pattern

This next factory pattern, called Factory Method, is quite different. The idea is that the factory will create an object, but will choose its type from one of several possibilities, dynamically deciding at runtime based on some criteria. It’s typically used when you have one base class, and are creating an object that can be one of several different derived classes.

Let’s see an example. Imagine you are implementing an image processing library, creating classes to read the image from storage. So you create a base ImageReader class, and several derived types:

import abc
class ImageReader(metaclass=abc.ABCMeta):
    def __init__(self, path):
        self.path = path
    @abc.abstractmethod
    def read(self):
        pass # Subclass must implement.
    def __repr__(self):
        return f"{self.__class__.__name__}({self.path})"

class GIFReader(ImageReader):
    def read(self):
        # Read a GIF (implementation omitted)
        pass

class JPEGReader(ImageReader):
    def read(self):
        # Read a JPEG (implementation omitted)
        pass

class PNGReader(ImageReader):
    def read(self):
        # Read a PNG (implementation omitted)
        pass

The ImageReader class is marked abstract, requiring subclasses to implement the read() method. So far, so good.

When reading an image file, if its extension is .gif, I want to use GIFReader. And if it is a JPEG image, I want to use JPEGReader, and so on. The logic is:

Analyze the file path name to get the extension.
Choose the correct reader class based on that.
Create the appropriate reader object.

This process is a prime candidate for automation. Let’s define a little helper function:

def extension_of(path):
    # Returns "png", "gif", "jpg", etc.
    # Returns "" if the path contains no dot.
    position_of_last_dot = path.rfind('.')
    if position_of_last_dot == -1:
        return ''
    return path[position_of_last_dot+1:]

With these pieces, we can now define the factory:

def get_image_reader(path):
    image_type = extension_of(path)
    if image_type == 'gif':
        reader_class = GIFReader
    elif image_type == 'jpg':
        reader_class = JPEGReader
    elif image_type == 'png':
        reader_class = PNGReader
    else:
        raise ValueError(f"Unknown extension: {image_type}")
    return reader_class(path)

Classes in Python can be put in variables, just like any other object. We take full advantage here, by storing the appropriate ImageReader subclass in reader_class. Once we decide on the proper value, the last line creates and returns the reader object.

This correctly working code is already more concise, readable, and maintainable than what some languages force you to go through. But in Python, we can do even better. We can use the built-in dictionary type to make it more readable, and easier to update and maintain over time:

READERS = {
    'gif': GIFReader,
    'jpg': JPEGReader,
    'png': PNGReader,
}
def get_image_reader(path):
    reader_class = READERS[extension_of(path)]
    return reader_class(path)

Here we have a global variable mapping filename extensions to ImageReader subclasses. This lets us readably implement get_image_reader() in two lines. Finding the correct class is a simple dictionary lookup, and then we instantiate and return the instance. If we need to support new image formats in the future, we can simply add a line in the READERS definition. (And, of course, define its reader class.)

What if we encounter an extension not in the mapping, like .tiff? As written above, the code will raise a KeyError. That may be what we want. Or perhaps we want to catch that exception and re-raise a different exception. ValueError is a good choice; this is what the previous version of get_image_reader() raised.
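Here is one way the “catch and re-raise” option might look. It is a sketch under some assumptions: the reader classes are reduced to empty stand-ins so the example runs on its own, and using “from None” to hide the original KeyError is my choice, not a requirement.

```python
class ImageReader:
    # Simplified stand-in for the abstract base class.
    def __init__(self, path):
        self.path = path

class GIFReader(ImageReader): pass
class JPEGReader(ImageReader): pass
class PNGReader(ImageReader): pass

READERS = {
    'gif': GIFReader,
    'jpg': JPEGReader,
    'png': PNGReader,
}

def extension_of(path):
    return path[path.rfind('.') + 1:]

def get_image_reader(path):
    extension = extension_of(path)
    try:
        reader_class = READERS[extension]
    except KeyError:
        # Translate the lookup failure into the error callers expect.
        raise ValueError(f"Unknown extension: {extension}") from None
    return reader_class(path)

print(type(get_image_reader("cat.gif")).__name__)  # GIFReader
try:
    get_image_reader("scan.tiff")
except ValueError as error:
    print(error)  # Unknown extension: tiff
```

Callers then only need to handle ValueError, and they never learn that a dictionary is involved.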

Alternatively, we may want to fall back on some default. Let’s create a new reader class, meant as an all-purpose fallback:

class RawByteReader(ImageReader):
    def read(self):
        # Read raw bytes (implementation omitted)
        pass

Then you can write the factory like:

def get_image_reader(path):
    try:
        reader_class = READERS[extension_of(path)]
    except KeyError:
        reader_class = RawByteReader
    return reader_class(path)

Or more briefly:

def get_image_reader(path):
    reader_class = READERS.get(extension_of(path), RawByteReader)
    return reader_class(path)

This design pattern is commonly called the Factory Method pattern, which wins my award for Worst Design Pattern Name in History. That name (which appears to originate from a Java implementation detail) fails to tell you anything about what this pattern is actually for. I myself call it the Dynamic Type pattern, which I feel is much more descriptive and useful.

So far, we have looked at patterns that are mostly confined to a single class. But there are richer patterns involving multiple co-designed classes, interacting with each other. Let’s look at one.

The Observer Pattern

The Observer pattern provides a “one-to-many” relationship. That’s a vague description, so let’s make it more specific.

In the Observer pattern, there’s one root object, called the observable. This object knows how to detect some kind of event of interest. It can literally be anything: a customer makes a new purchase; someone subscribes to an email list; or maybe it monitors a fleet of cloud instances, detecting when a machine’s disk usage exceeds 75%. You use this pattern when the code to reliably detect the event of interest is at least slightly complicated; that detection code is encapsulated inside the observable.

In this pattern, you also have other objects, called observers, which need to know when that event occurs, so they can take some action in response. You don’t want to reimplement the robust detection algorithm in each, of course. Instead, these observers register themselves with the observable. The observable then notifies each observer—by calling a method on that observer—for each event. This separation of concerns is the heart of the observer pattern.

I must tell you: I don’t like the names of things in this pattern. The words “observable” and “observer” are a bit obscure, and sound confusingly similar—especially if your native tongue is not English. There is another terminology, however, which many developers find easier: pub-sub.

In this terminology, instead of an “observable,” you create a publisher object, which watches for events. One or more subscribers (instead of “observers”) ask that publisher to notify them when the event happens. I’ve found the pattern is easier to reason about when looked at in this way, so that is the terminology I’m going to use.³

Let’s make this concrete, with code.

The Simple Observer

We’ll start with the basic Observer pattern, as it’s often documented in design pattern books—except we’ll translate it to Python. In this simple form, each subscriber must implement a method called update(). Here’s an example:

class Subscriber:
    def __init__(self, name):
        self.name = name
    def update(self, message):
        print(f'{self.name} got message "{message}"')

update() takes a string. It’s okay to define an update() method taking other arguments, or even calling it something other than update(); the publisher and subscriber just need to agree on the protocol. But we’ll use a single string argument.

Now, when a publisher detects an event, it notifies the subscriber by calling its update() method. Here’s what a basic Publisher class looks like:

class Publisher:
    def __init__(self):
        self.subscribers = set()
    def register(self, who):
        self.subscribers.add(who)
    def unregister(self, who):
        self.subscribers.discard(who)
    def dispatch(self, message):
        for subscriber in self.subscribers:
            subscriber.update(message)
    # Plus other methods, for detecting the event.

Let’s step through:

A publisher needs to keep track of its subscribers, right? We’ll store them in a set object, named self.subscribers, created in the constructor.
A subscriber is added with register(). Its argument who is an instance of Subscriber. Who calls register()? It could be anyone. The subscriber can register itself, or some external code can register a subscriber with a specific publisher.
unregister() is there in case a subscriber no longer needs to be notified of the events.
When the event of interest occurs, the publisher notifies its subscribers by calling its dispatch() method. Usually this is invoked by the publisher itself, in some other method of the class (not shown) that implements the event-detection logic. It simply cycles through the subscribers, calling update() on each.
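The event-detection side is not shown above, so here is a rough sketch of how it might connect to dispatch(). DiskMonitor, its check() method, and the threshold parameter are all hypothetical names invented for this example, and the “detection” is just a comparison standing in for real logic.

```python
class Subscriber:
    def __init__(self, name):
        self.name = name

    def update(self, message):
        print(f'{self.name} got message "{message}"')

class Publisher:
    def __init__(self):
        self.subscribers = set()

    def register(self, who):
        self.subscribers.add(who)

    def unregister(self, who):
        self.subscribers.discard(who)

    def dispatch(self, message):
        for subscriber in self.subscribers:
            subscriber.update(message)

class DiskMonitor(Publisher):
    # Hypothetical publisher, invented for this sketch.
    def __init__(self, threshold=75):
        super().__init__()
        self.threshold = threshold

    def check(self, machine, percent_used):
        # Stand-in for real event-detection logic.
        if percent_used > self.threshold:
            self.dispatch(f"{machine} disk usage is {percent_used}%")

monitor = DiskMonitor()
monitor.register(Subscriber('Ops'))
monitor.check('web-1', 40)  # Below the threshold: nothing is sent.
monitor.check('web-2', 91)  # Ops got message "web-2 disk usage is 91%"
```

The subscribers never see the threshold or the comparison. They only hear about the event once the publisher has decided it happened.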

Using these two classes in code is straightforward enough:

# Create a publisher and some subscribers.
pub = Publisher()
bob = Subscriber('Bob')
alice = Subscriber('Alice')
john = Subscriber('John')

# Register the subscribers, so they get notified.
pub.register(bob)
pub.register(alice)
pub.register(john)

Now, the publisher can dispatch messages:

# Send a message...
pub.dispatch("It's lunchtime!")
# John unsubscribes...
pub.unregister(john)
# ... and a new message is sent.
pub.dispatch("Time for dinner")

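Here’s the output from running the above. (Because the subscribers are stored in a set, the order within each dispatch may differ on your machine.)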

John got message "It's lunchtime!"
Bob got message "It's lunchtime!"
Alice got message "It's lunchtime!"
Bob got message "Time for dinner"
Alice got message "Time for dinner"

This is the basic Observer pattern, and pretty close to how you’d implement the idea in languages like Java, C#, and C++. But Python’s feature set differs from those languages. That means we can do different things.

A Pythonic Refinement

Python’s functions are first-class objects. This means you can store a function in a variable—not the value returned when you call a function, but the function itself—as well as pass it as an argument to other functions and methods. Some other languages support this too (or something like it, such as function pointers), but Python’s strong support gives us a convenient opportunity for this design pattern.

The standard Observer pattern requires the publisher to hard-code a certain method (usually named update()) that the subscriber must implement. But maybe you need to register a subscriber which doesn’t have that method. What then? If it’s your own class, you can just add it. If you are importing the subscriber class from another library (which you can’t or don’t want to modify), perhaps you can add the method by subclassing it.

Sometimes you can’t do any of those things—or you could, but it’s a lot of trouble, and you want to avoid it. What then?

Let’s extend the traditional observer pattern, and make register() more flexible. Suppose you have these subscribers:

# This subscriber uses the standard "update"
class SubscriberOne:
    def __init__(self, name):
        self.name = name
    def update(self, message):
        print(f'{self.name} got message "{message}"')
# This one wants to use "receive"
class SubscriberTwo:
    def __init__(self, name):
        self.name = name
    def receive(self, message):
        print(f'{self.name} got message "{message}"')

SubscriberOne is the same subscriber class we saw before. SubscriberTwo is almost the same: instead of update(), it has a method named receive(). Let’s modify Publisher so it can work with objects of either subscriber type:

class Publisher:
    def __init__(self):
        self.subscribers = dict()
    def register(self, who, callback=None):
        if callback is None:
            callback = who.update
        self.subscribers[who] = callback
    def dispatch(self, message):
        for callback in self.subscribers.values():
            callback(message)
    def unregister(self, who):
        del self.subscribers[who]

There’s a lot going on here, so let’s break it down. Look first at the constructor: it creates a dict instead of a set. You’ll see why in a moment.

Now focus on register():

    def register(self, who, callback=None):
        if callback is None:
            callback = who.update
        self.subscribers[who] = callback

It can be called with one or two arguments. With one argument, who is a subscriber of some sort, and callback defaults to None. In that case, the method body sets callback to who.update. Notice the lack of parentheses; who.update is a method object. It’s just like a function object, except it happens to be tied to an instance. And just like a function object, you can store it in a variable, pass it as an argument to another function, and so on (refer to Chapter 3 for more details). So we’re storing it in a variable called callback.
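
If method objects are new to you, here is a small sketch you can try on its own. The Greeter class is made up purely for illustration and is not part of the publisher code; the point is only that a method can be stored without being called, and called later.

```python
# Hypothetical class, used only to illustrate method objects.
class Greeter:
    def __init__(self, name):
        self.name = name
    def greet(self, message):
        return f'{self.name} got message "{message}"'

bob = Greeter('Bob')

# No parentheses: this stores the method object without calling it.
callback = bob.greet

# The method object remembers which instance it is tied to.
assert callback.__self__ is bob

# Calling it later is the same as calling bob.greet(...) directly.
result = callback("It's lunchtime!")
print(result)
```

The variable callback here plays the same role as the callback variable inside register().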

What if register() is called with two arguments? Here’s how that might look:

pub = Publisher()
alice = SubscriberTwo('Alice')
pub.register(alice, alice.receive)

alice.receive is another method object; this object is assigned to callback. Regardless of whether register() is called with one argument or two, the last line inserts callback into the dictionary:

        self.subscribers[who] = callback

Take a moment to appreciate the remarkable flexibility of Python dictionaries. Here, you are using an arbitrary instance of either SubscriberOne or SubscriberTwo as a key. These two classes are unrelated by inheritance, so from Python’s viewpoint they are completely distinct types. And for that key, you insert a method object as its value. Python does this seamlessly, without complaint! Many languages would make you jump through hoops to accomplish this.
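
Here is a minimal sketch of that idea in isolation. The Cat and Robot classes are hypothetical stand-ins for two unrelated subscriber types, and the values stored are just two built-in functions chosen for convenience.

```python
# Two hypothetical classes, unrelated except that both derive from object.
class Cat:
    pass

class Robot:
    pass

felix = Cat()
robby = Robot()

# Instances of ordinary classes are hashable by default,
# so they can be used as dictionary keys.
registry = {}
registry[felix] = print
registry[robby] = len

# The default hash and equality are based on identity,
# so a second Cat instance is a different key.
other_cat = Cat()
found = other_cat in registry
print(len(registry), found)
```

One caveat worth knowing: a class that defines __eq__() without also defining __hash__() produces unhashable instances, and trying to use one as a key raises TypeError.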

Now it is clear why self.subscribers is a dict and not a set. Earlier, we only needed to keep track of who the subscribers were. Now, we also need to remember the callback for each subscriber. These are used in the dispatch() method:

    def dispatch(self, message):
        for callback in self.subscribers.values():
            callback(message)

dispatch() only needs to cycle through the values, because it just needs to call each subscriber’s update method (even if it’s not called update()). Notice we don’t have to reference the subscriber object to call that method; the method object internally has a reference to its instance (i.e., its "self"), so callback(message) calls the right method on the right object. In fact, the only reason we keep track of subscribers at all is so we can unregister() them.

Let’s put this together with a few subscribers:

pub = Publisher()
bob = SubscriberOne('Bob')
alice = SubscriberTwo('Alice')
john = SubscriberOne('John')

pub.register(bob, bob.update)
pub.register(alice, alice.receive)
pub.register(john)

pub.dispatch("It's lunchtime!")
pub.unregister(john)
pub.dispatch("Time for dinner")

Here’s the output:

Bob got message "It's lunchtime!"
Alice got message "It's lunchtime!"
John got message "It's lunchtime!"
Bob got message "Time for dinner"
Alice got message "Time for dinner"

Pop quiz. Look at the Publisher class again:

class Publisher:
    def __init__(self):
        self.subscribers = dict()
    def register(self, who, callback=None):
        if callback is None:
            callback = who.update
        self.subscribers[who] = callback
    def dispatch(self, message):
        for callback in self.subscribers.values():
            callback(message)

Does callback have to be a method of the subscriber? Or can it be a method of a different object, or something else? Think about this before you continue…

It turns out callback can be any callable, provided it has a signature compatible with how it’s called in dispatch(). That means it can be a method of some other object, or even a normal function. This lets you register subscriber objects without an update method at all:

# This subscriber doesn't have ANY suitable method!
class SubscriberThree:
    def __init__(self, name):
        self.name = name
# ... but we can define a function...
todd = SubscriberThree('Todd')
def todd_callback(message):
    print(f'Todd got message "{message}"')
# ... and pass it to register:
pub.register(todd, todd_callback)
# And then, dispatch a message:
pub.dispatch("Breakfast is Ready")

Sure enough, this works. Alongside the output from any subscribers that are still registered, Todd’s callback prints:

Todd got message "Breakfast is Ready"
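
To push the "any callable" idea a bit further, here is a sketch that registers a functools.partial object and a lambda. The record() helper, the received list, and the choice of plain strings as subscriber keys are all assumptions made for this illustration. The Publisher is repeated (without unregister()) so the example runs on its own.

```python
from functools import partial

class Publisher:
    # Same register/dispatch logic as the Publisher shown above.
    def __init__(self):
        self.subscribers = dict()
    def register(self, who, callback=None):
        if callback is None:
            callback = who.update
        self.subscribers[who] = callback
    def dispatch(self, message):
        for callback in self.subscribers.values():
            callback(message)

# Hypothetical helper: collects messages in a list instead of printing.
received = []
def record(name, message):
    received.append((name, message))

pub = Publisher()
# Any hashable object can serve as the key; here we use plain strings.
pub.register('todd', partial(record, 'Todd'))
pub.register('logger', lambda message: received.append(('log', message)))

pub.dispatch("Breakfast is Ready")
print(received)
```

Both callbacks accept a single message argument, which is all dispatch() requires of them.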

Several Channels

So far, we’ve assumed that the publisher watches for only one kind of event. But what if there are several? Can we create a publisher that knows how to detect all of them, and let subscribers decide which they want to know about?

To implement this, let’s say a publisher has several channels that subscribers can subscribe to. Each channel notifies for a different event type. For example, if your program monitors a cluster of virtual machines, one channel signals when a certain machine’s disk usage exceeds 75% (a warning sign, but not an immediate emergency); and another signals when disk usage goes over 90% (much more serious, and may begin to impact performance on that virtual machine). Some subscribers will want to know when the 75% threshold is crossed; some, the 90% threshold; and some might want to be alerted for both. What’s a good way to express this in Python code?

Let’s work with the mealtime-announcement code above. Rather than jumping right into the code, let’s mock up the interface first. We want to create a publisher with two channels, like so:

# Two channels, named "lunch" and "dinner".
pub = Publisher(['lunch', 'dinner'])

This constructor is different; it takes a list of channel names. Let’s also pass the channel name to register(), since each subscriber will register for one or more:

# Three subscribers, of the original type.
bob = Subscriber('Bob')
alice = Subscriber('Alice')
john = Subscriber('John')

# Two args: channel name & subscriber
pub.register("lunch", bob)
pub.register("dinner", alice)
pub.register("lunch", john)
pub.register("dinner", john)

Now, on dispatch, the publisher needs to specify the event type. So just like with register(), we’ll prepend a channel argument:

pub.dispatch("lunch", "It's lunchtime!")
pub.dispatch("dinner", "Dinner is served")

When this works correctly, we’d expect this output:

Bob got message "It's lunchtime!"
John got message "It's lunchtime!"
Alice got message "Dinner is served"
John got message "Dinner is served"

Pop quiz (and if it’s practical, pause here to write your own Python code): how would you implement this new, multi-channel Publisher?

There are several approaches, but the simplest I’ve found relies on creating a separate subscribers dictionary for each channel:

class Publisher:
    def __init__(self, channels):
        # Create an empty subscribers dict
        # for every channel
        self.channels = { channel : dict()
                          for channel in channels }

    def register(self, channel, who, callback=None):
        if callback is None:
            callback = who.update
        subscribers = self.channels[channel]
        subscribers[who] = callback

    def dispatch(self, channel, message):
        subscribers = self.channels[channel]
        for callback in subscribers.values():
            callback(message)

This Publisher has a dict called self.channels, which maps channel names (strings) to subscriber dictionaries. register() and dispatch() are not too different: they simply have an extra step, in which subscribers is looked up in self.channels. I use that variable just for readability, and I think it’s well worth the extra line of code:

# Works the same. But a bit less readable.
    def register(self, channel, who, callback=None):
        if callback is None:
            callback = who.update
        self.channels[channel][who] = callback
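
The multi-channel Publisher above has no unregister(). One possible way to add it is sketched below, assuming you want to remove a subscriber from one channel at a time. The unregister() signature and the inbox attribute on Subscriber are my own additions for this illustration.

```python
class Publisher:
    def __init__(self, channels):
        self.channels = {channel: dict() for channel in channels}

    def register(self, channel, who, callback=None):
        if callback is None:
            callback = who.update
        subscribers = self.channels[channel]
        subscribers[who] = callback

    def dispatch(self, channel, message):
        subscribers = self.channels[channel]
        for callback in subscribers.values():
            callback(message)

    # Assumed addition: remove a subscriber from a single channel.
    def unregister(self, channel, who):
        subscribers = self.channels[channel]
        del subscribers[who]

class Subscriber:
    def __init__(self, name):
        self.name = name
        self.inbox = []  # hypothetical: keep messages for inspection
    def update(self, message):
        self.inbox.append(message)

pub = Publisher(['lunch', 'dinner'])
john = Subscriber('John')
pub.register('lunch', john)
pub.register('dinner', john)
pub.unregister('lunch', john)
pub.dispatch('lunch', "It's lunchtime!")
pub.dispatch('dinner', 'Dinner is served')
print(john.inbox)
```

After unregistering from "lunch", John still hears about dinner, because each channel keeps its own subscribers dictionary.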

These are some variations of the general Observer pattern, and I’m sure you can imagine more. What I want you to notice are the options available in Python when you leverage function objects, and other Pythonic features.

Magic Methods

Suppose we want to create a class to work with angles, in degrees. We want this class to help us with some standard bookkeeping:

- An angle will be at least 0, but less than 360.
- If we create an angle outside this range, it automatically wraps around to an equivalent, in-range value:
  - If we add 270 degrees and 270 degrees, it evaluates to 180 degrees instead of 540 degrees.
  - If we subtract 180 degrees from 90 degrees, it evaluates to 270 degrees instead of -90 degrees.
  - If we multiply an angle by a real number, it wraps the final value into the correct range.
- And while we’re at it, we want to enable all the other behaviors we normally want with numbers: comparisons like “less than” and “greater than or equal to” or “==” (i.e., equals); division (which doesn’t normally require casting into a valid range, if you think about it); and so on.

Let’s see how we might approach this, by creating a basic Angle class:

class Angle:
    def __init__(self, value):
        self.value = value % 360

The modulo operation in the constructor is kind of neat: if you reason through it with a few positive and negative values, you’ll find the math works out correctly whether the angle is overshooting or undershooting. This meets one of our key criteria already: the angle is normalized to be at least 0 and less than 360.
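
If you want to check that reasoning quickly, here are a few sample values run through the same % 360 operation. The particular numbers are just ones I picked to cover the overshoot and undershoot cases.

```python
# In Python, the result of % takes the sign of the right-hand operand (360),
# so it always lands in the range 0 <= result < 360.
examples = {
    540: 540 % 360,   # overshoots by half a turn past one full turn
    -90: -90 % 360,   # undershoots
    360: 360 % 360,   # exactly one full turn
    45: 45 % 360,     # already in range
}
print(examples)
```

This behavior is specific to how Python defines %. In some other languages, such as C and Java, the remainder of a negative left operand can be negative, so the same trick needs an extra step there.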

But how does it handle addition? We get an error if we try it directly:

>>> Angle(30) + Angle(45)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'Angle' and 'Angle'
>>>

We can easily define a method called add, which will let us write code like angle3 = angle1.add(angle2). But it’s better to reuse the familiar arithmetic operators everyone knows. Python lets us do that, through a collection of object hooks called magic methods. Magic methods let you define classes so that their instances can be used with all of Python’s standard operators. That includes arithmetic (+ - * / //), equality (==), inequality (!=), comparisons (< > >= <=), bit-shifting operations, and even concepts like exponentiation and absolute value.

Few classes will need all of these, but sometimes it’s valuable to have them available. Let’s see how they can improve our Angle type.

The pattern for each magic method is the same. For a given operation (say, addition) there is a special method name that starts and ends with double underscores. For addition, it’s __add__(); the others also have sensible names. All you have to do is define that method, and you can use instances of your class with that operator.

When you discuss magic methods in face-to-face, verbal conversation, you’ll find yourself saying things like “underscore underscore add underscore underscore” over and over. That’s a lot of syllables. So people in the Python community use a kind of verbal abbreviation, with a word they invented: dunder. That’s not a real word; Python people made it up. When you say “dunder foo”, it means “underscore underscore foo underscore underscore”. This isn’t used in writing, because it’s not needed—you can just write __foo__. But at Python gatherings, you’ll sometimes hear people say it. Use it; it saves you a lot of energy when talking.

Anyway, back to dunder add—I mean, __add__(). For operations like addition—which take two values, and return a third—you write the method like this:

    def __add__(self, other):
        return Angle(self.value + other.value)

The first argument needs to be called self, because this is Python. The second does not have to be called other, but often is. This lets us use the normal addition operator for arithmetic:

>>> total = Angle(30) + Angle(45)
>>> total.value
75

There are similar operators for subtraction, multiplication, and so on, as shown in Table 6-1:

Table 6-1. Arithmetic magic methods
Method	Operation
`__add__()`	`a + b`
`__sub__()`	`a - b`
`__mul__()`	`a * b`
`__truediv__()`	`a / b` (floating-point division)
`__mod__()`	`a % b`
`__pow__()`	`a ** b`
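
To make a couple of these rows concrete, here is a sketch of how Angle might implement subtraction and multiplication, following the same shape as __add__(). I am assuming that multiplication takes a plain number rather than another Angle. The __rmul__() line is not in the table; it is the standard hook Python falls back to when the Angle is on the right-hand side, as in 2 * angle.

```python
class Angle:
    def __init__(self, value):
        self.value = value % 360

    def __add__(self, other):
        return Angle(self.value + other.value)

    def __sub__(self, other):
        return Angle(self.value - other.value)

    def __mul__(self, factor):
        # Assumption: factor is a plain number, not another Angle.
        return Angle(self.value * factor)

    # Lets "2 * angle" work as well as "angle * 2".
    __rmul__ = __mul__

difference = Angle(90) - Angle(180)
doubled = Angle(270) * 2
also_doubled = 2 * Angle(45)
print(difference.value, doubled.value, also_doubled.value)
```

Because every result goes back through the constructor, the wrapping into the 0 to 360 range happens automatically.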

Essentially, Python translates a + b to a.__add__(b); a % b to a.__mod__(b); and so on. You can also hook into bit-operation operators (see Table 6-2).

Table 6-2. Bit-operation magic methods
Method	Operation
`__lshift__()`	`a << b`
`__rshift__()`	`a >> b`
`__and__()`	`a & b`
`__xor__()`	`a ^ b`
`__or__()`	`a | b`

So a & b translates to a.__and__(b), for example.

Since __and__() corresponds to the bitwise-and operator (for expressions like “foo & bar”), you might wonder what the magic method is for logical-and (“foo and bar”), or logical-or (“foo or bar”). Sadly, there is none; because of how Python’s Boolean logic short-circuits, there is not really a good way to do magic methods for them. For this reason, sometimes libraries will hijack the & and | operators to mean logical and/or, instead of bitwise and/or.
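
Here is a rough sketch of what that hijacking can look like. The Rule class is hypothetical, invented for this example; it wraps a one-argument test function and uses & and | to combine tests, rather than to do anything bitwise.

```python
# Hypothetical class: wraps a one-argument test function.
class Rule:
    def __init__(self, test):
        self.test = test

    def __call__(self, x):
        return self.test(x)

    def __and__(self, other):
        # "rule1 & rule2" builds a new Rule that requires both to pass.
        return Rule(lambda x: self(x) and other(x))

    def __or__(self, other):
        # "rule1 | rule2" builds a new Rule that requires either to pass.
        return Rule(lambda x: self(x) or other(x))

positive = Rule(lambda x: x > 0)
even = Rule(lambda x: x % 2 == 0)

positive_and_even = positive & even
positive_or_even = positive | even
print(positive_and_even(4), positive_and_even(3))
print(positive_or_even(-2), positive_or_even(-3))
```

One thing to watch when a library uses the operators this way: & and | bind more tightly than comparison operators like > and <, so comparison expressions generally need parentheses around them, as in (a > 0) & (b < 5).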

The default representation of an Angle object isn’t very useful:

>>> Angle(30)
<__main__.Angle object at 0x106df9198>

It tells us the type, and the hex object ID, but we’d rather it tell us something about the value of the angle. There are two magic methods that can help. The first is __str__(), which is used when printing a result:

class Angle:
    # ...
    def __str__(self):
        return f"{self.value} degrees"

The print() function uses this, and so do str() and the string formatting operations:

>>> print(Angle(30))
30 degrees
>>> print(f"{Angle(30) + Angle(45)}")
75 degrees
>>> print("{}".format(Angle(30) + Angle(45)))
75 degrees
>>> str(Angle(135))
'135 degrees'
>>> some_angle = Angle(45)
>>> f"{some_angle}"
'45 degrees'

Sometimes you want a string representation that is more precise, which might be at odds with a human-friendly representation. Imagine you have several subclasses (for instance, PitchAngle and YawAngle in some kind of aircraft-related library), and you want an easy way to log the exact type and arguments needed to recreate the object. Python provides a second magic method for this purpose, called __repr__():

class Angle:
    # ...
    def __repr__(self):
        return f"Angle({self.value})"

You access this by calling either the repr() built-in function, or by passing the !r conversion to the formatting string:

>>> repr(Angle(75))
'Angle(75)'
>>> print('{!r}'.format(Angle(30) + Angle(45)))
Angle(75)
>>> print(f"{Angle(30) + Angle(45)!r}")
Angle(75)

You can think of both of these as working like str(), but invoking __repr__() instead of __str__().

The official guideline is that the output of __repr__() can be passed to eval() to recreate the object exactly. It’s not enforced by the language, and is not always practical, or even possible. But when you can follow that guideline, it is useful for logging and debugging.
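
As a small sketch of that round trip, the example below feeds the output of repr() back into eval(). It varies slightly from the __repr__() shown earlier: it uses type(self).__name__ instead of hard-coding "Angle", on the assumption that you want subclasses to report their own name. PitchAngle here is an empty, hypothetical subclass.

```python
class Angle:
    def __init__(self, value):
        self.value = value % 360

    def __repr__(self):
        # Variation on the version in the text: use the actual class name,
        # so that subclasses report themselves correctly.
        return f"{type(self).__name__}({self.value})"

# Hypothetical subclass, named after the example in the text.
class PitchAngle(Angle):
    pass

original = PitchAngle(75)
text = repr(original)   # a string containing a constructor call
clone = eval(text)      # evaluates that string as a Python expression
print(text, type(clone).__name__, clone.value)
```

Only pass strings to eval() when you trust where they came from, since it will run whatever expression it is given.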

We also want to be able to compare two Angle objects. The most basic comparison is equality, provided by __eq__(). It should return True or False:

class Angle:
    # ...
    def __eq__(self, other):
        return self.value == other.value

If defined, this method is used by the == operator:

>>> Angle(3) == Angle(3)
True
>>> Angle(7) == Angle(1)
False

By default, the == operator is based on the object ID. So an expression like x == y evaluates to True if x and y have the same ID, and otherwise evaluates to False. That is rarely useful:

>>> class BadAngle:
...     def __init__(self, value):
...         self.value = value
...
>>> BadAngle(3) == BadAngle(3)
False
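
The __eq__() shown earlier assumes the other object is also an Angle. Below is a slightly more defensive sketch, which returns NotImplemented for other types. It also defines __hash__(), because a class that defines __eq__() without it becomes unhashable. Basing the hash on self.value is my assumption; it is only safe if you don't change value while the object is in a set or used as a dictionary key.

```python
class Angle:
    def __init__(self, value):
        self.value = value % 360

    def __eq__(self, other):
        if not isinstance(other, Angle):
            # Tell Python this comparison isn't handled here;
            # it then tries the other operand, and finally falls
            # back to an identity check.
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Needed because defining __eq__() alone disables hashing.
        return hash(self.value)

same = Angle(3) == Angle(363)
different_type = Angle(3) == "3 degrees"
unique = {Angle(10), Angle(370), Angle(20)}
print(same, different_type, len(unique))
```

Angle(10) and Angle(370) compare equal and hash alike, so the set keeps only one of them.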

What’s left are the fuzzier comparison operations: less than, greater than, and so on. Python’s documentation calls these “rich comparison” methods, so you can feel wealthy when using them (see Table 6-3).

Table 6-3. Rich comparison magic methods
Method	Operation
`__lt__()`	less than (`<`)
`__le__()`	less than or equal (`<=`)
`__gt__()`	greater than (`>`)
`__ge__()`	greater than or equal (`>=`)

For example:

class Angle:
    # ...
    def __gt__(self, other):
        return self.value > other.value

Now the greater-than operator works correctly:

>>> Angle(100) > Angle(50)
True

The same goes for __ge__(), __lt__(), and the rest. If you don’t define these, you get an error:

>>> BadAngle(8) > BadAngle(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'BadAngle' and 'BadAngle'

__gt__() and __lt__() are reflections of each other. What that means is that, in many cases, you only have to define one of them. Suppose you implement __gt__() but not __lt__(), then do this:

>>> a1 = Angle(3)
>>> a2 = Angle(7)
>>> a1 < a2
True

This works thanks to some just-in-time introspection the Python runtime does. The expression a1 < a2 is translated to a1.__lt__(a2). If Angle.__lt__() is indeed defined, that method is executed, and the expression evaluates to its return value.

a1 < a2 is true if and only if a2 > a1. For this reason, if __lt__() does not exist, but __gt__() does, then Python will rewrite the angle comparison: a1.__lt__(a2) becomes a2.__gt__(a1). This is then evaluated, and the expression a1 < a2 evaluates to its return value. There are some situations where you will need to define both; for example, if the comparison is based on several member variables.
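
Reflection pairs < with >, and <= with >=, but Python does not work out <= from < and == on its own. If you want all four without writing each one, the standard library's functools.total_ordering decorator can fill in the rest. Here is a sketch, assuming Angle defines __eq__() and __lt__() as shown:

```python
from functools import total_ordering

@total_ordering
class Angle:
    def __init__(self, value):
        self.value = value % 360

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        return self.value < other.value

a1 = Angle(3)
a2 = Angle(7)

# Only __eq__() and __lt__() are written above;
# total_ordering supplies __le__(), __gt__(), and __ge__().
results = (a1 < a2, a1 <= a2, a1 > a2, a1 >= a2)
print(results)
```

The decorator trades a little speed for convenience, so for performance-sensitive classes you may still prefer to write each comparison method by hand.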

Rebelliously Misusing Magic Methods

Magic methods are interesting enough, and quite handy when you need them. But depending on the kind of applications you work on, you will rarely need to define a class whose instances can be added, subtracted, or compared.

Things get much more interesting, though, when you don’t follow the rules.

Here’s a fascinating fact: methods like __add__() are supposed to do addition. But it turns out that Python does not enforce this. And methods like __gt__() are supposed to return True or False. But if you write a __gt__() which returns something that isn’t a bool…Python will not complain at all.

This creates amazing possibilities.

To illustrate, let me tell you about Pandas. You probably know that Pandas is an excellent data-processing library. It’s become extremely popular among data scientists who use Python. Pandas has a convenient data type called a DataFrame. It represents a two-dimensional collection of data, organized into rows, with labeled columns:

import pandas
df = pandas.DataFrame({
        'A': [-137, 22, -3, 4, 5],
        'B': [10, 11, 121, 13, 14],
        'C': [3, 6, 91, 12, 15],
    })

There are several ways to create a DataFrame; here I’ve chosen to use a dictionary.⁴ The keys are column names; the values are lists, which become that column’s data. So you visually rotate each list 90 degrees:

>>> print(df)
     A    B   C
0 -137   10   3
1   22   11   6
2   -3  121  91
3    4   13  12
4    5   14  15

The rows are numbered for you, and the columns nicely labeled in a header. The A column, for example, has a mix of positive and negative numbers.

Now, one of the many useful things you can do with a DataFrame is select the rows meeting certain criteria. This doesn’t change the original DataFrame; instead, it creates a new DataFrame, containing just the rows you want. For example, you can say, “Give me the rows of df in which the A column has a positive value”:

>>> positive_a = df[df.A > 0]
>>> print(positive_a)
    A   B   C
1  22  11   6
3   4  13  12
4   5  14  15

All you have to do is pass in “df.A > 0” in the square brackets.

But there’s something weird going on here. Take a look at the line in which positive_a is defined. Do you notice anything unusual there? Anything strange?

Here’s what is odd: the expression “df.A > 0” ought to evaluate to either True, or False. Right? It’s supposed to be a Boolean value with exactly one bit of information. But the source dataframe, df, has many rows. Real dataframes can easily have tens of thousands, even millions of rows of data. There’s no way a single Boolean value can express which of those rows to keep and which to discard. How does this even work?

Turns out, it’s not Boolean at all:

>>> comparison = (df.A > 0)
>>> type(comparison)
<class 'pandas.core.series.Series'>
>>> print(comparison)
0    False
1     True
2    False
3     True
4     True
Name: A, dtype: bool

Yes, you can do that, thanks to Python’s dynamic type system. Python translates “df.A > 0” into “df.A.__gt__(0)”. And that __gt__() method doesn’t have to return a bool. In fact, in Pandas, it returns a Series object (which is like a vector of data), containing True or False for each row. And when that’s passed into df[]—the square brackets being handled by the __getitem__() method—that Series object is used to filter rows.

To see what this looks like, let’s re-invent part of the interface of Pandas. I’ll create a library called fakepandas, which instead of DataFrame has a type called Dataset:

class Dataset:
    def __init__(self, data):
        self.data = data
        self.labels = sorted(data.keys())
    def __getattr__(self, label: str):
        # Makes references like df.A work.
        return Column(label)
    # Plus some other methods.

If I have a Dataset object named ds, with a column named A, the __getattr__() method causes references like ds.A to return a Column object:

import operator
class Column:
    def __init__(self, name):
        self.name = name
    def __gt__(self, value):
        return Comparison(self.name, value, operator.gt)

This Column class has a __gt__() method, which makes expressions like “ds.A > 0” return an instance of a class called Comparison. It represents a lazy calculation: the comparison is recorded now, and the actual filtering happens later. Notice its constructor arguments: a column name, a threshold value, and a callable to implement the comparison. (The operator module has a function called gt() that takes two arguments, expressing a greater-than comparison.)
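
The Comparison class and the "other methods" of Dataset are not shown here, so the following is only one way they might fit together, not the actual fakepandas code. In this sketch, the apply() method on Comparison and the __getitem__() method on Dataset are my own guesses at a minimal design.

```python
import operator

class Comparison:
    # Assumed design: remember what to compare, and do it later.
    def __init__(self, name, value, op):
        self.name = name
        self.value = value
        self.op = op

    def apply(self, data, row_number):
        # Run the stored comparison against one row of one column.
        return self.op(data[self.name][row_number], self.value)

class Column:
    def __init__(self, name):
        self.name = name

    def __gt__(self, value):
        return Comparison(self.name, value, operator.gt)

class Dataset:
    def __init__(self, data):
        self.data = data
        self.labels = sorted(data.keys())

    def __getattr__(self, label):
        # Only called when normal attribute lookup fails,
        # so self.data and self.labels are unaffected.
        return Column(label)

    def __getitem__(self, comparison):
        # Assumed: keep just the rows for which the comparison holds.
        length = len(self.data[self.labels[0]])
        keep = [i for i in range(length)
                if comparison.apply(self.data, i)]
        return Dataset({
            label: [self.data[label][i] for i in keep]
            for label in self.labels
        })

ds = Dataset({
    'A': [-137, 22, -3, 4, 5],
    'B': [10, 11, 121, 13, 14],
    'C': [3, 6, 91, 12, 15],
})
positive_a = ds[ds.A > 0]
print(positive_a.data)
```

Notice that “ds.A > 0” does no comparing at all when it is evaluated; the work happens inside __getitem__(). One rough edge in this sketch is that __getattr__() returns a Column for any unknown name, including typos, so a real library would want to check the label first.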

You can even support complex filtering criteria like ds[ds.C + 2 < ds.B]. It’s all possible by leveraging magic methods in these unorthodox ways. If you care about the details, I wrote an article delving into that. My goal here isn’t to tell you how to re-invent the Pandas interface, so much as to get you to realize what’s possible.

Have you ever implemented a compiler? If so, you know the parsing phase is a significant development challenge. Using Python magic methods in this manner does much of the hard work of lexing and parsing for you. And the best part is how natural and intuitive the result can be for end users. You are essentially implementing a domain-specific language on top of regular Python syntax, but consistently enough that people quickly become fluent and productive within its rules. They often won’t even think to ask why the rules seem to be bent; they won’t notice that “df.A > 0” isn’t acting like a Boolean. That’s a clear sign of success. It means you have designed your library so well, other developers become effortlessly productive.

Conclusion

Most Python users know how to write simple classes and methods. But as you can see, Python’s object system has a lot more to it than that. Learning more advanced OOP opens up great opportunities for you, and allows you to create code structures you would otherwise never produce. Take everything you learned in this chapter, and put it into action.

¹ In OOP, when working with properties, we call the method which “reads” the current value the getter. And the method you call to set a new value is called the setter.

² This is akin to protected in languages like Java. But unlike Java, Python does not enforce it. Instead, it is a convention you are expected to follow.

³ Technically, pub-sub is a more general architectural pattern that can apply to distributed systems. In contrast, the Observer pattern is always limited to what’s inside a single process. That is the scope we will focus on here.

⁴ Which you will rarely do in real code (you will ingest from a CSV file or something instead), but it is convenient for demonstrating here.