This chapter assumes you are familiar with the basics of object-oriented programming (OOP) in Python: creating classes, defining methods, and simple inheritance. You will build on that knowledge in this chapter.
As with any object-oriented language, it’s useful to learn about design patterns—reusable solutions to common problems involving classes and objects. A lot has been written about design patterns. And while much of it applies to Python, it tends to apply differently.
That is because many design-pattern books and articles are written for languages like Java, C++, and C#. But as a language, Python is different. Its dynamic typing, first-class functions, and other features all mean the “standard” design patterns just work differently.
So let’s learn what Pythonic OOP is really about.
Python objects have attributes. “Attribute” is a general term
meaning “whatever is to the right of the dot” in an expression like
x.y or z.f(). Member variables and methods are two kinds of
attributes. But Python has another kind of attribute called properties.
A property is a hybrid: a cross between a method and a member variable. The idea is to create an attribute that acts like a member variable from the outside, but reading or writing to this attribute triggers method calls internally.
You’ll set this up with a special decorator called @property. A
simple example:
class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    @property
    def fullname(self):
        return self.firstname + " " + self.lastname
By instantiating this, you can access fullname like it is a member variable:
>>> joe = Person("Joe", "Smith")
>>> joe.fullname
'Joe Smith'
Look carefully for the actual member variables here. There are two,
firstname and lastname, set in the constructor. This class also
has a method called fullname. But after creating the instance, we
reference joe.fullname as a member variable; we don’t call
joe.fullname() as a method. However, when you read the value of
joe.fullname, the fullname() method is invoked.
This is all due to the @property decorator. When applied to a
method, this decorator makes it inaccessible as a method. You must
access it like a member variable. In fact, if you try to call it
as a method, you get an error:
>>> joe.fullname()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable
As defined above, fullname is read-only. We cannot modify it:
>>> joe.fullname = "Joseph Smith"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
In other words, Python properties are read-only by default. Another
way of saying this is that @property automatically defines a
getter, but not a setter.1 If you want fullname to be writable, here is how you
define its setter:
class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    @property
    def fullname(self):
        return self.firstname + " " + self.lastname

    @fullname.setter
    def fullname(self, value):
        self.firstname, self.lastname = value.split(" ", 1)
This lets you assign to joe.fullname:
>>> joe = Person("Joe", "Smith")
>>> joe.firstname
'Joe'
>>> joe.lastname
'Smith'
>>> joe.fullname = "Joseph Smith"
>>> joe.firstname
'Joseph'
>>> joe.lastname
'Smith'
So we have two methods named fullname(). The first one, decorated
with @property, is dispatched (invoked) when you read the value of
joe.fullname. The second one, decorated with @fullname.setter, is
dispatched when you assign to joe.fullname. Python picks which to
run, depending on whether you are getting or setting.
The first time I saw this, I had many questions. “Wait, why is this
fullname method defined twice? And why is the second decorator named
@fullname, and why does it have a setter attribute? How on earth
does this even work?!”
The code is actually designed to work this way. The @property line,
followed by def fullname, must come first. Those two lines create
the property to begin with, and also create the getter. By “create
the property”, I mean that an object named fullname exists in the
namespace of the class, and it has an attribute named
fullname.setter. This fullname.setter is a decorator that is
applied to the next def fullname, christening it as the setter for
the fullname property.
It’s okay to not fully understand how this works. A full explanation relies on understanding not only decorators, but also Python’s descriptor protocol, which is beyond the scope of this chapter. Fortunately, you don’t have to understand how it works in order to use it.
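If you are curious anyway, here is a rough sketch of what the decorator stacking amounts to. The @property syntax is essentially sugar for the built-in property object, which can also be built directly from plain functions. (The _get_fullname and _set_fullname names below are illustrative, not part of the original class.)

```python
class Person:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    def _get_fullname(self):
        return self.firstname + " " + self.lastname

    def _set_fullname(self, value):
        self.firstname, self.lastname = value.split(" ", 1)

    # Roughly what the two decorated defs build up: a single
    # property object holding both the getter and the setter.
    fullname = property(_get_fullname, _set_fullname)
```

This version behaves just like the decorated one; the decorator form simply lets you write the getter and setter as normal-looking methods.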
What you see here with the Person class is one way properties are
useful: they are magic attributes which act like member variables, but
their value is derived from other member variables. This keeps the
object's data normalized (the derived value is computed on demand
rather than stored redundantly), while still letting you access the
attribute like it is a member variable. You'll see a situation where
that is extremely useful later.
Properties enable a useful collection of design patterns. One—as
mentioned—is creating read-only member variables. In the first
version of Person, the fullname “member variable” is a dynamic
attribute; it does not exist on its own, but instead calculates its
value at runtime.
It’s also common to have the property backed by a single, non-public member variable. That pattern looks like this:
class Coupon:
    def __init__(self, amount):
        self._amount = amount

    @property
    def amount(self):
        return self._amount
This allows the class itself to modify the value internally, while preventing outside code from doing so:
>>> coupon = Coupon(1.25)
>>> coupon.amount
1.25
>>> coupon.amount = 1.50
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
In Python, prefixing a member variable or method by a single underscore signals it is protected; it should only be accessed internally, inside methods of that class or its subclasses.2 This property pattern says, “You can read the value of this attribute, but you cannot change it”.
There is another pattern between “regular member variable” and “read-only”:
the value can be changed, but you must validate it first. Suppose you
and I are developing some software that manages live events. We write
a Ticket class, representing tickets sold to attendees:
class Ticket:
    def __init__(self, price):
        self.price = price
    # And some other methods...
One day, we find a bug in our web UI that lets shifty customers adjust the price to a negative value. So we end up paying them to go to the concert. Not good!
The first priority is, of course, to fix the bug in the UI. But how do
we modify our code to prevent this from ever happening again? Before
reading further, look at the Ticket class and ponder—how could you
use properties to make this kind of bug impossible in the future?
The answer: verify the new price is non-negative in the setter:
# Version 1...
class Ticket:
    def __init__(self, price):
        self._price = price

    @property
    def price(self):
        return self._price

    @price.setter
    def price(self, new_price):
        # Only allow non-negative prices.
        if new_price < 0:
            raise ValueError("Nice try")
        self._price = new_price
This lets the price be adjusted, but only to sensible values:
>>> t = Ticket(42)
>>> t.price = 24  # This is allowed.
>>> print(t.price)
24
>>> t.price = -1  # This is NOT.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in price
ValueError: Nice try
However, there’s a defect in this new Ticket class. Can you spot
it? (And how to fix it?)
The problem is that while we can’t change the price to a negative
value, this first version lets us create a ticket with a negative
price to begin with. That’s because we wrote self._price = price in
the constructor. The solution is to use the setter in the
constructor instead:
# Final version, with modified constructor. (Constructor
# is different; code for getter & setter is the same.)
class Ticket:
    def __init__(self, price):
        # instead of "self._price = price"
        self.price = price

    @property
    def price(self):
        return self._price

    @price.setter
    def price(self, new_price):
        # Only allow non-negative prices.
        if new_price < 0:
            raise ValueError("Nice try")
        self._price = new_price
Yes, you can reference self.price in methods of the class. When we
write self.price = price, Python translates this to calling the
price setter—that is, the second price() method. This final version
of Ticket centralizes all reads and writes of self._price in the
property—a useful encapsulation technique. The idea is to centralize
any special behavior for that member variable in the getter and
setter, even for the class’s internal code. In practice, sometimes
other methods will need to violate this rule; if so, you simply
reference self._price and move on. But as much as you can, only use
the protected (underscore) member variable in the getter and setter,
and that will naturally tend to boost the quality of your code.
Imagine writing a simple money class:
class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents
    # And some other methods...
Suppose you put this class in a library many developers use: people on your current team, perhaps developers on different teams. Maybe you release it as open source, so developers around the world use and rely on this class.
One day you realize that many of Money’s methods—which do calculations
on the money amount—could be simpler and more straightforward
if they operated on the total number of cents, rather than
dollars and cents separately. So you refactor the internal state:
class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents
This creates a major maintainability problem. Do you spot it?
Here’s the trouble: your original Money has member variables named
dollars and cents. And since many developers are using these
variables, changing to total_cents breaks all their code!
money = Money(27, 12)
message = "I have {:d} dollars and {:d} cents."
# This line breaks, because there's no longer
# dollars or cents attributes.
print(message.format(money.dollars, money.cents))
If no one but you uses this class, there’s no real problem—you can just refactor your own code. But otherwise, coordinating this change with everyone’s different codebases is a nightmare. It becomes a barrier to improving your own code.
So, what do you do? Can you think of a way to handle this situation?
You get out of this mess using properties. You want two things to happen:
The class uses total_cents internally.
All code using dollars and cents continues to work, without modification.
You’ll do this by replacing dollars and cents with total_cents
internally, but also creating getters and setters for these
attributes. Take a look:
class Money:
    def __init__(self, dollars, cents):
        self.total_cents = dollars * 100 + cents

    # Getter and setter for dollars...
    @property
    def dollars(self):
        # // is integer division
        return self.total_cents // 100

    @dollars.setter
    def dollars(self, new_dollars):
        self.total_cents = 100 * new_dollars + self.cents

    # And for cents.
    @property
    def cents(self):
        return self.total_cents % 100

    @cents.setter
    def cents(self, new_cents):
        self.total_cents = 100 * self.dollars + new_cents
Now, I can get and set dollars and cents all day:
>>> money = Money(27, 12)
>>> money.total_cents
2712
>>> money.cents
12
>>> money.dollars = 35
>>> money.total_cents
3512
Python’s way of doing properties brings many benefits. In languages like Java, the following story can play out:
A newbie developer starts writing Java classes. They want to expose some state, so they create public member variables.
They use this class everywhere. Other developers use it too.
One day, the developer decides to change the name or type of that member
variable, or even delete it entirely (like what we did with Money).
But that would break everyone’s code. So they can’t.
This is not a problem for Java developers in practice, because they quickly learn to make all their variables private by default—proactively creating getters and setters for every publicly exposed chunk of data. They realize this boilerplate is far less painful than the alternative, because if everyone must use the public getters and setters to begin with, you always have the freedom to make internal changes later.
This works well enough. But it is distracting, and just enough trouble that there’s always the temptation to make that member variable public, and be done with it.
In Python, we have the best of both worlds. You can freely create member variables—which are public by default—and refactor them as properties if and when you ever need to. No one using your code even has to know.
There are several design patterns with the word “factory” in their names. Their unifying idea is providing a handy, simplified way to create useful, potentially complex objects. The two most important forms are:
Where the object’s type is fixed, but we want to have several different ways to create it. This is called the Simple Factory Pattern.
Where the factory dynamically chooses one of several different types. This is called the Factory Method Pattern.
Let’s look at how you do these in Python.
Imagine a simple Money class, suitable for currencies which have
dollars and cents:
class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents
We looked at this in the previous section, changing what member variables it has. But let’s roll back, and focus instead on the constructor’s interface. This constructor is convenient when we have the dollars and cents as separate integer variables. But there are many other ways to specify an amount of money. Perhaps you’re modeling a giant jar of pennies:
# Emptying the penny jar...
total_pennies = 3274
dollars = total_pennies // 100
cents = total_pennies % 100
total_cash = Money(dollars, cents)
Suppose your code repeatedly splits pennies into dollars and cents,
over and over. And you’re tired of re-re-typing this calculation, plus
there is a chance you could make a mistake eventually. You could
change the constructor, but that means refactoring all
Money-creating code, and perhaps a lot of code fits the current
constructor better anyway. Some languages let you define several
constructors, but Python makes you pick one.
In this case, you can usefully create a factory function. A factory function takes the data you have, uses that to calculate what the class constructor needs, then returns the instance. For example:
# Factory function taking a single argument, returning
# an appropriate Money instance.
def money_from_pennies(total_cents):
    dollars = total_cents // 100
    cents = total_cents % 100
    return Money(dollars, cents)
Imagine that, in the same codebase, you also need to parse strings
like "$140.75". Here’s another factory function for that:
# Another factory, creating Money from a string amount.
import re

def money_from_string(amount):
    match = re.search(
        r'^\$(?P<dollars>\d+)\.(?P<cents>\d\d)$', amount)
    if match is None:
        raise ValueError(f"Invalid amount: {amount}")
    dollars = int(match.group('dollars'))
    cents = int(match.group('cents'))
    return Money(dollars, cents)
These are effectively alternate constructors: callables we can use
with different arguments, which are used to create the final
instance. But this approach has problems. First, it’s awkward to have
them as separate functions, defined outside of the class. But more importantly: what happens if you subclass Money? Suddenly
money_from_string() and money_from_pennies() are worthless,
because they are hard-coded to use Money.
Python solves these problems with a flexible and powerful feature: the
classmethod decorator. Use it like this:
class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

    @classmethod
    def from_pennies(cls, total_cents):
        dollars = total_cents // 100
        cents = total_cents % 100
        return cls(dollars, cents)
The function money_from_pennies() is now a method of the Money
class, called from_pennies(). But it has a new argument: cls. When
applied to a method definition, classmethod modifies how that method
is invoked and interpreted. The first argument is not self, which
would be an instance of the class. The first argument is now the
class itself. In the method body, self isn’t mentioned at all;
instead, cls is a variable holding the current class object—Money in this case. So the last line is creating a new instance of
Money:
>>> piggie_bank_cash = Money.from_pennies(3217)
>>> type(piggie_bank_cash)
<class '__main__.Money'>
>>> piggie_bank_cash.dollars
32
>>> piggie_bank_cash.cents
17
Notice from_pennies() is invoked on the class itself, not an
instance of the class. This is already nicer code organization. But
now it works with inheritance:
>>> class TipMoney(Money):
...     pass
...
>>> tip = TipMoney.from_pennies(475)
>>> type(tip)
<class '__main__.TipMoney'>
This is the real benefit of class methods. You define it once on the
base class, and all subclasses can leverage it, substituting their own
type for cls. This makes class methods perfect for the simple
factory in Python. The final line returns an instance of cls,
using its regular constructor. And cls refers to whatever the
current class is: Money, TipMoney, or some other subclass.
For the record, here’s how to translate money_from_string():
class Money:
    # ...
    @classmethod
    def from_string(cls, amount):
        match = re.search(
            r'^\$(?P<dollars>\d+)\.(?P<cents>\d\d)$', amount)
        if match is None:
            raise ValueError(f"Invalid amount: {amount}")
        dollars = int(match.group('dollars'))
        cents = int(match.group('cents'))
        return cls(dollars, cents)
Class methods are a superior way to implement factories in Python. If
we subclass Money, that subclass will have from_pennies() and
from_string() methods that create objects of that subclass, without
any extra work on our part. And if we change the name of the Money
class, we only have to change it in one place, not three.
This form of the factory pattern is called Simple Factory, a name I
don’t love. I prefer to call it Alternate Constructor. Especially in
the context of Python, it describes well what @classmethod is most
useful for. And it suggests a general principle for designing your
classes. Look at this complete code of the Money class, and I’ll
explain:
import re

class Money:
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

    @classmethod
    def from_pennies(cls, total_cents):
        dollars = total_cents // 100
        cents = total_cents % 100
        return cls(dollars, cents)

    @classmethod
    def from_string(cls, amount):
        match = re.search(
            r'^\$(?P<dollars>\d+)\.(?P<cents>\d\d)$', amount)
        if match is None:
            raise ValueError(f"Invalid amount: {amount}")
        dollars = int(match.group('dollars'))
        cents = int(match.group('cents'))
        return cls(dollars, cents)
You can think of this class as having several constructors. As a
general rule, you’ll want to make __init__() the most generic one,
and implement the others as class methods. Sometimes, that means one
of the class methods will be used more often than __init__().
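The standard library itself leans on this convention. For example, both of these built-in alternate constructors are class methods, and each is often more convenient than the generic constructor:

```python
from datetime import date

# date() is the generic constructor; fromisoformat() is an
# alternate constructor implemented as a classmethod.
birthday = date.fromisoformat("2021-06-01")

# dict.fromkeys() is another classmethod-style constructor.
counts = dict.fromkeys(["a", "b", "c"], 0)
```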
When using a new class, most developers’ intuition will be to reach for the default constructor first, without thinking to check the provided class methods—if they even know about that feature of Python in the first place. You may need to educate your teammates. (Hint: Good examples in the class’s code docs go a long way.)
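One way to provide those examples is in the class docstring, in a form the doctest module can verify. Here is a sketch (the docstring wording is mine, not from the original class):

```python
class Money:
    """An amount of US currency.

    Where it fits, prefer the alternate constructor:

    >>> Money.from_pennies(3274).dollars
    32
    """
    def __init__(self, dollars, cents):
        self.dollars = dollars
        self.cents = cents

    @classmethod
    def from_pennies(cls, total_cents):
        return cls(total_cents // 100, total_cents % 100)
```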
This next factory pattern, called Factory Method, is quite different. The idea is that the factory will create an object, but will choose its type from one of several possibilities, dynamically deciding at runtime based on some criteria. It’s typically used when you have one base class, and are creating an object that can be one of several different derived classes.
Let’s see an example. Imagine you are implementing an image processing
library, creating classes to read the image from storage. So you
create a base ImageReader class, and several derived types:
import abc

class ImageReader(metaclass=abc.ABCMeta):
    def __init__(self, path):
        self.path = path

    @abc.abstractmethod
    def read(self):
        pass  # Subclass must implement.

    def __repr__(self):
        return f"{self.__class__.__name__}({self.path})"

class GIFReader(ImageReader):
    def read(self):
        ...  # Read a GIF

class JPEGReader(ImageReader):
    def read(self):
        ...  # Read a JPEG

class PNGReader(ImageReader):
    def read(self):
        ...  # Read a PNG
The ImageReader class is marked abstract, requiring subclasses to
implement the read() method. So far, so good.
When reading an image file, if its extension is .gif, I want to
use GIFReader. And if it is a JPEG image, I want to use
JPEGReader, and so on. The logic is:
Analyze the file path name to get the extension.
Choose the correct reader class based on that.
Create the appropriate reader object.
This process is a prime candidate for automation. Let’s define a little helper function:
def extension_of(path):
    # returns "png", "gif", "jpg", etc.
    position_of_last_dot = path.rfind('.')
    return path[position_of_last_dot + 1:]
With these pieces, we can now define the factory:
def get_image_reader(path):
    image_type = extension_of(path)
    if image_type == 'gif':
        reader_class = GIFReader
    elif image_type == 'jpg':
        reader_class = JPEGReader
    elif image_type == 'png':
        reader_class = PNGReader
    else:
        raise ValueError(f"Unknown extension: {image_type}")
    return reader_class(path)
Classes in Python can be put in variables, just like any other
object. We take full advantage here, by storing the appropriate
ImageReader subclass in reader_class. Once we decide on the proper
value, the last line creates and returns the reader object.
This correctly working code is already more concise, readable, and maintainable than what some languages force you to go through. But in Python, we can do even better. We can use the built-in dictionary type to make it more readable, and easier to update and maintain over time:
READERS = {
    'gif': GIFReader,
    'jpg': JPEGReader,
    'png': PNGReader,
}

def get_image_reader(path):
    reader_class = READERS[extension_of(path)]
    return reader_class(path)
Here we have a global variable mapping filename extensions to
ImageReader subclasses. This lets us readably implement
get_image_reader() in two lines. Finding the correct class is a
simple dictionary lookup, and then we instantiate and return the
instance. If we need to support new image formats in the future, we
can simply add a line in the READERS definition. (And, of course,
define its reader class.)
What if we encounter an extension not in the mapping, like .tiff? As
written above, the code will raise a KeyError. That may be what we
want. Or perhaps we want to catch that exception and re-raise a different
exception. ValueError is a good choice; this is what the previous
version of get_image_reader() raised.
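That re-raise might look like this. (A sketch: the stub reader classes here stand in for the real ones defined above, just to keep the example self-contained.)

```python
# Minimal stubs standing in for the real reader classes.
class GIFReader:
    def __init__(self, path):
        self.path = path

class PNGReader:
    def __init__(self, path):
        self.path = path

READERS = {'gif': GIFReader, 'png': PNGReader}

def extension_of(path):
    # returns "png", "gif", etc.
    return path[path.rfind('.') + 1:]

def get_image_reader(path):
    try:
        reader_class = READERS[extension_of(path)]
    except KeyError:
        # Re-raise as ValueError, matching the if/elif version.
        raise ValueError(
            f"Unknown extension: {extension_of(path)}") from None
    return reader_class(path)
```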
Alternatively, we may want to fall back on some default. Let’s create a new reader class, meant as an all-purpose fallback:
class RawByteReader(ImageReader):
    def read(self):
        ...  # Read raw bytes
Then you can write the factory like:
def get_image_reader(path):
    try:
        reader_class = READERS[extension_of(path)]
    except KeyError:
        reader_class = RawByteReader
    return reader_class(path)
Or more briefly:
def get_image_reader(path):
    reader_class = READERS.get(extension_of(path), RawByteReader)
    return reader_class(path)
This design pattern is commonly called the Factory Method pattern, which wins my award for Worst Design Pattern Name in History. That name (which appears to originate from a Java implementation detail) fails to tell you anything about what this pattern is actually for. I myself call it the Dynamic Type pattern, which I feel is much more descriptive and useful.
So far, we have looked at patterns that are mostly confined to a single class. But there are richer patterns involving multiple codesigned classes, interacting with each other. Let’s look at one.
The Observer pattern provides a “one-to-many” relationship. That’s a vague description, so let’s make it more specific.
In the Observer pattern, there’s one root object, called the observable. This object knows how to detect some kind of event of interest. It can literally be anything: a customer makes a new purchase; someone subscribes to an email list; or maybe it monitors a fleet of cloud instances, detecting when a machine’s disk usage exceeds 75%. You use this pattern when the code to reliably detect the event of interest is at least slightly complicated; that detection code is encapsulated inside the observable.
In this pattern, you also have other objects, called observers, which need to know when that event occurs, so they can take some action in response. You don’t want to reimplement the robust detection algorithm in each, of course. Instead, these observers register themselves with the observable. The observable then notifies each observer—by calling a method on that observer—for each event. This separation of concerns is the heart of the observer pattern.
I must tell you: I don’t like the names of things in this pattern. The words “observable” and “observer” are a bit obscure, and sound confusingly similar—especially if your native tongue is not English. There is another terminology, however, which many developers find easier: pub-sub.
In this terminology, instead of an “observable,” you create a publisher object, which watches for events. One or more subscribers (instead of “observers”) ask that publisher to notify them when the event happens. I’ve found the pattern is easier to reason about when looked at in this way, so that is the terminology I’m going to use.3
Let’s make this concrete, with code.
We’ll start with the basic Observer pattern, as it’s often documented
in design pattern books—except we’ll translate it to Python. In
this simple form, each subscriber must implement a method called
update(). Here’s an example:
class Subscriber:
    def __init__(self, name):
        self.name = name
    def update(self, message):
        print(f'{self.name} got message "{message}"')
update() takes a string. It's okay to define an update() method taking
other arguments, or even to name it something other than update(); the
publisher and subscribers just need to agree on the protocol. But we'll
use a single string argument.
Now, when a publisher detects an event, it notifies the subscriber by
calling its update() method. Here’s what a basic Publisher class
looks like:
class Publisher:
    def __init__(self):
        self.subscribers = set()
    def register(self, who):
        self.subscribers.add(who)
    def unregister(self, who):
        self.subscribers.discard(who)
    def dispatch(self, message):
        for subscriber in self.subscribers:
            subscriber.update(message)
    # Plus other methods, for detecting the event.
Let’s step through:
A publisher needs to keep track of its subscribers, right? We’ll
store them in a set object, named self.subscribers, created
in the constructor.
A subscriber is added with register(). Its argument who is an
instance of Subscriber. Who calls register()? It could be
anyone. The subscriber can register itself, or some external code
can register a subscriber with a specific publisher.
unregister() is there in case a subscriber no longer needs to be
notified of the events.
When the event of interest occurs, the publisher notifies its
subscribers by calling its dispatch() method. Usually this is
invoked by the publisher itself, in some other method of the class
(not shown) that implements the event-detection logic. It simply
cycles through the subscribers, calling update() on each.
Using these two classes in code is straightforward enough:
# Create a publisher and some subscribers.
pub = Publisher()
bob = Subscriber('Bob')
alice = Subscriber('Alice')
john = Subscriber('John')

# Register the subscribers, so they get notified.
pub.register(bob)
pub.register(alice)
pub.register(john)
Now, the publisher can dispatch messages:
# Send a message...
pub.dispatch("It's lunchtime!")
# John unsubscribes...
pub.unregister(john)
# ... and a new message is sent.
pub.dispatch("Time for dinner")
Here’s the output from running the above:
John got message "It's lunchtime!"
Bob got message "It's lunchtime!"
Alice got message "It's lunchtime!"
Bob got message "Time for dinner"
Alice got message "Time for dinner"
This is the basic Observer pattern, and pretty close to how you’d implement the idea in languages like Java, C#, and C++. But Python’s feature set differs from those languages. That means we can do different things.
Python’s functions are first-class objects. This means you can store a function in a variable—not the value returned when you call a function, but the function itself—as well as pass it as an argument to other functions and methods. Some other languages support this too (or something like it, such as function pointers), but Python’s strong support gives us a convenient opportunity for this design pattern.
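A quick illustration of that point (the shout and apply_twice names here are just for demonstration):

```python
def shout(message):
    return message.upper() + "!"

# Store the function object itself: no parentheses, no call.
speak = shout

# Pass a function as an argument to another function.
def apply_twice(func, value):
    return func(func(value))
```

Here speak("hello") and shout("hello") are the same call, and apply_twice(shout, "hi") calls shout twice on our behalf.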
The standard Observer pattern requires the publisher to hard-code a
certain method (usually named update()) that the subscriber must
implement. But maybe you need to register a subscriber which doesn’t
have that method. What then? If it’s your own class, you can just add
it. If you are importing the subscriber class from another library
(which you can’t or don’t want to modify), perhaps you can add the
method by subclassing it.
Sometimes you can’t do any of those things—or you could, but it’s a lot of trouble, and you want to avoid it. What then?
Let’s extend the traditional observer pattern, and make register()
more flexible. Suppose you have these subscribers:
# This subscriber uses the standard "update"
class SubscriberOne:
    def __init__(self, name):
        self.name = name
    def update(self, message):
        print(f'{self.name} got message "{message}"')

# This one wants to use "receive"
class SubscriberTwo:
    def __init__(self, name):
        self.name = name
    def receive(self, message):
        print(f'{self.name} got message "{message}"')
SubscriberOne is the same subscriber class we saw
before. SubscriberTwo is almost the same: instead of update(), it
has a method named receive(). Let’s modify Publisher so it
can work with objects of either subscriber type:
class Publisher:
    def __init__(self):
        self.subscribers = dict()
    def register(self, who, callback=None):
        if callback is None:
            callback = who.update
        self.subscribers[who] = callback
    def dispatch(self, message):
        for callback in self.subscribers.values():
            callback(message)
    def unregister(self, who):
        del self.subscribers[who]
There’s a lot going on here, so let’s break it down. Look first at
the constructor: it creates a dict instead of a set. You’ll see why
in a moment.
Now focus on register():
def register(self, who, callback=None):
    if callback is None:
        callback = who.update
    self.subscribers[who] = callback
It can be called with one or two arguments. With one argument, who
is a subscriber of some sort, and callback defaults to None. In
that case, the method body sets callback to who.update. Notice
the lack of parentheses; who.update is a method object. It’s just
like a function object, except it happens to be tied to an
instance. And just like a function object, you can store it in a
variable, pass it as an argument to another function, and so
on (refer to Chapter 3 for more details). So we’re storing it in a variable called callback.
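To see that a method object carries its instance along with it, here is a tiny demonstration (the Greeter class is purely illustrative):

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def hello(self, whom):
        return f"{self.name} says hi to {whom}"

ann = Greeter("Ann")

# No parentheses: this is a bound method object, not a call.
# It remembers ann internally, as its __self__.
callback = ann.hello
```

Calling callback("Bob") later behaves exactly like ann.hello("Bob"), even if the callback variable is passed somewhere far from ann.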
What if register() is called with two arguments? Here’s how that might look:
pub = Publisher()
alice = SubscriberTwo('Alice')
pub.register(alice, alice.receive)
alice.receive is another method object; this object is
assigned to callback. Regardless of whether register() is called
with one argument or two, the last line inserts callback into the
dictionary:
self.subscribers[who] = callback
Take a moment to appreciate the remarkable flexibility of Python
dictionaries. Here, you are using an arbitrary instance of either
SubscriberOne or SubscriberTwo as a key. These two classes are
unrelated by inheritance, so from Python’s viewpoint they are
completely distinct types. And for that key, you insert a method
object as its value. Python does this seamlessly, without complaint!
Many languages would make you jump through hoops to accomplish this.
Now it is clear why self.subscribers is a dict and not a
set. Earlier, we only needed to keep track of who the subscribers
were. Now, we also need to remember the callback for each
subscriber. These are used in the dispatch() method:
def dispatch(self, message):
    for callback in self.subscribers.values():
        callback(message)
dispatch() only needs to cycle through the values,
because it just needs to call each subscriber’s update method (even if
it’s not called update()). Notice we don’t have to reference the
subscriber object to call that method; the method object internally
has a reference to its instance (i.e., its "self"), so
callback(message) calls the right method on the right object. In
fact, the only reason we keep track of subscribers at all is so we can
unregister() them.
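The unregister() method itself isn't shown in the listing. Here's a minimal sketch consistent with this dict-based design; using dict.pop with a default makes unregistering an unknown subscriber a harmless no-op, which is one reasonable design choice:

```python
class Publisher:
    def __init__(self):
        self.subscribers = dict()
    def register(self, who, callback=None):
        if callback is None:
            callback = who.update
        self.subscribers[who] = callback
    def unregister(self, who):
        # Forget this subscriber and its callback.
        self.subscribers.pop(who, None)
    def dispatch(self, message):
        for callback in self.subscribers.values():
            callback(message)

# A throwaway subscriber class, just to exercise unregister():
class Subscriber:
    def __init__(self):
        self.seen = []
    def update(self, message):
        self.seen.append(message)

pub, sub = Publisher(), Subscriber()
pub.register(sub)
pub.dispatch("first")
pub.unregister(sub)
pub.dispatch("second")
print(sub.seen)   # only the first message arrived
```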
Let’s put this together with a few subscribers:
pub = Publisher()
bob = SubscriberOne('Bob')
alice = SubscriberTwo('Alice')
john = SubscriberOne('John')
pub.register(bob, bob.update)
pub.register(alice, alice.receive)
pub.register(john)
pub.dispatch("It's lunchtime!")
pub.unregister(john)
pub.dispatch("Time for dinner")
Here’s the output:
Bob got message "It's lunchtime!"
Alice got message "It's lunchtime!"
John got message "It's lunchtime!"
Bob got message "Time for dinner"
Alice got message "Time for dinner"
Pop quiz. Look at the Publisher class again:
class Publisher:
    def __init__(self):
        self.subscribers = dict()
    def register(self, who, callback=None):
        if callback is None:
            callback = who.update
        self.subscribers[who] = callback
    def dispatch(self, message):
        for callback in self.subscribers.values():
            callback(message)
Does callback have to be a method of the subscriber? Or can it be a
method of a different object, or something else? Think about this
before you continue…
It turns out callback can be any callable, provided it has a
signature compatible with how it’s called in dispatch(). That means it
can be a method of some other object, or even a normal function. This
lets you register subscriber objects without an update method at all:
# This subscriber doesn't have ANY suitable method!
class SubscriberThree:
    def __init__(self, name):
        self.name = name

# ... but we can define a function ...
todd = SubscriberThree('Todd')
def todd_callback(message):
    print(f'Todd got message "{message}"')

# ... and pass it to register:
pub.register(todd, todd_callback)

# And then, dispatch a message:
pub.dispatch("Breakfast is Ready")
Sure enough, this works:
Todd got message "Breakfast is Ready"
So far, we’ve assumed that the publisher watches for only one kind of event. But what if there are several? Can we create a publisher that knows how to detect all of them, and let subscribers decide which they want to know about?
To implement this, let’s say a publisher has several channels that subscribers can subscribe to. Each channel notifies for a different event type. For example, if your program monitors a cluster of virtual machines, one channel signals when a certain machine’s disk usage exceeds 75% (a warning sign, but not an immediate emergency); and another signals when disk usage goes over 90% (much more serious, and may begin to impact performance on that virtual machine). Some subscribers will want to know when the 75% threshold is crossed; some, the 90% threshold; and some might want to be alerted for both. What’s a good way to express this in Python code?
Let’s work with the mealtime-announcement code above. Rather than jumping right into the code, let’s mock up the interface first. We want to create a publisher with two channels, like so:
# Two channels, named "lunch" and "dinner".
pub = Publisher(['lunch', 'dinner'])
This constructor is different; it takes a list of channel names.
Let’s also pass the channel name to register(), since each
subscriber will register for one or more channels:
# Three subscribers, of the original type.
bob = Subscriber('Bob')
alice = Subscriber('Alice')
john = Subscriber('John')
# Two args: channel name & subscriber
pub.register("lunch", bob)
pub.register("dinner", alice)
pub.register("lunch", john)
pub.register("dinner", john)
Now, on dispatch, the publisher needs to specify the event type. So
just like with register(), we’ll prepend a channel argument:
pub.dispatch("lunch", "It's lunchtime!")
pub.dispatch("dinner", "Dinner is served")
When working correctly, we’d expect this output:
Bob got message "It's lunchtime!"
John got message "It's lunchtime!"
Alice got message "Dinner is served"
John got message "Dinner is served"
Pop quiz (and if it’s practical, pause here to write your own Python
code): how would you implement this new, multi-channel Publisher?
There are several approaches, but the simplest I’ve found relies on
creating a separate subscribers dictionary for each channel. Here’s one approach:
class Publisher:
    def __init__(self, channels):
        # Create an empty subscribers dict
        # for every channel
        self.channels = {channel: dict() for channel in channels}
    def register(self, channel, who, callback=None):
        if callback is None:
            callback = who.update
        subscribers = self.channels[channel]
        subscribers[who] = callback
    def dispatch(self, channel, message):
        subscribers = self.channels[channel]
        for callback in subscribers.values():
            callback(message)
This Publisher has a dict called self.channels, which maps channel
names (strings) to subscriber dictionaries. register() and dispatch()
are not too different: they simply have an extra step, in which
subscribers is looked up in self.channels. I use that variable
just for readability, and I think it’s well worth the extra line of
code:
# Works the same. But a bit less readable.
def register(self, channel, who, callback=None):
    if callback is None:
        callback = who.update
    self.channels[channel][who] = callback
These are some variations of the general Observer pattern, and I’m sure you can imagine more. What I want you to notice are the options available in Python when you leverage function objects, and other Pythonic features.
Suppose we want to create a class to work with angles, in degrees. We want this class to help us with some standard bookkeeping:
An angle will be at least 0, but less than 360.
If we create an angle outside this range, it automatically wraps around to an equivalent, in-range value:
If we add 270 degrees and 270 degrees, it evaluates to 180 degrees instead of 540 degrees.
If we subtract 180 degrees from 90 degrees, it evaluates to 270 degrees instead of -90 degrees.
If we multiply an angle by a real number, it wraps the final value into the correct range.
And while we’re at it, we want to enable all the other behaviors we normally want with numbers: comparisons like “less than” and “greater than or equal to” or “==” (i.e., equals); division (which doesn’t normally require casting into a valid range, if you think about it); and so on.
Let’s see how we might approach this, by creating a basic Angle class:
class Angle:
    def __init__(self, value):
        self.value = value % 360
The modular division in the constructor is kind of neat: if you reason through it with a few positive and negative values, you’ll find the math works out correctly whether the angle is overshooting or undershooting. This meets one of our key criteria already: the angle is normalized to be from 0 up to 360.
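You can verify the wrap-around with a couple of quick checks:

```python
class Angle:
    def __init__(self, value):
        self.value = value % 360

# Overshooting and undershooting both normalize into [0, 360):
print(Angle(540).value)   # 180
print(Angle(-90).value)   # 270 (Python's % returns a non-negative result here)
```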
But how does it handle addition? We get an error if we try it directly:
>>> Angle(30) + Angle(45)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'Angle' and 'Angle'
>>>
We can easily define a method called add, which will let us write
code like angle3 = angle1.add(angle2). But it’s better to reuse the
familiar arithmetic operators everyone knows. Python lets us do that,
through a collection of object hooks called magic methods. Magic
methods let you define classes so that their instances can be used
with all of Python’s standard operators. That includes arithmetic (+
- * / //), equality (==), inequality (!=), comparisons
(< > >= <=), bit-shifting operations, and even concepts like
exponentiation and absolute value.
Few classes will need all of these, but sometimes it’s valuable to
have them available. Let’s see how they can improve our Angle type.
The pattern for each magic method is the same. For a given operation—say, addition—there is a special method name that starts with
double-underscores. For addition, it’s __add__()—the others also
have sensible names. All you have to do is define that method, and you
can use instances of your class with that operator.
When you discuss magic methods in face-to-face, verbal conversation,
you’ll find yourself saying things like “underscore underscore add
underscore underscore” over and over. That’s a lot
of syllables. So people in the Python community use a kind of verbal
abbreviation, with a word they invented: dunder. That’s not a real
word; Python people made it up. When you say “dunder foo”, it means
“underscore underscore foo underscore underscore”. This isn’t used in
writing, because it’s not needed—you can just write __foo__. But
at Python gatherings, you’ll sometimes hear people say it. Use it;
it saves you a lot of energy when talking.
Anyway, back to dunder add—I mean, __add__(). For operations like
addition—which take two values, and return a third—you write the
method like this:
def __add__(self, other):
    return Angle(self.value + other.value)
The first argument needs to be called self, because this is
Python. The second does not have to be called other, but often
is. This lets us use the normal addition operator for arithmetic:
>>> total = Angle(30) + Angle(45)
>>> total.value
75
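The subtraction and multiplication behaviors promised at the start of this section follow the same pattern. Here's a sketch; note that __mul__ here takes a plain number as its second operand, not another Angle:

```python
class Angle:
    def __init__(self, value):
        self.value = value % 360   # the constructor does the wrapping
    def __add__(self, other):
        return Angle(self.value + other.value)
    def __sub__(self, other):
        return Angle(self.value - other.value)
    def __mul__(self, factor):
        return Angle(self.value * factor)

print((Angle(90) - Angle(180)).value)   # 270, not -90
print((Angle(270) * 2).value)           # 180, not 540
```

Because every arithmetic method routes its result back through the constructor, the `% 360` normalization happens automatically.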
There are similar operators for subtraction, multiplication, and so on, as shown in Table 6-1:
| Method | Operation |
|---|---|
| `__add__` | Addition: `a + b` |
| `__sub__` | Subtraction: `a - b` |
| `__mul__` | Multiplication: `a * b` |
| `__truediv__` | Division: `a / b` |
| `__floordiv__` | Integer division: `a // b` |
| `__mod__` | Modulus: `a % b` |
| `__pow__` | Exponentiation: `a ** b` |
| `__matmul__` | Matrix multiplication: `a @ b` |
| `__neg__` | Negation: `-a` |
| `__pos__` | Unary plus: `+a` |
| `__abs__` | Absolute value: `abs(a)` |
| `__round__` | Rounding: `round(a)` |
Essentially, Python translates a + b to a.__add__(b); a % b to
a.__mod__(b); and so on. You can also hook into bit-operation operators (see Table 6-2).
| Method | Operation |
|---|---|
| `__and__` | Bitwise and: `a & b` |
| `__or__` | Bitwise or: `a \| b` |
| `__xor__` | Bitwise exclusive-or: `a ^ b` |
| `__lshift__` | Left shift: `a << b` |
| `__rshift__` | Right shift: `a >> b` |
| `__invert__` | Bitwise inversion: `~a` |
So a & b translates to a.__and__(b), for example.
Since __and__() corresponds to the bitwise-and operator (for
expressions like “foo & bar”), you might wonder what the magic
method is for logical-and (“foo and bar”), or logical-or (“foo or bar”). Sadly, there is none; because of how Python’s Boolean logic
short-circuits, there is not really a good way to do magic methods for
them. For this reason, sometimes libraries will hijack the & and |
operators to mean logical and/or, instead of bitwise and/or.
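For instance, a library might overload & so that it combines two filters. This Predicate class is purely illustrative, not taken from any real library:

```python
class Predicate:
    def __init__(self, func):
        self.func = func
    def __call__(self, x):
        return self.func(x)
    def __and__(self, other):
        # Here "&" means logical AND of two predicates,
        # not a bitwise operation on integers.
        return Predicate(lambda x: self.func(x) and other.func(x))

positive = Predicate(lambda n: n > 0)
even = Predicate(lambda n: n % 2 == 0)
positive_and_even = positive & even
print(positive_and_even(4))    # True
print(positive_and_even(-4))   # False
```

This is essentially what libraries like Pandas and some ORMs do when they repurpose & and | for combining query conditions.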
The default representation of an Angle object isn’t very useful:
>>> Angle(30)
<__main__.Angle object at 0x106df9198>
It tells us the type, and the hex object ID, but we’d rather it tell
us something about the value of the angle. There are two magic methods
that can help. The first is __str__(), which is used when printing a
result:
class Angle:
    # ...
    def __str__(self):
        return f"{self.value} degrees"
The print() function uses this, and so do str() and the string
formatting operations:
>>> print(Angle(30))
30 degrees
>>> print(f"{Angle(30) + Angle(45)}")
75 degrees
>>> print("{}".format(Angle(30) + Angle(45)))
75 degrees
>>> str(Angle(135))
'135 degrees'
>>> some_angle = Angle(45)
>>> f"{some_angle}"
'45 degrees'
Sometimes you want a string representation that is more precise, which
might be at odds with a human-friendly representation. Imagine you
have several subclasses (for instance, PitchAngle and YawAngle in
some kind of aircraft-related library), and you want an easy way to log
the exact type and arguments needed to recreate the object. Python
provides a second magic method for this purpose, called __repr__():
class Angle:
    # ...
    def __repr__(self):
        return f"Angle({self.value})"
You access this by calling either the repr() built-in function, or
by passing the !r conversion to the formatting string:
>>> repr(Angle(75))
'Angle(75)'
>>> print('{!r}'.format(Angle(30) + Angle(45)))
Angle(75)
>>> print(f"{Angle(30) + Angle(45)!r}")
Angle(75)
You can think of both of these as working like str(), but invoking
__repr__() instead of __str__().
The official guideline is that the output of __repr__() can be
passed to eval() to recreate the object exactly. It’s not enforced
by the language, and is not always practical, or even possible. But
when you can follow that guideline, it is useful for logging and
debugging.
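The Angle's __repr__() above happens to follow that guideline, so you can verify the round trip yourself:

```python
class Angle:
    def __init__(self, value):
        self.value = value % 360
    def __repr__(self):
        return f"Angle({self.value})"

original = Angle(75)
restored = eval(repr(original))   # evaluates the string "Angle(75)"
print(restored.value)             # 75
```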
We also want to be able to compare two Angle objects. The most basic
comparison is equality, provided by __eq__(). It should return True
or False:
class Angle:
    # ...
    def __eq__(self, other):
        return self.value == other.value
If defined, this method is used by the == operator:
>>> Angle(3) == Angle(3)
True
>>> Angle(7) == Angle(1)
False
By default, the == operator is based on the object ID. So an
expression like x == y evaluates to True if x and y have the
same ID, and otherwise evaluates to False. That is rarely useful:
>>> class BadAngle:
...     def __init__(self, value):
...         self.value = value
...
>>> BadAngle(3) == BadAngle(3)
False
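One caveat worth knowing: when a class defines __eq__(), Python sets its __hash__ to None, so its instances become unhashable and can no longer serve as dict keys or set members (the way the Publisher's subscribers did) unless you also define __hash__(). A sketch:

```python
class Angle:
    def __init__(self, value):
        self.value = value % 360
    def __eq__(self, other):
        return self.value == other.value
    # Without this, {Angle(3): "x"} raises
    # TypeError: unhashable type: 'Angle'
    def __hash__(self):
        return hash(self.value)

# Equal angles now hash alike, so dict lookup by value works:
print({Angle(3): "ok"}[Angle(3)])   # ok
```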
What’s left are the fuzzier comparison operations: less than, greater than, and so on. Python’s documentation calls these “rich comparison” methods, so you can feel wealthy when using them (see Table 6-3).
| Method | Operation |
|---|---|
| `__lt__` | Less than (`a < b`) |
| `__le__` | Less than or equal (`a <= b`) |
| `__gt__` | Greater than (`a > b`) |
| `__ge__` | Greater than or equal (`a >= b`) |
For example:
class Angle:
    # ...
    def __gt__(self, other):
        return self.value > other.value
Now the greater-than operator works correctly:
>>> Angle(100) > Angle(50)
True
Similarly, with __ge__(), __lt__(), etc. If you don’t define these, you get an
error:
>>> BadAngle(8) > BadAngle(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'BadAngle' and 'BadAngle'
__gt__() and __lt__() are reflections of each other. What that means
is that, in many cases, you only have to define one of them. Suppose
you implement __gt__() but not __lt__(), then do this:
>>> a1 = Angle(3)
>>> a2 = Angle(7)
>>> a1 < a2
True
This works thanks to some just-in-time introspection the Python
runtime does. The expression a1 < a2 is translated to
a1.__lt__(a2). If Angle.__lt__() is defined, that method is
executed, and the expression evaluates to its return value.
a1 < a2 is true if and only if a2 > a1. For this reason, if
__lt__() does not exist, but __gt__() does, then Python will
rewrite the angle comparison: a1.__lt__(a2) becomes
a2.__gt__(a1). This is then evaluated, and the expression a1 < a2
is set to its return value. There are some situations where you will
need to define both, for example, if the comparison is based on several
member variables.
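When you do need the full set, the standard library can help: the functools.total_ordering class decorator, given __eq__() and any one ordering method, fills in the rest for you:

```python
import functools

@functools.total_ordering
class Angle:
    def __init__(self, value):
        self.value = value % 360
    def __eq__(self, other):
        return self.value == other.value
    def __gt__(self, other):
        return self.value > other.value

# __lt__, __le__, and __ge__ are generated for us:
print(Angle(3) < Angle(7))    # True
print(Angle(7) <= Angle(7))   # True
```

The generated methods are derived from the two you wrote, so this only works when your ordering really is consistent with equality.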
Magic methods are interesting enough, and quite handy when you need them. But depending on the kind of applications you work on, you will rarely need to define a class whose instances can be added, subtracted, or compared.
Things get much more interesting, though, when you don’t follow the rules.
Here’s a fascinating fact: methods like __add__() are supposed to
do addition. But it turns out that Python does not enforce this. And methods
like __gt__() are supposed to return True or False. But if you
write a __gt__() which returns something that isn’t a
bool…Python will not complain at all.
This creates amazing possibilities.
To illustrate, let me tell you about Pandas. You
probably know that Pandas is an excellent data-processing
library. It’s become extremely popular among data scientists who use
Python. Pandas has a convenient data
type called a DataFrame. It represents a two-dimensional collection
of data, organized into rows, with labeled columns:
import pandas

df = pandas.DataFrame({
    'A': [-137, 22, -3, 4, 5],
    'B': [10, 11, 121, 13, 14],
    'C': [3, 6, 91, 12, 15],
})
There are several ways to create a DataFrame; here I’ve chosen to
use a dictionary.4 The keys are column names; the values are lists,
which become that column’s data. So you visually rotate each list 90
degrees:
>>> print(df)
     A    B   C
0 -137   10   3
1   22   11   6
2   -3  121  91
3    4   13  12
4    5   14  15
The rows are numbered for you, and the columns nicely labeled in a
header. The A column, for example, has different positive and
negative numbers.
Now, one of the many useful things you can do with a DataFrame is
filter out rows meeting certain criteria. This doesn’t change the
original DataFrame; instead, it creates a new DataFrame, containing
just the rows you want. For example, you can say, “Give me the rows of
df in which the A column has a positive value”:
>>> positive_a = df[df.A > 0]
>>> print(positive_a)
    A   B   C
1  22  11   6
3   4  13  12
4   5  14  15
All you have to do is pass in “df.A > 0” in the square
brackets.
But there’s something weird going on here. Take a look at the line in
which positive_a is defined. Do you notice anything unusual there?
Anything strange?
Here’s what is odd: the expression “df.A > 0” ought to evaluate to
either True or False. Right? It’s supposed to be a Boolean
value with exactly one bit of information. But the source
dataframe, df, has many rows. Real dataframes can easily have
tens of thousands, even millions of rows of data. There’s no way a
Boolean literal can express which of those rows to keep and which to
discard. How does this even work?
Turns out, it’s not Boolean at all:
>>> comparison = (df.A > 0)
>>> type(comparison)
<class 'pandas.core.series.Series'>
>>> print(comparison)
0    False
1     True
2    False
3     True
4     True
Name: A, dtype: bool
Yes, you can do that, thanks to Python’s dynamic type system. Python
translates “df.A > 0” into “df.A.__gt__(0)”. And that __gt__()
method doesn’t have to return a bool. In fact, in Pandas, it returns a
Series object (which is like a vector of data), containing True or
False for each row. And when that’s passed into df[]—the square
brackets being handled by the __getitem__() method—that Series
object is used to filter rows.
To see what this looks like, let’s re-invent part of the interface of
Pandas. I’ll create a library called fakepandas, which instead of
DataFrame has a type called Dataset:
class Dataset:
    def __init__(self, data):
        self.data = data
        self.labels = sorted(data.keys())
    def __getattr__(self, label: str):
        # Makes references like df.A work.
        return Column(label)
    # Plus some other methods.
If I have a Dataset object named ds, with a column named A, the
__getattr__() method causes references like ds.A to return a Column
object:
import operator

class Column:
    def __init__(self, name):
        self.name = name
    def __gt__(self, value):
        return Comparison(self.name, value, operator.gt)
This Column class has a __gt__() method, which makes expressions
like “ds.A > 0” return an instance of a class called
Comparison. It represents a lazy calculation for when the actual
filtering happens later. Notice its constructor arguments: a column
name, a threshold value, and a callable to implement the
comparison. (The operator module has a function called gt() that
takes two arguments, expressing a greater-than comparison).
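To round out the picture, here is one hedged guess at what Comparison and the Dataset's __getitem__() might look like. This is a simplified reinvention for illustration only (it handles just the `>` comparison), not the actual fakepandas code:

```python
import operator

class Comparison:
    # Records "column <op> value" for lazy evaluation later.
    def __init__(self, name, value, op):
        self.name = name
        self.value = value
        self.op = op
    def evaluate(self, row_value):
        return self.op(row_value, self.value)

class Column:
    def __init__(self, name):
        self.name = name
    def __gt__(self, value):
        return Comparison(self.name, value, operator.gt)

class Dataset:
    def __init__(self, data):
        self.data = data
        self.labels = sorted(data.keys())
    def __getattr__(self, label):
        # Only called when normal lookup fails, so
        # self.data and self.labels still resolve normally.
        return Column(label)
    def __getitem__(self, comparison):
        # Keep only the rows for which the comparison holds.
        keep = [i for i, v in enumerate(self.data[comparison.name])
                if comparison.evaluate(v)]
        return Dataset({label: [self.data[label][i] for i in keep]
                        for label in self.labels})

ds = Dataset({'A': [-137, 22, -3, 4, 5],
              'B': [10, 11, 121, 13, 14]})
positive_a = ds[ds.A > 0]
print(positive_a.data['A'])   # [22, 4, 5]
```

The key design move: ds.A > 0 builds a Comparison object instead of a bool, and ds[...] consumes that object to decide which rows survive.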
You can even support complex filtering criteria like ds[ds.C + 2 <
ds.B]. It’s all possible by leveraging magic methods in these
unorthodox ways. If you care about the details, I wrote an
article delving into that. My goal here isn’t to tell you how to
re-invent the Pandas interface, so much as to get you to realize
what’s possible.
Have you ever implemented a compiler? If so, you know the parsing
phase is a significant development challenge. Using Python magic
methods in this manner does much of the hard work of lexing and
parsing for you. And the best part is how natural and intuitive the
result can be for end users. You are essentially implementing a
domain-specific
language on top of regular Python syntax, but consistently enough that
people quickly become fluent and productive within its rules. They often
won’t even think to ask why the rules seem to be bent; they won’t
notice that “df.A > 0” isn’t acting like a Boolean. That’s a clear sign
of success. It means you have designed your library so well, other
developers become effortlessly productive.
Most Python users know how to write simple classes and methods. But as you can see, Python’s object system has a lot more to it than that. Learning more advanced OOP opens up great opportunities for you, and allows you to create code structures you would otherwise never produce. Take everything you learned in this chapter, and put it into action.
1 In OOP, when working with properties, we call the method which “reads” the current value the getter. And the method you call to set a new value is called the setter.
2 This is akin to protected in languages like Java. But unlike Java, Python does not enforce it. Instead, it is a convention you are expected to follow.
3 Technically, pub-sub is a more general architectural pattern that can apply to distributed systems. In contrast, the Observer pattern is always limited to what’s inside a single process. That is the scope we will focus on here.
4 Which you will rarely do in real code (you will ingest from a CSV file or something instead), but it is convenient for demonstrating here.