4. Objects, Types, and Protocols

Python programs manipulate objects of various types. There are a variety of built-in types such as numbers, strings, lists, sets, and dictionaries. In addition, you can make your own types using classes. This chapter describes the underlying Python object model and mechanisms that make all objects work. Particular attention is given to the “protocols” that define the core behavior of various objects.

4.1 Essential Concepts

Every piece of data stored in a program is an object. Each object has an identity, a type (also known as its class), and a value. For example, when you write a = 42, an integer object is created with the value of 42. The identity of the object is a number representing its location in memory; a is a label that refers to this specific location although the label is not part of the object itself.

The type of an object, also known as the object’s class, defines the object’s internal data representation as well as supported methods. When an object of a particular type is created, that object is called an instance of that type. After an instance is created, its identity does not change. If an object’s value can be modified, the object is said to be mutable. If the value cannot be modified, the object is said to be immutable. An object that holds references to other objects is said to be a container.

Objects are characterized by their attributes. An attribute is a value associated with an object that is accessed using the dot operator (.). An attribute might be a simple data value such as a number. However, an attribute could also be a function that is invoked to carry out some operation. Such functions are called methods. The following example illustrates access to attributes:

a = 34                # Create an integer
n = a.numerator       # Get the numerator (an attribute)

b = [1, 2, 3]         # Create a list
b.append(7)           # Add a new element using the append method

Objects may also implement various operators, such as the + operator. For example:

c = a + 10            # c = 44
d = b + [4, 5]        # d = [1, 2, 3, 4, 5]

Although operators use a different syntax, they are ultimately mapped to methods. For example, writing a + 10 executes a method a.__add__(10).
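You can verify this mapping by calling the special method directly; a minimal sketch:

```python
a = 34

# The + operator is dispatched to the __add__ special method
assert a + 10 == a.__add__(10)

b = [1, 2, 3]
# List concatenation is dispatched the same way
assert b + [4, 5] == b.__add__([4, 5])
```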

4.2 Object Identity and Type

The built-in function id() returns the identity of an object. The identity is an integer that usually corresponds to the object’s location in memory. The is and is not operators compare the identities of two objects. type() returns the type of an object. Here’s an example of different ways you might compare two objects:

# Compare two objects
def compare(a, b):
    if a is b:
        print('same object')
    if a == b:
        print('same value')
    if type(a) is type(b):
        print('same type')

Here is how this function works:

>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> compare(a, a)
same object
same value
same type
>>> compare(a, b)
same value
same type
>>> compare(a, [4,5,6])
same type
>>>

The type of an object is itself an object, known as the object’s class. This object is uniquely defined and is always the same for all instances of a given type. Classes usually have names (list, int, dict, and so on) that can be used to create instances, perform type checking, and provide type hints. For example:

items = list()

if isinstance(items, list):
    items.append(item)

def removeall(items: list, item) -> list:
    return [i for i in items if i != item]

A subtype is a type defined by inheritance. It carries all of the features of the original type plus additional and/or redefined methods. Inheritance is discussed in more detail in Chapter 7, but here is an example of defining a subtype of list with a new method added to it:

class mylist(list):
    def removeall(self, val):
        return [i for i in self if i != val]

# Example
items = mylist([5, 8, 2, 7, 2, 13, 9])
x = items.removeall(2)
print(x)      # [5, 8, 7, 13, 9]

The isinstance(instance, type) function is the preferred way to check a value against a type because it is aware of subtypes. It can also check against many possible types. For example:

if isinstance(items, (list, tuple)):
    maxval = max(items)

Although type checks can be added to a program, this is often not as useful as you might imagine. For one, excessive checking impacts performance. Second, programs don’t always define objects that neatly fit into a nice type hierarchy. For instance, if the purpose of the isinstance(items, list) statement above is to test whether items is “list-like,” it won’t work with objects that have the same programming interface as a list but don’t directly inherit from the built-in list type (one example is deque from the collections module).
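For instance, a deque supports append() and indexing just like a list, but a strict isinstance() check against list rejects it:

```python
from collections import deque

items = deque([1, 2, 3])
items.append(7)                   # Works exactly like list.append()
print(items[0])                   # Indexing works too -> 1

# Yet the object is not a list subtype
print(isinstance(items, list))    # False
```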

4.3 Reference Counting and Garbage Collection

Python manages objects through automatic garbage collection. All objects are reference-counted. An object’s reference count is increased whenever it’s assigned to a new name or placed in a container such as a list, tuple, or dictionary:

a = 37       # Creates an object with value 37
b = a        # Increases reference count on 37
c = []
c.append(b)  # Increases reference count on 37

This example creates a single object containing the value 37. a is a name that initially refers to the newly created object. When b is assigned a, b becomes a new name for the same object, and the object’s reference count increases. When you place b into a list, the object’s reference count increases again. Throughout the example, only one object corresponds to 37. All other operations are creating references to that object.

An object’s reference count is decreased by the del statement or whenever a reference goes out of scope or is reassigned. Here’s an example:

del a      # Decrease reference count of 37
b = 42     # Decrease reference count of 37
c[0] = 2.0 # Decrease reference count of 37

The current reference count of an object can be obtained using the sys.getrefcount() function. For example:

>>> a = 37
>>> import sys
>>> sys.getrefcount(a)
7
>>>

The reference count is often much higher than you might expect. For immutable data such as numbers and strings, the interpreter aggressively shares objects between different parts of the program in order to conserve memory. You just don’t notice that because the objects are immutable.
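You can observe this sharing directly, although it is strictly a CPython implementation detail (small integers in roughly the range -5 to 256 are cached) and should never be relied upon for program correctness:

```python
# CPython caches small integers, so these names may refer to the
# very same object. Implementation detail -- do not depend on it.
a = 37
b = 37
print(a is b)        # True in CPython

# Larger values carry no such guarantee; only == is meaningful
x = 10**100
y = 10**100
print(x == y)        # True: same value, possibly distinct objects
```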

When an object’s reference count reaches zero, it is garbage-collected. However, in some cases a circular dependency may exist in a collection of objects that are no longer in use. Here’s an example:

a = { }
b = { }
a['b'] = b      # a contains reference to b
b['a'] = a      # b contains reference to a
del a
del b

In this example, the del statements decrease the reference count of a and b and destroy the names used to refer to the underlying objects. However, since each object contains a reference to the other, the reference count doesn’t drop to zero and the objects remain allocated. The interpreter won’t leak memory, but the destruction of the objects will be delayed until a cycle detector executes to find and delete the inaccessible objects. The cycle-detection algorithm runs periodically as the interpreter allocates more and more memory during execution. The exact behavior can be fine-tuned and controlled using functions in the gc standard library module. The gc.collect() function can be used to immediately invoke the cyclic garbage collector.

In most programs, garbage collection is something that simply happens without you having to think much about it. However, there are certain situations where manually deleting objects might make sense. One such scenario arises when working with gigantic data structures. For example, consider this code:

def some_calculation():
    data = create_giant_data_structure()
    # Use data for some part of a calculation
    ...
    # Release the data
    del data

    # Calculation continues
    ...

In this code, the use of the del data statement indicates that the data variable is no longer needed. If this causes the reference count to reach 0, the object is garbage-collected at that point. Without the del statement, the object persists for some indeterminate amount of time until the data variable goes out of scope at the end of the function. You might only notice this when trying to figure out why your program is using more memory than it ought to.

4.4 References and Copies

When a program makes an assignment such as b = a, a new reference to a is created. For immutable objects such as numbers and strings, this assignment appears to create a copy of a (even though this is not the case). However, the behavior appears quite different for mutable objects such as lists and dictionaries. Here’s an example:

>>> a = [1,2,3,4]
>>> b = a                 # b is a reference to a
>>> b is a
True
>>> b[2] = -100           # Change an element in b
>>> a                     # Notice how a also changed
[1, 2, -100, 4]
>>>

Since a and b refer to the same object in this example, a change made to one of the variables is reflected in the other. To avoid this, you have to create a copy of an object rather than a new reference.

Two types of copy operations are applied to container objects such as lists and dictionaries: a shallow copy and a deep copy. A shallow copy creates a new object, but populates it with references to the items contained in the original object. Here’s an example:

>>> a = [ 1, 2, [3,4] ]
>>> b = list(a)           # Create a shallow copy of a.
>>> b is a
False
>>> b.append(100)         # Append element to b.
>>> b
[1, 2, [3, 4], 100]
>>> a                     # Notice that a is unchanged
[1, 2, [3, 4]]
>>> b[2][0] = -100        # Modify an element inside b
>>> b
[1, 2, [-100, 4], 100]
>>> a                     # Notice the change inside a
[1, 2, [-100, 4]]
>>>

In this case, a and b are separate list objects, but the elements they contain are shared. Therefore, a modification to one of the elements of a also modifies an element of b, as shown.

A deep copy creates a new object and recursively copies all the objects it contains. There is no built-in operator to create deep copies of objects, but you can use the copy.deepcopy() function in the standard library:

>>> import copy
>>> a = [1, 2, [3, 4]]
>>> b = copy.deepcopy(a)
>>> b[2][0] = -100
>>> b
[1, 2, [-100, 4]]
>>> a                  # Notice that a is unchanged
[1, 2, [3, 4]]
>>>

Use of deepcopy() is actively discouraged in most programs. Copying of an object is slow and often unnecessary. Reserve deepcopy() for situations where you actually need a copy because you’re about to mutate data and you don’t want your changes to affect the original object. Also, be aware that deepcopy() will fail with objects that involve system or runtime state (such as open files, network connections, threads, generators, and so on).
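For example, attempting to deep-copy a structure that contains a generator fails; a sketch (the exact error message may vary between Python versions):

```python
import copy

data = {'name': 'example', 'gen': (i for i in range(10))}
try:
    copy.deepcopy(data)
except TypeError as e:
    # Generators hold runtime state and cannot be copied
    print('deepcopy failed:', e)
```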

4.5 Object Representation and Printing

Programs often need to display objects—for example, to show data to the user or print it for the purposes of debugging. If you supply an object x to the print(x) function or convert it to a string using str(x), you will generally get a “nice” human-readable representation of the object’s value. For example, consider an example involving dates:

>>> from datetime import date
>>> d = date(2012, 12, 21)
>>> print(d)
2012-12-21
>>> str(d)
'2012-12-21'
>>>

This “nice” representation of an object may not be sufficient for debugging. For example, in the output of the above code, there is no obvious way to know if the variable d is a date instance or a simple string containing the text '2012-12-21'. To get more information, use the repr(x) function that creates a string with a representation of the object that you would have to type out in source code to create it. For example:

>>> d = date(2012, 12, 21)
>>> repr(d)
'datetime.date(2012, 12, 21)'
>>> print(repr(d))
datetime.date(2012, 12, 21)
>>> print(f'The date is: {d!r}')
The date is: datetime.date(2012, 12, 21)
>>>

In string formatting, the !r suffix can be added to a value to produce its repr() value instead of the normal string conversion.

4.6 First-Class Objects

All objects in Python are said to be first-class. This means that all objects that can be assigned to a name can also be treated as data. As data, objects can be stored as variables, passed as arguments, returned from functions, compared against other objects, and more. For example, here is a simple dictionary containing two values:

items = {
    'number' : 42,
    'text' : "Hello World"
}

The first-class nature of objects can be seen by adding some more unusual items to this dictionary:

items['func']  = abs            # Add the abs() function
import math
items['mod']   = math           # Add a module
items['error'] = ValueError     # Add an exception type
nums = [1,2,3,4]
items['append'] = nums.append   # Add a method of another object

In this example, the items dictionary now contains a function, a module, an exception, and a method of another object. If you want, you can use dictionary lookups on items in place of the original names and the code will still work. For example:

>>> items['func'](-45)         # Executes abs(-45)
45
>>> items['mod'].sqrt(4)       # Executes math.sqrt(4)
2.0
>>> try:
...     x = int('a lot')
... except items['error'] as e:   # Same as except ValueError as e
...     print("Couldn't convert")
...
Couldn't convert
>>> items['append'](100)       # Executes nums.append(100)
>>> nums
[1, 2, 3, 4, 100]
>>>

The fact that everything in Python is first-class is often not fully appreciated by newcomers. However, it can be used to write very compact and flexible code.

For example, suppose you have a line of text such as “ACME,100,490.10” and you want to convert it into a list of values with appropriate type conversions. Here’s a clever way to do it by creating a list of types (which are first-class objects) and executing a few common list-processing operations:

>>> line = 'ACME,100,490.10'
>>> column_types = [str, int, float]
>>> parts = line.split(',')
>>> row = [ty(val) for ty, val in zip(column_types, parts)]
>>> row
['ACME', 100, 490.1]
>>>

Placing functions or classes in a dictionary is a common technique for eliminating complex if-elif-else statements. For example, if you have code like this:

if format == 'text':
    formatter = TextFormatter()
elif format == 'csv':
    formatter = CSVFormatter()
elif format == 'html':
    formatter = HTMLFormatter()
else:
    raise RuntimeError('Bad format')

you could rewrite it using a dictionary:

_formats = {
   'text': TextFormatter,
   'csv': CSVFormatter,
   'html': HTMLFormatter
}

if format in _formats:
   formatter = _formats[format]()
else:
   raise RuntimeError('Bad format')

This latter form is also more flexible as new cases can be added by inserting more entries into the dictionary without having to modify a large if-elif-else statement block.
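A common refinement is to use dict.get() instead of a membership test. Here is a self-contained sketch, with stub classes standing in for the hypothetical formatters above:

```python
# Stand-ins for the formatter classes used in the example above
class TextFormatter: pass
class CSVFormatter: pass
class HTMLFormatter: pass

_formats = {
    'text': TextFormatter,
    'csv': CSVFormatter,
    'html': HTMLFormatter
}

def make_formatter(format):
    # dict.get() returns None for unknown keys, avoiding a KeyError
    cls = _formats.get(format)
    if cls is None:
        raise RuntimeError('Bad format')
    return cls()

f = make_formatter('csv')
print(type(f).__name__)      # CSVFormatter
```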

4.7 Using None for Optional or Missing Data

Sometimes programs need to represent an optional or missing value. None is a special instance reserved for this purpose. None is returned by functions that don’t explicitly return a value. None is also frequently used as the default value of optional arguments, so that the function can detect whether the caller has actually passed a value for that argument. None has no attributes and evaluates to False in Boolean expressions.

Internally, None is stored as a singleton—that is, there is only one None value in the interpreter. Therefore, a common way to test a value against None is to use the is operator like this:

if value is None:
   statements
   ...

Testing for None using the == operator also works, but it’s not recommended and might be flagged as a style error by code-checking tools.
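A minimal sketch of the common pattern of using None as a default to detect whether an optional argument was supplied:

```python
def greet(name, greeting=None):
    # None means "no greeting was given" -- fall back to a default
    if greeting is None:
        greeting = 'Hello'
    return f'{greeting}, {name}'

print(greet('Guido'))              # Hello, Guido
print(greet('Guido', 'Howdy'))     # Howdy, Guido
```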

4.8 Object Protocols and Data Abstraction

Most Python language features are defined by protocols. Consider the following function:

def compute_cost(unit_price, num_units):
    return unit_price * num_units

Now, ask yourself the question: What inputs are allowed? The answer is deceptively simple—everything is allowed! At first glance, this function looks like it might apply to numbers:

>>> compute_cost(1.25, 50)
62.5
>>>

Indeed, it works as expected. However, the function works with much more. You can use specialized numbers such as fractions or decimals:

>>> from fractions import Fraction
>>> compute_cost(Fraction(5, 4), 50)
Fraction(125, 2)
>>> from decimal import Decimal
>>> compute_cost(Decimal('1.25'), Decimal('50'))
Decimal('62.50')
>>>

Not only that—the function works with arrays and other complex structures from packages such as numpy. For example:

>>> import numpy as np
>>> prices = np.array([1.25, 2.10, 3.05])
>>> units = np.array([50, 20, 25])
>>> compute_cost(prices, units)
array([62.5 , 42.  , 76.25])
>>>

The function might even work in unexpected ways:

>>> compute_cost('a lot', 10)
'a lota lota lota lota lota lota lota lota lota lot'
>>>

And yet, certain combinations of types fail:

>>> compute_cost(Fraction(5, 4), Decimal('50'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in compute_cost
TypeError: unsupported operand type(s) for *: 'Fraction' and 'decimal.Decimal'
>>>

Unlike a compiler for a static language, Python does not verify correct program behavior in advance. Instead, the behavior of an object is determined by a dynamic process that involves the dispatch of so-called “special” or “magic” methods. The names of these special methods are always preceded and followed by double underscores (__). The methods are automatically triggered by the interpreter as a program executes. For example, the operation x * y is carried out by a method x.__mul__(y). The names of these methods and their corresponding operators are hard-wired. The behavior of any given object depends entirely on the set of special methods that it implements.

The next few sections describe the special methods associated with different categories of core interpreter features. These categories are sometimes called “protocols.” An object, including a user-defined class, may define any combination of these features to make the object behave in different ways.

4.9 Object Protocol

The methods in Table 4.1 are related to the overall management of objects. This includes object creation, initialization, destruction, and representation.

Table 4.1 Methods for Object Management

Method

Description

__new__(cls [,*args [,**kwargs]])

A static method called to create a new instance.

__init__(self [,*args [,**kwargs]])

Called to initialize a new instance after it’s been created.

__del__(self)

Called when an instance is being destroyed.

__repr__(self)

Create a string representation.

The __new__() and __init__() methods are used together to create and initialize instances. When an object is created by calling SomeClass(args), it is translated into the following steps:

x = SomeClass.__new__(SomeClass, args)
if isinstance(x, SomeClass):
    x.__init__(args)

Normally, these steps are handled behind the scenes and you don’t need to worry about it. The most common method implemented in a class is __init__(). Use of __new__() almost always indicates the presence of advanced magic related to instance creation (for example, it is used in class methods that want to bypass __init__() or in certain creational design patterns such as singletons or caching). The implementation of __new__() doesn’t necessarily need to return an instance of the class in question—if not, the subsequent call to __init__() on creation is skipped.
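As an illustration of such magic, here is a sketch of using __new__() to cache instances, built around a hypothetical Date class:

```python
class Date:
    _cache = {}                     # Hypothetical instance cache

    def __new__(cls, year, month, day):
        key = (year, month, day)
        if key not in cls._cache:
            # Only allocate a new instance for unseen dates
            cls._cache[key] = super().__new__(cls)
        return cls._cache[key]

    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

a = Date(2012, 12, 21)
b = Date(2012, 12, 21)
print(a is b)       # True: same cached instance
```

Note that __init__() still runs on every call, including cache hits; real implementations often add a guard to avoid reinitializing a cached instance.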

The __del__() method is invoked when an instance is about to be garbage-collected. This method is invoked only when an instance is no longer in use. Note that the statement del x only decrements the instance reference count and doesn’t necessarily result in a call to this function. __del__() is almost never defined unless an instance needs to perform additional resource management steps upon destruction.

The __repr__() method, called by the built-in repr() function, creates a string representation of an object that can be useful for debugging and printing. This is also the method responsible for creating the output of values you see when inspecting variables in the interactive interpreter. The convention is for __repr__() to return an expression string that can be evaluated to re-create the object using eval(). For example:

a = [2, 3, 4, 5]   # Create a list
s = repr(a)        # s = '[2, 3, 4, 5]'
b = eval(s)        # Turns s back into a list

If a string expression cannot be created, the convention is for __repr__() to return a string of the form <...message...>, as shown here:

f = open('foo.txt')
a = repr(f)
# a = "<_io.TextIOWrapper name='foo.txt' mode='r' encoding='UTF-8'>"
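In a user-defined class, following the eval() convention might look like this sketch, using a hypothetical Pair class:

```python
class Pair:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        # Convention: eval(repr(p)) re-creates an equivalent object
        return f'Pair({self.x!r}, {self.y!r})'

p = Pair(3, 4)
print(repr(p))          # Pair(3, 4)
q = eval(repr(p))       # Makes a new Pair(3, 4)
print(q.x, q.y)         # 3 4
```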

4.10 Number Protocol

Table 4.2 lists special methods that objects must implement to provide mathematical operations.

Table 4.2 Methods for Mathematical Operations

Method

Operation

__add__(self, other)

self + other

__sub__(self, other)

self - other

__mul__(self, other)

self * other

__truediv__(self, other)

self / other

__floordiv__(self, other)

self // other

__mod__(self, other)

self % other

__matmul__(self, other)

self @ other

__divmod__(self, other)

divmod(self, other)

__pow__(self, other [, modulo])

self ** other, pow(self, other, modulo)

__lshift__(self, other)

self << other

__rshift__(self, other)

self >> other

__and__(self, other)

self & other

__or__(self, other)

self | other

__xor__(self, other)

self ^ other

__radd__(self, other)

other + self

__rsub__(self, other)

other - self

__rmul__(self, other)

other * self

__rtruediv__(self, other)

other / self

__rfloordiv__(self, other)

other // self

__rmod__(self, other)

other % self

__rmatmul__(self, other)

other @ self

__rdivmod__(self, other)

divmod(other, self)

__rpow__(self, other)

other ** self

__rlshift__(self, other)

other << self

__rrshift__(self, other)

other >> self

__rand__(self, other)

other & self

__ror__(self, other)

other | self

__rxor__(self, other)

other ^ self

__iadd__(self, other)

self += other

__isub__(self, other)

self -= other

__imul__(self, other)

self *= other

__itruediv__(self, other)

self /= other

__ifloordiv__(self, other)

self //= other

__imod__(self, other)

self %= other

__imatmul__(self, other)

self @= other

__ipow__(self, other)

self **= other

__iand__(self, other)

self &= other

__ior__(self, other)

self |= other

__ixor__(self, other)

self ^= other

__ilshift__(self, other)

self <<= other

__irshift__(self, other)

self >>= other

__neg__(self)

-self

__pos__(self)

+self

__invert__(self)

~self

__abs__(self)

abs(self)

__round__(self, n)

round(self, n)

__floor__(self)

math.floor(self)

__ceil__(self)

math.ceil(self)

__trunc__(self)

math.trunc(self)

When presented with an expression such as x + y, the interpreter invokes a combination of the methods x.__add__(y) or y.__radd__(x) to carry out the operation. The initial choice is to try x.__add__(y) in all cases except for the special case where y happens to be a subtype of x; in that case, y.__radd__(x) executes first. If the initial method fails by returning NotImplemented, an attempt is made to invoke the operation with reversed operands such as y.__radd__(x). If this second attempt fails, the entire operation fails. Here is an example:

>>> a = 42       # int
>>> b = 3.7      # float
>>> a.__add__(b)
NotImplemented
>>> b.__radd__(a)
45.7
>>>

This example might seem surprising, but it reflects the fact that integers don’t actually know anything about floating-point numbers. Floating-point numbers, however, do know about integers, since an integer is, mathematically, a special case of a floating-point number. Thus, the reversed operand produces the correct answer.

The methods __iadd__(), __isub__(), and so forth are used to support in-place arithmetic operators such as a += b and a -= b (also known as augmented assignment). A distinction is made between these operators and the standard arithmetic methods because the implementation of the in-place operators might be able to provide certain customizations or performance optimizations. For instance, if the object is not shared, the value of an object could be modified in place without allocating a newly created object for the result. If the in-place operators are left undefined, an operation such as a += b is evaluated using a = a + b instead.
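A sketch of the distinction, using a hypothetical Vector class whose __iadd__() mutates the existing object instead of allocating a new one:

```python
class Vector:
    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Out-of-place: builds a brand-new Vector
        return Vector(x + y for x, y in zip(self.values, other.values))

    def __iadd__(self, other):
        # In-place: mutates self and returns it (no new allocation)
        for i, y in enumerate(other.values):
            self.values[i] += y
        return self

a = Vector([1, 2, 3])
b = a
a += Vector([10, 20, 30])    # Uses __iadd__; mutates in place
print(a.values)              # [11, 22, 33]
print(a is b)                # True: still the very same object
```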

There are no methods that can be used to define the behavior of the logical and, or, or not operators. The and and or operators implement short-circuit evaluation where evaluation stops if the final result can already be determined. For example:

>>> True or 1 / 0     # Does not evaluate 1/0
True
>>>

This behavior involving unevaluated subexpressions can’t be expressed using the evaluation rules of a normal function or method. Thus, there is no protocol or set of methods for redefining it. Instead, it is handled as a special case deep inside the implementation of Python itself.

4.11 Comparison Protocol

Objects can be compared in various ways. The most basic check is an identity check with the is operator. For example, a is b. Identity does not consider the values stored inside of an object, even if they happen to be the same. For example:

>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
>>> c = [1, 2, 3]
>>> a is c
False
>>>

The is operator is an internal part of Python that can’t be redefined. All other comparisons on objects are implemented by the methods in Table 4.3.

Table 4.3 Methods for Instance Comparison and Hashing

Method

Description

__bool__(self)

Returns False or True for truth-value testing

__eq__(self, other)

self == other

__ne__(self, other)

self != other

__lt__(self, other)

self < other

__le__(self, other)

self <= other

__gt__(self, other)

self > other

__ge__(self, other)

self >= other

__hash__(self)

Computes an integer hash index

The __bool__() method, if present, is used to determine the truth value when an object is tested as part of a condition or conditional expression. For example:

if a:              # Executes a.__bool__()
   ...
else:
   ...

If __bool__() is undefined, then __len__() is used as a fallback. If both __bool__() and __len__() are undefined, an object is simply considered to be True.
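A sketch of the __len__() fallback, using a hypothetical Stack class that defines no __bool__() of its own:

```python
class Stack:
    def __init__(self):
        self._items = []
    def push(self, item):
        self._items.append(item)
    def __len__(self):
        return len(self._items)

s = Stack()
print(bool(s))      # False: __len__() returned 0
s.push('item')
print(bool(s))      # True: __len__() is now nonzero
```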

The __eq__() method is used to determine basic equality for use with the == and != operators. The default implementation of __eq__() compares objects by identity using the is operator. The __ne__() method, if present, can be used to implement special processing for !=, but is usually not required as long as __eq__() is defined.

Ordering is determined by the relational operators (<, >, <=, and >=) using methods such as __lt__() and __gt__(). As with other mathematical operations, the evaluation rules are subtle. To evaluate a < b, the interpreter will first try to execute a.__lt__(b) except where b is a subtype of a. In that one specific case, b.__gt__(a) executes instead. If this initial method is not defined or returns NotImplemented, the interpreter tries a reversed comparison, calling b.__gt__(a). Similar rules apply to operators such as <= and >=. For example, evaluating <= first tries to evaluate a.__le__(b). If not implemented, b.__ge__(a) is tried.

Each of the comparison methods takes two arguments and is allowed to return any kind of value, including a Boolean value, a list, or any other Python type. For instance, a numerical package might use this to perform an element-wise comparison of two matrices, returning a matrix with the results. If comparison is not possible, the methods should return the built-in object NotImplemented. This is not the same as the NotImplementedError exception. For example:

>>> a = 42      # int
>>> b = 52.3    # float
>>> a.__lt__(b)
NotImplemented
>>> b.__gt__(a)
True
>>>

It is not necessary for an ordered object to implement all of the comparison operations in Table 4.3. If you want to be able to sort objects or use functions such as min() or max(), then __lt__() must be minimally defined. If you are adding comparison operators to a user-defined class, the @total_ordering class decorator in the functools module may be of some use. It can generate all of the methods as long as you minimally implement __eq__() and one of the other comparisons.
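A sketch of @total_ordering in action, with a hypothetical Card class that supplies only __eq__() and __lt__():

```python
from functools import total_ordering

@total_ordering
class Card:
    def __init__(self, rank):
        self.rank = rank
    def __eq__(self, other):
        return self.rank == other.rank
    def __lt__(self, other):
        return self.rank < other.rank

# total_ordering fills in __le__, __gt__, and __ge__
print(Card(5) >= Card(3))            # True
print(max(Card(5), Card(9)).rank)    # 9
```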

The __hash__() method is defined on instances that are to be placed into a set or be used as keys in a mapping (dictionary). The value returned is an integer that should be the same for two instances that compare as equal. Moreover, __eq__() should always be defined together with __hash__() because the two methods work together. The value returned by __hash__() is typically used as an internal implementation detail of various data structures. However, it’s possible for two different objects to have the same hash value. Therefore, __eq__() is necessary to resolve potential collisions.
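A minimal sketch of a hashable user-defined class, using a hypothetical Point; defining __eq__() and __hash__() together makes instances usable in sets and as dictionary keys:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, Point) and \
               (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        # Equal points must produce equal hash values
        return hash((self.x, self.y))

points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))               # 2: the duplicates collapse
print(Point(1, 2) in points)     # True
```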

4.12 Conversion Protocols

Sometimes, you must convert an object to a built-in type such as a string or a number. The methods in Table 4.4 can be defined for this purpose.

Table 4.4 Methods for Conversions

Method

Description

__str__(self)

Conversion to a string

__bytes__(self)

Conversion to bytes

__format__(self, format_spec)

Creates a formatted representation

__bool__(self)

bool(self)

__int__(self)

int(self)

__float__(self)

float(self)

__complex__(self)

complex(self)

__index__(self)

Conversion to an integer index

The __str__() method is called by the built-in str() function and by functions related to printing. The __format__() method is called by the format() function or the format() method of strings. The format_spec argument is a string containing the format specification. This string is the same as the format_spec argument to format(). For example:

f'{x:spec}'                 # Calls x.__format__('spec')
format(x, 'spec')           # Calls x.__format__('spec')
'x is {0:spec}'.format(x)   # Calls x.__format__('spec')

The syntax of the format specification is arbitrary and can be customized on an object-by-object basis. However, there is a standard set of conventions used for the built-in types. More information about string formatting, including the general format of the specifier, can be found in Chapter 9.
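For instance, here is a sketch of a hypothetical Temperature class that defines its own one-character specification ('f' for Fahrenheit, anything else for Celsius):

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius
    def __format__(self, spec):
        # Hypothetical spec: 'f' -> Fahrenheit, default -> Celsius
        if spec == 'f':
            return f'{self.celsius * 9 / 5 + 32:.1f}F'
        return f'{self.celsius:.1f}C'

t = Temperature(20)
print(f'{t}')           # 20.0C
print(f'{t:f}')         # 68.0F
print(format(t, 'f'))   # 68.0F
```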

The __bytes__() method is used to create a byte representation if an instance is passed to bytes(). Not all types support byte conversion.

The numeric conversions __bool__(), __int__(), __float__(), and __complex__() are expected to produce a value of the corresponding built-in type.

Python never performs implicit type conversions using these methods. Thus, even if an object x implements an __int__() method, the expression 3 + x will still produce a TypeError. The only way to execute __int__() is through an explicit use of the int() function.
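A quick sketch demonstrating this, with a hypothetical Quantity class:

```python
class Quantity:
    def __init__(self, value):
        self.value = value
    def __int__(self):
        return int(self.value)

q = Quantity(7.5)
print(int(q))        # 7: explicit conversion works

try:
    3 + q            # __int__() is never invoked implicitly
except TypeError as e:
    print('TypeError:', e)
```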

The __index__() method performs an integer conversion of an object when it’s used in an operation that requires an integer value. This includes indexing in sequence operations. For example, if items is a list, performing an operation such as items[x] will attempt to execute items[x.__index__()] if x is not an integer. __index__() is also used in various base conversions such as oct(x) and hex(x).
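A sketch with a hypothetical Nibble class showing __index__() in both roles:

```python
class Nibble:
    def __init__(self, value):
        self.value = value
    def __index__(self):
        return self.value

n = Nibble(2)
items = ['a', 'b', 'c', 'd']
print(items[n])     # 'c': list indexing uses __index__()
print(hex(n))       # 0x2: base conversion uses it too
```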

4.13 Container Protocol

The methods in Table 4.5 are used by objects that want to implement containers of various kinds—lists, dicts, sets, and so on.

Table 4.5 Methods for Containers

Method

Description

__len__(self)

Returns the length of self

__getitem__(self, key)

Returns self[key]

__setitem__(self, key, value)

Sets self[key] = value

__delitem__(self, key)

Deletes self[key]

__contains__(self, obj)

obj in self

Here’s an example:

a = [1, 2, 3, 4, 5, 6]
len(a)               # a.__len__()
x = a[2]             # x = a.__getitem__(2)
a[1] = 7             # a.__setitem__(1,7)
del a[2]             # a.__delitem__(2)
5 in a               # a.__contains__(5)

The __len__() method is called by the built-in len() function to return a nonnegative length. This method also determines truth values unless the __bool__() method has also been defined.

For accessing individual items, the __getitem__() method can return an item by key value. The key can be any Python object, but it is expected to be an integer for ordered sequences such as lists and arrays. The __setitem__() method assigns a value to an element. The __delitem__() method is invoked whenever the del operation is applied to a single element. The __contains__() method is used to implement the in operator.
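The methods above can be sketched with a minimal dict-backed container (Registry is a hypothetical name used only for illustration):

```python
class Registry:
    # Hypothetical container backed by a dict
    def __init__(self):
        self._data = {}

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __contains__(self, key):
        return key in self._data

r = Registry()
r['x'] = 1           # Invokes r.__setitem__('x', 1)
print(r['x'])        # Invokes r.__getitem__('x') -> 1
print('x' in r)      # Invokes r.__contains__('x') -> True
print(len(r))        # Invokes r.__len__() -> 1
del r['x']           # Invokes r.__delitem__('x')
print(len(r))        # 0
```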

Slicing operations such as x = s[i:j] are also implemented using __getitem__(), __setitem__(), and __delitem__(). For slices, a special slice instance is passed as the key. This instance has attributes that describe the range of the slice being requested. For example:

a = [1,2,3,4,5,6]
x = a[1:5]           # x = a.__getitem__(slice(1, 5, None))
a[1:3] = [10,11,12]  # a.__setitem__(slice(1, 3, None), [10, 11, 12])
del a[1:4]           # a.__delitem__(slice(1, 4, None))

The slicing features of Python are more powerful than many programmers realize. For example, the following variations of extended slicing are all supported and may be useful for working with multidimensional data structures such as matrices and arrays:

a = m[0:100:10]          # Strided slice (step=10)
b = m[1:10, 3:20]        # Multidimensional slice
c = m[0:100:10, 50:75:5] # Multiple dimensions with strides
m[0:5, 5:10] = n         # Extended slice assignment
del m[:10, 15:]          # Extended slice deletion

The general format for each dimension of an extended slice is i:j[:stride], where stride is optional. As with ordinary slices, you can omit the starting or ending values for each part of a slice.

In addition, the Ellipsis (written as ...) is available to denote any number of trailing or leading dimensions in an extended slice:

a = m[..., 10:20]    # extended slice access with Ellipsis
m[10:20, ...] = n

When using extended slices, the __getitem__(), __setitem__(), and __delitem__() methods implement access, modification, and deletion, respectively. However, instead of an integer, the value passed to these methods is a tuple containing a combination of slice or Ellipsis objects. For example,

a = m[0:10, 0:100:5, ...]

invokes __getitem__() as follows:

a = m.__getitem__((slice(0,10,None), slice(0,100,5), Ellipsis))

Python strings, tuples, and lists currently provide some support for extended slices. No part of Python or its standard library makes use of multidimensional slicing or the Ellipsis; those features are reserved purely for third-party libraries and frameworks. Perhaps the most common place you would see them used is in a library such as numpy.
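You can observe the keys that slicing produces with a minimal probe class (SliceProbe is a hypothetical name) whose __getitem__() simply returns whatever key it receives:

```python
class SliceProbe:
    # Hypothetical class that reports the key passed to __getitem__()
    def __getitem__(self, key):
        return key

m = SliceProbe()
print(m[0:10])           # slice(0, 10, None)
print(m[0:10, 0:100:5])  # (slice(0, 10, None), slice(0, 100, 5))
print(m[..., 10:20])     # (Ellipsis, slice(10, 20, None))
```

A technique like this is handy when debugging the __getitem__() method of a class that supports multidimensional slicing.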

4.14 Iteration Protocol

If an instance, obj, supports iteration, it provides a method, obj.__iter__(), that returns an iterator. An iterator iter, in turn, implements a single method, iter.__next__(), that returns the next object or raises StopIteration to signal the end of iteration. These methods are used by the implementation of the for statement as well as other operations that implicitly perform iteration. For example, the statement for x in s is carried out by performing these steps:

_iter = s.__iter__()
while True:
    try:
         x = _iter.__next__()
    except StopIteration:
         break
    # Do statements in body of for loop
    ...

An object may optionally provide a reversed iterator if it implements the __reversed__() special method. This method should return an iterator object with the same interface as a normal iterator (that is, a __next__() method that raises StopIteration at the end of iteration). This method is used by the built-in reversed() function. For example:

>>> for x in reversed([1,2,3]):
...     print(x)
3
2
1
>>>
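
A user-defined class can supply its own reversed iterator. Here is a minimal sketch (Countdown is a hypothetical name) in which forward and reverse iteration produce different orderings:

```python
class Countdown:
    # Hypothetical sequence that counts n, n-1, ..., 1
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n, 0, -1))

    def __reversed__(self):
        # Used by the built-in reversed() function
        return iter(range(1, self.n + 1))

print(list(Countdown(3)))            # [3, 2, 1]
print(list(reversed(Countdown(3)))) # [1, 2, 3]
```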

A common implementation technique for iteration is to use a generator function involving yield. For example:

class FRange:
    def __init__(self, start, stop, step):
        self.start = start
        self.stop = stop
        self.step = step

    def __iter__(self):
        x = self.start
        while x < self.stop:
            yield x
            x += self.step

# Example use:
nums = FRange(0.0, 1.0, 0.1)
for x in nums:
    print(x)     # 0.0, 0.1, 0.2, 0.3, ...

This works because generator functions conform to the iteration protocol themselves. It’s a bit easier to implement an iterator in this way since you only have to worry about the __iter__() method. The rest of the iteration machinery is already provided by the generator.

4.15 Attribute Protocol

The methods in Table 4.6 read, write, and delete the attributes of an object. Reads and writes use the dot (.) operator; deletion uses the del operator.

Table 4.6 Methods for Attribute Access

Method                           Description
__getattribute__(self, name)     Returns the attribute self.name
__getattr__(self, name)          Returns the attribute self.name if it's not found through __getattribute__()
__setattr__(self, name, value)   Sets the attribute self.name = value
__delattr__(self, name)          Implements del self.name

Whenever an attribute is accessed, the __getattribute__() method is invoked. If the attribute is located, its value is returned. Otherwise, the __getattr__() method is invoked. The default behavior of __getattr__() is to raise an AttributeError exception. The __setattr__() method is always invoked when setting an attribute, and the __delattr__() method is always invoked when deleting an attribute.
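One common use of __getattr__() is to delegate attribute access to a wrapped object. Here is a minimal sketch (LoggedProxy is a hypothetical name); because __getattr__() fires only when normal lookup fails, attributes defined on the proxy itself are unaffected:

```python
class LoggedProxy:
    # Hypothetical proxy that forwards missing attributes to a wrapped object
    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails
        print(f'accessing {name}')
        return getattr(self._obj, name)

p = LoggedProxy([1, 2, 3])
p.append(4)       # Prints 'accessing append', then delegates to the list
print(p._obj)     # [1, 2, 3, 4] -- found normally; __getattr__ not called
```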

These methods are fairly blunt—in that they allow a type to completely redefine attribute access for all attributes. User-defined classes can define properties and descriptors which allow for more fine-grained control of attribute access. This is discussed further in Chapter 7.

4.16 Function Protocol

An object can emulate a function by providing the __call__() method. If an object, x, provides this method, it can be invoked like a function. That is, x(arg1, arg2, ...) invokes x.__call__(arg1, arg2, ...).

There are many built-in types that support function calls. For example, types implement __call__() to create new instances. Bound methods implement __call__() to pass the self argument to instance methods. Library functions such as functools.partial() also create objects that emulate functions.
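Here is a minimal sketch of a callable object (Adder is a hypothetical name) that carries state between calls, much like a closure or functools.partial():

```python
class Adder:
    # Hypothetical callable object that remembers an increment
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return x + self.n

add5 = Adder(5)
print(add5(10))        # 15 -- invokes add5.__call__(10)
print(callable(add5))  # True
```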

4.17 Context Manager Protocol

The with statement allows a sequence of statements to execute under the control of an instance known as a context manager. The general syntax is as follows:

with context [as var]:
     statements

The context object shown here is expected to implement the methods listed in Table 4.7.

Table 4.7 Methods for Context Managers

Method                           Description
__enter__(self)                  Called when entering a new context. The return value is placed in the variable listed with the as specifier to the with statement.
__exit__(self, type, value, tb)  Called when leaving a context. If an exception occurred, type, value, and tb have the exception type, value, and traceback information.

The __enter__() method is invoked when the with statement executes. The value returned by this method is placed into the variable specified with the optional as var specifier. The __exit__() method is called as soon as control flow leaves the block of statements associated with the with statement. As arguments, __exit__() receives the current exception type, value, and a traceback if an exception has been raised. If no errors are being handled, all three values are set to None. The __exit__() method should return True or False to indicate if a raised exception was handled or not. If True is returned, any pending exception is cleared and program execution continues normally with the first statement after the with block.
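As a minimal sketch of these rules (Suppress is a hypothetical name, similar in spirit to contextlib.suppress in the standard library), here is a context manager whose __exit__() returns True to clear a matching exception:

```python
class Suppress:
    # Hypothetical context manager that swallows a given exception type
    def __init__(self, exc_type):
        self.exc_type = exc_type

    def __enter__(self):
        # The value returned here would be bound by 'as var'
        return self

    def __exit__(self, ty, val, tb):
        # Returning True clears a matching exception; False propagates it
        return ty is not None and issubclass(ty, self.exc_type)

with Suppress(ZeroDivisionError):
    1 / 0                        # Exception is cleared by __exit__()
print('execution continues')    # Runs normally after the with block
```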

The primary use of the context management interface is to allow for simplified resource control on objects involving system state such as open files, network connections, and locks. By implementing this interface, an object can safely clean up resources when execution leaves a context in which an object is being used. Further details are found in Chapter 3.

4.18 Final Words: On Being Pythonic

A commonly cited design goal is to write code that is “Pythonic.” That can mean many things, but basically it encourages you to follow established idioms used by the rest of Python. That means knowing Python’s protocols for containers, iterables, resource management, and so forth. Many of Python’s most popular frameworks use these protocols to provide good user experience. You should strive for that as well.

Of the different protocols, three deserve special attention because of their widespread use. One is creating a proper object representation using the __repr__() method. Python programs are often debugged and experimented with at the interactive REPL. It is also common to output objects using print() or a logging library. If you make it easy to observe the state of your objects, it will make all of these things easier.

Second, iterating over data is one of the most common programming tasks. If you’re going to do it, you should make your code work with Python’s for statement. Many core parts of Python and the standard library are designed to work with iterable objects. By supporting iteration in the usual way, you’ll automatically get a significant amount of extra functionality and your code will be intuitive to other programmers.

Finally, use context managers and the with statement for the common programming pattern where statements get sandwiched between some kind of startup and teardown steps—for example, opening and closing resources, acquiring and releasing locks, subscribing and unsubscribing, and so on.