Chapter 7. Functions

Defining functions using the def statement is a cornerstone of all programs. The goal of this chapter is to present some more advanced and unusual function definition and usage patterns. Topics include default arguments, functions that take any number of arguments, keyword-only arguments, annotations, and closures. In addition, some tricky control flow and data passing problems involving callback functions are addressed.

7.4. Returning Multiple Values from a Function

7.5. Defining Functions with Default Arguments

Defining functions with default arguments is easy, but there is a bit more to it than meets the eye.

First, the values assigned as a default are bound only once at the time of function definition. Try this example to see it:

>>> x = 42
>>> def spam(a, b=x):
...     print(a, b)
...
>>> spam(1)
1 42
>>> x = 23     # Has no effect
>>> spam(1)
1 42
>>>

Notice how changing the variable x (which was used as a default value) has no effect whatsoever. This is because the default value was fixed at function definition time.

Second, the values assigned as defaults should always be immutable objects, such as None, True, False, numbers, or strings. Specifically, never write code like this:

def spam(a, b=[]):     # NO!
    ...

If you do this, you can run into all sorts of trouble if the default value ever escapes the function and gets modified. Such changes will permanently alter the default value across future function calls. For example:

>>> def spam(a, b=[]):
...     print(b)
...     return b
...
>>> x = spam(1)
>>> x
[]
>>> x.append(99)
>>> x.append('Yow!')
>>> x
[99, 'Yow!']
>>> spam(1)       # Modified list gets returned!
[99, 'Yow!']
>>>

That’s probably not what you want. To avoid this, it’s better to assign None as a default and add a check inside the function for it, as shown in the solution.

The use of the is operator when testing for None is a critical part of this recipe. Sometimes people make this mistake:

def spam(a, b=None):
    if not b:      # NO! Use 'b is None' instead
        b = []
    ...

The problem here is that although None evaluates to False, many other objects (e.g., zero-length strings, lists, tuples, dicts, etc.) do as well. Thus, the test just shown would falsely treat certain inputs as missing. For example:

>>> spam(1)        # OK
>>> x = []
>>> spam(1, x)     # Silent error. x value overwritten by default
>>> spam(1, 0)     # Silent error. 0 ignored
>>> spam(1, '')    # Silent error. '' ignored
>>>

The last part of this recipe is something that’s rather subtle—a function that tests to see whether a value (any value) has been supplied to an optional argument or not. The tricky part here is that you can’t use a default value of None, 0, or False to test for the presence of a user-supplied argument (since all of these are perfectly valid values that a user might supply). Thus, you need something else to test against.

To solve this problem, you can create a unique private instance of object, as shown in the solution (the _no_value variable). In the function, you then check the identity of the supplied argument against this special value to see if an argument was supplied or not. The thinking here is that it would be extremely unlikely for a user to pass the _no_value instance in as an input value. Therefore, it becomes a safe value to check against if you’re trying to determine whether an argument was supplied or not.

The use of object() might look rather unusual here. object is a class that serves as the common base class for almost all objects in Python. You can create instances of object, but they are wholly uninteresting, as they have no notable methods nor any instance data (because there is no underlying instance dictionary, you can’t even set any attributes). About the only thing you can do is perform tests for identity. This makes them useful as special values, as shown in the solution.

This recipe is really related to the problem of making seemingly incompatible bits of code work together. A series of examples will help illustrate.

As a first example, suppose you have a list of points represented as tuples of (x,y) coordinates. You could use the following function to compute the distance between two points:

points = [ (1, 2), (3, 4), (5, 6), (7, 8) ]

import math
def distance(p1, p2):
    x1, y1 = p1
    x2, y2 = p2
    return math.hypot(x2 - x1, y2 - y1)

Now suppose you want to sort all of the points according to their distance from some other point. The sort() method of lists accepts a key argument that can be used to customize sorting, but it only works with functions that take a single argument (thus, distance() is not suitable). Here’s how you might use partial() to fix it:

>>> pt = (4, 3)
>>> points.sort(key=partial(distance,pt))
>>> points
[(3, 4), (1, 2), (5, 6), (7, 8)]
>>>

As an extension of this idea, partial() can often be used to tweak the argument signatures of callback functions used in other libraries. For example, here’s a bit of code that uses multiprocessing to asynchronously compute a result which is handed to a callback function that accepts both the result and an optional logging argument:

def output_result(result, log=None):
    if log is not None:
        log.debug('Got: %r', result)

# A sample function
def add(x, y):
    return x + y

if __name__ == '__main__':
    import logging
    from multiprocessing import Pool
    from functools import partial

    logging.basicConfig(level=logging.DEBUG)
    log = logging.getLogger('test')

    p = Pool()
    p.apply_async(add, (3, 4), callback=partial(output_result, log=log))
    p.close()
    p.join()

When supplying the callback function using apply_async(), the extra logging argument is given using partial(). multiprocessing is none the wiser about all of this—it simply invokes the callback function with a single value.

As a similar example, consider the problem of writing network servers. The socketserver module makes it relatively easy. For example, here is a simple echo server:

from socketserver import StreamRequestHandler, TCPServer

class EchoHandler(StreamRequestHandler):
    def handle(self):
        for line in self.rfile:
            self.wfile.write(b'GOT:' + line)

serv = TCPServer(('', 15000), EchoHandler)
serv.serve_forever()

However, suppose you want to give the EchoHandler class an __init__() method that accepts an additional configuration argument. For example:

class EchoHandler(StreamRequestHandler):
    # ack is added keyword-only argument. *args, **kwargs are
    # any normal parameters supplied (which are passed on)
    def __init__(self, *args, ack, **kwargs):
        self.ack = ack
        super().__init__(*args, **kwargs)
    def handle(self):
        for line in self.rfile:
            self.wfile.write(self.ack + line)

If you make this change, you’ll find there is no longer an obvious way to plug it into the TCPServer class. In fact, you’ll find that the code now starts generating exceptions like this:

Exception happened during processing of request from ('127.0.0.1', 59834)
Traceback (most recent call last):
 ...
TypeError: __init__() missing 1 required keyword-only argument: 'ack'

At first glance, it seems impossible to fix this code, short of modifying the source code to socketserver or coming up with some kind of weird workaround. However, it’s easy to resolve using partial()—just use it to supply the value of the ack argument, like this:

from functools import partial
serv = TCPServer(('', 15000), partial(EchoHandler, ack=b'RECEIVED:'))
serv.serve_forever()

In this example, the specification of the ack argument in the __init__() method might look a little funny, but it’s being specified as a keyword-only argument. This is discussed further in Recipe 7.2.

The functionality of partial() is sometimes replaced with a lambda expression. For example, the previous examples might use statements such as this:

points.sort(key=lambda p: distance(pt, p))

p.apply_async(add, (3, 4), callback=lambda result: output_result(result,log))

serv = TCPServer(('', 15000),
                 lambda *args, **kwargs: EchoHandler(*args,
                                                     ack=b'RECEIVED:',
                                                     **kwargs))

This code works, but it’s more verbose and potentially a lot more confusing to someone reading it. Using partial() is a bit more explicit about your intentions (supplying values for some of the arguments).)

This recipe pertains to the use of callback functions that are found in many libraries and frameworks—especially those related to asynchronous processing. To illustrate and for the purposes of testing, define the following function, which invokes a callback:

def apply_async(func, args, *, callback):
    # Compute the result
    result = func(*args)

    # Invoke the callback with the result
    callback(result)

In reality, such code might do all sorts of advanced processing involving threads, processes, and timers, but that’s not the main focus here. Instead, we’re simply focused on the invocation of the callback. Here’s an example that shows how the preceding code gets used:

>>> def print_result(result):
...     print('Got:', result)
...
>>> def add(x, y):
...     return x + y
...
>>> apply_async(add, (2, 3), callback=print_result)
Got: 5
>>> apply_async(add, ('hello', 'world'), callback=print_result)
Got: helloworld
>>>

As you will notice, the print_result() function only accepts a single argument, which is the result. No other information is passed in. This lack of information can sometimes present problems when you want the callback to interact with other variables or parts of the environment.

One way to carry extra information in a callback is to use a bound-method instead of a simple function. For example, this class keeps an internal sequence number that is incremented every time a result is received:

class ResultHandler:
    def __init__(self):
        self.sequence = 0
    def handler(self, result):
        self.sequence += 1
        print('[{}] Got: {}'.format(self.sequence, result))

To use this class, you would create an instance and use the bound method handler as the callback:

>>> r = ResultHandler()
>>> apply_async(add, (2, 3), callback=r.handler)
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=r.handler)
[2] Got: helloworld
>>>

As an alternative to a class, you can also use a closure to capture state. For example:

def make_handler():
    sequence = 0
    def handler(result):
        nonlocal sequence
        sequence += 1
        print('[{}] Got: {}'.format(sequence, result))
    return handler

Here is an example of this variant:

>>> handler = make_handler()
>>> apply_async(add, (2, 3), callback=handler)
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=handler)
[2] Got: helloworld
>>>

As yet another variation on this theme, you can sometimes use a coroutine to accomplish the same thing:

def make_handler():
    sequence = 0
    while True:
        result = yield
        sequence += 1
        print('[{}] Got: {}'.format(sequence, result))

For a coroutine, you would use its send() method as the callback, like this:

>>> handler = make_handler()
>>> next(handler)        # Advance to the yield
>>> apply_async(add, (2, 3), callback=handler.send)
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=handler.send)
[2] Got: helloworld
>>>

Last, but not least, you can also carry state into a callback using an extra argument and partial function application. For example:

>>> class SequenceNo:
...     def __init__(self):
...         self.sequence = 0
...
>>> def handler(result, seq):
...     seq.sequence += 1
...     print('[{}] Got: {}'.format(seq.sequence, result))
...
>>> seq = SequenceNo()
>>> from functools import partial
>>> apply_async(add, (2, 3), callback=partial(handler, seq=seq))
[1] Got: 5
>>> apply_async(add, ('hello', 'world'), callback=partial(handler, seq=seq))
[2] Got: helloworld
>>>

Software based on callback functions often runs the risk of turning into a huge tangled mess. Part of the issue is that the callback function is often disconnected from the code that made the initial request leading to callback execution. Thus, the execution environment between making the request and handling the result is effectively lost. If you want the callback function to continue with a procedure involving multiple steps, you have to figure out how to save and restore the associated state.

There are really two main approaches that are useful for capturing and carrying state. You can carry it around on an instance (attached to a bound method perhaps) or you can carry it around in a closure (an inner function). Of the two techniques, closures are perhaps a bit more lightweight and natural in that they are simply built from functions. They also automatically capture all of the variables being used. Thus, it frees you from having to worry about the exact state needs to be stored (it’s determined automatically from your code).

If using closures, you need to pay careful attention to mutable variables. In the solution, the nonlocal declaration is used to indicate that the sequence variable is being modified from within the callback. Without this declaration, you’ll get an error.

The use of a coroutine as a callback handler is interesting in that it is closely related to the closure approach. In some sense, it’s even cleaner, since there is just a single function. Moreover, variables can be freely modified without worrying about nonlocal declarations. The potential downside is that coroutines don’t tend to be as well understood as other parts of Python. There are also a few tricky bits such as the need to call next() on a coroutine prior to using it. That’s something that could be easy to forget in practice. Nevertheless, coroutines have other potential uses here, such as the definition of an inlined callback (covered in the next recipe).

The last technique involving partial() is useful if all you need to do is pass extra values into a callback. Instead of using partial(), you’ll sometimes see the same thing accomplished with the use of a lambda:

>>> apply_async(add, (2, 3), callback=lambda r: handler(r, seq))
[1] Got: 5
>>>

For more examples, see Recipe 7.8, which shows how to use partial() to change argument signatures.

Callback functions can be inlined into a function using generators and coroutines. To illustrate, suppose you have a function that performs work and invokes a callback as follows (see Recipe 7.10):

def apply_async(func, args, *, callback):
    # Compute the result
    result = func(*args)

    # Invoke the callback with the result
    callback(result)

Now take a look at the following supporting code, which involves an Async class and an inlined_async decorator:

from queue import Queue
from functools import wraps

class Async:
    def __init__(self, func, args):
        self.func = func
        self.args = args

def inlined_async(func):
    @wraps(func)
    def wrapper(*args):
        f = func(*args)
        result_queue = Queue()
        result_queue.put(None)
        while True:
            result = result_queue.get()
            try:
                a = f.send(result)
                apply_async(a.func, a.args, callback=result_queue.put)
            except StopIteration:
                break
    return wrapper

These two fragments of code will allow you to inline the callback steps using yield statements. For example:

def add(x, y):
    return x + y

@inlined_async
def test():
    r = yield Async(add, (2, 3))
    print(r)
    r = yield Async(add, ('hello', 'world'))
    print(r)
    for n in range(10):
        r = yield Async(add, (n, n))
        print(r)
    print('Goodbye')

If you call test(), you’ll get output like this:

5
helloworld
0
2
4
6
8
10
12
14
16
18
Goodbye

Aside from the special decorator and use of yield, you will notice that no callback functions appear anywhere (except behind the scenes).

Discussion

This recipe will really test your knowledge of callback functions, generators, and control flow.

First, in code involving callbacks, the whole point is that the current calculation will suspend and resume at some later point in time (e.g., asynchronously). When the calculation resumes, the callback will get executed to continue the processing. The apply_async() function illustrates the essential parts of executing the callback, although in reality it might be much more complicated (involving threads, processes, event handlers, etc.).

The idea that a calculation will suspend and resume naturally maps to the execution model of a generator function. Specifically, the yield operation makes a generator function emit a value and suspend. Subsequent calls to the __next__() or send() method of a generator will make it start again.

With this in mind, the core of this recipe is found in the inline_async() decorator function. The key idea is that the decorator will step the generator function through all of its yield statements, one at a time. To do this, a result queue is created and initially populated with a value of None. A loop is then initiated in which a result is popped off the queue and sent into the generator. This advances to the next yield, at which point an instance of Async is received. The loop then looks at the function and arguments, and initiates the asynchronous calculation apply_async(). However, the sneakiest part of this calculation is that instead of using a normal callback function, the callback is set to the queue put() method.

At this point, it is left somewhat open as to precisely what happens. The main loop immediately goes back to the top and simply executes a get() operation on the queue. If data is present, it must be the result placed there by the put() callback. If nothing is there, the operation blocks, waiting for a result to arrive at some future time. How that might happen depends on the precise implementation of the apply_async() function.

If you’re doubtful that anything this crazy would work, you can try it with the multiprocessing library and have async operations executed in separate processes:

if __name__ == '__main__':
    import multiprocessing
    pool = multiprocessing.Pool()
    apply_async = pool.apply_async

    # Run the test function
    test()

Indeed, you’ll find that it works, but unraveling the control flow might require more coffee.

Hiding tricky control flow behind generator functions is found elsewhere in the standard library and third-party packages. For example, the @contextmanager decorator in the contextlib performs a similar mind-bending trick that glues the entry and exit from a context manager together across a yield statement. The popular Twisted package has inlined callbacks that are also similar.

There are two main features that make this recipe work. First, nonlocal declarations make it possible to write functions that change inner variables. Second, function attributes allow the accessor methods to be attached to the closure function in a straightforward manner where they work a lot like instance methods (even though no class is involved).

A slight extension to this recipe can be made to have closures emulate instances of a class. All you need to do is copy the inner functions over to the dictionary of an instance and return it. For example:

import sys
class ClosureInstance:
    def __init__(self, locals=None):
        if locals is None:
            locals = sys._getframe(1).f_locals

        # Update instance dictionary with callables
        self.__dict__.update((key,value) for key, value in locals.items()
                             if callable(value) )
    # Redirect special methods
    def __len__(self):
        return self.__dict__['__len__']()

# Example use
def Stack():
    items = []

    def push(item):
        items.append(item)

    def pop():
        return items.pop()

    def __len__():
        return len(items)

    return ClosureInstance()

Here’s an interactive session to show that it actually works:

>>> s = Stack()
>>> s
<__main__.ClosureInstance object at 0x10069ed10>
>>> s.push(10)
>>> s.push(20)
>>> s.push('Hello')
>>> len(s)
3
>>> s.pop()
'Hello'
>>> s.pop()
20
>>> s.pop()
10
>>>

Interestingly, this code runs a bit faster than using a normal class definition. For example, you might be inclined to test the performance against a class like this:

class Stack2:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def __len__(self):
        return len(self.items)

If you do, you’ll get results similar to the following:

>>> from timeit import timeit
>>> # Test involving closures
>>> s = Stack()
>>> timeit('s.push(1);s.pop()', 'from __main__ import s')
0.9874754269840196
>>> # Test involving a class
>>> s = Stack2()
>>> timeit('s.push(1);s.pop()', 'from __main__ import s')
1.0707052160287276
>>>

As shown, the closure version runs about 8% faster. Most of that is coming from streamlined access to the instance variables. Closures are faster because there’s no extra self variable involved.

Raymond Hettinger has devised an even more diabolical variant of this idea. However, should you be inclined to do something like this in your code, be aware that it’s still a rather weird substitute for a real class. For example, major features such as inheritance, properties, descriptors, or class methods don’t work. You also have to play some tricks to get special methods to work (e.g., see the implementation of __len__() in ClosureInstance).

Lastly, you’ll run the risk of confusing people who read your code and wonder why it doesn’t look anything like a normal class definition (of course, they’ll also wonder why it’s faster). Nevertheless, it’s an interesting example of what can be done by providing access to the internals of a closure.

In the big picture, adding methods to closures might have more utility in settings where you want to do things like reset the internal state, flush buffers, clear caches, or have some kind of feedback mechanism.