Chapter 9: Concurrency and Parallelism

Concurrency enables a computer to do many different things seemingly at the same time. For example, on a computer with one CPU core, the operating system rapidly changes which program is running on the single processor. In doing so, it interleaves execution of the programs, providing the illusion that the programs are running simultaneously.

Parallelism, in contrast, involves actually doing many different things at the same time. Computers with multiple CPU cores can execute multiple programs simultaneously. Each CPU core runs the instructions of a separate program, allowing each program to make forward progress during the same instant.

Within a single program, concurrency is a tool that makes it easier for programmers to solve certain types of problems. Concurrent programs enable many distinct paths of execution, including separate streams of I/O, to make forward progress in a way that seems to be both simultaneous and independent.

The key difference between parallelism and concurrency is speedup. When two distinct paths of execution in a program make forward progress in parallel, the time it takes to do the total work is cut in half; the speed of execution is faster by a factor of two. In contrast, concurrent programs may run thousands of separate paths of execution seemingly in parallel but provide no speedup for the total work.

Python makes it easy to write concurrent programs in a variety of styles. Threads support a relatively small amount of concurrency, while asynchronous coroutines enable vast numbers of concurrent functions. Python can also be used to do parallel work through system calls, subprocesses, and C extensions. But it can be very difficult to make concurrent Python code truly run in parallel. It’s important to understand how to best utilize Python in these different situations.

Item 67: Use subprocess to Manage Child Processes

Python has battle-hardened libraries for running and managing child processes. This makes Python a great language for gluing together other tools, such as command-line utilities. When existing shell scripts get complicated, as they often do over time, graduating them to a rewrite in Python for the sake of readability and maintainability is a natural choice.

Child processes started by Python are able to run in parallel, enabling you to use Python to consume all of the CPU cores of your machine and maximize the throughput of your programs. Although Python itself may be CPU bound (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”), it’s easy to use Python to drive and coordinate CPU-intensive workloads.

Python has many ways to run subprocesses (e.g., os.popen, os.exec*), but the best choice for managing child processes is to use the subprocess built-in module. Running a child process with subprocess is simple. Here I use the module’s run convenience function to start a process, read its output, and verify that it terminated cleanly:

import subprocess

result = subprocess.run(
    ["echo", "Hello from the child!"],
    capture_output=True,
    encoding="utf-8",
)

result.check_returncode()  # No exception means it exited cleanly
print(result.stdout)

>>>
Hello from the child!
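
Alternatively, passing check=True to run makes it raise subprocess.CalledProcessError automatically on a nonzero exit status, replacing the explicit call to check_returncode:

result = subprocess.run(
    ["echo", "Hello from the child!"],
    capture_output=True,
    encoding="utf-8",
    check=True,  # Raises CalledProcessError on nonzero exit status
)
print(result.stdout)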

Note

The examples in this item assume that your system has the echo, sleep, and openssl commands available. On Windows, this may not be the case. Please refer to the full example code for this item online to see specific directions on how to run these snippets on Windows.

Child processes run independently from their parent process, the Python interpreter. If you create a subprocess using the Popen class instead of the run function, you can poll child process status periodically while Python does other work:

proc = subprocess.Popen(["sleep", "1"])
while proc.poll() is None:
    print("Working...")
    # Some time-consuming work here
    ...

print("Exit status", proc.poll())

>>>
Working...
Working...
Working...
Working...
Exit status 0

Decoupling the child process from the parent frees up the parent process to run many child processes in parallel. Here I do this by starting all the child processes together with Popen upfront:

import time

start = time.perf_counter()
sleep_procs = []
for _ in range(10):
    proc = subprocess.Popen(["sleep", "1"])
    sleep_procs.append(proc)

Later, I can wait for them to finish their I/O and terminate with the communicate method:

for proc in sleep_procs:
    proc.communicate()

end = time.perf_counter()
delta = end - start
print(f"Finished in {delta:.3} seconds")

>>>
Finished in 1.01 seconds

If these processes ran in sequence, the total delay would be 10 seconds or more rather than the ~1 second that I measured.
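
For comparison, here’s a sketch of the serial version, where each call to communicate blocks until that child exits before the next one starts:

start = time.perf_counter()

for _ in range(10):
    proc = subprocess.Popen(["sleep", "1"])
    proc.communicate()  # Waits for this child before starting the next

end = time.perf_counter()
delta = end - start
print(f"Finished in {delta:.3} seconds")  # Roughly 10 seconds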

You can also pipe data from a Python program into a subprocess and retrieve its output. This allows you to utilize many other programs to do work in parallel. For example, say that I want to use the openssl command-line tool to encrypt some data. Starting the child process with command-line arguments and I/O pipes is easy:

import os

def run_encrypt(data):
    env = os.environ.copy()
    env["password"] = "zf7ShyBhZOraQDdE/FiZpm/m/8f9X+M1"
    proc = subprocess.Popen(
        ["openssl", "enc", "-des3", "-pass", "env:password"],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    proc.stdin.write(data)
    proc.stdin.flush()  # Ensure that the child gets input
    return proc

Here I pipe random bytes into the encryption function, but in practice this input pipe would be fed data from user input, a file handle, a network socket, and so on:

procs = []
for _ in range(3):
    data = os.urandom(10)
    proc = run_encrypt(data)
    procs.append(proc)

The child processes run in parallel and consume their input. Here I wait for them to finish and then retrieve their final output. The output is random encrypted bytes, as expected:

for proc in procs:
    out, _ = proc.communicate()
    print(out[-10:])

>>>
b'\x02a_\xd3\xd3\x9a\xd0\x8f\x14|'
b'S\x9c\x1a\x919\x9a-P\x0c\x1f'
b'\x1a\x7f\x1e\xbf\xac\xe5A>\xa3\xdd'

It’s also possible to create chains of parallel processes, just like UNIX pipelines, connecting the output of one child process to the input of another, and so on. Here’s a function that starts the openssl command-line tool as a subprocess to generate a Whirlpool hash of the input stream:

def run_hash(input_stdin):
    return subprocess.Popen(
        ["openssl", "dgst", "-whirlpool", "-binary"],
        stdin=input_stdin,
        stdout=subprocess.PIPE,
    )

Now I can kick off one set of processes to encrypt some data and another set of processes to subsequently hash their encrypted output. Note that I have to be careful with how the stdout instance of the upstream process is retained by the Python interpreter process that’s starting this pipeline of child processes:

encrypt_procs = []
hash_procs = []
for _ in range(3):
    data = os.urandom(100)

    encrypt_proc = run_encrypt(data)
    encrypt_procs.append(encrypt_proc)

    hash_proc = run_hash(encrypt_proc.stdout)
    hash_procs.append(hash_proc)

    # Ensure that the child consumes the input stream and
    # the communicate() method doesn't inadvertently steal
    # input from the child. Also lets SIGPIPE propagate to
    # the upstream process if the downstream process dies.
    encrypt_proc.stdout.close()
    encrypt_proc.stdout = None

The I/O between the child processes happens automatically once they are started. All I need to do is wait for them to finish and print the final output:

for proc in encrypt_procs:
    proc.communicate()
    assert proc.returncode == 0

for proc in hash_procs:
    out, _ = proc.communicate()
    print(out[-10:])
    assert proc.returncode == 0

>>>
b'\xc6\n\x8a"cg\x85\xd2\x81|'
b'\x14\r\xc6J\xb0\xb0\xbf\x0c2X'
b'@\x90$\xcc\xc7\xf4\x08\x19Y\x0b'

If I’m worried about the child processes never finishing or somehow blocking on input or output pipes, I can pass the timeout parameter to the communicate method. This causes an exception to be raised if the child process hasn’t finished within the time period, giving me a chance to terminate the misbehaving subprocess:

proc = subprocess.Popen(["sleep", "10"])
try:
    proc.communicate(timeout=0.1)
except subprocess.TimeoutExpired:
    proc.terminate()
    proc.wait()

print("Exit status", proc.poll())

>>>
Exit status -15
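
The run convenience function accepts the same timeout parameter; it kills the child process when the timeout expires before re-raising the exception, so there’s no lingering subprocess to clean up:

try:
    subprocess.run(["sleep", "10"], timeout=0.1)
except subprocess.TimeoutExpired:
    print("Child took too long")

>>>
Child took too long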

Things to Remember

  • Use the subprocess module to run child processes and manage their input and output streams.

  • Child processes run in parallel with the Python interpreter, enabling you to maximize your usage of CPU cores.

  • Use the run convenience function for simple usage and the Popen class for advanced usage like UNIX-style pipelines.

  • Use the timeout parameter of the communicate method to avoid deadlocks and hanging child processes.

Item 68: Use Threads for Blocking I/O; Avoid for Parallelism

The standard implementation of Python is called CPython. CPython runs a Python program in two steps. First, it parses and compiles the source text into bytecode, which is a low-level representation of the program as 8-bit instructions (see Item 97: “Rely on Precompiled Bytecode and File System Caching to Improve Startup Time” for background). (As of Python 3.6, it’s technically wordcode with 16-bit instructions, but the idea is the same.) Then, CPython runs the bytecode using a stack-based interpreter. The bytecode interpreter has state that must be kept coherent while the Python program executes. CPython enforces coherence with a mechanism called the global interpreter lock (GIL).
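
You can inspect this bytecode directly with the dis built-in module, which prints the individual instructions the interpreter executes (the exact opcodes vary between CPython versions):

import dis

def add(a, b):
    return a + b

dis.dis(add)  # Prints each bytecode instruction for the function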

Essentially, the GIL is a mutual-exclusion lock (mutex) that prevents CPython from being affected by preemptive multithreading, where one thread takes control of a program by interrupting another thread. Such an interruption could corrupt the interpreter state (e.g., garbage collection reference counts) if it comes at an unexpected time. The GIL prevents these interruptions and ensures that every bytecode instruction works correctly with the CPython implementation and its C-extension modules (see Item 96: “Consider Extension Modules to Maximize Performance and Ergonomics” for background).

The GIL has an important negative side effect. With programs written in languages like C++ or Java, having multiple threads of execution means that a program can utilize multiple CPU cores at the same time. Although Python supports multiple threads of execution, the GIL causes only one of them to ever make forward progress at a time. This means that when you reach for threads to do parallel computation and speed up your Python programs, you will be sorely disappointed.

For example, say that I want to do something computationally intensive with Python. Here I use a naive number factorization algorithm as a proxy:

def factorize(number):
    for i in range(1, number + 1):
        if number % i == 0:
            yield i

Factoring a list of 16 numbers in serial takes quite a long time:

import time

numbers = [7775876, 6694411, 5038540, 5426782,
           9934740, 9168996, 5271226, 8288002,
           9403196, 6678888, 6776096, 9582542,
           7107467, 9633726, 5747908, 7613918]
start = time.perf_counter()

for number in numbers:
    list(factorize(number))

end = time.perf_counter()
delta = end - start

print(f"Took {delta:.3f} seconds")

>>>
Took 3.304 seconds

Using multiple threads to do this computation would make sense in other languages because I could take advantage of all the CPU cores of my computer. Let me try that in Python. Here I define a Python thread for doing the same computation as before:

from threading import Thread

class FactorizeThread(Thread):
    def __init__(self, number):
        super().__init__()
        self.number = number

    def run(self):
        self.factors = list(factorize(self.number))

Then, I start a thread for each number to factorize in parallel:

start = time.perf_counter()

threads = []
for number in numbers:
    thread = FactorizeThread(number)
    thread.start()
    threads.append(thread)

Finally, I wait for all of the threads to finish:

for thread in threads:
    thread.join()

end = time.perf_counter()
delta = end - start
print(f"Took {delta:.3f} seconds")

>>>
Took 3.293 seconds

Surprisingly, this takes almost exactly the same amount of time as running factorize in serial. With one thread per number—again, 16 threads in total for this example—you might expect less than a 16x speedup in other languages due to the overhead of creating and coordinating threads. You might also expect only an 8x speedup on the 8-core machine I used to run this code. But you wouldn’t expect these threads to perform no better than serial execution when there are multiple CPUs to utilize. This demonstrates the effect of the GIL (e.g., lock contention, scheduling overhead) on programs running in the standard CPython interpreter.

There are ways to get CPython to utilize multiple cores, but they don’t work with the standard Thread class (see Item 79: “Consider concurrent.futures for True Parallelism” and Item 94: “Know When and How to Replace Python with Another Programming Language”), and they can require substantial effort.

Note

Starting in CPython version 3.13, there is an experimental option to compile Python without the GIL, thus enabling programs to avoid its constraints. This can improve parallel performance with multiple threads, but there are significant downsides: Many C-extension modules and common libraries aren’t yet compatible with this behavior; and the straight-line performance of individual threads is reduced because of synchronization overhead. It will be interesting to see how this experiment develops in subsequent releases.

Given these limitations, why does Python support threads at all? There are two good reasons.

First, multiple threads make it easy for a program to seem like it’s doing multiple things at the same time. Managing the juggling act of simultaneous tasks is difficult to implement yourself (see Item 71: “Know How to Recognize When Concurrency Is Necessary” for an example). With threads, you can leave it to Python to run your functions concurrently. This works because CPython ensures a level of fairness between Python threads of execution, even though only one of them makes forward progress at a time due to the GIL.

The second reason Python supports threads is to deal with blocking I/O, which happens when Python does certain types of system calls. A Python program uses system calls to ask the computer’s operating system to interact with the external environment on its behalf. Blocking I/O includes things like reading and writing files, interacting with networks, communicating with devices like displays, and so on. Threads help handle blocking I/O by insulating a program from the delay required for the operating system to respond to requests.

For example, say that I want to send a signal to a radio-controlled helicopter through a serial port. I’ll use a slow system call (select) as a proxy for this activity. This function asks the operating system to block for 0.1 seconds and then return control to my program, which is similar to what would happen when using a synchronous serial port:

import select
import socket

def slow_systemcall():
    select.select([socket.socket()], [], [], 0.1)

Running this system call in serial requires a linearly increasing amount of time—5 calls takes about 0.5 seconds:

start = time.perf_counter()

for _ in range(5):
    slow_systemcall()

end = time.perf_counter()
delta = end - start
print(f"Took {delta:.3f} seconds")

>>>
Took 0.525 seconds

The problem is that while the slow_systemcall function is running, my program can’t make any other progress. My program’s main thread of execution is blocked on the select system call. This situation is awful in practice. I need to be able to compute my helicopter’s next move while I’m sending it a signal; otherwise, it’ll crash. When you find yourself needing to do blocking I/O and computation simultaneously like this, it’s time to consider moving your system calls to threads.

Here I run multiple invocations of the slow_systemcall function in separate threads. This allows me to communicate with multiple serial ports (and helicopters) at the same time while leaving the main thread to do whatever computation is required:

start = time.perf_counter()

threads = []
for _ in range(5):
    thread = Thread(target=slow_systemcall)
    thread.start()
    threads.append(thread)

With the threads started, here I do some work to calculate the next helicopter move before waiting for the system call threads to finish:

def compute_helicopter_location(index):
    ...

for i in range(5):
    compute_helicopter_location(i)

for thread in threads:
    thread.join()

end = time.perf_counter()
delta = end - start
print(f"Took {delta:.3f} seconds")

>>>
Took 0.106 seconds

The parallel time is ~5x less than the serial time from the example code earlier. This shows that all the system calls will run in parallel from multiple Python threads even though they’re limited by the GIL. The GIL prevents my Python code from running in parallel, but it doesn’t have an effect on system calls. Python threads release the GIL just before they make system calls, and they reacquire the GIL as soon as the system calls are done.

There are many other ways to deal with blocking I/O besides using threads, such as using the asyncio built-in module, and these alternatives have important benefits. But those options might require extra work in refactoring your code to fit a different model of execution (see Item 75: “Achieve Highly Concurrent I/O with Coroutines” and Item 77: “Mix Threads and Coroutines to Ease the Transition to asyncio”). Using threads is the simplest way to do blocking I/O in parallel with minimal changes to your program.

Things to Remember

  • Python threads can’t run in parallel on multiple CPU cores because of the global interpreter lock (GIL).

  • Python threads are still useful despite the GIL because they provide an easy way to do multiple things seemingly at the same time.

  • You can use Python threads to make multiple system calls in parallel, allowing you to do blocking I/O at the same time as computation.

Item 69: Use Lock to Prevent Data Races in Threads

After learning about the global interpreter lock (GIL) (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”), many new Python programmers assume that they can forgo using mutual-exclusion locks (also called mutexes) in their code altogether. If the GIL is already preventing Python threads from running on multiple CPU cores in parallel, it must also act as a lock for a program’s data structures, right? Some testing on types like lists and dictionaries may even show that this assumption appears to hold.

But beware: This is not truly the case. The GIL will not protect you. Although only one Python thread runs at a time, a thread’s operations on data structures can be interrupted between any two bytecode instructions in the Python interpreter. This is dangerous if you access the same objects from multiple threads simultaneously. The invariants of your data structures could be violated at practically any time because of these interruptions, potentially putting your program in a corrupted state.

For example, say that I want to write a program that counts many things in parallel, like sampling light levels from a network of sensors. Imagine that each sensor has its own worker thread because reading from the sensor requires blocking I/O. After each sensor measurement, the worker thread increments a shared counter variable with the number of photons received:

counter = 0

def read_sensor(sensor_index):
    # Returns sensor data or raises an exception
    ...

def get_offset(data):
    # Always returns 1 or greater
    ...

def worker(sensor_index, how_many):
    global counter
    for _ in range(how_many):
        data = read_sensor(sensor_index)
        counter += get_offset(data)

Here I run one worker thread for each sensor in parallel and wait for them all to finish their readings:

from threading import Thread

how_many = 10**6
sensor_count = 4

threads = []
for i in range(sensor_count):
    thread = Thread(target=worker, args=(i, how_many))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * sensor_count
print(f"Counter should be {expected}, got {counter}")

>>>
Counter should be 4000000, got 1980032

Given that get_offset always returns 1 or greater, the result is way off! What happened here? How could something so simple go so wrong, especially since only one Python interpreter thread can run at a time due to the GIL?

The answer is preemption. The Python interpreter enforces fairness between all of the threads that are executing to ensure they get a roughly equal amount of processing time. To do this, Python suspends a thread as it’s running and resumes another thread in turn. The problem is that you don’t know exactly when Python will suspend your threads. A thread can even be paused seemingly halfway through what looks like an atomic operation.

That’s what happened in this case, on this line in the worker function from above:

counter += get_offset(data)

The += operator used on the counter variable actually instructs Python to do three separate operations behind the scenes. The statement above is equivalent to this:

value = counter
delta = get_offset(data)
result = value + delta
counter = result

Python threads incrementing the counter might be suspended between any two of these operations. This is problematic if the way the operations interleave causes old versions of value to be assigned to the counter. Here’s an example of bad interaction between two threads, A and B:

# Running in Thread A
value_a = counter
delta_a = get_offset(data_a)
# Context switch to Thread B
value_b = counter
delta_b = get_offset(data_b)
result_b = value_b + delta_b
counter = result_b
# Context switch back to Thread A
result_a = value_a + delta_a
counter = result_a

Thread B interrupted thread A before it had completely finished. Thread B ran and finished, but then thread A resumed mid-execution, overwriting all of thread B’s progress in incrementing the counter. This is exactly what happened in the light sensor example above.

To prevent data races like these, and other forms of data structure corruption, Python includes a robust set of tools in the threading built-in module. The simplest and most useful of them is the Lock class, a mutual-exclusion lock (mutex).

By using a lock, I can protect the counter variable’s current value against simultaneous access from multiple threads. Only one thread will be able to acquire the lock at a time. Here I use a with statement to acquire and release the lock; the extra level of indentation makes it easier to see which code is executing while the lock is held (see Item 82: “Consider contextlib and with Statements for Reusable try/finally Behavior” for background):

from threading import Lock

counter = 0
counter_lock = Lock()

def locking_worker(sensor_index, how_many):
    global counter
    for _ in range(how_many):
        data = read_sensor(sensor_index)
        with counter_lock:                  # Added
            counter += get_offset(data)

Now I run the sensor threads as before but use a locking_worker instead:

for i in range(sensor_count):
    thread = Thread(target=locking_worker, args=(i, how_many))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * sensor_count
print(f"Counter should be {expected}, got {counter}")

>>>
Counter should be 4000000, got 4000000

The result is exactly what I expect. The Lock solved the problem.
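
The with statement above is equivalent to acquiring and releasing the lock with try/finally; writing that out makes the mutual exclusion explicit, at the cost of more noise:

def verbose_worker(sensor_index, how_many):
    global counter
    for _ in range(how_many):
        data = read_sensor(sensor_index)
        counter_lock.acquire()
        try:
            counter += get_offset(data)  # Only one thread at a time here
        finally:
            counter_lock.release()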

Things to Remember

  • Even though Python has a global interpreter lock, you’re still responsible for protecting against data races between the threads in your programs.

  • Your programs will corrupt their data structures if you allow multiple threads to modify the same objects without mutual exclusion locks (mutexes).

  • Use the Lock class from the threading built-in module to enforce your program’s invariants between multiple threads.

Item 70: Use Queue to Coordinate Work Between Threads

Python programs that do many things concurrently often need to coordinate their work. One of the most useful arrangements for concurrent work is a pipeline of functions.

A pipeline works like an assembly line used in manufacturing. Pipelines have many phases in serial, with a specific function for each phase. New pieces of work are constantly added to the beginning of the pipeline. The functions can operate concurrently, each processing the piece of work in its phase. The work moves forward as each function completes until there are no phases remaining. This approach is especially good for work that includes blocking I/O or subprocesses—activities that can easily be parallelized using Python (see Item 67: “Use subprocess to Manage Child Processes” and Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”).

For example, say that I want to build a system that will take a constant stream of images from my digital camera, resize them, and then add them to a photo gallery online. Such a program could be split into three phases of a pipeline. New images are retrieved in the first phase. The downloaded images are passed through the resize function in the second phase. The resized images are consumed by the upload function in the final phase.

Imagine that I’ve already written Python functions that execute the phases: download, resize, and upload. How do I assemble a pipeline to do the work concurrently?

def download(item):
    ...

def resize(item):
    ...

def upload(item):
    ...

The first thing I need is a way to hand off work between the pipeline phases. This can be modeled as a thread-safe producer–consumer queue (see Item 69: “Use Lock to Prevent Data Races in Threads” to understand the importance of thread safety in Python; see Item 103: “Prefer deque for Producer–Consumer Queues” to understand queue performance):

from collections import deque
from threading import Lock

class MyQueue:
    def __init__(self):
        self.items = deque()
        self.lock = Lock()

The producer, my digital camera, adds new images to the end of the deque of pending items:

    def put(self, item):
        with self.lock:
            self.items.append(item)

The consumer, the first phase of the processing pipeline, removes images from the front of the deque of pending items:

    def get(self):
        with self.lock:
            return self.items.popleft()

Here I represent each phase of the pipeline as a Python thread that takes work from one queue like this, runs a function on it, and puts the result on another queue. I also track how many times the worker has checked for new input and how much work it’s completed:

from threading import Thread
import time

class Worker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.polled_count = 0
        self.work_done = 0

The trickiest part is that the worker thread must properly handle the case where the input queue is empty because the previous phase hasn’t completed its work yet. This happens where I catch the IndexError exception below. You can think of this as a holdup in the assembly line:

    def run(self):
        while True:
            self.polled_count += 1
            try:
                item = self.in_queue.get()
            except IndexError:
                time.sleep(0.01)  # No work to do
            else:
                result = self.func(item)
                self.out_queue.put(result)
                self.work_done += 1

Now I can connect the three phases together by creating the queues for their coordination points and the corresponding worker threads:

download_queue = MyQueue()
resize_queue = MyQueue()
upload_queue = MyQueue()
done_queue = MyQueue()
threads = [
    Worker(download, download_queue, resize_queue),
    Worker(resize, resize_queue, upload_queue),
    Worker(upload, upload_queue, done_queue),
]

I can start the threads and then inject a bunch of work into the first phase of the pipeline. Here I use a plain object instance as a proxy for the real data required by the download function:

for thread in threads:
    thread.start()

for _ in range(1000):
    download_queue.put(object())

Now I wait for all of the items to be processed by the pipeline and end up in the done_queue:

while len(done_queue.items) < 1000:
    # Do something useful while waiting
    ...

This runs properly, but there’s an interesting side effect caused by the threads polling their input queues for new work. The tricky part, where I catch IndexError exceptions in the run method, executes a large number of times:

processed = len(done_queue.items)
polled = sum(t.polled_count for t in threads)
print(f"Processed {processed} items after "
      f"polling {polled} times")

>>>
Processed 1000 items after polling 3033 times

When the worker functions vary in their respective speeds, a slow early phase can starve the later phases of new work, leaving them to constantly check their input queues in a tight loop. The outcome is that worker threads waste CPU time doing nothing useful; they’re constantly raising and catching IndexError exceptions.

But that’s just the beginning of what’s wrong with this implementation. There are three more problems that you should also avoid. First, determining that all of the input work is complete requires yet another busy wait on the done_queue. Second, in Worker, the run method will execute forever in its busy loop. There’s no obvious way to signal to a worker thread that it’s time to exit.

Third, and worst of all, a backup in the pipeline can cause the program to crash arbitrarily. If the first phase makes rapid progress but the second phase makes slow progress, then the queue connecting the first phase to the second phase will constantly increase in size. The second phase won’t be able to keep up. Given enough time and input data, the program will eventually run out of memory and terminate.

The lesson here isn’t that pipelines are bad; it’s that it’s hard to build a good producer–consumer queue yourself. So why even try?

Queue to the Rescue

The Queue class from the queue built-in module provides all the functionality you need to solve the problems outlined above.

Queue eliminates busy waiting for new items by making the get method block the calling thread until data is available. For example, here I start a thread that waits for some input data on a queue:

from queue import Queue

my_queue = Queue()

def consumer():
    print("Consumer waiting")
    my_queue.get()  # Runs after put() below
    print("Consumer done")

thread = Thread(target=consumer)
thread.start()

Even though the consumer thread is running first, it won’t finish until the put method adds an item to the Queue instance and the get method has something to return:

print("Producer putting")
my_queue.put(object())  # Runs before get() above
print("Producer done")
thread.join()

>>>
Consumer waiting
Producer putting
Producer done
Consumer done
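
The get method doesn’t have to block forever, either: Passing a timeout (or block=False) makes it raise the Empty exception from the queue built-in module if no item arrives in time:

from queue import Empty

try:
    my_queue.get(timeout=0.01)  # Nothing left on the queue
except Empty:
    print("Timed out waiting for an item")

>>>
Timed out waiting for an item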

To solve the pipeline backup issue and avoid out-of-memory errors, the Queue class lets you specify the maximum amount of pending work to allow between two phases. This buffer size causes calls to put to block when the queue is already full. (Sometimes this behavior is called back pressure.) For example, here I define a thread that waits for a while before consuming a queue:

my_queue = Queue(1)  # Buffer size of 1

def consumer():
    time.sleep(0.1)  # Wait
    my_queue.get()   # Runs second
    print("Consumer got 1")
    my_queue.get()   # Runs fourth
    print("Consumer got 2")
    print("Consumer done")

thread = Thread(target=consumer)
thread.start()

The wait should allow the producer thread to call the queue’s put method for both objects before the consumer thread ever calls get. But the queue’s size is 1. This means that the producer adding items to the queue will have to wait for the consumer thread to call get at least once before the second call to put will stop blocking and actually add the second item to the queue:

my_queue.put(object())  # Runs first
print("Producer put 1")
my_queue.put(object())  # Runs third
print("Producer put 2")
print("Producer done")
thread.join()

>>>
Producer put 1
Consumer got 1
Producer put 2
Producer done
Consumer got 2
Consumer done
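
put supports the same nonblocking variants, raising the Full exception when a bounded queue can’t accept another item in time:

from queue import Full

bounded_queue = Queue(1)  # Buffer size of 1
bounded_queue.put(object())

try:
    bounded_queue.put(object(), timeout=0.01)  # Queue is already full
except Full:
    print("Timed out waiting for capacity")

>>>
Timed out waiting for capacity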

The Queue class can also track the progress of work using the task_done method. This lets you wait for a phase’s input queue to drain (using the join method) and eliminates the need to poll for the last phase of a pipeline (as with done_queue in the section above). For example, here I define a consumer thread that calls task_done when it finishes working on an item:

in_queue = Queue()

def consumer():
    print("Consumer waiting")
    work = in_queue.get()      # Runs second
    print("Consumer working")
    # Doing work
    ...
    print("Consumer done")
    in_queue.task_done()       # Runs third

thread = Thread(target=consumer)
thread.start()

Now the producer code doesn’t have to call join on the consumer thread or poll. The producer can just wait for in_queue to finish by calling join on the Queue instance. Even once it’s empty, in_queue won’t be joinable until after task_done is called for every item that was ever enqueued:

print("Producer putting")
in_queue.put(object())     # Runs first
print("Producer waiting")
in_queue.join()            # Runs fourth
print("Producer done")
thread.join()

>>>
Consumer waiting
Producer putting
Producer waiting
Consumer working
Consumer done
Producer done

The Queue class also allows for easy termination of worker threads by calling the shutdown method (a feature added in Python version 3.13). After the shutdown signal is received, any call to put on the queue will raise an exception, but the queue will permit get calls to drain the queue and complete pending work. Once the queue is fully empty, a ShutDown exception will be raised by get in the worker thread, giving it a chance to clean up and exit (see Item 80: “Take Advantage of Each Block in try/except/else/finally” for background). For example, here I show how a thread continues to process work after shutdown is called:

from queue import ShutDown

my_queue2 = Queue()

def consumer():
    while True:
        try:
            item = my_queue2.get()
        except ShutDown:
            print("Terminating!")
            return
        else:
            print("Got item", item)
            my_queue2.task_done()

thread = Thread(target=consumer)
my_queue2.put(1)
my_queue2.put(2)
my_queue2.put(3)
my_queue2.shutdown()

thread.start()

my_queue2.join()
thread.join()
print("Done")

>>>
Got item 1
Got item 2
Got item 3
Terminating!
Done
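
shutdown also accepts an immediate=True argument, which additionally marks all outstanding items as done, causing pending and future get calls to raise ShutDown right away instead of draining the queue first:

my_queue3 = Queue()
my_queue3.put(1)
my_queue3.put(2)
my_queue3.shutdown(immediate=True)  # Items 1 and 2 won't be processed

try:
    my_queue3.get()
except ShutDown:
    print("Terminated without draining")

>>>
Terminated without draining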

I can bring all of these behaviors together into a new worker thread class that processes input items one at a time, puts the results on an output queue, marks the input items as done, and terminates when the ShutDown exception is raised:

class StoppableWorker(Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.func = func
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            try:
                item = self.in_queue.get()
            except ShutDown:
                return
            else:
                result = self.func(item)
                self.out_queue.put(result)
                self.in_queue.task_done()

Now I can create a set of pipeline threads and queues using the new worker class; the resize and upload phases have a maximum number of items specified to prevent the program from running out of memory:

download_queue = Queue()
resize_queue = Queue(100)
upload_queue = Queue(100)
done_queue = Queue()

threads = [
    StoppableWorker(download, download_queue, resize_queue),
    StoppableWorker(resize, resize_queue, upload_queue),
    StoppableWorker(upload, upload_queue, done_queue),
]

for thread in threads:
    thread.start()

To start processing, I inject all of the input work into the beginning of the pipeline:

for _ in range(1000):
    download_queue.put(object())

Then I wait for the work in each phase to finish. I’m careful to call shutdown for each queue in the pipeline only after all work for that phase has been added to the corresponding queue. I use the join method to ensure that I wait for all of the work in the queue to be completed before sending the termination signal for the next phase:

download_queue.shutdown()
download_queue.join()

resize_queue.shutdown()
resize_queue.join()

upload_queue.shutdown()
upload_queue.join()

Once the prior phases are complete, I send the shutdown signal to the final queue, receive each of the output items in the main thread, and wait for the worker threads to terminate:

done_queue.shutdown()

counter = 0

while True:
    try:
        item = done_queue.get()
    except ShutDown:
        break
    else:
        # Process the item
        ...
        done_queue.task_done()
        counter += 1

done_queue.join()

for thread in threads:
    thread.join()

print(counter, "items finished")

>>>
1000 items finished

This approach can be extended to use multiple worker threads per phase, which can increase I/O parallelism and speed up this type of program significantly. To do this, first I define helper functions for starting replicas of worker threads and draining the final queue:

def start_threads(count, *args):
    threads = [StoppableWorker(*args) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads

def drain_queue(input_queue):
    input_queue.shutdown()

    counter = 0

    while True:
        try:
            item = input_queue.get()
        except ShutDown:
            break
        else:
            input_queue.task_done()
            counter += 1

    input_queue.join()

    return counter

Then I connect the queues together as before and start the workers:

download_queue = Queue()
resize_queue = Queue(100)
upload_queue = Queue(100)
done_queue = Queue()

threads = (
    start_threads(3, download, download_queue, resize_queue)
    + start_threads(4, resize, resize_queue, upload_queue)
    + start_threads(5, upload, upload_queue, done_queue)
)

Following the same order of calls to put, shutdown, get, and join as in the example above, I can drive the work through the pipeline—but this time using multiple workers for each intermediate phase:

for _ in range(2000):
    download_queue.put(object())

download_queue.shutdown()
download_queue.join()

resize_queue.shutdown()
resize_queue.join()

upload_queue.shutdown()
upload_queue.join()

counter = drain_queue(done_queue)

for thread in threads:
    thread.join()

print(counter, "items finished")

>>>
2000 items finished

Although Queue works well with a linear pipeline, there are other tools that you should consider using in different situations (see Item 75: “Achieve Highly Concurrent I/O with Coroutines” for an example).

Things to Remember

  • Pipelines allow you to organize sequences of work—especially I/O-bound programs—that run concurrently using multiple Python threads.

  • Be aware of the many problems in building concurrent pipelines: busy waiting, telling workers to stop, knowing when work is done, and memory explosion.

  • The Queue class has all the facilities you need to build robust pipelines: blocking operations, buffer sizes, joining, and shutdown.

Item 71: Know How to Recognize When Concurrency Is Necessary

Inevitably, as the scope of a program grows, so does its complexity. Dealing with expanding requirements in a way that maintains clarity, testability, and efficiency is one of the most difficult parts of programming. Perhaps the hardest type of change to handle is moving from a single-threaded program to a program that needs multiple concurrent lines of execution.

Let me demonstrate how you might encounter this problem with an example. Say that I want to implement Conway’s Game of Life, a classic illustration of cellular automata. The rules of the game are simple: There’s a two-dimensional grid of an arbitrary size, and each cell in the grid can either be alive or empty:

ALIVE = "*"
EMPTY = "-"

The game progresses one tick of the clock at a time. Every tick, each cell counts how many of its neighboring eight cells are still alive. Based on its neighbor count, a cell decides if it will keep living, die, or regenerate. (I’ll explain the specific rules further below.) Here’s an example of a 5 × 5 Game of Life grid after four generations with time going to the right:

  0   |   1   |   2   |   3   |   4
----- | ----- | ----- | ----- | -----
-*--- | --*-- | --**- | --*-- | -----
--**- | --**- | -*--- | -*--- | -**--
---*- | --**- | --**- | --*-- | -----
----- | ----- | ----- | ----- | -----

I can represent the state of each cell with a simple container class. The class must have methods that allow me to get and set the value of any coordinate. Coordinates that are out of bounds should wrap around, making the grid act like an infinite looping space:

class Grid:
    def __init__(self, height, width):
        self.height = height
        self.width = width
        self.rows = []
        for _ in range(self.height):
            self.rows.append([EMPTY] * self.width)

    def get(self, y, x):
        return self.rows[y % self.height][x % self.width]

    def set(self, y, x, state):
        self.rows[y % self.height][x % self.width] = state

    def __str__(self):
        ...

To see this class in action, I can create a Grid instance and set its initial state to a classic shape called a glider:

grid = Grid(5, 9)
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)
print(grid)

>>>
---*-----
----*----
--***----
---------
---------

Now I need a way to retrieve the status of neighboring cells. I can do this with a helper function that queries the grid for information about its surrounding environment and returns the count of living neighbors. I use a simple function for the get_cell parameter instead of passing in a whole Grid instance in order to reduce coupling (see Item 48: “Accept Functions Instead of Classes for Simple Interfaces” for more about this approach):

def count_neighbors(y, x, get_cell):
    n_ = get_cell(y - 1, x + 0)  # North
    ne = get_cell(y - 1, x + 1)  # Northeast
    e_ = get_cell(y + 0, x + 1)  # East
    se = get_cell(y + 1, x + 1)  # Southeast
    s_ = get_cell(y + 1, x + 0)  # South
    sw = get_cell(y + 1, x - 1)  # Southwest
    w_ = get_cell(y + 0, x - 1)  # West
    nw = get_cell(y - 1, x - 1)  # Northwest
    neighbor_states = [n_, ne, e_, se, s_, sw, w_, nw]
    count = 0
    for state in neighbor_states:
        if state == ALIVE:
            count += 1
    return count

Now I define the simple logic for Conway’s Game of Life based on the game’s three rules: Die if a cell has fewer than two neighbors, die if a cell has more than three neighbors, or become alive if an empty cell has exactly three neighbors:

def game_logic(state, neighbors):
    if state == ALIVE:
        if neighbors < 2:
            return EMPTY     # Die: Too few
        elif neighbors > 3:
            return EMPTY     # Die: Too many
    else:
        if neighbors == 3:
            return ALIVE     # Regenerate
    return state

I can connect count_neighbors and game_logic together in another function that transitions the state of a cell. This function will be called each generation to figure out a cell’s current state, inspect the neighboring cells around it, determine what its next state should be, and update the resulting grid accordingly. Again, I use a function interface for set_cell instead of passing in the Grid instance to make this code more decoupled:

def step_cell(y, x, get_cell, set_cell):
    state = get_cell(y, x)
    neighbors = count_neighbors(y, x, get_cell)
    next_state = game_logic(state, neighbors)
    set_cell(y, x, next_state)

Finally, I can define a function that progresses the whole grid of cells forward by a single step and then returns a new grid containing the state for the next generation. The important detail here is that I need all dependent functions to call the get method on the previous generation’s Grid instance and to call the set method on the next generation’s Grid instance. This is how I ensure that all of the cells move in lockstep, which is an essential part of how the game works. This is easy to achieve because I used function interfaces for get_cell and set_cell instead of passing Grid instances:

def simulate(grid):
    next_grid = Grid(grid.height, grid.width)
    for y in range(grid.height):
        for x in range(grid.width):
            step_cell(y, x, grid.get, next_grid.set)
    return next_grid

Now I can progress the grid forward one generation at a time. You can see how the glider moves down and to the right on the grid based on the simple rules from the game_logic function:

class ColumnPrinter:
    ...

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = simulate(grid)

print(columns)

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------
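
The ColumnPrinter class is elided above; a minimal sketch that would produce this side-by-side layout, assuming every appended grid renders as equal-width lines, might look like this:

class ColumnPrinter:
    def __init__(self):
        self.columns = []

    def append(self, data):
        self.columns.append(data)

    def __str__(self):
        split = [data.splitlines() for data in self.columns]
        widths = [len(lines[0]) for lines in split]
        # Header row centers each generation number over its column
        rows = [" | ".join(
            str(i).center(width) for i, width in enumerate(widths))]
        # Body rows join the corresponding grid lines side by side
        height = max(len(lines) for lines in split)
        for j in range(height):
            rows.append(" | ".join(
                lines[j] if j < len(lines) else " " * widths[i]
                for i, lines in enumerate(split)))
        return "\n".join(rows)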

This works great for a program that can run in one thread on a single machine. But imagine that the program’s requirements have changed—as I alluded to above—and now I need to do some I/O (e.g., with a network socket) from within the game_logic function. For example, this might be required if I’m trying to build a massively multiplayer online game where the state transitions are determined by a combination of the grid state and communication with other players over the Internet.

How can I extend this implementation to support such functionality? The simplest thing to do is to add blocking I/O directly into the game_logic function:

def game_logic(state, neighbors):
    ...
    # Do some blocking input/output in here:
    data = my_socket.recv(100)
    ...

The problem with this approach is that it’s going to slow down the whole program. If the latency of the I/O required is 100 milliseconds (which is a reasonably good cross-continent, round-trip latency on the Internet), and there are 45 cells in the grid, then each generation will take a minimum of 4.5 seconds to evaluate because each cell is processed serially in the simulate function. That’s far too slow and will make the game unplayable. It also scales poorly: If I later wanted to expand the grid to 10,000 cells, I would need over 15 minutes to evaluate each generation.

The solution is to do the I/O in parallel so each generation takes roughly 100 milliseconds, regardless of how big the grid is. The process of spawning a concurrent line of execution for each unit of work—a cell in this case—is called fan-out. Waiting for all of those concurrent units of work to finish before moving on to the next phase in a coordinated process—a generation in this case—is called fan-in.

Python provides many built-in tools for achieving fan-out and fan-in with various trade-offs. You should understand the pros and cons of each approach and choose the best tool for the job, depending on the situation. See the following items for details: Item 72: “Avoid Creating New Thread Instances for On-Demand Fan-out,” Item 73: “Understand How Using Queue for Concurrency Requires Refactoring,” Item 74: “Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency,” and Item 75: “Achieve Highly Concurrent I/O with Coroutines.”

Things to Remember

  • As a program’s scope and complexity increase, it often starts requiring support for multiple concurrent lines of execution.

  • The most common types of concurrency coordination are fan-out (generating new units of concurrency) and fan-in (waiting for existing units of concurrency to complete).

  • Python has many different ways of achieving fan-out and fan-in.

Item 72: Avoid Creating New Thread Instances for On-Demand Fan-out

Threads are the natural first tool to reach for in order to do parallel I/O in Python (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”). However, they have significant downsides when you try to use them for fanning out to many concurrent lines of execution.

To demonstrate this, I’ll continue with the Game of Life example from before (see Item 71: “Know How to Recognize When Concurrency Is Necessary” for background and the implementations of various functions and classes below). I’ll use threads to solve the latency problem caused by doing I/O in the game_logic function. To begin, threads will require coordination using locks to ensure that assumptions within data structures are maintained properly (see Item 69: “Use Lock to Prevent Data Races in Threads” for details). I can create a subclass of the Grid class that adds locking behavior so an instance can be used by multiple threads simultaneously:

from threading import Lock

ALIVE = "*"
EMPTY = "-"

class Grid:
    ...

class LockingGrid(Grid):
    def __init__(self, height, width):
        super().__init__(height, width)
        self.lock = Lock()

    def __str__(self):
        with self.lock:
            return super().__str__()

    def get(self, y, x):
        with self.lock:
            return super().get(y, x)

    def set(self, y, x, state):
        with self.lock:
            return super().set(y, x, state)

Then, I can reimplement the simulate function to fan out by creating a thread for each call to step_cell. The threads will run in parallel and won’t have to wait on each other’s I/O. I can then fan in by waiting for all of the threads to complete before moving on to the next generation:

from threading import Thread

def count_neighbors(y, x, get_cell):
    ...

def game_logic(state, neighbors):
    ...
    # Do some blocking input/output in here:
    data = my_socket.recv(100)
    ...

def step_cell(y, x, get_cell, set_cell):
    state = get_cell(y, x)
    neighbors = count_neighbors(y, x, get_cell)
    next_state = game_logic(state, neighbors)
    set_cell(y, x, next_state)

def simulate_threaded(grid):
    next_grid = LockingGrid(grid.height, grid.width)

    threads = []
    for y in range(grid.height):
        for x in range(grid.width):
            args = (y, x, grid.get, next_grid.set)
            thread = Thread(target=step_cell, args=args)
            thread.start()  # Fan-out
            threads.append(thread)

    for thread in threads:
        thread.join()  # Fan-in

    return next_grid

I can run this code using the same implementation of step_cell and the same driving code as before with only two lines changed to use the LockingGrid and simulate_threaded implementations:

class ColumnPrinter:
    ...

grid = LockingGrid(5, 9)  # Changed
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = simulate_threaded(grid)  # Changed

print(columns)

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------

This works as expected, and the I/O is now parallelized between the threads. However, this code has three big problems:

  • The Thread instances require special tools (i.e., Lock objects) to coordinate with each other safely. This makes the code that uses threads harder to reason about in comparison to the procedural, single-threaded code from before. This complexity makes threaded code more difficult to extend and maintain over time.

  • Threads require a lot of memory—about 8 MB per executing thread. On many computers, that amount of memory doesn’t matter for the 45 threads I’d need in this example. But if the game grid had to grow to 10,000 cells, I would need to create that many threads, which is so much memory (80 GB) that the program won’t fit on my machine. Although some operating systems play tricks to delay a thread’s full memory allocation until its execution stack is sufficiently deep, there’s still a risk that running one thread per concurrent activity won’t work reliably.

  • Starting a thread is costly, and threads have a negative performance impact when they run due to the overhead of context switching between them. In the code above, all of the threads are started and stopped each generation of the game, which adds so much overhead that it will increase latency beyond the expected I/O time of 100 milliseconds.

This code would also be very difficult to debug if something went wrong. For example, imagine that the game_logic function raises an exception, which is highly likely due to the generally flaky nature of I/O:

def game_logic(state, neighbors):
    ...
    raise OSError("Problem with I/O")
    ...

I can test what this would do by running a Thread instance pointed at this function and redirecting the sys.stderr output from the program to an in-memory StringIO buffer:

import contextlib
import io

fake_stderr = io.StringIO()
with contextlib.redirect_stderr(fake_stderr):
    thread = Thread(target=game_logic, args=(ALIVE, 3))
    thread.start()
    thread.join()

print(fake_stderr.getvalue())

>>>
Exception in thread Thread-226 (game_logic):
Traceback (most recent call last):
  File "threading.py", line 1039, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "threading.py", line 990, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "example.py", line 205, in game_logic
    raise OSError('Problem with I/O')
OSError: Problem with I/O

An OSError exception is expected, but somehow the code that created the thread and called join on it is unaffected. How can this be? The reason is that the Thread class will independently catch any exceptions that are raised by the target function and then write their traceback to sys.stderr. Such exceptions are never re-raised to the caller that started the thread in the first place.
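
There are workarounds. For example, here’s a sketch of a Thread subclass (a hypothetical PropagatingThread helper, not something the threading module provides) that captures the exception and re-raises it when join is called; needing to build this yourself only underscores the problem:

class PropagatingThread(Thread):
    def run(self):
        self.exc = None
        try:
            super().run()  # Calls the target function as usual
        except BaseException as e:
            self.exc = e   # Save the exception for the joining thread

    def join(self, timeout=None):
        super().join(timeout)
        if self.exc:
            raise self.exc  # Re-raise in the caller's thread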

Given all these issues, it’s clear that threads are not the solution if you need to constantly create and finish new concurrent functions. Python provides other solutions that are a better fit (see Item 73: “Understand How Using Queue for Concurrency Requires Refactoring,” Item 74: “Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency,” and Item 75: “Achieve Highly Concurrent I/O with Coroutines”).

Things to Remember

  • Threads have many downsides: They’re costly to start and run if you need a lot of them, they each require a significant amount of memory, and they require special tools like Lock instances for coordination.

  • A Thread doesn’t have a built-in way to raise exceptions back in the code that started it or in a separate thread that is waiting for it to finish, which greatly hampers debugging.

Item 73: Understand How Using Queue for Concurrency Requires Refactoring

In the previous item (see Item 72: “Avoid Creating New Thread Instances for On-Demand Fan-out”) I covered the downsides of using Thread to solve the parallel I/O problem in the Game of Life example from earlier (see Item 71: “Know How to Recognize When Concurrency Is Necessary” for background and the implementations of various functions and classes below).

The next approach to try is implementing a threaded pipeline using the Queue class from the queue built-in module (see Item 70: “Use Queue to Coordinate Work Between Threads” for background; I rely on the implementation of StoppableWorker from that item in the example code below).

Here’s the general approach: Instead of creating one thread per cell per generation of the Game of Life, I can create a fixed number of worker threads upfront and have them do parallelized I/O as needed. This will keep my resource usage under control and eliminate the overhead of frequently starting new threads.

To do this, I need two Queue instances to use for communicating to and from the worker threads that execute the game_logic function:

from queue import Queue

in_queue = Queue()
out_queue = Queue()

I can start multiple threads that will consume items from in_queue, process them by calling game_logic, and put the results on out_queue. These threads will run concurrently, allowing for parallel I/O and reduced latency for each generation:

from threading import Thread

class StoppableWorker(Thread):
    ...

def game_logic(state, neighbors):
    ...
    # Do some blocking input/output in here:
    data = my_socket.recv(100)
    ...

def game_logic_thread(item):
    y, x, state, neighbors = item
    try:
        next_state = game_logic(state, neighbors)
    except Exception as e:
        next_state = e
    return (y, x, next_state)

# Start the threads upfront
threads = []
for _ in range(5):
    thread = StoppableWorker(
        game_logic_thread, in_queue, out_queue)
    thread.start()
    threads.append(thread)

Now I can redefine the simulate function to interact with these queues to request state transition decisions and receive corresponding responses. Adding items to in_queue is what causes fan-out, and consuming items from the out_queue until it’s empty is what causes fan-in:

ALIVE = "*"
EMPTY = "-"

class SimulationError(Exception):
    pass

class Grid:
    ...

def count_neighbors(y, x, get_cell):
    ...

def simulate_pipeline(grid, in_queue, out_queue):
    for y in range(grid.height):
        for x in range(grid.width):
            state = grid.get(y, x)
            neighbors = count_neighbors(y, x, grid.get)
            in_queue.put((y, x, state, neighbors))  # Fan-out

    in_queue.join()
    item_count = out_queue.qsize()

    next_grid = Grid(grid.height, grid.width)
    for _ in range(item_count):
        item = out_queue.get()                      # Fan-in
        y, x, next_state = item
        if isinstance(next_state, Exception):
            raise SimulationError(y, x) from next_state
        next_grid.set(y, x, next_state)

    return next_grid

The calls to Grid.get and Grid.set both happen within this new simulate_pipeline function, which means I can use the single-threaded implementation of Grid instead of the implementation that requires Lock instances for synchronization (see Item 69: “Use Lock to Prevent Data Races in Threads” for background).

This code is also easier to debug than the Thread approach used in the previous item. If an exception occurs while doing I/O in the game_logic function, it will be caught by the surrounding game_logic_thread function, propagated to the out_queue, and then re-raised in the main thread:

def game_logic(state, neighbors):
    ...
    raise OSError("Problem with I/O in game_logic")
    ...

simulate_pipeline(Grid(1, 1), in_queue, out_queue)

>>>
Traceback ...
OSError: Problem with I/O in game_logic

The above exception was the direct cause of the following
exception:

Traceback ...
SimulationError: (0, 0)

I can drive this multithreaded pipeline for repeated generations by calling simulate_pipeline in a loop:

class ColumnPrinter:
    ...

grid = Grid(5, 9)
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = simulate_pipeline(grid, in_queue, out_queue)

print(columns)

in_queue.shutdown()
in_queue.join()

for thread in threads:
    thread.join()

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------

The results are the same as before. Although I’ve addressed the memory explosion problem, startup costs, and debugging issues related to using threads on their own, many issues remain:

  • The simulate_pipeline function is even harder to follow than the simulate_threaded approach from the previous item.

  • Extra support functionality is required (e.g., StoppableWorker) to make the code easier to read, at the expense of increased complexity.

  • I have to specify the amount of potential parallelism—the number of threads running game_logic_thread—upfront based on my expectations of the workload instead of having the system automatically scale up parallelism as needed.

  • In order to enable debugging, I have to manually catch exceptions in worker threads, propagate them on a queue, and then re-raise them in the main thread.

However, the biggest problem with this code is apparent if the requirements change again. Imagine that later I needed to do I/O within the count_neighbors function in addition to the I/O that was needed within game_logic:

def count_neighbors(y, x, get_cell):
    ...
    # Do some blocking input/output in here:
    data = my_socket.recv(100)
    ...

In order to make this parallelizable, I need to add another stage to the pipeline that runs count_neighbors in a thread. I need to make sure that exceptions propagate correctly between the worker threads and the main thread. And I need to use a lock for the Grid class in order to ensure safe synchronization between the worker threads (see Item 72: “Avoid Creating New Thread Instances for On-Demand Fan-out” for background and for the implementation of LockingGrid):

def count_neighbors_thread(item):
    y, x, state, get_cell = item
    try:
        neighbors = count_neighbors(y, x, get_cell)
    except Exception as e:
        neighbors = e
    return (y, x, state, neighbors)

def game_logic_thread(item):
    y, x, state, neighbors = item
    if isinstance(neighbors, Exception):
        next_state = neighbors
    else:
        try:
            next_state = game_logic(state, neighbors)
        except Exception as e:
            next_state = e
    return (y, x, next_state)

class LockingGrid(Grid):
    ...

I have to create another set of Queue instances for the count_neighbors_thread workers and the corresponding Thread instances:

in_queue = Queue()
logic_queue = Queue()
out_queue = Queue()

threads = []

for _ in range(5):
    thread = StoppableWorker(
        count_neighbors_thread, in_queue, logic_queue
    )
    thread.start()
    threads.append(thread)

for _ in range(5):
    thread = StoppableWorker(
        game_logic_thread, logic_queue, out_queue
    )
    thread.start()
    threads.append(thread)

Finally, I need to update simulate_pipeline to coordinate the multiple phases in the pipeline and ensure that work fans out and back in correctly:

def simulate_phased_pipeline(
    grid, in_queue, logic_queue, out_queue):
    for y in range(grid.height):
        for x in range(grid.width):
            state = grid.get(y, x)
            item = (y, x, state, grid.get)
            in_queue.put(item)            # Fan-out

    in_queue.join()
    logic_queue.join()                    # Pipeline sequencing
    item_count = out_queue.qsize()

    next_grid = LockingGrid(grid.height, grid.width)
    for _ in range(item_count):
        y, x, next_state = out_queue.get()  # Fan-in
        if isinstance(next_state, Exception):
            raise SimulationError(y, x) from next_state
        next_grid.set(y, x, next_state)

    return next_grid

With these updated implementations, now I can run the multiphase pipeline end-to-end:

grid = LockingGrid(5, 9)
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = simulate_phased_pipeline(
        grid, in_queue, logic_queue, out_queue
    )

print(columns)

in_queue.shutdown()
in_queue.join()

logic_queue.shutdown()
logic_queue.join()

for thread in threads:
    thread.join()

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------

Again, this works as expected, but it required a lot of changes and boilerplate. The point here is that Queue does make it possible to solve fan-out and fan-in problems, but the complexity is very high. Although using Queue is a better approach than using Thread instances on their own, it’s still not nearly as good as using some of the other tools provided by Python (see Item 74: “Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency” and Item 75: “Achieve Highly Concurrent I/O with Coroutines”).

Things to Remember

  • Using Queue instances with a fixed number of worker threads improves the scalability of fan-out and fan-in using threads.

  • It takes a significant amount of work to refactor existing code to use Queue, especially when multiple stages of a pipeline are required.

  • Using Queue with a fixed number of worker threads fundamentally limits the total amount of I/O parallelism a program can leverage compared to alternative approaches provided by other built-in Python features and modules that are more dynamic.

Item 74: Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency

Python includes the concurrent.futures built-in module, which provides the ThreadPoolExecutor class. It combines the best of the Thread (see Item 72: “Avoid Creating New Thread Instances for On-Demand Fan-out”) and Queue (see Item 73: “Understand How Using Queue for Concurrency Requires Refactoring”) approaches to solving the parallel I/O problem from the Game of Life example (see Item 71: “Know How to Recognize When Concurrency Is Necessary” for background and the implementations of various functions and classes below):

ALIVE = "*"
EMPTY = "-"

class Grid:
    ...

class LockingGrid(Grid):
    ...

def count_neighbors(y, x, get_cell):
    ...

def game_logic(state, neighbors):
    ...
    # Do some blocking input/output in here:
    data = my_socket.recv(100)
    ...

def step_cell(y, x, get_cell, set_cell):
    state = get_cell(y, x)
    neighbors = count_neighbors(y, x, get_cell)
    next_state = game_logic(state, neighbors)
    set_cell(y, x, next_state)

Instead of starting a new Thread instance for each Grid square, I can fan out by submitting a function to an executor that will be run in a separate thread. Later, I can wait for the result of all tasks in order to fan in:

from concurrent.futures import ThreadPoolExecutor

def simulate_pool(pool, grid):
    next_grid = LockingGrid(grid.height, grid.width)

    futures = []
    for y in range(grid.height):
        for x in range(grid.width):
            args = (y, x, grid.get, next_grid.set)
            future = pool.submit(step_cell, *args)  # Fan-out
            futures.append(future)

    for future in futures:
        future.result()                             # Fan-in

    return next_grid

The threads used for the executor can be allocated in advance, which means I don’t have to pay the startup cost on each execution of simulate_pool. I can also specify the maximum number of threads to use for the pool—using the max_workers parameter—to prevent the memory blow-up issues associated with the naive Thread solution to the parallel I/O problem (i.e., one thread per cell):

class ColumnPrinter:
    ...

grid = LockingGrid(5, 9)
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
with ThreadPoolExecutor(max_workers=10) as pool:
    for i in range(5):
        columns.append(str(grid))
        grid = simulate_pool(pool, grid)

print(columns)

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------

The best part about the ThreadPoolExecutor class is that it automatically propagates exceptions back to the caller when the result method is called on the Future instance returned by the submit method:

def game_logic(state, neighbors):
    ...
    raise OSError("Problem with I/O")
    ...

with ThreadPoolExecutor(max_workers=10) as pool:
    task = pool.submit(game_logic, ALIVE, 3)
    task.result()

>>>
Traceback ...
OSError: Problem with I/O

If I need to provide I/O parallelism for the count_neighbors function in addition to game_logic, no modifications to the program are required because ThreadPoolExecutor already runs these functions concurrently as part of step_cell. It’s even possible to achieve CPU parallelism by using the same interface if necessary (see Item 79: “Consider concurrent.futures for True Parallelism”).
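
For example, here’s a brief sketch of what that might look like with the ProcessPoolExecutor class from the same module (the slow_work function is hypothetical and not part of the Game of Life example):

from concurrent.futures import ProcessPoolExecutor

def slow_work(n):  # Hypothetical CPU-bound function
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # Required when using process pools
    with ProcessPoolExecutor(max_workers=4) as pool:
        future = pool.submit(slow_work, 10_000_000)  # Same interface
        print(future.result())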

However, the big problem that remains is the limited amount of I/O parallelism that ThreadPoolExecutor provides. Even if I use a max_workers parameter of 100, this solution still won’t scale if I need 10,000+ cells in the grid that require simultaneous I/O. ThreadPoolExecutor is a good choice for situations where there is no asynchronous solution (e.g., blocking file system operations), but there are better ways to maximize I/O parallelism in many cases (see Item 75: “Achieve Highly Concurrent I/O with Coroutines”).

Things to Remember

  • ThreadPoolExecutor enables simple I/O parallelism with limited refactoring required.

  • You can use ThreadPoolExecutor to avoid the cost of thread startup each time fan-out concurrency is required.

  • ThreadPoolExecutor makes threaded code easier to debug by automatically propagating exceptions across thread boundaries.

  • Although ThreadPoolExecutor eliminates the potential memory blow-up issues of using threads directly, it also limits I/O parallelism by requiring max_workers to be specified upfront.

Item 75: Achieve Highly Concurrent I/O with Coroutines

The previous items have tried to solve the parallel I/O problem for the Game of Life example with varying degrees of success (see Item 71: “Know How to Recognize When Concurrency Is Necessary” for background and the implementations of various functions and classes below). All of the other approaches fall short in their ability to handle thousands of simultaneously concurrent functions (see Item 72: “Avoid Creating New Thread Instances for On-Demand Fan-out,” Item 73: “Understand How Using Queue for Concurrency Requires Refactoring,” and Item 74: “Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency”).

Python addresses the need for highly concurrent I/O with coroutines. Coroutines let you have a very large number of seemingly simultaneously executing functions in your Python programs. They’re implemented using the async and await keywords along with the same infrastructure that powers generators (see Item 43: “Consider Generators Instead of Returning Lists,” Item 46: “Pass Iterators into Generators as Arguments Instead of Calling the send Method,” and Item 47: “Manage Iterative State Transitions with a Class Instead of the Generator throw Method”).

The cost of starting a coroutine is a function call. Once a coroutine is active, it uses less than 1 KB of memory until it’s exhausted. Like threads, coroutines are independent functions that can consume inputs from their environment and produce resulting outputs. The difference is that coroutines pause at each await expression and resume executing an async function after the pending awaitable is resolved (similarly to how yield behaves in generators).

When many separate async functions are advanced in lockstep, they all seem to be running simultaneously, mimicking the concurrent behavior of Python threads. However, coroutines do this without the memory overhead, startup and context switching costs, and complex locking and synchronization code required for threads. The magical mechanism powering coroutines is the event loop, which can do highly concurrent I/O efficiently while rapidly interleaving execution between appropriately written functions.
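
As a quick, standalone illustration of this scale (separate from the Game of Life code), ten thousand coroutines that each await a 100 millisecond sleep can all finish in roughly 100 milliseconds of wall-clock time because the event loop overlaps their waiting:

import asyncio

async def sleep_and_return(i):
    await asyncio.sleep(0.1)  # Pauses here; others make progress
    return i

async def demo():
    tasks = [sleep_and_return(i) for i in range(10_000)]
    results = await asyncio.gather(*tasks)
    print(len(results))

asyncio.run(demo())

>>>
10000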

I can use coroutines to implement the Game of Life. My goal is to allow for I/O to occur within the game_logic function while overcoming the problems from the Thread, Queue, and ThreadPoolExecutor approaches in the previous items. To do this, first I indicate that the game_logic function is a coroutine by defining it using async def instead of def. This will allow me to use the await syntax for I/O, such as an asynchronous read from a socket:

ALIVE = "*"
EMPTY = "-"

class Grid:
    ...

def count_neighbors(y, x, get_cell):
    ...

async def game_logic(state, neighbors):
    ...
    # Do some input/output in here:
    data = await my_socket.read(50)
    ...

Similarly, I can turn step_cell into a coroutine by adding async to its definition and using await for the call to the game_logic function:

async def step_cell(y, x, get_cell, set_cell):
    state = get_cell(y, x)
    neighbors = count_neighbors(y, x, get_cell)
    next_state = await game_logic(state, neighbors)
    set_cell(y, x, next_state)

The simulate function also needs to become a coroutine:

import asyncio

async def simulate(grid):
    next_grid = Grid(grid.height, grid.width)

    tasks = []
    for y in range(grid.height):
        for x in range(grid.width):

            task = step_cell(
                y, x, grid.get, next_grid.set)        # Fan-out
            tasks.append(task)

    await asyncio.gather(*tasks)                      # Fan-in

    return next_grid

The coroutine version of the simulate function requires some explanation:

  • Calling step_cell doesn’t immediately run that function. Instead, it returns a coroutine instance that can be used with an await expression at a later time. This is similar to how a generator function that uses yield returns a generator instance when it’s called instead of executing immediately. Deferring execution like this is the mechanism that causes fan-out (see the short aside after this list).

  • The gather function from the asyncio built-in library is what causes fan-in. The await expression on gather instructs the event loop to run the step_cell coroutines concurrently and resume execution of the simulate coroutine when all the coroutines have been completed (see Item 77: “Mix Threads and Coroutines to Ease the Transition to asyncio” for another approach using asyncio.TaskGroup).

  • No locks are required for the Grid instance since all execution occurs within a single thread. The I/O becomes parallelized as part of the event loop that’s provided by asyncio.
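
Here’s a short aside demonstrating the deferred execution behavior from the first point above (the add coroutine is hypothetical, not part of the example):

async def add(a, b):
    return a + b

coro = add(1, 2)           # Nothing has executed yet
print(type(coro))
print(asyncio.run(coro))   # Running the event loop executes it

>>>
<class 'coroutine'>
3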

Finally, I can drive this code with a one-line change to the original example. This relies on the asyncio.run function to execute the simulate coroutine in an event loop and carry out its dependent I/O:

class ColumnPrinter:
    ...

grid = Grid(5, 9)
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = asyncio.run(simulate(grid))  # Run the event loop

print(columns)

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------

The result is the same as before. All the overhead associated with threads has been eliminated. Whereas the Queue and ThreadPoolExecutor approaches are limited in their exception handling—merely re-raising exceptions across thread boundaries—with coroutines I can even use the interactive debugger to step through the exception handling code line by line (see Item 114: “Consider Interactive Debugging with pdb”).

Later, if my requirements change and I also need to do I/O from within count_neighbors, I can easily accomplish this by adding async and await keywords to the existing functions and call sites instead of having to restructure everything as I would have to do if I were using Thread or Queue instances (see Item 76: “Know How to Port Threaded I/O to asyncio” for another example):

async def count_neighbors(y, x, get_cell):
    ...

async def step_cell(y, x, get_cell, set_cell):
    state = get_cell(y, x)
    neighbors = await count_neighbors(y, x, get_cell)
    next_state = await game_logic(state, neighbors)
    set_cell(y, x, next_state)

grid = Grid(5, 9)
grid.set(0, 3, ALIVE)
grid.set(1, 4, ALIVE)
grid.set(2, 2, ALIVE)
grid.set(2, 3, ALIVE)
grid.set(2, 4, ALIVE)

columns = ColumnPrinter()
for i in range(5):
    columns.append(str(grid))
    grid = asyncio.run(simulate(grid))

print(columns)

>>>
    0     |     1     |     2     |     3     |     4
---*----- | --------- | --------- | --------- | ---------
----*---- | --*-*---- | ----*---- | ---*----- | ----*----
--***---- | ---**---- | --*-*---- | ----**--- | -----*---
--------- | ---*----- | ---**---- | ---**---- | ---***---
--------- | --------- | --------- | --------- | ---------

The beauty of coroutines is that they decouple your code’s instructions for the external environment (i.e., I/O) from the implementation that carries out your wishes (i.e., the event loop). They let you focus on the logic of what you’re actually trying to do instead of wasting time trying to figure out how you’re going to accomplish your goals concurrently.

Things to Remember

  • Functions that are defined using async def are called coroutines. A caller can receive the result of a dependent coroutine by using the await keyword.

  • Coroutines provide an efficient way to run tens of thousands of functions seemingly at the same time.

  • Coroutines can use fan-out and fan-in in order to parallelize I/O, while also overcoming all the problems associated with doing I/O in threads.

Item 76: Know How to Port Threaded I/O to asyncio

Once you understand the advantage of coroutines (see Item 75: “Achieve Highly Concurrent I/O with Coroutines”), it may seem daunting to port an existing codebase to use them. Luckily, Python’s support for asynchronous execution is well integrated into the language. This makes it straightforward to move code that does threaded, blocking I/O over to coroutines and asynchronous I/O.

For example, say that I have a TCP-based server for playing a “guess the number” game. The server takes lower and upper parameters that determine the range of numbers to consider. Then the server returns guesses for integer values in that range as they are requested by the client. Finally, the server collects reports from the client on whether each of those numbers was closer to (warmer) or further away from (colder) the client’s secret number.

The most common way to build this type of client/server system is by using blocking I/O and threads (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”). To do this, I need a helper class that can manage sending and receiving messages. For my purposes, each line sent or received represents a command to be processed:

class EOFError(Exception):
    pass

class Connection:
    def __init__(self, connection):
        self.connection = connection
        self.file = connection.makefile("rb")

    def send(self, command):
        line = command + "\n"
        data = line.encode()
        self.connection.send(data)

    def receive(self):
        line = self.file.readline()
        if not line:
            raise EOFError("Connection closed")
        return line[:-1].decode()

The server is implemented as a class that handles one connection at a time and maintains the game’s session state:

import random

WARMER = "Warmer"
COLDER = "Colder"
SAME = "Same"
UNSURE = "Unsure"
CORRECT = "Correct"

class UnknownCommandError(Exception):
    pass

class ServerSession(Connection):
    def __init__(self, *args):
        super().__init__(*args)
        self.clear_state()

It has one primary method that handles incoming commands from the client and dispatches them to methods as needed. Here I use a match statement to parse the semi-structured data (see Item 9: “Consider match for Destructuring in Flow Control; Avoid When if Statements Are Sufficient” for details):

    def loop(self):
        while command := self.receive():
            match command.split(" "):
                case "PARAMS", lower, upper:
                    self.set_params(lower, upper)
                case ["NUMBER"]:
                    self.send_number()
                case "REPORT", decision:
                    self.receive_report(decision)
                case ["CLEAR"]:
                    self.clear_state()
                case _:
                    raise UnknownCommandError(command)

The first command sets the lower and upper bounds for the numbers that the server is trying to guess:

    def set_params(self, lower, upper):
        self.clear_state()
        self.lower = int(lower)
        self.upper = int(upper)

The second command makes a new guess based on the previous state that’s stored in the ServerSession instance. Specifically, this code ensures that the server will never try to guess the same number more than once per parameter assignment:

    def next_guess(self):
        if self.secret is not None:
            return self.secret

        while True:
            guess = random.randint(self.lower, self.upper)
            if guess not in self.guesses:
                return guess

    def send_number(self):
        guess = self.next_guess()
        self.guesses.append(guess)
        self.send(format(guess))

The third command receives the decision from the client about whether the guess was warmer, colder, the same, or correct, and it updates the ServerSession state accordingly:

    def receive_report(self, decision):
        last = self.guesses[-1]
        if decision == CORRECT:
            self.secret = last

        print(f"Server: {last} is {decision}")

The last command clears the state to end a game whether it was successful or not:

    def clear_state(self):
        self.lower = None
        self.upper = None
        self.secret = None
        self.guesses = []

A game is initiated by using a with statement to ensure that state is correctly managed on the server side (see Item 82: “Consider contextlib and with Statements for Reusable try/finally Behavior” for background and Item 78: “Maximize Responsiveness of asyncio Event Loops with async-Friendly Worker Threads” for another example). This new_game function sends the first and last commands to the server and provides a context object to use for the duration of the game:

import contextlib

@contextlib.contextmanager
def new_game(connection, lower, upper, secret):
    print(
        f"Guess a number between {lower} and {upper}!"
        f" Shhhhh, it's {secret}."
    )
    connection.send(f"PARAMS {lower} {upper}")
    try:
        yield ClientSession(
            connection.send,
            connection.receive,
            secret,
        )
    finally:
        connection.send("CLEAR")

I use a stateful class, with helper methods for game actions and references to the connection’s send and receive functions, to manage each game session (see Item 48: “Accept Functions Instead of Classes for Simple Interfaces” for why I pass in send and receive explicitly):

import math

class ClientSession:
    def __init__(self, send, receive, secret):
        self.send = send
        self.receive = receive
        self.secret = secret
        self.last_distance = None

New guesses are requested from the server using a method that implements the second command:

    def request_number(self):
        self.send("NUMBER")
        data = self.receive()
        return int(data)

Whether each guess from the server was warmer or colder than the last is reported using the third command:

    def report_outcome(self, number):
        new_distance = math.fabs(number - self.secret)

        if new_distance == 0:
            decision = CORRECT
        elif self.last_distance is None:
            decision = UNSURE
        elif new_distance < self.last_distance:
            decision = WARMER
        elif new_distance > self.last_distance:
            decision = COLDER
        else:
            decision = SAME

        self.last_distance = new_distance

        self.send(f"REPORT {decision}")
        return decision

The game session object can be iterated over (see Item 21: “Be Defensive when Iterating over Arguments” for background) to make new, unique guesses repeatedly until the correct answer is found:

    def __iter__(self):
        while True:
            number = self.request_number()
            decision = self.report_outcome(number)
            yield number, decision
            if decision == CORRECT:
                return

I can run the server by having one thread listen on a socket and spawn additional threads to handle each new client connection:

import socket
from threading import Thread

def handle_connection(connection):
    with connection:
        session = ServerSession(connection)
        try:
            session.loop()
        except EOFError:
            pass

def run_server(address):
    with socket.socket() as listener:
        listener.bind(address)
        listener.listen()
        while True:
            connection, _ = listener.accept()
            thread = Thread(
                target=handle_connection,
                args=(connection,),
                daemon=True,
            )
            thread.start()

The client runs in the main thread and returns the results of the guessing game back to the caller. Perhaps a bit awkwardly, this code exercises a variety of Python language features (for loops, with statements, generators, comprehensions, the iterator protocol) so that below I can show what it takes to port each of these over to using coroutines:

def run_client(address):
    with socket.create_connection(address) as server_sock:
        server = Connection(server_sock)

        with new_game(server, 1, 5, 3) as session:
            results = [outcome for outcome in session]

        with new_game(server, 10, 15, 12) as session:
            for outcome in session:
                results.append(outcome)

        with new_game(server, 1, 3, 2) as session:
            it = iter(session)
            while True:
                try:
                    outcome = next(it)
                except StopIteration:
                    break
                else:
                    results.append(outcome)

    return results

Finally, I can glue all this together and confirm that it works as expected:

def main():
    address = ("127.0.0.1", 1234)
    server_thread = Thread(
        target=run_server, args=(address,), daemon=True
    )
    server_thread.start()

    results = run_client(address)
    for number, outcome in results:
        print(f"Client: {number} is {outcome}")

main()

>>>
Guess a number between 1 and 5! Shhhhh, it's 3.
Server: 4 is Unsure
Server: 1 is Colder
Server: 5 is Same
Server: 3 is Correct
Guess a number between 10 and 15! Shhhhh, it's 12.
Server: 11 is Unsure
Server: 10 is Colder
Server: 12 is Correct
Guess a number between 1 and 3! Shhhhh, it's 2.
Server: 3 is Unsure
Server: 2 is Correct
Client: 4 is Unsure
Client: 1 is Colder
Client: 5 is Same
Client: 3 is Correct
Client: 11 is Unsure
Client: 10 is Colder
Client: 12 is Correct
Client: 3 is Unsure
Client: 2 is Correct

How much effort is needed to convert this example to using async, await, and the asyncio built-in module?

First, I need to update my Connection class to provide coroutine methods for send and receive instead of blocking I/O methods. I’ve marked each line that’s changed with a # Changed comment to make it clear what the delta is between this new example and the code above:

class AsyncConnection:
    def __init__(self, reader, writer):      # Changed
        self.reader = reader                 # Changed
        self.writer = writer                 # Changed

    async def send(self, command):
        line = command + "\n"
        data = line.encode()
        self.writer.write(data)              # Changed
        await self.writer.drain()            # Changed

    async def receive(self):
        line = await self.reader.readline()  # Changed
        if not line:
            raise EOFError("Connection closed")
        return line[:-1].decode()

I can create another stateful class to represent the server session state for a single connection. The only changes here are the class’s name and inheriting from AsyncConnection instead of Connection:

class AsyncServerSession(AsyncConnection):  # Changed
    def __init__(self, *args):
        ...

The primary entry point for the server’s command-processing loop requires only minimal changes to become a coroutine:

    async def loop(self):                       # Changed
        while command := await self.receive():  # Changed
            match command.split(" "):
                case "PARAMS", lower, upper:
                    self.set_params(lower, upper)
                case ["NUMBER"]:
                    await self.send_number()    # Changed
                case "REPORT", decision:
                    self.receive_report(decision)
                case ["CLEAR"]:
                    self.clear_state()
                case _:
                    raise UnknownCommandError(command)

No changes are required for handling the first command:

    def set_params(self, lower, upper):
        ...

The only change required for the second command is allowing asynchronous I/O to be used when guesses are transmitted to the client:

    def next_guess(self):
        ...

    async def send_number(self):                    # Changed
        guess = self.next_guess()
        self.guesses.append(guess)
        await self.send(format(guess))              # Changed

No changes are required in the third and fourth commands:

    def receive_report(self, decision):
        ...

    def clear_state(self):
        ...

Initiating a new game on the client requires a few async and await keywords to be added for sending the first and last commands. It also needs to use the asynccontextmanager helper function from the contextlib built-in module:

@contextlib.asynccontextmanager                       # Changed
async def new_async_game(
    connection, lower, upper, secret):                # Changed
    print(
        f"Guess a number between {lower} and {upper}!"
        f" Shhhhh, it's {secret}."
    )
    await connection.send(f"PARAMS {lower} {upper}")   # Changed
    try:
        yield AsyncClientSession(
            connection.send,
            connection.receive,
            secret,
        )
    finally:
        await connection.send("CLEAR")                # Changed

The asynchronous version of the ClientSession class for representing game state has the same constructor as before:

class AsyncClientSession:
    def __init__(self, send, receive, secret):
        ...

The second command only requires the addition of async and await anywhere asynchronous behavior is required:

    async def request_number(self):
        await self.send("NUMBER")    # Changed
        data = await self.receive()  # Changed
        return int(data)

The third command only requires adding one async and one await keyword:

    async def report_outcome(self, number):    # Changed
        new_distance = math.fabs(number - self.secret)

        if new_distance == 0:
            decision = CORRECT
        elif self.last_distance is None:
            decision = UNSURE
        elif new_distance < self.last_distance:
            decision = WARMER
        elif new_distance > self.last_distance:
            decision = COLDER
        else:
            decision = SAME

        self.last_distance = new_distance

        await self.send(f"REPORT {decision}")         # Changed
        return decision

To enable asynchronous iteration, I need to implement __aiter__ instead of __iter__, with corresponding additions of async and await:

    async def __aiter__(self):                        # Changed
        while True:
            number = await self.request_number()      # Changed
            decision = await self.report_outcome(
                number)                               # Changed
            yield number, decision
            if decision == CORRECT:
                return

The code that runs the server needs to be completely reimplemented to use the asyncio built-in module and its start_server function:

import asyncio

async def handle_async_connection(reader, writer):
    session = AsyncServerSession(reader, writer)
    try:
        await session.loop()
    except EOFError:
        pass

async def run_async_server(address):
    server = await asyncio.start_server(
        handle_async_connection, *address
    )
    async with server:
        await server.serve_forever()

The run_client function that initiates the game requires changes on nearly every line. Any code that previously interacted with the blocking socket instances has to be replaced with asyncio versions of similar functionality (which are marked with # New below). All other lines in the function that require interaction with coroutines need to use async and await keywords; coroutine-specific functions like aiter and anext; or async-specific exceptions like StopAsyncIteration. If you forget to add one of these keywords in a necessary place, an exception will be raised at runtime.

async def run_async_client(address):
    streams = await asyncio.open_connection(*address)  # New
    client = AsyncConnection(*streams)                 # New

    async with new_async_game(client, 1, 5, 3) as session:
        results = [outcome async for outcome in session]

    async with new_async_game(client, 10, 15, 12) as session:
        async for outcome in session:
            results.append(outcome)

    async with new_async_game(client, 1, 3, 2) as session:
        it = aiter(session)
        while True:
            try:
                outcome = await anext(it)
            except StopAsyncIteration:
                break
            else:
                results.append(outcome)

    _, writer = streams                                # New
    writer.close()                                     # New
    await writer.wait_closed()                         # New

    return results

What’s most interesting about run_async_client is that I didn’t have to restructure any of the substantive parts of interacting with AsyncClientSession in order to port this function over to use coroutines. Each of the language features that I needed has a corresponding asynchronous version, which made the migration straightforward.

This transition won’t always be easy. For example, in the standard library, there are currently no asynchronous versions of the utility functions from itertools (see Item 24: “Consider itertools for Working with Iterators and Generators”). There’s also no asynchronous version of yield from (see Item 45: “Compose Multiple Generators with yield from”), which makes composing generators noisier. Many community libraries help fill these gaps (see Item 116: “Know Where to Find Community-Built Modules”), but it can still take extra work, depending on the complexity of your code.
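
For instance, delegating to an async generator requires an explicit loop where yield from would have sufficed for a synchronous generator (a minimal sketch with hypothetical names):

async def child():
    yield 1
    yield 2

async def parent():
    async for value in child():  # No async version of "yield from"
        yield value

async def consume():
    async for value in parent():
        print(value)

asyncio.run(consume())

>>>
1
2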

Finally, the glue needs to be updated to run this new asynchronous example end-to-end. I use the asyncio.create_task function to enqueue the server for execution on the event loop so that it runs in parallel with the client when the await expression is reached. This is another approach to causing fan-out with different behavior than what occurs with the asyncio.gather function:

async def main_async():
    address = ("127.0.0.1", 4321)

    server = run_async_server(address)
    asyncio.create_task(server)

    results = await run_async_client(address)
    for number, outcome in results:
        print(f"Client: {number} is {outcome}")

asyncio.run(main_async())

>>>
Guess a number between 1 and 5! Shhhhh, it's 3.
Server: 5 is Unsure
Server: 4 is Warmer
Server: 2 is Same
Server: 1 is Colder
Server: 3 is Correct
Guess a number between 10 and 15! Shhhhh, it's 12.
Server: 14 is Unsure
Server: 10 is Same
Server: 15 is Colder
Server: 12 is Correct
Guess a number between 1 and 3! Shhhhh, it's 2.
Server: 2 is Correct
Client: 5 is Unsure
Client: 4 is Warmer
Client: 2 is Same
Client: 1 is Colder
Client: 3 is Correct
Client: 14 is Unsure
Client: 10 is Same
Client: 15 is Colder
Client: 12 is Correct
Client: 2 is Correct

This works as expected. The coroutine version is easier to follow because all the interactions with threads have been removed. The asyncio built-in module also provides many helper functions that reduce the amount of socket boilerplate code required to write a server like this.

Your use case may be more difficult to port for a variety of reasons. The asyncio module has a vast number of I/O, synchronization, and task management features that could make adopting coroutines easier for you (see Item 77: “Mix Threads and Coroutines to Ease the Transition to asyncio” and Item 78: “Maximize Responsiveness of asyncio Event Loops with async-Friendly Worker Threads”). Be sure to check out the online documentation for the library (https://docs.python.org/3/library/asyncio.html) to understand its full potential.

Things to Remember

  • Python provides asynchronous versions of for loops, with statements, generators, comprehensions, iterators, and library helper functions that can be used as drop-in replacements in coroutines.

  • The asyncio built-in module makes it straightforward to port existing code that uses threads and blocking I/O over to coroutines and asynchronous I/O.

Item 77: Mix Threads and Coroutines to Ease the Transition to asyncio

In the previous item (see Item 76: “Know How to Port Threaded I/O to asyncio”), I ported a TCP server that does blocking I/O with threads over to use asyncio with coroutines. It was a big-bang transition: I moved all of the code to the new style in one go. But it’s rarely feasible to port a large program this way. Instead, you usually need to incrementally migrate your codebase while also updating your tests as needed and verifying that everything works at each step along the way.

In order to do that, your codebase needs to be able to use threads for blocking I/O (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”) and coroutines for asynchronous I/O (see Item 75: “Achieve Highly Concurrent I/O with Coroutines”) at the same time in a way that’s mutually compatible. Practically, this means you need threads to be able to run coroutines, and you need coroutines to be able to start and wait on threads. Luckily, asyncio includes built-in facilities for making this type of interoperability straightforward.
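
As a tiny, standalone preview of both directions (hypothetical names; the rest of this item shows the real migration), a coroutine can hand blocking work to a worker thread with run_in_executor, while a plain thread can schedule a coroutine on a running event loop with asyncio.run_coroutine_threadsafe:

import asyncio

async def coro_work():
    return "ran on the event loop"

def thread_work(loop):
    # A plain worker thread scheduling a coroutine on the event loop:
    future = asyncio.run_coroutine_threadsafe(coro_work(), loop)
    return future.result()

async def main():
    loop = asyncio.get_running_loop()
    # A coroutine running a blocking function in a worker thread:
    result = await loop.run_in_executor(None, thread_work, loop)
    print(result)

asyncio.run(main())

>>>
ran on the event loop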

For example, say that I’m writing a program that merges log files together into one output stream in order to aid with debugging. Given a file handle for an input log, I need a way to detect whether new data is available and return the next line of input. I can do this by using the tell method of the file handle to check whether the current read position matches the length of the file. When no new data is present, an exception should be raised (see Item 32: “Prefer Raising Exceptions to Returning None” for background):

class NoNewData(Exception):
    pass

def readline(handle):
    offset = handle.tell()
    handle.seek(0, 2)
    length = handle.tell()

    if length == offset:
        raise NoNewData

    handle.seek(offset, 0)
    return handle.readline()

By wrapping this function in a while loop, I can turn it into a worker thread. When a new line is available, I call a given callback function to write it to the output log (see Item 48: “Accept Functions Instead of Classes for Simple Interfaces” for why to use a function interface for this instead of a class). When no data is available, the thread sleeps to reduce the amount of busy waiting caused by polling for new data. When the input file handle is closed, the worker thread exits:

import time

def tail_file(handle, interval, write_func):
    while not handle.closed:
        try:
            line = readline(handle)
        except NoNewData:
            time.sleep(interval)
        else:
            write_func(line)

Now I can start one worker thread per input file and unify the output of these threads into a single output file. Below, the write closure function (see Item 33: “Know How Closures Interact with Variable Scope and nonlocal”) needs to use a Lock instance (see Item 69: “Use Lock to Prevent Data Races in Threads”) in order to serialize writes to the output stream and ensure that there are no intra-line conflicts:

from threading import Lock, Thread

def run_threads(handles, interval, output_path):
    with open(output_path, "wb") as output:
        lock = Lock()

        def write(data):
            with lock:
                output.write(data)

        threads = []
        for handle in handles:
            args = (handle, interval, write)
            thread = Thread(target=tail_file, args=args)
            thread.start()
            threads.append(thread)

        for thread in threads:
            thread.join()

As long as an input file handle is still alive, its corresponding worker thread will also stay alive. That means it’s sufficient to wait for the join method from each thread to complete in order to know that the whole process is done.

Given a set of input paths and an output path, I can call run_threads and confirm that it works as expected. How the input file handles are created or closed isn’t important for demonstrating this code’s behavior, nor is the implementation of the confirm_merge output verification function below, which is why I’ve left them out here:

def confirm_merge(input_paths, output_path):
    ...

input_paths = ...
handles = ...
output_path = ...
run_threads(handles, 0.1, output_path)

confirm_merge(input_paths, output_path)

With this threaded implementation as the starting point, how can I incrementally convert this code to use asyncio and coroutines instead? There are two approaches: top-down and bottom-up.

Top-Down Approach

Top-down means starting at the highest parts of a codebase, like in the main entry points, and working down to the individual functions and classes that are the leaves of the call hierarchy. This approach can be useful when you maintain a lot of common modules that you use across many different programs. By porting the entry points first, you can wait to port the common modules until you’re already using coroutines everywhere else.

These are the concrete steps:

  1. Change a top function to use async def instead of def.

  2. Wrap all of its calls that do I/O—potentially blocking the event loop—in calls to the event loop’s run_in_executor method instead.

  3. Ensure that the resources or callbacks used by run_in_executor invocations are properly synchronized (i.e., using Lock or the asyncio.run_coroutine_threadsafe function with a fan-in event loop instance).

  4. Try to eliminate get_event_loop and run_in_executor calls by moving downward through the call hierarchy and converting intermediate functions and methods to coroutines (following the first three steps).

Here I apply steps 1–3 to the run_threads function:

import asyncio

async def run_tasks_mixed(handles, interval, output_path):
    loop = asyncio.get_event_loop()

    output = await loop.run_in_executor(
        None, open, output_path, "wb")
    try:
        async def write_async(data):
            await loop.run_in_executor(None, output.write, data)

        def write(data):
            coro = write_async(data)
            future = asyncio.run_coroutine_threadsafe(coro, loop)
            future.result()

        tasks = []
        for handle in handles:
            task = loop.run_in_executor(
                None, tail_file, handle, interval, write
            )
            tasks.append(task)

        await asyncio.gather(*tasks)

    finally:
        await loop.run_in_executor(None, output.close)

The run_in_executor method instructs the event loop to run a given function—which may include blocking I/O—using a ThreadPoolExecutor, ensuring that it doesn’t interfere with the event loop’s thread (see Item 74: “Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency” for background). By making multiple calls to run_in_executor without corresponding await expressions, the run_tasks_mixed coroutine fans out to have one concurrent line of execution for each input file. Then the await on asyncio.gather fans in, waiting until all of the tail_file worker threads complete (see Item 71: “Know How to Recognize When Concurrency Is Necessary” for more about fan-out and fan-in).

This code eliminates the need for the Lock instance in the write helper by using asyncio.run_coroutine_threadsafe. This function allows plain old worker threads to call a coroutine—write_async in this case—and have it execute in the explicitly supplied event loop, which runs in the main thread. This effectively synchronizes the worker threads, ensuring that all writes to the output file happen one at a time. Once the asyncio.gather awaitable is resolved, I can assume that all writes to the output file have also completed, and thus I can close the output file handle without having to worry about race conditions.

I can verify that this code works as expected by using the asyncio.run function to start the coroutine and run the main event loop:

input_paths = ...
handles = ...
output_path = ...
asyncio.run(run_tasks_mixed(handles, 0.1, output_path))

confirm_merge(input_paths, output_path)

Now I can apply step 4 to the run_tasks_mixed function by moving down the call stack. I can redefine the tail_file dependent function to be an asynchronous coroutine instead of doing blocking I/O by following steps 1–3:

async def tail_async(handle, interval, write_func):
    loop = asyncio.get_event_loop()

    while not handle.closed:
        try:
            line = await loop.run_in_executor(
                None, readline, handle)
        except NoNewData:
            await asyncio.sleep(interval)
        else:
            await write_func(line)

The new tail_async function allows me to eliminate the run_tasks_mixed function’s calls to run_coroutine_threadsafe and the write wrapper function. I can also use asyncio.TaskGroup (new in Python 3.11) to manage fan-out and fan-in for the tail_async coroutines, further shortening the code:

async def run_tasks(handles, interval, output_path):
    loop = asyncio.get_event_loop()

    output = await loop.run_in_executor(
        None, open, output_path, "wb")
    try:

        async def write_async(data):
            await loop.run_in_executor(None, output.write, data)

        async with asyncio.TaskGroup() as group:
            for handle in handles:
                group.create_task(
                    tail_async(handle, interval, write_async)
                )
    finally:
        await loop.run_in_executor(None, output.close)

I can verify that run_tasks works as expected, too:

input_paths = ...
handles = ...
output_path = ...
asyncio.run(run_tasks(handles, 0.1, output_path))

confirm_merge(input_paths, output_path)

It’s possible to continue this refactoring approach and convert readline into an asynchronous coroutine as well. However, that function requires so many blocking file I/O operations that it doesn’t seem worth porting, given how much that would reduce the clarity of the code. In some situations, it makes sense to move everything to asyncio, and in others it doesn’t.

Bottom-Up Approach

The bottom-up approach to adopting coroutines has four steps that are similar to the steps of the top-down style, but the process traverses the call hierarchy in the opposite direction: from leaves to entry points.

These are the concrete steps:

  1. Create a new asynchronous coroutine version of each leaf function that you’re trying to port.

  2. Change the existing synchronous functions so they call the coroutine versions and run the event loop instead of implementing any real asynchronous behavior.

  3. Move up a level of the call hierarchy, make another layer of coroutines, and replace existing calls to synchronous functions with calls to the coroutines defined in step 1.

  4. Delete synchronous wrappers around coroutines created in step 2 as you stop requiring them to glue the pieces together.

For the example above, I would start with the tail_file function since I decided that the readline function should keep using blocking I/O. I can rewrite tail_file so it merely wraps the tail_async coroutine that I defined above. The provided write_func, which uses blocking I/O, can be run by the write_async function using run_in_executor, making it compatible with what tail_async expects. To run each worker coroutine until it finishes, I can create an event loop for each tail_file thread and then call its run_until_complete method. This method will block the current thread and drive the event loop until the tail_async coroutine exits, achieving the same behavior as the threaded, blocking I/O version of tail_file:

def tail_file(handle, interval, write_func):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

    async def write_async(data):
        await loop.run_in_executor(None, write_func, data)

    coro = tail_async(handle, interval, write_async)
    loop.run_until_complete(coro)

This new tail_file function is a drop-in replacement for the old one. I can verify that everything works as expected by calling run_threads again:

input_paths = ...
handles = ...
output_path = ...
run_threads(handles, 0.1, output_path)

confirm_merge(input_paths, output_path)

After wrapping tail_async with tail_file, the next step is to convert the run_threads function to a coroutine. This ends up being the same work as step 4 of the top-down approach above, so at this point, the styles converge.

This is all a great start for adopting asyncio, but there’s even more that you could do to increase the responsiveness of your program (see Item 78: “Maximize Responsiveness of asyncio Event Loops with async-Friendly Worker Threads”).

Things to Remember

  • The awaitable run_in_executor method of the asyncio event loop enables coroutines to run synchronous functions in ThreadPoolExecutor worker threads. This facilitates top-down migration to asyncio.

  • The run_until_complete method of the asyncio event loop enables synchronous code to run a coroutine until it finishes. The asyncio.run_coroutine_threadsafe function provides the same functionality across thread boundaries. Together these help with bottom-up migration to asyncio.

Item 78: Maximize Responsiveness of asyncio Event Loops with async-Friendly Worker Threads

In the previous item I showed how to migrate to asyncio incrementally (see Item 77: “Mix Threads and Coroutines to Ease the Transition to asyncio” for background and the implementation of various functions below). The resulting coroutine properly tails input files and merges them into a single output:

import asyncio

async def run_tasks(handles, interval, output_path):
    loop = asyncio.get_event_loop()

    output = await loop.run_in_executor(
        None, open, output_path, "wb")
    try:

        async def write_async(data):
            await loop.run_in_executor(None, output.write, data)

        async with asyncio.TaskGroup() as group:
            for handle in handles:
                group.create_task(
                    tail_async(handle, interval, write_async)
                )
    finally:
        await loop.run_in_executor(None, output.close)

This code is quite noisy and repetitive with all the run_in_executor boilerplate to handle the boundary between synchronous and asynchronous function calls. The function would be a lot shorter if I accepted the fact that calls to open, close, and write for the output file handle will block the event loop—and for the purpose of merging multiple file handles like this, it’s functionally correct, too:

async def run_tasks_simpler(handles, interval, output_path):
    with open(output_path, "wb") as output:  # Changed

        async def write_async(data):
            output.write(data)               # Changed

        async with asyncio.TaskGroup() as group:
            for handle in handles:
                group.create_task(
                    tail_async(handle, interval, write_async)
                )

However, avoiding run_in_executor like this is bad because these operations all require making system calls to the program’s host operating system, which may block the event loop for significant amounts of time and prevent other coroutines from making progress. This could hurt overall responsiveness and increase latency, especially for programs with event loops that are shared by many components, such as highly concurrent servers.

But how bad is it to block the event loop, really? And how often does it happen in practice? I can detect when this problem occurs in a real program by passing the debug=True parameter to the asyncio.run function. Here I show how the file and line of a bad coroutine, presumably blocked on a slow system call, can be identified:

import time

async def slow_coroutine():
    time.sleep(0.5)  # Simulating slow I/O

asyncio.run(slow_coroutine(), debug=True)

>>>
Executing <Task finished name='Task-1' coro=<slow_coroutine()
➥done, defined at example.py:61> result=None created at
➥.../asyncio/runners.py:100> took 0.506 seconds
...
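The warning threshold is controlled by the event loop's slow_callback_duration attribute, which defaults to 0.1 seconds in debug mode. Here's a minimal sketch of lowering it to surface shorter stalls (the 0.05 second threshold is just an illustration):

import asyncio
import time

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # Default is 0.1 seconds

    time.sleep(0.07)  # Under the default threshold, but now reported

asyncio.run(main(), debug=True)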

If I want the most responsive program possible, then I need to minimize the potential system calls that are made from within the main event loop. Using run_in_executor is one way to do that, but it requires a lot of boilerplate, as shown above. One potentially better alternative is to create a new Thread subclass (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”) that encapsulates everything required to write to the output file using its own independent event loop:

from threading import Thread

class WriteThread(Thread):
    def __init__(self, output_path):
        super().__init__()
        self.output_path = output_path
        self.output = None
        self.loop = asyncio.new_event_loop()

    def run(self):
        asyncio.set_event_loop(self.loop)
        with open(self.output_path, "wb") as self.output:
            self.loop.run_forever()

        # Run one final round of callbacks so the await on
        # stop() in another event loop will be resolved.
        self.loop.run_until_complete(asyncio.sleep(0))

Coroutines in other threads can directly call and await on the write method of this class, since it’s merely a thread-safe wrapper around the real_write method that actually does the I/O. This eliminates the need for Lock (see Item 69: “Use Lock to Prevent Data Races in Threads”):

    async def real_write(self, data):
        self.output.write(data)

    async def write(self, data):
        coro = self.real_write(data)
        future = asyncio.run_coroutine_threadsafe(
            coro, self.loop)
        await asyncio.wrap_future(future)

Other coroutines can tell the worker thread when to stop in a thread-safe manner, using similar boilerplate:

    async def real_stop(self):
        self.loop.stop()

    async def stop(self):
        coro = self.real_stop()
        future = asyncio.run_coroutine_threadsafe(
            coro, self.loop)
        await asyncio.wrap_future(future)
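Since write and stop share this pattern exactly, the cross-thread plumbing could be factored into a shared helper method; here's a minimal sketch (the _run_in_worker name is mine, not part of the original class):

    async def _run_in_worker(self, coro):
        # Schedule the coroutine on the worker thread's event loop
        # and await its completion from the caller's event loop
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        await asyncio.wrap_future(future)

    async def write(self, data):
        await self._run_in_worker(self.real_write(data))

    async def stop(self):
        await self._run_in_worker(self.real_stop())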

I can also define the __aenter__ and __aexit__ methods to allow this class to be used in async with statements (see Item 82: “Consider contextlib and with Statements for Reusable try/finally Behavior” and Item 76: “Know How to Port Threaded I/O to asyncio” for background). This ensures that the worker thread starts and stops at the right times without slowing down the main event loop thread:

    async def __aenter__(self):
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(None, self.start)
        return self

    async def __aexit__(self, *_):
        await self.stop()

With this new WriteThread class, I can refactor run_tasks into a fully asynchronous version that’s easy to read, doesn’t interfere with the main event loop’s default executor, and completely avoids running slow system calls in the main event loop thread:

def readline(handle):
    ...

async def tail_async(handle, interval, write_func):
    ...

async def run_fully_async(handles, interval, output_path):
    async with (
        WriteThread(output_path) as output,
        asyncio.TaskGroup() as group,
    ):
        for handle in handles:
            group.create_task(
                tail_async(handle, interval, output.write)
            )

I can verify that this works as expected, given a set of input handles and an output file path:

def confirm_merge(input_paths, output_path):
    ...

input_paths = ...
handles = ...
output_path = ...
asyncio.run(run_fully_async(handles, 0.1, output_path))

confirm_merge(input_paths, output_path)
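The confirm_merge helper is elided above; one plausible implementation, assuming the data is line oriented and every input line must appear verbatim in the merged output, would be:

def confirm_merge(input_paths, output_path):
    # Hypothetical check: every line from each input file must be
    # present somewhere in the merged output file
    with open(output_path, "rb") as f:
        merged = set(f.read().splitlines())
    for path in input_paths:
        with open(path, "rb") as f:
            for line in f.read().splitlines():
                assert line in merged, f"Missing {line!r} from {path}"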

Things to Remember

  • Making system calls in coroutines (including blocking I/O and starting threads) can reduce program responsiveness and increase the perception of latency.

  • Pass the debug=True parameter to asyncio.run in order to detect when certain coroutines are preventing the event loop from reacting quickly.

  • To improve the readability of code that must span the boundary between asynchronous and synchronous execution, consider defining helper thread classes that provide coroutine-friendly interfaces.

Item 79: Consider concurrent.futures for True Parallelism

At some point in writing Python programs, you might hit a performance wall. Even after optimizing your Python code (see Item 92: “Profile Before Optimizing”), your program’s execution might still be too slow for your needs. On modern computers that have an increasing number of CPU cores, it’s reasonable to assume that one solution could be parallelism. What if you could split your code’s computation into independent pieces of work that run simultaneously across multiple CPU cores?

Unfortunately, Python’s global interpreter lock (GIL) prevents true CPU parallelism in Python threads in most cases (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”). But the multiprocessing built-in module, which is easily accessed via the concurrent.futures built-in module, might be exactly what you need (see Item 74: “Consider ThreadPoolExecutor When Threads Are Necessary for Concurrency” for a related example). multiprocessing enables Python to utilize multiple CPU cores in parallel by running additional interpreters as child processes. These child processes are separate from the main interpreter, so their global interpreter locks are also separate. Each child can fully utilize one CPU core. Each child also has a link to the main process where it receives instructions to do computation and returns results.

For example, say that I want to do something computationally intensive with Python and utilize multiple CPU cores. I’ll use an implementation of finding the greatest common divisor of two numbers as a proxy for a more computationally intense algorithm (like simulating fluid dynamics with the Navier–Stokes equation):

# my_module.py
def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i
    raise RuntimeError("Not reachable")
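For example, the largest integer that evenly divides both 24 and 36 is 12:

print(gcd((24, 36)))

>>>
12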

Running this function in serial takes a linearly increasing amount of time because there is no parallelism:

# run_serial.py
import my_module
import time

NUMBERS = [
    (19633090, 22659730),
    (20306770, 38141720),
    (15516450, 22296200),
    (20390450, 20208020),
    (18237120, 19249280),
    (22931290, 10204910),
    (12812380, 22737820),
    (38238120, 42372810),
    (38127410, 47291390),
    (12923910, 21238110),
]

def main():
    start = time.perf_counter()
    results = list(map(my_module.gcd, NUMBERS))
    end = time.perf_counter()
    delta = end - start
    print(f"Took {delta:.3f} seconds")

if __name__ == "__main__":
    main()

>>>
Took 5.643 seconds

Running this code on multiple Python threads will yield no speed improvement because the GIL prevents Python from using multiple CPU cores in parallel. Here I do the same computation as above but using the concurrent.futures module with its ThreadPoolExecutor class and eight worker threads (to match the number of CPU cores on my computer):

# run_threads.py
import my_module
from concurrent.futures import ThreadPoolExecutor
import time

NUMBERS = [
    ...
]

def main():
    start = time.perf_counter()
    pool = ThreadPoolExecutor(max_workers=8)
    results = list(pool.map(my_module.gcd, NUMBERS))
    end = time.perf_counter()
    delta = end - start
    print(f"Took {delta:.3f} seconds")

if __name__ == "__main__":
    main()

>>>
Took 5.810 seconds

It’s even slower this time because of the overhead of starting and communicating with the pool of threads.

Now for the surprising part: Changing a single line of code causes something magical to happen. If I replace ThreadPoolExecutor with ProcessPoolExecutor from the concurrent.futures module, everything speeds up:

# run_parallel.py
import my_module
from concurrent.futures import ProcessPoolExecutor
import time

NUMBERS = [
    ...
]

def main():
    start = time.perf_counter()
    pool = ProcessPoolExecutor(max_workers=8)  # The one change
    results = list(pool.map(my_module.gcd, NUMBERS))
    end = time.perf_counter()
    delta = end - start
    print(f"Took {delta:.3f} seconds")

if __name__ == "__main__":
    main()

>>>
Took 1.684 seconds

Running on my multi-core machine, this is significantly faster! How is this possible? Here’s what the ProcessPoolExecutor class actually does (via the low-level constructs provided by the multiprocessing module):

  1. It takes each item from the NUMBERS input data passed to map.

  2. It serializes the item into binary data by using the pickle module (see Item 107: “Make pickle Serialization Maintainable with copyreg”).

  3. It copies the serialized data from the main interpreter process to a child interpreter process over a local socket.

  4. It deserializes the data back into Python objects, using pickle in the child process.

  5. It imports the Python module containing the gcd function.

  6. It runs the function on the input data in parallel with other child processes.

  7. It serializes the result back into binary data.

  8. It copies that binary data back through the socket.

  9. It deserializes the binary data back into Python objects in the parent process.

  10. It merges the results from multiple children into a single list to return.

Although using the pool.map method looks simple, the multiprocessing module and ProcessPoolExecutor class do a huge amount of work to make parallelism possible. In most other languages, the only touch point you need to coordinate two threads is a single lock or atomic operation (see Item 69: “Use Lock to Prevent Data Races in Threads” for an example). The overhead of using multiprocessing via ProcessPoolExecutor is high because of all the serialization and deserialization that must happen between the parent and child processes.
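For a sense of what ProcessPoolExecutor wraps, roughly the same parallel map can be written against the multiprocessing module's Pool class directly; this sketch is for illustration, not a recommendation:

# run_pool.py
from multiprocessing import Pool

import my_module

NUMBERS = [
    ...
]

def main():
    with Pool(processes=8) as pool:
        # The same pickle round-trips over local sockets happen here
        results = pool.map(my_module.gcd, NUMBERS)

if __name__ == "__main__":
    main()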

This scheme is well suited to certain types of isolated, high-leverage tasks. By isolated, I mean functions that don’t need to share state with other parts of the program. By high-leverage, I mean situations in which only a small amount of data must be transferred between the parent and child processes to enable a large amount of computation. The greatest common divisor algorithm is one example of this, but many other mathematical algorithms work similarly.

If your computation doesn’t have these characteristics, then the overhead of ProcessPoolExecutor may prevent it from speeding up your program through parallelization. When that happens, multiprocessing provides more advanced facilities for shared memory, cross-process locks, queues, and proxies. But all of these features are very complex. It’s hard enough to reason about such tools in the memory space of a single process shared between Python threads. Extending that complexity to other processes and involving sockets makes this much more difficult to understand.
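To give a flavor of those facilities, here is a minimal sketch of a counter stored in shared memory and protected by a cross-process lock; even this tiny example requires noticeably more ceremony than its threaded equivalent:

from multiprocessing import Process, Value

def work(counter):
    for _ in range(100_000):
        with counter.get_lock():   # Cross-process lock
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)        # Shared-memory signed integer
    workers = [Process(target=work, args=(counter,)) for _ in range(4)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(counter.value)

>>>
400000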

I suggest that you initially avoid all parts of the multiprocessing built-in module. You can start by using the ThreadPoolExecutor class to run isolated, high-leverage functions in threads. Later you can move to ProcessPoolExecutor to get a speedup. Finally, when you’ve completely exhausted the other options, you can consider using the multiprocessing module directly or using more advanced techniques (see Item 94: “Know When and How to Replace Python with Another Programming Language”).

Things to Remember

  • The multiprocessing module provides powerful tools that can parallelize certain types of Python computation with minimal effort.

  • The power of multiprocessing is best accessed through the concurrent.futures built-in module and its simple ProcessPoolExecutor class.

  • Avoid the advanced (and complicated) parts of the multiprocessing module until you’ve exhausted all other options.