Chapter 11: Performance

There is a wide range of metrics for judging the execution quality of a program, including CPU utilization, throughput, latency, response time, memory usage, and cache hit rate. These metrics can also be assessed in different ways with respect to their statistical distribution. For example, with latency, depending on your goals, you might need to consider average latency, median latency, 99th percentile latency, or worst-case latency. There are also application-specific metrics that build on these lower-level ones, such as transactions per second, time to first paint, maximum frame rate, and goodput.

Achieving good performance means that code you’ve written meets your expectations for one or more of the quantitative measurements that you care about most. Which metrics matter depends on the problem domain, production environment, and user profile; there’s not a one-size-fits-all goal. Performance engineering is the discipline of analyzing program execution behavior, identifying areas for improvement, and implementing changes—large and small—to maximize or minimize the metrics that are most important.

Python is not considered to be a high-performance language, especially in comparison to lower-level languages built for the task. This reputation is understandable given the overhead and constraints of the Python runtime, especially when it comes to parallelism (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism” for background). That said, Python includes a variety of capabilities that enable programs to achieve surprisingly impressive performance with relatively low amounts of effort. Using these features, it’s possible to extract maximum performance from a host system while retaining the productivity gains afforded by Python’s high-level nature.

Item 92: Profile Before Optimizing

The dynamic nature of Python causes surprising behaviors in its runtime performance. Operations you might assume would be slow are actually very fast (e.g., string manipulation, use of generators). Language features you might assume would be fast are actually very slow (e.g., attribute accesses, function calls). The true source of slowdowns in a Python program can be obscure.

The best approach is to ignore your intuition and directly measure the performance of a program before you try to optimize it. Python provides a built-in profiler for determining which parts of a program are responsible for its execution time. Profiling enables you to focus your optimization efforts on the biggest sources of trouble and ignore parts of the program that don’t impact speed (i.e., follow Amdahl’s law from academic literature).
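Amdahl’s law makes the payoff concrete: if an optimization speeds up a fraction p of a program’s total runtime by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A quick sketch:

```python
def amdahl_speedup(p, s):
    # Overall speedup when fraction p of runtime is sped up by factor s
    return 1 / ((1 - p) + p / s)

# Making 10% of the program infinitely fast barely helps...
print(amdahl_speedup(0.10, 1e9))  # ~1.11x overall
# ...while a modest 10x win on the dominant 90% pays off
print(amdahl_speedup(0.90, 10))   # ~5.26x overall
```

This is why finding the dominant fraction with a profiler must come before any optimization effort.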

For example, say that I want to determine why an algorithm in a program is slow. Here I define a function that sorts a list of data using an insertion sort:

def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result

The core mechanism of the insertion sort is the function that finds the insertion point for each piece of data. Here I define an extremely inefficient version of the insert_value function that does a linear scan over the input array:

def insert_value(array, value):
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)

To profile insertion_sort and insert_value, I create a data set of random numbers and define a test function to pass to the profiler (see Item 39: “Prefer functools.partial over lambda Expressions for Glue Functions” for background on lambda):

from random import randint

max_size = 12**4
data = [randint(0, max_size) for _ in range(max_size)]
test = lambda: insertion_sort(data)

Python provides two built-in profilers: one that is pure Python (profile) and another that is a C-extension module (cProfile). The cProfile built-in module is better because of its minimal impact on the performance of your program while it’s being profiled. The pure-Python alternative imposes a high overhead that skews the results.

Note

When profiling a Python program, be sure that what you’re measuring is the code itself and not any external systems. Beware of functions that access the network or resources on disk. These may appear to have a large impact on your program’s execution time because of the slowness of the underlying systems. If your program uses a cache to mask the latency of slow resources like these, you should also ensure that it’s properly warmed up before you start profiling.
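Warming a cache before profiling might look like this sketch, where slow_fetch is a hypothetical stand-in for a slow network or disk resource:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def slow_fetch(key):
    # Hypothetical stand-in for a slow network or disk access
    return key * 2

def workload():
    return sum(slow_fetch(i % 10) for i in range(1000))

# Warm the cache first so profiling measures steady-state behavior
# instead of first-time misses against the slow backing resource
workload()
```

After the warm-up call, profiling workload measures only in-memory cache hits rather than the latency of the backing resource.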

Here I instantiate a Profile object from the cProfile module and run the test function through it by using the runcall method:

from cProfile import Profile

profiler = Profile()
profiler.runcall(test)

When the test function has finished running, I can extract statistics about its performance by using the pstats built-in module and its Stats class. Various methods on a Stats object adjust how to select and sort the profiling information to show only the things I care about:

from pstats import Stats

stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats("cumulative")
stats.print_stats()

The output is a table of information organized by function. The data sample is taken only from the time the profiler was active, during the runcall method above:

>>>
       41475 function calls in 2.198 seconds

  Ordered by: cumulative time

  ncalls tottime percall cumtime percall filename:lineno(function)
       1   0.000   0.000   2.198   2.198 main.py:35(<lambda>)
       1   0.003   0.003   2.198   2.198 main.py:10(insertion_sort)
   20736   2.137   0.000   2.195   0.000 main.py:20(insert_value)
   20729   0.058   0.000   0.058   0.000 {method 'insert' of 'list' objects}
       7   0.000   0.000   0.000   0.000 {method 'append' of 'list' objects}

Here’s a quick guide to what the profiler statistics columns mean:

  • ncalls: The number of calls to the function during the profiling period.

  • tottime: The number of seconds spent executing the function, excluding time spent executing other functions it calls.

  • tottime percall: The average number of seconds spent in the function each time it was called, excluding time spent executing other functions it calls. This is tottime divided by ncalls.

  • cumtime: The cumulative number of seconds spent executing the function, including time spent in all other functions it calls.

  • cumtime percall: The average number of seconds spent in the function each time it was called, including time spent in all other functions it calls. This is cumtime divided by ncalls.
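The Stats object offers more ways to slice this report; for example, here’s a self-contained sketch that sorts by tottime instead and prints only the top five rows (the numeric argument to print_stats limits the output):

```python
from cProfile import Profile
from pstats import Stats

def busy():
    # Small CPU-bound function to generate some profile data
    return sum(i * i for i in range(100_000))

profiler = Profile()
profiler.runcall(busy)

stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats("tottime")  # order by exclusive time per function
stats.print_stats(5)         # show at most the top 5 rows
```

Sorting by tottime is useful when you want to find the functions doing the most work themselves, as opposed to those that are merely high in the call tree.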

Looking at the profiler statistics table above, I can see that the biggest use of CPU in my test is the cumulative time spent in the insert_value function. Here I redefine that function to use the more efficient bisect built-in module (see Item 102: “Consider Searching Sorted Sequences with bisect”):

from bisect import bisect_left

def insert_value(array, value):
    i = bisect_left(array, value)
    array.insert(i, value)

I can run the profiler again and generate a new table of profiler statistics. The new function is much faster, with a cumulative time more than 30 times smaller than that of the previous insert_value function:

>>>
       62211 function calls in 0.067 seconds

  Ordered by: cumulative time

  ncalls tottime percall cumtime percall filename:lineno(function)
       1   0.000   0.000   0.067   0.067 main.py:35(<lambda>)
       1   0.002   0.002   0.067   0.067 main.py:10(insertion_sort)
   20736   0.004   0.000   0.064   0.000 main.py:109(insert_value)
   20736   0.056   0.000   0.056   0.000 {method 'insert' of 'list' objects}
   20736   0.004   0.000   0.004   0.000 {built-in method _bisect.bisect_left}

Sometimes when you’re profiling an entire program, you might find that a common utility function is responsible for the majority of execution time. The default output from the profiler makes such a situation difficult to understand because it doesn’t show that the utility function is called by many different parts of your program.

For example, here the my_utility function is called repeatedly by two different functions in the program:

def my_utility(a, b):
    c = 1
    for i in range(100):
        c += a * b

def first_func():
    for _ in range(1000):
        my_utility(4, 5)

def second_func():
    for _ in range(10):
        my_utility(1, 3)

def my_program():
    for _ in range(20):
        first_func()
        second_func()

Profiling this code and using the default print_stats output produces statistics that are confusing:

>>>
         20242 function calls in 0.040 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.040    0.040 main.py:172(my_program)
       20    0.002    0.000    0.040    0.002 main.py:164(first_func)
    20200    0.038    0.000    0.038    0.000 main.py:159(my_utility)
       20    0.000    0.000    0.000    0.000 main.py:168(second_func)

The my_utility function is clearly the source of most execution time, but it’s not immediately obvious why that function is called so much. If you search through the program’s code, you’ll find multiple call sites for my_utility and will still be confused.

To deal with this, the Python profiler provides the print_callers method to show which callers contributed to the profiling information of each function:

stats.print_callers()

This profiler statistics table shows functions called on the left and which function was responsible for making the call on the right. Here it’s clear that my_utility is most used by first_func:

>>>
   Ordered by: cumulative time

Function                      was called by...
                                  ncalls  tottime  cumtime
main.py:172(my_program)       <-
main.py:164(first_func)       <-      20    0.002  0.040  main.py:172(my_program)
main.py:159(my_utility)       <-   20000    0.038  0.038  main.py:164(first_func)
                                     200    0.000  0.000  main.py:168(second_func)
main.py:168(second_func)      <-      20    0.000  0.000  main.py:172(my_program)

Alternatively, you can call the print_callees method, which shows a top-down segmentation of how each function (on the left) spends time executing other dependent functions (on the right) deeper in the call stack:

stats.print_callees()

>>>
   Ordered by: cumulative time

Function                      called...
                                  ncalls  tottime  cumtime
main.py:172(my_program)       ->      20    0.002    0.041  main.py:164(first_func)
                                      20    0.000    0.000  main.py:168(second_func)
main.py:164(first_func)       ->   20000    0.038    0.038  main.py:159(my_utility)
main.py:159(my_utility)       ->
main.py:168(second_func)      ->     200    0.000    0.000  main.py:159(my_utility)

If you’re not able to figure out why your program is slow using cProfile, fear not. Python includes other tools for assessing performance (see Item 93: “Optimize Performance-Critical Code Using timeit Microbenchmarks,” Item 98: “Lazy-Load Modules with Dynamic Imports to Reduce Startup Time,” and Item 115: “Use tracemalloc to Understand Memory Usage and Leaks”). There are also community-built tools (see Item 116: “Know Where to Find Community-Built Modules”) that have additional capabilities for assessing performance, such as line profilers, sampling profilers, integration with Linux’s perf tool, memory usage profilers, and more.

Things to Remember

  • It’s important to profile Python programs before optimizing because the sources of slowdowns are often obscure.

  • Use the cProfile module instead of the profile module because it provides more accurate profiling information.

  • The Profile object’s runcall method provides everything you need to profile a tree of function calls in isolation.

  • The Stats object lets you select and print the subset of profiling information you need to see to understand your program’s performance.

Item 93: Optimize Performance-Critical Code Using timeit Microbenchmarks

When attempting to maximize the performance of a Python program, it’s extremely important to use a profiler because the source of slowness might not be obvious (see Item 92: “Profile Before Optimizing”). Once the real problem areas are identified, refactoring to a better architecture or using more appropriate data structures can often have a dramatic effect (see Item 104: “Know How to Use heapq for Priority Queues”). However, some hotspots in the code might seem inextinguishable even after many rounds of profiling and optimizing. Before attempting more complex solutions (see Item 94: “Know When and How to Replace Python with Another Programming Language”), it’s worth considering using the timeit built-in module to run microbenchmarks.

The purpose of timeit is to precisely measure the performance of small snippets of code. It lets you quantify and compare multiple solutions to the same pinpointed problem. While profiling and whole-program benchmarks do help with discovering room for optimization in larger components, the number of potential improvements at such wide scope might be limitless and costly to explore. Microbenchmarks, in contrast, can be used to measure multiple implementations of the same narrowly defined behavior, enabling you to take a scientific approach in searching for the solution that performs best.

Using the timeit module is simple. Here I measure how long it takes to add two integers together:

import timeit

delay = timeit.timeit(stmt="1+2")
print(delay)

>>>
0.003767708025407046

The returned value is the number of seconds required to run 1 million iterations of the code snippet provided in the stmt argument. This default amount of repetition might be too much for slower microbenchmarks, so timeit lets you specify a more appropriate count by using the number argument:

delay = timeit.timeit(stmt="1+2", number=100)
print(delay)

>>>
7.500057108700275e-07

However, the 100 iterations specified above are so few that the measured microbenchmark time might start to disappear in the noise of the computer. There will always be some natural variation in performance due to other processes interfering with memory and cache status, the operating system running periodic background tasks, hardware interrupts being received, and so on. For a microbenchmark to be accurate, it needs to use a large number of iterations to compensate for this noise. The timeit module also disables garbage collection while it executes snippets to try to reduce variance.
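The timeit.repeat function quantifies this noise directly by running the entire measurement several times; taking the minimum of the results approximates the run with the least system interference (this sketch reuses the trivial snippet from above):

```python
import timeit

count = 1_000_000
results = timeit.repeat(stmt="1+2", number=count, repeat=5)

best = min(results)  # the measurement with the least interference
print(f"best of {len(results)}: {best/count*1e9:.2f} nanoseconds")
```

The spread between min(results) and max(results) also gives you a rough sense of how noisy the host machine is.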

I suggest providing the iteration count argument explicitly. I usually assign it to a variable that I use again later to calculate the average per-iteration time. This normalized value can then serve as a robust metric that can be compared to other implementations if desired or tracked over time (e.g., to detect regressions):

count = 1_000_000

delay = timeit.timeit(stmt="1+2", number=count)

print(f"{delay/count*1e9:.2f} nanoseconds")

>>>
4.36 nanoseconds

Running a single snippet of code over and over again isn’t always sufficient to produce a useful microbenchmark. You often need to create some kind of scaffolding, harness, or data structure that can be used during iteration. timeit’s setup argument addresses this need by accepting a code snippet that runs a single time before all iterations and is excluded from the time measurement.

For example, imagine that I’m trying to determine if a number is present in a large randomized list of values. I don’t want the microbenchmark to include the time spent creating a large list and randomizing it. Here I put these upfront tasks in the setup snippet; I also provide the globals argument so stmt and setup can refer to names defined elsewhere in the code, such as the imported random module (see Item 91: “Avoid exec and eval Unless You’re Building a Developer Tool” for background):

import random

count = 100_000

delay = timeit.timeit(
    setup="""
numbers = list(range(10_000))
random.shuffle(numbers)
probe = 7_777
""",
    stmt="""
probe in numbers
""",
    globals=globals(),
    number=count,
)

print(f"{delay/count*1e9:.2f} nanoseconds")

>>>
13078.05 nanoseconds

With this baseline (about 13 microseconds) established, I can try to produce the same behavior using a different approach to see how it affects the microbenchmark. Here I swap out the list created in the setup code snippet for a set data structure:

delay = timeit.timeit(
    setup="""
numbers = set(range(10_000))
probe = 7_777
""",
    stmt="""
probe in numbers
""",
    globals=globals(),
    number=count,
)

print(f"{delay/count*1e9:.2f} nanoseconds")

>>>
14.87 nanoseconds

This shows that checking for membership in a set is roughly 900 times faster than checking in a list. The reason is that the set data structure provides constant time access to its elements, similar to a dict, whereas a list requires time proportional to the number of elements it contains. Using timeit like this is a great way to find an ideal data structure or algorithm for your needs.
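If you’d rather not pick the iteration count manually, the timeit.Timer class can choose one for you: its autorange method keeps increasing the loop count until the total measured time reaches at least 0.2 seconds. Here’s a sketch using the same set membership check:

```python
import timeit

timer = timeit.Timer(
    stmt="probe in numbers",
    setup="numbers = set(range(10_000)); probe = 7_777",
)
number, total = timer.autorange()  # picks a suitable iteration count
print(f"{total/number*1e9:.2f} nanoseconds over {number} iterations")
```

This is the same strategy the timeit command-line tool uses to decide how many loops to run.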

One problem that can arise in microbenchmarking like this is the need to measure the performance of tight loops, such as in mathematical kernel functions. For example, here I test the speed of summing a list of numbers:

def loop_sum(items):
    total = 0
    for i in items:
        total += i
    return total

count = 1000

delay = timeit.timeit(
    setup="numbers = list(range(10_000))",
    stmt="loop_sum(numbers)",
    globals=globals(),
    number=count,
)

print(f"{delay/count*1e9:.2f} nanoseconds")

>>>
142365.46 nanoseconds

This measurement is how long it takes for each call to loop_sum, which is meaningless on its own. What you need to make this microbenchmark robust is to normalize it by the number of iterations in the inner loop, which was hard-coded to 10_000 in the example above:

print(f"{delay/count/10_000*1e9:.2f} nanoseconds")

>>>
14.43 nanoseconds

Now I can see that this function will scale in proportion to the number of items in the list by 14.43 nanoseconds per additional item.
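The same normalized harness makes it easy to compare alternative implementations; for example, here’s a sketch pitting the hand-written loop above against the built-in sum function:

```python
import timeit

def loop_sum(items):
    total = 0
    for i in items:
        total += i
    return total

count = 1000
size = 10_000

for stmt in ["loop_sum(numbers)", "sum(numbers)"]:
    delay = timeit.timeit(
        setup=f"numbers = list(range({size}))",
        stmt=stmt,
        globals=globals(),
        number=count,
    )
    per_item = delay / count / size * 1e9  # normalize per inner iteration
    print(f"{stmt}: {per_item:.2f} nanoseconds per item")
```

Because both measurements are normalized to nanoseconds per item, the two numbers are directly comparable regardless of the list size chosen.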

The timeit module can also be executed as a command-line tool, which can help you rapidly investigate any curiosities you have about Python performance. For example, imagine that I want to determine the fastest way to look up a key that’s already present in a dictionary (see Item 26: “Prefer get over in and KeyError to Handle Missing Dictionary Keys”). Here I use the timeit command-line interface to test using the in operator for this purpose:

$ python3 -m timeit \
--setup='my_dict = {"key": 123}' \
'if "key" in my_dict: my_dict["key"]'
20000000 loops, best of 5: 19.3 nsec per loop

The tool automatically determines how many iterations to run based on how much time a single iteration takes. It also runs five separate tests to compensate for system noise and presents the minimum as the best-case, lower-bound performance.

I can run the tool again using a different snippet that will show how the dict.get method compares to the in operator:

$ python3 -m timeit \
--setup='my_dict = {"key": 123}' \
'if (value := my_dict.get("key")) is not None: value'
20000000 loops, best of 5: 17.1 nsec per loop

Now I know that get is faster than in. What about the approach of catching a known exception type, which is a common style in Python programs (see Item 32: “Prefer Raising Exceptions to Returning None”)?

$ python3 -m timeit \
--setup='my_dict = {"key": 123}' \
'try: my_dict["key"]
except KeyError: pass'
20000000 loops, best of 5: 10.6 nsec per loop

It turns out that the KeyError approach is actually fastest for keys that are expected to already exist in the dictionary. This might seem surprising, given all the extra machinery required to raise and catch exceptions in the missing key case. This non-obvious performance behavior illustrates why it’s so important to test your assumptions and use profiling and microbenchmarks to measure before optimizing Python code.

Things to Remember

  • The timeit built-in module can be used to run microbenchmarks that help you scientifically determine the best data structures and algorithms for performance-critical parts of programs.

  • To make microbenchmarks robust, use the setup code snippet to exclude initialization time and be sure to normalize the returned measurements into comparable metrics.

  • The python -m timeit command-line interface lets you quickly understand the performance of Python code snippets with little effort.

Item 94: Know When and How to Replace Python with Another Programming Language

At some point while using Python, you might feel that you’re pushing the envelope of what it can do. This is understandable because in order to provide Python’s enhanced developer productivity and ease of use, the language’s execution model must be limiting in other ways. The CPython implementation’s bytecode virtual machine and global interpreter lock (GIL), for example, negatively impact straight-line CPU performance, CPU parallelism, program startup time, and overall efficiency (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”).

One potential solution is to rewrite all your code in another programming language and move away from Python, which might be the right choice in some circumstances.

However, before you commit to a full rewrite of the software you’ve built, it’s important to consider doing pinpointed optimizations using a variety of techniques, which each have unique trade-offs. What’s possible largely depends on how much Python code you have, how complex it is, the constraints you’re under, and the requirements you need to satisfy.

Many performance issues encountered in Python programs are from non-obvious causes. It’s important to profile and benchmark your code (see Item 92: “Profile Before Optimizing” and Item 93: “Optimize Performance-Critical Code Using timeit Microbenchmarks”) to find the true source of slowness or excess memory consumption (see Item 115: “Use tracemalloc to Understand Memory Usage and Leaks”). You should also seriously investigate replacing your program’s core algorithms and data structures with better alternatives (see Item 104: “Know How to Use heapq for Priority Queues” and Item 102: “Consider Searching Sorted Sequences with bisect”), which can improve program performance by many orders of magnitude with surprisingly little effort.

Once you’ve fully exhausted these paths, migrating to another programming language or execution model can be achieved in many ways. For example, a common source of performance problems in Python is tight loops, which you often see in mathematical kernel functions. The following Python code, which computes the dot product of two vectors, will run orders of magnitude slower than would similar behavior implemented in C:

def dot_product(a, b):
    result = 0
    for i, j in zip(a, b):
        result += i * j
    return result

print(dot_product([1, 2], [3, 4]))

>>>
11

Luckily, a kernel function like this also defines a clear interface that can serve as a seam between the slow and fast parts of a program. If you can find a way to accelerate the interior implementation of the dot_product function, then all of its callers can benefit, without requiring you to make any other changes to the codebase. The same approach also works for much larger subcomponents if your program’s structure is amenable. The standard version of Python (see Item 1: “Know Which Version of Python You’re Using”) provides two tools to help improve performance in this way:

  • The ctypes built-in module makes it easy to describe the interfaces of native libraries on your system and call the functions they export (see Item 95: “Consider ctypes to Rapidly Integrate with Native Libraries” for details). These libraries can be implemented in any language that’s compatible with the C calling convention (e.g., C, C++, Rust) and can leverage native threads, SIMD intrinsics, GPUs, and so on. No additional build system, compiler, or packaging is required.

  • The Python C extension API allows you to create fully Pythonic APIs—taking advantage of all of Python’s dynamic features—that are actually implemented in C to achieve better performance (see Item 96: “Consider Extension Modules to Maximize Performance and Ergonomics” for details). This approach often requires more work upfront, but it provides vastly improved ergonomics. However, you’ll have to deal with additional build complexities, which can be difficult.
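Whichever tool you choose, the seam can stay visible in pure Python: try to import an accelerated implementation and fall back to the portable version when it’s unavailable. In this sketch, the _fast_math module name is hypothetical:

```python
def _dot_product_slow(a, b):
    result = 0
    for i, j in zip(a, b):
        result += i * j
    return result

try:
    # Hypothetical accelerated module; a ctypes wrapper, Cython
    # module, or C extension could sit behind the same signature
    from _fast_math import dot_product
except ImportError:
    dot_product = _dot_product_slow

print(dot_product([1, 2], [3, 4]))  # 11
```

Callers use dot_product the same way regardless of which implementation is active, so the accelerated path can be adopted incrementally.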

The larger Python ecosystem has also responded to the need for performance optimization by creating excellent libraries and tools. Here are a few highlights you should know about, though there are many others (see Item 116: “Know Where to Find Community-Built Modules”):

  • The NumPy module (https://numpy.org) enables you to operate on arrays of values with ergonomic Python function calls that, under the covers, use the BLAS (Basic Linear Algebra Subprograms) to achieve high performance and CPU parallelism. You’ll need to rewrite some of your data structures in order to use it, but the speedups can be enormous.

  • The Numba module (https://numba.pydata.org) takes your existing Python functions and JIT (just-in-time) compiles them at runtime into highly optimized machine instructions. Some of your code might need to be slightly modified to use less dynamism and simpler data types. Like ctypes, Numba avoids additional build complexity, which is a huge benefit.

  • The Cython tool (https://cython.org) provides a superset of the Python language with extra features that make it easy to create C extension modules without actually writing C code. It shares the build complexity of standard C extensions but can be much easier to use than the Python C API.

  • Mypyc (https://github.com/mypyc/mypyc) is similar to Cython, but it uses standard annotations from the typing module (see Item 124: “Consider Static Analysis via typing to Obviate Bugs”) instead of requiring nonstandard syntax. This can make it easier to adopt without code changes. It can also AOT (ahead-of-time) compile whole programs for faster startup time. Mypyc has similar build complexity to C extensions and doesn’t include Cython’s C integration features.

  • The CFFI module (https://cffi.readthedocs.io) is similar to the ctypes built-in module except that it can read C header files directly in order to understand the interfaces of functions you want to call. This automatic mapping significantly reduces the developer toil of calling into native libraries that contain a lot of functions and data structures.

  • SWIG (https://www.swig.org) is a tool that can automatically generate Python interfaces for C and C++ native libraries. It’s similar to CFFI in this way, but the translation happens explicitly instead of at runtime, which can be more efficient. SWIG supports other target languages besides Python and a variety of customization options. It also requires build complexity like C extensions.

One important caveat to note is that these tools and libraries might require a non-trivial amount of developer time to learn and use effectively. It might be easier to rewrite components of a program from scratch in another language, especially given how great Python is at gluing together systems. You can use what you learned from building a Python implementation to inform the design of the rewrite.

That said, rewriting any of your Python code in C (or another language) also has a high cost. Code that is short and understandable in Python can become verbose and complicated in other languages. Porting also requires extensive testing to ensure that the functionality remains equivalent to the original Python code and that no bugs have been introduced.
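One lightweight way to guard against such bugs is to cross-check the port against the Python original on many random inputs. In this sketch, alt_dot_product stands in for a rewritten implementation (it uses math.fsum purely for illustration):

```python
import math
import random

def dot_product(a, b):  # the original Python reference
    result = 0
    for i, j in zip(a, b):
        result += i * j
    return result

def alt_dot_product(a, b):  # stand-in for the ported implementation
    return math.fsum(i * j for i, j in zip(a, b))

random.seed(42)  # deterministic inputs for reproducible checks
for _ in range(1000):
    size = random.randint(0, 20)
    a = [random.uniform(-1000, 1000) for _ in range(size)]
    b = [random.uniform(-1000, 1000) for _ in range(size)]
    expected = dot_product(a, b)
    actual = alt_dot_product(a, b)
    assert math.isclose(expected, actual, rel_tol=1e-9, abs_tol=1e-6)

print("All equivalence checks passed")
```

The tolerance matters: floating point summation order differs between implementations, so exact equality is usually the wrong check for numeric ports.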

Sometimes rewrites are worth it, which explains the large ecosystem of C extension modules in the Python community that speed up things like text parsing, image compositing, and matrix math. For your own code, you’ll need to consider the risks vs. the potential rewards and decide on the best trade-off that’s appropriate for your situation.

Things to Remember

  • There are many valid reasons to rewrite Python code in another language, but you should investigate all the optimization techniques available before pursuing that option.

  • Moving CPU bottlenecks to C extension modules and native libraries can be an effective way to improve performance while maximizing your investment in Python code. However, doing so has a high cost and may introduce bugs.

  • There are a large number of tools and libraries available in the Python ecosystem that can accelerate the slow parts of a Python program with surprisingly few changes.

Item 95: Consider ctypes to Rapidly Integrate with Native Libraries

The ctypes built-in module enables Python to call functions that are defined in native libraries. Those libraries can be implemented in any other programming language that can export functions following the C calling convention (e.g., C, C++, Rust). The module provides two key benefits to Python developers:

  • ctypes makes it easy to connect systems together using Python. If there’s an existing native library with functionality you need, you will be able to use it without much effort.

  • ctypes provides a straightforward path to optimizing slow parts of your program. If you find a hotspot you can’t otherwise speed up, you can reimplement it in another language and then call the faster version by using ctypes.
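As a quick taste of the first benefit, here’s a sketch that calls the cos function from the system’s C math library without compiling anything (it assumes a Unix-like system where the library can be located):

```python
import ctypes
import ctypes.util

# Locate the C math library (e.g., libm.so.6 on Linux); if the lookup
# returns None, CDLL(None) searches the running process instead, which
# also resolves cos on most Unix systems
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double cos(double)
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = (ctypes.c_double,)

print(libm.cos(0.0))  # 1.0
```

No build step was needed because the library already exists on the system; declaring restype and argtypes is what makes the call safe.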

For example, consider the dot_product function from the previous item (see Item 94: “Know When and How to Replace Python with Another Programming Language”):

def dot_product(a, b):
    result = 0
    for i, j in zip(a, b):
        result += i * j
    return result

I can implement similar functionality using a simple C function that operates on arrays. Here I define the interface:

/* my_library.h */
extern double dot_product(int length, double* a, double* b);

The implementation is simple and will be automatically vectorized by most C compilers to maximize performance using advanced processor features like SIMD (single instruction, multiple data) operations:

/* my_library.c */
#include <stdio.h>
#include "my_library.h"

double dot_product(int length, double* a, double* b) {
    double result = 0;
    for (int i = 0; i < length; i++) {
        result += a[i] * b[i];
    }
    return result;
}

Now I need to compile this code into a library file. Doing so is beyond the scope of this book, but the command on my machine is:

$ cc -shared -o my_library.lib my_library.c

I can load this library file in Python simply by providing its path to the ctypes.cdll.LoadLibrary constructor:

import ctypes

library_path = ...
my_library = ctypes.cdll.LoadLibrary(library_path)

The dot_product function that was implemented in C is now available as an attribute of my_library:

print(my_library.dot_product)

>>>
<_FuncPtr object at 0x10544cc50>

If you wrap the imported function pointer with a ctypes.CFUNCTYPE object, you might observe broken behavior due to implicit type conversions. Instead, it’s best to directly assign the restype and argtypes attributes of an imported function to the ctypes types that match the signature of the function’s native implementation:

my_library.dot_product.restype = ctypes.c_double

vector_ptr = ctypes.POINTER(ctypes.c_double)
my_library.dot_product.argtypes = (
    ctypes.c_int,
    vector_ptr,
    vector_ptr,
)

Calling the imported function is relatively easy. First, I define an array data type that contains three double values. Then I allocate two instances of that type to pass as arguments:

size = 3
vector3 = ctypes.c_double * size
a = vector3(1.0, 2.5, 3.5)
b = vector3(-7, 4, -12.1)

Finally, I call the dot_product imported function with these two arrays. I need to use the ctypes.cast helper function to ensure that the address of the first item in the array is supplied—matching C convention—instead of the address of the vector3 Python object. I don’t need to do any casting of the return value because ctypes automatically converts it to a Python value:

result = my_library.dot_product(
    3,
    ctypes.cast(a, vector_ptr),
    ctypes.cast(b, vector_ptr),
)
print(result)

>>>
-39.35

It should be obvious by now that the ergonomics of the ctypes API are quite poor and not Pythonic. But it’s impressive how quickly everything comes together and starts working. To make it feel more natural, here I wrap the imported native function with a Python function to do the data type mapping and verify assumptions (see Item 81: “assert Internal Assumptions and raise Missed Expectations” for background):

def dot_product(a, b):
    size = len(a)
    assert len(b) == size
    a_vector = vector3(*a)
    b_vector = vector3(*b)
    result = my_library.dot_product(size, a_vector, b_vector)
    return result

result = dot_product([1.0, 2.5, 3.5], [-7, 4, -12.1])
print(result)

>>>
-39.35

Alternatively, I can use Python’s C extension API (see Item 96: “Consider Extension Modules to Maximize Performance and Ergonomics”) to provide a more Pythonic interface with native performance. However, ctypes has some worthwhile advantages over C extensions:

  • Pointer values that you hold with ctypes will be freed automatically when the Python object reference count goes to zero. C extensions must do manual memory management for C pointers and manual reference counting for Python objects.

  • When you call a function with ctypes, it automatically releases the GIL while the call is executing, allowing other Python threads to progress in parallel (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism”). With a C extension module, the GIL must be managed explicitly, and functionality is limited while not holding the lock.

  • With ctypes, you simply provide a path to a shared object or dynamic library on disk, and it can be loaded. Compilation can be done separately with a build system that you already have in place. With Python C extensions, you need to leverage the Python build system to include the right paths, set linker flags, and so on; it’s a lot of complexity and potentially duplicative.

But there are also important downsides to using the ctypes module instead of building a C extension:

  • ctypes restricts you to the data types that C can describe. You lose most of the expressive power of Python, including extremely common functionality like iterators (see Item 21: “Be Defensive when Iterating over Arguments”) and duck typing (see Item 25: “Be Cautious when Relying on Dictionary Insertion Ordering”). Even with wrappers, native functions imported using ctypes can feel strange to Python programmers and hamper productivity.

  • Calling ctypes with the right data types often requires you to make copies or transformations of function inputs and outputs. The cost of this overhead might undermine the performance benefit of using a native library, rendering the whole optimization exercise worthless. C extensions allow you to bypass copies; the only speed limit is the inherent performance of the underlying data types.

  • If you use ctypes slightly the wrong way, you might cause your program to corrupt its own memory and behave strangely. For example, if you accidentally provide ctypes.c_double where you should have specified ctypes.c_int for function arguments or return values, you might see unpredictable crashes with cryptic error messages. The faulthandler built-in module can help track down these problems, but they’re still difficult to debug.

The best practice when using ctypes is to always write corresponding unit tests before putting it to work in more complex code (see Item 109: “Prefer Integration Tests over Unit Tests”). The goal of these tests is to exercise the basic surface area of a library that you’re calling in order to confirm that it works as expected for simple usage. This can help you detect situations such as when a function in a shared library has its argument types modified but your ctypes usage hasn’t been updated to match. Here I test the my_library.dot_product symbol from the imported library:

from unittest import TestCase

class MyLibraryTest(TestCase):

    def test_dot_product(self):
        size = 3
        vector3 = ctypes.c_double * size
        a = vector3(1.0, 2.5, 3.5)
        b = vector3(-7, 4, -12.1)
        vector_ptr = ctypes.POINTER(ctypes.c_double)
        result = my_library.dot_product(
            3,
            ctypes.cast(a, vector_ptr),
            ctypes.cast(b, vector_ptr),
        )
        self.assertAlmostEqual(-39.35, result)

...

>>>
.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

The ctypes module includes additional functionality for mapping Python objects to C structs, copying memory, error checking, and a whole lot more (see the full manual at https://docs.python.org/3/library/ctypes.html for details). Ultimately, you’ll need to decide if the ease and speed of development you get from using ctypes is worth its inferior ergonomics and overhead.
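For example, the struct mapping works by subclassing ctypes.Structure and declaring a _fields_ list of name/type pairs. Here's a minimal sketch using a hypothetical Vector3 struct that isn't part of the library above:

```python
import ctypes

# Hypothetical C struct used only for illustration:
# struct Vector3 { double x; double y; double z; };
class Vector3(ctypes.Structure):
    _fields_ = [
        ("x", ctypes.c_double),
        ("y", ctypes.c_double),
        ("z", ctypes.c_double),
    ]

v = Vector3(1.0, 2.5, 3.5)
print(v.x, v.y, v.z)           # 1.0 2.5 3.5
print(ctypes.sizeof(Vector3))  # Matches the C struct's layout: 3 doubles
```

Instances of such a class can be passed by pointer to native functions that expect the corresponding C struct, subject to the same argtypes caveats described above.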

Things to Remember

  • The ctypes built-in module makes it easy to integrate the functionality and performance of native libraries written in other languages into Python programs.

  • In comparison to the Python C extension API, ctypes enables rapid development without additional build complexity.

  • It’s difficult to write Pythonic APIs using ctypes because the data types and protocols available are limited to what can be expressed in C.

Item 96: Consider Extension Modules to Maximize Performance and Ergonomics

The CPython implementation of the Python language (see Item 1: “Know Which Version of Python You’re Using”) supports extension modules that are written in C. These modules can directly use the Python API to take advantage of object-oriented features (see Chapter 7), duck typing protocols (see Item 25: “Be Cautious when Relying on Dictionary Insertion Ordering”), reference counting garbage collection, and nearly every other feature that makes Python great. The previous item (see Item 95: “Consider ctypes to Rapidly Integrate with Native Libraries”) presented the upsides and downsides of the ctypes built-in module. If you’re looking to provide a Pythonic development experience without compromising on performance or platform-specific capabilities, extension modules are the way to go.

Although creating an extension module is much more complicated than using ctypes, the Python C API helps make it pretty straightforward. To demonstrate, I’ll implement the same dot_product function from before (see Item 94: “Know When and How to Replace Python with Another Programming Language”) but as an extension module. First, I’ll declare the C function that I want the extension to provide:

/* my_extension.h */
#define PY_SSIZE_T_CLEAN
#include <Python.h>

PyObject *dot_product(PyObject *self, PyObject *args);

Then I’ll implement the function using C. The Python API I use to interface with inputs and outputs (e.g., PyList_Size, PyFloat_FromDouble) is extensive, constantly evolving, and—luckily—well documented (see https://docs.python.org/3/extending). This version of the function expects to receive two equally sized lists of floating point numbers and return a single floating point number:

/* dot_product.c */
#include "my_extension.h"

PyObject *dot_product(PyObject *self, PyObject *args)
{
    PyObject *left, *right;
    if (!PyArg_ParseTuple(args, "OO", &left, &right)) {
        return NULL;
    }
    if (!PyList_Check(left) || !PyList_Check(right)) {
        PyErr_SetString(PyExc_TypeError, "Both arguments must be lists");
        return NULL;
    }

    Py_ssize_t left_length = PyList_Size(left);
    Py_ssize_t right_length = PyList_Size(right);
    if (left_length == -1 || right_length == -1) {
        return NULL;
    }
    if (left_length != right_length) {
        PyErr_SetString(PyExc_ValueError, "Lists must be the same length");
        return NULL;
    }

    double result = 0;

    for (Py_ssize_t i = 0; i < left_length; i++) {
        PyObject *left_item = PyList_GET_ITEM(left, i);
        PyObject *right_item = PyList_GET_ITEM(right, i);

        double left_double = PyFloat_AsDouble(left_item);
        double right_double = PyFloat_AsDouble(right_item);
        if (PyErr_Occurred()) {
            return NULL;
        }

        result += left_double * right_double;
    }

    return PyFloat_FromDouble(result);
}

The function is about 40 lines of code, which is four times longer than a simple C implementation that can be called using the ctypes module. There’s also some additional boilerplate code required to configure the extension module and initialize it:

/* init.c */
#include "my_extension.h"

static PyMethodDef my_extension_methods[] = {
    {
        "dot_product",
        dot_product,
        METH_VARARGS,
        "Compute dot product",
    },
    {
        NULL,
        NULL,
        0,
        NULL,
    },
};

static struct PyModuleDef my_extension = {
    PyModuleDef_HEAD_INIT,
    "my_extension",
    "My C-extension module",
    -1,
    my_extension_methods,
};

PyMODINIT_FUNC
PyInit_my_extension(void)
{
    return PyModule_Create(&my_extension);
}

Now I need to compile the C code into a native library that can be dynamically loaded by the CPython interpreter. The simplest way to do this is to define a minimal setup.py configuration file:

# setup.py
from setuptools import Extension, setup

setup(
    name="my_extension",
    ext_modules=[
        Extension(
            name="my_extension",
            sources=["init.c", "dot_product.c"],
        ),
    ],
)

I can use this configuration file, a virtual environment (see Item 117: “Use Virtual Environments for Isolated and Reproducible Dependencies”), and the setuptools package (see Item 116: “Know Where to Find Community-Built Modules”) to properly drive my system’s compiler with the right paths and flags; at the end, I’ll get a native library file that can be imported by Python:

$ python3 -m venv .
$ source bin/activate
$ pip install setuptools
...
$ python3 setup.py develop
...

There are many ways to build Python extension modules and package them up for distribution. Unfortunately, the tools for this are constantly changing. In this example, I’m only focused on getting an extension module working in my local development environment. If you encounter problems or have other use cases, be sure to check the latest documentation from the official Python Packaging Authority (https://www.pypa.io).

After compilation, I can use tests written in Python (see Item 108: “Verify Related Behaviors in TestCase Subclasses”) to verify that the extension module works as expected:

# my_extension_test.py
import unittest
import my_extension

class MyExtensionTest(unittest.TestCase):

    def test_empty(self):
        result = my_extension.dot_product([], [])
        self.assertAlmostEqual(0, result)

    def test_positive_result(self):
        result = my_extension.dot_product(
            [3, 4, 5],
            [-1, 9, -2.5],
        )
        self.assertAlmostEqual(20.5, result)

    ...

if __name__ == "__main__":
    unittest.main()

This was a lot of effort compared to the basic C implementation. And the ergonomics of the interface are about the same as if I used the ctypes module: Both arguments to dot_product need to be lists that contain floats. If this is as far as you’re going to go with the C extension API, then it’s not worth it; you wouldn’t be taking advantage of its most valuable features.

Now I’m going to create another version of this extension module that uses the iterator and number protocols that are provided by the Python API. This is 60 lines of code—50% more than the simpler dot_product function and six times more than the basic C version—but it enables the full set of features that make Python so powerful. By using the PyObject_GetIter and PyIter_Next APIs, the input types can be any kind of iterable container, such as tuples, lists, generators, and so on (see Item 21: “Be Defensive when Iterating over Arguments”). By using the PyNumber_Multiply and PyNumber_Add APIs, the values from the iterators can be any object that properly implements the number special methods (see Item 57: “Inherit from collections.abc Classes for Custom Container Types”):

/* dot_product.c */
#include "my_extension2.h"

PyObject *dot_product(PyObject *self, PyObject *args)
{
    PyObject *left, *right;
    if (!PyArg_ParseTuple(args, "OO", &left, &right)) {
        return NULL;
    }
    PyObject *left_iter = PyObject_GetIter(left);
    if (left_iter == NULL) {
        return NULL;
    }
    PyObject *right_iter = PyObject_GetIter(right);
    if (right_iter == NULL) {
        Py_DECREF(left_iter);
        return NULL;
    }

    PyObject *left_item = NULL;
    PyObject *right_item = NULL;
    PyObject *multiplied = NULL;
    PyObject *result = PyLong_FromLong(0);

    while (1) {
        Py_CLEAR(left_item);
        Py_CLEAR(right_item);
        Py_CLEAR(multiplied);
        left_item = PyIter_Next(left_iter);
        right_item = PyIter_Next(right_iter);

        if (left_item == NULL && right_item == NULL) {
            break;
        } else if (left_item == NULL || right_item == NULL) {
            PyErr_SetString(PyExc_ValueError, "Arguments had unequal length");
            break;
        }

        multiplied = PyNumber_Multiply(left_item, right_item);
        if (multiplied == NULL) {
            break;
        }
        PyObject *added = PyNumber_Add(result, multiplied);
        if (added == NULL) {
            break;
        }
        Py_CLEAR(result);
        result = added;
    }

    Py_CLEAR(left_item);
    Py_CLEAR(right_item);
    Py_CLEAR(multiplied);
    Py_DECREF(left_iter);
    Py_DECREF(right_iter);

    if (PyErr_Occurred()) {
        Py_CLEAR(result);
        return NULL;
    }

    return result;
}

The implementation is further complicated by the need to properly manage object reference counts and the peculiarities of error propagation and reference borrowing. But the result is, perhaps, Python’s holy grail: a module that is both fast and easy to use. Here I show that it works for multiple types of iterables and the Decimal numerical class (see Item 106: “Use decimal when Precision Is Paramount”):

# my_extension2_test.py
import unittest
import my_extension2

class MyExtension2Test(unittest.TestCase):

    def test_decimals(self):
        import decimal

        a = [decimal.Decimal(1), decimal.Decimal(2)]
        b = [decimal.Decimal(3), decimal.Decimal(4)]
        result = my_extension2.dot_product(a, b)
        self.assertEqual(11, result)

    def test_not_lists(self):
        result1 = my_extension2.dot_product(
            (1, 2),
            [3, 4],
        )
        result2 = my_extension2.dot_product(
            [1, 2],
            (3, 4),
        )
        result3 = my_extension2.dot_product(
            range(1, 3),
            range(3, 5),
        )
        self.assertAlmostEqual(11, result1)
        self.assertAlmostEqual(11, result2)
        self.assertAlmostEqual(11, result3)

    ...

if __name__ == "__main__":
    unittest.main()

This level of extensibility and flexibility in an API is what good ergonomics looks like. Achieving the same behaviors with basic C code would require essentially reimplementing the core of the Python interpreter and API. Knowing that, the larger line count of these extension modules seems reasonable, given how much functionality and performance you get in return.

Things to Remember

  • Extension modules are written in C, executing at native speed, and can use the Python API to access nearly all of Python’s powerful features.

  • The peculiarities of the Python API, including memory management and error propagation, can be hard to learn and difficult to get right.

  • The biggest value of a C extension comes from using the Python API’s protocols and built-in data types, which are difficult to replicate in simple C code.

Item 97: Rely on Precompiled Bytecode and File System Caching to Improve Startup Time

Program startup time is an important performance metric to examine because it’s directly observable by users. For command-line utilities, the startup time is how long it takes for a program to begin processing after you’ve pressed the Enter key. Users often execute the same command-line tools repeatedly in rapid succession, so the accumulation of startup delays can feel like a frustrating waste of time. For a web server, the startup time is how long it takes to begin processing the first incoming request after program launch. Web servers often use multiple threads or child processes to parallelize work, which can cause a cold start delay each time a new context is spun up. The end result is that some web requests can take a lot longer than others, annoying users with unpredictable delays.

Unfortunately, Python programs usually have a high startup time compared to programs written in languages that are compiled into machine code executables. There are two main contributors to this slowness in the CPython implementation of Python (see Item 1: “Know Which Version of Python You’re Using”). First, on startup, CPython reads the source code for the program and compiles it into bytecode for its virtual machine interpreter, which requires both I/O and CPU processing (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism” for details). Second, when the bytecode is ready, Python executes all of the modules imported by the main entry point. Loading modules requires running code to initialize global variables and constants, define classes, execute assertion statements, and so on—and it can be a lot of work.

To improve performance, modules are cached in memory after the first time they’re loaded by a program, and they’re reused for subsequent loads (see Item 122: “Know How to Break Circular Dependencies” for details). CPython also saves the bytecode it generates during compilation to a cache on disk so it can be reused for subsequent program starts. The cache is stored in a directory called __pycache__ next to the source code. Each bytecode file has a .pyc suffix and is usually smaller than the corresponding source file.
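The naming convention for these cache files can be inspected from Python itself with the importlib.util.cache_from_source helper, which computes the path CPython would use for a given source file. Here's a small sketch using a hypothetical my_module.py path:

```python
import importlib.util

# Compute where CPython would cache the compiled bytecode for a source
# file; "my_module.py" is a hypothetical path and need not exist.
cache_path = importlib.util.cache_from_source("my_module.py")
print(cache_path)
# The result lands in __pycache__ with an interpreter-specific tag,
# such as __pycache__/my_module.cpython-312.pyc (tag varies by version).
```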

It’s easy to see the effect of bytecode caching in action. For example, here I load all the modules from the Django web framework (https://www.djangoproject.com) and measure how long it takes (specifically, the “real” time). This is a source code–only snapshot of the open source project, and no bytecode cache files are present:

$ time python3 -c 'import django_all'
...
real    0m0.791s
user    0m0.495s
sys     0m0.145s

If I run the same program again with absolutely no modifications, it starts up in 70% less time than before—more than three times faster:

$ time python3 -c 'import django_all'
...
real    0m0.225s
user    0m0.182s
sys     0m0.038s

You might guess that this performance improvement is due to CPython using the bytecode cache the second time the program starts. However, if I remove the .pyc files, performance is surprisingly still better than the first time I executed Python and imported this module:

$ find django -name '*.pyc' -delete
$ time python3 -c 'import django_all'

real    0m0.613s
user    0m0.502s
sys     0m0.101s

This is a good example of how measuring performance is difficult and how the effects of optimizations can be confusing. What caused the subsequent Python invocation without .pyc files to be faster is the filesystem cache of my operating system. The Python source code files are already in memory because I recently accessed them. When I load them again, expensive I/O operations can be shortened, speeding up program start.

I can regenerate the bytecode files by using the compileall built-in module. This is usually done automatically when you install packages with pip (see Item 117: “Use Virtual Environments for Isolated and Reproducible Dependencies”) to minimize startup time. But you can create new bytecode files manually for your own codebase when needed (e.g., before deploying to production):

$ python3 -m compileall django
Listing 'django'...
Compiling 'django/__init__.py'...
Compiling 'django/__main__.py'...
Listing 'django/apps'...
Compiling 'django/apps/__init__.py'...
Compiling 'django/apps/config.py'...
Compiling 'django/apps/registry.py'...
...

Most operating systems provide a way to purge the file system cache, which will cause subsequent I/O operations to go to the physical disk instead of memory. Here I force the cache to be empty (using the purge command) and then re-run the django_all import to see the performance impact of having bytecode files on disk and not in memory:

$ sudo purge
$ time python3 -c 'import django_all'
...
real    0m0.382s
user    0m0.169s
sys     0m0.085s

This startup time (382 milliseconds) is faster than having no bytecode and an empty filesystem cache (791 milliseconds) and faster than having no bytecode and the source code cached in memory (613 milliseconds), but it is slower than having both bytecode and source code in memory (225 milliseconds). Ultimately, you’ll get the best startup performance by ensuring that your bytecode is precompiled and in the filesystem cache. For this reason, it can be valuable to put Python programs on a RAM disk so they are always in memory, regardless of access patterns. The effect of this will be even more pronounced when your computer has a spinning disk; I used an SSD (solid-state drive) in these tests. When caching in memory is not possible, other approaches to reduce startup time might be valuable (see Item 98: “Lazy-Load Modules with Dynamic Imports to Reduce Startup Time”).

Finally, you might wonder if it’s even faster to run a Python program without the source code files present at all, since it appears that the bytecode cache is all that CPython needs to execute a program. Indeed, this is possible. The trick is to use the -b flag when generating the bytecode, which causes the individual .pyc files to be placed next to source code instead of in __pycache__ directories. Here I modify the django package accordingly and then test the speed of importing the django_all module again:

$ find django -name '*.pyc' -delete
$ python3 -m compileall -b django
$ find django -name '*.py' -delete
$ time python3 -c 'import django_all'
...
real    0m0.226s
user    0m0.183s
sys     0m0.037s

The startup time in this case (226 milliseconds) is almost exactly the same as when the source code files were also present. Thus, there’s no value in deleting the source code unless you have other constraints you need to satisfy, such as minimizing overall storage or system memory use (see Item 125: “Prefer Open Source Projects for Bundling Python Programs over zipimport and zipapp” for an example).

Things to Remember

  • The CPython implementation of Python compiles a program’s source files into bytecode that is then executed in a virtual machine.

  • Bytecode is cached to disk, which enables subsequent runs of the same program, or loads of the same module, to avoid compiling bytecode again.

  • With CPython, the best performance for program startup time is achieved when the program’s bytecode files are generated in advance and they’re already cached in operating system memory.

Item 98: Lazy-Load Modules with Dynamic Imports to Reduce Startup Time

The previous item investigated how Python program initialization can be slow and considered ways to improve performance (see Item 97: “Rely on Precompiled Bytecode and File System Caching to Improve Startup Time”). After following those best practices and further optimizing (see Item 92: “Profile Before Optimizing”), your Python programs might still feel like they take too long to start. Fortunately, there is one more technique to try: dynamic imports.

For example, imagine that I’m building an image processing tool with two features. The first module adjusts an image’s brightness and contrast based on user-supplied settings:

# adjust.py
# Fast initialization
...

def do_adjust(path, brightness, contrast):
    ...

The second module intelligently enhances an image a given amount. To demonstrate a realistic situation, I’m going to assume that this requires loading a large native image processing library and thus initializes slowly:

# enhance.py
# Very slow initialization
...

def do_enhance(path, amount):
    ...

I can make these functions available as a command-line utility with the argparse built-in module. Here I use the add_subparsers feature to require a different set of flags, depending on the user-specified command. The adjust command accepts the --brightness and --contrast flags, and the enhance command only needs the --amount flag:

# parser.py
import argparse

PARSER = argparse.ArgumentParser()
PARSER.add_argument("file")

sub_parsers = PARSER.add_subparsers(dest="command")

enhance_parser = sub_parsers.add_parser("enhance")
enhance_parser.add_argument("--amount", type=float)

adjust_parser = sub_parsers.add_parser("adjust")
adjust_parser.add_argument("--brightness", type=float)
adjust_parser.add_argument("--contrast", type=float)

In the main function, I parse the arguments and then call into the enhance and adjust modules accordingly:

# mycli.py
import adjust
import enhance
import parser

def main():
    args = parser.PARSER.parse_args()

    if args.command == "enhance":
        enhance.do_enhance(args.file, args.amount)
    elif args.command == "adjust":
        adjust.do_adjust(args.file, args.brightness, args.contrast)
    else:
        raise RuntimeError("Not reachable")

if __name__ == "__main__":
    main()

Although the functionality here works well, the program is slow. For example, here I run the enhance command and observe that it takes over 1 second to complete:

$ time python3 ./mycli.py my_file.jpg enhance --amount 0.8
...
real    0m1.089s
user    0m0.035s
sys     0m0.022s

The cause of this long execution time is probably the dependency on the large native image processing library in the enhance module. I don’t use that library for the adjust command, so I expect that it will go faster:

$ time python3 ./mycli.py my_file.jpg adjust --brightness .3 --contrast -0.1
...
real    0m1.064s
user    0m0.040s
sys     0m0.016s

Unfortunately, it appears that the enhance and adjust commands are similarly slow. Upon closer inspection, I see that the problem is that I’m importing all of the modules that might ever be used by the command-line tool at the top of the main module, in keeping with PEP 8 style (see Item 2: “Follow the PEP 8 Style Guide”). On startup, my program pays the computational cost of preparing all functionality even when only part of it is actually used.

The CPython implementation of Python (see Item 1: “Know Which Version of Python You’re Using”) supports the -X importtime flag, which directly measures the performance of module loading. Here I use it to diagnose the slowness of my command-line tool:

$ python3 -X importtime mycli.py
import time: self [us] | cumulative | imported package
...
import time:       553 |        553 | adjust
import time:   1005348 |    1005348 | enhance
...
import time:      3347 |      14762 | parser

The self column shows how much time (in microseconds) each module took to execute all of its global statements, excluding imports. The cumulative column shows how much time it took to load each module, including all of its dependencies. Clearly, the enhance module is the culprit.

One solution is to delay importing dependencies until you actually need to use them. This is possible because Python supports importing modules at runtime—inside functions—in addition to using module-scoped import statements at program startup (see Item 122: “Know How to Break Circular Dependencies” for another use of this dynamism). Here I modify the command-line tool to only import the adjust module or the enhance module inside the main function after the command is dispatched:

# mycli_faster.py
import parser

def main():
    args = parser.PARSER.parse_args()

    if args.command == "enhance":
        import enhance  # Changed

        enhance.do_enhance(args.file, args.amount)
    elif args.command == "adjust":
        import adjust   # Changed

        adjust.do_adjust(args.file, args.brightness, args.contrast)
    else:
        raise RuntimeError("Not reachable")

if __name__ == "__main__":
    main()

With this modification in place, the adjust command runs very quickly because the initialization of enhance can be skipped:

$ time python3 ./mycli_faster.py my_file.jpg adjust --brightness .3 --contrast -0.1
...
real    0m0.049s
user    0m0.032s
sys     0m0.013s

The enhance command remains as slow as before:

$ time python3 ./mycli_faster.py my_file.jpg enhance --amount 0.8
...
real    0m1.059s
user    0m0.036s
sys     0m0.014s

I can also use -X importtime to confirm that the adjust and enhance modules are not loaded when no command is specified:

$ time python3 -X importtime ./mycli_faster.py -h
import time: self [us] | cumulative | imported package
...
import time:      1118 |       6015 | parser
...
real    0m0.049s
user    0m0.032s
sys     0m0.013s

Lazy-loading modules works great for command-line tools like this that carry out a single task to completion and then terminate. But what if I need to reduce the latency of cold starts in a web application? Ideally, the cost of loading the enhance module wouldn’t be incurred until the feature is actually requested by a user. Luckily, the same approach also works inside request handlers. Here I create a Flask web application with one handler for each feature that dynamically imports the corresponding module:

# server.py
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/adjust", methods=["GET", "POST"])
def do_adjust():
    if request.method == "POST":
        the_file = request.files["the_file"]
        brightness = request.form["brightness"]
        contrast = request.form["contrast"]
        import adjust   # Dynamic import

        return adjust.do_adjust(the_file, brightness, contrast)
    else:
        return render_template("adjust.html")

@app.route("/enhance", methods=["GET", "POST"])
def do_enhance():
    if request.method == "POST":
        the_file = request.files["the_file"]
        amount = request.form["amount"]
        import enhance  # Dynamic import

        return enhance.do_enhance(the_file, amount)
    else:
        return render_template("enhance.html")

When the do_enhance request handler is executed in the Python process for the first time, it imports the enhance module and pays a high initialization cost of 1 second. On subsequent calls to do_enhance, the import enhance statement will cause the Python process to merely verify that the module has already been loaded and then assign the enhance identifier in the local scope to the corresponding module object.
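A small sketch can make this caching behavior concrete; here the json built-in module stands in for enhance:

```python
import sys

import json  # First import: executes the module body and caches it

# Loaded modules are registered in the sys.modules cache
assert "json" in sys.modules
first = sys.modules["json"]

import json  # Repeat import: just a cache check and a name binding

assert json is first  # Same module object; no re-initialization cost
```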

You might assume that the cost of a dynamic import statement is high, but it’s actually not too bad. Here I use the timeit built-in module (see Item 93: “Optimize Performance-Critical Code Using timeit Microbenchmarks”) to measure how much time it takes to dynamically import a previously imported module:

# import_perf.py
import timeit

trials = 10_000_000

result = timeit.timeit(
    setup="import enhance",
    stmt="import enhance",
    globals=globals(),
    number=trials,
)

print(f"{result/trials * 1e9:2.1f} nanos per call")

>>>
52.8 nanos per call

To put this overhead (52 nanoseconds) in perspective, here’s another example that replaces the dynamic import statement with a lock-protected global variable (which is a common way to prevent multiple threads from dog-piling during program initialization):

# global_lock_perf.py
import timeit
import threading

trials = 100_000_000

initialized = False
initialized_lock = threading.Lock()

result = timeit.timeit(
    stmt="""
global initialized
# Speculatively check without the lock
if not initialized:
    with initialized_lock:
        # Double check after holding the lock
        if not initialized:
            # Do expensive initialization
            ...
            initialized = True
""",
    globals=globals(),
    number=trials,
)

print(f"{result/trials * 1e9:2.1f} nanos per call")

>>>
5.5 nanos per call

The dynamic import version takes roughly ten times longer than the approach using globals. Without any lock contention—which is the common case due to the speculative if not initialized statement—the global variable version runs about as quickly as adding two integers together in Python. But the dynamic import version is much simpler code that doesn’t require any boilerplate. You wouldn’t want a dynamic import to be present in CPU-bound code like a kernel function’s inner loop (see Item 94: “Know When and How to Replace Python with Another Programming Language” for details), but it seems reasonable to do it once per web request.
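If you do want the globals-based approach without a module boundary, the double-checked locking idiom from the benchmark can be wrapped in a small helper function; this is a sketch where expensive_setup and get_resource are hypothetical names for illustration:

```python
import threading

_resource = None
_resource_lock = threading.Lock()

def expensive_setup():
    # Placeholder for costly one-time work (e.g., loading a model)
    return object()

def get_resource():
    global _resource
    # Speculatively check without the lock (the fast path)
    if _resource is None:
        with _resource_lock:
            # Double check after acquiring the lock
            if _resource is None:
                _resource = expensive_setup()
    return _resource

assert get_resource() is get_resource()  # Initialized exactly once
```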

Things to Remember

  • The CPython -X importtime flag causes a Python program to print out how much time it takes to load imported modules and their dependencies, making it easy to diagnose the cause of slow startup times.

  • Modules can be imported dynamically inside a function, which makes it possible to delay the expensive initialization of dependencies until their functionality actually needs to be used.

  • The overhead of dynamically importing a module and checking that it was previously loaded is on the order of ten integer addition operations, making it well worth the incremental cost if it can significantly improve the latency of cold starts.

Item 99: Consider memoryview and bytearray for Zero-Copy Interactions with bytes

Although Python isn’t able to parallelize CPU-bound computation without extra effort (see Item 79: “Consider concurrent.futures for True Parallelism” and Item 94: “Know When and How to Replace Python with Another Programming Language”), it is able to support high-throughput, parallel I/O in a variety of ways (see Item 68: “Use Threads for Blocking I/O; Avoid for Parallelism” and Item 75: “Achieve Highly Concurrent I/O with Coroutines”). That said, it’s surprisingly easy to use I/O tools the wrong way and reach the conclusion that the language is too slow for even I/O-bound workloads.

For example, say that I'm building a media server to stream television or movies over a network to users so they can watch without having to download the video data in advance. One of the key features of such a system is the ability for users to move forward or backward in the video playback so they can skip or repeat parts. In the client program, I can implement this by requesting a chunk of data from the server corresponding to the new time index selected by the user:

def timecode_to_index(video_id, timecode):
    ...
    # Returns the byte offset in the video data

def request_chunk(video_id, byte_offset, size):
    ...
    # Returns size bytes of video_id's data from the offset

video_id = ...
timecode = "01:09:14:28"
byte_offset = timecode_to_index(video_id, timecode)
size = 20 * 1024 * 1024
video_data = request_chunk(video_id, byte_offset, size)

How would you implement the server-side handler that receives the request_chunk request and returns the corresponding 20 MB chunk of video data? For the sake of this example, I assume that the command and control parts of the server have already been hooked up (see Item 76: “Know How to Port Threaded I/O to asyncio” for what that requires). I focus here on the last steps, where the requested chunk is extracted from gigabytes of video data that’s cached in memory and is then sent over a socket back to the client. Here’s what the implementation would look like:

socket = ...             # socket connection to client
video_data = ...         # bytes containing data for video_id
byte_offset = ...        # Requested starting position
size = 20 * 1024 * 1024  # Requested chunk size

chunk = video_data[byte_offset : byte_offset + size]
socket.send(chunk)

The latency and throughput of this code come down to two factors: how much time it takes to slice the 20 MB video chunk from video_data and how much time the socket takes to transmit that data to the client. If I assume that the socket is infinitely fast, I can run a microbenchmark by using the timeit built-in module to understand the performance characteristics of slicing bytes instances this way to create chunks (see Item 14: “Know How to Slice Sequences” for background):

import timeit

def run_test():
    chunk = video_data[byte_offset : byte_offset + size]
    # Would call socket.send(chunk) here; omitted for the benchmark

result = (
    timeit.timeit(
        stmt="run_test()",
        globals=globals(),
        number=100,
    )
    / 100
)

print(f"{result:0.9f} seconds")

>>>
0.004925669 seconds

Here it took roughly 5 milliseconds to extract the 20 MB slice of data to transmit to the client. That means the overall throughput of my server is limited to a theoretical maximum of 20 MB / 5 milliseconds = 4 GB/second, since that’s the fastest I can extract the video data from memory. My server will also be limited to 1 CPU-second / 5 milliseconds = 200 clients requesting new chunks in parallel, which is tiny compared to the tens of thousands of simultaneous connections that tools like the asyncio built-in module can support. The problem is that slicing a bytes instance causes the underlying data to be copied, which takes CPU time.

A better way to write this code is by using Python’s built-in memoryview type, which exposes CPython’s high-performance buffer protocol to programs. The buffer protocol is a low-level C API that allows the Python runtime and C extensions (see Item 96: “Consider Extension Modules to Maximize Performance and Ergonomics”) to access the underlying data buffers that are behind objects like bytes instances. The best part about memoryview instances is that slicing them results in another memoryview instance without copying the underlying data. Here I create a memoryview instance that wraps a bytes instance and inspect a slice of it:

data = b"shave and a haircut, two bits"
view = memoryview(data)
chunk = view[12:19]
print(chunk)
print("Size:           ", chunk.nbytes)
print("Data in view:   ", chunk.tobytes())
print("Underlying data:", chunk.obj)

>>>
<memory at 0x105407940>
Size:            7
Data in view:    b'haircut'
Underlying data: b'shave and a haircut, two bits'
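You can confirm that slicing a memoryview didn’t duplicate anything by checking its obj attribute, which still refers to the very same bytes object that was wrapped; only an explicit call to tobytes makes a copy:

```python
data = b"shave and a haircut, two bits"
view = memoryview(data)
chunk = view[12:19]

# The slice still points at the original bytes object, so no
# underlying data was copied when slicing
assert chunk.obj is data

# tobytes() is what actually materializes a copy of the bytes
copied = chunk.tobytes()
assert copied == b"haircut"
```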

By enabling zero-copy operations, memoryview can provide enormous speedups for code that needs to quickly process large amounts of memory, such as numerical C-extensions like NumPy and I/O-bound programs like this one. Here I replace the simple bytes slicing above with memoryview slicing instead and repeat the same microbenchmark:

video_view = memoryview(video_data)

def run_test():
    chunk = video_view[byte_offset : byte_offset + size]
    # Would call socket.send(chunk) here; omitted for the benchmark

result = (
    timeit.timeit(
        stmt="run_test()",
        globals=globals(),
        number=100,
    )
    / 100
)

print(f"{result:0.9f} seconds")

>>>
0.000000250 seconds

The result is 250 nanoseconds. Now the theoretical maximum throughput of my server is 20 MB / 250 nanoseconds = 80 TB/second. For parallel clients, I can theoretically support up to 1 CPU-second / 250 nanoseconds = 4 million. That’s more like it! This means that now my program is entirely bound by the underlying performance of the socket connection to the client and not by CPU constraints.

Now imagine that the data must flow in the other direction, where some clients are sending live video streams to the server in order to broadcast them to other users. In order to handle this, I need to store the latest video data from the user in a cache that other clients can read from. Here’s what the implementation of reading 1 MB of new data from the incoming client would look like:

socket = ...        # socket connection to the client
video_cache = ...   # Cache of incoming video stream
byte_offset = ...   # Incoming buffer position
size = 1024 * 1024  # Incoming chunk size

chunk = socket.recv(size)
video_view = memoryview(video_cache)
before = video_view[:byte_offset]
after = video_view[byte_offset + size :]
new_cache = b"".join([before, chunk, after])

The socket.recv method returns a bytes instance. I can splice the new data with the existing cache at the current byte_offset by using simple slicing operations and the bytes.join method. To understand the performance of this, I can run another microbenchmark. I’m using a dummy socket implementation, so the performance test is only for the memory operations, not the I/O interaction:

def run_test():
    chunk = socket.recv(size)
    before = video_view[:byte_offset]
    after = video_view[byte_offset + size :]
    new_cache = b"".join([before, chunk, after])

result = (
    timeit.timeit(
        stmt="run_test()",
        globals=globals(),
        number=100,
    )
    / 100
)

print(f"{result:0.9f} seconds")

>>>
0.033520550 seconds

It takes 33 milliseconds to receive 1 MB and update the video cache. This means my maximum receive throughput is 1 MB / 33 milliseconds = 31 MB/second, and I’m limited to 31 MB / 1 MB = 31 simultaneous clients streaming in video data this way. This doesn’t scale.

A better way to write this code is to use Python’s built-in bytearray type in conjunction with memoryview. One limitation with bytes instances is that they are read-only and don’t allow for individual indexes to be updated:

my_bytes = b"hello"
my_bytes[0] = 0x79

>>>
Traceback ...
TypeError: 'bytes' object does not support item assignment

The bytearray type is like a mutable version of bytes that allows for arbitrary positions to be overwritten. bytearray uses integers for its values instead of bytes:

my_array = bytearray(b"hello")
my_array[0] = 0x79
print(my_array)

>>>
bytearray(b'yello')

A memoryview object can also be used to wrap a bytearray instance. When you slice a memoryview, the resulting object can be used to assign data to a particular portion of the underlying buffer. This eliminates the copying costs from above that were required to splice the bytes instances back together after data was received from the client:

my_array = bytearray(b"row, row, row your boat")
my_view = memoryview(my_array)
write_view = my_view[3:13]
write_view[:] = b"-10 bytes-"
print(my_array)

>>>
bytearray(b'row-10 bytes- your boat')

Many library methods in Python, such as socket.recv_into and RawIOBase.readinto, use the buffer protocol to receive or read data quickly. The benefit of these methods is that they avoid allocating memory and creating another copy of the data; what’s received goes straight into an existing buffer. Here I use socket.recv_into along with a memoryview slice to receive data into an underlying bytearray without the need for any splicing:

video_array = bytearray(video_cache)
write_view = memoryview(video_array)
chunk = write_view[byte_offset : byte_offset + size]
socket.recv_into(chunk)

I can run another microbenchmark to compare the performance of this approach to the earlier example that used socket.recv:

def run_test():
    chunk = write_view[byte_offset : byte_offset + size]
    socket.recv_into(chunk)

result = (
    timeit.timeit(
        stmt="run_test()",
        globals=globals(),
        number=100,
    )
    / 100
)

print(f"{result:0.9f} seconds")

>>>
0.000033925 seconds

It took 33 microseconds to receive a 1 MB video transmission. This means my server can support 1 MB / 33 microseconds = 31 GB/second of max throughput, and 31 GB / 1 MB = 31,000 parallel streaming clients. That’s the type of scalability I’m looking for!
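The same zero-copy receive pattern works for any source that supports readinto; here’s a sketch using io.BytesIO as a stand-in for the socket, filling a slice of a preallocated bytearray directly without allocating an intermediate bytes object:

```python
import io

# Stand-in for a socket or file delivering incoming video data
source = io.BytesIO(b"x" * (1024 * 1024))

buffer = bytearray(1024 * 1024)  # Preallocated receive buffer
write_view = memoryview(buffer)

# readinto fills the existing buffer through the memoryview slice;
# no new bytes object is created for the received data
count = source.readinto(write_view[0:1024])
assert count == 1024
assert buffer[:4] == bytearray(b"xxxx")
```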

Things to Remember

  • The memoryview built-in type provides a zero-copy interface for reading and writing slices of objects that support Python’s high-performance buffer protocol.

  • The bytearray built-in type provides a mutable bytes-like type that can be used for zero-copy data reads with functions like socket.recv_into.

  • A memoryview can wrap a bytearray, allowing for received data to be spliced into an arbitrary buffer location without copying costs.