This chapter describes Python’s expressions, operators, and evaluation rules related to data manipulation. Expressions are at the heart of performing useful computations. Moreover, third-party libraries can customize Python’s behavior to provide a better user experience. This chapter describes expressions at a high level. Chapter 4 describes the underlying protocols that can be used to customize the behavior of the interpreter.
A literal is a value typed directly into a program such as 42, 4.2, or 'forty-two'.
Integer literals represent a signed integer value of arbitrary size. It’s possible to specify integers in binary, octal, or hexadecimal:
42
0b101010    # Binary integer
0o52        # Octal integer
0x2a        # Hexadecimal integer
The base is not stored as part of the integer value. All of the above literals will display as 42 if printed. You can use the built-in functions bin(x), oct(x), or hex(x) to convert an integer to a string representing its value in different bases.
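As a brief illustration of these conversions, the sketch below converts one integer to strings in several bases and back again. The variable names are chosen only for this example.

```python
n = 0b101010            # same value as 42, 0o52, and 0x2a

# Convert the integer to strings in different bases
as_binary = bin(n)      # '0b101010'
as_octal = oct(n)       # '0o52'
as_hex = hex(n)         # '0x2a'

# int() with an explicit base goes the other way, from string to integer
from_hex = int('2a', 16)
from_binary = int('0b101010', 2)   # the 0b prefix is accepted when base is 2

# format() produces the digits without a prefix
digits_only = format(n, 'b')       # '101010'
```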
Floating-point numbers can be written by adding a decimal point or by using scientific notation where an e or E specifies an exponent. All of the following are floating-point numbers:
4.2
42.
.42
4.2e+2
4.2E2
-4.2e-2
Internally, floating-point numbers are stored as IEEE 754 double-precision (64-bit) values.
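One practical consequence of the binary double-precision format is that many decimal fractions cannot be represented exactly. The following sketch shows the classic case and one common way to compare floats with a tolerance. The variable names are illustrative only.

```python
import math
import sys

total = 0.1 + 0.2
exact_match = (total == 0.3)              # False: neither side is exactly representable
close_enough = math.isclose(total, 0.3)   # True: compares within a relative tolerance

# Properties of the double-precision format
mantissa_bits = sys.float_info.mant_dig   # 53 on IEEE 754 platforms
largest = sys.float_info.max
```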
In numeric literals, a single underscore ( _ ) can be used as a visual separator between digits. For example:
123_456_789
0x1234_5678
0b111_00_101
123.789_012
The digit separator is not stored as part of the number—it’s only used to make large numeric literals easier to read in source code.
Boolean literals are written as True and False.
String literals are written by enclosing characters with single, double, or triple quotes. Single- and double-quoted strings must appear on the same line. Triple-quoted strings can span multiple lines. For example:
'hello world'
"hello world"
'''hello world'''
"""hello world"""
Tuple, list, set, and dictionary literals are written as follows:
(1, 2, 3)                      # tuple
[1, 2, 3]                      # list
{1, 2, 3}                      # set
{'x':1, 'y':2, 'z': 3}         # dict
An expression represents a computation that evaluates to a concrete value. It consists of a combination of literals, names, operators, and function or method calls. An expression can always appear on the right-hand side of an assignment statement, be used as an operand in operations in other expressions, or be passed as a function argument. For example:
value = 2 + 3 * 5 + sqrt(6+7)
Operators, such as + (addition) or * (multiplication), represent an operation performed on objects provided as operands. sqrt() is a function that is applied to input arguments.
The left-hand side of an assignment represents a location where a reference to an object is stored. That location, as shown in the previous example, might be a simple identifier such as value. It could also be an attribute of an object or an index within a container. For example:
a = 4 + 2
b[1] = 4 + 2
c['key'] = 4 + 2
d.value = 4 + 2
Reading a value back from a location is also an expression. For example:
value = a + b[1] + c['key']
The assignment of a value and the evaluation of an expression are separate concepts. In particular, you can’t include the assignment operator as part of an expression:
while line=file.readline():    # Syntax Error.
    print(line)
However, an “assignment expression” operator (:=) can be used to perform this combined action of expression evaluation and assignment. For example:
while (line:=file.readline()):
    print(line)
The := operator is usually used in combination with statements such as if and while. In fact, using it as a normal assignment operator results in a syntax error unless you put parentheses around it.
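The same pattern works with if. The sketch below is a minimal illustration that uses a regular expression match; the text and pattern are made up for this example and are not from the chapter.

```python
import re

text = 'shares=100'

# Assign and test in one step; m is available inside the block
if (m := re.match(r'(\w+)=(\d+)', text)):
    name, amount = m.group(1), int(m.group(2))
else:
    name, amount = None, 0

# As a standalone statement, := must be parenthesized
(count := 10)      # allowed
# count := 10      # SyntaxError if uncommented
```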
Python objects can be made to work with any of the operators in Table 2.1.
Table 2.1 Standard Operators
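| Operation | Description |
|---|---|
| x + y | Addition |
| x - y | Subtraction |
| x * y | Multiplication |
| x / y | Division |
| x // y | Truncating (floor) division |
| x @ y | Matrix multiplication |
| x ** y | Power (x raised to the power y) |
| x % y | Modulo (x mod y) |
| x << y | Left shift |
| x >> y | Right shift |
| x & y | Bitwise and |
| x \| y | Bitwise or |
| x ^ y | Bitwise xor (exclusive or) |
| ~x | Bitwise negation |
| -x | Unary minus |
| +x | Unary plus |
| abs(x) | Absolute value |
| divmod(x, y) | Returns (x // y, x % y) |
| pow(x, y [, modulo]) | Returns (x ** y) % modulo |
| round(x, [n]) | Rounds to the nearest multiple of 10**(-n) |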
Usually these have a numeric interpretation. However, there are notable special cases. For example, the + operator is also used to concatenate sequences, the * operator replicates sequences, - is used for set differences, and % performs string formatting:
[1,2,3] + [4,5]     # [1,2,3,4,5]
[1,2,3] * 4         # [1,2,3,1,2,3,1,2,3,1,2,3]
'%s has %d messages' % ('Dave', 37)
Checking of operators is a dynamic process. Operations involving mixed data types will often “work” if there is an intuitive sense for the operation to work. For example, you can add integers and fractions:
>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> b = 5
>>> a + b
Fraction(17, 3)
>>>
However, it’s not always foolproof. For example, it doesn’t work with decimals.
>>> from decimal import Decimal
>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> b = Decimal('5')
>>> a + b
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'Fraction' and 'decimal.Decimal'
>>>
For most combinations of numbers, however, Python follows a standard numeric hierarchy of Booleans, integers, fractions, floats, and complex numbers. Mixed-type operations will simply work—you don’t have to worry about it.
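The following sketch walks up that hierarchy one step at a time. In each case the result takes the type that appears later in the hierarchy. The variable names are illustrative.

```python
from fractions import Fraction

r1 = True + 1                 # bool + int -> int
r2 = 1 + Fraction(1, 2)       # int + Fraction -> Fraction
r3 = Fraction(1, 2) + 0.25    # Fraction + float -> float
r4 = 0.75 + 2j                # float + complex -> complex

kinds = [type(r).__name__ for r in (r1, r2, r3, r4)]
# kinds is ['int', 'Fraction', 'float', 'complex']
```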
Python provides the “in-place” or “augmented” assignment operations in Table 2.2.
Table 2.2 Augmented Assignment Operators
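| Operation | Description |
|---|---|
| x += y | x = x + y |
| x -= y | x = x - y |
| x *= y | x = x * y |
| x /= y | x = x / y |
| x //= y | x = x // y |
| x **= y | x = x ** y |
| x %= y | x = x % y |
| x @= y | x = x @ y |
| x &= y | x = x & y |
| x \|= y | x = x \| y |
| x ^= y | x = x ^ y |
| x >>= y | x = x >> y |
| x <<= y | x = x << y |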
These are not considered to be expressions. Instead, they are a syntactic convenience for updating a value in place. For example:
a = 3
a = a + 1     # a = 4
a += 1        # a = 5
Mutable objects can use these operators to perform an in-place mutation of the data as an optimization. Consider this example:
>>> a = [1, 2, 3]
>>> b = a             # Creates a new reference to a
>>> a += [4, 5]       # In-place update (doesn't create a new list)
>>> a
[1, 2, 3, 4, 5]
>>> b
[1, 2, 3, 4, 5]
>>>
In this example, a and b are references to the same list. When a += [4, 5] is performed, it updates the list object in place without creating a new list. Thus, b also sees this update. This is often surprising.
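For contrast, the sketch below shows two cases where no shared object is mutated: writing the update as a = a + ... and using += on an immutable tuple. Both rebind the name to a new object and leave the other reference alone.

```python
a = [1, 2, 3]
b = a
a = a + [4, 5]      # builds a new list and rebinds a; b is untouched
# a is [1, 2, 3, 4, 5], b is [1, 2, 3]

# Tuples are immutable, so += falls back to creating a new object
t = (1, 2, 3)
u = t
t += (4, 5)
# t is (1, 2, 3, 4, 5), u is still (1, 2, 3)
```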
The equality operator (x == y) tests the values of x and y for equality. In the case of lists and tuples, they must be of equal size, have equal elements, and be in the same order. For dictionaries, True is returned only if x and y have the same set of keys and all the objects with the same key have equal values. Two sets are equal if they have the same elements.
An equality comparison between objects of incompatible types, such as a file and a floating-point number, does not trigger an error but returns False. However, sometimes a comparison between objects of different types will produce True. For example, comparing an integer and a floating-point number of the same value:
>>> 2 == 2.0
True
>>>
The identity operators (x is y and x is not y) test two values to see whether they refer to literally the same object in memory (i.e., id(x) == id(y)). In general, it may be the case that x == y, but x is not y. For example:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
>>> a == b
True
>>>
In practice, comparing objects with the is operator is almost never what you want. Use the == operator for all comparisons unless you have a good reason to expect the two objects to have the same identity.
The ordered comparison operators in Table 2.3 have the standard mathematical interpretation for numbers. They return a Boolean value.
Table 2.3 Ordered Comparison Operators
| Operation | Description |
|---|---|
| x < y | Less than |
| x > y | Greater than |
| x >= y | Greater than or equal to |
| x <= y | Less than or equal to |
For sets, x < y tests if x is a strict subset of y (i.e., every element of x is also in y, but x is not equal to y).
When comparing two sequences, the first elements of each sequence are compared. If they differ, this determines the result. If they’re the same, the comparison moves to the second element of each sequence. This process continues until two different elements are found or no more elements exist in either of the sequences. If the end of both sequences is reached, the sequences are considered equal. If the end of a is reached first, so that a is a proper prefix of b, then a < b.
Strings and bytes are compared using lexicographical ordering. Each character is assigned a unique numerical index determined by the character set (such as ASCII or Unicode). A character is less than another character if its index is less.
Not all types support the ordered comparisons. For example, trying to use < on dictionaries is undefined and results in a TypeError. Similarly, applying ordered comparisons to incompatible types (such as a string and a number) will result in a TypeError.
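The comparison rules for sequences and strings described above can be checked with a few small cases. This is only a sketch; the values are arbitrary.

```python
# Element-by-element comparison; the first difference decides
r1 = [1, 2, 3] < [1, 2, 4]     # True  (3 < 4)
r2 = [1, 2] < [1, 2, 0]        # True  ([1, 2] is a proper prefix)
r3 = (2,) > (1, 99, 99)        # True  (2 > 1, later items are ignored)
r4 = 'apple' < 'apricot'       # True  ('p' < 'r' at index 2)
r5 = 'Zebra' < 'apple'         # True  (uppercase code points come first)

# Unsupported combinations raise TypeError
try:
    'abc' < 42
except TypeError as err:
    message = str(err)
```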
The and, or, and not operators can form complex Boolean expressions. The behavior of these operators is shown in Table 2.4.
Table 2.4 Logical Operators
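| Operator | Description |
|---|---|
| x or y | If x is false, return y; otherwise, return x. |
| x and y | If x is false, return x; otherwise, return y. |
| not x | If x is false, return True; otherwise, return False. |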
When you use an expression to determine a true or false value, True, any nonzero number, a nonempty string, list, tuple, or dictionary is taken to be true. False, zero, None, and empty lists, tuples, and dictionaries evaluate as false.
Boolean expressions are evaluated from left to right and consume the right operand only if it’s needed to determine the final value. For example, a and b evaluates b only if a is true. This is known as short-circuit evaluation. It can be useful to simplify code involving a test and a subsequent operation. For example:
if y != 0:
    result = x / y
else:
    result = 0

# Alternative
result = y and x / y
In the second version, the x / y division is only performed if y is nonzero.
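It is worth noting that and and or return one of their operands rather than a strict Boolean, which is why y and x / y produces y itself (0 or 0.0) when y is zero. A short sketch with arbitrary values:

```python
r1 = 0 or 'default'        # 'default'  (0 is false, so the right operand is returned)
r2 = 'value' or 'default'  # 'value'    (left operand is true, right is never evaluated)
r3 = [] and 1 / 0          # []         (left operand is false, so 1 / 0 never runs)
r4 = [1] and 'done'        # 'done'
r5 = not []                # True       (not always returns a Boolean)

x, y = 5, 0.0
result = y and x / y       # 0.0, the value of y, not the integer 0
```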
Relying on implicit “truthiness” of objects may lead to difficult-to-find bugs. For example, consider this function:
def f(x, items=None):
    if not items:
        items = []
    items.append(x)
    return items
This function has an optional argument that, if not given, causes a new list to be created and returned. For example:
>>> f(4)
[4]
>>>
However, the function has really strange behavior if you give it an existing empty list as an argument:
>>> a = []
>>> f(3, a)
[3]
>>> a         # Notice how a did NOT update
[]
>>>
This is a truth-checking bug. Empty lists evaluate to False so the code created a new list instead of using the one (a) that was passed in as an argument. To fix this, you need to be more precise in your checking against None:
def f(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items
It’s always good practice to be precise when writing conditional checks.
A common programming pattern is assigning a value conditionally based on the result of an expression. For example:
if a <= b:
    minvalue = a
else:
    minvalue = b
This code can be shortened using a conditional expression. For example:
minvalue = a if a <= b else b
In such expressions, the condition in the middle is evaluated first. The expression to the left of the if is then evaluated if the result is True. Otherwise, the expression after the else is evaluated. The else clause is always required.
Iteration is an important Python feature supported by all of Python’s containers (lists, tuples, dicts, and so on), files, as well as generator functions. The operations in Table 2.5 can be applied to any iterable object s.
Table 2.5 Operations on Iterables
| Operation | Description |
|---|---|
| for vars in s: | Iteration |
| v1, v2, ... = s | Variable unpacking |
| x in s, x not in s | Membership |
| [a, *s, b], (a, *s, b), {a, *s, b} | Expansion in list, tuple, or set literals |
The most essential operation on an iterable is the for loop. This is how you iterate through the values one by one. All the other operations build upon this.
The x in s operator tests whether the object x appears as one of the items produced by the iterable s and returns True or False. The x not in s operator is the same as not (x in s). For strings, the in and not in operators accept substrings. For example, 'hello' in 'hello world' produces True. Note that the in operator does not support wildcards or any kind of pattern matching.
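A few membership tests illustrate these rules, including the lack of wildcard support. When pattern matching is needed, the standard re module is one option. The strings here are arbitrary examples.

```python
import re

r1 = 3 in [1, 2, 3]                   # True
r2 = 'lo wo' in 'hello world'         # True: substring match
r3 = 'h*o' in 'hello world'           # False: * is a literal character here
r4 = 'x' in {'x': 1, 'y': 2}          # True: dictionaries test keys
r5 = 4 in (n * n for n in range(5))   # True: scans the generator until found

# Pattern matching requires a separate tool such as re
r6 = re.search(r'h.*o', 'hello world') is not None   # True
```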
Any object supporting iteration can have its values unpacked into a series of locations. For example:
items = [ 3, 4, 5 ]
x, y, z = items       # x = 3, y = 4, z = 5

letters = "abc"
x, y, z = letters     # x = 'a', y = 'b', z = 'c'
The locations on the left don’t have to be simple variable names. Any valid location that could appear on the left-hand side of an equal sign is acceptable. So, you could write code like this:
items = [3, 4, 5]
d = { }
d['x'], d['y'], d['z'] = items
When unpacking values into locations, the number of locations on the left must exactly match the number of items in the iterable on the right. For nested data structures, match locations and data by following the same structural pattern. Consider this example of unpacking two nested 3-tuples:
datetime = ((5, 19, 2008), (10, 30, "am"))
(month, day, year), (hour, minute, am_pm) = datetime
Sometimes, the _ variable is used to indicate a throw-away value when unpacking. For example, if you only care about the day and the hour, you can use:
(_, day, _), (hour, _, _) = datetime
If the number of items being unpacked isn’t known, you can use an extended form of unpacking by including a starred variable, such as *extra in the following example:
items = [1, 2, 3, 4, 5]
a, b, *extra = items      # a = 1, b = 2, extra = [3,4,5]
*extra, a, b = items      # extra = [1,2,3], a = 4, b = 5
a, *extra, b = items      # a = 1, extra = [2,3,4], b = 5
In this example, *extra receives all of the extra items. It is always a list. No more than one starred variable can be used when unpacking a single iterable. However, multiple starred variables could be used when unpacking more complex data structures involving different iterables. For example:
datetime = ((5, 19, 2008), (10, 30, "am"))
(month, *_), (hour, *_) = datetime
Any iterable can be expanded when writing out list, tuple, and set literals. This is also done using the star (*). For example:
items = [1, 2, 3]
a = [10, *items, 11]        # a = [10, 1, 2, 3, 11] (list)
b = (*items, 10, *items)    # b = (1, 2, 3, 10, 1, 2, 3) (tuple)
c = {10, 11, *items}        # c = {1, 2, 3, 10, 11} (set)
In this example, the contents of items are simply pasted into the list, tuple, or set being created as if you typed it in place at that location. This expansion is known as “splatting.” You can include as many * expansions as you like when defining a literal. However, many iterable objects (such as files or generators) only support one-time iteration. If you use *-expansion, the contents will be consumed and the iterable won’t produce any more values on subsequent iterations.
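The one-time nature of some iterables can be seen in the following sketch. The countdown() generator function is a hypothetical helper written only for this illustration.

```python
def countdown(n):
    # A small generator function used only for illustration
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first = [*gen]     # [3, 2, 1]
second = [*gen]    # []  (the generator is already exhausted)

items = [3, 2, 1]
again = [*items, *items]   # a list can be expanded repeatedly
```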
A variety of built-in functions accept any iterable as input. Table 2.6 shows some of these operations.
Table 2.6 Functions consuming iterables
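| Function | Description |
|---|---|
| list(s) | Create a list from s |
| tuple(s) | Create a tuple from s |
| set(s) | Create a set from s |
| min(s [, key]) | Minimum item in s |
| max(s [, key]) | Maximum item in s |
| any(s) | Return True if any item in s is true |
| all(s) | Return True if all items in s are true |
| sum(s [, initial]) | Sum of items with an optional initial value |
| sorted(s [, key]) | Create a sorted list |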
This applies to many other library functions as well—for example, functions in the statistics module.
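The sketch below applies several of these functions to the same list, and passes generator expressions to any() and all() to show that any iterable is accepted. The data is arbitrary.

```python
import statistics

values = [3, 1, 4, 1, 5]

smallest = min(values)                 # 1
largest = max(values)                  # 5
total = sum(values, 100)               # 114 (100 is the initial value)
ordered = sorted(values)               # [1, 1, 3, 4, 5]
has_big = any(v > 4 for v in values)   # True
all_pos = all(v > 0 for v in values)   # True
unique = set(values)                   # {1, 3, 4, 5}
middle = statistics.median(values)     # 3
```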
A sequence is an iterable container that has a size and allows items to be accessed by an integer index starting at 0. Examples include strings, lists, and tuples. In addition to all of the operations involving iteration, the operators in Table 2.7 can be applied to a sequence.
Table 2.7 Operations on Sequences
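| Operation | Description |
|---|---|
| s + r | Concatenation |
| s * n, n * s | Makes n copies of s, where n is an integer |
| s[i] | Indexing |
| s[i:j] | Slicing |
| s[i:j:stride] | Extended slicing |
| len(s) | Length |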
The + operator concatenates two sequences of the same type. For example:
>>> a = [3, 4, 5]
>>> b = [6, 7]
>>> a + b
[3, 4, 5, 6, 7]
>>>
The s * n operator makes n copies of a sequence. However, these are shallow copies that replicate elements by reference only. Consider the following code:
>>> a = [3, 4, 5]
>>> b = [a]
>>> c = 4 * b
>>> c
[[3, 4, 5], [3, 4, 5], [3, 4, 5], [3, 4, 5]]
>>> a[0] = -7
>>> c
[[-7, 4, 5], [-7, 4, 5], [-7, 4, 5], [-7, 4, 5]]
>>>
Notice how the change to a modifies every element of the list c. In this case, a reference to the list a was placed in the list b. When b was replicated, four additional references to a were created. Finally, when a was modified, this change was propagated to all the other copies of a. This behavior of sequence multiplication is often not the intent of the programmer. One way to work around the problem is to manually construct the replicated sequence by duplicating the contents of a. Here’s an example:
a = [ 3, 4, 5 ]
c = [list(a) for _ in range(4)]    # list() makes a copy of a list
The indexing operator s[n] returns the nth object from a sequence; s[0] is the first object. Negative indices can be used to fetch items from the end of a sequence. For example, s[-1] returns the last item. Otherwise, attempts to access elements that are out of range result in an IndexError exception.
The slicing operator s[i:j] extracts a subsequence from s consisting of the elements with index k, where i <= k < j. Both i and j must be integers. If the starting or ending index is omitted, the beginning or end of the sequence is assumed, respectively. Negative indices are allowed and assumed to be relative to the end of the sequence.
The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements. However, the behavior is somewhat more subtle. If a stride is supplied, i is the starting index, j is the ending index, and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included). The stride may also be negative. If the starting index i is omitted, it is set to the beginning of the sequence if stride is positive or the end of the sequence if stride is negative. If the ending index j is omitted, it is set to the end of the sequence if stride is positive or the beginning of the sequence if stride is negative. Here are some examples:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

a[2:5]       # [2, 3, 4]
a[:3]        # [0, 1, 2]
a[-3:]       # [7, 8, 9]
a[::2]       # [0, 2, 4, 6, 8]
a[::-2]      # [9, 7, 5, 3, 1]
a[0:5:2]     # [0, 2, 4]
a[5:0:-2]    # [5, 3, 1]
a[:5:1]      # [0, 1, 2, 3, 4]
a[:5:-1]     # [9, 8, 7, 6]
a[5::1]      # [5, 6, 7, 8, 9]
a[5::-1]     # [5, 4, 3, 2, 1, 0]
a[5:0:-1]    # [5, 4, 3, 2, 1]
Fancy slices may result in code that is hard to understand later. Thus, some judgment is probably warranted. Slices can be named using slice(). For example:
firstfive = slice(0, 5)
s = 'hello world'
print(s[firstfive])     # Prints 'hello'
Strings and tuples are immutable and cannot be modified after creation. The contents of a list or other mutable sequence can be modified in-place with the operators in Table 2.8.
Table 2.8 Mutable Sequence Operations
| Operation | Description |
|---|---|
| s[i] = x | Index assignment |
| s[i:j] = r | Slice assignment |
| s[i:j:stride] = r | Extended slice assignment |
| del s[i] | Deletes an element |
| del s[i:j] | Deletes a slice |
| del s[i:j:stride] | Deletes an extended slice |
The s[i] = x operator changes element i of a sequence to refer to object x, increasing the reference count of x. Negative indices are relative to the end of the list and attempts to assign a value to an out-of-range index result in an IndexError exception. The slicing assignment operator s[i:j] = r replaces elements k, where i <= k < j, with elements from sequence r. Indices have the same meaning as for slicing. If necessary, the sequence s may be expanded or reduced in size to accommodate all the elements in r. Here’s an example:
a = [1, 2, 3, 4, 5]
a[1] = 6                  # a = [1, 6, 3, 4, 5]
a[2:4] = [10, 11]         # a = [1, 6, 10, 11, 5]
a[3:4] = [-1, -2, -3]     # a = [1, 6, 10, -1, -2, -3, 5]
a[2:] = [0]               # a = [1, 6, 0]
A slicing assignment may be supplied with an optional stride argument. However, the behavior is then more restricted in that the argument on the right side must have exactly the same number of elements as the slice that’s being replaced. Here’s an example:
a = [1, 2, 3, 4, 5]
a[1::2] = [10, 11]        # a = [1, 10, 3, 11, 5]
a[1::2] = [30, 40, 50]    # ValueError. Only two elements in slice on left
The del s[i] operator removes element i from a sequence and decrements its reference count. del s[i:j] removes all the elements in a slice. A stride may also be supplied, as in del s[i:j:stride].
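The three forms of del can be traced on a single list as follows. The snapshot variables exist only to record the state after each step in this sketch.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

del a[0]                  # remove the first element
after_index = list(a)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]

del a[-2:]                # remove the last two elements
after_slice = list(a)     # [1, 2, 3, 4, 5, 6, 7]

del a[::2]                # remove every other remaining element
after_stride = list(a)    # [2, 4, 6]
```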
The semantics described here apply to the built-in list type. Operations involving sequence slicing are a rich area for customization in third-party packages. You may find that slices on non-list objects have different rules concerning reassignment, deletion, and sharing of objects. For example, the popular numpy package has different slicing semantics than Python lists.
A set is an unordered collection of unique values. The operations in Table 2.9 can be performed on sets.
Table 2.9 Operations on Sets
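| Operation | Description |
|---|---|
| s \| t | Union of s and t |
| s & t | Intersection of s and t |
| s - t | Set difference (items in s that are not in t) |
| s ^ t | Symmetric difference (items in either s or t, but not both) |
| len(s) | Number of items in the set |
| item in s, item not in s | Membership test |
| s.add(item) | Add an item to set s |
| s.remove(item) | Remove an item from s if it exists (otherwise raises KeyError) |
| s.discard(item) | Discard an item from s if it exists |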
Here are some examples:
>>> a = {'a', 'b', 'c' }
>>> b = {'c', 'd'}
>>> a | b
{'a', 'b', 'c', 'd'}
>>> a & b
{'c'}
>>> a - b
{'a', 'b'}
>>> b - a
{'d'}
>>> a ^ b
{'a', 'b', 'd'}
>>>
Set operations also work on the key-view and item-view objects of dictionaries. For example, to find out which keys two dictionaries have in common, do this:
>>> a = { 'x': 1, 'y': 2, 'z': 3 }
>>> b = { 'z': 3, 'w': 4, 'q': 5 }
>>> a.keys() & b.keys()
{'z'}
>>>
A mapping is an association between keys and values. The built-in dict type is an example. The operations in Table 2.10 may be applied to mappings.
Table 2.10 Operations on Mappings
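| Operation | Description |
|---|---|
| x = m[k] | Indexing by key |
| m[k] = x | Assignment by key |
| del m[k] | Deletes an item by key |
| k in m | Membership testing |
| len(m) | Number of items in the mapping |
| m.keys() | Return the keys |
| m.values() | Return the values |
| m.items() | Return (key, value) pairs |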
Key values can be any immutable object, such as strings, numbers, and tuples. When using a tuple as the key, you can omit the parentheses and write comma-separated values like this:
d = { }
d[1,2,3] = "foo"
d[1,0,3] = "bar"
In this case, the key values represent a tuple, making these assignments equivalent to the following:
d[(1,2,3)] = "foo"
d[(1,0,3)] = "bar"
Using a tuple as a key is a common technique for creating composite keys in a mapping. For example, a key may consist of a “first name” and “last name.”
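A minimal sketch of such a composite key follows. The names and phone numbers are invented for illustration.

```python
phones = {}
phones['Ada', 'Lovelace'] = '555-0100'
phones['Alan', 'Turing'] = '555-0199'

number = phones['Ada', 'Lovelace']
is_known = ('Grace', 'Hopper') in phones      # False

# Keys unpack like any other tuple during iteration
last_names = sorted(last for first, last in phones)
```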
One of the most common operations involving data is transforming a collection of data into another data structure. For example, here we take all the items in a list, apply an operation, and create a new list:
nums = [1, 2, 3, 4, 5]
squares = []
for n in nums:
    squares.append(n * n)
Since this kind of operation is so common, it is available as an operator known as a list comprehension. Here is a more compact version of this code:
nums = [1, 2, 3, 4, 5]
squares = [n * n for n in nums]
It is also possible to apply a filter to the operation:
squares = [n * n for n in nums if n > 2] # [9, 16, 25]
The general syntax for a list comprehension is as follows:
[expression for item1 in iterable1 if condition1
            for item2 in iterable2 if condition2
            ...
            for itemN in iterableN if conditionN ]
This syntax is equivalent to the following code:
result = []
for item1 in iterable1:
    if condition1:
        for item2 in iterable2:
            if condition2:
                ...
                for itemN in iterableN:
                    if conditionN:
                        result.append(expression)
List comprehensions are a very useful way to process list data of various forms. Here are some practical examples:
# Some data (a list of dictionaries)
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1 },
    {'name': 'MSFT', 'shares': 50, 'price': 45.67 },
    {'name': 'HPE', 'shares': 75, 'price': 34.51 },
    {'name': 'CAT', 'shares': 60, 'price': 67.89 },
    {'name': 'IBM', 'shares': 200, 'price': 95.25 }
]

# Collect all names ['IBM', 'MSFT', 'HPE', 'CAT', 'IBM' ]
names = [s['name'] for s in portfolio]

# Find all entries with more than 100 shares ['IBM']
more100 = [s['name'] for s in portfolio if s['shares'] > 100 ]

# Find the total shares*price
cost = sum([s['shares']*s['price'] for s in portfolio])

# Collect (name, shares) tuples
name_shares = [ (s['name'], s['shares']) for s in portfolio ]
All of the variables used inside a list comprehension are private to the comprehension. You don’t need to worry about such variables overwriting other variables with the same name. For example:
>>> x = 42
>>> squares = [x*x for x in [1,2,3]]
>>> squares
[1, 4, 9]
>>> x
42
>>>
Instead of creating a list, you can also create a set by changing the brackets into curly braces. This is known as a set comprehension. A set comprehension will give you a set of distinct values. For example:
# Set comprehension
names = { s['name'] for s in portfolio }
# names = { 'IBM', 'MSFT', 'HPE', 'CAT' }
If you specify key:value pairs, you’ll create a dictionary instead. This is known as a dictionary comprehension. For example:
prices = { s['name']:s['price'] for s in portfolio }
# prices = { 'IBM': 95.25, 'MSFT': 45.67, 'HPE': 34.51, 'CAT': 67.89 }
When creating sets and dictionaries, be aware that later entries might overwrite earlier entries. For example, in the prices dictionary you get the last price for 'IBM'. The first price is lost.
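If the first value for each key is the one that should be kept, one possible approach is an explicit loop with dict.setdefault(), which only stores a value when the key is not already present. The sketch below uses an abbreviated copy of the portfolio data so that it is self-contained.

```python
# Abbreviated copy of the portfolio data, for illustration only
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'MSFT', 'shares': 50, 'price': 45.67},
    {'name': 'IBM', 'shares': 200, 'price': 95.25},
]

# The comprehension keeps the last price seen for each name
last_prices = {s['name']: s['price'] for s in portfolio}

# setdefault() keeps the earliest price instead
first_prices = {}
for s in portfolio:
    first_prices.setdefault(s['name'], s['price'])
```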
Within a comprehension, it’s not possible to include any sort of exception handling. If this is a concern, consider wrapping exceptions with a function, as shown here:
def toint(x):
    try:
        return int(x)
    except ValueError:
        return None

values = [ '1', '2', '-4', 'n/a', '-3', '5' ]

data1 = [ toint(x) for x in values ]
# data1 = [1, 2, -4, None, -3, 5]

data2 = [ toint(x) for x in values if toint(x) is not None ]
# data2 = [1, 2, -4, -3, 5]
The double evaluation of toint(x) in the last example can be avoided by using the := operator. For example:
data3 = [ v for x in values if (v:=toint(x)) is not None ]
# data3 = [1, 2, -4, -3, 5]

data4 = [ v for x in values if (v:=toint(x)) is not None and v >= 0 ]
# data4 = [1, 2, 5]
A generator expression is an object that carries out the same computation as a list comprehension but produces the result iteratively. The syntax is the same as for a list comprehension except that you use parentheses instead of square brackets. Here’s an example:
nums = [1,2,3,4]
squares = (x*x for x in nums)
Unlike a list comprehension, a generator expression does not actually create a list or immediately evaluate the expression inside the parentheses. Instead, it creates a generator object that produces the values on demand via iteration. If you look at the result of the above example, you’ll see the following:
>>> squares
<generator object at 0x590a8>
>>> next(squares)
1
>>> next(squares)
4
...
>>> for n in squares:
...     print(n)
...
9
16
>>>
A generator expression can only be used once. If you try to iterate a second time, you’ll get nothing:
>>> for n in squares:
...     print(n)
...
>>>
The difference between list comprehensions and generator expressions is important but subtle. With a list comprehension, Python actually creates a list that contains the resulting data. With a generator expression, Python creates a generator that merely knows how to produce data on demand. In certain applications, this can greatly improve performance and memory use. Here’s an example:
# Read a file
f = open('data.txt')                     # Open a file
lines = (t.strip() for t in f)           # Read lines, strip
                                         # trailing/leading whitespace
# startswith() is used so that blank lines don't raise an IndexError
comments = (t for t in lines if t.startswith('#'))    # All comments
for c in comments:
    print(c)
In this example, the generator expression that extracts lines and strips whitespace does not actually read and hold the entire file in memory. The same is true of the expression that extracts comments. Instead, the lines of the file are read one-by-one when the program starts iterating in the for loop that follows. During this iteration, the lines of the file are produced upon demand and filtered accordingly. In fact, at no time will the entire file be loaded into memory during this process. This is therefore a highly efficient way to extract comments from a gigabyte-sized Python source file.
Unlike a list comprehension, a generator expression does not create an object that works like a sequence. It can’t be indexed, and none of the usual list operations (such as append()) will work. However, the items produced by a generator expression can be converted into a list using list():
clist = list(comments)
When passed as a single function argument, one set of parentheses can be removed. For example, the following statements are equivalent:
sum((x*x for x in values))
sum(x*x for x in values)      # Extra parens removed
In both cases, a generator (x*x for x in values) is created and passed to the sum() function.
The dot (.) operator is used to access the attributes of an object. Here’s an example:
foo.x = 3
print(foo.y)
a = foo.bar(3,4,5)
More than one dot operator can appear in a single expression, for example foo.y.a.b. The dot operator can also be applied to the intermediate results of functions, as in a = foo.bar(3,4,5).spam. Stylistically, however, it is not so common for programs to create long chains of attribute lookups.
The f(args) operator is used to make a function call on f. Each argument to a function is an expression. Prior to calling the function, all of the argument expressions are fully evaluated from left to right. This is known as applicative order evaluation. More information about functions can be found in Chapter 5.
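Applicative order evaluation can be observed by recording when each argument expression runs. In the sketch below, trace() and add3() are hypothetical helpers written only for this demonstration.

```python
calls = []

def trace(label, value):
    # Hypothetical helper: records when each argument expression is evaluated
    calls.append(label)
    return value

def add3(a, b, c):
    # Records that the function body has started running
    calls.append('add3')
    return a + b + c

result = add3(trace('first', 1), trace('second', 2), trace('third', 3))
# calls is ['first', 'second', 'third', 'add3'], result is 6
```

All three arguments are evaluated, in left-to-right order, before the body of add3() starts running.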
Table 2.11 lists the order of operation (precedence rules) for Python operators. All operators except the power (**) operator are evaluated from left to right and are listed in the table from highest to lowest precedence. That is, operators listed first in the table are evaluated before operators listed later. Operators included together within subsections, such as x * y, x / y, x // y, x @ y, and x % y, have equal precedence.
Table 2.11 Order of Evaluation (Highest Precedence to Lowest)
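| Operator | Name |
|---|---|
| (...), [...], {...} | Tuple, list, and dictionary creation |
| s[i], s[i:j] | Indexing and slicing |
| s.attr | Attribute lookup |
| f(...) | Function calls |
| +x, -x, ~x | Unary operators |
| x ** y | Power (right associative) |
| x * y, x / y, x // y, x % y, x @ y | Multiplication, division, floor division, modulo, matrix multiplication |
| x + y, x - y | Addition, subtraction |
| x << y, x >> y | Bit-shifting |
| x & y | Bitwise and |
| x ^ y | Bitwise exclusive or |
| x \| y | Bitwise or |
| x < y, x <= y, x > y, x >= y, x == y, x != y, x is y, x is not y, x in s, x not in s | Comparison, identity, and sequence membership tests |
| not x | Logical negation |
| x and y | Logical and |
| x or y | Logical or |
| lambda args: expr | Anonymous function |
| expr if expr else expr | Conditional expression |
| name := expr | Assignment expression |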
The order of evaluation in Table 2.11 does not depend on the types of x and y. So, even though user-defined objects can redefine individual operators, it is not possible to customize the underlying evaluation order, precedence, and associativity rules.
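A few small expressions illustrate how precedence and associativity determine grouping. This is only a sketch with arbitrary values; when in doubt, explicit parentheses make the intent clear.

```python
r1 = 2 + 3 * 4        # 14: * binds tighter than +
r2 = 2 ** 3 ** 2      # 512: ** is right associative, so 2 ** (3 ** 2)
r3 = 10 - 4 - 3       # 3: left to right, (10 - 4) - 3
r4 = not 1 == 2       # True: == is evaluated before not
r5 = 1 | 2 == 2       # False: | binds tighter than ==, so (1 | 2) == 2
r6 = 1 | (2 == 2)     # 1: parentheses change the grouping (1 | True)
```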
A common confusion with precedence rules is when bitwise-and (&) and bitwise-or (|) operators are used to mean logical-and (and) and logical-or (or). For example:
>>> a = 10
>>> a <= 10 and 1 < a
True
>>> a <= 10 & 1 < a
False
>>>
The latter expression gets evaluated as a <= (10 & 1) < a, which is the same as a <= 0 < a. You can fix it by adding parentheses:
>>> (a <= 10) & (1 < a)
True
>>>
This might seem like an esoteric edge case, but it arises with some frequency in data-oriented packages such as numpy and pandas. The logical operators and and or can’t be customized so the bitwise operators are used instead—even though they have a higher precedence level and evaluate differently when used in Boolean relations.
One of the most frequent uses of Python is in applications involving data manipulation and analysis. Here, Python provides a kind of “domain language” for thinking about your problem. The built-in operators and expressions are at the core of that language and everything else builds from it. Thus, once you build a kind of intuition around Python’s built-in objects and operations, you will find that your intuition applies everywhere.
As an example, suppose you’re working with a database and you want to iterate over the records returned by a query. Chances are, you will use the for statement to do just that. Or, suppose you’re working with numeric arrays and want to perform element-by-element mathematics on arrays. You might think that the standard math operators would work, and your intuition would be correct. Or, suppose you’re using a library to fetch data over HTTP and you want to access the contents of the HTTP headers. There’s a good chance that data will be presented in a way that looks like a dictionary.
More information about Python’s internal protocols and how to customize them is given in Chapter 4.