This chapter gives an overview of the core of the Python language. It covers variables, data types, expressions, control flow, functions, classes, and input/output. The chapter concludes with a discussion of modules, script writing, packages, and a few tips on organizing larger programs. This chapter is not trying to provide comprehensive coverage of every feature, nor does it concern itself with all of the tooling that might surround a larger Python project. However, experienced programmers should be able to extrapolate from the material here to write more advanced programs. Newcomers are encouraged to try the examples in a simple environment, such as a terminal window and a basic text editor.
Python programs are executed by an interpreter. There are many different environments in which the Python interpreter might run—an IDE, a browser, or a terminal window. However, underneath all that, the core of the interpreter is a text-based application that can be started by typing python in a command shell such as bash. Since Python 2 and Python 3 might both be installed on the same machine, you might need to type python2 or python3 to pick a version. This book assumes Python 3.8 or newer.
When the interpreter starts, a prompt appears where you can type programs into a so-called “read-evaluation-print loop” (or REPL). For example, in the following output, the interpreter displays its copyright message and presents the user with the >>> prompt, at which the user types a familiar “Hello World” program:
Python 3.8.0 (default, Feb 3 2019, 05:53:21)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Hello World')
Hello World
>>>
Certain environments may display a different prompt. The following output is from ipython (an alternate shell for Python):
Python 3.8.0 (default, Feb 4, 2019, 07:39:16)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: print('Hello World')
Hello World
In [2]:
Regardless of the exact form of output you see, the underlying principle is the same. You type a command, it runs, and you immediately see the output.
Python’s interactive mode is one of its most useful features because you can type any valid statement and immediately see the result. This is useful for debugging and experimentation. Many people, including the author, use interactive Python as their desktop calculator. For example:
>>> 6000 + 4523.50 + 134.25
10657.75
>>> _ + 8192.75
18850.5
>>>
When you use Python interactively, the variable _ holds the result of the last operation. This is useful if you want to use that result in subsequent statements. This variable only gets defined when working interactively, so don’t use it in saved programs.
You can exit the interactive interpreter by typing quit() or the EOF (end of file) character. On UNIX, EOF is Ctrl+D; on Windows, it’s Ctrl+Z.
If you want to create a program that you can run repeatedly, put statements in a text file. For example:
# hello.py
print('Hello World')
Python source files are UTF-8-encoded text files that normally have a .py suffix. The # character denotes a comment that extends to the end of the line. International (Unicode) characters can be freely used in the source code as long as you use the UTF-8 encoding (this is the default in most editors, but it never hurts to check your editor settings if you’re unsure).
To execute the hello.py file, provide the filename to the interpreter as follows:
shell % python3 hello.py
Hello World
shell %
It is common to use #! to specify the interpreter on the first line of a program, like this:
#!/usr/bin/env python3
print('Hello World')
On UNIX, if you give this file execute permissions (for example, by chmod +x hello.py), you can run the program by typing hello.py into your shell.
On Windows, you can double-click on a .py file or type the name of the program into the Run command on the Windows Start menu to launch it. The #! line, if given, is used to pick the interpreter version (Python 2 versus 3). Execution of a program might take place in a console window that disappears immediately after the program completes— often before you can read its output. For debugging, it’s better to run the program within a Python development environment.
The interpreter runs statements in order until it reaches the end of the input file. At that point, the program terminates and Python exits.
Python provides a collection of primitive types such as integers, floats, and strings:
42             # int
4.2            # float
'forty-two'    # str
True           # bool
A variable is a name that refers to a value. A value represents an object of some type:
x = 42
Sometimes you might see a type explicitly attached to a name. For example:
x: int = 42
The type is merely a hint to improve code readability. It can be used by third-party code-checking tools. Otherwise, it is completely ignored. It does not prevent you from assigning a different kind of value later.
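As a small illustration of this point, the following sketch attaches a hint to a name and then rebinds the name to a string. The variable name count is made up for the example. The interpreter raises no error; only an external checking tool would flag the second assignment.

```python
# A type hint is advisory; the interpreter does not enforce it.
count: int = 42
print(type(count).__name__)     # int

count = 'forty-two'             # No runtime error, despite the int hint
print(type(count).__name__)     # str
```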
An expression is a combination of primitives, names, and operators that produces a value:
2 + 3 * 4 # -> 14
The following program uses variables and expressions to perform a compound-interest calculation:
# interest.py
principal = 1000        # Initial amount
rate = 0.05             # Interest rate
numyears = 5            # Number of years
year = 1
while year <= numyears:
    principal = principal * (1 + rate)
    print(year, principal)
    year += 1
When executed, it produces the following output:
1 1050.0
2 1102.5
3 1157.625
4 1215.5062500000001
5 1276.2815625000003
The while statement tests the conditional expression that immediately follows. If the tested condition is true, the body of the while statement executes. The condition is then retested and the body executed again until the condition becomes false. The body of the loop is denoted by indentation. Thus, the three statements following while in interest.py execute on each iteration. Python doesn’t specify the amount of required indentation, as long as it’s consistent within a block. It is most common to use four spaces per indentation level.
One problem with the interest.py program is that the output isn’t very pretty. To make it better, you could right-align the columns and limit the precision of principal to two digits. Change the print() function to use a so-called f-string like this:
print(f'{year:>3d} {principal:0.2f}')
In the f-string, variable names and expressions can be evaluated by enclosing them in curly braces. Optionally, each substitution can have a formatting specifier attached to it. '>3d' means a three-digit decimal number, right aligned. '0.2f' means a floating-point number with two decimal places of accuracy. More information about these formatting codes can be found in Chapter 9.
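To see what the two format specifiers do in isolation, here is a minimal sketch using one row of values from the interest calculation. The names row and wide are only for illustration, and repr() is used so that the padding spaces are visible.

```python
year = 3
principal = 1157.625

# '>3d' pads the integer to width 3, right aligned
# '0.2f' keeps two digits after the decimal point
row = f'{year:>3d} {principal:0.2f}'
print(repr(row))        # '  3 1157.62'

# A width can also be combined with the precision
wide = f'{principal:>12.2f}'
print(repr(wide))       # '     1157.62'
```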
Now the output of the program looks like this:
  1 1050.00
  2 1102.50
  3 1157.62
  4 1215.51
  5 1276.28
Python has a standard set of mathematical operators, shown in Table 1.1. These operators have the same meaning they do in most other programming languages.
Table 1.1 Arithmetic Operators
| Operation | Description |
|---|---|
| x + y | Addition |
| x - y | Subtraction |
| x * y | Multiplication |
| x / y | Division |
| x // y | Truncating division |
| x ** y | Power (x to the y power) |
| x % y | Modulo (x mod y) |
| -x | Unary minus |
| +x | Unary plus |
The division operator (/) produces a floating-point number when applied to integers. Therefore, 7/4 is 1.75. The truncating division operator //, also known as floor division, rounds the result down to the nearest integer and works with both integers and floating-point numbers. The modulo operator returns the remainder of the division x // y. For example, 7 % 4 is 3. For floating-point numbers, the modulo operator returns the floating-point remainder of x // y, which is x - (x // y) * y.
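The following short sketch exercises the three division-related operators, including the case of a negative operand, where floor division rounds toward negative infinity rather than toward zero. The variable names are chosen only for illustration.

```python
print(7 / 4)        # 1.75
print(7 // 4)       # 1
print(7 % 4)        # 3

# Floor division and modulo also work on floats
print(7.5 // 2)     # 3.0
print(7.5 % 2)      # 1.5

# With a negative operand, // rounds toward negative infinity
print(-7 // 4)      # -2
print(-7 % 4)       # 1

# The remainder identity from the text
x, y = 7.5, 2
remainder = x - (x // y) * y
print(remainder == x % y)   # True
```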
In addition, the built-in functions in Table 1.2 provide a few more commonly used numerical operations.
Table 1.2 Common Mathematic Functions
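| Function | Description |
|---|---|
| abs(x) | Absolute value |
| divmod(x, y) | Returns (x // y, x % y) |
| pow(x, y [, modulo]) | Returns (x ** y) % modulo |
| round(x, [n]) | Rounds to the nearest multiple of 10 to the -n power |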
The round() function implements “banker’s rounding.” If the value being rounded is equally close to two multiples, it is rounded to the nearest even multiple (for example, 0.5 is rounded to 0.0, and 1.5 is rounded to 2.0).
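The built-in functions abs(), divmod(), pow(), and round() can be tried directly. The sketch below shows a few calls, including the round-half-to-even behavior; the names quotient_and_remainder and halves are only for illustration.

```python
print(abs(-3.5))              # 3.5

quotient_and_remainder = divmod(7, 4)
print(quotient_and_remainder) # (1, 3)

print(pow(2, 10))             # 1024
print(pow(2, 10, 1000))       # 24, that is (2 ** 10) % 1000

# Ties go to the nearest even value
halves = [round(0.5), round(1.5), round(2.5)]
print(halves)                 # [0, 2, 2]

# The second argument selects the power of 10 to round to
print(round(1234.5678, 2))    # 1234.57
print(round(1234.5678, -2))   # 1200.0
```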
Integers provide a few additional operators to support bit manipulation, shown in Table 1.3.
Table 1.3 Bit Manipulation Operators
| Operation | Description |
|---|---|
| x << y | Left shift |
| x >> y | Right shift |
| x & y | Bitwise and |
| x \| y | Bitwise or |
| x ^ y | Bitwise xor (exclusive or) |
| ~x | Bitwise negation |
One would commonly use these with binary integers. For example:
a = 0b11001001
mask = 0b11110000
x = (a & mask) >> 4      # x = 0b1100 (12)
In this example, 0b11001001 is how you write an integer value in binary. You could have written it as decimal 201 or hexadecimal 0xc9, but if you’re fiddling with bits, binary makes it easier to visualize what you’re doing.
The semantics of the bitwise operators assumes that the integers use a two’s complement binary representation and that the sign bit is infinitely extended to the left. Some care is required if you are working with raw bit patterns that are intended to map to native integers on the hardware. This is because Python does not truncate the bits or allow values to overflow—instead, the result will grow arbitrarily large in magnitude. It’s up to you to make sure the result is properly sized or truncated if needed.
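One common way to keep a result "properly sized" is to mask it with the width you care about. The sketch below assumes a 32-bit register purely for illustration; MASK32, shifted, and truncated are hypothetical names.

```python
a = 0b11001001                # 201

# A left shift never overflows; the integer simply gets wider
shifted = a << 28
print(shifted.bit_length())   # 36

# Keep only the low 32 bits, as a 32-bit register would
MASK32 = 0xFFFFFFFF
truncated = shifted & MASK32
print(hex(truncated))         # 0x90000000

# ~a is negative because the sign bit extends infinitely to the left
print(~a)                     # -202

# Masking to 8 bits gives the bit pattern an 8-bit register would hold
print(bin(~a & 0xFF))         # 0b110110
```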
To compare numbers, use the comparison operators in Table 1.4.
Table 1.4 Comparison Operators
| Operation | Description |
|---|---|
| x == y | Equal to |
| x != y | Not equal to |
| x < y | Less than |
| x > y | Greater than |
| x >= y | Greater than or equal to |
| x <= y | Less than or equal to |
The result of a comparison is a Boolean value True or False.
The and, or, and not operators (not to be confused with the bit-manipulation operators above) can form more complex Boolean expressions. The behavior of these operators is as shown in Table 1.5.
Table 1.5 Logical Operators
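| Operator | Description |
|---|---|
| x or y | If x is false, return y; otherwise, return x. |
| x and y | If x is false, return x; otherwise, return y. |
| not x | If x is false, return True; otherwise, return False. |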
A value is considered false if it is literally False, None, numerically zero, or empty. Otherwise, it’s considered true.
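Because and and or return one of their operands rather than a strict Boolean, they are sometimes used to pick a fallback value. The following sketch shows this along with a list of values that count as false; the names falsy and all_false are only for illustration.

```python
# or returns the first operand if it is true, else the second
print(0 or 'fallback')          # fallback
print('name' or 'fallback')     # name

# and returns the first operand if it is false, else the second
print([] and 'unused')          # []
print(1 and 'both true')        # both true

# not always returns True or False
print(not None)                 # True

# Every one of these values is considered false
falsy = [False, None, 0, 0.0, '', [], (), {}, set()]
all_false = all(not v for v in falsy)
print(all_false)                # True
```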
It is common to write an expression that updates a value. For example:
x = x + 1
y = y * n
For these, you can write the following shortened operation instead:
x += 1
y *= n
This shortened form of update can be used with any of the +, -, *, **, /, //, %, &, |, ^, <<, >> operators. Python does not have increment (++) or decrement (--) operators found in some other languages.
The while, if and else statements are used for looping and conditional code execution. Here’s an example:
if a < b:
    print('Computer says Yes')
else:
    print('Computer says No')
The bodies of the if and else clauses are denoted by indentation. The else clause is optional. To create an empty clause, use the pass statement, as follows:
if a < b:
    pass      # Do nothing
else:
    print('Computer says No')
To handle multiple-test cases, use the elif statement:
if suffix == '.htm':
    content = 'text/html'
elif suffix == '.jpg':
    content = 'image/jpeg'
elif suffix == '.png':
    content = 'image/png'
else:
    raise RuntimeError(f'Unknown content type {suffix!r}')
If you are assigning a value in combination with a test, use a conditional expression:
maxval = a if a > b else b
This is the same as the longer:
if a > b:
    maxval = a
else:
    maxval = b
Sometimes, you may see the assignment of a variable and a conditional combined together using the := operator. This is known as an assignment expression (or more colloquially as the "walrus operator" because := looks like a walrus tipped over on its side—presumably playing dead). For example:
x = 0
while (x := x + 1) < 10:     # Prints 1, 2, 3, ..., 9
    print(x)
The parentheses around the assignment expression are required here. Without them, the comparison would be evaluated first and x would be assigned its Boolean result.
The break statement can be used to abort a loop early. It only applies to the innermost loop. For example:
x = 0
while x < 10:
    if x == 5:
        break      # Stops the loop. Moves to Done below
    print(x)
    x += 1
print('Done')
The continue statement skips the rest of the loop body and goes back to the top of the loop. For example:
x = 0
while x < 10:
    x += 1
    if x == 5:
        continue   # Skips the print(x). Goes back to loop start.
    print(x)
print('Done')
To define a string literal, enclose it in single, double, or triple quotes as follows:
a = 'Hello World'
b = "Python is groovy"
c = '''Computer says no.'''
d = """Computer still says no."""
The same type of quote used to start a string must be used to terminate it. Triple-quoted strings capture all the text until the terminating triple quote, as opposed to single- and double-quoted strings, which must be specified on one logical line. Triple-quoted strings are useful when the contents of a string literal span multiple lines of text:
print('''Content-type: text/html

<h1> Hello World </h1>
Click <a href="http://www.python.org">here</a>.
''')
Immediately adjacent string literals are concatenated into a single string. Thus, the above example could also be written as:
print(
'Content-type: text/html\n'
'\n'
'<h1> Hello World </h1>\n'
'Click <a href="http://www.python.org">here</a>.\n'
)
If the opening quotation mark of a string is prefaced by an f, escaped expressions within a string are evaluated. For example, in earlier examples, the following statement was used to output values of a calculation:
print(f'{year:>3d} {principal:0.2f}')
Although this is only using simple variable names, any valid expression can appear. For example:
base_year = 2020
...
print(f'{base_year + year:>4d} {principal:0.2f}')
As an alternative to f-strings, the format() method and % operator are also sometimes used to format strings. For example:
print('{0:>3d} {1:0.2f}'.format(year, principal))
print('%3d %0.2f' % (year, principal))
More information about string formatting is found in Chapter 9.
Strings are stored as sequences of Unicode characters indexed by integers, starting at zero. Negative indices index from the end of the string. The length of a string s is computed using len(s). To extract a single character, use the indexing operator s[i] where i is the index.
a = 'Hello World'
print(len(a))         # 11
b = a[4]              # b = 'o'
c = a[-1]             # c = 'd'
To extract a substring, use the slicing operator s[i:j]. It extracts all characters from s whose index k is in the range i <= k < j. If either index is omitted, the beginning or end of the string is assumed, respectively:
c = a[:5]             # c = 'Hello'
d = a[6:]             # d = 'World'
e = a[3:8]            # e = 'lo Wo'
f = a[-5:]            # f = 'World'
Strings have a variety of methods for manipulating their contents. For example, the replace() method performs a simple text replacement:
g = a.replace('Hello', 'Hello Cruel')    # g = 'Hello Cruel World'
Table 1.6 shows a few common string methods. Here and elsewhere, arguments enclosed in square brackets are optional.
Table 1.6 Common String Methods
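| Method | Description |
|---|---|
| s.endswith(suffix [,start [,end]]) | Checks whether a string ends with suffix. |
| s.find(sub [,start [,end]]) | Finds the first occurrence of the specified substring sub. |
| s.lower() | Converts to lowercase. |
| s.replace(old, new [,maxreplace]) | Replaces a substring. |
| s.split([sep [,maxsplit]]) | Splits a string using sep as a delimiter. maxsplit is the maximum number of splits to perform. |
| s.startswith(prefix [,start [,end]]) | Checks whether a string starts with prefix. |
| s.strip([chars]) | Removes leading and trailing whitespace or characters supplied in chars. |
| s.upper() | Converts a string to uppercase. |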
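A quick way to get a feel for these string methods is to try them on a short string. The following sketch does so; the sample text and the names s and t are only for illustration.

```python
s = '  Hello World  '
t = s.strip()                        # 'Hello World'

print(t.lower())                     # hello world
print(t.upper())                     # HELLO WORLD
print(t.startswith('Hello'))         # True
print(t.endswith('World'))           # True
print(t.find('World'))               # 6
print(t.find('xyz'))                 # -1 (not found)
print(t.replace('World', 'There'))   # Hello There

# split() with and without a limit on the number of splits
print('a,b,c'.split(','))            # ['a', 'b', 'c']
print('a,b,c'.split(',', 1))         # ['a', 'b,c']
```

None of these methods modify the original string; each returns a new value.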
Strings are concatenated with the plus (+) operator:
g = a + 'ly' # g = 'Hello Worldly'
Python never implicitly interprets the contents of a string as numerical data. Thus, + always concatenates strings:
x = '37'
y = '42'
z = x + y      # z = '3742' (String Concatenation)
To perform mathematical calculations, a string first has to be converted into a numeric value using a function such as int() or float(). For example:
z = int(x) + int(y) # z = 79 (Integer Addition)
Non-string values can be converted into a string representation by using the str(), repr(), or format() functions. Here’s an example:
s = 'The value of x is ' + str(x)
s = 'The value of x is ' + repr(x)
s = 'The value of x is ' + format(x, '4d')
Although str() and repr() both create strings, their output is often different. str() produces the output that you get when you use the print() function, whereas repr() creates a string that you type into a program to exactly represent the value of an object. For example:
>>> s = 'hello\nworld'
>>> print(str(s))
hello
world
>>> print(repr(s))
'hello\nworld'
>>>
When debugging, use repr(s) to produce output because it shows you more information about a value and its type.
The format() function is used to convert a single value to a string with a specific formatting applied. For example:
>>> x = 12.34567
>>> format(x, '0.2f')
'12.35'
>>>
The format code given to format() is the same code you would use with f-strings when producing formatted output. For example, the above code could be replaced by the following:
>>> f'{x:0.2f}'
'12.35'
>>>
The following program opens a file and reads its contents line by line as text strings:
with open('data.txt') as file:
    for line in file:
        print(line, end='')     # end='' omits the extra newline
The open() function returns a new file object. The with statement that precedes it declares a block of statements (or context) where the file (file) is going to be used. Once control leaves this block, the file is automatically closed. If you don’t use the with statement, the code would need to look like this:
file = open('data.txt')
for line in file:
    print(line, end='')         # end='' omits the extra newline
file.close()
It’s easy to forget the extra step of calling close() so it’s better to use the with statement and have the file closed for you.
The for loop iterates line-by-line over the file until no more data is available.
If you want to read the file in its entirety as a string, use the read() method like this:
with open('data.txt') as file:
    data = file.read()
If you want to read a large file in chunks, give a size hint to the read() method as follows:
with open('data.txt') as file:
    while (chunk := file.read(10000)):
        print(chunk, end='')
The := operator used in this example assigns to a variable and returns its value so that it can be tested by the while loop to break out. When the end of a file is reached, read() returns an empty string. An alternate way to write the above code is using break:
with open('data.txt') as file:
    while True:
        chunk = file.read(10000)
        if not chunk:
            break
        print(chunk, end='')
To make the output of a program go to a file, supply a file argument to the print() function:
with open('out.txt', 'wt') as out:
    while year <= numyears:
        principal = principal * (1 + rate)
        print(f'{year:>3d} {principal:0.2f}', file=out)
        year += 1
In addition, file objects support a write() method that can be used to write string data. For example, the print() function in the previous example could have been written this way:
out.write(f'{year:3d} {principal:0.2f}\n')
By default, text files are commonly encoded as UTF-8, although the encoding that open() uses when none is given depends on the platform and its locale settings. If you’re working with a specific text encoding, use the extra encoding argument when opening the file. For example:
with open('data.txt', encoding='latin-1') as file:
    data = file.read()
Sometimes you might want to read data typed interactively in the console. To do that, use the input() function. For example:
name = input('Enter your name : ')
print('Hello', name)
The input() function returns all of the typed text up to the terminating newline, which is not included.
Lists are an ordered collection of arbitrary objects. Create a list by enclosing values in square brackets:
names = [ 'Dave', 'Paula', 'Thomas', 'Lewis' ]
Lists are indexed by integers, starting with zero. Use the indexing operator to access and modify individual items of the list:
a = names[2]          # Returns the third item of the list, 'Thomas'
names[2] = 'Tom'      # Changes the third item to 'Tom'
print(names[-1])      # Print the last item ('Lewis')
To append new items to the end of a list, use the append() method:
names.append('Alex')
To insert an item in the list at a specific position, use the insert() method:
names.insert(2, 'Aya')
To iterate over the items in a list, use a for loop:
for name in names:
    print(name)
You can extract or reassign a portion of a list by using the slicing operator:
b = names[0:2]            # b -> ['Dave', 'Paula']
c = names[2:]             # c -> ['Aya', 'Tom', 'Lewis', 'Alex']
names[1] = 'Becky'        # Replaces 'Paula' with 'Becky'
names[0:2] = ['Dave', 'Mark', 'Jeff']   # Replace the first two items
                                        # with ['Dave','Mark','Jeff']
Use the plus (+) operator to concatenate lists:
a = ['x','y'] + ['z','z','y'] # Result is ['x','y','z','z','y']
An empty list is created in one of two ways:
names = []          # An empty list
names = list()      # An empty list
Specifying [] for an empty list is more idiomatic. list is the name of the class associated with the list type. It’s more common to see it used when performing conversions of data to a list. For example:
letters = list('Dave') # letters = ['D', 'a', 'v', 'e']
Most of the time, all of the items in a list are of the same type (for example, a list of numbers or a list of strings). However, lists can contain any mix of Python objects, including other lists, as in the following example:
a = [1, 'Dave', 3.14, ['Mark', 7, 9, [100, 101]], 10]
Items contained in nested lists are accessed by applying more than one indexing operation:
a[1]            # Returns 'Dave'
a[3][2]         # Returns 9
a[3][3][1]      # Returns 101
The following program pcost.py illustrates how to read data into a list and perform a simple calculation. In this example, lines are assumed to contain comma-separated values. The program computes the sum of the product of two columns.
# pcost.py
#
# Reads input lines of the form 'NAME,SHARES,PRICE'.
# For example:
#
#     SYM,123,456.78

import sys
if len(sys.argv) != 2:
    raise SystemExit(f'Usage: {sys.argv[0]} filename')

rows = []
with open(sys.argv[1], 'rt') as file:
    for line in file:
        rows.append(line.split(','))

# rows is a list of this form
# [
#    ['SYM', '123', '456.78']
#    ...
# ]

total = sum([ int(row[1]) * float(row[2]) for row in rows ])
print(f'Total cost: {total:0.2f}')
The first line of this program uses the import statement to load the sys module from the Python library. This module is used to obtain command-line arguments which are found in the list sys.argv. The initial check makes sure that a filename has been provided. If not, a SystemExit exception is raised with a helpful error message. In this message, sys.argv[0] inserts the name of the program that’s running.
The open() function uses the filename that was specified on the command line. The for line in file loop is reading the file line by line. Each line is split into a small list using the comma character as a delimiter. This list is appended to rows. The final result, rows, is a list of lists—remember that a list can contain anything including other lists.
The expression [ int(row[1]) * float(row[2]) for row in rows ] constructs a new list by looping over all of the lists in rows and computing the product of the second and third items. This useful technique for constructing a list is known as a list comprehension. The same computation could have been expressed more verbosely as follows:
values = []
for row in rows:
    values.append(int(row[1]) * float(row[2]))
total = sum(values)
As a general rule, list comprehensions are a preferred technique for performing simple calculations. The built-in sum() function computes the sum for all items in a sequence.
To create simple data structures, you can pack a collection of values into an immutable object known as a tuple. You create a tuple by enclosing a group of values in parentheses:
holding = ('GOOG', 100, 490.10)
address = ('www.python.org', 80)
For completeness, 0- and 1-element tuples can be defined, but have special syntax:
a = ()            # 0-tuple (empty tuple)
b = (item,)       # 1-tuple (note the trailing comma)
The values in a tuple can be extracted by numerical index just like a list. However, it is more common to unpack tuples into a set of variables, like this:
name, shares, price = holding
host, port = address
Although tuples support most of the same operations as lists (such as indexing, slicing, and concatenation), the elements of a tuple cannot be changed after creation—that is, you cannot replace, delete, or append new elements to an existing tuple. A tuple is best viewed as a single immutable object that consists of several parts, not as a collection of distinct objects like a list.
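The immutability of tuples can be seen directly. In this sketch, an attempt to replace an element raises TypeError, and a changed record is produced by building a new tuple instead. The name updated is only for illustration.

```python
holding = ('GOOG', 100, 490.10)

try:
    holding[1] = 75             # Tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Build a new tuple instead of changing the old one
updated = (holding[0], 75, holding[2])
print(updated)                  # ('GOOG', 75, 490.1)

# Concatenation also creates a new tuple; holding is unchanged
print(holding + ('2007-06-07',))
print(holding)                  # ('GOOG', 100, 490.1)
```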
Tuples and lists are often used together to represent data. For example, this program shows how you might read a file containing columns of data separated by commas:
# File containing lines of the form "name,shares,price"
filename = 'portfolio.csv'
portfolio = []
with open(filename) as file:
    for line in file:
        row = line.split(',')
        name = row[0]
        shares = int(row[1])
        price = float(row[2])
        holding = (name, shares, price)
        portfolio.append(holding)
The resulting portfolio list created by this program looks like a two-dimensional array of rows and columns. Each row is represented by a tuple and can be accessed as follows:
>>> portfolio[0]
('AA', 100, 32.2)
>>> portfolio[1]
('IBM', 50, 91.1)
>>>
Individual items of data can be accessed like this:
>>> portfolio[1][1]
50
>>> portfolio[1][2]
91.1
>>>
Here’s how to loop over all of the records and unpack fields into a set of variables:
total = 0.0
for name, shares, price in portfolio:
    total += shares * price
Alternatively, you could use a list comprehension:
total = sum([shares * price for _, shares, price in portfolio])
When iterating over tuples, the variable _ can be used to indicate a discarded value. In the above calculation, it means we’re ignoring the first item (the name).
A set is an unordered collection of unique objects. Sets are used to find distinct values or to manage problems related to membership. To create a set, enclose a collection of values in curly braces or give an existing collection of items to set(). For example:
names1 = { 'IBM', 'MSFT', 'AA' }
names2 = set(['IBM', 'MSFT', 'HPE', 'IBM', 'CAT'])
The elements of a set are typically restricted to immutable objects. For example, you can make a set of numbers, strings, or tuples. However, you can’t make a set containing lists. Most common objects will probably work with a set, however—when in doubt, try it.
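Taking the "when in doubt, try it" advice literally, the sketch below builds a set of tuples and then attempts a set of lists, which fails with TypeError because lists are not hashable. The names points and rejected are only for illustration.

```python
# Tuples of numbers are immutable, so they can be set elements
points = {(0, 0), (1, 2), (1, 2)}
print(len(points))              # 2 (the duplicate is dropped)

# Lists are mutable, so they are rejected
rejected = False
try:
    bad = {[0, 0], [1, 2]}
except TypeError as err:
    rejected = True
    print('TypeError:', err)
```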
Unlike lists and tuples, sets are unordered and cannot be indexed by numbers. Moreover, the elements of a set are never duplicated. For example, if you inspect the value of names2 from the preceding code, you get the following:
>>> names2
{'CAT', 'IBM', 'MSFT', 'HPE'}
>>>
Notice that 'IBM' only appears once. Also, the order of items can’t be predicted; the output may vary from what’s shown. The order might even change between interpreter runs on the same computer.
If working with existing data, you can also create a set using a set comprehension. For example, this statement turns all of the stock names from the data in the previous section into a set:
names = { s[0] for s in portfolio }
To create an empty set, use set() with no arguments:
r = set() # Initially empty set
Sets support a standard collection of operations including union, intersection, difference, and symmetric difference. Here’s an example:
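# Assuming t and s refer to the two sets created earlier
t = names2     # {'IBM', 'MSFT', 'HPE', 'CAT'}
s = names1     # {'IBM', 'MSFT', 'AA'}

a = t | s      # Union {'MSFT', 'CAT', 'HPE', 'AA', 'IBM'}
b = t & s      # Intersection {'IBM', 'MSFT'}
c = t - s      # Difference { 'CAT', 'HPE' }
d = s - t      # Difference { 'AA' }
e = t ^ s      # Symmetric difference { 'CAT', 'HPE', 'AA' }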
The difference operation s - t gives items in s that aren’t in t. The symmetric difference s ^ t gives items that are in either s or t but not in both.
New items can be added to a set using add() or update():
t.add('DIS')                      # Add a single item
s.update({'JJ', 'GE', 'ACME'})    # Adds multiple items to s
An item can be removed using remove() or discard():
t.remove('IBM')       # Remove 'IBM' or raise KeyError if absent.
s.discard('SCOX')     # Remove 'SCOX' if it exists.
The difference between remove() and discard() is that discard() doesn’t raise an exception if the item isn’t present.
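The contrast between the two methods is easiest to see with an item that is not in the set. This sketch uses a small set of stock names; the variable raised is only for illustration.

```python
t = {'IBM', 'MSFT', 'HPE', 'CAT'}

# discard() on a missing item is silently ignored
t.discard('SCOX')
print(len(t))                   # 4

# remove() on a missing item raises KeyError
raised = False
try:
    t.remove('SCOX')
except KeyError as err:
    raised = True
    print('KeyError:', err)

# Both methods remove an item that is present
t.remove('IBM')
t.discard('CAT')
print(sorted(t))                # ['HPE', 'MSFT']
```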
A dictionary is a mapping between keys and values. You create a dictionary by enclosing the key-value pairs, each separated by a colon, in curly braces ({ }), like this:
s = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10 }
To access members of a dictionary, use the indexing operator as follows:
name = s['name']
cost = s['shares'] * s['price']
Inserting or modifying objects works like this:
s['shares'] = 75
s['date'] = '2007-06-07'
A dictionary is a useful way to define an object that consists of named fields. However, dictionaries are also commonly used as a mapping for performing fast lookups on unordered data. For example, here’s a dictionary of stock prices:
prices = { 'GOOG' : 490.1, 'AAPL' : 123.5, 'IBM' : 91.5, 'MSFT' : 52.13 }
Given such a dictionary, you can look up a price:
p = prices['IBM']
Dictionary membership is tested with the in operator:
if 'IBM' in prices:
    p = prices['IBM']
else:
    p = 0.0
This particular sequence of steps can also be performed more compactly using the get() method:
p = prices.get('IBM', 0.0) # prices['IBM'] if it exists, else 0.0
Use the del statement to remove an element of a dictionary:
del prices['GOOG']
Although strings are the most common type of key, you can use many other Python objects, including numbers and tuples. For example, tuples are often used to construct composite or multipart keys:
prices = { }
prices[('IBM', '2015-02-03')] = 91.23
prices['IBM', '2015-02-04'] = 91.42     # Parens omitted
Any kind of object can be placed into a dictionary, including other dictionaries. However, mutable data structures such as lists, sets, and dictionaries cannot be used as keys.
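The restriction on keys can be checked with a short experiment. In the sketch below, a tuple and a number are accepted as keys, while a list is rejected with TypeError. The name rejected is only for illustration.

```python
prices = {}
prices[('IBM', '2015-02-03')] = 91.23      # Tuple key: accepted
prices[42] = 'numbers work too'            # Integer key: accepted

rejected = False
try:
    prices[['IBM', '2015-02-03']] = 91.23  # List key: not hashable
except TypeError as err:
    rejected = True
    print('TypeError:', err)

# Mutable objects are fine as values
prices['history'] = [91.23, 91.42]
print(len(prices))                         # 3
```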
Dictionaries are often used as building blocks for various algorithms and data-handling problems. One such problem is tabulation. For example, here’s how you could count the total number of shares for each stock name in earlier data:
portfolio = [
    ('ACME', 50, 92.34),
    ('IBM', 75, 102.25),
    ('PHP', 40, 74.50),
    ('IBM', 50, 124.75)
]

total_shares = { s[0]: 0 for s in portfolio }
for name, shares, _ in portfolio:
    total_shares[name] += shares

# total_shares = {'IBM': 125, 'ACME': 50, 'PHP': 40}
In this example, { s[0]: 0 for s in portfolio } is an example of a dictionary comprehension. It creates a dictionary of key-value pairs from another collection of data. In this case, it’s making an initial dictionary mapping stock names to 0. The for loop that follows iterates over the portfolio and adds up all of the held shares for each stock symbol.
Many common data processing tasks such as this one have already been implemented by library modules. For example, the collections module has a Counter object that can be used for this task:
from collections import Counter

total_shares = Counter()
for name, shares, _ in portfolio:
    total_shares[name] += shares

# total_shares = Counter({'IBM': 125, 'ACME': 50, 'PHP': 40})
An empty dictionary is created in one of two ways:
prices = {}        # An empty dict
prices = dict()    # An empty dict
It is more idiomatic to use {} for an empty dictionary, although caution is required since it might look like you are trying to create an empty set (use set() instead). dict() is commonly used to create dictionaries from key-value pairs. For example:
pairs = [('IBM', 125), ('ACME', 50), ('PHP', 40)]
d = dict(pairs)
To obtain a list of dictionary keys, convert a dictionary to a list:
syms = list(prices) # syms = ['AAPL', 'MSFT', 'IBM', 'GOOG']
Alternatively, you can obtain the keys using dict.keys():
syms = prices.keys()
The difference between these two methods is that keys() returns a special “keys view” that is attached to the dictionary and actively reflects changes made to the dictionary. For example:
>>> d = { 'x': 2, 'y':3 }
>>> k = d.keys()
>>> k
dict_keys(['x', 'y'])
>>> d['z'] = 4
>>> k
dict_keys(['x', 'y', 'z'])
>>>
The keys always appear in the same order as the items were initially inserted into the dictionary. The list conversion above will preserve this order. This can be useful when dicts are used to represent key-value data read from files and other data sources. The dictionary will preserve the input order. This might help readability and debugging. It’s also nice if you want to write the data back to a file. Prior to Python 3.6, however, this ordering was not guaranteed, and it only became an official part of the language in Python 3.7, so you cannot rely upon it if compatibility with older versions of Python is required. Note that a key that is deleted and later inserted again moves to the end of the ordering.
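The ordering behavior can be observed with a small experiment. This sketch assumes Python 3.7 or newer, where insertion order is part of the language; it shows that updating a key keeps its position, while deleting and re-adding a key places it last.

```python
d = {}
d['x'] = 1
d['y'] = 2
d['z'] = 3
print(list(d))        # ['x', 'y', 'z']

# Updating an existing key keeps its position
d['x'] = 10
print(list(d))        # ['x', 'y', 'z']

# A deleted key that is added again goes to the end
del d['x']
d['x'] = 1
print(list(d))        # ['y', 'z', 'x']
```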
To obtain the values stored in a dictionary, use the dict.values() method. To obtain key-value pairs, use dict.items(). For example, here’s how to iterate over the entire contents of a dictionary as key-value pairs:
for sym, price in prices.items():
    print(f'{sym} = {price}')
The most widely used looping construct is the for statement that iterates over a collection of items. One common form of iteration is to loop over all the members of a sequence—such as a string, list, or tuple. Here’s an example:
for n in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
    print(f'2 to the {n} power is {2**n}')
In this example, the variable n will be assigned successive items from the list [1, 2, 3, 4, ..., 9] on each iteration. Since looping over ranges of integers is quite common, there is a shortcut:
for n in range(1, 10):
    print(f'2 to the {n} power is {2**n}')
The range(i, j [,step]) function creates an object that represents a range of integers with values from i up to, but not including, j. If the starting value is omitted, it’s taken to be zero. An optional stride can also be given as a third argument. Here are some examples:
a = range(5)          # a = 0, 1, 2, 3, 4
b = range(1, 8)       # b = 1, 2, 3, 4, 5, 6, 7
c = range(0, 14, 3)   # c = 0, 3, 6, 9, 12
d = range(8, 1, -1)   # d = 8, 7, 6, 5, 4, 3, 2
The object created by range() computes the values it represents on demand when lookups are requested. Thus, it’s efficient to use even with a large range of numbers.
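As a small sketch of what "on demand" means in practice, the range below spans a billion integers yet is created instantly, because no list of numbers is ever built. The variable names are chosen only for this illustration:

```python
big = range(0, 1_000_000_000, 3)   # No list of numbers is built here

count = len(big)             # Number of values in the range
tenth = big[10]              # Indexing computes the value: 0 + 10*3
has_999 = 999 in big         # Membership is worked out arithmetically
first_five = list(big[:5])   # Slicing a range produces another range

print(count, tenth, has_999, first_five)
```

Only the final list() call materializes any values, and then only five of them.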
The for statement is not limited to sequences of integers. It can be used to iterate over many kinds of objects including strings, lists, dictionaries, and files. Here’s an example:
message = 'Hello World'
# Print out the individual characters in message
for c in message:
    print(c)

names = ['Dave', 'Mark', 'Ann', 'Phil']
# Print out the members of a list
for name in names:
    print(name)

prices = { 'GOOG' : 490.10, 'IBM' : 91.50, 'AAPL' : 123.15 }
# Print out all of the members of a dictionary
for key in prices:
    print(key, '=', prices[key])

# Print all of the lines in a file
with open('foo.txt') as file:
    for line in file:
        print(line, end='')
The for loop is one of Python’s most powerful language features because you can create custom iterator objects and generator functions that supply it with sequences of values. More details about iterators and generators can be found later in Chapter 6.
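To give a flavor of this before the fuller treatment, here is a minimal sketch of a generator function feeding a for loop. The countdown() function is a made-up example, not something defined elsewhere in this chapter:

```python
def countdown(n):
    # A generator function: each yield hands the next value to the loop
    while n > 0:
        yield n
        n -= 1

values = []
for x in countdown(5):
    values.append(x)

print(values)
```

The for statement neither knows nor cares that the values are being produced by a function rather than stored in a list.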
Use the def statement to define a function:
def remainder(a, b):
    q = a // b        # // is floor division (rounds toward negative infinity)
    r = a - q * b
    return r
To invoke a function, use its name followed by its arguments in parentheses, for example result = remainder(37, 15).
It is common practice for a function to include a documentation string as the first statement. This string feeds the help() command and may be used by IDEs and other development tools to assist the programmer. For example:
def remainder(a, b):
    '''
    Computes the remainder of dividing a by b
    '''
    q = a // b
    r = a - q * b
    return r
If the inputs and outputs of a function aren’t clear from their names, they might be annotated with types:
def remainder(a: int, b: int) -> int:
    '''
    Computes the remainder of dividing a by b
    '''
    q = a // b
    r = a - q * b
    return r
Such annotations are merely informational and are not actually enforced at runtime. Someone could still call the above function with non-integer values, such as result = remainder(37.5, 3.2).
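A short sketch makes this concrete. The annotations are stored on the function object, where tools can inspect them, but the interpreter does not check them when the function is called:

```python
def remainder(a: int, b: int) -> int:
    q = a // b
    r = a - q * b
    return r

# Annotations are recorded as plain data on the function
hints = remainder.__annotations__

# Nothing stops a call with floats; the result is simply a float
result = remainder(37.5, 3.2)

print(hints)
print(result)
```

If enforcement is wanted, it has to come from a separate type-checking tool or from explicit checks in the function body.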
Use a tuple to return multiple values from a function:
def divide(a, b):
    q = a // b        # If a and b are integers, q is integer
    r = a - q * b
    return (q, r)
When multiple values are returned in a tuple, they can be unpacked into separate variables like this:
quotient, remainder = divide(1456, 33)
To assign a default value to a function parameter, use assignment:
def connect(hostname, port, timeout=300):
    # Function body
    ...
When default values are given in a function definition, they can be omitted from subsequent function calls. An omitted argument will take on the supplied default value. Here’s an example:
connect('www.python.org', 80)
connect('www.python.org', 80, 500)
Default arguments are often used for optional features. If there are many such arguments, readability can suffer. It’s therefore recommended to specify such arguments using keyword arguments. For example:
connect('www.python.org', 80, timeout=500)
If you know the names of the arguments, all of them can be named when calling a function. When named, the order in which they are listed doesn’t matter. For example, this is fine:
connect(port=80, hostname='www.python.org')
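The calling styles can be compared side by side. In this sketch the body of connect() is a stand-in invented for illustration; it only reports the arguments it received:

```python
def connect(hostname, port, timeout=300):
    # Hypothetical body: just report the arguments that were received
    return {'hostname': hostname, 'port': port, 'timeout': timeout}

a = connect('www.python.org', 80)                 # timeout takes its default
b = connect('www.python.org', 80, timeout=500)    # timeout given by keyword
c = connect(port=80, hostname='www.python.org')   # all named, any order

print(a)
print(b)
print(c)
```

The first and third calls produce identical results, which shows that naming the arguments changes only how the call reads.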
When variables are created or assigned inside a function, their scope is local. That is, the variable is only defined inside the body of the function and is destroyed when the function returns. Functions can also access variables defined outside of a function as long as they are defined in the same file. For example:
debug = True       # Global variable

def read_data(filename):
    if debug:
        print('Reading', filename)
    ...
Scoping rules are described in more detail in Chapter 5.
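As a brief sketch of the local-scope rule, the functions below use made-up names. Reading an outside variable works, whereas assigning to the same name inside a function creates a separate local variable and leaves the outer one untouched:

```python
debug = True      # Defined outside any function
count = 0         # Defined outside any function

def read_setting():
    return debug          # Reads the outer variable; no assignment involved

def bump():
    count = 10            # Creates a new local variable named count
    return count

seen = read_setting()
local_result = bump()
count_afterwards = count  # The outer count is unchanged

print(seen, local_result, count_afterwards)
```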
If an error occurs in your program, an exception is raised and a traceback message appears:
Traceback (most recent call last):
  File "readport.py", line 9, in <module>
    shares = int(row[1])
ValueError: invalid literal for int() with base 10: 'N/A'
The traceback message indicates the type of error that occurred, along with its location. Normally, errors cause a program to terminate. However, you can catch and handle exceptions using try and except statements, like this:
portfolio = []
with open('portfolio.csv') as file:
    for line in file:
        row = line.split(',')
        try:
            name = row[0]
            shares = int(row[1])
            price = float(row[2])
            holding = (name, shares, price)
            portfolio.append(holding)
        except ValueError as err:
            print('Bad row:', row)
            print('Reason:', err)
In this code, if a ValueError occurs, details concerning the cause of the error are placed in err and control passes to the code in the except block. If some other kind of exception is raised, the program crashes as usual. If no errors occur, the code in the except block is ignored. When an exception is handled, program execution resumes with the statement that immediately follows the last except block. The program does not return to the location where the exception occurred.
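The flow of control can be traced with a small self-contained sketch. It uses an in-memory list of made-up rows in place of a file so that it runs anywhere:

```python
# Made-up rows standing in for the contents of a file
rows = [['IBM', '50', '91.10'],
        ['ACME', 'N/A', '123.45'],
        ['PHP', '40', '37.20']]

portfolio = []
bad_rows = []
for row in rows:
    try:
        name = row[0]
        shares = int(row[1])       # Raises ValueError for 'N/A'
        price = float(row[2])      # Skipped when the line above fails
        portfolio.append((name, shares, price))
    except ValueError as err:
        bad_rows.append((row, str(err)))
    # Execution resumes here after the except block; the loop then continues

print(portfolio)
print(bad_rows)
```

The bad row is recorded and skipped, and the rows after it are still processed.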
The raise statement is used to signal an exception. You need to give the name of an exception. For instance, here’s how to raise RuntimeError, a built-in exception:
raise RuntimeError('Computer says no')
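Raising and catching fit together as in the following sketch. The set_port() function and its range check are invented for illustration only:

```python
def set_port(port):
    # Hypothetical validation function used only for illustration
    if not 0 < port < 65536:
        raise ValueError(f'Invalid port: {port}')
    return port

ok = set_port(80)

try:
    set_port(70000)
except ValueError as err:
    message = str(err)     # The text passed to the exception

print(ok, message)
```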
Proper management of system resources such as locks, files, and network connections is often tricky when combined with exception handling. Sometimes there are actions that must be performed no matter what happens. For this, use try-finally. Here is an example involving a lock that must be released to avoid deadlock:
import threading
lock = threading.Lock()
...
lock.acquire()
# If a lock has been acquired, it MUST be released
try:
    ...
    statements
    ...
finally:
    lock.release()     # Always runs
To simplify such programming, most objects that involve resource management also support the with statement. Here is a modified version of the above code:
with lock:
    ...
    statements
    ...
In this example, the lock object is automatically acquired when the with statement executes. When execution leaves the context of the with block, the lock is automatically released. This is done regardless of what happens inside the with block. For example, if an exception occurs, the lock is released when control leaves the context of the block.
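This behavior is easy to confirm. The sketch below deliberately raises an exception inside the with block and then checks the state of the lock using its locked() method:

```python
import threading

lock = threading.Lock()

try:
    with lock:
        held_inside = lock.locked()    # True while inside the block
        raise RuntimeError('Something went wrong')
except RuntimeError:
    pass

held_after = lock.locked()             # False: released despite the exception

print(held_inside, held_after)
```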
The with statement is normally only compatible with objects related to system resources or the execution environment—such as files, connections, and locks. However, user-defined objects can have their own custom processing, as described further in Chapter 3.
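As a preview, here is a minimal sketch of a user-defined object that works with the with statement by defining the __enter__() and __exit__() special methods. The Tracker class is a made-up example that merely records what happens:

```python
class Tracker:
    # Hypothetical class; records when the with block is entered and left
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False       # Do not suppress exceptions

t = Tracker()
with t:
    t.events.append('body')

print(t.events)
```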
A program terminates when no more statements exist to execute in the input program or when an uncaught SystemExit exception is raised. If you want to force a program to quit, here’s how to do it:
raise SystemExit()                        # Exit with no error message
raise SystemExit("Something is wrong")    # Exit with error
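SystemExit is an ordinary exception, so it can be caught and inspected like any other. This sketch does so only to show what the exception carries; a real program would normally let it propagate:

```python
try:
    raise SystemExit('Something is wrong')
except SystemExit as err:
    code_with_message = err.code    # The argument given to SystemExit

try:
    raise SystemExit()
except SystemExit as err:
    code_without = err.code         # None when no argument is given

print(code_with_message, code_without)
```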
On exit, the interpreter makes a best effort to garbage-collect all active objects. However, if you need to perform a specific cleanup action (remove files, close a connection), you can register it with the atexit module as follows:
import atexit

# Example
connection = open_connection("deaddot.com")

def cleanup():
    print("Going away...")
    close_connection(connection)

atexit.register(cleanup)
All values used in a program are objects. An object consists of internal data and methods that perform various kinds of operations involving that data. You have already used objects and methods when working with the built-in types such as strings and lists. For example:
items = [37, 42]       # Create a list object
items.append(73)       # Call the append() method
The dir() function lists the methods available on an object. It is a useful tool for interactive experimentation when no fancy IDE is available. For example:
>>> items = [37, 42]
>>> dir(items)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
 ...
 'append', 'count', 'extend', 'index', 'insert', 'pop',
 'remove', 'reverse', 'sort']
>>>
When inspecting objects, you will see familiar methods such as append() and insert() listed. However, you will also see special methods whose names begin and end with a double underscore. These methods implement various operators. For example, the __add__() method is used to implement the + operator. These methods are explained in more detail in later chapters.
>>> items.__add__([73, 101])
[37, 42, 73, 101]
>>>
The class statement is used to define new types of objects and for object-oriented programming. For example, the following class defines a stack with push() and pop() operations:
class Stack:
    def __init__(self):            # Initialize the stack
        self._items = [ ]

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __repr__(self):
        return f'<{type(self).__name__} at 0x{id(self):x}, size={len(self)}>'

    def __len__(self):
        return len(self._items)
Inside the class definition, methods are defined using the def statement. The first argument in each method always refers to the object itself. By convention, self is the name used for this argument. All operations involving the attributes of an object must explicitly refer to the self variable. Methods with leading and trailing double underscores are special methods. For example, __init__ is used to initialize an object. In this case, __init__ creates an internal list for storing the stack data.
To use a class, write code such as this:
s = Stack()          # Create a stack
s.push('Dave')       # Push some things onto it
s.push(42)
s.push([3, 4, 5])
x = s.pop()          # x gets [3,4,5]
y = s.pop()          # y gets 42
Within the class, you will notice that the methods use an internal _items variable. Python does not have any mechanism for hiding or protecting data. However, there is a programming convention wherein names preceded by a single underscore are taken to be “private.” In this example, _items should be treated (by you) as internal implementation and not used outside the Stack class itself. Be aware that there is no actual enforcement of this convention—if you want to access _items, you can do so at any time. You’ll just have to answer to your coworkers when they review your code.
The __repr__() and __len__() methods are there to make the object play nicely with the rest of the environment. Here, __len__() makes a Stack work with the built-in len() function and __repr__() changes the way that a Stack is displayed and printed. It’s a good idea to always define __repr__() as it can simplify debugging.
>>> s = Stack()
>>> s.push('Dave')
>>> s.push(42)
>>> len(s)
2
>>> s
<Stack at 0x10108c1d0, size=2>
>>>
A major feature of objects is that you can add to or redefine the capabilities of existing classes via inheritance.
Suppose you wanted to add a method to swap the top two items on the stack. You might write a class like this:
class MyStack(Stack):
    def swap(self):
        a = self.pop()
        b = self.pop()
        self.push(a)
        self.push(b)
MyStack is identical to Stack except that it has a new method, swap().
>>> s = MyStack()
>>> s.push('Dave')
>>> s.push(42)
>>> s.swap()
>>> s.pop()
'Dave'
>>> s.pop()
42
>>>
Inheritance can also be used to change the behavior of an existing method. Suppose you want to restrict the stack to only hold numeric data. Write a class like this:
class NumericStack(Stack):
    def push(self, item):
        if not isinstance(item, (int, float)):
            raise TypeError('Expected an int or float')
        super().push(item)
In this example, the push() method has been redefined to add extra checking. The super() operation is a way to invoke the prior definition of push(). Here’s how this class would work:
>>> s = NumericStack()
>>> s.push(42)
>>> s.push('Dave')
Traceback (most recent call last):
...
TypeError: Expected an int or float
>>>
Often, inheritance is not the best solution. Suppose you wanted to define a simple stack-based 4-function calculator that worked like this:
>>> # Calculate 2 + 3 * 4
>>> calc = Calculator()
>>> calc.push(2)
>>> calc.push(3)
>>> calc.push(4)
>>> calc.mul()
>>> calc.add()
>>> calc.pop()
14
>>>
You might look at this code, see the use of push() and pop(), and think that Calculator could be defined by inheriting from Stack. Although that would work, it is probably better to define Calculator as a completely separate class:
class Calculator:
    def __init__(self):
        self._stack = Stack()

    def push(self, item):
        self._stack.push(item)

    def pop(self):
        return self._stack.pop()

    def add(self):
        self.push(self.pop() + self.pop())

    def mul(self):
        self.push(self.pop() * self.pop())

    def sub(self):
        right = self.pop()
        self.push(self.pop() - right)

    def div(self):
        right = self.pop()
        self.push(self.pop() / right)
In this implementation, a Calculator contains a Stack as an internal implementation detail. This is an example of composition. The push() and pop() methods delegate to the internal Stack. The main reason for taking this approach is that you don’t really think of the Calculator as a Stack. It’s a separate concept—a different kind of object. By analogy, your phone contains a central processing unit (CPU) but you don’t usually think of your phone as a type of CPU.
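The consequence of composition can be seen in a condensed, self-contained sketch. It repeats trimmed-down versions of the two classes so that it runs on its own; only the methods needed for the demonstration are included:

```python
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

class Calculator:
    def __init__(self):
        self._stack = Stack()      # Composition: a Calculator has a Stack

    def push(self, item):
        self._stack.push(item)     # Delegates to the internal Stack

    def pop(self):
        return self._stack.pop()

    def add(self):
        self.push(self.pop() + self.pop())

    def mul(self):
        self.push(self.pop() * self.pop())

    def sub(self):
        right = self.pop()         # Operand order matters for subtraction
        self.push(self.pop() - right)

calc = Calculator()
calc.push(2)
calc.push(3)
calc.push(4)
calc.mul()
calc.add()
total = calc.pop()                 # 2 + 3 * 4

calc.push(10)
calc.push(4)
calc.sub()
difference = calc.pop()            # 10 - 4

is_stack = isinstance(calc, Stack) # A Calculator is not a kind of Stack

print(total, difference, is_stack)
```

Because Calculator does not inherit from Stack, code that expects a Stack will not accept a Calculator by accident.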
As your programs grow in size, you will want to break them into multiple files for easier maintenance. To do this, use the import statement. To create a module, put the relevant statements and definitions into a file with a .py suffix and the same name as the module. Here’s an example:
# readport.py
#
# Reads a file of 'NAME,SHARES,PRICE' data

def read_portfolio(filename):
    portfolio = []
    with open(filename) as file:
        for line in file:
            row = line.split(',')
            try:
                name = row[0]
                shares = int(row[1])
                price = float(row[2])
                holding = (name, shares, price)
                portfolio.append(holding)
            except ValueError as err:
                print('Bad row:', row)
                print('Reason:', err)
    return portfolio
To use your module in other files, use the import statement. For example, here is a module pcost.py that uses the above read_portfolio() function:
# pcost.py

import readport

def portfolio_cost(filename):
    '''
    Compute the total shares*price of a portfolio
    '''
    port = readport.read_portfolio(filename)
    return sum(shares * price for _, shares, price in port)
The import statement creates a new namespace (or environment) and executes all the statements in the associated .py file within that namespace. To access the contents of the namespace after import, use the name of the module as a prefix, as in readport.read_portfolio() in the preceding example.
If the import statement fails with an ImportError exception, you need to check a few things in your environment. First, make sure you created a file called readport.py. Next, check the directories listed on sys.path. If your file isn’t saved in one of those directories, Python won’t be able to find it.
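The role of sys.path can be demonstrated with a throwaway module. In this sketch, the module name demo_syspath_mod and its contents are invented for illustration; the file is written to a temporary directory that is not on sys.path at first:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    # Write a tiny module into a directory that is not on sys.path
    Path(tmpdir, 'demo_syspath_mod.py').write_text('greeting = "hello"\n')

    try:
        import demo_syspath_mod
        found_before = True
    except ImportError:
        found_before = False       # Python cannot find the file yet

    sys.path.insert(0, tmpdir)     # Make the directory searchable
    importlib.invalidate_caches()
    try:
        import demo_syspath_mod
        found_after = True
        greeting = demo_syspath_mod.greeting
    finally:
        # Undo the changes so the example leaves no trace
        sys.path.remove(tmpdir)
        sys.modules.pop('demo_syspath_mod', None)

print(found_before, found_after, greeting)
```

Editing sys.path by hand like this is useful for experiments; for real projects, it is usually better to place the file in a directory that is already searched.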
If you want to import a module under a different name, supply the import statement with an optional as qualifier:
import readport as rp
port = rp.read_portfolio('portfolio.dat')
To import specific definitions into the current namespace, use the from statement:
from readport import read_portfolio
port = read_portfolio('portfolio.dat')
As with objects, the dir() function lists the contents of a module. It is a useful tool for interactive experimentation.
>>> import readport
>>> dir(readport)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
 '__name__', '__package__', '__spec__', 'read_portfolio']
...
>>>
Python provides a large standard library of modules that simplify certain programming tasks. For example, the csv module is a standard library for dealing with files of comma-separated values. You could use it in your program as follows:
# readport.py
#
# Reads a file of 'NAME,SHARES,PRICE' data

import csv

def read_portfolio(filename):
    portfolio = []
    with open(filename) as file:
        rows = csv.reader(file)
        for row in rows:
            try:
                name = row[0]
                shares = int(row[1])
                price = float(row[2])
                holding = (name, shares, price)
                portfolio.append(holding)
            except ValueError as err:
                print('Bad row:', row)
                print('Reason:', err)
    return portfolio
Python also has a vast number of third-party modules that can be installed to solve almost any imaginable task (including the reading of CSV files). See https://pypi.org.
Any file can execute either as a script or as a library imported with import. To better support imports, script code is often enclosed with a conditional check against the module name:
# readport.py
#
# Reads a file of 'NAME,SHARES,PRICE' data

import csv

def read_portfolio(filename):
    ...

def main():
    portfolio = read_portfolio('portfolio.csv')
    for name, shares, price in portfolio:
        print(f'{name:>10s} {shares:10d} {price:10.2f}')

if __name__ == '__main__':
    main()
__name__ is a built-in variable that always contains the name of the enclosing module. If a program is run as the main script with a command such as python readport.py, the __name__ variable is set to '__main__'. Otherwise, if the code is imported using a statement such as import readport, the __name__ variable is set to 'readport'.
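One way to observe the two values of __name__ without leaving a single file is sketched below. It uses the standard runpy module to execute the same source under two different names, which approximates running it as a script and loading it as a module; the file name whoami.py and its contents are invented for illustration:

```python
import runpy
import tempfile
from pathlib import Path

# Source for a tiny file that reports how it is being executed
source = '''
def role():
    return 'script' if __name__ == '__main__' else 'module'

result = role()
'''

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir, 'whoami.py')
    path.write_text(source)

    # run_path() returns the globals of the executed code as a dictionary
    as_script = runpy.run_path(str(path), run_name='__main__')['result']
    as_module = runpy.run_path(str(path), run_name='whoami')['result']

print(as_script, as_module)
```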
As shown, the program is hardcoded to use a filename 'portfolio.csv'. Instead, you may want to prompt the user for a filename or accept the filename as a command-line argument. To do this, use the built-in input() function or the sys.argv list. For example, here is a modified version of the main() function:
def main(argv):
    if len(argv) == 1:
        filename = input('Enter filename: ')
    elif len(argv) == 2:
        filename = argv[1]
    else:
        raise SystemExit(f'Usage: {argv[0]} [ filename ]')
    portfolio = read_portfolio(filename)
    for name, shares, price in portfolio:
        print(f'{name:>10s} {shares:10d} {price:10.2f}')

if __name__ == '__main__':
    import sys
    main(sys.argv)
This program can be run in two different ways from the command line:
bash % python readport.py
Enter filename: portfolio.csv
...
bash % python readport.py portfolio.csv
...
bash % python readport.py a b c
Usage: readport.py [ filename ]
bash %
For very simple programs, it is often enough to process arguments in sys.argv as shown. For more advanced usage, the argparse standard library module can be used.
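As a rough sketch of what the argparse version might look like, the parser below accepts one optional filename. Falling back to 'portfolio.csv' instead of prompting the user is an assumption made to keep the example short, and the description and help strings are invented:

```python
import argparse

def parse_args(argv):
    # argv is the list of arguments without the program name
    parser = argparse.ArgumentParser(
        prog='readport.py',
        description='Print a portfolio report')
    parser.add_argument(
        'filename',
        nargs='?',                     # The argument is optional
        default='portfolio.csv',       # Assumed default for this sketch
        help='file of NAME,SHARES,PRICE data')
    return parser.parse_args(argv)

given = parse_args(['portfolio.dat'])
omitted = parse_args([])

print(given.filename, omitted.filename)
```

In a real script, the call would typically be parse_args(sys.argv[1:]). Extra arguments cause argparse to print a usage message and exit on its own.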
In large programs, it’s common to organize code into packages. A package is a hierarchical collection of modules. On the filesystem, put your code as a collection of files in a directory like this:
tutorial/
    __init__.py
    readport.py
    pcost.py
    stack.py
    ...
The directory should have an __init__.py file, which may be empty. Once you’ve done this, you should be able to make nested import statements. For example:
import tutorial.readport
port = tutorial.readport.read_portfolio('portfolio.dat')
If you don’t like the long names, you can shorten things using an import like this:
from tutorial.readport import read_portfolio
port = read_portfolio('portfolio.dat')
One tricky issue with packages is imports between files within the same package. In an earlier example, a pcost.py module was shown that started with an import like this:
# pcost.py
import readport
...
If the pcost.py and readport.py files are moved into a package, this import statement breaks. To fix it, you must use a fully qualified module import:
# pcost.py
from tutorial import readport
...
Alternatively, you can use a package-relative import like this:
# pcost.py
from . import readport
...
The latter form has the benefit of not hardcoding the package name. This makes it easier to later rename a package or move it around within your project.
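The benefit of the relative form can be tested with a throwaway package. This sketch builds a small package in a temporary directory, imports it, renames the package directory, and imports it again under the new name. The package names demo_tutorial and demo_renamed and the stub file contents are invented for illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

def run_pcost(package_name):
    # Import <package_name>.pcost, call it, then forget the modules again
    importlib.invalidate_caches()
    module = importlib.import_module(f'{package_name}.pcost')
    try:
        return module.portfolio_cost('portfolio.dat')
    finally:
        for name in list(sys.modules):
            if name == package_name or name.startswith(package_name + '.'):
                del sys.modules[name]

with tempfile.TemporaryDirectory() as tmpdir:
    pkg = Path(tmpdir, 'demo_tutorial')
    pkg.mkdir()
    (pkg / '__init__.py').write_text('')
    (pkg / 'readport.py').write_text(
        'def read_portfolio(filename):\n'
        '    return [("IBM", 50, 2.0)]\n')      # Stub data, no file is read
    (pkg / 'pcost.py').write_text(
        'from . import readport\n'               # Package-relative import
        'def portfolio_cost(filename):\n'
        '    port = readport.read_portfolio(filename)\n'
        '    return sum(s * p for _, s, p in port)\n')

    sys.path.insert(0, tmpdir)
    try:
        cost_before = run_pcost('demo_tutorial')
        pkg.rename(Path(tmpdir, 'demo_renamed'))  # Rename the package
        cost_after = run_pcost('demo_renamed')    # pcost.py needs no edits
    finally:
        sys.path.remove(tmpdir)

print(cost_before, cost_after)
```

Had pcost.py used the fully qualified form with the package name written out, the second import would have failed after the rename.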
Other subtle details concerning packages are covered later (see Chapter 8).
As you start to write more Python code, you may find yourself working on larger applications that include a mix of your own code as well as third-party dependencies. Managing all of this is a complex topic that continues to evolve. There are also many conflicting opinions about what constitutes “best practice.” However, there are a few essential facets to it that you should know.
First, it is standard practice to organize large code bases into packages (that is, directories of .py files that include the special __init__.py file). When doing this, pick a unique package name for the top-level directory name. The primary purpose of the package directory is to manage import statements and the namespaces of modules used while programming. You want your code isolated from everyone else’s code.
In addition to your main project source code, you might also have tests, examples, scripts, and documentation. This additional material usually lives in a set of directories separate from the package containing your source code. Thus, it is common to create an enclosing top-level directory for your project and to put all of your work under that. For example, a fairly typical project organization might look like this:
tutorial-project/
    tutorial/
        __init__.py
        readport.py
        pcost.py
        stack.py
        ...
    tests/
        test_stack.py
        test_pcost.py
        ...
    examples/
        sample.py
        ...
    doc/
        tutorial.txt
        ...
Keep in mind that there is more than one way to do it. The nature of the problem you’re solving might dictate a different structure. Nevertheless, as long as your main set of source code files lives in a proper package (again, the directory with the __init__.py file), you should be fine.
Python has a large library of contributed packages that can be found at the Python Package Index (https://pypi.org). You may need to depend on some of these packages in your own code. To install a third-party package, use a command such as pip:
bash % python3 -m pip install somepackage
Installed packages are placed into a special site-packages directory that you can find if you inspect the value of sys.path. For example, on a UNIX machine, packages might be placed in /usr/local/lib/python3.8/site-packages. If you are ever left wondering where a package is coming from, inspect the __file__ attribute of a package after importing it in the interpreter:
>>> import pandas
>>> pandas.__file__
'/usr/local/lib/python3.8/site-packages/pandas/__init__.py'
>>>
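The same inspection works in a script and for any importable package. This sketch uses the standard library's json package so that it runs without installing anything; the paths it prints will differ from machine to machine:

```python
import json
import sys

# Where the json package was loaded from
location = json.__file__

# Entries on sys.path that look like site-packages directories (may be empty)
site_dirs = [p for p in sys.path if 'site-packages' in p]

print(location)
print(site_dirs)
```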
One potential problem with installing a package is that you might not have permission to change the locally installed version of Python. Even if you had permission, it still might not be a good idea. For example, many systems already have Python installed for use by various system utilities. Altering the installation of that version of Python is often a bad idea.
To make a sandbox where you can install packages and work without worrying about breaking anything, create a virtual environment with a command like this:
bash % python3 -m venv myproject
This will set up a dedicated Python installation for you in a directory called myproject/. Within that directory, you’ll find an interpreter executable and library where you can safely install packages. For example, if you run myproject/bin/python3, you’ll get an interpreter configured for your personal use. You can install packages into this interpreter without worrying about breaking any part of the default Python installation. To install a package, use pip as before but make sure to specify the correct interpreter:
bash % ./myproject/bin/python3 -m pip install somepackage
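If you are unsure which interpreter is running your code, one common check is to compare sys.prefix with sys.base_prefix. The helper function below is a sketch with a made-up name:

```python
import sys

def in_virtual_environment():
    # In an environment created by venv, sys.prefix points at the environment
    # while sys.base_prefix points at the installation it was created from.
    return sys.prefix != sys.base_prefix

print('Interpreter:', sys.executable)
print('In a virtual environment:', in_virtual_environment())
```

Run with myproject/bin/python3, this should report True; run with the system interpreter, it should report False.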
There are various tools that aim to simplify the use of pip and venv. Such matters might also be handled automagically by your IDE. As this is a fluid and ever-evolving part of Python, no further advice is given here.
In the early days of Python, “it fits your brain” was a common motto. Even today, the core of Python is a small programming language along with a useful collection of built-in objects—lists, sets, and dictionaries. A vast array of practical problems can be solved using nothing more than the basic features presented in this chapter. This is a good thing to keep in mind as you begin your Python adventure—although there are always more complicated ways to solve a problem, there might also be a simple way to do it with the basic features Python already provides. When in doubt, you’ll probably thank your past self for doing just that.