One of my goals for this book has been to teach you as little Python as possible. When there were two ways to do something, I picked one and avoided mentioning the other. Or sometimes I put the second one into an exercise.
Now I want to go back for some of the good bits that got left behind. Python provides a number of features that are not really necessary—you can write good code without them—but with them you can write code that’s more concise, readable, or efficient, and sometimes all three.
Python provides a class called set that represents a collection of unique elements. To create an empty set, we can use the class object like a function:
s1 = set()
s1
set()
We can use the add method to add elements:
s1.add('a')
s1.add('b')
s1
{'a', 'b'}
Or we can pass any kind of sequence to set:
s2 = set('acd')
s2
{'a', 'c', 'd'}
An element can only appear once in a set. If you add an element that’s already there, it has no effect:
s1.add('a')
s1
{'a', 'b'}
Or if you create a set with a sequence that contains duplicates, the result contains only unique elements:
set('banana')
{'a', 'b', 'n'}
Some of the exercises in this book can be done concisely and efficiently with sets. For example, here is a solution to an exercise in Chapter 11 that uses a dictionary to check whether there are any duplicate elements in a sequence:
def has_duplicates(t):
    d = {}
    for x in t:
        d[x] = True
    return len(d) < len(t)
This version adds the elements of t as keys in a dictionary, and then checks whether there are fewer keys than elements. Using sets, we can write the same function like this:
def has_duplicates(t):
    s = set(t)
    return len(s) < len(t)
An element can only appear in a set once, so if an element in t appears more than once, the set will be smaller than t. If there are no duplicates, the set will be the same size as t.
set objects provide methods that perform set operations. For example, union computes the union of two sets, which is a new set that contains all elements that appear in either set:
s1.union(s2)
{'a', 'b', 'c', 'd'}
Some arithmetic operators work with sets. For example, the - operator performs set subtraction—the result is a new set that contains all elements from the first set that are not in the second set:
s1-s2
{'b'}
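The other set operators follow the same pattern. As a quick sketch, with s1 and s2 rebuilt here so the block runs on its own, & computes the intersection, | is the operator form of union, and ^ computes the symmetric difference. The names common, either, and only_one are just for illustration:

```python
# Rebuild the two sets from the examples above so this block is self-contained.
s1 = {'a', 'b'}
s2 = set('acd')

common = s1 & s2      # elements in both sets; same as s1.intersection(s2)
either = s1 | s2      # elements in either set; same as s1.union(s2)
only_one = s1 ^ s2    # elements in exactly one of the two sets

print(common)             # {'a'}
print(sorted(either))     # ['a', 'b', 'c', 'd']
print(sorted(only_one))   # ['b', 'c', 'd']
```

The results are sorted before printing only because sets have no guaranteed display order.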
In “Dictionary Subtraction” we used dictionaries to find the words that appear in a document but not in a word list. We used the following function, which takes two dictionaries and returns a new dictionary that contains only the keys from the first that don’t appear in the second:
def subtract(d1, d2):
    res = {}
    for key in d1:
        if key not in d2:
            res[key] = d1[key]
    return res
With sets, we don’t have to write this function ourselves. If word_counter is a dictionary that contains the unique words in the document and word_list is a list of valid words, we can compute the set difference like this:
set(word_counter)-set(word_list)
The result is a set that contains the words in the document that don’t appear in the word list.
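To see how this works on a small scale, here is a sketch with made-up data; the words and counts are invented for illustration. Passing a dictionary to set makes a set of its keys, so the counts are ignored:

```python
# Hypothetical data standing in for the real word counter and word list.
word_counter = {'the': 2, 'jabberwock': 1, 'and': 3, 'slithy': 1}
word_list = ['and', 'the', 'of']

# set(word_counter) contains only the dictionary's keys.
unknown = set(word_counter) - set(word_list)

print(sorted(unknown))   # ['jabberwock', 'slithy']
```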
The comparison operators work with sets. For example, <= checks whether one set is a subset of another, including the possibility that they are equal:
set('ab')<=set('abc')
True
With these operators, we can use sets to do some of the exercises in Chapter 7. For example, here’s a version of uses_only that uses a loop:
def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True
uses_only checks whether all letters in word are in available. With sets, we can rewrite it like this:
def uses_only(word, available):
    return set(word) <= set(available)
If the letters in word are a subset of the letters in available, that means that word uses only the letters in available.
A Counter is like a set, except that if an element appears more than once, the Counter keeps track of how many times it appears. If you are familiar with the mathematical idea of a “multiset,” a Counter is a natural way to represent a multiset.
The Counter class is defined in a standard module called collections, so you have to import it. Then you can use the class object as a function and pass as an argument a string, list, or any other kind of sequence:
from collections import Counter

counter = Counter('banana')
counter
Counter({'a': 3, 'n': 2, 'b': 1})
from collections import Counter

t = (1, 1, 1, 2, 2, 3)
counter_t = Counter(t)
counter_t
Counter({1: 3, 2: 2, 3: 1})
A Counter object is like a dictionary that maps from each key to the number of times it appears. As in dictionaries, the keys have to be hashable.
Unlike dictionaries, Counter objects don’t raise an exception if you access an element that doesn’t appear. Instead, they return 0:
counter['d']
0
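Looking up a missing element also leaves the Counter unchanged. Here is a minimal sketch, which rebuilds the counter from 'banana' so it can run on its own:

```python
from collections import Counter

counter = Counter('banana')

# Looking up a missing element returns 0 instead of raising KeyError.
missing = counter['d']
print(missing)          # 0

# The lookup does not add the key to the Counter.
print('d' in counter)   # False
print(len(counter))     # 3
```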
We can use Counter objects to solve one of the exercises from Chapter 10, which asks for a function that takes two words and checks whether they are anagrams—that is, whether the letters from one can be rearranged to spell the other.
Here’s a solution using Counter objects:
def is_anagram(word1, word2):
    return Counter(word1) == Counter(word2)
If two words are anagrams, they contain the same letters with the same counts, so their Counter objects are equivalent.
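As a quick check, here is the same function with a few test words; the words are my own choices, not from the exercise. Because a Counter tracks counts, a word with an extra copy of a letter is not treated as an anagram:

```python
from collections import Counter

def is_anagram(word1, word2):
    return Counter(word1) == Counter(word2)

print(is_anagram('stop', 'pots'))   # True
print(is_anagram('bet', 'beet'))    # False: 'beet' has two e's

# With sets the counts would be lost, so these two words look the same.
print(set('bet') == set('beet'))    # True
```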
Counter provides a method called most_common that returns a list of value-frequency pairs, sorted from most common to least:
counter.most_common()
[('a', 3), ('n', 2), ('b', 1)]
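most_common also takes an optional argument that limits how many pairs it returns. A minimal sketch, rebuilding the counter so the block stands alone (top_two is just an illustrative name):

```python
from collections import Counter

counter = Counter('banana')

# Ask for only the two most common elements.
top_two = counter.most_common(2)
print(top_two)   # [('a', 3), ('n', 2)]
```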
Counter objects also provide methods and operators to perform set-like operations, including addition, subtraction, union, and intersection. For example, the + operator combines two Counter objects and creates a new Counter that contains the keys from both and the sums of the counts.
We can test it by making a Counter with the letters from 'bans' and adding it to the letters from 'banana':
counter2 = Counter('bans')
counter + counter2
Counter({'a': 4, 'n': 3, 'b': 2, 's': 1})
You’ll have a chance to explore other Counter operations in the exercises at the end of this chapter.
The collections module also provides defaultdict, which is like a dictionary except that if you access a key that doesn’t exist, it generates a new value automatically.
When you create a defaultdict, you provide a function that’s used to create new values. A function that creates objects is sometimes called a factory. The built-in functions that create lists, sets, and other types can be used as factories.
For example, here’s a defaultdict that creates a new list when needed:
from collections import defaultdict

d = defaultdict(list)
d
defaultdict(list, {})
Notice that the argument is list, which is a class object, not list(), which is a function call that creates a new list. The factory function doesn’t get called unless we access a key that doesn’t exist:
t = d['new key']
t
[]
The new list, which we’re calling t, is also added to the dictionary. So if we modify t, the change appears in d:
t.append('new value')
d['new key']
['new value']
If you are making a dictionary of lists, you can often write simpler code using defaultdict.
In one of the exercises in Chapter 11, I made a dictionary that maps from a sorted string of letters to the list of words that can be spelled with those letters. For example, the string 'opst' maps to the list ['opts', 'post', 'pots', 'spot', 'stop', 'tops'].
Here’s the original code:
def all_anagrams(filename):
    d = {}
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        if t not in d:
            d[t] = [word]
        else:
            d[t].append(word)
    return d
And here’s a simpler version using a defaultdict:
def all_anagrams(filename):
    d = defaultdict(list)
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d[t].append(word)
    return d
In the exercises at the end of the chapter, you’ll have a chance to practice using defaultdict objects:
from collections import defaultdict

d = defaultdict(list)
key = ('into', 'the')
d[key].append('woods')
d[key]
['woods']
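Other built-in types work as factories too. As a sketch, int returns 0 when called with no arguments, so defaultdict(int) is a handy way to count things; the variable names here are illustrative. The second half shows a possible surprise: merely reading a missing key calls the factory and stores the result.

```python
from collections import defaultdict

# int() returns 0, so every new key starts with a count of 0.
counts = defaultdict(int)
for letter in 'banana':
    counts[letter] += 1

print(dict(counts))   # {'b': 1, 'a': 3, 'n': 2}

# Reading a key that doesn't exist adds it with the default value.
before = len(counts)
counts['z']
after = len(counts)
print(before, after)  # 3 4
```

If you only want to check whether a key exists, use the in operator, which does not call the factory.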
Conditional statements are often used to choose one of two values, like this:
if x > 0:
    y = math.log(x)
else:
    y = float('nan')
This statement checks whether x is positive. If so, it computes its logarithm. If not, math.log would raise a ValueError. To avoid stopping the program, we generate a NaN, which is a special floating-point value that represents “Not a Number.”
We can write this statement more concisely using a conditional expression:
y = math.log(x) if x > 0 else float('nan')
You can almost read this line like English: “y gets log-x if x is greater than 0; otherwise, it gets NaN.”
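Here is a runnable sketch of the same expression wrapped in a function; safe_log is a hypothetical name, not something from the math module. It also shows how to test for NaN, which is not equal to anything, including itself:

```python
import math

def safe_log(x):
    """Return log(x) for positive x and NaN otherwise (hypothetical helper)."""
    return math.log(x) if x > 0 else float('nan')

print(safe_log(1))    # 0.0
print(safe_log(-1))   # nan

# Comparing NaN with == always gives False, so use math.isnan instead.
y = safe_log(0)
print(y == y)          # False
print(math.isnan(y))   # True
```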
Recursive functions can sometimes be written concisely using conditional expressions. For example, here is a version of factorial with a conditional statement:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
And here’s a version with a conditional expression:
def factorial(n):
    return 1 if n == 0 else n * factorial(n-1)
Another use of conditional expressions is handling optional arguments. For example, here is a class definition with an __init__ method that uses a conditional statement to check a parameter with a default value:
class Kangaroo:
    def __init__(self, name, contents=None):
        self.name = name
        if contents is None:
            contents = []
        self.contents = contents
Here’s a version that uses a conditional expression:
def __init__(self, name, contents=None):
    self.name = name
    self.contents = [] if contents is None else contents
In general, you can replace a conditional statement with a conditional expression if both branches contain a single expression and no statements.
In previous chapters, we’ve seen a few examples where we start with an empty list and add elements, one at a time, using the append method. For example, suppose we have a string that contains the title of a movie, and we want to capitalize all of the words:
title='monty python and the holy grail'
We can split it into a list of strings, loop through the strings, capitalize them, and append them to a list:
t = []
for word in title.split():
    t.append(word.capitalize())

' '.join(t)
'Monty Python And The Holy Grail'
We can do the same thing more concisely using a list comprehension:
t = [word.capitalize() for word in title.split()]

' '.join(t)
'Monty Python And The Holy Grail'
The bracket operators indicate that we are constructing a new list. The expression inside the brackets specifies the elements of the list, and the for clause indicates what sequence we are looping through.
The syntax of a list comprehension might seem strange, because the loop variable—word in this example—appears in the expression before we get to its definition. But you get used to it.
As another example, in “Making a Word List” we used this loop to read words from a file and append them to a list:
word_list = []

for line in open('words.txt'):
    word = line.strip()
    word_list.append(word)
Here’s how we can write that as a list comprehension:
word_list = [line.strip() for line in open('words.txt')]
A list comprehension can also have an if clause that determines which elements are included in the list. For example, here’s a for loop we used in “Accumulating a List” to make a list of only the words in word_list that are palindromes:
palindromes = []

for word in word_list:
    if is_palindrome(word):
        palindromes.append(word)
Here’s how we can do the same thing with a list comprehension:
palindromes = [word for word in word_list if is_palindrome(word)]
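To try this without the word file, here is a self-contained sketch. The short word list is made up, and this is_palindrome is a stand-in that might differ from the version in the earlier chapter. The last line shows that the expression and the if clause can be combined:

```python
def is_palindrome(word):
    """Stand-in: a word is a palindrome if it equals its reverse."""
    return word == word[::-1]

# A small, made-up word list in place of words.txt.
word_list = ['noon', 'python', 'level', 'set', 'kayak']

# Keep only the words that pass the test.
palindromes = [word for word in word_list if is_palindrome(word)]
print(palindromes)   # ['noon', 'level', 'kayak']

# Transform and filter in the same comprehension.
shouted = [word.upper() for word in word_list if is_palindrome(word)]
print(shouted)       # ['NOON', 'LEVEL', 'KAYAK']
```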
When a list comprehension is used as an argument to a function, we can often omit the brackets. For example, suppose we want to add up 1/2**n for values of n from 0 to 9. We can use a list comprehension like this:
sum([1/2**n for n in range(10)])
1.998046875
Or we can leave out the brackets like this:
sum(1/2**n for n in range(10))
1.998046875
In this example, the argument is technically a generator expression, not a list comprehension, and it never actually makes a list. But other than that, the behavior is the same.
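One practical difference is worth seeing for yourself: a generator expression can be looped through only once. In this sketch the names squares, first_total, and second_total are illustrative:

```python
# Parentheses instead of brackets make a generator expression.
squares = (n**2 for n in range(5))
print(type(squares).__name__)   # generator

# The first pass consumes all of the values: 0 + 1 + 4 + 9 + 16.
first_total = sum(squares)
print(first_total)    # 30

# The generator is now exhausted, so a second pass finds nothing.
second_total = sum(squares)
print(second_total)   # 0
```

If you need to loop through the results more than once, a list comprehension is the safer choice.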
List comprehensions and generator expressions are concise and easy to read, at least for simple expressions. And they are usually faster than the equivalent for loops, sometimes much faster. So if you are mad at me for not mentioning them earlier, I understand.
But, in my defense, list comprehensions are harder to debug because you can’t put a print statement inside the loop. I suggest you use them only if the computation is simple enough that you are likely to get it right the first time. Or consider writing and debugging a for loop and then converting it to a list comprehension.
Python provides a built-in function, any, that takes a sequence of boolean values and returns True if any of the values are True:
any([False,False,True])
True
any is often used with generator expressions:
any(letter == 't' for letter in 'monty')
True
That example isn’t very useful because it does the same thing as the in operator. But we could use any to write concise solutions to some of the exercises in Chapter 7. For example, we can write uses_none like this:
def uses_none(word, forbidden):
    """Checks whether a word avoids forbidden letters."""
    return not any(letter in forbidden for letter in word)
This function loops through the letters in word and checks whether any of them are in forbidden. Using any with a generator expression is efficient because it stops immediately if it finds a True value, so it doesn’t have to loop through the whole sequence.
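We can watch the early exit happen with a small experiment. In this sketch, is_forbidden is a hypothetical helper that records each letter it examines before testing it:

```python
checked = []

def is_forbidden(letter, forbidden):
    """Record the letter being examined, then test it (hypothetical helper)."""
    checked.append(letter)
    return letter in forbidden

# any stops asking the generator for values as soon as it sees True.
result = any(is_forbidden(letter, 'xyz') for letter in 'oxygen')

print(result)    # True
print(checked)   # ['o', 'x']
```

Only the first two letters were examined. With brackets around the argument, the list comprehension would test all six letters before any got to see the results.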
Python provides another built-in function, all, that returns True if every element of the sequence is True. We can use it to write a concise version of uses_all:
def uses_all(word, required):
    """Check whether a word uses all required letters."""
    return all(letter in word for letter in required)
Expressions using any and all can be concise, efficient, and easy to read.
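One edge case to keep in mind is the empty sequence: all returns True because no element is False, and any returns False because no element is True. Here is a sketch using uses_all, repeated so the block runs on its own, with test words of my choosing:

```python
def uses_all(word, required):
    """Check whether a word uses all required letters."""
    return all(letter in word for letter in required)

print(uses_all('banana', 'ban'))   # True
print(uses_all('banana', 'bat'))   # False

# With no required letters, there is nothing to fail the test.
print(uses_all('banana', ''))      # True

print(all([]))   # True
print(any([]))   # False
```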
The collections module provides a function called namedtuple that can be used to create simple classes. For example, the Point object in “Creating a Point” has only two attributes, x and y.
Here’s how we defined it:
class Point:
    """Represents a point in 2-D space."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return f'({self.x},{self.y})'
That’s a lot of code to convey a small amount of information. namedtuple provides a more concise way to define classes like this:
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
The first argument is the name of the class you want to create. The second is a list of the attributes Point objects should have. The result is a class object, which is why it is assigned to a capitalized variable name.
A class created with namedtuple provides an __init__ method that assigns values to the attributes and a __str__ that displays the object in a readable form. So we can create and display a Point object like this:
p = Point(1, 2)
p
Point(x=1, y=2)
Point also provides an __eq__ method that checks whether two Point objects are equivalent—that is, whether their attributes are the same:
p==Point(1,2)
True
You can access the elements of a named tuple by name or by index:
p.x,p.y
(1, 2)
p[0],p[1]
(1, 2)
You can also treat a named tuple as a tuple, as in this assignment:
x, y = p
x, y
(1, 2)
But namedtuple objects are immutable. After the attributes are initialized, they can’t be changed:
p[0]=3
TypeError: 'Point' object does not support item assignment
p.x=3
AttributeError: can't set attribute
namedtuple provides a quick way to define simple classes. The drawback is that simple classes don’t always stay simple. You might decide later that you want to add methods to a named tuple. In that case, you can define a new class that inherits from the named tuple:
class Pointier(Point):
    """This class inherits from Point"""
Or at that point you could switch to a conventional class definition.
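As a sketch of the inheritance option, here is a Pointier with one added method. The method distance_from_origin is my invention for illustration. The example also uses _replace, a method namedtuple classes provide that returns a modified copy, which is the usual workaround for immutability:

```python
from collections import namedtuple
import math

Point = namedtuple('Point', ['x', 'y'])

class Pointier(Point):
    """This class inherits from Point and adds a method."""

    def distance_from_origin(self):
        # Illustrative method, not part of namedtuple.
        return math.hypot(self.x, self.y)

p = Pointier(3, 4)
print(p)                          # Pointier(x=3, y=4)
print(p.distance_from_origin())   # 5.0

# _replace makes a new object; the original is unchanged.
q = p._replace(x=0)
print(q)   # Pointier(x=0, y=4)
print(p)   # Pointier(x=3, y=4)
```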
In “Argument Packing”, we wrote a function that packs its arguments into a tuple:
def mean(*args):
    return sum(args) / len(args)
You can call this function with any number of positional arguments:
mean(1,2,3)
2.0
But the * operator doesn’t pack keyword arguments. So calling this function with a keyword argument causes an error:
mean(1,2,start=3)
TypeError: mean() got an unexpected keyword argument 'start'
To pack keyword arguments, we can use the ** operator:
def mean(*args, **kwargs):
    print(kwargs)
    return sum(args) / len(args)
The keyword-packing parameter can have any name, but kwargs is a common choice. The result is a dictionary that maps from keywords to values:
mean(1,2,start=3)
{'start': 3}
1.5
In this example, the value of kwargs is printed, but otherwise it has no effect.
But the ** operator can also be used in an argument list to unpack a dictionary. For example, here’s a version of mean that packs any keyword arguments it gets and then unpacks them as keyword arguments for sum:
def mean(*args, **kwargs):
    return sum(args, **kwargs) / len(args)
Now if we call mean with start as a keyword argument, it gets passed along to sum, which uses it as the starting point of the summation. In the next example, start=3 adds 3 to the sum before computing the mean, so the sum is 6 and the result is 3:
mean(1,2,start=3)
3.0
As another example, if we have a dictionary with keys x and y, we can use it with the unpack operator to create a Point object:
d = dict(x=1, y=2)
Point(**d)
Point(x=1, y=2)
Without the unpack operator, d is treated as a single positional argument, so it gets assigned to x, and we get a TypeError because there’s no second argument to assign to y:
d = dict(x=1, y=2)
Point(d)
TypeError: Point.__new__() missing 1 required positional argument: 'y'
When you are working with functions that have a large number of keyword arguments, it is often useful to create and pass around dictionaries that specify frequently used options:
def pack_and_print(**kwargs):
    print(kwargs)

pack_and_print(a=1, b=2)
{'a': 1, 'b': 2}
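Going the other direction, here is a sketch of a reusable options dictionary that gets unpacked in more than one call. The built-in sorted function accepts key and reverse as keyword arguments; the word lists and the name sort_options are made up for illustration:

```python
# Options we want to reuse: sort by length, longest first.
sort_options = dict(key=len, reverse=True)

words = ['holy', 'grail', 'and', 'python']
longest_first = sorted(words, **sort_options)
print(longest_first)   # ['python', 'grail', 'holy', 'and']

# The same dictionary works for any other call that takes these options.
more_words = ['a', 'knight', 'says', 'ni']
print(sorted(more_words, **sort_options))   # ['knight', 'says', 'ni', 'a']
```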
In previous chapters, we used doctest to test functions. For example, here’s a function called add that takes two numbers and returns their sum. It includes a doctest that checks whether 2 + 2 is 4:
def add(a, b):
    '''Add two numbers.

    >>> add(2, 2)
    4
    '''
    return a + b
This function takes a function object and runs its doctests:
from doctest import run_docstring_examples

def run_doctests(func):
    run_docstring_examples(func, globals(), name=func.__name__)
So we can test add like this:
run_doctests(add)
There’s no output, which means all tests passed.
Python provides another tool for running automated tests, called unittest. It is a little more complicated to use, but here’s an example:
from unittest import TestCase

class TestExample(TestCase):

    def test_add(self):
        result = add(2, 2)
        self.assertEqual(result, 4)
First, we import TestCase, which is a class in the unittest module. To use it, we have to define a new class that inherits from TestCase and provides at least one test method. The name of the test method must begin with test and should indicate which function it tests.
In this example, test_add tests the add function by calling it, saving the result, and invoking assertEqual, which is inherited from TestCase. assertEqual takes two arguments and checks whether they are equal.
In order to run this test method, we have to run a function in unittest called main and provide several keyword arguments. The following function shows the details—if you are curious, ask a virtual assistant to explain how it works:
import unittest

def run_unittest():
    unittest.main(argv=[''], verbosity=0, exit=False)
run_unittest does not take TestExample as an argument—instead, it searches for classes that inherit from TestCase. Then it searches for methods that begin with test and runs them. This process is called test discovery.
Here’s what happens when we call run_unittest:
run_unittest()
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
unittest.main reports the number of tests it ran and the results. In this case OK indicates that the tests passed. To see what happens when a test fails, we’ll add an incorrect test method to TestExample:
%%add_method_to TestExample

def test_add_broken(self):
    result = add(2, 2)
    self.assertEqual(result, 100)
Here’s what happens when we run the tests:
run_unittest()
======================================================================
FAIL: test_add_broken (__main__.TestExample)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/ipykernel_29273/3833266738.py", line 3, in test_add_broken
    self.assertEqual(result, 100)
AssertionError: 4 != 100
----------------------------------------------------------------------
Ran 2 tests in 0.000s
FAILED (failures=1)
The report includes the test method that failed and an error message showing where. The summary indicates that two tests ran and one failed.
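TestCase provides other assertion methods besides assertEqual. The following sketch tries two of them, assertAlmostEqual and assertRaises. Rather than relying on test discovery, it loads the tests from one class explicitly and captures the report in a string buffer. The class name TestAddMore and the particular test cases are my own, and this is only one of several ways to run tests from code:

```python
import io
import unittest

def add(a, b):
    """The add function from above, repeated so this block stands alone."""
    return a + b

class TestAddMore(unittest.TestCase):

    def test_equal(self):
        self.assertEqual(add(2, 2), 4)

    def test_almost_equal(self):
        # Floating-point sums are rarely exact, so compare approximately.
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)

    def test_raises(self):
        # Adding a number and a string should raise TypeError.
        with self.assertRaises(TypeError):
            add(1, 'a')

# Load the tests from this one class instead of searching for them.
suite = unittest.TestLoader().loadTestsFromTestCase(TestAddMore)

# Send the report to a buffer so it doesn't clutter the output.
runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
result = runner.run(suite)

print(result.testsRun)          # 3
print(result.wasSuccessful())   # True
```

The object returned by run keeps the details, including lists of failures and errors, so a program can inspect the results instead of reading the printed report.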
In the following exercises, I’ll suggest some prompts you can use to ask a virtual assistant for more information about unittest.
factory: A function used to create objects, often passed as a parameter to a function.
conditional expression: An expression that uses a conditional to select one of two values.
list comprehension: A concise way to loop through a sequence and create a list.
generator expression: Similar to a list comprehension except that it does not create a list.
test discovery: A process used to find and run tests.
There are a few topics in this chapter you might want to learn about. Here are some questions to ask a virtual assistant:
“What are the methods and operators of Python’s set class?”
“What are the methods and operators of Python’s Counter class?”
“What is the difference between a Python list comprehension and a generator expression?”
“When should I use Python’s namedtuple rather than define a new class?”
“What are some uses of packing and unpacking keyword arguments?”
“How does unittest do test discovery?”
“Along with assertEqual, what are the most commonly used methods in unittest.TestCase?”
“What are the pros and cons of doctest and unittest?”
For the following exercises, consider asking a virtual assistant for help, but as always, remember to test the results.
One of the exercises in Chapter 7 asks for a function called uses_none that takes a word and a string of forbidden letters, and returns True if the word does not use any of the letters. Here’s a solution:
def uses_none(word, forbidden):
    for letter in word.lower():
        if letter in forbidden.lower():
            return False
    return True
Write a version of this function that uses set operations instead of a for loop. Hint: ask a virtual assistant “How do I compute the intersection of Python sets?”
Scrabble is a board game where the objective is to use letter tiles to spell words. For example, if we have tiles with the letters T, A, B, L, E, we can spell BELT and LATE using a subset of the tiles—but we can’t spell BEET because we don’t have two Es.
Write a function that takes a string of letters and a word, and checks whether the letters can spell the word, taking into account how many times each letter appears.
In one of the exercises from Chapter 17, my solution to has_straightflush uses the following method, which partitions a PokerHand into a list of four hands, where each hand contains cards of the same suit:
def partition(self):
    """Make a list of four hands, each containing only one suit."""
    hands = []
    for i in range(4):
        hands.append(PokerHand())

    for card in self.cards:
        hands[card.suit].add_card(card)

    return hands
Write a simplified version of this function using a defaultdict.
Here’s the function from Chapter 11 that computes Fibonacci numbers:
def fibonacci(n):
    if n == 0:
        return 0

    if n == 1:
        return 1

    return fibonacci(n-1) + fibonacci(n-2)
Write a version of this function with a single return statement that uses two conditional expressions, one nested inside the other.
The following is a function that recursively computes the binomial coefficient:
def binomial_coeff(n, k):
    """Compute the binomial coefficient "n choose k".

    n: number of trials
    k: number of successes

    returns: int
    """
    if k == 0:
        return 1

    if n == 0:
        return 0

    return binomial_coeff(n-1, k) + binomial_coeff(n-1, k-1)
Rewrite the body of the function using nested conditional expressions.
This function is not very efficient because it ends up computing the same values over and over. Make it more efficient by memoizing it, as described in “Memos”:
binomial_coeff(10, 4)    # should be 210
210
Here’s the __str__ method from the Deck class in “Printing the Deck”:
%%add_method_to Deck

def __str__(self):
    res = []
    for card in self.cards:
        res.append(str(card))
    return '\n'.join(res)
Write a more concise version of this method with a list comprehension or generator expression.