10
Files and Exceptions

Now that you’ve mastered the basic skills you need to write organized programs that are easy to use, it’s time to think about making your programs even more relevant and usable. In this chapter, you’ll learn to work with files so your programs can quickly analyze lots of data.

You’ll learn to handle errors so your programs don’t crash when they encounter unexpected situations. You’ll learn about exceptions, which are special objects Python creates to manage errors that arise while a program is running. You’ll also learn about the json module, which allows you to save user data so it isn’t lost when your program stops running.

Learning to work with files and save data will make your programs easier for people to use. Users will be able to choose what data to enter and when to enter it. People will be able to run your program, do some work, and then close the program and pick up where they left off. Learning to handle exceptions will help you deal with situations in which files don’t exist and deal with other problems that can cause your programs to crash. This will make your programs more robust when they encounter bad data, whether it comes from innocent mistakes or from malicious attempts to break your programs. With the skills you’ll learn in this chapter, you’ll make your programs more applicable, usable, and stable.

Reading from a File

An incredible amount of data is available in text files. Text files can contain weather data, traffic data, socioeconomic data, literary works, and more. Reading from a file is particularly useful in data analysis applications, but it’s also applicable to any situation in which you want to analyze or modify information stored in a file. For example, you can write a program that reads in the contents of a text file and rewrites the file with formatting that allows a browser to display it.

When you want to work with the information in a text file, the first step is to read the file into memory. You can then work through all of the file’s contents at once or work through the contents line by line.

Reading the Contents of a File

To begin, we need a file with a few lines of text in it. Let’s start with a file that contains pi to 30 decimal places, with 10 decimal places per line:

pi_digits.txt

3.1415926535
  8979323846
  2643383279

To try the following examples yourself, you can enter these lines in an editor and save the file as pi_digits.txt, or you can download the file from the book’s resources through https://ehmatthes.github.io/pcc_3e. Save the file in the same directory where you’ll store this chapter’s programs.

Here’s a program that opens this file, reads it, and prints the contents of the file to the screen:

file_reader.py

from pathlib import Path

❶ path = Path('pi_digits.txt')
❷ contents = path.read_text()
print(contents)

To work with the contents of a file, we need to tell Python the path to the file. A path is the exact location of a file or folder on a system. Python provides a module called pathlib that makes it easier to work with files and directories, no matter which operating system you or your program’s users are working with. A module that provides specific functionality like this is often called a library, hence the name pathlib.

We start by importing the Path class from pathlib. There’s a lot you can do with a Path object that points to a file. For example, you can check that the file exists before working with it, read the file’s contents, or write new data to the file. Here, we build a Path object representing the file pi_digits.txt, which we assign to the variable path ❶. Since this file is saved in the same directory as the .py file we’re writing, the filename is all that Path needs to access the file.

Once we have a Path object representing pi_digits.txt, we use the read_text() method to read the entire contents of the file ❷. The contents of the file are returned as a single string, which we assign to the variable contents. When we print the value of contents, we see the entire contents of the text file:

3.1415926535
  8979323846
  2643383279

The only difference between this output and the original file is the extra blank line at the end of the output. The blank line appears because read_text() returns an empty string when it reaches the end of the file; this empty string shows up as a blank line.

We can remove the extra blank line by using rstrip() on the contents string:

from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()
contents = contents.rstrip()
print(contents)

Recall from Chapter 2 that Python’s rstrip() method removes, or strips, any whitespace characters from the right side of a string. Now the output matches the contents of the original file exactly:

3.1415926535
  8979323846
  2643383279

We can strip the trailing newline character when we read the contents of the file, by applying the rstrip() method immediately after calling read_text():

contents = path.read_text().rstrip()

This line tells Python to call the read_text() method on the file we’re working with. Then it applies the rstrip() method to the string that read_text() returns. The cleaned-up string is then assigned to the variable contents. This approach is called method chaining, and you’ll see it used often in programming.

Relative and Absolute File Paths

When you pass a simple filename like pi_digits.txt to Path, Python looks in the directory where the file that’s currently being executed (that is, your .py program file) is stored.

Sometimes, depending on how you organize your work, the file you want to open won’t be in the same directory as your program file. For example, you might store your program files in a folder called python_work; inside python_work, you might have another folder called text_files to distinguish your program files from the text files they’re manipulating. Even though text_files is in python_work, just passing Path the name of a file in text_files won’t work, because Python will only look in python_work and stop there; it won’t go on and look in text_files. To get Python to open files from a directory other than the one where your program file is stored, you need to provide the correct path.

There are two main ways to specify paths in programming. A relative file path tells Python to look for a given location relative to the directory where the currently running program file is stored. Since text_files is inside python_work, we need to build a path that starts with the directory text_files, and ends with the filename. Here’s how to build this path:

path = Path('text_files/filename.txt')

You can also tell Python exactly where the file is on your computer, regardless of where the program that’s being executed is stored. This is called an absolute file path. You can use an absolute path if a relative path doesn’t work. For instance, if you’ve put text_files in some folder other than python_work, then just passing Path the path 'text_files/filename.txt' won’t work because Python will only look for that location inside python_work. You’ll need to write out an absolute path to clarify where you want Python to look.

Absolute paths are usually longer than relative paths, because they start at your system’s root folder:

path = Path('/home/eric/data_files/text_files/filename.txt')

Using absolute paths, you can read files from any location on your system. For now it’s easiest to store files in the same directory as your program files, or in a folder such as text_files within the directory that stores your program files.

Accessing a File’s Lines

When you’re working with a file, you’ll often want to examine each line of the file. You might be looking for certain information in the file, or you might want to modify the text in the file in some way. For example, you might want to read through a file of weather data and work with any line that includes the word sunny in the description of that day’s weather. In a news report, you might look for any line with the tag <headline> and rewrite that line with a specific kind of formatting.

You can use the splitlines() method to turn a long string into a set of lines, and then use a for loop to examine each line from a file, one at a time:

file_reader.py

from pathlib import Path

path = Path('pi_digits.txt')
❶ contents = path.read_text()

❷ lines = contents.splitlines()
for line in lines:
    print(line)

We start out by reading the entire contents of the file, as we did earlier ❶. If you’re planning to work with the individual lines in a file, you don’t need to strip any whitespace when reading the file. The splitlines() method returns a list of all lines in the file, and we assign this list to the variable lines ❷. We then loop over these lines and print each one:

3.1415926535
  8979323846
  2643383279

Since we haven’t modified any of the lines, the output matches the original text file exactly.

Working with a File’s Contents

After you’ve read the contents of a file into memory, you can do whatever you want with that data, so let’s briefly explore the digits of pi. First, we’ll attempt to build a single string containing all the digits in the file with no whitespace in it:

pi_string.py

from pathlib import Path

path = Path('pi_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
pi_string = ''
❶ for line in lines:
    pi_string += line

print(pi_string)
print(len(pi_string))

We start by reading the file and storing each line of digits in a list, just as we did in the previous example. We then create a variable, pi_string, to hold the digits of pi. We write a loop that adds each line of digits to pi_string ❶. We print this string, and also show how long the string is:

3.1415926535  8979323846  2643383279
36

The variable pi_string contains the whitespace that was on the left side of the digits in each line, but we can get rid of that by using lstrip() on each line:

--snip--
for line in lines:
    pi_string += line.lstrip()

print(pi_string)
print(len(pi_string))

Now we have a string containing pi to 30 decimal places. The string is 32 characters long because it also includes the leading 3 and a decimal point:

3.141592653589793238462643383279
32

Large Files: One Million Digits

So far, we’ve focused on analyzing a text file that contains only three lines, but the code in these examples would work just as well on much larger files. If we start with a text file that contains pi to 1,000,000 decimal places, instead of just 30, we can create a single string containing all these digits. We don’t need to change our program at all, except to pass it a different file. We’ll also print just the first 50 decimal places, so we don’t have to watch a million digits scroll by in the terminal:

pi_string.py

from pathlib import Path

path = Path('pi_million_digits.txt')
contents = path.read_text()

lines = contents.splitlines()
pi_string = ''
for line in lines:
    pi_string += line.lstrip()

print(f"{pi_string[:52]}...")
print(len(pi_string))

The output shows that we do indeed have a string containing pi to 1,000,000 decimal places:

3.14159265358979323846264338327950288419716939937510...
1000002

Python has no inherent limit to how much data you can work with; you can work with as much data as your system’s memory can handle.

Is Your Birthday Contained in Pi?

I’ve always been curious to know if my birthday appears anywhere in the digits of pi. Let’s use the program we just wrote to find out if someone’s birthday appears anywhere in the first million digits of pi. We can do this by expressing each birthday as a string of digits and seeing if that string appears anywhere in pi_string:

pi_birthday.py

--snip--
for line in lines:
    pi_string += line.strip()

birthday = input("Enter your birthday, in the form mmddyy: ")
if birthday in pi_string:
    print("Your birthday appears in the first million digits of pi!")
else:
    print("Your birthday does not appear in the first million digits of pi.")

We first prompt for the user’s birthday, and then check if that string is in pi_string. Let’s try it:

Enter your birthdate, in the form mmddyy: 120372
Your birthday appears in the first million digits of pi!

My birthday does appear in the digits of pi! Once you’ve read from a file, you can analyze its contents in just about any way you can imagine.

Try It Yourself

10-1. Learning Python: Open a blank file in your text editor and write a few lines summarizing what you’ve learned about Python so far. Start each line with the phrase In Python you can. . . . Save the file as learning_python.txt in the same directory as your exercises from this chapter. Write a program that reads the file and prints what you wrote two times: print the contents once by reading in the entire file, and once by storing the lines in a list and then looping over each line.

10-2. Learning C: You can use the replace() method to replace any word in a string with a different word. Here’s a quick example showing how to replace 'dog' with 'cat' in a sentence:

>>> message = "I really like dogs."
>>> message.replace('dog', 'cat')
'I really like cats.'

Read in each line from the file you just created, learning_python.txt, and replace the word Python with the name of another language, such as C. Print each modified line to the screen.

10-3. Simpler Code: The program file_reader.py in this section uses a temporary variable, lines, to show how splitlines() works. You can skip the temporary variable and loop directly over the list that splitlines() returns:

for line in contents.splitlines():

Remove the temporary variable from each of the programs in this section, to make them more concise.

Writing to a File

One of the simplest ways to save data is to write it to a file. When you write text to a file, the output will still be available after you close the terminal containing your program’s output. You can examine output after a program finishes running, and you can share the output files with others as well. You can also write programs that read the text back into memory and work with it again later.

Writing a Single Line

Once you have a path defined, you can write to a file using the write_text() method. To see how this works, let’s write a simple message and store it in a file instead of printing it to the screen:

write_message.py

from pathlib import Path

path = Path('programming.txt')
path.write_text("I love programming.")

The write_text() method takes a single argument: the string that you want to write to the file. This program has no terminal output, but if you open the file programming.txt, you’ll see one line:

programming.txt

I love programming.

This file behaves like any other file on your computer. You can open it, write new text in it, copy from it, paste to it, and so forth.

Writing Multiple Lines

The write_text() method does a few things behind the scenes. If the file that path points to doesn’t exist, it creates that file. Also, after writing the string to the file, it makes sure the file is closed properly. Files that aren’t closed properly can lead to missing or corrupted data.

To write more than one line to a file, you need to build a string containing the entire contents of the file, and then call write_text() with that string. Let’s write several lines to the programming.txt file:

from pathlib import Path

contents = "I love programming.\n"
contents += "I love creating new games.\n"
contents += "I also love working with data.\n"

path = Path('programming.txt')
path.write_text(contents)

We define a variable called contents that will hold the entire contents of the file. On the next line, we use the += operator to add to this string. You can do this as many times as you need, to build strings of any length. In this case we include newline characters at the end of each line, to make sure each statement appears on its own line.

If you run this and then open programming.txt, you’ll see each of these lines in the text file:

I love programming.
I love creating new games.
I also love working with data.

You can also use spaces, tab characters, and blank lines to format your output, just as you’ve been doing with terminal-based output. There’s no limit to the length of your strings, and this is how many computer-generated documents are created.

Exceptions

Python uses special objects called exceptions to manage errors that arise during a program’s execution. Whenever an error occurs that makes Python unsure of what to do next, it creates an exception object. If you write code that handles the exception, the program will continue running. If you don’t handle the exception, the program will halt and show a traceback, which includes a report of the exception that was raised.

Exceptions are handled with try-except blocks. A try-except block asks Python to do something, but it also tells Python what to do if an exception is raised. When you use try-except blocks, your programs will continue running even if things start to go wrong. Instead of tracebacks, which can be confusing for users to read, users will see friendly error messages that you’ve written.

Handling the ZeroDivisionError Exception

Let’s look at a simple error that causes Python to raise an exception. You probably know that it’s impossible to divide a number by zero, but let’s ask Python to do it anyway:

division_calculator.py

print(5/0)

Python can’t do this, so we get a traceback:

Traceback (most recent call last):
  File "division_calculator.py", line 1, in <module>
    print(5/0)
          ~^~
❶ ZeroDivisionError: division by zero

The error reported in the traceback, ZeroDivisionError, is an exception object ❶. Python creates this kind of object in response to a situation where it can’t do what we ask it to. When this happens, Python stops the program and tells us the kind of exception that was raised. We can use this information to modify our program. We’ll tell Python what to do when this kind of exception occurs; that way, if it happens again, we’ll be prepared.

Using try-except Blocks

When you think an error may occur, you can write a try-except block to handle the exception that might be raised. You tell Python to try running some code, and you tell it what to do if the code results in a particular kind of exception.

Here’s what a try-except block for handling the ZeroDivisionError exception looks like:

try:
    print(5/0)
except ZeroDivisionError:
    print("You can't divide by zero!")

We put print(5/0), the line that caused the error, inside a try block. If the code in a try block works, Python skips over the except block. If the code in the try block causes an error, Python looks for an except block whose error matches the one that was raised, and runs the code in that block.

In this example, the code in the try block produces a ZeroDivisionError, so Python looks for an except block telling it how to respond. Python then runs the code in that block, and the user sees a friendly error message instead of a traceback:

You can't divide by zero!

If more code followed the try-except block, the program would continue running because we told Python how to handle the error. Let’s look at an example where catching an error can allow a program to continue running.

Using Exceptions to Prevent Crashes

Handling errors correctly is especially important when the program has more work to do after the error occurs. This happens often in programs that prompt users for input. If the program responds to invalid input appropriately, it can prompt for more valid input instead of crashing.

Let’s create a simple calculator that does only division:

division_calculator.py

print("Give me two numbers, and I'll divide them.")
print("Enter 'q' to quit.")

while True:
❶     first_number = input("\nFirst number: ")
    if first_number == 'q':
        break
❷     second_number = input("Second number: ")
    if second_number == 'q':
        break
❸     answer = int(first_number) / int(second_number)
    print(answer)

This program prompts the user to input a first_number ❶ and, if the user does not enter q to quit, a second_number ❷. We then divide these two numbers to get an answer ❸. This program does nothing to handle errors, so asking it to divide by zero causes it to crash:

Give me two numbers, and I'll divide them.
Enter 'q' to quit.

First number: 5
Second number: 0
Traceback (most recent call last):
  File "division_calculator.py", line 11, in <module>
    answer = int(first_number) / int(second_number)
             ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero

It’s bad that the program crashed, but it’s also not a good idea to let users see tracebacks. Nontechnical users will be confused by them, and in a malicious setting, attackers will learn more than you want them to. For example, they’ll know the name of your program file, and they’ll see a part of your code that isn’t working properly. A skilled attacker can sometimes use this information to determine which kind of attacks to use against your code.

The else Block

We can make this program more error resistant by wrapping the line that might produce errors in a try-except block. The error occurs on the line that performs the division, so that’s where we’ll put the try-except block. This example also includes an else block. Any code that depends on the try block executing successfully goes in the else block:

--snip--
while True:
    --snip--
    if second_number == 'q':
        break
❶     try:
        answer = int(first_number) / int(second_number)
❷     except ZeroDivisionError:
        print("You can't divide by 0!")
❸     else:
        print(answer)

We ask Python to try to complete the division operation in a try block ❶, which includes only the code that might cause an error. Any code that depends on the try block succeeding is added to the else block. In this case, if the division operation is successful, we use the else block to print the result ❸.

The except block tells Python how to respond when a ZeroDivisionError arises ❷. If the try block doesn’t succeed because of a division-by-zero error, we print a friendly message telling the user how to avoid this kind of error. The program continues to run, and the user never sees a traceback:

Give me two numbers, and I'll divide them.
Enter 'q' to quit.

First number: 5
Second number: 0
You can't divide by 0!

First number: 5
Second number: 2
2.5

First number: q

The only code that should go in a try block is code that might cause an exception to be raised. Sometimes you’ll have additional code that should run only if the try block was successful; this code goes in the else block. The except block tells Python what to do in case a certain exception arises when it tries to run the code in the try block.

By anticipating likely sources of errors, you can write robust programs that continue to run even when they encounter invalid data and missing resources. Your code will be resistant to innocent user mistakes and malicious attacks.

Handling the FileNotFoundError Exception

One common issue when working with files is handling missing files. The file you’re looking for might be in a different location, the filename might be misspelled, or the file might not exist at all. You can handle all of these situations with a try-except block.

Let’s try to read a file that doesn’t exist. The following program tries to read in the contents of Alice in Wonderland, but I haven’t saved the file alice.txt in the same directory as alice.py:

alice.py

from pathlib import Path

path = Path('alice.txt')
contents = path.read_text(encoding='utf-8')

Note that we’re using read_text() in a slightly different way here than what you saw earlier. The encoding argument is needed when your system’s default encoding doesn’t match the encoding of the file that’s being read. This is most likely to happen when reading from a file that wasn’t created on your system.

Python can’t read from a missing file, so it raises an exception:

Traceback (most recent call last):
❶   File "alice.py", line 4, in <module>
❷     contents = path.read_text(encoding='utf-8')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pathlib.py", line 1056, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pathlib.py", line 1042, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
❸ FileNotFoundError: [Errno 2] No such file or directory: 'alice.txt'

This is a longer traceback than the ones we’ve seen previously, so let’s look at how you can make sense of more complex tracebacks. It’s often best to start at the very end of the traceback. On the last line, we can see that a FileNotFoundError exception was raised ❸. This is important because it tells us what kind of exception to use in the except block that we’ll write.

Looking back near the beginning of the traceback ❶, we can see that the error occurred at line 4 in the file alice.py. The next line shows the line of code that caused the error ❷. The rest of the traceback shows some code from the libraries that are involved in opening and reading from files. You don’t usually need to read through or understand all of these lines in a traceback.

To handle the error that’s being raised, the try block will begin with the line that was identified as problematic in the traceback. In our example, this is the line that contains read_text():

from pathlib import Path

path = Path('alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
❶ except FileNotFoundError:
    print(f"Sorry, the file {path} does not exist.")

In this example, the code in the try block produces a FileNotFoundError, so we write an except block that matches that error ❶. Python then runs the code in that block when the file can’t be found, and the result is a friendly error message instead of a traceback:

Sorry, the file alice.txt does not exist.

The program has nothing more to do if the file doesn’t exist, so this is all the output we see. Let’s build on this example and see how exception handling can help when you’re working with more than one file.

Analyzing Text

You can analyze text files containing entire books. Many classic works of literature are available as simple text files because they are in the public domain. The texts used in this section come from Project Gutenberg (https://gutenberg.org). Project Gutenberg maintains a collection of literary works that are available in the public domain, and it’s a great resource if you’re interested in working with literary texts in your programming projects.

Let’s pull in the text of Alice in Wonderland and try to count the number of words in the text. To do this, we’ll use the string method split(), which by default splits a string wherever it finds any whitespace:

from pathlib import Path

path = Path('alice.txt')
try:
    contents = path.read_text(encoding='utf-8')
except FileNotFoundError:
    print(f"Sorry, the file {path} does not exist.")
else:
    # Count the approximate number of words in the file:
❶     words = contents.split()
❷     num_words = len(words)
    print(f"The file {path} has about {num_words} words.")

I moved the file alice.txt to the correct directory, so the try block will work this time. We take the string contents, which now contains the entire text of Alice in Wonderland as one long string, and use split() to produce a list of all the words in the book ❶. Using len() on this list ❷ gives us a good approximation of the number of words in the original text. Lastly, we print a statement that reports how many words were found in the file. This code is placed in the else block because it only works if the code in the try block was executed successfully.

The output tells us how many words are in alice.txt:

The file alice.txt has about 29594 words.

The count is a little high because extra information is provided by the publisher in the text file used here, but it’s a good approximation of the length of Alice in Wonderland.

Working with Multiple Files

Let’s add more books to analyze, but before we do, let’s move the bulk of this program to a function called count_words(). This will make it easier to run the analysis for multiple books:

word_count.py

from pathlib import Path

def count_words(path):
❶     """Count the approximate number of words in a file."""
    try:
        contents = path.read_text(encoding='utf-8')
    except FileNotFoundError:
        print(f"Sorry, the file {path} does not exist.")
    else:
        # Count the approximate number of words in the file:
        words = contents.split()
        num_words = len(words)
        print(f"The file {path} has about {num_words} words.")

path = Path('alice.txt')
count_words(path)

Most of this code is unchanged. It’s only been indented, and moved into the body of count_words(). It’s a good habit to keep comments up to date when you’re modifying a program, so the comment has also been changed to a docstring and reworded slightly ❶.

Now we can write a short loop to count the words in any text we want to analyze. We do this by storing the names of the files we want to analyze in a list, and then we call count_words() for each file in the list. We’ll try to count the words for Alice in Wonderland, Siddhartha, Moby Dick, and Little Women, which are all available in the public domain. I’ve intentionally left siddhartha.txt out of the directory containing word_count.py, so we can see how well our program handles a missing file:

from pathlib import Path

def count_words(filename):
    --snip--

filenames = ['alice.txt', 'siddhartha.txt', 'moby_dick.txt', 
        'little_women.txt']
for filename in filenames:
❶     path = Path(filename)
    count_words(path)

The names of the files are stored as simple strings. Each string is then converted to a Path object ❶, before the call to count_words(). The missing siddhartha.txt file has no effect on the rest of the program’s execution:

The file alice.txt has about 29594 words.
Sorry, the file siddhartha.txt does not exist.
The file moby_dick.txt has about 215864 words.
The file little_women.txt has about 189142 words.

Using the try-except block in this example provides two significant advantages. We prevent our users from seeing a traceback, and we let the program continue analyzing the texts it’s able to find. If we don’t catch the FileNotFoundError that siddhartha.txt raises, the user would see a full traceback, and the program would stop running after trying to analyze Siddhartha. It would never analyze Moby Dick or Little Women.

Failing Silently

In the previous example, we informed our users that one of the files was unavailable. But you don’t need to report every exception you catch. Sometimes, you’ll want the program to fail silently when an exception occurs and continue on as if nothing happened. To make a program fail silently, you write a try block as usual, but you explicitly tell Python to do nothing in the except block. Python has a pass statement that tells it to do nothing in a block:

def count_words(path):
    """Count the approximate number of words in a file."""
    try:
        --snip--
    except FileNotFoundError:
        pass
    else:
        --snip--

The only difference between this listing and the previous one is the pass statement in the except block. Now when a FileNotFoundError is raised, the code in the except block runs, but nothing happens. No traceback is produced, and there’s no output in response to the error that was raised. Users see the word counts for each file that exists, but they don’t see any indication that a file wasn’t found:

The file alice.txt has about 29594 words.
The file moby_dick.txt has about 215864 words.
The file little_women.txt has about 189142 words.

The pass statement also acts as a placeholder. It’s a reminder that you’re choosing to do nothing at a specific point in your program’s execution and that you might want to do something there later. For example, in this program we might decide to write any missing filenames to a file called missing_files.txt. Our users wouldn’t see this file, but we’d be able to read the file and deal with any missing texts.

Deciding Which Errors to Report

How do you know when to report an error to your users and when to let your program fail silently? If users know which texts are supposed to be analyzed, they might appreciate a message informing them why some texts were not analyzed. If users expect to see some results but don’t know which books are supposed to be analyzed, they might not need to know that some texts were unavailable. Giving users information they aren’t looking for can decrease the usability of your program. Python’s error-handling structures give you fine-grained control over how much to share with users when things go wrong; it’s up to you to decide how much information to share.

Well-written, properly tested code is not very prone to internal errors, such as syntax or logical errors. But every time your program depends on something external such as user input, the existence of a file, or the availability of a network connection, there is a possibility of an exception being raised. A little experience will help you know where to include exception-handling blocks in your program and how much to report to users about errors that arise.

Try It Yourself

10-6. Addition: One common problem when prompting for numerical input occurs when people provide text instead of numbers. When you try to convert the input to an int, you’ll get a ValueError. Write a program that prompts for two numbers. Add them together and print the result. Catch the ValueError if either input value is not a number, and print a friendly error message. Test your program by entering two numbers and then by entering some text instead of a number.

10-7. Addition Calculator: Wrap your code from Exercise 10-5 in a while loop so the user can continue entering numbers, even if they make a mistake and enter text instead of a number.

10-8. Cats and Dogs: Make two files, cats.txt and dogs.txt. Store at least three names of cats in the first file and three names of dogs in the second file. Write a program that tries to read these files and print the contents of the file to the screen. Wrap your code in a try-except block to catch the FileNotFound error, and print a friendly message if a file is missing. Move one of the files to a different location on your system, and make sure the code in the except block executes properly.

10-9. Silent Cats and Dogs: Modify your except block in Exercise 10-7 to fail silently if either file is missing.

10-10. Common Words: Visit Project Gutenberg (https://gutenberg.org) and find a few texts you’d like to analyze. Download the text files for these works, or copy the raw text from your browser into a text file on your computer.

You can use the count() method to find out how many times a word or phrase appears in a string. For example, the following code counts the number of times 'row' appears in a string:

>>> line = "Row, row, row your boat"
>>> line.count('row')
2
>>> line.lower().count('row')
3

Notice that converting the string to lowercase using lower() catches all appearances of the word you’re looking for, regardless of how it’s formatted.

Write a program that reads the files you found at Project Gutenberg and determines how many times the word 'the' appears in each text. This will be an approximation because it will also count words such as 'then' and 'there'. Try counting 'the ', with a space in the string, and see how much lower your count is.

Storing Data

Many of your programs will ask users to input certain kinds of information. You might allow users to store preferences in a game or provide data for a visualization. Whatever the focus of your program is, you’ll store the information users provide in data structures such as lists and dictionaries. When users close a program, you’ll almost always want to save the information they entered. A simple way to do this involves storing your data using the json module.

The json module allows you to convert simple Python data structures into JSON-formatted strings, and then load the data from that file the next time the program runs. You can also use json to share data between different Python programs. Even better, the JSON data format is not specific to Python, so you can share data you store in the JSON format with people who work in many other programming languages. It’s a useful and portable format, and it’s easy to learn.

Using json.dumps() and json.loads()

Let’s write a short program that stores a set of numbers and another program that reads these numbers back into memory. The first program will use json.dumps() to store the set of numbers, and the second program will use json.loads().

The json.dumps() function takes one argument: a piece of data that should be converted to the JSON format. The function returns a string, which we can then write to a data file:

number_writer.py

from pathlib import Path
import json

numbers = [2, 3, 5, 7, 11, 13]

❶ path = Path('numbers.json')
❷ contents = json.dumps(numbers)
path.write_text(contents)

We first import the json module, and then create a list of numbers to work with. Then we choose a filename in which to store the list of numbers ❶. It’s customary to use the file extension .json to indicate that the data in the file is stored in the JSON format. Next, we use the json.dumps() ❷ function to generate a string containing the JSON representation of the data we’re working with. Once we have this string, we write it to the file using the same write_text() method we used earlier.

This program has no output, but let’s open the file numbers.json and look at it. The data is stored in a format that looks just like Python:

[2, 3, 5, 7, 11, 13]

Now we’ll write a separate program that uses json.loads() to read the list back into memory:

number_reader.py

from pathlib import Path
import json

❶ path = Path('numbers.json')
❷ contents = path.read_text()
❸ numbers = json.loads(contents)

print(numbers)

We make sure to read from the same file we wrote to ❶. Since the data file is just a text file with specific formatting, we can read it with the read_text() method ❷. We then pass the contents of the file to json.loads() ❸. This function takes in a JSON-formatted string and returns a Python object (in this case, a list), which we assign to numbers. Finally, we print the recovered list of numbers and see that it’s the same list created in number_writer.py:

[2, 3, 5, 7, 11, 13]

This is a simple way to share data between two programs.

Saving and Reading User-Generated Data

Saving data with json is useful when you’re working with user-generated data, because if you don’t store your user’s information somehow, you’ll lose it when the program stops running. Let’s look at an example where we prompt the user for their name the first time they run a program and then remember their name when they run the program again.

Let’s start by storing the user’s name:

remember_me.py

from pathlib import Path
import json

❶ username = input("What is your name? ")

❷ path = Path('username.json')
contents = json.dumps(username)
path.write_text(contents)

❸ print(f"We'll remember you when you come back, {username}!")

We first prompt for a username to store ❶. Next, we write the data we just collected to a file called username.json ❷. Then we print a message informing the user that we’ve stored their information ❸:

What is your name? Eric
We'll remember you when you come back, Eric!

Now let’s write a new program that greets a user whose name has already been stored:

greet_user.py

from pathlib import Path
import json

❶ path = Path('username.json')
contents = path.read_text()
❷ username = json.loads(contents)

print(f"Welcome back, {username}!")

We read the contents of the data file ❶ and then use json.loads() to assign the recovered data to the variable username ❷. Since we’ve recovered the username, we can welcome the user back with a personalized greeting:

Welcome back, Eric!

We need to combine these two programs into one file. When someone runs remember_me.py, we want to retrieve their username from memory if possible; if not, we’ll prompt for a username and store it in username.json for next time. We could write a try-except block here to respond appropriately if username.json doesn’t exist, but instead we’ll use a handy method from the pathlib module:

remember_me.py

from pathlib import Path
import json

path = Path('username.json')
❶ if path.exists():
    contents = path.read_text()
    username = json.loads(contents)
    print(f"Welcome back, {username}!")
❷ else:
    username = input("What is your name? ")
    contents = json.dumps(username)
    path.write_text(contents)
    print(f"We'll remember you when you come back, {username}!")

There are many helpful methods you can use with Path objects. The exists() method returns True if a file or folder exists and False if it doesn’t. Here we use path.exists() to find out if a username has already been stored ❶. If username.json exists, we load the username and print a personalized greeting to the user.

If the file username.json doesn’t exist ❷, we prompt for a username and store the value that the user enters. We also print the familiar message that we’ll remember them when they come back.

Whichever block executes, the result is a username and an appropriate greeting. If this is the first time the program runs, this is the output:

What is your name? Eric
We'll remember you when you come back, Eric!

Otherwise:

Welcome back, Eric!

This is the output you see if the program was already run at least once. Even though the data in this section is just a single string, the program would work just as well with any data that can be converted to a JSON-formatted string.

Refactoring

Often, you’ll come to a point where your code will work, but you’ll recognize that you could improve the code by breaking it up into a series of functions that have specific jobs. This process is called refactoring. Refactoring makes your code cleaner, easier to understand, and easier to extend.

We can refactor remember_me.py by moving the bulk of its logic into one or more functions. The focus of remember_me.py is on greeting the user, so let’s move all of our existing code into a function called greet_user():

remember_me.py

from pathlib import Path
import json

def greet_user():
❶     """Greet the user by name."""
    path = Path('username.json')
    if path.exists():
        contents = path.read_text()
        username = json.loads(contents)
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        contents = json.dumps(username)
        path.write_text(contents)
        print(f"We'll remember you when you come back, {username}!")

greet_user()

Because we’re using a function now, we rewrite the comments as a docstring that reflects how the program currently works ❶. This file is a little cleaner, but the function greet_user() is doing more than just greeting the user—it’s also retrieving a stored username if one exists and prompting for a new username if one doesn’t.

Let’s refactor greet_user() so it’s not doing so many different tasks. We’ll start by moving the code for retrieving a stored username to a separate function:

from pathlib import Path
import json

def get_stored_username(path):
❶     """Get stored username if available."""
    if path.exists():
        contents = path.read_text()
        username = json.loads(contents)
        return username
    else:
❷         return None

def greet_user():
    """Greet the user by name."""
    path = Path('username.json')
    username = get_stored_username(path)
❸     if username:
        print(f"Welcome back, {username}!")
    else:
        username = input("What is your name? ")
        contents = json.dumps(username)
        path.write_text(contents)
        print(f"We'll remember you when you come back, {username}!")

greet_user()

The new function get_stored_username() ❶ has a clear purpose, as stated in the docstring. This function retrieves a stored username and returns the username if it finds one. If the path that’s passed to get_stored_username() doesn’t exist, the function returns None ❷. This is good practice: a function should either return the value you’re expecting, or it should return None. This allows us to perform a simple test with the return value of the function. We print a welcome back message to the user if the attempt to retrieve a username is successful ❸, and if it isn’t, we prompt for a new username.

We should factor one more block of code out of greet_user(). If the username doesn’t exist, we should move the code that prompts for a new username to a function dedicated to that purpose:

from pathlib import Path
import json

def get_stored_username(path):
    """Get stored username if available."""
    --snip--

def get_new_username(path):
    """Prompt for a new username."""
    username = input("What is your name? ")
    contents = json.dumps(username)
    path.write_text(contents)
    return username

def greet_user():
    """Greet the user by name."""
    path = Path('username.json')
❶     username = get_stored_username(path)
    if username:
        print(f"Welcome back, {username}!")
    else:
❷         username = get_new_username(path)
        print(f"We'll remember you when you come back, {username}!")

greet_user()

Each function in this final version of remember_me.py has a single, clear purpose. We call greet_user(), and that function prints an appropriate message: it either welcomes back an existing user or greets a new user. It does this by calling get_stored_username() ❶, which is responsible only for retrieving a stored username if one exists. Finally, if necessary, greet_user() calls get_new_username()❷, which is responsible only for getting a new username and storing it. This compartmentalization of work is an essential part of writing clear code that will be easy to maintain and extend.

Try It Yourself

10-11. Favorite Number: Write a program that prompts for the user’s favorite number. Use json.dumps() to store this number in a file. Write a separate program that reads in this value and prints the message “I know your favorite number! It’s _____.”

10-12. Favorite Number Remembered: Combine the two programs you wrote in Exercise 10-11 into one file. If the number is already stored, report the favorite number to the user. If not, prompt for the user’s favorite number and store it in a file. Run the program twice to see that it works.

10-13. User Dictionary: The remember_me.py example only stores one piece of information, the username. Expand this example by asking for two more pieces of information about the user, then store all the information you collect in a dictionary. Write this dictionary to a file using json.dumps(), and read it back in using json.loads(). Print a summary showing exactly what your program remembers about the user.

10-14. Verify User: The final listing for remember_me.py assumes either that the user has already entered their username or that the program is running for the first time. We should modify it in case the current user is not the person who last used the program.

Before printing a welcome back message in greet_user(), ask the user if this is the correct username. If it’s not, call get_new_username() to get the correct username.

Summary

In this chapter, you learned how to work with files. You learned to read the entire contents of a file, and then work through the contents one line at a time if you need to. You learned to write as much text as you want to a file. You also read about exceptions and how to handle the exceptions you’re likely to see in your programs. Finally, you learned how to store Python data structures so you can save information your users provide, preventing them from having to start over each time they run a program.

In Chapter 11, you’ll learn efficient ways to test your code. This will help you trust that the code you develop is correct, and it will help you identify bugs that are introduced as you continue to build on the programs you’ve written.