8. Modules and Packages

Python programs are organized into modules and packages that are loaded with the import statement. This chapter describes the module and package system in more detail. The primary focus is on programming with modules and packages, not the process of bundling code for deployment to others. For the latter, consult the latest documentation at https://packaging.python.org/tutorials/packaging-projects/.

8.1 Modules and the import Statement

Any Python source file can be imported as a module. For example:

# module.py

a = 37

def func():
    print(f'func says that a is {a}')

class SomeClass:
    def method(self):
        print('method says hi')

print('loaded module')

This file contains common programming elements—including a global variable, a function, a class definition, and an isolated statement. This example illustrates some important (and sometimes subtle) features of module loading.

To load a module, use the statement import module. For example:

>>> import module
loaded module
>>> module.a
37
>>> module.func()
func says that a is 37
>>> s = module.SomeClass()
>>> s.method()
method says hi
>>>

In executing an import, several things happen:

  1. The module source code is located. If it can’t be found, an ImportError exception is raised (more precisely, its subclass ModuleNotFoundError).

  2. A new module object is created. This object serves as a container for all of the global definitions contained within the module. It’s sometimes referred to as a namespace.

  3. The module source code is executed within the newly created module namespace.

  4. If no errors occur, a name is created within the caller that refers to the new module object. This name matches the name of the module, but without any kind of filename suffix. For example, if the code is found in a file module.py, the name of the module is module.
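The four steps above can be imitated by hand. The following is a minimal sketch, not how Python's real import machinery works: the real version lives in importlib and also consults the sys.modules cache, handles packages, and caches bytecode. The helper simple_import() and the module name demo_mod are hypothetical.

```python
import os
import tempfile
import types

def simple_import(name, search_path):
    """Hypothetical, simplified import: no caching, no packages."""
    # Step 1: locate the source file on the given search path
    for directory in search_path:
        filename = os.path.join(directory, name + '.py')
        if os.path.exists(filename):
            break
    else:
        raise ModuleNotFoundError(f'No module named {name!r}')

    # Step 2: create a new, empty module object (the namespace)
    mod = types.ModuleType(name)
    mod.__file__ = filename

    # Step 3: execute the source code inside the module's namespace
    with open(filename) as f:
        code = compile(f.read(), filename, 'exec')
    exec(code, mod.__dict__)

    # Step 4: hand the module back; the caller binds it to a name
    return mod

with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, 'demo_mod.py'), 'w') as f:
        f.write("a = 37\ndef func():\n    return f'a is {a}'\n")
    demo_mod = simple_import('demo_mod', [tmpdir])

print(demo_mod.func())   # a is 37
```

Because this sketch never stores anything in sys.modules, calling simple_import() twice runs the module code twice, which is exactly what the real import system avoids.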

Of these steps, the first one (locating modules) is the most complicated. A common source of failure for newcomers is using a bad filename or putting code in an unknown location. A module filename must use the same rules as variable names (letters, digits, and underscore) and have a .py suffix—for example, module.py. When using import, you specify the name without the suffix: import module, not import module.py (the latter produces a rather confusing error message). The file needs to be placed in one of the directories found in sys.path.
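A short sketch of both points: sys.path is just a list, and adding the .py suffix turns a simple import into a request for a submodule. The standard library module string stands in for any plain module here; importlib.import_module() is used only so the failure can be caught in one place, since it follows the same rules as the import statement.

```python
import importlib
import sys

# The search path is an ordinary, mutable list of strings
print(type(sys.path).__name__)   # list

# Mistake: including the .py suffix. Python reads 'string.py' as a request
# for a submodule named 'py' inside a package named 'string'.
message = None
try:
    importlib.import_module('string.py')
except ModuleNotFoundError as e:
    message = str(e)
print(message)
```

The resulting message complains that string is not a package, which is the confusing error mentioned above.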

The remaining steps are all related to a module defining an isolated environment for the code. All definitions that appear in a module remain isolated to that module. Thus, there is no risk of the names of variables, functions, and classes clashing with identical names in other modules. When accessing definitions in a module, use a fully qualified name such as module.func().

import executes all of the statements in the loaded source file. If a module carries out a computation or produces output in addition to defining objects, you will see the result—such as the “loaded module” message printed in the example above. A common confusion with modules concerns accessing classes. A module always defines a namespace, so if a file module.py defines a class SomeClass, use the name module.SomeClass to refer to the class.

To import multiple modules using a single import, use a comma-separated list of names:

import socket, os, re

Sometimes the local name used to refer to a module is changed using the as qualifier to import. For example:

import module as mo
mo.func()

This latter style of import is standard practice in the data analysis world. For example, you often see this:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
...

When a module is renamed, the new name only applies to the context where the import statement appeared. Other unrelated program modules can still load the module using its original name.

Assigning a different name to an imported module can be a useful tool for managing different implementations of common functionality or for writing extensible programs. For example, if you have two modules unixmodule.py and winmodule.py that both define a function func() but involve platform-dependent implementation details, you could write code to selectively import the module:

import sys

if sys.platform == 'win32':
    import winmodule as module
else:
    import unixmodule as module

...
r = module.func()

Modules are first-class objects in Python. This means they can be assigned to variables, placed in data structures, and passed around in a program as data. For instance, the module name in the above example is a variable that refers to the corresponding module object.
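As a small illustration, modules from the standard library can be stored in a dictionary and selected at runtime, much like the unixmodule/winmodule choice above. The dictionary backends and the function square_root() are made up for this sketch.

```python
import cmath
import math
import sys

# Module objects can be stored in data structures like any other value
backends = {'real': math, 'complex': cmath}

def square_root(x, kind='real'):
    mod = backends[kind]          # mod is a variable referring to a module
    return mod.sqrt(x)

print(square_root(16))             # 4.0
print(square_root(-4, 'complex'))  # 2j

# Renaming with "as" only creates another local name for the same object
import math as m
print(m is math is sys.modules['math'])   # True
```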

8.2 Module Caching

The source code for a module is loaded and executed only once, regardless of how often you use the import statement. Subsequent import statements bind the module name to the module object already created by the previous import.

A common confusion for newcomers arises when a module is imported into an interactive session, then its source code is modified (for example, to fix a bug) but a new import fails to load the modified code. The module cache is to blame for this. A repeated import statement never reloads a previously imported module, even if the underlying source code has been updated.

You can find the cache of all currently loaded modules in sys.modules, which is a dictionary that maps module names to module objects. The contents of this dictionary are used to determine whether import loads a fresh copy of a module or not. Deleting a module from the cache will force it to load again on the next import statement. However, this is rarely safe for reasons explained in Section 8.5 on module reloading.
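The caching behavior can be observed directly. This sketch writes a hypothetical module, cached_demo.py, into a temporary directory that is placed at the front of sys.path, then imports it several times.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical module that announces when its code runs
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'cached_demo.py'), 'w') as f:
    f.write("print('executing cached_demo')\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()      # let the finders notice the new file

import cached_demo                 # prints: executing cached_demo
first = cached_demo
import cached_demo                 # cache hit: nothing is printed
second = cached_demo
print(second is first)             # True
print(sys.modules['cached_demo'] is first)   # True

del sys.modules['cached_demo']     # evict the module from the cache
import cached_demo                 # source runs again: a new module object
third = cached_demo
print(third is first)              # False
```

Note that first still refers to the old module object after the eviction, which previews the reloading problems discussed in Section 8.5.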

Sometimes you will see import used inside a function like this:

def f(x):
    import math
    return math.sin(x) + math.cos(x)

At first glance, it seems like such an implementation would be horribly slow—loading a module on each invocation. In reality, the cost of the import is minimal—it’s just a single dictionary lookup, as Python immediately finds the module in the cache. The main objection to having the import inside a function is one of style—it’s more common to have all module imports listed at the top of a file where they’re easy to see. On the other hand, if you have a specialized function that’s rarely invoked, putting the function’s import dependencies inside the function body can speed up program loading. In this case, you’d only load the required modules if they are actually needed.
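A rough way to check the cost claim is to time both styles with timeit. The function names are hypothetical and the numbers depend entirely on the machine and interpreter version, so no particular result is implied; the point is that the local import adds only a small per-call overhead rather than reloading math each time.

```python
import math
import timeit

def f_local(x):
    import math                    # after the first call: a sys.modules lookup
    return math.sin(x) + math.cos(x)

def f_global(x):
    return math.sin(x) + math.cos(x)

n = 200_000
t_local = timeit.timeit(lambda: f_local(1.0), number=n)
t_global = timeit.timeit(lambda: f_global(1.0), number=n)
print(f'import inside function: {t_local:.3f} s')
print(f'import at top of file:  {t_global:.3f} s')
```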

8.3 Importing Selected Names from a Module

You can load specific definitions from a module into the current namespace using the from module import name statement. It’s identical to import except that, instead of creating a name referring to the newly created module namespace, it places references to one or more of the objects defined in the module into the current namespace:

from module import func  # Imports module and puts func in current namespace
func()                   # Calls module.func()
module.func()            # Fails. NameError: module

The from statement accepts comma-separated names if you want multiple definitions. For example:

from module import func, SomeClass

Semantically, the statement from module import name performs a name copy from the module cache to the local namespace. That is, Python first executes import module behind the scenes. Afterwards, it makes an assignment from the cache to a local name, such as name = sys.modules['module'].name.
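The equivalence can be sketched with a standard library module. The second half is only an approximation of what the from statement does internally; the temporary name _json_module is made up for the illustration.

```python
import sys

from json import dumps                   # the usual form

# Roughly what happens behind the scenes:
import json as _json_module              # 1. load the module (or reuse the cached one)
dumps_copy = sys.modules['json'].dumps   # 2. copy one attribute to a local name
del _json_module                         # the from form never binds the module name

print(dumps is dumps_copy)               # True: the same function object
print('json' in sys.modules)             # True: the entire module was loaded
```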

A common misconception is that the from module import name statement is more efficient—possibly only loading part of a module. This is not the case. Either way, the entire module is loaded and stored in the cache.

Importing functions using the from syntax does not change their scoping rules. When functions look for variables, they only look within the file where the function was defined, not the namespace into which a function is imported and called. For example:

>>> from module import func
>>> a = 42
>>> func()
func says that a is 37
>>> func.__module__
'module'
>>> func.__globals__['a']
37
>>>

A related confusion concerns the behavior of global variables. For example, consider this code that imports both func and a global variable a that it uses:

from module import a, func
a = 42                      # Modify the variable
func()                      # Prints "func says that a is 37"
print(a)                    # Prints "42"

Variable assignment in Python is not a storage operation. That is, the name a in this example does not represent some kind of box where a value gets stored. The initial import associates the local name a with the original object module.a. However, the later reassignment a = 42 moves the local name a to a completely different object. At this point, a is no longer bound to the value in the imported module. Because of this, it is not possible to use the from statement in a way that makes variables behave like global variables in a language such as C. If you want to have mutable global parameters in your program, put them in a module and access them through the module name after a plain import statement, for example, module.a.
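Here is a self-contained version of that pitfall and its fix. To avoid writing files, the sketch builds a hypothetical settings_demo module in memory with types.ModuleType and registers it in sys.modules so that ordinary import statements find it.

```python
import sys
import types

# Hypothetical module created in memory and registered in the module cache
settings_demo = types.ModuleType('settings_demo')
exec("debug = False\n"
     "def show():\n"
     "    return f'debug is {debug}'\n", settings_demo.__dict__)
sys.modules['settings_demo'] = settings_demo

from settings_demo import debug, show
debug = True                  # rebinds only the local name
result1 = show()
print(result1)                # debug is False

import settings_demo
settings_demo.debug = True    # changes the module's own global variable
result2 = show()
print(result2)                # debug is True
```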

The asterisk (*) wildcard character is sometimes used to load all the definitions in a module, except those that start with an underscore. Here’s an example:

# Load all definitions into the current namespace
from module import *

The from module import * statement may only be used at the top-level scope of a module. In particular, it is illegal to use this form of import inside a function body.

Modules can precisely control the set of names imported by from module import * by defining the list __all__. Here’s an example:

# module: module.py
__all__ = [ 'func', 'SomeClass' ]

a = 37              # Not exported

def func():         # Exported
    ...

class SomeClass:    # Exported
    ...
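The effect of __all__ can be checked by running a star import in a fresh namespace. Both modules below are hypothetical and built in memory by a made-up register() helper; the last part confirms that a star import inside a function body is rejected when the code is compiled.

```python
import sys
import types

def register(name, source):
    # Hypothetical helper: build a module from source text and cache it
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod

register('with_all', "__all__ = ['func', 'SomeClass']\n"
                     "a = 37\n"
                     "def func(): pass\n"
                     "class SomeClass: pass\n")
register('without_all', "visible = 1\n_hidden = 2\n")

def star_names(modname):
    # Run a star import in a fresh namespace and report what it created
    ns = {}
    exec(f'from {modname} import *', ns)
    return sorted(k for k in ns if not k.startswith('__'))

print(star_names('with_all'))      # ['SomeClass', 'func']
print(star_names('without_all'))   # ['visible']

# A star import inside a function body is rejected at compile time
syntax_error = None
try:
    compile('def f():\n    from math import *\n', '<demo>', 'exec')
except SyntaxError as e:
    syntax_error = e.msg
print('SyntaxError:', syntax_error)
```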

When at the interactive Python prompt, using from module import * can be a convenient way to work with a module. However, using this style of import in a program is frowned upon. Overuse can pollute the local namespace and lead to confusion. For example:

from math import *
from random import *
from statistics import *

a = gauss(1.0, 0.25)        # From which module?

It’s usually better to be explicit about names:

from math import sin, cos, sqrt
from random import gauss
from statistics import mean

a = gauss(1.0, 0.25)

8.4 Circular Imports

A peculiar problem arises if two modules mutually import each other. For example, suppose you had two files:

# ----------------------------
# moda.py

import modb

def func_a():
    modb.func_b()

class Base:
    pass

# ----------------------------
# modb.py

import moda

def func_b():
    print('B')

class Child(moda.Base):
    pass

There is a strange import order dependency in this code. Using import modb first works fine, but if you put import moda first, it blows up with an error about moda.Base being undefined.

To understand what is happening, you have to follow the control flow. import moda starts executing the file moda.py. The first statement it encounters is import modb. Thus, control switches over to modb.py. The first statement in that file is import moda. Instead of entering a recursive cycle, that import is satisfied by the module cache and control continues on to the next statement in modb.py. This is good—circular imports don’t cause Python to deadlock or enter a new spacetime dimension. However, at this point in execution, module moda has only been partially evaluated. When control reaches the class Child(moda.Base) statement, it blows up. The required Base class hasn’t been defined yet.
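To watch this happen, the sketch below writes the two files from the text into a temporary directory, imports them in the failing order, cleans up, and then imports them in the working order. Only the file contents come from the text; the directory handling is illustrative.

```python
import importlib
import os
import sys
import tempfile

moda_src = '''
import modb

def func_a():
    modb.func_b()

class Base:
    pass
'''
modb_src = '''
import moda

def func_b():
    print('B')

class Child(moda.Base):
    pass
'''
tmpdir = tempfile.mkdtemp()
for name, src in [('moda', moda_src), ('modb', modb_src)]:
    with open(os.path.join(tmpdir, name + '.py'), 'w') as f:
        f.write(src)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

failure = None
try:
    import moda                # moda -> modb -> moda (only partially initialized)
except AttributeError as e:
    failure = e
    print('Failed:', e)

for name in ('moda', 'modb'):
    sys.modules.pop(name, None)   # discard any partially imported modules

import modb                    # this order happens to work
sys.modules['moda'].func_a()   # prints: B
```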

One way to fix this problem is to move the import modb statement someplace else. For example, you could move the import into func_a() where the definition is actually needed:

# moda.py
def func_a():
    import modb
    modb.func_b()

class Base:
    pass

You could also move the import to a later position in the file:

# moda.py

def func_a():
    modb.func_b()

class Base:
    pass

import modb     # Must be after Base is defined

Both of these solutions are likely to raise eyebrows in a code review. Most of the time, you don’t see module imports appearing at the end of a file. The presence of circular imports almost always suggests a problem in code organization. A better way to handle this might be to move the definition of Base to a separate file base.py and rewrite modb.py as follows:

# modb.py
import base

def func_b():
    print('B')

class Child(base.Base):
    pass

8.5 Module Reloading and Unloading

There is no reliable support for reloading or unloading of previously imported modules. Although you can remove a module from sys.modules, this does not unload a module from memory. This is because references to the cached module object still exist in other modules that imported that module. Moreover, if there are instances of classes defined in the module, those instances contain references back to their class objects, which in turn hold references to the module in which they were defined.

The fact that module references exist in many places makes it generally impractical to reload a module after making changes to its implementation. For example, if you remove a module from sys.modules and use import to reload it, this will not retroactively change all of the previous references to the module used in a program. Instead, you’ll have one reference to the new module created by the most recent import statement, and a set of references to the old module created by imports in other parts of the code. This is rarely what you want. Module reloading is never safe to use in any kind of sane production code unless you are able to carefully control the entire execution environment.

There is a reload() function for reloading a module that can be found in the importlib library. As an argument, you pass it the already loaded module. For example:

>>> import module
>>> import importlib
>>> importlib.reload(module)
loaded module
<module 'module' from 'module.py'>
>>>

reload() works by loading a new version of the module source code and then executing it on top of the already existing module namespace. This is done without clearing the previous namespace. It’s literally the same as you typing new source code on top of the old code without restarting the interpreter.

If other modules had previously imported the reloaded module using a standard import statement, such as import module, reloading will make them see the updated code as if by magic. However, there’s still a lot of peril. First, reloading doesn’t reload any of the modules that might be imported by the reloaded file. It’s not recursive—it only applies to the single module given to reload(). Second, if any module has used the from module import name form of import, those imports fail to see the effect of the reload. Finally, if instances of classes have been created, reloading does not update their underlying class definition. In fact, you’ll now have two different definitions of the same class in the same program—the old one that remains in use for all existing instances at the time of reloading, and the new one that gets used for new instances. This is almost always confusing.
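The sketch below reproduces three of these hazards with a hypothetical module, reload_demo.py, that is rewritten between imports. Bytecode writing is switched off only so that a stale .pyc cannot mask the edit when both versions are written within the same second.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True   # avoid a stale .pyc hiding the quick edit
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'reload_demo.py')

def write_source(text):
    # Overwrite the hypothetical module's source file
    with open(path, 'w') as f:
        f.write(text)

write_source("VERSION = 1\nextra = 'only in v1'\nclass Thing:\n    pass\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import reload_demo
from reload_demo import VERSION
old_thing = reload_demo.Thing()

# "Edit" the module: bump VERSION and delete the name extra
write_source("VERSION = 2\nclass Thing:\n    pass\n")
importlib.reload(reload_demo)

print(reload_demo.VERSION)                        # 2: module attribute updated
print(VERSION)                                    # 1: from-imported name is stale
print(hasattr(reload_demo, 'extra'))              # True: old namespace not cleared
print(isinstance(old_thing, reload_demo.Thing))   # False: Thing is a new class
```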

Finally, it should be noted that C/C++ extensions to Python cannot be safely unloaded or reloaded in any way. No support is provided for this, and the underlying operating system may prohibit it anyway. Your best recourse for that scenario is to restart the Python interpreter process.

8.6 Module Compilation

When modules are first imported, they are compiled into interpreter bytecode. This code is written to a .pyc file within a special __pycache__ directory. This directory is usually found in the same directory as the original .py file. When the same import occurs again on a different run of the program, the compiled bytecode is loaded instead. This significantly speeds up the import process.

The caching of bytecode is an automatic process that you almost never need to worry about. Files are automatically regenerated if the original source code changes. It just works.

That said, there are still reasons to know about this caching and compilation process. First, sometimes Python files get installed (often accidentally) in an environment where users don’t have operating system permissions to create the required __pycache__ directory. Python will still work, but every import now loads the original source code and compiles it to bytecode. Program loading will be a lot slower than it needs to be. Similarly, in deploying or packaging a Python application, it may be advantageous to include the compiled bytecode, as that may significantly speed up program startup.
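If you want to see where the cached bytecode goes, or produce it ahead of time as part of packaging, the standard library has the pieces. This sketch compiles a throwaway file, compiled_demo.py, with py_compile and checks that it lands where importlib.util.cache_from_source() says the import system will look.

```python
import importlib.util
import os
import py_compile
import tempfile

tmpdir = tempfile.mkdtemp()
source = os.path.join(tmpdir, 'compiled_demo.py')
with open(source, 'w') as f:
    f.write('x = 1\n')

# Where the import system expects the cached bytecode for this file
expected = importlib.util.cache_from_source(source)
print(expected)

# Compile ahead of time, as a packaging step might
written = py_compile.compile(source, doraise=True)
print(written == expected, os.path.exists(written))   # True True
```

For a whole directory tree, python3 -m compileall does the same job in one command.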

The other good reason to know about module caching is that some programming techniques interfere with it. Advanced metaprogramming techniques involving dynamic code generation and the exec() function defeat the benefits of bytecode caching. A notable example is the use of dataclasses:

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

Dataclasses work by generating method functions as text fragments and executing them using exec(). None of this generated code is cached by the import system. For a single class definition, you won’t notice. However, if you have a module consisting of 100 dataclasses, you might find that it imports nearly 20 times slower than a comparable module where you just wrote out the classes in the normal, if less compact, way.
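One way to get a feel for this cost is to create many copies of an equivalent class both ways. This sketch does not measure import time or reproduce the 20x figure; it only shows that each @dataclass definition pays to generate and compile its methods at runtime, while the hand-written methods are already compiled. The factory functions are hypothetical and timings will vary.

```python
import time
from dataclasses import dataclass

def make_plain():
    # Methods are written out, so their bytecode already exists
    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y
        def __repr__(self):
            return f'Point(x={self.x!r}, y={self.y!r})'
        def __eq__(self, other):
            if other.__class__ is self.__class__:
                return (self.x, self.y) == (other.x, other.y)
            return NotImplemented
    return Point

def make_dc_class():
    # @dataclass generates __init__, __repr__, __eq__ as text and exec()s it
    @dataclass
    class Point:
        x: float
        y: float
    return Point

def time_it(factory, count=100):
    start = time.perf_counter()
    for _ in range(count):
        factory()
    return time.perf_counter() - start

t_plain = time_it(make_plain)
t_dc = time_it(make_dc_class)
print(f'100 plain classes:     {t_plain:.4f} s')
print(f'100 dataclass classes: {t_dc:.4f} s')
```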

8.7 The Module Search Path

When importing modules, the interpreter searches the list of directories in sys.path. The first entry in sys.path is often an empty string '', which refers to the current working directory. Alternatively, if you run a script, the first entry in sys.path is the directory in which the script is located. The other entries in sys.path usually consist of a mix of directory names and .zip archive files. The order in which entries are listed in sys.path determines the search order used when importing modules. To add new entries to the search path, add them to this list. This can be done directly or by setting the PYTHONPATH environment variable. For example, on UNIX:

bash $ env PYTHONPATH=/some/path python3 script.py
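The same effect can be checked from Python by starting a child interpreter with PYTHONPATH set and asking it to print its sys.path. A temporary directory stands in for /some/path; inside a running program you would normally just modify sys.path directly.

```python
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()   # stands in for /some/path

# Equivalent of: env PYTHONPATH=/some/path python3 -c '...'
env = dict(os.environ, PYTHONPATH=extra_dir)
child = subprocess.run(
    [sys.executable, '-c', 'import sys; print("\\n".join(sys.path))'],
    env=env, capture_output=True, text=True, check=True)
child_path = child.stdout.splitlines()

# PYTHONPATH entries appear near the front, ahead of the standard library
for entry in child_path[:3]:
    print(repr(entry))
print(extra_dir in child_path)   # True
```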

ZIP archive files are a convenient way to bundle a collection of modules into a single file. For example, suppose you created two modules, foo.py and bar.py, and placed them in a file mymodules.zip. The file could be added to the Python search path as follows:

import sys
sys.path.append('mymodules.zip')
import foo, bar

Specific locations within the directory structure of a .zip file can also be used for the path. In addition, .zip files can be mixed with regular pathname components. Here’s an example:

sys.path.append('/tmp/modules.zip/lib/python')
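Both forms can be tried with an archive built on the fly by zipfile. The module contents are made up; foo.py sits at the top of the archive and bar.py under lib/python/, so each is found through a different sys.path entry.

```python
import importlib
import os
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, 'mymodules.zip')

# Build a hypothetical archive: foo.py at the top, bar.py under lib/python/
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('foo.py', "def hello():\n    return 'hello from foo'\n")
    zf.writestr('lib/python/bar.py', "def hello():\n    return 'hello from bar'\n")

sys.path.append(zip_path)                                 # whole archive
sys.path.append(os.path.join(zip_path, 'lib', 'python'))  # a directory inside it
importlib.invalidate_caches()

import foo, bar
print(foo.hello())    # hello from foo
print(bar.hello())    # hello from bar
print(foo.__file__)   # a path that runs through mymodules.zip
```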

It is not necessary for a ZIP file to have a .zip file suffix to be used. Historically, it has been common to encounter .egg files on the path as well. .egg files originate from an early Python package management tool called setuptools. However, an .egg file is nothing more than a normal .zip file or directory with some extra metadata added to it (version number, dependencies, and so on).

8.8 Execution as the Main Program

Although this section is about the import statement, Python files are often executed as a main script. For example:

% python3 module.py

Each module contains a variable, __name__, that holds the module name. Code can examine this variable to determine the module in which it’s executing. The top-level module of the interpreter is named __main__. Programs specified on the command line or entered interactively run inside the __main__ module. Sometimes a program may alter its behavior, depending on whether it has been imported as a module or is running in __main__. For example, a module may include code that is executed if the module is used as the main program but not executed if the module is simply imported by another module.

# Check if running as a program
if __name__ == '__main__':
    # Yes. Running as the main script
    print('running as the main program')
else:
    # No, I must have been imported as a module
    print(f'imported as module {__name__}')

Source files intended for use as libraries can use this technique to include optional testing or example code. When developing a module, you can put debugging code for testing the features of your library inside an if statement as shown, and run Python on your module as the main program. That code won’t run for users who import your library.
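The standard library's runpy module makes it easy to see both branches from one program. runpy.run_path() executes a file and returns its global namespace, and its run_name argument sets __name__. Passing a name other than '__main__' only imitates an import (nothing is added to sys.modules). The file dualuse.py is hypothetical.

```python
import os
import runpy
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'dualuse.py')
with open(path, 'w') as f:
    f.write(
        "if __name__ == '__main__':\n"
        "    mode = 'main program'\n"
        "else:\n"
        "    mode = 'imported as ' + __name__\n"
    )

# run_path() executes the file and returns its globals as a dict;
# run_name controls the value of __name__ while the code runs
as_program = runpy.run_path(path, run_name='__main__')
as_library = runpy.run_path(path, run_name='dualuse')
print(as_program['mode'])   # main program
print(as_library['mode'])   # imported as dualuse
```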

If you’ve made a directory of Python code, you can execute the directory if it contains a special __main__.py file. For example, if you make a directory like this:

myapp/
    foo.py
    bar.py
    __main__.py

You can run Python on it by typing python3 myapp. Execution will start in the __main__.py file. This also works if you turn the myapp directory into a ZIP archive. Typing python3 myapp.zip will look for a top-level __main__.py file and execute it if found.
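A quick way to try both variants is to build a tiny myapp directory, run it in a child interpreter, then zip the same files and run the archive. The contents of foo.py and __main__.py are hypothetical. When a directory or archive is run this way, it is placed on sys.path, which is why __main__.py can import foo directly.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
app_dir = os.path.join(tmpdir, 'myapp')
os.mkdir(app_dir)

# A hypothetical application: __main__.py uses a sibling module
with open(os.path.join(app_dir, 'foo.py'), 'w') as f:
    f.write("def greet():\n    return 'hello from myapp'\n")
with open(os.path.join(app_dir, '__main__.py'), 'w') as f:
    f.write("import foo\nprint(foo.greet())\n")

# Equivalent of typing: python3 myapp
out_dir = subprocess.run([sys.executable, app_dir],
                         capture_output=True, text=True).stdout

# Bundle the same two files into myapp.zip and run that instead
zip_path = os.path.join(tmpdir, 'myapp.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    for name in ('foo.py', '__main__.py'):
        zf.write(os.path.join(app_dir, name), arcname=name)
out_zip = subprocess.run([sys.executable, zip_path],
                         capture_output=True, text=True).stdout

print(out_dir.strip())   # hello from myapp
print(out_zip.strip())   # hello from myapp
```

The standard library's zipapp module can build such archives for you.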

8.9 Packages

For anything but the simplest programs, Python code is organized into packages. A package is a collection of modules that are grouped under a common top-level name. This grouping helps resolve conflicts between the module names used in different applications and keeps your code separate from everyone else’s code. A package is defined by creating a directory with a distinctive name and placing an initially empty __init__.py file in that directory. You then place additional Python files and subpackages in this directory as needed. For example, a package might be organized as follows:

graphics/
      __init__.py
      primitive/
         __init__.py
         lines.py
         fill.py
         text.py
         ...
      graph2d/
         __init__.py
         plot2d.py
         ...
      graph3d/
         __init__.py
         plot3d.py
         ...
      formats/
         __init__.py
         gif.py
         png.py
         tiff.py
         jpeg.py

The import statement is used to load modules from a package the same way as it’s used for simple modules, except that you now have longer names. For example:

# Full path
import graphics.primitive.fill
...
graphics.primitive.fill.floodfill(img, x, y, color)

# Load a specific submodule
from graphics.primitive import fill
...
fill.floodfill(img, x, y, color)

# Load a specific function from a submodule
from graphics.primitive.fill import floodfill
...
floodfill(img, x, y, color)

Whenever any part of a package is first imported, code in the __init__.py file executes first (if it exists). As noted, this file may be empty, but it can also contain code to perform package-specific initializations. If importing a deeply nested submodule, all __init__.py files encountered in traversal of the directory structure are executed. Thus, the statement import graphics.primitive.fill would first execute the __init__.py file in the graphics/ directory followed by the __init__.py file in the primitive/ directory.
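The execution order can be verified with a throwaway copy of part of the graphics package in which every file announces itself. The make() helper and the printed messages are made up; stdout is captured so the order can be inspected afterward.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def make(relpath, text=''):
    # Hypothetical helper: write a file under root, creating directories
    path = os.path.join(root, *relpath.split('/'))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(text)

make('graphics/__init__.py', "print('running graphics/__init__.py')\n")
make('graphics/primitive/__init__.py',
     "print('running graphics/primitive/__init__.py')\n")
make('graphics/primitive/fill.py',
     "print('running graphics/primitive/fill.py')\n"
     "def floodfill(img, x, y, color):\n"
     "    return (x, y, color)\n")

# Forget any previously imported 'graphics' so this demo starts fresh
for name in [n for n in sys.modules if n.split('.')[0] == 'graphics']:
    del sys.modules[name]
sys.path.insert(0, root)
importlib.invalidate_caches()

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    import graphics.primitive.fill
order = buffer.getvalue().splitlines()
for line in order:
    print(line)
```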

Astute Python users might observe that a package still seems to work if __init__.py files are omitted. This is true—you can use a directory of Python code as a package even if it contains no __init__.py. However, what’s not obvious is that a directory with a missing __init__.py file actually defines a different kind of package known as a namespace package. This is an advanced feature sometimes used by very large libraries and frameworks to implement broken plugin systems. In the opinion of the author, this is rarely what you want—you should always create proper __init__.py files when creating a package.

8.10 Imports Within a Package

A critical feature of the import statement is that all module imports require an absolute or fully qualified package path. This includes import statements used within a package itself. For example, suppose the graphics.primitive.fill module wants to import the graphics.primitive.lines module. A simple statement such as import lines won’t work—you’ll get an ImportError exception. Instead, you need to fully qualify the import like this:

# graphics/primitive/fill.py

# Fully qualified submodule import
from graphics.primitive import lines

Sadly, writing out a full package name like that is both annoying and fragile. For example, sometimes it makes sense to rename a package—maybe you want to rename it so that you can use different versions. If the package name is hardwired into the code, you can’t do that. A better choice is to use a package-relative import like this:

# graphics/primitive/fill.py

# Package-relative import
from . import lines

Here, the . used in the statement from . import lines refers to the same directory as the importing module. Thus, this statement looks for a module lines in the same directory as the file fill.py.

Relative imports can also specify submodules contained in different directories of the same package. For example, if the module graphics.graph2d.plot2d wants to import graphics.primitive.lines, it may use a statement like this:

# graphics/graph2d/plot2d.py

from ..primitive import lines

Here, the .. moves up one directory level and primitive drops down into a different subpackage directory.

Relative imports can only be specified using the from module import symbol form of the import statement. Thus, statements such as import ..primitive.lines or import .lines are a syntax error. Also, symbol has to be a simple identifier, so a statement such as from .. import primitive.lines is also illegal. Finally, relative imports can only be used from within a package; it is illegal to use a relative import to refer to modules that are simply located in a different directory on the filesystem.
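The resolution rules can be explored in two ways. importlib.util.resolve_name() converts a relative name to an absolute one given the importing package, without touching any files. The second part builds a minimal, hypothetical version of the graphics package that uses both relative forms, and the last part confirms that the illegal forms are syntax errors.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Part 1: how relative names resolve (no files needed)
print(importlib.util.resolve_name('.lines', 'graphics.primitive'))
print(importlib.util.resolve_name('..primitive', 'graphics.graph2d'))

# Part 2: a hypothetical package that uses both relative forms
root = tempfile.mkdtemp()

def make(relpath, text=''):
    path = os.path.join(root, *relpath.split('/'))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(text)

make('graphics/__init__.py')
make('graphics/primitive/__init__.py')
make('graphics/primitive/lines.py', "def draw():\n    return 'line'\n")
make('graphics/primitive/fill.py',
     "from . import lines\n"
     "def floodfill():\n"
     "    return 'fill + ' + lines.draw()\n")
make('graphics/graph2d/__init__.py')
make('graphics/graph2d/plot2d.py',
     "from ..primitive import lines\n"
     "def plot():\n"
     "    return 'plot + ' + lines.draw()\n")

for name in [n for n in sys.modules if n.split('.')[0] == 'graphics']:
    del sys.modules[name]
sys.path.insert(0, root)
importlib.invalidate_caches()

from graphics.primitive import fill
from graphics.graph2d import plot2d
print(fill.floodfill())    # fill + line
print(plot2d.plot())       # plot + line

# Part 3: relative imports must use the "from" form with a simple name
rejected = []
for bad in ('import .lines', 'from .. import primitive.lines'):
    try:
        compile(bad, '<demo>', 'exec')
    except SyntaxError:
        rejected.append(bad)
print(rejected)
```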

8.11 Running a Package Submodule as a Script

Code that’s organized into a package has a different runtime environment than a simple script. There is an enclosing package name, submodules, and the use of relative imports (which only work inside a package). One feature that no longer works is the ability to run Python directly on a package source file. For example, suppose you are working on the graphics/graph2d/plot2d.py file and add some testing code at the bottom:

# graphics/graph2d/plot2d.py
from ..primitive import lines, text

class Plot2D:
    ...

if __name__ == '__main__':
    print('Testing Plot2D')
    p = Plot2D()
    ...

If you try to run it directly, you get a crash complaining about relative import statements:

bash $ python3 graphics/graph2d/plot2d.py
Traceback (most recent call last):
  File "graphics/graph2d/plot2d.py", line 1, in <module>
    from ..primitive import lines, text
ImportError: attempted relative import with no known parent package
bash $

You can’t move into the package directory and run it there either:

bash $ cd graphics/graph2d/
bash $ python3 plot2d.py
Traceback (most recent call last):
  File "plot2d.py", line 1, in <module>
    from ..primitive import lines, text
ImportError: attempted relative import with no known parent package
bash $

To run a submodule as a main script, you need to use the -m option to the interpreter. For example:

bash $ python3 -m graphics.graph2d.plot2d
Testing Plot2D
bash $

-m specifies a module or package as the main program. Python will run the module with the proper environment to make sure that imports work. Many of Python’s built-in packages have “secret” features that can be used via -m. One of the most well-known is using python3 -m http.server to run a web server from the current directory.

You can provide similar functionality with your own packages. If the name supplied to python -m name corresponds to a package directory, Python looks for the presence of a __main__.py in that directory and runs that as the script.
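The difference between running a file and using -m can be reproduced with child interpreters started from the directory that contains a small, hypothetical graphics package. The package also gets a __main__.py so that python3 -m graphics has something to run.

```python
import os
import subprocess
import sys
import tempfile

root = tempfile.mkdtemp()

def make(relpath, text=''):
    # Hypothetical helper: write a file under root, creating directories
    path = os.path.join(root, *relpath.split('/'))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(text)

make('graphics/__init__.py')
make('graphics/__main__.py', "print('graphics package run with -m')\n")
make('graphics/primitive/__init__.py')
make('graphics/primitive/lines.py')
make('graphics/primitive/text.py')
make('graphics/graph2d/__init__.py')
make('graphics/graph2d/plot2d.py',
     "from ..primitive import lines, text\n"
     "class Plot2D:\n"
     "    pass\n"
     "if __name__ == '__main__':\n"
     "    print('Testing Plot2D')\n"
     "    p = Plot2D()\n")

def run(*args):
    # Run a child interpreter from the directory that contains graphics/
    return subprocess.run([sys.executable, *args], cwd=root,
                          capture_output=True, text=True)

direct = run(os.path.join('graphics', 'graph2d', 'plot2d.py'))
print(direct.returncode != 0, 'ImportError' in direct.stderr)  # True True

as_module = run('-m', 'graphics.graph2d.plot2d')
print(as_module.stdout.strip())     # Testing Plot2D

as_package = run('-m', 'graphics')
print(as_package.stdout.strip())    # graphics package run with -m
```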

8.12 Controlling the Package Namespace

The primary purpose of a package is to serve as a top-level container for code. Sometimes users will import the top-level name and nothing else. For example:

import graphics

This import doesn’t specify any particular submodule. Nor does it make any other part of the package accessible. For example, you’ll find that code like this fails:

import graphics
graphics.primitive.fill.floodfill(img,x,y,color)  # Fails!

When only a top-level package import is given, the only file that executes is the associated __init__.py file. In this example, that’s graphics/__init__.py.

The primary purpose of an __init__.py file is to build and/or manage the contents of the top-level package namespace. Often, this involves importing selected functions, classes, and other objects from lower-level submodules. For example, if the graphics package in this example consists of hundreds of low-level functions but most of those details are encapsulated into a handful of high-level classes, then the __init__.py file might choose to expose just those classes:

# graphics/__init__.py

from .graph2d.plot2d import Plot2D
from .graph3d.plot3d import Plot3D

With this __init__.py file, the names Plot2D and Plot3D would appear at the top level of the package. A user could then use those names as if graphics were a simple module:

from graphics import Plot2D
plt = Plot2D(100, 100)
plt.clear()
...

This is often much more convenient for the user because they don’t have to know how you’ve actually organized your code. In some sense, you’re putting a higher layer of abstraction on top of your code structure. Many of the modules in the Python standard library are constructed in this manner. For example, the popular collections module is actually a package. The collections/__init__.py file consolidates definitions from a few different places and presents them to the user as a single consolidated namespace.
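Both behaviors, the failing attribute access and the lifted Plot2D name, show up in a small hypothetical copy of the package. The Plot2D constructor arguments are made up to match the earlier usage example.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def make(relpath, text=''):
    # Hypothetical helper: write a file under root, creating directories
    path = os.path.join(root, *relpath.split('/'))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(text)

# graphics/__init__.py lifts Plot2D to the top level, nothing else
make('graphics/__init__.py', "from .graph2d.plot2d import Plot2D\n")
make('graphics/graph2d/__init__.py')
make('graphics/graph2d/plot2d.py',
     "class Plot2D:\n"
     "    def __init__(self, width, height):\n"
     "        self.size = (width, height)\n")
make('graphics/primitive/__init__.py')
make('graphics/primitive/fill.py', "def floodfill(img, x, y, color):\n    pass\n")

for name in [n for n in sys.modules if n.split('.')[0] == 'graphics']:
    del sys.modules[name]
sys.path.insert(0, root)
importlib.invalidate_caches()

import graphics
plt = graphics.Plot2D(100, 100)        # works: exposed by __init__.py
print(plt.size)                        # (100, 100)
before = hasattr(graphics, 'primitive')
print(before)                          # False: primitive was never imported

import graphics.primitive.fill         # importing a submodule binds it
after = hasattr(graphics, 'primitive')
print(after)                           # True
```

As a side effect, graphics.graph2d is also an attribute here, because importing any submodule binds it as an attribute of its parent package.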

8.13 Controlling Package Exports

One issue concerns the interaction between an __init__.py file and low-level submodules. For example, the user of a package might only want to concern themselves with objects and functions that live in the top-level package namespace. However, the implementor of a package might be concerned with the problem of organizing code into maintainable submodules.

To better manage this organizational complexity, package submodules often declare an explicit list of exports by defining an __all__ variable. This is a list of names that should be pushed up one level in the package namespace. For example:

# graphics/graph2d/plot2d.py

__all__ = ['Plot2D']

class Plot2D:
    ...

The associated __init__.py file then imports its submodules using an * import like this:

# graphics/graph2d/__init__.py

# Only loads names explicitly listed in __all__ variables
from .plot2d import *

# Propagate the __all__ up to next level (if desired)
__all__ = plot2d.__all__

This lifting process then continues all the way up to the top-level package __init__.py. For example:

# graphics/__init__.py
from .graph2d import *
from .graph3d import *

# Consolidate exports
__all__ = [
    *graph2d.__all__,
    *graph3d.__all__
]

The gist is that every component of a package explicitly states its exports using the __all__ variable. The __init__.py files then propagate the exports upwards. In practice, it can get complicated, but this approach avoids the problem of hard-wiring specific export names into the __init__.py file. Instead, if a submodule wants to export something, its name gets listed in just one place—the __all__ variable. Then, by magic, it propagates up to its proper place in the package namespace.

It is worth noting that although using * imports in user code is frowned upon, it is widespread practice in package __init__.py files. The reason it works in packages is that it is usually much more controlled and contained—being driven by the contents of the __all__ variables and not a free-wheeling attitude of “let’s just import everything.”
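The three layers of __all__ shown above can be assembled into a working, hypothetical package to confirm that the names really do propagate to the top. The file contents follow the text; the private _helper() function is added only to show that it stays hidden.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def make(relpath, text=''):
    # Hypothetical helper: write a file under root, creating directories
    path = os.path.join(root, *relpath.split('/'))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(text)

make('graphics/graph2d/plot2d.py',
     "__all__ = ['Plot2D']\n"
     "class Plot2D:\n    pass\n"
     "def _helper():\n    pass\n")
make('graphics/graph2d/__init__.py',
     "from .plot2d import *\n"
     "__all__ = plot2d.__all__\n")
make('graphics/graph3d/plot3d.py',
     "__all__ = ['Plot3D']\n"
     "class Plot3D:\n    pass\n")
make('graphics/graph3d/__init__.py',
     "from .plot3d import *\n"
     "__all__ = plot3d.__all__\n")
make('graphics/__init__.py',
     "from .graph2d import *\n"
     "from .graph3d import *\n"
     "__all__ = [*graph2d.__all__, *graph3d.__all__]\n")

for name in [n for n in sys.modules if n.split('.')[0] == 'graphics']:
    del sys.modules[name]
sys.path.insert(0, root)
importlib.invalidate_caches()

import graphics
print(graphics.__all__)    # ['Plot2D', 'Plot3D']

# What a user's star import would bring in
ns = {}
exec('from graphics import *', ns)
exported = sorted(k for k in ns if not k.startswith('__'))
print(exported)            # ['Plot2D', 'Plot3D']
```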

8.14 Package Data

Sometimes a package includes data files that need to be loaded (as opposed to source code). Within a package, the __file__ variable will give you location information about a specific source file. However, packages are complicated. They might be bundled within ZIP archive files or loaded from unusual environments. The __file__ variable itself might be unreliable or even undefined. As a result, loading a data file is often not a simple matter of passing a filename to the built-in open() function and reading some data.

To read package data, use pkgutil.get_data(package, resource). For example, if your package looks like this:

mycode/
    resources/
        data.json
    __init__.py
    spam.py
    yow.py

To load the file data.json from the file spam.py, do this:

# mycode/spam.py
import pkgutil
import json

def func():
    rawdata = pkgutil.get_data(__package__,
                               'resources/data.json')
    textdata = rawdata.decode('utf-8')
    data = json.loads(textdata)
    print(data)

The get_data() function attempts to find the specified resource and returns its contents as a raw byte string. The __package__ variable shown in the example is a string holding the name of the enclosing package. Any further decoding (such as converting bytes to text) and interpretation is up to you. In the example, the data is decoded and parsed from JSON into a Python dictionary.
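Here is a self-contained way to try pkgutil.get_data() without setting up a real project. It recreates the mycode/ layout in a temporary directory (the JSON contents are made up for the example) and loads the resource by package name. Inside spam.py you would pass __package__; at the top level of a script, the literal name 'mycode' is used instead.

```python
import importlib
import json
import pkgutil
import sys
import tempfile
from pathlib import Path

# Recreate the mycode/ layout with made-up data in a temporary directory
root = Path(tempfile.mkdtemp())
pkg = root / 'mycode'
(pkg / 'resources').mkdir(parents=True)
(pkg / '__init__.py').write_text('')
(pkg / 'resources' / 'data.json').write_text('{"name": "spam", "size": 3}')

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# get_data() imports the package if needed and returns raw bytes
rawdata = pkgutil.get_data('mycode', 'resources/data.json')
data = json.loads(rawdata.decode('utf-8'))
print(type(rawdata).__name__, data)   # bytes {'name': 'spam', 'size': 3}
```

Resource names use forward slashes regardless of platform; get_data() translates them as needed for the loader.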

A package is not a good place to store giant data files. Reserve package resources for configuration data and other assorted bits of stuff needed to make your package work.

8.15 Module Objects

Modules are first-class objects. Table 8.1 lists attributes commonly found on modules.

Table 8.1 Module Attributes

Attribute          Description
__name__           Full module name
__doc__            Documentation string
__dict__           Module dictionary
__file__           Filename where defined
__package__        Name of enclosing package (if any)
__path__           List of subdirectories to search for submodules of a package
__annotations__    Module-level type hints

The __dict__ attribute is a dictionary that represents the module namespace. Everything that’s defined in the module is placed here.

The __name__ attribute is often used in scripts: the check if __name__ == '__main__' tests whether a file is running as a standalone program rather than being imported.
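One way to observe the __name__ check in action is runpy.run_path(), which executes a file and lets you choose the value of __name__. The sketch below writes a small hypothetical script, tool.py, and runs it twice: once as '__main__' (as python tool.py would) and once under an ordinary module name (as an import would).

```python
import runpy
import tempfile
from pathlib import Path

# A hypothetical script that only does its "main" work when run directly
script = Path(tempfile.mkdtemp()) / 'tool.py'
script.write_text(
    "def main():\n"
    "    return 'ran main'\n"
    "\n"
    "result = None\n"
    "if __name__ == '__main__':\n"
    "    result = main()\n"
)

# Executed as a program: __name__ is '__main__'
as_program = runpy.run_path(str(script), run_name='__main__')
# Executed under an ordinary module name, as an import would see it
as_module = runpy.run_path(str(script), run_name='tool')

print(as_program['result'])   # ran main
print(as_module['result'])    # None
```

run_path() returns the resulting module globals as a dictionary, which is why the value of result can be inspected afterward.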

The __package__ attribute contains the name of the enclosing package, if any. For packages, the __path__ attribute is a list of directories that will be searched to locate package submodules. Normally, it contains a single entry: the directory in which the package is located. Sometimes large frameworks will manipulate __path__ to incorporate additional directories for the purpose of supporting plugins and other advanced features.

Not all attributes are available on all modules. For example, built-in modules such as sys do not have a __file__ attribute. Similarly, package-related attributes are absent or empty for top-level modules (those not contained in a package): __path__ is not defined, and __package__ is typically an empty string.
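The attributes in Table 8.1 can be inspected directly. The following sketch uses the standard library json package, its json.decoder submodule, and the built-in sys module as convenient specimens; any package and module would do.

```python
import json
import json.decoder
import sys

print(json.__name__)                          # json
print(json.__dict__ is vars(json))            # True: __dict__ is the namespace
print(json.__dict__['dumps'] is json.dumps)   # True
print(json.decoder.__package__)               # json
print(hasattr(json, '__path__'))              # True: json is a package
print(hasattr(json.decoder, '__path__'))      # False: an ordinary submodule
print(hasattr(sys, '__file__'))               # False: sys is built in
```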

The __doc__ attribute is the module doc string (if any). This is a string that appears as the first statement in a file. The __annotations__ attribute is a dictionary of module-level type hints. These look something like this:

# mymodule.py

'''
The doc string
'''


# Type hints (placed into __annotations__)
x: int
y: float
...

As with other type hints, module-level hints do not change Python's behavior, nor do they actually define variables. They are purely metadata that other tools can choose to look at if they want.
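To confirm that a bare annotation records metadata without creating a variable, the sketch below writes the mymodule.py example (minus the trailing ...) into a temporary directory and imports it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write the mymodule.py example into a scratch directory
root = Path(tempfile.mkdtemp())
(root / 'mymodule.py').write_text(
    "'''\nThe doc string\n'''\n"
    "x: int\n"
    "y: float\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import mymodule

print(mymodule.__doc__.strip())   # The doc string
print(mymodule.__annotations__)   # {'x': <class 'int'>, 'y': <class 'float'>}
print(hasattr(mymodule, 'x'))     # False: the hint did not create a variable
```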

8.16 Deploying Python Packages

The final frontier of modules and packages is the problem of giving your code to others. This is a large topic that has been the focus of active ongoing development over many years. I won’t try to document a process that’s bound to be out-of-date by the time you read this. Instead, direct your attention to the documentation at https://packaging.python.org/tutorials/packaging-projects.

For the purposes of day-to-day development, the most important thing is to keep your code isolated as a self-contained project. All of your code should live in a proper package. Try to give your package a unique name so that it doesn’t conflict with other possible dependencies. Consult the Python package index at https://pypi.org to pick a name. In structuring your code, try to keep things simple. As you’ve seen, there are many highly sophisticated things that can be done with the module and package system. There is a time and place for that, but it should not be your starting point.

With absolute simplicity in mind, the most minimalistic way to distribute pure Python code is to use the setuptools module. (The older built-in distutils module served the same purpose, but it was deprecated in Python 3.10 and removed in Python 3.12.) Suppose you have written some code and it's in a project that looks like this:

spam-project/
    README.txt
    Documentation.txt
    spam/              # A package of code
        __init__.py
        foo.py
        bar.py
    runspam.py         # A script to run as: python runspam.py

To create a distribution, create a file setup.py in the topmost directory (spam-project/ in this example). In this file, put the following code:

# setup.py
from setuptools import setup

setup(name='spam',
      version='1.0',
      packages=['spam'],
      scripts=['runspam.py'],
      )

In the setup() call, packages is a list of all package directories, and scripts is a list of script files. Any of these arguments may be omitted if your software does not have them (for example, if there are no scripts). name is the name of your package, and version is the version number as a string. The call to setup() supports a variety of other parameters that supply various metadata about your package. See the full list at https://docs.python.org/3/distutils/apiref.html.

Creating a setup.py file is enough to create a source distribution of your software. Type the following shell command to make a source distribution:

shell $ python setup.py sdist
...
shell $

This creates an archive file, such as spam-1.0.tar.gz or spam-1.0.zip, in the directory spam-project/dist. This is the file you would give to others to install your software. To install it, a user can run pip. For example:

shell $ python3 -m pip install spam-1.0.tar.gz

This installs the software into the local Python distribution and makes it available for general use. The code will normally be installed into a directory called site-packages in the Python library. To find the exact location of this directory, inspect the value of sys.path. Scripts are normally installed into the same directory as the Python interpreter itself.
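Besides scanning sys.path, the standard sysconfig module can report the install locations directly. The sketch below is one way to do that; using sysconfig here is my addition rather than part of the workflow described above, and the printed paths will differ by platform, Python version, and whether a virtual environment is active.

```python
import sys
import sysconfig

paths = sysconfig.get_paths()   # install locations for the running interpreter

print(paths['purelib'])         # where pure-Python packages are installed
print(paths['scripts'])         # where installed scripts are placed
print(sys.executable)           # the interpreter itself, for comparison

# Entries on sys.path that point at installed third-party packages
# (some Linux distributions use dist-packages instead of site-packages)
print([p for p in sys.path if p.endswith(('site-packages', 'dist-packages'))])
```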

If the first line of a script starts with #! and contains the text python, the installer will rewrite the line to point to the local installation of Python. Thus, if your scripts have been hardcoded to a specific Python location, such as /usr/local/bin/python, they should still work when installed on other systems where Python is in a different location.

It must be stressed that the use of setuptools as described here is absolutely minimal. Larger projects may involve C/C++ extensions, complicated package structures, examples, and more. Covering all of the tools and possible ways to deploy such code is beyond the scope of this book. You should consult various resources on https://python.org and https://pypi.org for the most up-to-date advice.

8.17 The Penultimate Word: Start with a Package

When first starting a new program, it is easy to start with a simple single Python file. For example, you might write a script called program.py and start with that. Although this will work fine for throwaway programs and short tasks, your “script” may start growing and adding features. Eventually, you might want to split it into multiple files. It’s at that point that problems often arise.

In light of this, it makes sense to get in the habit of starting all programs as a package from the outset. For example, instead of making a file called program.py, you should make a program package directory called program:

program/
    __init__.py
    __main__.py

Put your starting code in __main__.py and run your program using a command such as python -m program. As you need more code, add new files to your package and use package-relative imports. An advantage of using a package is that all of your code remains isolated. You can name the files whatever you want and not worry about collisions with other packages, standard library modules, or code written by your coworkers. Although setting up a package requires a bit more work at the start, it will likely save you a lot of headaches later.
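The following sketch builds the program/ layout in a scratch directory, adds a hypothetical helpers.py module reached through a package-relative import, and runs the package with python -m program via subprocess. It is only meant to show that __main__.py runs as part of the package, so relative imports work there.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Lay out a 'program' package (helpers.py is a hypothetical extra module)
root = Path(tempfile.mkdtemp())
pkg = root / 'program'
pkg.mkdir()
(pkg / '__init__.py').write_text('')
(pkg / 'helpers.py').write_text(
    "def greet(name):\n"
    "    return f'hello {name}'\n"
)
(pkg / '__main__.py').write_text(
    "from .helpers import greet     # package-relative import\n"
    "print(greet('world'))\n"
)

# Equivalent to running: python -m program  (from inside root)
out = subprocess.run(
    [sys.executable, '-m', 'program'],
    cwd=root, capture_output=True, text=True, check=True,
)
print(out.stdout.strip())   # hello world
```

Running python program/__main__.py directly would fail on the relative import, because the file would then execute as a standalone script with no enclosing package. That is one practical reason to use the -m form.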

8.18 The Final Word: Keep It Simple

There is a lot more advanced wizardry associated with the module and package system than has been shown here. Consult the tutorial "Modules and Packages: Live and Let Die!" at https://dabeaz.com/modulepackage/index.html to get an idea of what's possible.

All things considered, however, you’re probably better off not doing any advanced module hacking. Managing modules, packages, and software distribution has always been a source of pain in the Python community. Much of the pain is a direct consequence of people applying hacks to the module system. Don’t do that. Keep it simple and find the power to just say “no” when your coworkers propose to modify import to work with the blockchain.