1.8 The World According to C

Operating systems are normally large C (or sometimes C++) programs consisting of many pieces written by many programmers. The environment used for developing operating systems is very different from what individuals (such as students) are used to when writing small Java programs. This section attempts to give small-time Java or Python programmers a very brief introduction to the world of writing an operating system.

1.8.1 The C Language

This is not a guide to C, but a short summary of some of the key differences between C and languages like Python and especially Java. Java is based on C, so there are many similarities between the two. Python is somewhat different, but still fairly similar. For convenience, we focus on Java. Java, Python, and C are all imperative languages with data types, variables, and control statements, for example. The primitive data types in C are integers (including short and long ones), characters, and floating-point numbers. Composite data types can be constructed using arrays, structures, and unions. The control statements in C are similar to those in Java, including if, switch, for, and while statements. Functions and parameters are roughly the same in both languages.

One feature C has that Java and Python do not is explicit pointers. A pointer is a variable that points to (i.e., contains the address of) a variable or data structure. Consider the statements

char c1, c2, *p;
c1 = 'c';
p = &c1;
c2 = *p;

which declare c1 and c2 to be character variables and p to be a variable that points to (i.e., contains the address of) a character. The first assignment stores the ASCII code for the character 'c' in the variable c1. The second one assigns the address of c1 to the pointer variable p. The third one assigns the contents of the variable pointed to by p to the variable c2, so after these statements are executed, c2 also contains the ASCII code for 'c'. In theory, pointers are typed, so you are not supposed to assign the address of a floating-point number to a character pointer, but in practice many compilers accept such assignments, albeit usually with a warning. Pointers are a very powerful construct, but also a great source of errors when used carelessly.
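As a minimal sketch, the pointer statements above can be wrapped in functions that compile on their own. The function names copy_via_pointer and set_via_pointer are hypothetical, chosen only for this illustration. The second function shows the other common use of a pointer: writing to a variable that belongs to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a character by reading it back through a pointer,
   mirroring the four statements in the text. */
char copy_via_pointer(char value)
{
    char c1, c2, *p;

    c1 = value;   /* store the character code in c1 */
    p = &c1;      /* p now contains the address of c1 */
    c2 = *p;      /* fetch the character that p points to */
    return c2;
}

/* Store a character through a pointer, so the caller's
   variable is modified (hypothetical helper). */
void set_via_pointer(char *target, char value)
{
    assert(target != NULL);   /* dereferencing NULL is an error */
    *target = value;
}
```

Calling set_via_pointer(&x, 'z') changes x itself, something that passing x by value could not do.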

Some things that C does not have include built-in strings, threads, packages, classes, objects, type safety, and garbage collection. The last one would be a show stopper for operating systems. Storage in C is static, automatic (local variables on the stack), or explicitly allocated and released by the programmer, usually with the library functions malloc and free. It is this property (total programmer control over memory), along with explicit pointers, that makes C attractive for writing operating systems. All operating systems, even general-purpose ones, are real-time systems to some extent. When an interrupt occurs, the operating system may have only a few microseconds to perform some action or lose critical information. Having a garbage collector kick in at an arbitrary moment is intolerable.
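To make explicit allocation concrete, here is a small sketch that uses only the standard library functions malloc and free. The helper name copy_string is hypothetical. It also illustrates the absence of built-in strings: a string is just an array of characters ending in a zero byte, and the programmer decides where that array lives and when it is released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of s, or NULL if no memory is available.
   The caller owns the result and must release it with free(). */
char *copy_string(const char *s)
{
    assert(s != NULL);

    size_t n = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(n);     /* explicit allocation */

    if (copy == NULL)
        return NULL;            /* allocation can fail; no exception is raised */
    memcpy(copy, s, n);
    return copy;
}
```

A caller would write char *p = copy_string("kernel"); and later free(p);. Nothing reclaims the memory automatically if the call to free is forgotten.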

1.8.2 Header Files

An operating system project generally consists of some number of directories, each containing many .c files that hold the code for some part of the system, along with some .h header files that contain declarations and definitions used by one or more code files. Header files can also include simple macros, such as

#define BUFFER_SIZE 4096

which allow the programmer to name constants, so that when BUFFER_SIZE is used in the code, it is replaced during compilation by the number 4096. Good C programming practice is to name every constant except 0, 1, and -1, and sometimes even them. Macros can have parameters, such as

#define max(a, b) (a > b ? a: b)

which allows the programmer to write

i = max(j, k+1)

and get

i = (j > k+1 ? j: k+1)

to store the larger of j and k + 1 in i. Headers can also contain conditional compilation, for example


#ifdef X86
intel_int_ack();
#endif

which compiles into a call to the function intel_int_ack if the macro X86 is defined and nothing otherwise. Conditional compilation is heavily used to isolate architecture-dependent code so that certain code is inserted only when the system is compiled for the X86, other code is inserted only when the system is compiled for a SPARC, and so on. A .c file can bodily include zero or more header files using the #include directive. There are also many header files that are common to nearly every .c file and are stored in a central directory.
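The three header features described above (named constants, parameterized macros, and conditional compilation) can be combined in one short sketch. Several things here are assumptions made for illustration: the names MAX, ARCH_NAME, buffer_capacity, larger, and arch_name are hypothetical, and the extra parentheses in MAX are an addition that the simpler max macro in the text does not have. Because macro expansion is purely textual, an unparenthesized argument such as 2 & 3 can be regrouped by operator precedence after substitution.

```c
#include <stddef.h>

#define BUFFER_SIZE 4096

/* Parenthesizing every parameter and the whole body keeps the expansion
   correct for arguments containing low-precedence operators. Each
   argument may still be evaluated twice, so avoid side effects. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* X86 is the architecture macro used in the text. It would normally be
   defined on the compiler command line, for example with -DX86. */
#ifdef X86
#define ARCH_NAME "x86"
#else
#define ARCH_NAME "generic"
#endif

static char buffer[BUFFER_SIZE];

/* The preprocessor has replaced BUFFER_SIZE by 4096 before compilation. */
size_t buffer_capacity(void)
{
    return sizeof buffer;
}

/* Expands to ((j) > (k + 1) ? (j) : (k + 1)). */
int larger(int j, int k)
{
    return MAX(j, k + 1);
}

/* Only one of the two ARCH_NAME definitions survives preprocessing. */
const char *arch_name(void)
{
    return ARCH_NAME;
}
```

With the parenthesized form, MAX(2 & 3, 1) yields 2. With the unparenthesized form it would expand to (2 & 3 > 1 ? 2 & 3 : 1), which groups as 2 & (3 > 1) and yields 1.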

1.8.3 Large Programming Projects

To build the operating system, each .c file is compiled into an object file by the C compiler. Object files, which have the suffix .o, contain binary instructions for the target machine. They will later be directly executed by the CPU. There is nothing like Java byte code or Python byte code in the C world.

The first pass of the C compiler is called the C preprocessor. As it reads each .c file, every time it hits a #include directive, it fetches the header file named in the directive and processes it, expanding macros, handling conditional compilation (and certain other things), and passing the results to the next pass of the compiler as if they were physically included.

Since operating systems are very large (five million lines of code is not unusual), having to recompile the entire thing every time one file is changed would be unbearable. On the other hand, changing a key header file that is included in thousands of other files does require recompiling those files. Keeping track of which object files depend on which header files is completely unmanageable without help.

Fortunately, computers are very good at precisely this sort of thing. On UNIX systems, there is a program called make (with numerous variants such as gmake, pmake, etc.) that reads the Makefile, which tells it which files are dependent on which other files. What make does is see which object files are needed to build the operating system binary and, for each one, check to see if any of the files it depends on (the code and headers) have been modified subsequent to the last time the object file was created. If so, that object file has to be recompiled. When make has determined which .c files have to be recompiled, it then invokes the C compiler to recompile them, thus reducing the number of compilations to the bare minimum. In large projects, creating the Makefile is error-prone, so there are tools that do it automatically.
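The core of the check that make performs can be sketched in a few lines of C. This is only an illustration of the timestamp comparison described above, not the way any particular make is implemented. It assumes a POSIX system (for the stat call), and the function name needs_rebuild and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Return 1 if target must be rebuilt: it does not exist, or some
   dependency was modified more recently than it was. Return 0 if the
   target is up to date, or -1 if a dependency cannot be examined. */
int needs_rebuild(const char *target, const char *const deps[], size_t ndeps)
{
    struct stat target_info, dep_info;

    assert(target != NULL);
    if (stat(target, &target_info) != 0)
        return 1;                          /* no object file yet */

    for (size_t i = 0; i < ndeps; i++) {
        if (stat(deps[i], &dep_info) != 0)
            return -1;                     /* missing source or header */
        if (dep_info.st_mtime > target_info.st_mtime)
            return 1;                      /* dependency is newer */
    }
    return 0;                              /* nothing has changed */
}
```

For an object file main.o that depends on main.c and defs.h (hypothetical file names), a build tool would call needs_rebuild("main.o", deps, 2) and invoke the compiler only when the result is 1.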

Once all the .o files are ready, they are passed to a program called the linker to combine all of them into a single executable binary file. Any library functions called are also included at this point, interfunction references are resolved, and machine addresses are relocated as need be. When the linker is finished, the result is an executable program, traditionally called a.out on UNIX systems. The various components of this process are illustrated in Fig. 1-30 for a program with three C files and two header files. Although we have been discussing operating system development here, all of this applies to developing any large program.

Figure 1-30. The process of compiling C and header files to make an executable.

1.8.4 The Model of Run Time

Once the operating system binary has been linked, the computer can be rebooted and the new operating system started. Once running, it may dynamically load pieces that were not statically included in the binary, such as device drivers and file systems. At run time, the operating system may consist of multiple segments, for the text (the program code), the data, and the stack. The text segment is normally immutable, not changing during execution. The data segment starts out at a certain size and is initialized with certain values, but it can change and grow as need be. The stack is initially empty but grows and shrinks as functions are called and return. Often the text segment is placed near the bottom of memory, the data segment just above it, with the ability to grow upward, and the stack segment at a high virtual address, with the ability to grow downward, but different systems work differently.
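An ordinary user program is laid out along similar lines, which makes the segments easy to observe. The sketch below prints one sample address from each region. The names initialized_global and print_layout are hypothetical, and the addresses printed, as well as their relative order, depend on the system and compiler, so no particular output should be expected.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment: has an initial value */

/* Print a sample address from the text, data, heap, and stack regions.
   Returns 0 on success or -1 if the heap allocation fails. */
int print_layout(void)
{
    int local = 0;                      /* allocated on the stack */
    int *heap = malloc(sizeof *heap);   /* dynamically allocated data */

    if (heap == NULL)
        return -1;

    /* Converting a pointer to an integer is implementation-defined,
       but is adequate for displaying addresses. */
    printf("text:  %#" PRIxPTR "\n", (uintptr_t)&print_layout);
    printf("data:  %#" PRIxPTR "\n", (uintptr_t)&initialized_global);
    printf("heap:  %#" PRIxPTR "\n", (uintptr_t)heap);
    printf("stack: %#" PRIxPTR "\n", (uintptr_t)&local);

    free(heap);
    return 0;
}
```

On many systems the text and data addresses are low and close together while the stack address is much higher, matching the common arrangement described above, but this is not guaranteed.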

In all cases, the operating system code is directly executed by the hardware, with no interpreter and no just-in-time compilation of the kind that is normal with Java.