1.4 The preprocessing language

After tokenization, the stream of tokens may simply be passed straight to the compiler's parser. However, if it contains any operations in the preprocessing language, it will be transformed first. This stage corresponds roughly to the standard's “translation phase 4” and is what most people think of as the preprocessor's job.

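The preprocessing language consists of directives to be executed and macros to be expanded. Its primary capabilities are:

* Inclusion of header files. These are files of declarations that can be substituted into your program.
* Macro expansion. You can define macros, which are abbreviations for arbitrary fragments of C code. The preprocessor replaces the macros with their definitions throughout the program. Some macros are automatically defined for you.
* Conditional compilation. You can include or exclude parts of the program according to various conditions.
* Line control. If you use a program to combine or rearrange source files into an intermediate file which is then compiled, you can use line control to inform the compiler where each source line originally came from.
* Diagnostics. You can detect problems at compile time and issue errors or warnings.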

There are a few more, less useful, features.
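
As a rough illustration of these capabilities, the sketch below exercises header inclusion, macro expansion, conditional compilation, a diagnostic, and line control in one small file. The names GREETING, VERBOSE, line_after_directive, check_overview and the file name overview_example.c are made up for this example.

```c
#include <assert.h>   /* header inclusion: brings in the assert macro */
#include <string.h>   /* header inclusion: brings in the declaration of strlen */

#define GREETING "hello"   /* macro definition; GREETING is expanded where used */

#ifndef VERBOSE            /* conditional compilation */
#define VERBOSE 0
#endif

#if VERBOSE < 0
#error "VERBOSE must not be negative"   /* diagnostic; not triggered here */
#endif

#line 100 "overview_example.c"   /* line control: the next line counts as line 100 */
static const int line_after_directive = __LINE__;

/* Checks the effect of each directive above (hypothetical helper). */
void check_overview(void)
{
    assert(strlen(GREETING) == 5);
    assert(VERBOSE == 0);
    assert(line_after_directive == 100);
}
```

Calling check_overview() from a test or from main should pass silently, assuming VERBOSE is not already defined on the command line.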

Except for expansion of predefined macros, all these operations are triggered with preprocessing directives. Preprocessing directives are lines in your program that start with #. Whitespace is allowed before and after the #. The # is followed by an identifier, the directive name, which specifies the operation to perform. Directives are commonly referred to as #name, where name is the directive name. For example, #define is the directive that defines a macro.
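
The whitespace rule can be seen in a short sketch. The macro names INDENTED, SPACED and BOTH and the helper check_directive_spacing are invented for this example; all three definitions are ordinary #define directives that differ only in where the whitespace falls.

```c
#include <assert.h>

/* Whitespace before the '#'. */
    #define INDENTED 1

/* Whitespace between the '#' and the directive name. */
#   define SPACED 2

/* Whitespace in both places. */
  #  define BOTH 3

/* Checks that all three directives took effect (hypothetical helper). */
void check_directive_spacing(void)
{
    assert(INDENTED + SPACED + BOTH == 6);
}
```

The second form is often used to indent directives nested inside conditionals while keeping the # in the first column.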

The # which begins a directive cannot come from a macro expansion. Also, the directive name is not macro expanded. Thus, if foo is defined as a macro expanding to define, that does not make #foo a valid preprocessing directive.
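
A minimal sketch of the second point follows. It defines a macro that happens to be called define, then shows that a later #define line is still treated as the directive. The name ANSWER and the helper check_directive_names are made up for illustration, and the rejected #foo case is left inside a comment so that the file still compiles.

```c
#include <assert.h>

/* `define` is not a reserved word, so it can itself name a macro. */
#define define 7

/* The directive name is not macro expanded: the line below is still the
   #define directive, even though `define` is now a macro. */
#define ANSWER 42

/* The pair below would be rejected as an invalid preprocessing directive,
   because foo is not expanded when it appears as a directive name:

       #define foo define
       #foo BAR 1
*/

/* Hypothetical helper that checks both macros. */
void check_directive_names(void)
{
    assert(define == 7);    /* outside a directive, `define` expands normally */
    assert(ANSWER == 42);
}
```

Naming a macro after a directive is legal but confusing, and is done here only to make the rule visible.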

The set of valid directive names is fixed. Programs cannot define new preprocessing directives.

Some directives require arguments; these make up the rest of the directive line and must be separated from the directive name by whitespace. For example, #define must be followed by a macro name and the intended expansion of the macro.

A preprocessing directive cannot cover more than one line. The line may, however, be continued with backslash-newline, or by a block comment which extends past the end of the line. In either case, when the directive is processed, the continuations have already been merged with the first line to make one long line.
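
Both ways of continuing a directive can be sketched as follows. SUM3, TWICE and check_continuations are hypothetical names chosen for this example. In each case the preprocessor sees a single logical line by the time the directive is processed.

```c
#include <assert.h>

/* Continued with backslash-newline: each backslash must be the last
   character on its line. */
#define SUM3(a, b, c) \
    ((a) +            \
     (b) +            \
     (c))

/* Continued by a block comment that extends past the end of the line. */
#define TWICE(x) /* this comment runs past the end of the line,
                    so the directive continues here */ ((x) * 2)

/* Checks that both macros were defined with their full bodies
   (hypothetical helper). */
void check_continuations(void)
{
    assert(SUM3(1, 2, 3) == 6);
    assert(TWICE(21) == 42);
}
```

Backslash-newline is by far the more common of the two forms; the comment form works but is easy to misread.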