- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- <html>
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <title>Token Spacing (The GNU C Preprocessor Internals)</title>
- <meta name="description" content="Token Spacing (The GNU C Preprocessor Internals)">
- <meta name="keywords" content="Token Spacing (The GNU C Preprocessor Internals)">
- <meta name="resource-type" content="document">
- <meta name="distribution" content="global">
- <meta name="Generator" content="makeinfo">
- <link href="index.html#Top" rel="start" title="Top">
- <link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
- <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
- <link href="index.html#Top" rel="up" title="Top">
- <link href="Line-Numbering.html#Line-Numbering" rel="next" title="Line Numbering">
- <link href="Macro-Expansion.html#Macro-Expansion" rel="prev" title="Macro Expansion">
- <style type="text/css">
- <!--
- a.summary-letter {text-decoration: none}
- blockquote.indentedblock {margin-right: 0em}
- blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
- blockquote.smallquotation {font-size: smaller}
- div.display {margin-left: 3.2em}
- div.example {margin-left: 3.2em}
- div.lisp {margin-left: 3.2em}
- div.smalldisplay {margin-left: 3.2em}
- div.smallexample {margin-left: 3.2em}
- div.smalllisp {margin-left: 3.2em}
- kbd {font-style: oblique}
- pre.display {font-family: inherit}
- pre.format {font-family: inherit}
- pre.menu-comment {font-family: serif}
- pre.menu-preformatted {font-family: serif}
- pre.smalldisplay {font-family: inherit; font-size: smaller}
- pre.smallexample {font-size: smaller}
- pre.smallformat {font-family: inherit; font-size: smaller}
- pre.smalllisp {font-size: smaller}
- span.nolinebreak {white-space: nowrap}
- span.roman {font-family: initial; font-weight: normal}
- span.sansserif {font-family: sans-serif; font-weight: normal}
- ul.no-bullet {list-style: none}
- -->
- </style>
- </head>
- <body lang="en">
- <a name="Token-Spacing"></a>
- <div class="header">
- <p>
- Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
- </div>
- <hr>
- <a name="Token-Spacing-1"></a>
- <h2 class="unnumbered">Token Spacing</h2>
- <a name="index-paste-avoidance"></a>
- <a name="index-spacing"></a>
- <a name="index-token-spacing"></a>
- <p>First, consider an issue that only concerns the stand-alone
- preprocessor: there needs to be a guarantee that re-reading its preprocessed
- output results in an identical token stream. Without taking special
- measures, this might not be the case because of macro substitution.
- For example:
- </p>
- <div class="smallexample">
- <pre class="smallexample">#define PLUS +
- #define EMPTY
- #define f(x) =x=
- +PLUS -EMPTY- PLUS+ f(=)
- → + + - - + + = = =
- <em>not</em>
- → ++ -- ++ ===
- </pre></div>
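- <p>To make the difference concrete, here is a minimal sketch in C of a
- lexer that applies the usual longest-match rule to just the three
- operator characters used above. The function name
- <code>count_tokens</code> and its tiny operator set are assumptions made
- for this illustration; cpplib's real lexer is far more general.
- </p>

```c
#include <assert.h>
#include <stddef.h>

/* Count the tokens in S, where the only tokens are "+", "-", "=" and
   their doubled forms "++", "--", "==".  Spaces separate tokens.
   Returns -1 if S contains any other character.  */
int count_tokens(const char *s)
{
    int n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        if (*s == ' ') {
            s++;
            continue;
        }
        if (*s != '+' && *s != '-' && *s != '=')
            return -1;
        /* Longest match: a doubled operator character is one token.  */
        s += (s[1] == s[0]) ? 2 : 1;
        n++;
    }
    return n;
}
```

- <p>With this sketch, the spaced output above is read back as nine
- tokens, while the unspaced form is read back as only five, so the two
- are not the same token stream.
- </p>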
- <p>One solution would be to simply insert a space between all adjacent
- tokens. However, we would like to keep space insertion to a minimum,
- both for aesthetic reasons and because it causes problems for people who
- still try to abuse the preprocessor for things like Fortran source and
- Makefiles.
- </p>
- <p>For now, just notice that when tokens are added to (or removed from, as
- shown by the <code>EMPTY</code> example) the original lexed token stream, we need
- to check for accidental token pasting. We call this <em>paste
- avoidance</em>. Token addition and removal can only occur because of macro
- expansion, but accidental pasting can occur in many places: both before
- and after each macro replacement, each argument replacement, and
- additionally each token created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators.
- </p>
- <p>First, look at how the preprocessor normally gets whitespace right in
- its output. The <code>cpp_token</code> structure contains a flags byte, and one
- of those flags is <code>PREV_WHITE</code>. The lexer sets this flag to
- indicate that the token was preceded by whitespace of some form other
- than a newline. The stand-alone preprocessor can use this flag to
- decide whether to insert a space between tokens in the output.
- </p>
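- <p>As a rough illustration of that decision, the following C sketch
- prints a token list, emitting one space before each token whose flag is
- set. The <code>struct tok</code> type, the <code>print_tokens</code>
- function, and the numeric value given to <code>PREV_WHITE</code> are
- simplified stand-ins invented for this example; they are not cpplib's
- definitions.
- </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative flag value; set when a token was preceded by
   whitespace other than a newline.  */
#define PREV_WHITE 0x01

/* Hypothetical, cut-down token: spelling plus a flags byte.  */
struct tok {
    const char *spelling;
    unsigned char flags;
};

/* Write the N tokens in TOKS to OUT, placing a single space before
   every token that has PREV_WHITE set.  Returns the number of
   characters written, or -1 if OUT is too small.  */
int print_tokens(const struct tok *toks, size_t n, char *out, size_t out_size)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    for (size_t i = 0; i < n; i++) {
        int space = (toks[i].flags & PREV_WHITE) != 0;
        size_t len = strlen(toks[i].spelling);

        if (used + (size_t)space + len + 1 > out_size)
            return -1;
        if (space)
            out[used++] = ' ';
        memcpy(out + used, toks[i].spelling, len);
        used += len;
    }
    out[used] = '\0';
    return (int)used;
}
```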
- <p>Now consider the result of the following macro expansion:
- </p>
- <div class="smallexample">
- <pre class="smallexample">#define add(x, y, z) x + y +z
- sum = add (1,2, 3);
- → sum = 1 + 2 +3;
- </pre></div>
- <p>The interesting thing here is that the tokens ‘<samp>1</samp>’ and ‘<samp>2</samp>’ are
- output with a preceding space, and ‘<samp>3</samp>’ is output without a
- preceding space, but when lexed none of these tokens had that property.
- Careful consideration reveals that ‘<samp>1</samp>’ gets its preceding
- whitespace from the space preceding ‘<samp>add</samp>’ in the macro invocation,
- <em>not</em> from the replacement list. ‘<samp>2</samp>’ gets its whitespace from the
- space preceding the parameter ‘<samp>y</samp>’ in the macro replacement list,
- and ‘<samp>3</samp>’ has no preceding space because parameter ‘<samp>z</samp>’ has none
- in the replacement list.
- </p>
- <p>Once lexed, tokens are effectively fixed and cannot be altered, since
- pointers to them might be held in many places, in particular by
- in-progress macro expansions. So instead of modifying the tokens
- above, the preprocessor inserts a special token, which I call a
- <em>padding token</em>, into the token stream to indicate that spacing of
- the subsequent token is special. The preprocessor inserts padding
- tokens in front of every macro expansion and expanded macro argument.
- These point to a <em>source token</em> from which the subsequent real token
- should inherit its spacing. In the above example, the source tokens are
- ‘<samp>add</samp>’ in the macro invocation, and ‘<samp>y</samp>’ and ‘<samp>z</samp>’ in the
- macro replacement list, respectively.
- </p>
- <p>It is quite easy to get multiple padding tokens in a row, for example if
- a macro’s first replacement token expands straight into another macro.
- </p>
- <div class="smallexample">
- <pre class="smallexample">#define foo bar
- #define bar baz
- [foo]
- → [baz]
- </pre></div>
- <p>Here, two padding tokens are generated, whose sources are the ‘<samp>foo</samp>’ token
- between the brackets and the ‘<samp>bar</samp>’ token from foo’s replacement
- list, respectively. Clearly the first padding token is the one to
- use, so the output code should contain a rule that the first
- padding token in a sequence is the one that matters.
- </p>
- <p>But what happens when we leave a macro expansion? Adjusting the above
- example slightly:
- </p>
- <div class="smallexample">
- <pre class="smallexample">#define foo bar
- #define bar EMPTY baz
- #define EMPTY
- [foo] EMPTY;
- → [ baz] ;
- </pre></div>
- <p>As shown, now there should be a space before ‘<samp>baz</samp>’ and the
- semicolon in the output.
- </p>
- <p>The rule we decided on above fails for ‘<samp>baz</samp>’: we generate three
- padding tokens, one per macro invocation, before the token ‘<samp>baz</samp>’.
- We would then have it take its spacing from the first of these, which
- carries source token ‘<samp>foo</samp>’ with no leading space.
- </p>
- <p>It is vital that cpplib get spacing correct in these examples since any
- of these macro expansions could be stringized, where spacing matters.
- </p>
- <p>So, this demonstrates that not just entering macro and argument
- expansions, but also leaving them, requires special handling. I made
- cpplib insert a padding token with a <code>NULL</code> source token when
- leaving macro expansions, as well as after each replaced argument in a
- macro’s replacement list. It also inserts appropriate padding tokens on
- either side of tokens created by the ‘<samp>#</samp>’ and ‘<samp>##</samp>’ operators.
- I expanded the rule so that, if we see a padding token with a
- <code>NULL</code> source token, <em>and</em> the source token remembered from
- the earlier padding has no leading
- space, then we behave as if we have seen no padding tokens at all. A
- quick check shows this rule will then get the above example correct as
- well.
- </p>
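- <p>The expanded rule can be sketched in C as follows. This is a
- simplified model, not cpplib's code: <code>struct tok</code>,
- <code>TOK_REAL</code>, <code>TOK_PADDING</code>, the value of
- <code>PREV_WHITE</code>, and <code>print_with_padding</code> are all names
- assumed for the illustration, and paste avoidance is deliberately left
- out. The sketch remembers the source of the first padding token in a
- run, forgets it again if a <code>NULL</code>-source padding arrives while
- the remembered source has no leading space, and lets a real token supply
- its own spacing when nothing is remembered.
- </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PREV_WHITE 0x01   /* illustrative flag value */

enum tok_type { TOK_REAL, TOK_PADDING };

/* Hypothetical token: real tokens have a spelling and flags,
   padding tokens have only a (possibly NULL) source token.  */
struct tok {
    enum tok_type type;
    const char *spelling;
    unsigned char flags;
    const struct tok *source;
};

/* Print TOKS into OUT, taking each real token's spacing from the
   padding tokens that precede it.  Returns the length written, or
   -1 if OUT is too small.  */
int print_with_padding(const struct tok *toks, size_t n,
                       char *out, size_t out_size)
{
    const struct tok *source = NULL;   /* remembered source token */
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    for (size_t i = 0; i < n; i++) {
        const struct tok *t = &toks[i];

        if (t->type == TOK_PADDING) {
            /* The first padding in a run wins, unless its source has
               no leading space and this padding's source is NULL, in
               which case we act as if no padding had been seen.  */
            if (source == NULL
                || (!(source->flags & PREV_WHITE) && t->source == NULL))
                source = t->source;
            continue;
        }

        /* A real token with nothing remembered uses its own flag.  */
        if (source == NULL)
            source = t;

        int space = (source->flags & PREV_WHITE) != 0;
        size_t len = strlen(t->spelling);
        if (used + (size_t)space + len + 1 > out_size)
            return -1;
        if (space)
            out[used++] = ' ';
        memcpy(out + used, t->spelling, len);
        used += len;
        source = NULL;                 /* start a fresh run */
    }
    out[used] = '\0';
    return (int)used;
}
```

- <p>Fed the ‘<samp>[foo] EMPTY;</samp>’ example as a hand-built stream (three
- padding tokens for entering ‘<samp>foo</samp>’, ‘<samp>bar</samp>’ and
- ‘<samp>EMPTY</samp>’, one <code>NULL</code> padding for leaving
- ‘<samp>EMPTY</samp>’, then ‘<samp>baz</samp>’, and so on), this sketch produces
- ‘<samp>[ baz] ;</samp>’.
- </p>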
- <p>Now a relationship with paste avoidance is apparent: we have to be
- careful about paste avoidance in exactly the same locations where we have
- padding tokens to get whitespace correct. This makes
- implementation of paste avoidance easy: wherever the stand-alone
- preprocessor is fixing up spacing because of padding tokens, and the
- padding rules say that no space is needed, it takes the extra step of
- checking whether a space is needed after all to avoid an accidental paste.
- The function <code>cpp_avoid_paste</code> advises whether a space is required
- between two consecutive tokens. To avoid excessive spacing, it tries
- hard to only require a space if one is likely to be necessary, but for
- reasons of efficiency it is slightly conservative and might recommend a
- space where one is not strictly needed.
- </p>
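- <p>The flavor of such a check can be sketched in C as below. This is
- not the implementation of <code>cpp_avoid_paste</code>: the function
- <code>needs_space</code>, its string-based interface, and the particular
- set of cases it covers are assumptions made for illustration. It looks
- only at the last character of the first spelling and the first character
- of the second, which keeps it cheap but makes it conservative.
- </p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return true if writing spelling A directly followed by spelling B
   might be read back as different tokens.  Conservative: it may say
   true when no space is strictly required.  */
bool needs_space(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (*a == '\0' || *b == '\0')
        return false;

    char last = a[strlen(a) - 1];
    char first = b[0];

    /* Identifiers and numbers run together: "x" "1" reads as "x1".  */
    if (is_ident_char(last) && is_ident_char(first))
        return true;
    /* A number may absorb a following '.', '+' or '-' (for example
       after an exponent); we do not inspect the number to find out.  */
    if (isdigit((unsigned char)a[0])
        && (first == '.' || first == '+' || first == '-'))
        return true;

    switch (last) {
    case '+': return first == '+' || first == '=';
    case '-': return first == '-' || first == '=' || first == '>';
    case '=': return first == '=';
    case '<': return first == '<' || first == '=';
    case '>': return first == '>' || first == '=';
    case '&': return first == '&' || first == '=';
    case '|': return first == '|' || first == '=';
    case '/': return first == '/' || first == '*' || first == '=';
    case '#': return first == '#';
    default:  return false;
    }
}
```

- <p>For the first example in this section, the sketch asks for a space
- between ‘<samp>+</samp>’ and ‘<samp>+</samp>’, between ‘<samp>-</samp>’ and
- ‘<samp>-</samp>’, and between ‘<samp>=</samp>’ and ‘<samp>=</samp>’. It also asks
- for one between ‘<samp>1</samp>’ and ‘<samp>+</samp>’, which is not strictly
- needed and shows the kind of conservatism described above.
- </p>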
- </body>
- </html>