<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Token Spacing (The GNU C Preprocessor Internals)</title>
<meta name="description" content="Token Spacing (The GNU C Preprocessor Internals)">
<meta name="keywords" content="Token Spacing (The GNU C Preprocessor Internals)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html#Top" rel="up" title="Top">
<link href="Line-Numbering.html#Line-Numbering" rel="next" title="Line Numbering">
<link href="Macro-Expansion.html#Macro-Expansion" rel="prev" title="Macro Expansion">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Token-Spacing"></a>
<div class="header">
<p>
Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Token-Spacing-1"></a>
<h2 class="unnumbered">Token Spacing</h2>
<a name="index-paste-avoidance"></a>
<a name="index-spacing"></a>
<a name="index-token-spacing"></a>
<p>First, consider an issue that concerns only the stand-alone
preprocessor: re-reading its preprocessed output must yield a token
stream identical to the one it produced. Without special measures,
macro substitution can break this guarantee.
For example:
</p>
<div class="smallexample">
<pre class="smallexample">#define PLUS +
#define EMPTY
#define f(x) =x=
+PLUS -EMPTY- PLUS+ f(=)
&rarr; + + - - + + = = =
<em>not</em>
&rarr; ++ -- ++ ===
</pre></div>
<p>One solution would be simply to insert a space between all adjacent
tokens. However, we would like to keep space insertion to a minimum,
both for aesthetic reasons and because extra spaces cause problems for
people who still try to abuse the preprocessor for things like Fortran
source and Makefiles.
</p>
<p>For now, just notice that when tokens are added to (or removed from,
as shown by the <code>EMPTY</code> example) the original lexed token
stream, we need to check for accidental token pasting. We call this
<em>paste avoidance</em>. Token addition and removal can only occur
because of macro expansion, but accidental pasting can occur in many
places: both before and after each macro replacement, each argument
replacement, and additionally each token created by the
&lsquo;<samp>#</samp>&rsquo; and &lsquo;<samp>##</samp>&rsquo; operators.
</p>
<p>First, look at how the preprocessor normally gets whitespace output
correct. The <code>cpp_token</code> structure contains a flags byte, and one
of those flags is <code>PREV_WHITE</code>. The lexer sets this flag to
indicate that the token was preceded by whitespace of some form other
than a new line. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.
</p>
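<p>As a minimal sketch of this idea, and not cpplib&rsquo;s actual code,
the following shows an output routine driven by such a flag. The
<code>struct tok</code> type, the flag&rsquo;s numeric value, and the
function <code>write_tokens</code> are hypothetical stand-ins; only the name
<code>PREV_WHITE</code> comes from the text above.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit; the value used by cpplib may differ. */
#define PREV_WHITE 0x01u

/* Simplified stand-in for cpp_token: a spelling plus a flags byte. */
struct tok {
    const char *spelling;
    unsigned char flags;
};

/* Write the tokens into out, emitting one space before every token
   whose PREV_WHITE flag is set.  Returns the number of characters
   written, not counting the terminating NUL. */
size_t write_tokens(const struct tok *toks, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        size_t sl = strlen(toks[i].spelling);
        int space = (toks[i].flags & PREV_WHITE) != 0;

        assert(len + sl + (size_t)space < cap);  /* caller supplies room */
        if (space)
            out[len++] = ' ';
        memcpy(out + len, toks[i].spelling, sl);
        len += sl;
    }
    out[len] = '\0';
    return len;
}
```

<p>Because the flag records only whether whitespace was present, any run of
blanks or comments in the input collapses to a single space in this sketch.
</p>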
<p>Now consider the result of the following macro expansion:
</p>
<div class="smallexample">
<pre class="smallexample">#define add(x, y, z) x + y +z;
sum = add (1,2, 3);
&rarr; sum = 1 + 2 +3;
</pre></div>
<p>The interesting thing here is that the tokens &lsquo;<samp>1</samp>&rsquo; and &lsquo;<samp>2</samp>&rsquo; are
output with a preceding space, and &lsquo;<samp>3</samp>&rsquo; is output without a
preceding space, yet none of these tokens had that spacing when lexed:
&lsquo;<samp>1</samp>&rsquo; and &lsquo;<samp>2</samp>&rsquo; had no preceding space, and &lsquo;<samp>3</samp>&rsquo; had one.
Careful consideration reveals that &lsquo;<samp>1</samp>&rsquo; gets its preceding
whitespace from the space preceding &lsquo;<samp>add</samp>&rsquo; in the macro invocation,
<em>not</em> from the replacement list. &lsquo;<samp>2</samp>&rsquo; gets its whitespace from the
space preceding the parameter &lsquo;<samp>y</samp>&rsquo; in the macro replacement list,
and &lsquo;<samp>3</samp>&rsquo; has no preceding space because parameter &lsquo;<samp>z</samp>&rsquo; has none
in the replacement list.
</p>
<p>Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the tokens
above, the preprocessor inserts a special token, which I call a
<em>padding token</em>, into the token stream to indicate that the spacing of
the subsequent token is special. The preprocessor inserts padding
tokens in front of every macro expansion and expanded macro argument.
Each points to a <em>source token</em> from which the subsequent real token
should inherit its spacing. In the above example, the source tokens are
&lsquo;<samp>add</samp>&rsquo; in the macro invocation, and &lsquo;<samp>y</samp>&rsquo; and &lsquo;<samp>z</samp>&rsquo; in the
macro replacement list, respectively.
</p>
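<p>The inheritance described above can be sketched as follows. This is an
illustration under stated assumptions rather than cpplib&rsquo;s data
layout: <code>struct tok</code>, <code>enum kind</code>, and
<code>leading_space</code> are hypothetical names, and the sketch considers
only a single padding token directly in front of a real token.
</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PREV_WHITE 0x01u   /* hypothetical flag bit */

enum kind { REAL, PADDING };

/* Simplified token: padding tokens carry no spelling, only a source. */
struct tok {
    enum kind kind;
    const char *spelling;       /* NULL for padding tokens */
    unsigned char flags;        /* PREV_WHITE as set by the lexer */
    const struct tok *source;   /* meaningful for padding tokens only */
};

/* Decide whether tok is printed with a leading space.  If a padding
   token with a source precedes it, the spacing is inherited from that
   source; otherwise the token's own lexed flag applies.  The lexed
   token itself is never modified. */
bool leading_space(const struct tok *padding, const struct tok *tok)
{
    assert(tok != NULL && tok->kind == REAL);
    if (padding != NULL && padding->source != NULL) {
        assert(padding->kind == PADDING);
        return (padding->source->flags & PREV_WHITE) != 0;
    }
    return (tok->flags & PREV_WHITE) != 0;
}
```

<p>Applied to the <code>add</code> example, a padding token whose source is
&lsquo;<samp>add</samp>&rsquo; gives &lsquo;<samp>1</samp>&rsquo; a leading space, while one whose
source is &lsquo;<samp>z</samp>&rsquo; removes the space that &lsquo;<samp>3</samp>&rsquo; had when lexed.
</p>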
<p>It is quite easy to get multiple padding tokens in a row, for example if
a macro&rsquo;s first replacement token expands straight into another macro.
</p>
<div class="smallexample">
<pre class="smallexample">#define foo bar
#define bar baz
[foo]
&rarr; [baz]
</pre></div>
<p>Here, two padding tokens are generated, whose sources are the &lsquo;<samp>foo</samp>&rsquo; token
between the brackets and the &lsquo;<samp>bar</samp>&rsquo; token from foo&rsquo;s replacement
list, respectively. Clearly the first padding token is the one to
use, so the output code should contain a rule that the first
padding token in a sequence is the one that matters.
</p>
<p>But what happens when a macro expansion is exited? Adjusting the above
example slightly:
</p>
<div class="smallexample">
<pre class="smallexample">#define foo bar
#define bar EMPTY baz
#define EMPTY
[foo] EMPTY;
&rarr; [ baz] ;
</pre></div>
<p>As shown, now there should be a space before &lsquo;<samp>baz</samp>&rsquo; and the
semicolon in the output.
</p>
<p>The rule we decided on above fails for &lsquo;<samp>baz</samp>&rsquo;: we generate three
padding tokens, one per macro invocation, before the token &lsquo;<samp>baz</samp>&rsquo;.
We would then have it take its spacing from the first of these, which
carries source token &lsquo;<samp>foo</samp>&rsquo; with no leading space.
</p>
<p>It is vital that cpplib get spacing correct in these examples since any
of these macro expansions could be stringized, where spacing matters.
</p>
<p>So, this demonstrates that not just entering macro and argument
expansions, but leaving them requires special handling too. I made
cpplib insert a padding token with a <code>NULL</code> source token when
leaving macro expansions, as well as after each replaced argument in a
macro&rsquo;s replacement list. It also inserts appropriate padding tokens on
either side of tokens created by the &lsquo;<samp>#</samp>&rsquo; and &lsquo;<samp>##</samp>&rsquo; operators.
I expanded the rule so that, if we see a padding token with a
<code>NULL</code> source token, <em>and</em> the source token chosen so far has
no leading space, then we behave as if we have seen no padding tokens at
all. A quick check shows this rule will then get the above example
correct as well.
</p>
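<p>The expanded rule can be checked with a small simulation. The sketch
below is an assumption-laden illustration, not the output loop of the
stand-alone preprocessor: <code>struct tok</code> and
<code>print_stream</code> are hypothetical, the padding tokens are placed by
hand where the text says cpplib would insert them, and paste avoidance is
left out entirely.
</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PREV_WHITE 0x01u   /* hypothetical flag bit */

enum kind { REAL, PADDING };

struct tok {
    enum kind kind;
    const char *spelling;       /* NULL for padding tokens */
    unsigned char flags;
    const struct tok *source;   /* padding only; may be NULL */
};

/* Print a token stream into out, resolving runs of padding tokens:
   the first padding in a run supplies the source token, but a padding
   with a NULL source cancels a chosen source that has no leading
   space, as if no padding had been seen. */
size_t print_stream(const struct tok *const *toks, size_t n,
                    char *out, size_t cap)
{
    const struct tok *source = NULL;   /* source for the next real token */
    size_t len = 0;

    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        const struct tok *t = toks[i];

        if (t->kind == PADDING) {
            if (source == NULL
                || (!(source->flags & PREV_WHITE) && t->source == NULL))
                source = t->source;
            continue;
        }

        /* With no source in effect, the token's own spacing applies. */
        const struct tok *from = source != NULL ? source : t;
        bool space = (from->flags & PREV_WHITE) != 0;
        size_t sl = strlen(t->spelling);

        assert(len + sl + (space ? 1u : 0u) < cap);
        if (space)
            out[len++] = ' ';
        memcpy(out + len, t->spelling, sl);
        len += sl;
        source = NULL;                 /* each run of padding starts afresh */
    }
    out[len] = '\0';
    return len;
}
```

<p>For &lsquo;<samp>[foo] EMPTY;</samp>&rsquo; the padding tokens before
&lsquo;<samp>baz</samp>&rsquo; have sources &lsquo;<samp>foo</samp>&rsquo;, &lsquo;<samp>bar</samp>&rsquo;,
&lsquo;<samp>EMPTY</samp>&rsquo; and <code>NULL</code>. The last of these cancels
&lsquo;<samp>foo</samp>&rsquo;, so &lsquo;<samp>baz</samp>&rsquo; keeps the space it was lexed with.
</p>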
<p>Now a relationship with paste avoidance is apparent: we have to be
careful about paste avoidance in exactly the same locations where we have
padding tokens in order to get whitespace correct. This makes
implementation of paste avoidance easy: wherever the stand-alone
preprocessor is fixing up spacing because of padding tokens, and it
turns out that no space is needed, it has to take the extra step to
check that a space is not needed after all to avoid an accidental paste.
The function <code>cpp_avoid_paste</code> advises whether a space is required
between two consecutive tokens. To avoid excessive spacing, it tries
hard to require a space only if one is likely to be necessary, but for
reasons of efficiency it is slightly conservative and might recommend a
space where one is not strictly needed.
</p>
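<p>To illustrate what such a conservative check might look like, here is a
rough sketch that works on token spellings. It is <em>not</em> the
implementation of <code>cpp_avoid_paste</code>; the names
<code>needs_space</code> and <code>in_set</code> are hypothetical, and the
check deliberately looks only at the last character of the first token and
the first character of the second.
</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if c is one of the characters in set. */
static bool in_set(unsigned char c, const char *set)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* Hypothetical, conservative check: returns true if writing spelling b
   directly after spelling a might lex differently from the two tokens
   written apart.  It may return true when no space is strictly needed. */
bool needs_space(const char *a, const char *b)
{
    assert(a != NULL && b != NULL && a[0] != '\0' && b[0] != '\0');
    unsigned char last = (unsigned char)a[strlen(a) - 1];
    unsigned char first = (unsigned char)b[0];
    bool last_word = isalnum(last) || last == '_';
    bool first_word = isalnum(first) || first == '_';

    if (last_word && first_word)
        return true;                    /* "x" "1" would become "x1" */
    if (last_word && in_set(first, "\"'"))
        return true;                    /* "L" before a literal */
    if ((last_word || last == '.') && first == '.')
        return true;                    /* could extend a number or "..." */
    if (last == '.' && isdigit(first))
        return true;                    /* "." "5" would become ".5" */
    if (in_set(last, "eEpP") && in_set(first, "+-"))
        return true;                    /* could extend "1e" to "1e+" */

    switch (last) {
    case '+': return in_set(first, "+=");
    case '-': return in_set(first, "-=>");
    case '<': return in_set(first, "<=:%");
    case '>': return in_set(first, ">=");
    case '&': return in_set(first, "&=");
    case '|': return in_set(first, "|=");
    case '/': return in_set(first, "/*=");
    case '%': return in_set(first, ":>=");
    case ':': return in_set(first, ":>");
    case '#': return first == '#';
    case '=': case '!': case '*': case '^':
        return first == '=';
    default:  return false;
    }
}
```

<p>The sketch trades precision for speed in the same spirit as the text: it
asks for a space between the identifier &lsquo;<samp>e</samp>&rsquo; and
&lsquo;<samp>+</samp>&rsquo; because &lsquo;<samp>e</samp>&rsquo; could be the tail of a number
such as &lsquo;<samp>1e</samp>&rsquo;, even though no paste would occur for an identifier.
</p>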
<hr>
<div class="header">
<p>
Next: <a href="Line-Numbering.html#Line-Numbering" accesskey="n" rel="next">Line Numbering</a>, Previous: <a href="Macro-Expansion.html#Macro-Expansion" accesskey="p" rel="prev">Macro Expansion</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>