<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Copyright (C) 1987-2020 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation. A copy of
the license is included in the
section entitled "GNU Free Documentation License".
This manual contains no Invariant Sections. The Front-Cover Texts are
(a) (see below), and the Back-Cover Texts are (b) (see below).
(a) The FSF's Front-Cover Text is:
A GNU Manual
(b) The FSF's Back-Cover Text is:
You have freedom to copy and modify this GNU Manual, like GNU
software. Copies published by the Free Software Foundation raise
funds for GNU development. -->
<!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Character sets (The C Preprocessor)</title>
<meta name="description" content="Character sets (The C Preprocessor)">
<meta name="keywords" content="Character sets (The C Preprocessor)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Overview.html#Overview" rel="up" title="Overview">
<link href="Initial-processing.html#Initial-processing" rel="next" title="Initial processing">
<link href="Overview.html#Overview" rel="prev" title="Overview">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Character-sets"></a>
<div class="header">
<p>
Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Character-sets-1"></a>
<h3 class="section">1.1 Character sets</h3>
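<p>Character set handling for source code in C and related languages is
complicated. The C standard discusses two character sets, source and
execution, but in practice at least four are involved: the character
set of the input files, the source character set, the execution
character set, and the character set used for wide string and
character constants.
</p>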
<p>The files input to CPP might be in any character set at all. CPP&rsquo;s
very first action, before it even looks for line boundaries, is to
convert the file into the character set it uses for internal
processing. That set is what the C standard calls the <em>source</em>
character set. It must be isomorphic with ISO 10646, also known as
Unicode. CPP uses the UTF-8 encoding of Unicode.
</p>
<p>The character set of the input files is specified with the
<samp>-finput-charset=</samp> option.
</p>
<p>All preprocessing work (the subject of the rest of this manual) is
carried out in the source character set. If you request textual
output from the preprocessor with the <samp>-E</samp> option, it will be
in UTF-8.
</p>
<p>After preprocessing is complete, string and character constants are
converted again, into the <em>execution</em> character set. This
character set is under control of the user, through the
<samp>-fexec-charset=</samp> option; the default is UTF-8,
matching the source character set. Wide string and character
constants have their own character set, which is not called out
specifically in the standard. It too is under control of the user,
through the <samp>-fwide-exec-charset=</samp> option.
The default is UTF-16 or UTF-32, whichever fits in the target&rsquo;s
<code>wchar_t</code> type, in the target machine&rsquo;s byte
order.<a name="DOCF1" href="#FOOT1"><sup>1</sup></a> Octal and hexadecimal escape sequences do not undergo
conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently
selected execution character set. All other escapes are replaced by
the character in the source character set that they represent, then
converted to the execution character set, just like unescaped
characters.
</p>
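<p>As a rough illustration of the conversion rules above, the following
sketch contrasts numeric escapes, which keep the value written, with a
<samp>\u</samp> escape, which is converted like any other character. The
function names are hypothetical, and the expected lengths assume the
execution character set is left at its UTF-8 default.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Numeric escapes are not converted: the constant has exactly the
   value that was written, whatever the execution character set is. */
int hex_escape_value(void)   { return '\x12'; }  /* hexadecimal 12 */
int octal_escape_value(void) { return '\101'; }  /* octal 101, i.e. 65 */

/* Length in bytes of a narrow string holding U+00E9.
   Assumption: the execution character set is UTF-8 (the default),
   in which U+00E9 is encoded as the two bytes 0xC3 0xA9. */
size_t narrow_length(void) { return strlen("\u00e9"); }

/* Length in wchar_t units of the same text as a wide string.
   U+00E9 occupies a single unit in both UTF-16 and UTF-32. */
size_t wide_length(void) { return wcslen(L"\u00e9"); }

/* Hypothetical helper that checks the expectations stated above. */
void check_escape_handling(void)
{
    assert(hex_escape_value() == 0x12);
    assert(octal_escape_value() == 65);
    assert(narrow_length() == 2);
    assert(wide_length() == 1);
}
```

<p>Choosing a different execution character set would be expected to
change <code>narrow_length</code>, but not the two numeric escape values.
</p>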
<p>In identifiers, characters outside the ASCII range can be specified
with the &lsquo;<samp>\u</samp>&rsquo; and &lsquo;<samp>\U</samp>&rsquo; escapes or used directly in the input
encoding. If strict ISO C90 conformance is specified with an option
such as <samp>-std=c90</samp>, or <samp>-fno-extended-identifiers</samp> is
used, then those constructs are not permitted in identifiers.
</p>
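<p>A minimal sketch of an identifier spelled with a <samp>\u</samp> escape
follows. The variable and function names are made up for illustration,
and the code assumes a C99 or later mode with extended identifiers
enabled, that is, neither <samp>-std=c90</samp> nor
<samp>-fno-extended-identifiers</samp> is in effect.
</p>

```c
#include <assert.h>

/* Assumption: compiled as C99 or later with extended identifiers
   enabled. The identifier below contains U+00E9, written with a
   \u escape rather than typed directly in the input encoding. */
static int caf\u00e9 = 3;

/* Reads the variable back through the same escaped spelling. */
int read_cafe(void) { return caf\u00e9; }
```

<p>Where the compiler accepts the character typed directly in the input
encoding, that spelling should refer to the same identifier.
</p>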
<div class="footnote">
<hr>
<h4 class="footnotes-heading">Footnotes</h4>
<h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
<p>UTF-16 does not meet the requirements of the C
standard for a wide character set, but the choice of 16-bit
<code>wchar_t</code> is enshrined in some system ABIs so we cannot fix
this.</p>
</div>
<hr>
<div class="header">
<p>
Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>