- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- <html>
- <!-- Copyright (C) 1987-2020 Free Software Foundation, Inc.
-
- Permission is granted to copy, distribute and/or modify this document
- under the terms of the GNU Free Documentation License, Version 1.3 or
- any later version published by the Free Software Foundation. A copy of
- the license is included in the
- section entitled "GNU Free Documentation License".
-
- This manual contains no Invariant Sections. The Front-Cover Texts are
- (a) (see below), and the Back-Cover Texts are (b) (see below).
-
- (a) The FSF's Front-Cover Text is:
-
- A GNU Manual
-
- (b) The FSF's Back-Cover Text is:
-
- You have freedom to copy and modify this GNU Manual, like GNU
- software. Copies published by the Free Software Foundation raise
- funds for GNU development. -->
- <!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <title>Character sets (The C Preprocessor)</title>
-
- <meta name="description" content="Character sets (The C Preprocessor)">
- <meta name="keywords" content="Character sets (The C Preprocessor)">
- <meta name="resource-type" content="document">
- <meta name="distribution" content="global">
- <meta name="Generator" content="makeinfo">
- <link href="index.html#Top" rel="start" title="Top">
- <link href="Index-of-Directives.html#Index-of-Directives" rel="index" title="Index of Directives">
- <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
- <link href="Overview.html#Overview" rel="up" title="Overview">
- <link href="Initial-processing.html#Initial-processing" rel="next" title="Initial processing">
- <link href="Overview.html#Overview" rel="prev" title="Overview">
- <style type="text/css">
- <!--
- a.summary-letter {text-decoration: none}
- blockquote.indentedblock {margin-right: 0em}
- blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
- blockquote.smallquotation {font-size: smaller}
- div.display {margin-left: 3.2em}
- div.example {margin-left: 3.2em}
- div.lisp {margin-left: 3.2em}
- div.smalldisplay {margin-left: 3.2em}
- div.smallexample {margin-left: 3.2em}
- div.smalllisp {margin-left: 3.2em}
- kbd {font-style: oblique}
- pre.display {font-family: inherit}
- pre.format {font-family: inherit}
- pre.menu-comment {font-family: serif}
- pre.menu-preformatted {font-family: serif}
- pre.smalldisplay {font-family: inherit; font-size: smaller}
- pre.smallexample {font-size: smaller}
- pre.smallformat {font-family: inherit; font-size: smaller}
- pre.smalllisp {font-size: smaller}
- span.nolinebreak {white-space: nowrap}
- span.roman {font-family: initial; font-weight: normal}
- span.sansserif {font-family: sans-serif; font-weight: normal}
- ul.no-bullet {list-style: none}
- -->
- </style>
-
-
- </head>
-
- <body lang="en">
- <a name="Character-sets"></a>
- <div class="header">
- <p>
- Next: <a href="Initial-processing.html#Initial-processing" accesskey="n" rel="next">Initial processing</a>, Up: <a href="Overview.html#Overview" accesskey="u" rel="up">Overview</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index-of-Directives.html#Index-of-Directives" title="Index" rel="index">Index</a>]</p>
- </div>
- <hr>
- <a name="Character-sets-1"></a>
- <h3 class="section">1.1 Character sets</h3>
-
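- <p>Source code character set processing in C and related languages is
- rather complicated. The C standard discusses two character sets, but
- there are really at least four: the character set of the input files,
- the source character set, the execution character set, and the
- character set used for wide string and character constants.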
- </p>
- <p>The files input to CPP might be in any character set at all. CPP’s
- very first action, before it even looks for line boundaries, is to
- convert the file into the character set it uses for internal
- processing. That set is what the C standard calls the <em>source</em>
- character set. It must be isomorphic with ISO 10646, also known as
- Unicode. CPP uses the UTF-8 encoding of Unicode.
- </p>
- <p>The character sets of the input files are specified using the
- <samp>-finput-charset=</samp> option.
- </p>
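- <p>To make the initial conversion step concrete, here is a minimal
- sketch of what converting ISO-8859-1 input to UTF-8 involves. It is
- not CPP’s actual implementation; the names <code>latin1_to_utf8</code>
- and <code>latin1_demo</code> are hypothetical and serve only as
- illustration. Conceptually, it mirrors what would happen to a file
- compiled with an option such as
- <samp>-finput-charset=ISO-8859-1</samp>, assuming that charset name is
- accepted on the host.
- </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert n bytes of ISO-8859-1 text to UTF-8.
   `out` must have room for 2 * n + 1 bytes.
   Returns the number of bytes written, not counting the final NUL.
   Illustrative only: this is not how CPP is implemented. */
size_t latin1_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            /* ASCII bytes are identical in both encodings. */
            out[j++] = b;
        } else {
            /* U+0080..U+00FF take two bytes in UTF-8. */
            out[j++] = (unsigned char)(0xC0 | (b >> 6));
            out[j++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    out[j] = '\0';
    return j;
}

/* "caf" followed by byte 0xE9 (e-acute in ISO-8859-1) becomes
   "caf" followed by the two bytes 0xC3 0xA9 in UTF-8. */
void latin1_demo(void)
{
    const unsigned char in[] = { 'c', 'a', 'f', 0xE9 };
    unsigned char out[2 * sizeof in + 1];
    size_t len = latin1_to_utf8(in, sizeof in, out);
    assert(len == 5);
    assert(memcmp(out, "caf\xC3\xA9", 5) == 0);
}
```

- <p>After this step, every later stage sees only UTF-8, whatever the
- encoding of the original file was.
- </p>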
- <p>All preprocessing work (the subject of the rest of this manual) is
- carried out in the source character set. If you request textual
- output from the preprocessor with the <samp>-E</samp> option, it will be
- in UTF-8.
- </p>
- <p>After preprocessing is complete, string and character constants are
- converted again, into the <em>execution</em> character set. This
- character set is under control of the user, through the
- <samp>-fexec-charset=</samp> option; the default is UTF-8,
- matching the source character set. Wide string and character
- constants have their own character set, which is not called out
- specifically in the standard. It too is under control of the user,
- through the <samp>-fwide-exec-charset=</samp> option.
- The default is UTF-16 or UTF-32, whichever fits in the target’s
- <code>wchar_t</code> type, in the target machine’s byte
- order.<a name="DOCF1" href="#FOOT1"><sup>1</sup></a> Octal and hexadecimal escape sequences do not undergo
- conversion; <tt>'\x12'</tt> has the value 0x12 regardless of the currently
- selected execution character set. All other escapes are replaced by
- the character in the source character set that they represent, then
- converted to the execution character set, just like unescaped
- characters.
- </p>
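- <p>The difference between numeric escapes and other escapes can be
- sketched as follows. The function name <code>escape_demo</code> is
- hypothetical, and the checks on the narrow string assume the default
- UTF-8 execution character set; with a different
- <samp>-fexec-charset=</samp> setting, only the hexadecimal and octal
- checks are expected to hold unchanged.
- </p>

```c
#include <assert.h>
#include <string.h>

void escape_demo(void)
{
    /* Hexadecimal and octal escapes are never converted: they give
       the numeric value directly. */
    assert('\x12' == 0x12);
    assert('\22' == 0x12);      /* octal 22 is 0x12 */

    /* Assumption: the execution character set is UTF-8 (the default).
       The escape \u00e9 names U+00E9, which is then converted to the
       two UTF-8 bytes C3 A9. */
    const char s[] = "\u00e9";
    assert(sizeof s == 3);      /* two bytes plus the terminating NUL */
    assert(memcmp(s, "\xC3\xA9", 3) == 0);

    /* Under the same assumption, the newline escape converts to 0x0A. */
    assert('\n' == 0x0A);

    /* In a wide constant the same character is a single wchar_t whose
       value is the code point, for both UTF-16 and UTF-32. */
    assert(L'\u00e9' == 0xE9);
}
```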
- <p>In identifiers, characters outside the ASCII range can be specified
- with the ‘<samp>\u</samp>’ and ‘<samp>\U</samp>’ escapes or used directly in the input
- encoding. If strict ISO C90 conformance is specified with an option
- such as <samp>-std=c90</samp>, or <samp>-fno-extended-identifiers</samp> is
- used, then those constructs are not permitted in identifiers.
- </p>
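- <p>As a small sketch, the following function uses an identifier
- spelled with a ‘<samp>\u</samp>’ escape. The function and variable names
- are hypothetical. It assumes a language mode that permits extended
- identifiers; under <samp>-std=c90</samp> or
- <samp>-fno-extended-identifiers</samp> it is expected to be rejected.
- </p>

```c
#include <assert.h>

int extended_identifier_demo(void)
{
    /* The identifier below contains U+00E9, written as \u00e9.
       Writing the character directly in the input encoding names
       the same identifier. */
    int caf\u00e9 = 41;
    caf\u00e9 += 1;
    return caf\u00e9;
}

void extended_identifier_check(void)
{
    assert(extended_identifier_demo() == 42);
}
```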
- <div class="footnote">
- <hr>
- <h4 class="footnotes-heading">Footnotes</h4>
-
- <h3><a name="FOOT1" href="#DOCF1">(1)</a></h3>
- <p>UTF-16 does not meet the requirements of the C
- standard for a wide character set, because characters outside the
- Basic Multilingual Plane need two 16-bit units, but the choice of 16-bit
- <code>wchar_t</code> is enshrined in some system ABIs so we cannot fix
- this.</p>
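- <p>The effect can be observed with a character outside the Basic
- Multilingual Plane. This is a minimal sketch; the function names are
- hypothetical, and it assumes the default wide execution character set
- (UTF-32 where <code>wchar_t</code> is at least 32 bits wide, UTF-16
- otherwise).
- </p>

```c
#include <assert.h>
#include <stddef.h>

/* Number of wchar_t units used to store U+1F600, a character
   outside the Basic Multilingual Plane. */
size_t wide_units_for_u1f600(void)
{
    static const wchar_t s[] = L"\U0001F600";
    return sizeof s / sizeof s[0] - 1;   /* exclude the terminator */
}

void wide_demo(void)
{
    size_t n = wide_units_for_u1f600();
    if (sizeof(wchar_t) >= 4)
        assert(n == 1);   /* UTF-32: one unit per character */
    else
        assert(n == 2);   /* UTF-16: a surrogate pair */
}
```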
- </div>
-
-
-
- </body>
- </html>