- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- <html>
- <!-- Copyright (C) 1988-2020 Free Software Foundation, Inc.
-
- Permission is granted to copy, distribute and/or modify this document
- under the terms of the GNU Free Documentation License, Version 1.3 or
- any later version published by the Free Software Foundation; with the
- Invariant Sections being "Funding Free Software", the Front-Cover
- Texts being (a) (see below), and with the Back-Cover Texts being (b)
- (see below). A copy of the license is included in the section entitled
- "GNU Free Documentation License".
-
- (a) The FSF's Front-Cover Text is:
-
- A GNU Manual
-
- (b) The FSF's Back-Cover Text is:
-
- You have freedom to copy and modify this GNU Manual, like GNU
- software. Copies published by the Free Software Foundation raise
- funds for GNU development. -->
- <!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <title>Optimize Options (Using the GNU Compiler Collection (GCC))</title>
-
- <meta name="description" content="Optimize Options (Using the GNU Compiler Collection (GCC))">
- <meta name="keywords" content="Optimize Options (Using the GNU Compiler Collection (GCC))">
- <meta name="resource-type" content="document">
- <meta name="distribution" content="global">
- <meta name="Generator" content="makeinfo">
- <link href="index.html#Top" rel="start" title="Top">
- <link href="Option-Index.html#Option-Index" rel="index" title="Option Index">
- <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
- <link href="Invoking-GCC.html#Invoking-GCC" rel="up" title="Invoking GCC">
- <link href="Instrumentation-Options.html#Instrumentation-Options" rel="next" title="Instrumentation Options">
- <link href="Debugging-Options.html#Debugging-Options" rel="prev" title="Debugging Options">
- <style type="text/css">
- <!--
- a.summary-letter {text-decoration: none}
- blockquote.indentedblock {margin-right: 0em}
- blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
- blockquote.smallquotation {font-size: smaller}
- div.display {margin-left: 3.2em}
- div.example {margin-left: 3.2em}
- div.lisp {margin-left: 3.2em}
- div.smalldisplay {margin-left: 3.2em}
- div.smallexample {margin-left: 3.2em}
- div.smalllisp {margin-left: 3.2em}
- kbd {font-style: oblique}
- pre.display {font-family: inherit}
- pre.format {font-family: inherit}
- pre.menu-comment {font-family: serif}
- pre.menu-preformatted {font-family: serif}
- pre.smalldisplay {font-family: inherit; font-size: smaller}
- pre.smallexample {font-size: smaller}
- pre.smallformat {font-family: inherit; font-size: smaller}
- pre.smalllisp {font-size: smaller}
- span.nolinebreak {white-space: nowrap}
- span.roman {font-family: initial; font-weight: normal}
- span.sansserif {font-family: sans-serif; font-weight: normal}
- ul.no-bullet {list-style: none}
- -->
- </style>
-
-
- </head>
-
- <body lang="en">
- <a name="Optimize-Options"></a>
- <div class="header">
- <p>
- Next: <a href="Instrumentation-Options.html#Instrumentation-Options" accesskey="n" rel="next">Instrumentation Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="prev">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p>
- </div>
- <hr>
- <a name="Options-That-Control-Optimization"></a>
- <h3 class="section">3.11 Options That Control Optimization</h3>
- <a name="index-optimize-options"></a>
- <a name="index-options_002c-optimization"></a>
-
- <p>These options control various sorts of optimizations.
- </p>
- <p>Without any optimization option, the compiler’s goal is to reduce the
- cost of compilation and to make debugging produce the expected
- results. Statements are independent: if you stop the program with a
- breakpoint between statements, you can then assign a new value to any
- variable or change the program counter to any other statement in the
- function and get exactly the results you expect from the source
- code.
- </p>
- <p>Turning on optimization flags makes the compiler attempt to improve
- the performance and/or code size at the expense of compilation time
- and possibly the ability to debug the program.
- </p>
- <p>The compiler performs optimization based on the knowledge it has of the
- program. Compiling multiple files at once to a single output file allows
- the compiler to use information gained from all of the files when compiling
- each of them.
- </p>
- <p>Not all optimizations are controlled directly by a flag. Only
- optimizations that have a flag are listed in this section.
- </p>
- <p>Most optimizations are completely disabled at <samp>-O0</samp> or if an
- <samp>-O</samp> level is not set on the command line, even if individual
- optimization flags are specified. Similarly, <samp>-Og</samp> suppresses
- many optimization passes.
- </p>
- <p>Depending on the target and how GCC was configured, a slightly different
- set of optimizations may be enabled at each <samp>-O</samp> level than
- those listed here. You can invoke GCC with <samp>-Q --help=optimizers</samp>
- to find out the exact set of optimizations that are enabled at each level.
- See <a href="Overall-Options.html#Overall-Options">Overall Options</a>, for examples.
- </p>
- <dl compact="compact">
- <dt><code>-O</code></dt>
- <dt><code>-O1</code></dt>
- <dd><a name="index-O"></a>
- <a name="index-O1"></a>
- <p>Optimize. Optimizing compilation takes somewhat more time, and a lot
- more memory for a large function.
- </p>
- <p>With <samp>-O</samp>, the compiler tries to reduce code size and execution
- time, without performing any optimizations that take a great deal of
- compilation time.
- </p>
-
- <p><samp>-O</samp> turns on the following optimization flags:
- </p>
- <div class="smallexample">
- <pre class="smallexample">-fauto-inc-dec
- -fbranch-count-reg
- -fcombine-stack-adjustments
- -fcompare-elim
- -fcprop-registers
- -fdce
- -fdefer-pop
- -fdelayed-branch
- -fdse
- -fforward-propagate
- -fguess-branch-probability
- -fif-conversion
- -fif-conversion2
- -finline-functions-called-once
- -fipa-profile
- -fipa-pure-const
- -fipa-reference
- -fipa-reference-addressable
- -fmerge-constants
- -fmove-loop-invariants
- -fomit-frame-pointer
- -freorder-blocks
- -fshrink-wrap
- -fshrink-wrap-separate
- -fsplit-wide-types
- -fssa-backprop
- -fssa-phiopt
- -ftree-bit-ccp
- -ftree-ccp
- -ftree-ch
- -ftree-coalesce-vars
- -ftree-copy-prop
- -ftree-dce
- -ftree-dominator-opts
- -ftree-dse
- -ftree-forwprop
- -ftree-fre
- -ftree-phiprop
- -ftree-pta
- -ftree-scev-cprop
- -ftree-sink
- -ftree-slsr
- -ftree-sra
- -ftree-ter
- -funit-at-a-time
- </pre></div>
-
- </dd>
- <dt><code>-O2</code></dt>
- <dd><a name="index-O2"></a>
- <p>Optimize even more. GCC performs nearly all supported optimizations
- that do not involve a space-speed tradeoff.
- As compared to <samp>-O</samp>, this option increases both compilation time
- and the performance of the generated code.
- </p>
- <p><samp>-O2</samp> turns on all optimization flags specified by <samp>-O</samp>. It
- also turns on the following optimization flags:
- </p>
- <div class="smallexample">
- <pre class="smallexample">-falign-functions -falign-jumps
- -falign-labels -falign-loops
- -fcaller-saves
- -fcode-hoisting
- -fcrossjumping
- -fcse-follow-jumps -fcse-skip-blocks
- -fdelete-null-pointer-checks
- -fdevirtualize -fdevirtualize-speculatively
- -fexpensive-optimizations
- -ffinite-loops
- -fgcse -fgcse-lm
- -fhoist-adjacent-loads
- -finline-functions
- -finline-small-functions
- -findirect-inlining
- -fipa-bit-cp -fipa-cp -fipa-icf
- -fipa-ra -fipa-sra -fipa-vrp
- -fisolate-erroneous-paths-dereference
- -flra-remat
- -foptimize-sibling-calls
- -foptimize-strlen
- -fpartial-inlining
- -fpeephole2
- -freorder-blocks-algorithm=stc
- -freorder-blocks-and-partition -freorder-functions
- -frerun-cse-after-loop
- -fschedule-insns -fschedule-insns2
- -fsched-interblock -fsched-spec
- -fstore-merging
- -fstrict-aliasing
- -fthread-jumps
- -ftree-builtin-call-dce
- -ftree-pre
- -ftree-switch-conversion -ftree-tail-merge
- -ftree-vrp
- </pre></div>
-
- <p>Please note the warning under <samp>-fgcse</samp> about
- invoking <samp>-O2</samp> on programs that use computed gotos.
- </p>
- </dd>
- <dt><code>-O3</code></dt>
- <dd><a name="index-O3"></a>
- <p>Optimize yet more. <samp>-O3</samp> turns on all optimizations specified
- by <samp>-O2</samp> and also turns on the following optimization flags:
- </p>
- <div class="smallexample">
- <pre class="smallexample">-fgcse-after-reload
- -fipa-cp-clone
- -floop-interchange
- -floop-unroll-and-jam
- -fpeel-loops
- -fpredictive-commoning
- -fsplit-loops
- -fsplit-paths
- -ftree-loop-distribution
- -ftree-loop-vectorize
- -ftree-partial-pre
- -ftree-slp-vectorize
- -funswitch-loops
- -fvect-cost-model
- -fvect-cost-model=dynamic
- -fversion-loops-for-strides
- </pre></div>
-
- </dd>
- <dt><code>-O0</code></dt>
- <dd><a name="index-O0"></a>
- <p>Reduce compilation time and make debugging produce the expected
- results. This is the default.
- </p>
- </dd>
- <dt><code>-Os</code></dt>
- <dd><a name="index-Os"></a>
- <p>Optimize for size. <samp>-Os</samp> enables all <samp>-O2</samp> optimizations
- except those that often increase code size:
- </p>
- <div class="smallexample">
- <pre class="smallexample">-falign-functions -falign-jumps
- -falign-labels -falign-loops
- -fprefetch-loop-arrays -freorder-blocks-algorithm=stc
- </pre></div>
-
- <p>It also enables <samp>-finline-functions</samp>, causes the compiler to tune for
- code size rather than execution speed, and performs further optimizations
- designed to reduce code size.
- </p>
- </dd>
- <dt><code>-Ofast</code></dt>
- <dd><a name="index-Ofast"></a>
- <p>Disregard strict standards compliance. <samp>-Ofast</samp> enables all
- <samp>-O3</samp> optimizations. It also enables optimizations that are not
- valid for all standard-compliant programs.
- It turns on <samp>-ffast-math</samp>, <samp>-fallow-store-data-races</samp>
- and the Fortran-specific <samp>-fstack-arrays</samp>, unless
- <samp>-fmax-stack-var-size</samp> is specified, and <samp>-fno-protect-parens</samp>.
- </p>
- </dd>
- <dt><code>-Og</code></dt>
- <dd><a name="index-Og"></a>
- <p>Optimize debugging experience. <samp>-Og</samp> should be the optimization
- level of choice for the standard edit-compile-debug cycle, offering
- a reasonable level of optimization while maintaining fast compilation
- and a good debugging experience. It is a better choice than <samp>-O0</samp>
- for producing debuggable code because some compiler passes
- that collect debug information are disabled at <samp>-O0</samp>.
- </p>
- <p>Like <samp>-O0</samp>, <samp>-Og</samp> completely disables a number of
- optimization passes so that individual options controlling them have
- no effect. Otherwise <samp>-Og</samp> enables all <samp>-O1</samp>
- optimization flags except for those that may interfere with debugging:
- </p>
- <div class="smallexample">
- <pre class="smallexample">-fbranch-count-reg -fdelayed-branch
- -fdse -fif-conversion -fif-conversion2
- -finline-functions-called-once
- -fmove-loop-invariants -fssa-phiopt
- -ftree-bit-ccp -ftree-dse -ftree-pta -ftree-sra
- </pre></div>
-
- </dd>
- </dl>
-
- <p>If you use multiple <samp>-O</samp> options, with or without level numbers,
- the last such option is the one that is effective.
- </p>
- <p>Options of the form <samp>-f<var>flag</var></samp> specify machine-independent
- flags. Most flags have both positive and negative forms; the negative
- form of <samp>-ffoo</samp> is <samp>-fno-foo</samp>. In the table
- below, only one of the forms is listed—the one you typically
- use. You can figure out the other form by either removing ‘<samp>no-</samp>’
- or adding it.
- </p>
- <p>The following options control specific optimizations. They are either
- activated by <samp>-O</samp> options or are related to ones that are. You
- can use the following flags in the rare cases when “fine-tuning” of
- optimizations to be performed is desired.
- </p>
- <dl compact="compact">
- <dt><code>-fno-defer-pop</code></dt>
- <dd><a name="index-fno_002ddefer_002dpop"></a>
- <a name="index-fdefer_002dpop"></a>
- <p>For machines that must pop arguments after a function call, always pop
- the arguments as soon as each function returns.
- At levels <samp>-O1</samp> and higher, <samp>-fdefer-pop</samp> is the default;
- this allows the compiler to let arguments accumulate on the stack for several
- function calls and pop them all at once.
- </p>
- </dd>
- <dt><code>-fforward-propagate</code></dt>
- <dd><a name="index-fforward_002dpropagate"></a>
- <p>Perform a forward propagation pass on RTL. The pass tries to combine two
- instructions and checks if the result can be simplified. If loop unrolling
- is active, two passes are performed and the second is scheduled after
- loop unrolling.
- </p>
- <p>This option is enabled by default at optimization levels <samp>-O</samp>,
- <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-ffp-contract=<var>style</var></code></dt>
- <dd><a name="index-ffp_002dcontract"></a>
- <p><samp>-ffp-contract=off</samp> disables floating-point expression contraction.
- <samp>-ffp-contract=fast</samp> enables floating-point expression contraction
- such as forming of fused multiply-add operations if the target has
- native support for them.
- <samp>-ffp-contract=on</samp> enables floating-point expression contraction
- if allowed by the language standard. This is currently not implemented
- and treated equal to <samp>-ffp-contract=off</samp>.
- </p>
- <p>The default is <samp>-ffp-contract=fast</samp>.
- </p>
- </dd>
- <dt><code>-fomit-frame-pointer</code></dt>
- <dd><a name="index-fomit_002dframe_002dpointer"></a>
- <p>Omit the frame pointer in functions that don’t need one. This avoids the
- instructions to save, set up and restore the frame pointer; on many targets
- it also makes an extra register available.
- </p>
- <p>On some targets this flag has no effect because the standard calling sequence
- always uses a frame pointer, so it cannot be omitted.
- </p>
- <p>Note that <samp>-fno-omit-frame-pointer</samp> doesn’t guarantee the frame pointer
- is used in all functions. Several targets always omit the frame pointer in
- leaf functions.
- </p>
- <p>Enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-foptimize-sibling-calls</code></dt>
- <dd><a name="index-foptimize_002dsibling_002dcalls"></a>
- <p>Optimize sibling and tail recursive calls.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-foptimize-strlen</code></dt>
- <dd><a name="index-foptimize_002dstrlen"></a>
- <p>Optimize various standard C string functions (e.g. <code>strlen</code>,
- <code>strchr</code> or <code>strcpy</code>) and
- their <code>_FORTIFY_SOURCE</code> counterparts into faster alternatives.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-fno-inline</code></dt>
- <dd><a name="index-fno_002dinline"></a>
- <a name="index-finline"></a>
- <p>Do not expand any functions inline apart from those marked with
- the <code>always_inline</code> attribute. This is the default when not
- optimizing.
- </p>
- <p>Single functions can be exempted from inlining by marking them
- with the <code>noinline</code> attribute.
- </p>
- </dd>
- <dt><code>-finline-small-functions</code></dt>
- <dd><a name="index-finline_002dsmall_002dfunctions"></a>
- <p>Integrate functions into their callers when their body is smaller than expected
- function call code (so overall size of program gets smaller). The compiler
- heuristically decides which functions are simple enough to be worth integrating
- in this way. This inlining applies to all functions, even those not declared
- inline.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-findirect-inlining</code></dt>
- <dd><a name="index-findirect_002dinlining"></a>
- <p>Inline also indirect calls that are discovered to be known at compile
- time thanks to previous inlining. This option has an effect only
- when inlining itself is turned on by the <samp>-finline-functions</samp>
- or <samp>-finline-small-functions</samp> options.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-finline-functions</code></dt>
- <dd><a name="index-finline_002dfunctions"></a>
- <p>Consider all functions for inlining, even if they are not declared inline.
- The compiler heuristically decides which functions are worth integrating
- in this way.
- </p>
- <p>If all calls to a given function are integrated, and the function is
- declared <code>static</code>, then the function is normally not output as
- assembler code in its own right.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>. Also enabled
- by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-finline-functions-called-once</code></dt>
- <dd><a name="index-finline_002dfunctions_002dcalled_002donce"></a>
- <p>Consider all <code>static</code> functions called once for inlining into their
- caller even if they are not marked <code>inline</code>. If a call to a given
- function is integrated, then the function is not output as assembler code
- in its own right.
- </p>
- <p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>,
- but not <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-fearly-inlining</code></dt>
- <dd><a name="index-fearly_002dinlining"></a>
- <p>Inline functions marked by <code>always_inline</code> and functions whose body seems
- smaller than the function call overhead, early, before doing
- <samp>-fprofile-generate</samp> instrumentation and the real inlining pass. Doing so
- makes profiling significantly cheaper, and usually makes inlining faster on programs
- with large chains of nested wrapper functions.
- </p>
- <p>Enabled by default.
- </p>
- </dd>
- <dt><code>-fipa-sra</code></dt>
- <dd><a name="index-fipa_002dsra"></a>
- <p>Perform interprocedural scalar replacement of aggregates, removal of
- unused parameters and replacement of parameters passed by reference
- by parameters passed by value.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-finline-limit=<var>n</var></code></dt>
- <dd><a name="index-finline_002dlimit"></a>
- <p>By default, GCC limits the size of functions that can be inlined. This flag
- allows coarse control of this limit. <var>n</var> is the size of functions that
- can be inlined in number of pseudo instructions.
- </p>
- <p>Inlining is actually controlled by a number of parameters, which may be
- specified individually by using <samp>--param <var>name</var>=<var>value</var></samp>.
- The <samp>-finline-limit=<var>n</var></samp> option sets some of these parameters
- as follows:
- </p>
- <dl compact="compact">
- <dt><code>max-inline-insns-single</code></dt>
- <dd><p>is set to <var>n</var>/2.
- </p></dd>
- <dt><code>max-inline-insns-auto</code></dt>
- <dd><p>is set to <var>n</var>/2.
- </p></dd>
- </dl>
-
- <p>See below for documentation of the individual
- parameters controlling inlining and for the defaults of these parameters.
- </p>
- <p><em>Note:</em> there may be no value to <samp>-finline-limit</samp> that results
- in default behavior.
- </p>
- <p><em>Note:</em> a pseudo instruction represents, in this particular context, an
- abstract measurement of a function’s size. In no way does it represent a count
- of assembly instructions, and as such its exact meaning might change from one
- release to another.
- </p>
- </dd>
- <dt><code>-fno-keep-inline-dllexport</code></dt>
- <dd><a name="index-fno_002dkeep_002dinline_002ddllexport"></a>
- <a name="index-fkeep_002dinline_002ddllexport"></a>
- <p>This is a more fine-grained version of <samp>-fkeep-inline-functions</samp>,
- which applies only to functions that are declared using the <code>dllexport</code>
- attribute or declspec. See <a href="Function-Attributes.html#Function-Attributes">Declaring Attributes of
- Functions</a>.
- </p>
- </dd>
- <dt><code>-fkeep-inline-functions</code></dt>
- <dd><a name="index-fkeep_002dinline_002dfunctions"></a>
- <p>In C, emit <code>static</code> functions that are declared <code>inline</code>
- into the object file, even if the function has been inlined into all
- of its callers. This switch does not affect functions using the
- <code>extern inline</code> extension in GNU C90. In C++, emit any and all
- inline functions into the object file.
- </p>
- </dd>
- <dt><code>-fkeep-static-functions</code></dt>
- <dd><a name="index-fkeep_002dstatic_002dfunctions"></a>
- <p>Emit <code>static</code> functions into the object file, even if the function
- is never used.
- </p>
- </dd>
- <dt><code>-fkeep-static-consts</code></dt>
- <dd><a name="index-fkeep_002dstatic_002dconsts"></a>
- <p>Emit variables declared <code>static const</code> when optimization isn’t turned
- on, even if the variables aren’t referenced.
- </p>
- <p>GCC enables this option by default. If you want to force the compiler to
- check if a variable is referenced, regardless of whether or not
- optimization is turned on, use the <samp>-fno-keep-static-consts</samp> option.
- </p>
- </dd>
- <dt><code>-fmerge-constants</code></dt>
- <dd><a name="index-fmerge_002dconstants"></a>
- <p>Attempt to merge identical constants (string constants and floating-point
- constants) across compilation units.
- </p>
- <p>This option is the default for optimized compilation if the assembler and
- linker support it. Use <samp>-fno-merge-constants</samp> to inhibit this
- behavior.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fmerge-all-constants</code></dt>
- <dd><a name="index-fmerge_002dall_002dconstants"></a>
- <p>Attempt to merge identical constants and identical variables.
- </p>
- <p>This option implies <samp>-fmerge-constants</samp>. In addition to
- <samp>-fmerge-constants</samp>, this also considers, for example, constant-initialized
- arrays and initialized constant variables with integral or floating-point
- types. Languages like C or C++ require each variable, including multiple
- instances of the same variable in recursive calls, to have distinct locations,
- so using this option results in non-conforming
- behavior.
- </p>
- </dd>
- <dt><code>-fmodulo-sched</code></dt>
- <dd><a name="index-fmodulo_002dsched"></a>
- <p>Perform swing modulo scheduling immediately before the first scheduling
- pass. This pass looks at innermost loops and reorders their
- instructions by overlapping different iterations.
- </p>
- </dd>
- <dt><code>-fmodulo-sched-allow-regmoves</code></dt>
- <dd><a name="index-fmodulo_002dsched_002dallow_002dregmoves"></a>
- <p>Perform more aggressive SMS-based modulo scheduling with register moves
- allowed. By setting this flag, certain anti-dependence edges are
- deleted, which triggers the generation of register moves based on
- live-range analysis. This option is effective only with
- <samp>-fmodulo-sched</samp> enabled.
- </p>
- </dd>
- <dt><code>-fno-branch-count-reg</code></dt>
- <dd><a name="index-fno_002dbranch_002dcount_002dreg"></a>
- <a name="index-fbranch_002dcount_002dreg"></a>
- <p>Disable the optimization pass that scans for opportunities to use
- “decrement and branch” instructions on a count register instead of
- instruction sequences that decrement a register, compare it against zero, and
- then branch based upon the result. This option is only meaningful on
- architectures that support such instructions, which include x86, PowerPC,
- IA-64 and S/390. Note that the <samp>-fno-branch-count-reg</samp> option
- doesn’t remove the decrement and branch instructions from the generated
- instruction stream introduced by other optimization passes.
- </p>
- <p>The default is <samp>-fbranch-count-reg</samp> at <samp>-O1</samp> and higher,
- except for <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-fno-function-cse</code></dt>
- <dd><a name="index-fno_002dfunction_002dcse"></a>
- <a name="index-ffunction_002dcse"></a>
- <p>Do not put function addresses in registers; make each instruction that
- calls a constant function contain the function’s address explicitly.
- </p>
- <p>This option results in less efficient code, but some strange hacks
- that alter the assembler output may be confused by the optimizations
- performed when this option is not used.
- </p>
- <p>The default is <samp>-ffunction-cse</samp>.
- </p>
- </dd>
- <dt><code>-fno-zero-initialized-in-bss</code></dt>
- <dd><a name="index-fno_002dzero_002dinitialized_002din_002dbss"></a>
- <a name="index-fzero_002dinitialized_002din_002dbss"></a>
- <p>If the target supports a BSS section, GCC by default puts variables that
- are initialized to zero into BSS. This can save space in the resulting
- code.
- </p>
- <p>This option turns off this behavior because some programs explicitly
- rely on variables going to the data section—e.g., so that the
- resulting executable can find the beginning of that section and/or make
- assumptions based on that.
- </p>
- <p>The default is <samp>-fzero-initialized-in-bss</samp>.
- </p>
- </dd>
- <dt><code>-fthread-jumps</code></dt>
- <dd><a name="index-fthread_002djumps"></a>
- <p>Perform optimizations that check to see if a jump branches to a
- location where another comparison subsumed by the first is found. If
- so, the first branch is redirected to either the destination of the
- second branch or a point immediately following it, depending on whether
- the condition is known to be true or false.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fsplit-wide-types</code></dt>
- <dd><a name="index-fsplit_002dwide_002dtypes"></a>
- <p>When using a type that occupies multiple registers, such as <code>long
- long</code> on a 32-bit system, split the registers apart and allocate them
- independently. This normally generates better code for those types,
- but may make debugging more difficult.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>,
- <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fsplit-wide-types-early</code></dt>
- <dd><a name="index-fsplit_002dwide_002dtypes_002dearly"></a>
- <p>Fully split wide types early, instead of very late.
- This option has no effect unless <samp>-fsplit-wide-types</samp> is turned on.
- </p>
- <p>This is the default on some targets.
- </p>
- </dd>
- <dt><code>-fcse-follow-jumps</code></dt>
- <dd><a name="index-fcse_002dfollow_002djumps"></a>
- <p>In common subexpression elimination (CSE), scan through jump instructions
- when the target of the jump is not reached by any other path. For
- example, when CSE encounters an <code>if</code> statement with an
- <code>else</code> clause, CSE follows the jump when the condition
- tested is false.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fcse-skip-blocks</code></dt>
- <dd><a name="index-fcse_002dskip_002dblocks"></a>
- <p>This is similar to <samp>-fcse-follow-jumps</samp>, but causes CSE to
- follow jumps that conditionally skip over blocks. When CSE
- encounters a simple <code>if</code> statement with no <code>else</code> clause,
- <samp>-fcse-skip-blocks</samp> causes CSE to follow the jump around the
- body of the <code>if</code>.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-frerun-cse-after-loop</code></dt>
- <dd><a name="index-frerun_002dcse_002dafter_002dloop"></a>
- <p>Re-run common subexpression elimination after loop optimizations are
- performed.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fgcse</code></dt>
- <dd><a name="index-fgcse"></a>
- <p>Perform a global common subexpression elimination pass.
- This pass also performs global constant and copy propagation.
- </p>
- <p><em>Note:</em> When compiling a program using computed gotos, a GCC
- extension, you may get better run-time performance if you disable
- the global common subexpression elimination pass by adding
- <samp>-fno-gcse</samp> to the command line.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fgcse-lm</code></dt>
- <dd><a name="index-fgcse_002dlm"></a>
- <p>When <samp>-fgcse-lm</samp> is enabled, global common subexpression elimination
- attempts to move loads that are only killed by stores into themselves. This
- allows a loop containing a load/store sequence to be changed to a load outside
- the loop, and a copy/store within the loop.
- </p>
- <p>Enabled by default when <samp>-fgcse</samp> is enabled.
- </p>
- </dd>
- <dt><code>-fgcse-sm</code></dt>
- <dd><a name="index-fgcse_002dsm"></a>
- <p>When <samp>-fgcse-sm</samp> is enabled, a store motion pass is run after
- global common subexpression elimination. This pass attempts to move
- stores out of loops. When used in conjunction with <samp>-fgcse-lm</samp>,
- loops containing a load/store sequence can be changed to a load before
- the loop and a store after the loop.
- </p>
- <p>Not enabled at any optimization level.
- </p>
- </dd>
- <dt><code>-fgcse-las</code></dt>
- <dd><a name="index-fgcse_002dlas"></a>
- <p>When <samp>-fgcse-las</samp> is enabled, the global common subexpression
- elimination pass eliminates redundant loads that come after stores to the
- same memory location (both partial and full redundancies).
- </p>
- <p>Not enabled at any optimization level.
- </p>
- </dd>
- <dt><code>-fgcse-after-reload</code></dt>
- <dd><a name="index-fgcse_002dafter_002dreload"></a>
- <p>When <samp>-fgcse-after-reload</samp> is enabled, a redundant load elimination
- pass is performed after reload. The purpose of this pass is to clean up
- redundant spilling.
- </p>
- <p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-faggressive-loop-optimizations</code></dt>
- <dd><a name="index-faggressive_002dloop_002doptimizations"></a>
- <p>This option tells the loop optimizer to use language constraints to
- derive bounds for the number of iterations of a loop. This assumes that
- loop code does not invoke undefined behavior by for example causing signed
- integer overflows or out-of-bound array accesses. The bounds for the
- number of iterations of a loop are used to guide loop unrolling and peeling
- and loop exit test optimizations.
- This option is enabled by default.
- </p>
- </dd>
- <dt><code>-funconstrained-commons</code></dt>
- <dd><a name="index-funconstrained_002dcommons"></a>
- <p>This option tells the compiler that variables declared in common blocks
- (e.g. Fortran) may later be overridden with longer trailing arrays. This
- prevents certain optimizations that depend on knowing the array bounds.
- </p>
- </dd>
- <dt><code>-fcrossjumping</code></dt>
- <dd><a name="index-fcrossjumping"></a>
- <p>Perform cross-jumping transformation.
- This transformation unifies equivalent code and saves code size. The
- resulting code may or may not perform better than without cross-jumping.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fauto-inc-dec</code></dt>
- <dd><a name="index-fauto_002dinc_002ddec"></a>
- <p>Combine increments or decrements of addresses with memory accesses.
- This pass is always skipped on architectures that do not have
- instructions to support this. Enabled by default at <samp>-O</samp> and
- higher on architectures that support this.
- </p>
- </dd>
- <dt><code>-fdce</code></dt>
- <dd><a name="index-fdce"></a>
- <p>Perform dead code elimination (DCE) on RTL.
- Enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fdse</code></dt>
- <dd><a name="index-fdse"></a>
- <p>Perform dead store elimination (DSE) on RTL.
- Enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fif-conversion</code></dt>
- <dd><a name="index-fif_002dconversion"></a>
- <p>Attempt to transform conditional jumps into branch-less equivalents. This
- includes use of conditional moves, min, max, set flags and abs instructions, and
- some tricks doable with standard arithmetic. The use of conditional execution
- on chips where it is available is controlled by <samp>-fif-conversion2</samp>.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>, but
- not with <samp>-Og</samp>.
- </p>
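<p>A minimal sketch of candidates for this transformation (hypothetical functions): each conditional jump below can typically be replaced by a conditional move, a native min/max, or an abs instruction.</p>

```cpp
// Hypothetical if-conversion candidates: the branches can become
// branch-less conditional-move / max / abs sequences.
int max_of(int a, int b) { return a > b ? a : b; }
int abs_of(int a)        { return a < 0 ? -a : a; }
```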
- </dd>
- <dt><code>-fif-conversion2</code></dt>
- <dd><a name="index-fif_002dconversion2"></a>
- <p>Use conditional execution (where available) to transform conditional jumps into
- branch-less equivalents.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>, but
- not with <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-fdeclone-ctor-dtor</code></dt>
- <dd><a name="index-fdeclone_002dctor_002ddtor"></a>
- <p>The C++ ABI requires multiple entry points for constructors and
- destructors: one for a base subobject, one for a complete object, and
- one for a virtual destructor that calls operator delete afterwards.
- For a hierarchy with virtual bases, the base and complete variants are
- clones, which means two copies of the function. With this option, the
- base and complete variants are changed to be thunks that call a common
- implementation.
- </p>
- <p>Enabled by <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fdelete-null-pointer-checks</code></dt>
- <dd><a name="index-fdelete_002dnull_002dpointer_002dchecks"></a>
- <p>Assume that programs cannot safely dereference null pointers, and that
- no code or data element resides at address zero.
- This option enables simple constant
- folding optimizations at all optimization levels. In addition, other
- optimization passes in GCC use this flag to control global dataflow
- analyses that eliminate useless checks for null pointers; these assume
- that a memory access to address zero always results in a trap, so
- that if a pointer is checked after it has already been dereferenced,
- it cannot be null.
- </p>
- <p>Note however that in some environments this assumption is not true.
- Use <samp>-fno-delete-null-pointer-checks</samp> to disable this optimization
- for programs that depend on that behavior.
- </p>
- <p>This option is enabled by default on most targets. On Nios II ELF, it
- defaults to off. On AVR, CR16, and MSP430, this option is completely disabled.
- </p>
- <p>Passes that use the dataflow information
- are enabled independently at different optimization levels.
- </p>
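<p>A hedged illustration of the &ldquo;check after dereference&rdquo; case described above (function name hypothetical): once <code>p</code> has been dereferenced, the later null check is provably dead under this option.</p>

```cpp
// Hypothetical example: p is dereferenced first, so under
// -fdelete-null-pointer-checks GCC assumes a null p would already have
// trapped, and the later check is deleted as dead code.
int first_or_minus1(const int *p) {
    int v = *p;          // dereference happens first
    if (p == nullptr)    // provably false afterwards: eliminated
        return -1;
    return v;
}
```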
- </dd>
- <dt><code>-fdevirtualize</code></dt>
- <dd><a name="index-fdevirtualize"></a>
- <p>Attempt to convert calls to virtual functions to direct calls. This
- is done both within a procedure and interprocedurally as part of
- indirect inlining (<samp>-findirect-inlining</samp>) and interprocedural constant
- propagation (<samp>-fipa-cp</samp>).
- Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fdevirtualize-speculatively</code></dt>
- <dd><a name="index-fdevirtualize_002dspeculatively"></a>
- <p>Attempt to convert calls to virtual functions to speculative direct calls.
- Based on the analysis of the type inheritance graph, determine for a given call
- the set of likely targets. If the set is small, preferably of size 1, change
- the call into a conditional deciding between direct and indirect calls. The
- speculative calls enable more optimizations, such as inlining. When they seem
- useless after further optimization, they are converted back into the original form.
- </p>
- </dd>
- <dt><code>-fdevirtualize-at-ltrans</code></dt>
- <dd><a name="index-fdevirtualize_002dat_002dltrans"></a>
- <p>Stream extra information needed for aggressive devirtualization when running
- the link-time optimizer in local transformation mode.
- This option enables more devirtualization but
- significantly increases the size of streamed data. For this reason it is
- disabled by default.
- </p>
- </dd>
- <dt><code>-fexpensive-optimizations</code></dt>
- <dd><a name="index-fexpensive_002doptimizations"></a>
- <p>Perform a number of minor optimizations that are relatively expensive.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-free</code></dt>
- <dd><a name="index-free-1"></a>
- <p>Attempt to remove redundant extension instructions. This is especially
- helpful for the x86-64 architecture, which implicitly zero-extends the upper
- half of a 64-bit register after a write to its lower 32-bit half.
- </p>
- <p>Enabled for Alpha, AArch64 and x86 at levels <samp>-O2</samp>,
- <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fno-lifetime-dse</code></dt>
- <dd><a name="index-fno_002dlifetime_002ddse"></a>
- <a name="index-flifetime_002ddse"></a>
- <p>In C++ the value of an object is only affected by changes within its
- lifetime: when the constructor begins, the object has an indeterminate
- value, and any changes during the lifetime of the object are dead when
- the object is destroyed. Normally dead store elimination will take
- advantage of this; if your code relies on the value of the object
- storage persisting beyond the lifetime of the object, you can use this
- flag to disable this optimization. To preserve stores before the
- constructor starts (e.g. because your operator new clears the object
- storage) but still treat the object as dead after the destructor, you
- can use <samp>-flifetime-dse=1</samp>. The default behavior can be
- explicitly selected with <samp>-flifetime-dse=2</samp>.
- <samp>-flifetime-dse=0</samp> is equivalent to <samp>-fno-lifetime-dse</samp>.
- </p>
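<p>A hedged sketch of the pattern the text describes, with hypothetical names: storage is cleared before the constructor runs, and the constructor leaves a member unwritten.</p>

```cpp
#include <cstring>
#include <new>

struct Obj {
    int cached;
    Obj() {}   // deliberately leaves 'cached' unwritten
};

// Hypothetical pattern: the memset happens before Obj's lifetime begins,
// so under the default -flifetime-dse=2 it is a dead store and may be
// removed. Code that relies on 'cached' starting at zero needs
// -flifetime-dse=1 (or -fno-lifetime-dse) to keep the clearing store.
Obj *make_obj(void *storage) {
    std::memset(storage, 0, sizeof(Obj));
    return ::new (storage) Obj;
}
```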
- </dd>
- <dt><code>-flive-range-shrinkage</code></dt>
- <dd><a name="index-flive_002drange_002dshrinkage"></a>
- <p>Attempt to decrease register pressure through register live range
- shrinkage. This is helpful for fast processors with small or moderate
- size register sets.
- </p>
- </dd>
- <dt><code>-fira-algorithm=<var>algorithm</var></code></dt>
- <dd><a name="index-fira_002dalgorithm"></a>
- <p>Use the specified coloring algorithm for the integrated register
- allocator. The <var>algorithm</var> argument can be ‘<samp>priority</samp>’, which
- specifies Chow’s priority coloring, or ‘<samp>CB</samp>’, which specifies
- Chaitin-Briggs coloring. Chaitin-Briggs coloring is not implemented
- for all architectures, but for those targets that do support it, it is
- the default because it generates better code.
- </p>
- </dd>
- <dt><code>-fira-region=<var>region</var></code></dt>
- <dd><a name="index-fira_002dregion"></a>
- <p>Use specified regions for the integrated register allocator. The
- <var>region</var> argument should be one of the following:
- </p>
- <dl compact="compact">
- <dt>‘<samp>all</samp>’</dt>
- <dd><p>Use all loops as register allocation regions.
- This can give the best results for machines with a small and/or
- irregular register set.
- </p>
- </dd>
- <dt>‘<samp>mixed</samp>’</dt>
- <dd><p>Use all loops except for loops with small register pressure
- as the regions. This value usually gives
- the best results for most architectures,
- and is enabled by default when compiling with optimization for speed
- (<samp>-O</samp>, <samp>-O2</samp>, …).
- </p>
- </dd>
- <dt>‘<samp>one</samp>’</dt>
- <dd><p>Use all functions as a single region.
- This typically results in the smallest code size, and is enabled by default for
- <samp>-Os</samp> or <samp>-O0</samp>.
- </p>
- </dd>
- </dl>
-
- </dd>
- <dt><code>-fira-hoist-pressure</code></dt>
- <dd><a name="index-fira_002dhoist_002dpressure"></a>
- <p>Use IRA to evaluate register pressure in the code hoisting pass for
- decisions to hoist expressions. This option usually results in smaller
- code, but it can slow the compiler down.
- </p>
- <p>This option is enabled at level <samp>-Os</samp> for all targets.
- </p>
- </dd>
- <dt><code>-fira-loop-pressure</code></dt>
- <dd><a name="index-fira_002dloop_002dpressure"></a>
- <p>Use IRA to evaluate register pressure in loops for decisions to move
- loop invariants. This option usually results in generation
- of faster and smaller code on machines with large register files (>= 32
- registers), but it can slow the compiler down.
- </p>
- <p>This option is enabled at level <samp>-O3</samp> for some targets.
- </p>
- </dd>
- <dt><code>-fno-ira-share-save-slots</code></dt>
- <dd><a name="index-fno_002dira_002dshare_002dsave_002dslots"></a>
- <a name="index-fira_002dshare_002dsave_002dslots"></a>
- <p>Disable sharing of stack slots used for saving call-used hard
- registers living through a call. Each hard register gets a
- separate stack slot, and as a result function stack frames are
- larger.
- </p>
- </dd>
- <dt><code>-fno-ira-share-spill-slots</code></dt>
- <dd><a name="index-fno_002dira_002dshare_002dspill_002dslots"></a>
- <a name="index-fira_002dshare_002dspill_002dslots"></a>
- <p>Disable sharing of stack slots allocated for pseudo-registers. Each
- pseudo-register that does not get a hard register gets a separate
- stack slot, and as a result function stack frames are larger.
- </p>
- </dd>
- <dt><code>-flra-remat</code></dt>
- <dd><a name="index-flra_002dremat"></a>
- <p>Enable CFG-sensitive rematerialization in LRA. Instead of loading
- values of spilled pseudos, LRA tries to rematerialize (recalculate)
- values if it is profitable.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fdelayed-branch</code></dt>
- <dd><a name="index-fdelayed_002dbranch"></a>
- <p>If supported for the target machine, attempt to reorder instructions
- to exploit instruction slots available after delayed branch
- instructions.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>,
- but not at <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-fschedule-insns</code></dt>
- <dd><a name="index-fschedule_002dinsns"></a>
- <p>If supported for the target machine, attempt to reorder instructions to
- eliminate execution stalls due to required data being unavailable. This
- helps machines that have slow floating point or memory load instructions
- by allowing other instructions to be issued until the result of the load
- or floating-point instruction is required.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-fschedule-insns2</code></dt>
- <dd><a name="index-fschedule_002dinsns2"></a>
- <p>Similar to <samp>-fschedule-insns</samp>, but requests an additional pass of
- instruction scheduling after register allocation has been done. This is
- especially useful on machines with a relatively small number of
- registers and where memory load instructions take more than one cycle.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fno-sched-interblock</code></dt>
- <dd><a name="index-fno_002dsched_002dinterblock"></a>
- <a name="index-fsched_002dinterblock"></a>
- <p>Disable instruction scheduling across basic blocks, which
- is normally enabled when scheduling before register allocation, i.e.
- with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fno-sched-spec</code></dt>
- <dd><a name="index-fno_002dsched_002dspec"></a>
- <a name="index-fsched_002dspec"></a>
- <p>Disable speculative motion of non-load instructions, which
- is normally enabled when scheduling before register allocation, i.e.
- with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-pressure</code></dt>
- <dd><a name="index-fsched_002dpressure"></a>
- <p>Enable register pressure sensitive insn scheduling before register
- allocation. This only makes sense when scheduling before register
- allocation is enabled, i.e. with <samp>-fschedule-insns</samp> or at
- <samp>-O2</samp> or higher. Usage of this option can improve the
- generated code and decrease its size by preventing register pressure
- increase above the number of available hard registers and subsequent
- spills in register allocation.
- </p>
- </dd>
- <dt><code>-fsched-spec-load</code></dt>
- <dd><a name="index-fsched_002dspec_002dload"></a>
- <p>Allow speculative motion of some load instructions. This only makes
- sense when scheduling before register allocation, i.e. with
- <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-spec-load-dangerous</code></dt>
- <dd><a name="index-fsched_002dspec_002dload_002ddangerous"></a>
- <p>Allow speculative motion of more load instructions. This only makes
- sense when scheduling before register allocation, i.e. with
- <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-stalled-insns</code></dt>
- <dt><code>-fsched-stalled-insns=<var>n</var></code></dt>
- <dd><a name="index-fsched_002dstalled_002dinsns"></a>
- <p>Define how many insns (if any) can be moved prematurely from the queue
- of stalled insns into the ready list during the second scheduling pass.
- <samp>-fno-sched-stalled-insns</samp> means that no insns are moved
- prematurely, <samp>-fsched-stalled-insns=0</samp> means there is no limit
- on how many queued insns can be moved prematurely.
- <samp>-fsched-stalled-insns</samp> without a value is equivalent to
- <samp>-fsched-stalled-insns=1</samp>.
- </p>
- </dd>
- <dt><code>-fsched-stalled-insns-dep</code></dt>
- <dt><code>-fsched-stalled-insns-dep=<var>n</var></code></dt>
- <dd><a name="index-fsched_002dstalled_002dinsns_002ddep"></a>
- <p>Define how many insn groups (cycles) are examined for a dependency
- on a stalled insn that is a candidate for premature removal from the queue
- of stalled insns. This has an effect only during the second scheduling pass,
- and only if <samp>-fsched-stalled-insns</samp> is used.
- <samp>-fno-sched-stalled-insns-dep</samp> is equivalent to
- <samp>-fsched-stalled-insns-dep=0</samp>.
- <samp>-fsched-stalled-insns-dep</samp> without a value is equivalent to
- <samp>-fsched-stalled-insns-dep=1</samp>.
- </p>
- </dd>
- <dt><code>-fsched2-use-superblocks</code></dt>
- <dd><a name="index-fsched2_002duse_002dsuperblocks"></a>
- <p>When scheduling after register allocation, use superblock scheduling.
- This allows motion across basic block boundaries,
- resulting in faster schedules. This option is experimental, as not all machine
- descriptions used by GCC model the CPU closely enough to avoid unreliable
- results from the algorithm.
- </p>
- <p>This only makes sense when scheduling after register allocation, i.e. with
- <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-group-heuristic</code></dt>
- <dd><a name="index-fsched_002dgroup_002dheuristic"></a>
- <p>Enable the group heuristic in the scheduler. This heuristic favors
- the instruction that belongs to a schedule group. This is enabled
- by default when scheduling is enabled, i.e. with <samp>-fschedule-insns</samp>
- or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-critical-path-heuristic</code></dt>
- <dd><a name="index-fsched_002dcritical_002dpath_002dheuristic"></a>
- <p>Enable the critical-path heuristic in the scheduler. This heuristic favors
- instructions on the critical path. This is enabled by default when
- scheduling is enabled, i.e. with <samp>-fschedule-insns</samp>
- or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-spec-insn-heuristic</code></dt>
- <dd><a name="index-fsched_002dspec_002dinsn_002dheuristic"></a>
- <p>Enable the speculative instruction heuristic in the scheduler. This
- heuristic favors speculative instructions with greater dependency weakness.
- This is enabled by default when scheduling is enabled, i.e.
- with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp>
- or at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-rank-heuristic</code></dt>
- <dd><a name="index-fsched_002drank_002dheuristic"></a>
- <p>Enable the rank heuristic in the scheduler. This heuristic favors
- the instruction belonging to a basic block with greater size or frequency.
- This is enabled by default when scheduling is enabled, i.e.
- with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or
- at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-last-insn-heuristic</code></dt>
- <dd><a name="index-fsched_002dlast_002dinsn_002dheuristic"></a>
- <p>Enable the last-instruction heuristic in the scheduler. This heuristic
- favors the instruction that is less dependent on the last instruction
- scheduled. This is enabled by default when scheduling is enabled,
- i.e. with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or
- at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-fsched-dep-count-heuristic</code></dt>
- <dd><a name="index-fsched_002ddep_002dcount_002dheuristic"></a>
- <p>Enable the dependent-count heuristic in the scheduler. This heuristic
- favors the instruction that has more instructions depending on it.
- This is enabled by default when scheduling is enabled, i.e.
- with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or
- at <samp>-O2</samp> or higher.
- </p>
- </dd>
- <dt><code>-freschedule-modulo-scheduled-loops</code></dt>
- <dd><a name="index-freschedule_002dmodulo_002dscheduled_002dloops"></a>
- <p>Modulo scheduling is performed before traditional scheduling. If a loop
- is modulo scheduled, later scheduling passes may change its schedule.
- Use this option to control that behavior.
- </p>
- </dd>
- <dt><code>-fselective-scheduling</code></dt>
- <dd><a name="index-fselective_002dscheduling"></a>
- <p>Schedule instructions using selective scheduling algorithm. Selective
- scheduling runs instead of the first scheduler pass.
- </p>
- </dd>
- <dt><code>-fselective-scheduling2</code></dt>
- <dd><a name="index-fselective_002dscheduling2"></a>
- <p>Schedule instructions using selective scheduling algorithm. Selective
- scheduling runs instead of the second scheduler pass.
- </p>
- </dd>
- <dt><code>-fsel-sched-pipelining</code></dt>
- <dd><a name="index-fsel_002dsched_002dpipelining"></a>
- <p>Enable software pipelining of innermost loops during selective scheduling.
- This option has no effect unless one of <samp>-fselective-scheduling</samp> or
- <samp>-fselective-scheduling2</samp> is turned on.
- </p>
- </dd>
- <dt><code>-fsel-sched-pipelining-outer-loops</code></dt>
- <dd><a name="index-fsel_002dsched_002dpipelining_002douter_002dloops"></a>
- <p>When pipelining loops during selective scheduling, also pipeline outer loops.
- This option has no effect unless <samp>-fsel-sched-pipelining</samp> is turned on.
- </p>
- </dd>
- <dt><code>-fsemantic-interposition</code></dt>
- <dd><a name="index-fsemantic_002dinterposition"></a>
- <p>Some object formats, like ELF, allow interposing of symbols by the
- dynamic linker.
- This means that for symbols exported from the DSO, the compiler cannot perform
- interprocedural propagation, inlining and other optimizations in anticipation
- that the function or variable in question may change. While this feature is
- useful, for example, to rewrite memory allocation functions by a debugging
- implementation, it is expensive in terms of code quality.
- With <samp>-fno-semantic-interposition</samp> the compiler assumes that
- if interposition happens for functions, the overriding function will have
- precisely the same semantics (and side effects).
- Similarly if interposition happens
- for variables, the constructor of the variable will be the same. The flag
- has no effect for functions explicitly declared inline
- (where it is never allowed for interposition to change semantics)
- and for symbols explicitly declared weak.
- </p>
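<p>A hedged sketch, with hypothetical functions assumed to be exported from a shared object built with <samp>-fPIC</samp>: under the default, GCC must assume the dynamic linker could interpose another definition of <code>grow</code>, so it may not inline <code>grow</code> into <code>grow_twice</code>; <samp>-fno-semantic-interposition</samp> lifts that restriction.</p>

```cpp
// Hypothetical exported functions: with default semantic interposition,
// the self-call to grow() cannot be inlined into grow_twice(), because
// an interposed grow() might behave differently at run time.
int grow(int x)       { return x + 1; }
int grow_twice(int x) { return grow(grow(x)); }
```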
- </dd>
- <dt><code>-fshrink-wrap</code></dt>
- <dd><a name="index-fshrink_002dwrap"></a>
- <p>Emit function prologues only before parts of the function that need it,
- rather than at the top of the function. This flag is enabled by default at
- <samp>-O</samp> and higher.
- </p>
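<p>A minimal sketch of a shape that benefits (names hypothetical): the early return needs no saved registers, so the prologue can be emitted only on the path that does real work.</p>

```cpp
// Hypothetical shrink-wrapping candidate: the fast path can run without
// setting up a stack frame; the prologue is emitted only on the path
// that calls expensive_path().
int expensive_path(int x);   // assumed to need callee-saved registers

int fast_or_slow(int x) {
    if (x == 0)
        return 0;            // fast path: no prologue needed
    return expensive_path(x);
}

int expensive_path(int x) { return x * x + x; }
```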
- </dd>
- <dt><code>-fshrink-wrap-separate</code></dt>
- <dd><a name="index-fshrink_002dwrap_002dseparate"></a>
- <p>Shrink-wrap parts of the prologue and epilogue separately, so that
- those parts are only executed when needed.
- This option is on by default, but has no effect unless <samp>-fshrink-wrap</samp>
- is also turned on and the target supports this.
- </p>
- </dd>
- <dt><code>-fcaller-saves</code></dt>
- <dd><a name="index-fcaller_002dsaves"></a>
- <p>Enable allocation of values to registers that are clobbered by
- function calls, by emitting extra instructions to save and restore the
- registers around such calls. Such allocation is done only when it
- seems to result in better code.
- </p>
- <p>This option is always enabled by default on certain machines, usually
- those which have no call-preserved registers to use instead.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fcombine-stack-adjustments</code></dt>
- <dd><a name="index-fcombine_002dstack_002dadjustments"></a>
- <p>Tracks stack adjustments (pushes and pops) and stack memory references
- and then tries to find ways to combine them.
- </p>
- <p>Enabled by default at <samp>-O1</samp> and higher.
- </p>
- </dd>
- <dt><code>-fipa-ra</code></dt>
- <dd><a name="index-fipa_002dra"></a>
- <p>Use caller save registers for allocation if those registers are not used by
- any called function. In that case it is not necessary to save and restore
- them around calls. This is only possible if the called functions are part of
- the same compilation unit as the current function and are compiled before it.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>, however the option
- is disabled if generated code will be instrumented for profiling
- (<samp>-p</samp>, or <samp>-pg</samp>) or if callee’s register usage cannot be known
- exactly (this happens on targets that do not expose prologues
- and epilogues in RTL).
- </p>
- </dd>
- <dt><code>-fconserve-stack</code></dt>
- <dd><a name="index-fconserve_002dstack"></a>
- <p>Attempt to minimize stack usage. The compiler attempts to use less
- stack space, even if that makes the program slower. This option
- implies setting the <samp>large-stack-frame</samp> parameter to 100
- and the <samp>large-stack-frame-growth</samp> parameter to 400.
- </p>
- </dd>
- <dt><code>-ftree-reassoc</code></dt>
- <dd><a name="index-ftree_002dreassoc"></a>
- <p>Perform reassociation on trees. This flag is enabled by default
- at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fcode-hoisting</code></dt>
- <dd><a name="index-fcode_002dhoisting"></a>
- <p>Perform code hoisting. Code hoisting tries to move the
- evaluation of expressions executed on all paths to the function exit
- as early as possible. This is especially useful as a code size
- optimization, but it often helps for code speed as well.
- This flag is enabled by default at <samp>-O2</samp> and higher.
- </p>
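<p>A hedged illustration (hypothetical function): <code>a&nbsp;*&nbsp;b</code> is evaluated on every path to the exit, so code hoisting can compute it once, before the branch.</p>

```cpp
// Hypothetical hoisting candidate: a * b occurs on all paths to the
// function exit, so it can be hoisted above the branch and computed once.
int pm_one(int a, int b, int c) {
    if (c)
        return a * b + 1;
    return a * b - 1;
}
```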
- </dd>
- <dt><code>-ftree-pre</code></dt>
- <dd><a name="index-ftree_002dpre"></a>
- <p>Perform partial redundancy elimination (PRE) on trees. This flag is
- enabled by default at <samp>-O2</samp> and <samp>-O3</samp>.
- </p>
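<p>A hedged sketch of a <em>partial</em> redundancy (hypothetical function): <code>a&nbsp;+&nbsp;b</code> is computed on the <code>c</code>-true path and again after the join, but not on the <code>c</code>-false path; PRE inserts the computation on the missing path so the final use can reuse it.</p>

```cpp
// Hypothetical PRE candidate: a + b is redundant only on one incoming
// path; PRE makes it fully redundant and reuses the value at the return.
int partially_redundant(int a, int b, int c) {
    int r = 0;
    if (c)
        r = a + b;
    return r + (a + b);
}
```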
- </dd>
- <dt><code>-ftree-partial-pre</code></dt>
- <dd><a name="index-ftree_002dpartial_002dpre"></a>
- <p>Make partial redundancy elimination (PRE) more aggressive. This flag is
- enabled by default at <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-ftree-forwprop</code></dt>
- <dd><a name="index-ftree_002dforwprop"></a>
- <p>Perform forward propagation on trees. This flag is enabled by default
- at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-ftree-fre</code></dt>
- <dd><a name="index-ftree_002dfre"></a>
- <p>Perform full redundancy elimination (FRE) on trees. The difference
- between FRE and PRE is that FRE only considers expressions
- that are computed on all paths leading to the redundant computation.
- This analysis is faster than PRE, though it exposes fewer redundancies.
- This flag is enabled by default at <samp>-O</samp> and higher.
- </p>
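<p>A hedged sketch of a <em>full</em> redundancy (hypothetical function): <code>a&nbsp;+&nbsp;b</code> is available on the single path leading to its second computation, so FRE can reuse the first result without the path analysis PRE requires.</p>

```cpp
// Hypothetical FRE candidate: the second a + b is computed on every path
// that reaches it (there is only one), so FRE reuses x instead.
int fully_redundant(int a, int b) {
    int x = a + b;
    int y = a + b;   // reuses x under FRE
    return x * y;
}
```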
- </dd>
- <dt><code>-ftree-phiprop</code></dt>
- <dd><a name="index-ftree_002dphiprop"></a>
- <p>Perform hoisting of loads from conditional pointers on trees. This
- pass is enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fhoist-adjacent-loads</code></dt>
- <dd><a name="index-fhoist_002dadjacent_002dloads"></a>
- <p>Speculatively hoist loads from both branches of an if-then-else if the
- loads are from adjacent locations in the same structure and the target
- architecture has a conditional move instruction. This flag is enabled
- by default at <samp>-O2</samp> and higher.
- </p>
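<p>A minimal sketch (hypothetical names): the two loads come from adjacent fields of the same structure, so both can be hoisted above the branch and the selection done with a conditional move.</p>

```cpp
struct pair { int a; int b; };

// Hypothetical candidate: p->a and p->b are adjacent, so both loads may
// be hoisted speculatively and the branch replaced by a conditional move.
int pick(const pair *p, int cond) {
    return cond ? p->a : p->b;
}
```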
- </dd>
- <dt><code>-ftree-copy-prop</code></dt>
- <dd><a name="index-ftree_002dcopy_002dprop"></a>
- <p>Perform copy propagation on trees. This pass eliminates unnecessary
- copy operations. This flag is enabled by default at <samp>-O</samp> and
- higher.
- </p>
- </dd>
- <dt><code>-fipa-pure-const</code></dt>
- <dd><a name="index-fipa_002dpure_002dconst"></a>
- <p>Discover which functions are pure or constant.
- Enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fipa-reference</code></dt>
- <dd><a name="index-fipa_002dreference"></a>
- <p>Discover which static variables do not escape the
- compilation unit.
- Enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fipa-reference-addressable</code></dt>
- <dd><a name="index-fipa_002dreference_002daddressable"></a>
- <p>Discover read-only, write-only and non-addressable static variables.
- Enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fipa-stack-alignment</code></dt>
- <dd><a name="index-fipa_002dstack_002dalignment"></a>
- <p>Reduce stack alignment on call sites if possible.
- Enabled by default.
- </p>
- </dd>
- <dt><code>-fipa-pta</code></dt>
- <dd><a name="index-fipa_002dpta"></a>
- <p>Perform interprocedural pointer analysis and interprocedural modification
- and reference analysis. This option can cause excessive memory and
- compile-time usage on large compilation units. It is not enabled by
- default at any optimization level.
- </p>
- </dd>
- <dt><code>-fipa-profile</code></dt>
- <dd><a name="index-fipa_002dprofile"></a>
- <p>Perform interprocedural profile propagation. The functions called only from
- cold functions are marked as cold. Also functions executed once (such as
- <code>cold</code>, <code>noreturn</code>, static constructors or destructors) are identified. Cold
- functions and loopless parts of functions executed once are then optimized for
- size.
- Enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fipa-cp</code></dt>
- <dd><a name="index-fipa_002dcp"></a>
- <p>Perform interprocedural constant propagation.
- This optimization analyzes the program to determine when values passed
- to functions are constants and then optimizes accordingly.
- This optimization can substantially increase performance
- if the application has constants passed to functions.
- This flag is enabled by default at <samp>-O2</samp>, <samp>-Os</samp> and <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
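<p>A hedged illustration (hypothetical functions): every call site passes the same constant, so interprocedural constant propagation can specialize the callee for that value.</p>

```cpp
// Hypothetical IPA-CP candidate: factor is 8 at every call site, so the
// compiler may specialize scale() to that constant (e.g. into a shift).
static int scale(int x, int factor) { return x * factor; }

int api_a(int x) { return scale(x, 8); }
int api_b(int x) { return scale(x, 8) + 1; }
```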
- </dd>
- <dt><code>-fipa-cp-clone</code></dt>
- <dd><a name="index-fipa_002dcp_002dclone"></a>
- <p>Perform function cloning to make interprocedural constant propagation stronger.
- When enabled, interprocedural constant propagation performs function cloning
- when externally visible function can be called with constant arguments.
- Because this optimization can create multiple copies of functions,
- it may significantly increase code size
- (see <samp>--param ipa-cp-unit-growth=<var>value</var></samp>).
- This flag is enabled by default at <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-fipa-bit-cp</code></dt>
- <dd><a name="index-fipa_002dbit_002dcp"></a>
- <p>When enabled, perform interprocedural bitwise constant
- propagation. This flag is enabled by default at <samp>-O2</samp> and
- by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- It requires that <samp>-fipa-cp</samp> is enabled.
- </p>
- </dd>
- <dt><code>-fipa-vrp</code></dt>
- <dd><a name="index-fipa_002dvrp"></a>
- <p>When enabled, perform interprocedural propagation of value
- ranges. This flag is enabled by default at <samp>-O2</samp>. It requires
- that <samp>-fipa-cp</samp> is enabled.
- </p>
- </dd>
- <dt><code>-fipa-icf</code></dt>
- <dd><a name="index-fipa_002dicf"></a>
- <p>Perform Identical Code Folding for functions and read-only variables.
- The optimization reduces code size and may disturb unwind stacks by replacing
- a function by an equivalent one with a different name. The optimization works
- more effectively with link-time optimization enabled.
- </p>
- <p>Although the behavior is similar to the Gold Linker’s ICF optimization, GCC ICF
- works at a different level, and thus the optimizations are not the same: there are
- equivalences that are found only by GCC and equivalences found only by Gold.
- </p>
- <p>This flag is enabled by default at <samp>-O2</samp> and <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-flive-patching=<var>level</var></code></dt>
- <dd><a name="index-flive_002dpatching"></a>
- <p>Control GCC’s optimizations to produce output suitable for live-patching.
- </p>
- <p>If the compiler’s optimization uses a function’s body or information extracted
- from its body to optimize/change another function, the latter is called an
- impacted function of the former. If a function is patched, its impacted
- functions should be patched too.
- </p>
- <p>The impacted functions are determined by the compiler’s interprocedural
- optimizations. For example, a caller is impacted when inlining a function
- into its caller,
- cloning a function and changing its caller to call this new clone,
- or extracting a function’s pureness/constness information to optimize
- its direct or indirect callers, etc.
- </p>
- <p>Usually, the more IPA optimizations enabled, the larger the number of
- impacted functions for each function. In order to control the number of
- impacted functions and more easily compute the list of impacted functions,
- IPA optimizations can be partially enabled at two different levels.
- </p>
- <p>The <var>level</var> argument should be one of the following:
- </p>
- <dl compact="compact">
- <dt>‘<samp>inline-clone</samp>’</dt>
- <dd>
- <p>Only enable inlining and cloning optimizations, which includes inlining,
- cloning, interprocedural scalar replacement of aggregates and partial inlining.
- As a result, when patching a function, all its callers and its clones’
- callers are impacted, therefore need to be patched as well.
- </p>
- <p><samp>-flive-patching=inline-clone</samp> disables the following optimization flags:
- </p><div class="smallexample">
- <pre class="smallexample">-fwhole-program -fipa-pta -fipa-reference -fipa-ra
- -fipa-icf -fipa-icf-functions -fipa-icf-variables
- -fipa-bit-cp -fipa-vrp -fipa-pure-const -fipa-reference-addressable
- -fipa-stack-alignment
- </pre></div>
-
- </dd>
- <dt>‘<samp>inline-only-static</samp>’</dt>
- <dd>
- <p>Only enable inlining of static functions.
- As a result, when patching a static function, all its callers are impacted
- and so need to be patched as well.
- </p>
- <p>In addition to all the flags that <samp>-flive-patching=inline-clone</samp>
- disables,
- <samp>-flive-patching=inline-only-static</samp> disables the following additional
- optimization flags:
- </p><div class="smallexample">
- <pre class="smallexample">-fipa-cp-clone -fipa-sra -fpartial-inlining -fipa-cp
- </pre></div>
-
- </dd>
- </dl>
-
- <p>When <samp>-flive-patching</samp> is specified without any value, the default value
- is ‘<samp>inline-clone</samp>’.
- </p>
- <p>This flag is disabled by default.
- </p>
- <p>Note that <samp>-flive-patching</samp> is not supported with link-time optimization
- (<samp>-flto</samp>).
- </p>
- </dd>
- <dt><code>-fisolate-erroneous-paths-dereference</code></dt>
- <dd><a name="index-fisolate_002derroneous_002dpaths_002ddereference"></a>
- <p>Detect paths that trigger erroneous or undefined behavior due to
- dereferencing a null pointer. Isolate those paths from the main control
- flow and turn the statement with erroneous or undefined behavior into a trap.
- This flag is enabled by default at <samp>-O2</samp> and higher and depends on
- <samp>-fdelete-null-pointer-checks</samp> also being enabled.
- </p>
- </dd>
- <dt><code>-fisolate-erroneous-paths-attribute</code></dt>
- <dd><a name="index-fisolate_002derroneous_002dpaths_002dattribute"></a>
- <p>Detect paths that trigger erroneous or undefined behavior due to a null value
- being used in a way forbidden by a <code>returns_nonnull</code> or <code>nonnull</code>
- attribute. Isolate those paths from the main control flow and turn the
- statement with erroneous or undefined behavior into a trap. This is not
- currently enabled, but may be enabled by <samp>-O2</samp> in the future.
- </p>
- </dd>
- <dt><code>-ftree-sink</code></dt>
- <dd><a name="index-ftree_002dsink"></a>
- <p>Perform forward store motion on trees. This flag is
- enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-ftree-bit-ccp</code></dt>
- <dd><a name="index-ftree_002dbit_002dccp"></a>
- <p>Perform sparse conditional bit constant propagation on trees and propagate
- pointer alignment information.
- This pass only operates on local scalar variables and is enabled by default
- at <samp>-O1</samp> and higher, except for <samp>-Og</samp>.
- It requires that <samp>-ftree-ccp</samp> is enabled.
- </p>
- </dd>
- <dt><code>-ftree-ccp</code></dt>
- <dd><a name="index-ftree_002dccp"></a>
- <p>Perform sparse conditional constant propagation (CCP) on trees. This
- pass only operates on local scalar variables and is enabled by default
- at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fssa-backprop</code></dt>
- <dd><a name="index-fssa_002dbackprop"></a>
- <p>Propagate information about uses of a value up the definition chain
- in order to simplify the definitions. For example, this pass strips
- sign operations if the sign of a value never matters. The flag is
- enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-fssa-phiopt</code></dt>
- <dd><a name="index-fssa_002dphiopt"></a>
- <p>Perform pattern matching on SSA PHI nodes to optimize conditional
- code. This pass is enabled by default at <samp>-O1</samp> and higher,
- except for <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-ftree-switch-conversion</code></dt>
- <dd><a name="index-ftree_002dswitch_002dconversion"></a>
- <p>Perform conversion of simple initializations in a switch to
- initializations from a scalar array. This flag is enabled by default
- at <samp>-O2</samp> and higher.
- </p>
- </dd>
- <dt><code>-ftree-tail-merge</code></dt>
- <dd><a name="index-ftree_002dtail_002dmerge"></a>
- <p>Look for identical code sequences. When found, replace one with a jump to the
- other. This optimization is known as tail merging or cross jumping. This flag
- is enabled by default at <samp>-O2</samp> and higher. The compilation time
- spent in this pass can be limited using the
- <samp>max-tail-merge-comparisons</samp> and
- <samp>max-tail-merge-iterations</samp> parameters.
- </p>
- </dd>
- <dt><code>-ftree-dce</code></dt>
- <dd><a name="index-ftree_002ddce"></a>
- <p>Perform dead code elimination (DCE) on trees. This flag is enabled by
- default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-ftree-builtin-call-dce</code></dt>
- <dd><a name="index-ftree_002dbuiltin_002dcall_002ddce"></a>
- <p>Perform conditional dead code elimination (DCE) for calls to built-in functions
- that may set <code>errno</code> but are otherwise free of side effects. This flag is
- enabled by default at <samp>-O2</samp> and higher if <samp>-Os</samp> is not also
- specified.
- </p>
- </dd>
- <dt><code>-ffinite-loops</code></dt>
- <dd><a name="index-ffinite_002dloops"></a>
- <a name="index-fno_002dfinite_002dloops"></a>
- <p>Assume that a loop with an exit will eventually take the exit and not loop
- indefinitely. This allows the compiler to remove loops that otherwise have
- no side effects, since potentially endless looping is not itself counted as
- a side effect.
- </p>
- <p>This option is enabled by default at <samp>-O2</samp> for C++ with
- <samp>-std=c++11</samp> or higher.
- </p>
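<p>For illustration (a hand-written sketch, not an example from this manual),
the following search loop has an exit, so under <samp>-ffinite-loops</samp>
GCC may assume it terminates; if its result were unused, the otherwise
side-effect-free loop could be deleted entirely:
</p>

```c
#include <assert.h>

/* Hypothetical example: the while loop has an exit condition, so
   -ffinite-loops lets GCC assume it eventually terminates.  If the
   return value were ignored, the loop (which has no side effects)
   could be removed altogether. */
int first_index_ge(const int *a, int n, int key)
{
  int i = 0;
  while (i < n && a[i] < key)   /* loop has an exit */
    i++;
  return i;
}
```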
- </dd>
- <dt><code>-ftree-dominator-opts</code></dt>
- <dd><a name="index-ftree_002ddominator_002dopts"></a>
- <p>Perform a variety of simple scalar cleanups (constant/copy
- propagation, redundancy elimination, range propagation and expression
- simplification) based on a dominator tree traversal. This also
- performs jump threading (to reduce jumps to jumps). This flag is
- enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-ftree-dse</code></dt>
- <dd><a name="index-ftree_002ddse"></a>
- <p>Perform dead store elimination (DSE) on trees. A dead store is a store into
- a memory location that is later overwritten by another store without
- any intervening loads. In this case the earlier store can be deleted. This
- flag is enabled by default at <samp>-O</samp> and higher.
- </p>
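<p>A minimal sketch of a dead store (illustrative, not from the manual's
example set):
</p>

```c
/* The first assignment to *p is dead: it is overwritten below with no
   intervening load, so dead store elimination may delete it. */
void set_flag(int *p)
{
  *p = 1;   /* dead store */
  *p = 2;   /* only this store survives */
}
```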
- </dd>
- <dt><code>-ftree-ch</code></dt>
- <dd><a name="index-ftree_002dch"></a>
- <p>Perform loop header copying on trees. This is beneficial since it increases
- effectiveness of code motion optimizations. It also saves one jump. This flag
- is enabled by default at <samp>-O</samp> and higher. It is not enabled
- for <samp>-Os</samp>, since it usually increases code size.
- </p>
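<p>The transformation can be sketched by hand as follows (an illustrative
example, not taken from the manual; GCC performs the rewrite internally):
</p>

```c
/* Loop header copying: the exit test from the loop header is
   duplicated in front of the loop, turning a while loop into an if
   guarding a do-while.  Both functions compute the same result. */
int sum_while(const int *a, int n)
{
  int s = 0, i = 0;
  while (i < n) {       /* header test executed every iteration */
    s += a[i];
    i++;
  }
  return s;
}

int sum_header_copied(const int *a, int n)
{
  int s = 0, i = 0;
  if (i < n) {          /* copied header test, run once */
    do {
      s += a[i];
      i++;
    } while (i < n);
  }
  return s;
}
```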
- </dd>
- <dt><code>-ftree-loop-optimize</code></dt>
- <dd><a name="index-ftree_002dloop_002doptimize"></a>
- <p>Perform loop optimizations on trees. This flag is enabled by default
- at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-ftree-loop-linear</code></dt>
- <dt><code>-floop-strip-mine</code></dt>
- <dt><code>-floop-block</code></dt>
- <dd><a name="index-ftree_002dloop_002dlinear"></a>
- <a name="index-floop_002dstrip_002dmine"></a>
- <a name="index-floop_002dblock"></a>
- <p>Perform loop nest optimizations. Same as
- <samp>-floop-nest-optimize</samp>. To use this code transformation, GCC has
- to be configured with <samp>--with-isl</samp> to enable the Graphite loop
- transformation infrastructure.
- </p>
- </dd>
- <dt><code>-fgraphite-identity</code></dt>
- <dd><a name="index-fgraphite_002didentity"></a>
- <p>Enable the identity transformation for graphite. For every SCoP we generate
- the polyhedral representation and transform it back to gimple. Using
- <samp>-fgraphite-identity</samp> we can check the costs or benefits of the
- GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations
- are also performed by the code generator isl, like index splitting and
- dead code elimination in loops.
- </p>
- </dd>
- <dt><code>-floop-nest-optimize</code></dt>
- <dd><a name="index-floop_002dnest_002doptimize"></a>
- <p>Enable the isl based loop nest optimizer. This is a generic loop nest
- optimizer based on the Pluto optimization algorithms. It calculates a loop
- structure optimized for data-locality and parallelism. This option
- is experimental.
- </p>
- </dd>
- <dt><code>-floop-parallelize-all</code></dt>
- <dd><a name="index-floop_002dparallelize_002dall"></a>
- <p>Use the Graphite data dependence analysis to identify loops that can
- be parallelized. Parallelize all the loops that can be analyzed to
- not contain loop carried dependences without checking that it is
- profitable to parallelize the loops.
- </p>
- </dd>
- <dt><code>-ftree-coalesce-vars</code></dt>
- <dd><a name="index-ftree_002dcoalesce_002dvars"></a>
- <p>While transforming the program out of the SSA representation, attempt to
- reduce copying by coalescing versions of different user-defined
- variables, instead of just compiler temporaries. This may severely
- limit the ability to debug an optimized program compiled with
- <samp>-fno-var-tracking-assignments</samp>. In the negated form, this flag
- prevents SSA coalescing of user variables. This option is enabled by
- default if optimization is enabled, and it does very little otherwise.
- </p>
- </dd>
- <dt><code>-ftree-loop-if-convert</code></dt>
- <dd><a name="index-ftree_002dloop_002dif_002dconvert"></a>
- <p>Attempt to transform conditional jumps in the innermost loops to
- branch-less equivalents. The intent is to remove control-flow from
- the innermost loops in order to improve the ability of the
- vectorization pass to handle these loops. This is enabled by default
- if vectorization is enabled.
- </p>
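<p>As a rough sketch (hand-written illustration, not from the manual), the
conditional in the loop below can be if-converted into a branchless select,
which the vectorizer handles much more easily:
</p>

```c
/* If-conversion: the branch in the loop body becomes a conditional
   move/select, removing control flow from the innermost loop. */
void clamp_negative(int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i] > 0 ? b[i] : 0;   /* candidate for a branchless select */
}
```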
- </dd>
- <dt><code>-ftree-loop-distribution</code></dt>
- <dd><a name="index-ftree_002dloop_002ddistribution"></a>
- <p>Perform loop distribution. This flag can improve cache performance on
- big loop bodies and allow further loop optimizations, like
- parallelization or vectorization, to take place. For example, the loop
- </p><div class="smallexample">
- <pre class="smallexample">DO I = 1, N
- A(I) = B(I) + C
- D(I) = E(I) * F
- ENDDO
- </pre></div>
- <p>is transformed to
- </p><div class="smallexample">
- <pre class="smallexample">DO I = 1, N
- A(I) = B(I) + C
- ENDDO
- DO I = 1, N
- D(I) = E(I) * F
- ENDDO
- </pre></div>
- <p>This flag is enabled by default at <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-ftree-loop-distribute-patterns</code></dt>
- <dd><a name="index-ftree_002dloop_002ddistribute_002dpatterns"></a>
- <p>Perform loop distribution of patterns that can be code generated with
- calls to a library. This flag is enabled by default at <samp>-O2</samp> and
- higher, and by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- <p>This pass distributes the initialization loops and generates a call to
- memset zero. For example, the loop
- </p><div class="smallexample">
- <pre class="smallexample">DO I = 1, N
- A(I) = 0
- B(I) = A(I) + I
- ENDDO
- </pre></div>
- <p>is transformed to
- </p><div class="smallexample">
- <pre class="smallexample">DO I = 1, N
- A(I) = 0
- ENDDO
- DO I = 1, N
- B(I) = A(I) + I
- ENDDO
- </pre></div>
- <p>and the initialization loop is transformed into a call to memset zero.
- This flag is enabled by default at <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-floop-interchange</code></dt>
- <dd><a name="index-floop_002dinterchange"></a>
- <p>Perform loop interchange outside of graphite. This flag can improve cache
- performance on loop nests and allow further loop optimizations, like
- vectorization, to take place. For example, the loop
- </p><div class="smallexample">
- <pre class="smallexample">for (int i = 0; i < N; i++)
- for (int j = 0; j < N; j++)
- for (int k = 0; k < N; k++)
- c[i][j] = c[i][j] + a[i][k]*b[k][j];
- </pre></div>
- <p>is transformed to
- </p><div class="smallexample">
- <pre class="smallexample">for (int i = 0; i < N; i++)
- for (int k = 0; k < N; k++)
- for (int j = 0; j < N; j++)
- c[i][j] = c[i][j] + a[i][k]*b[k][j];
- </pre></div>
- <p>This flag is enabled by default at <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-floop-unroll-and-jam</code></dt>
- <dd><a name="index-floop_002dunroll_002dand_002djam"></a>
- <p>Apply unroll and jam transformations on feasible loops. In a loop
- nest this unrolls the outer loop by some factor and fuses the resulting
- multiple inner loops. This flag is enabled by default at <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-ftree-loop-im</code></dt>
- <dd><a name="index-ftree_002dloop_002dim"></a>
- <p>Perform loop invariant motion on trees. This pass moves only invariants that
- are hard to handle at RTL level (function calls, operations that expand to
- nontrivial sequences of insns). With <samp>-funswitch-loops</samp> it also moves
- operands of conditions that are invariant out of the loop, so that we can use
- just trivial invariantness analysis in loop unswitching. The pass also includes
- store motion.
- </p>
- </dd>
- <dt><code>-ftree-loop-ivcanon</code></dt>
- <dd><a name="index-ftree_002dloop_002divcanon"></a>
- <p>Create a canonical counter for the number of iterations in loops for which
- determining the number of iterations requires complicated analysis. Later
- optimizations then may determine the number easily. Useful especially
- in connection with unrolling.
- </p>
- </dd>
- <dt><code>-ftree-scev-cprop</code></dt>
- <dd><a name="index-ftree_002dscev_002dcprop"></a>
- <p>Perform final value replacement. If a variable is modified in a loop
- in such a way that its value when exiting the loop can be determined using
- only its initial value and the number of loop iterations, replace uses of
- the final value by such a computation, provided it is sufficiently cheap.
- This reduces data dependencies and may allow further simplifications.
- Enabled by default at <samp>-O</samp> and higher.
- </p>
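<p>For example (an illustrative sketch, not from the manual), the loop below
only counts up to <code>n</code>, so its final value is computable from the
iteration count alone:
</p>

```c
/* Final value replacement: i is incremented until it reaches n, so for
   non-negative n the value of i after the loop is simply n.  GCC can
   replace the use of i with that expression and, if nothing else
   depends on the loop, delete the loop entirely. */
int count_up(int n)
{
  int i = 0;
  while (i < n)
    i++;
  return i;   /* equivalent to: return n > 0 ? n : 0; */
}
```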
- </dd>
- <dt><code>-fivopts</code></dt>
- <dd><a name="index-fivopts"></a>
- <p>Perform induction variable optimizations (strength reduction, induction
- variable merging and induction variable elimination) on trees.
- </p>
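<p>The effect of strength reduction on an indexed loop can be sketched by hand
(illustrative only; GCC performs the rewrite internally):
</p>

```c
/* Induction variable optimization: the indexed form computes the
   address a + i*sizeof(long) each iteration; the pointer form uses a
   single bumped pointer instead.  Both compute the same sum. */
long sum_indexed(const long *a, int n)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];            /* index-based address computation */
  return s;
}

long sum_pointer(const long *a, int n)
{
  long s = 0;
  for (const long *p = a, *end = a + n; p != end; p++)
    s += *p;              /* single pointer induction variable */
  return s;
}
```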
- </dd>
- <dt><code>-ftree-parallelize-loops=<var>n</var></code></dt>
- <dd><a name="index-ftree_002dparallelize_002dloops"></a>
- <p>Parallelize loops, i.e., split their iteration space to run in <var>n</var> threads.
- This is only possible for loops whose iterations are independent
- and can be arbitrarily reordered. The optimization is only
- profitable on multiprocessor machines, for loops that are CPU-intensive,
- rather than constrained e.g. by memory bandwidth. This option
- implies <samp>-pthread</samp>, and thus is only supported on targets
- that have support for <samp>-pthread</samp>.
- </p>
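<p>A loop of the required shape looks like this (an illustrative sketch, not
taken from the manual):
</p>

```c
/* Every iteration writes a distinct element and reads only its own
   inputs, so there is no loop-carried dependence and the iterations
   may be split across threads by -ftree-parallelize-loops. */
void vec_add(double *c, const double *a, const double *b, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];   /* independent iterations */
}
```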
- </dd>
- <dt><code>-ftree-pta</code></dt>
- <dd><a name="index-ftree_002dpta"></a>
- <p>Perform function-local points-to analysis on trees. This flag is
- enabled by default at <samp>-O1</samp> and higher, except for <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-ftree-sra</code></dt>
- <dd><a name="index-ftree_002dsra"></a>
- <p>Perform scalar replacement of aggregates. This pass replaces structure
- references with scalars to prevent committing structures to memory too
- early. This flag is enabled by default at <samp>-O1</samp> and higher,
- except for <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-fstore-merging</code></dt>
- <dd><a name="index-fstore_002dmerging"></a>
- <p>Perform merging of narrow stores to consecutive memory addresses. This pass
- merges contiguous stores of immediate values narrower than a word into fewer
- wider stores to reduce the number of instructions. This is enabled by default
- at <samp>-O2</samp> and higher as well as <samp>-Os</samp>.
- </p>
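<p>A typical candidate looks like this (a hand-written sketch, not from the
manual's examples):
</p>

```c
#include <stdint.h>

struct header { uint8_t tag, version, flags, len; };

/* Store merging: four narrow constant stores to consecutive bytes may
   be merged into a single 32-bit store.  The observable result is
   identical either way. */
void init_header(struct header *h)
{
  h->tag = 0x7f;
  h->version = 2;
  h->flags = 0;
  h->len = 16;
}
```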
- </dd>
- <dt><code>-ftree-ter</code></dt>
- <dd><a name="index-ftree_002dter"></a>
- <p>Perform temporary expression replacement during the SSA->normal phase. Single
- use/single def temporaries are replaced at their use location with their
- defining expression. This results in non-GIMPLE code, but gives the expanders
- much more complex trees to work on resulting in better RTL generation. This is
- enabled by default at <samp>-O</samp> and higher.
- </p>
- </dd>
- <dt><code>-ftree-slsr</code></dt>
- <dd><a name="index-ftree_002dslsr"></a>
- <p>Perform straight-line strength reduction on trees. This recognizes related
- expressions involving multiplications and replaces them by less expensive
- calculations when possible. This is enabled by default at <samp>-O</samp> and
- higher.
- </p>
- </dd>
- <dt><code>-ftree-vectorize</code></dt>
- <dd><a name="index-ftree_002dvectorize"></a>
- <p>Perform vectorization on trees. This flag enables <samp>-ftree-loop-vectorize</samp>
- and <samp>-ftree-slp-vectorize</samp> if not explicitly specified.
- </p>
- </dd>
- <dt><code>-ftree-loop-vectorize</code></dt>
- <dd><a name="index-ftree_002dloop_002dvectorize"></a>
- <p>Perform loop vectorization on trees. This flag is enabled by default at
- <samp>-O3</samp> and by <samp>-ftree-vectorize</samp>, <samp>-fprofile-use</samp>,
- and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-ftree-slp-vectorize</code></dt>
- <dd><a name="index-ftree_002dslp_002dvectorize"></a>
- <p>Perform basic block vectorization on trees. This flag is enabled by default at
- <samp>-O3</samp> and by <samp>-ftree-vectorize</samp>, <samp>-fprofile-use</samp>,
- and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-fvect-cost-model=<var>model</var></code></dt>
- <dd><a name="index-fvect_002dcost_002dmodel"></a>
- <p>Alter the cost model used for vectorization. The <var>model</var> argument
- should be one of ‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’.
- With the ‘<samp>unlimited</samp>’ model the vectorized code-path is assumed
- to be profitable while with the ‘<samp>dynamic</samp>’ model a runtime check
- guards the vectorized code-path to enable it only for iteration
- counts that will likely execute faster than when executing the original
- scalar loop. The ‘<samp>cheap</samp>’ model disables vectorization of
- loops where doing so would be cost prohibitive for example due to
- required runtime checks for data dependence or alignment but otherwise
- is equal to the ‘<samp>dynamic</samp>’ model.
- The default cost model depends on other optimization flags and is
- either ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’.
- </p>
- </dd>
- <dt><code>-fsimd-cost-model=<var>model</var></code></dt>
- <dd><a name="index-fsimd_002dcost_002dmodel"></a>
- <p>Alter the cost model used for vectorization of loops marked with the OpenMP
- simd directive. The <var>model</var> argument should be one of
- ‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’. All values of <var>model</var>
- have the same meaning as described in <samp>-fvect-cost-model</samp>, and by
- default the cost model defined with <samp>-fvect-cost-model</samp> is used.
- </p>
- </dd>
- <dt><code>-ftree-vrp</code></dt>
- <dd><a name="index-ftree_002dvrp"></a>
- <p>Perform Value Range Propagation on trees. This is similar to the
- constant propagation pass, but instead of values, ranges of values are
- propagated. This allows the optimizers to remove unnecessary range
- checks like array bound checks and null pointer checks. This is
- enabled by default at <samp>-O2</samp> and higher. Null pointer check
- elimination is only done if <samp>-fdelete-null-pointer-checks</samp> is
- enabled.
- </p>
- </dd>
- <dt><code>-fsplit-paths</code></dt>
- <dd><a name="index-fsplit_002dpaths"></a>
- <p>Split paths leading to loop backedges. This can improve dead code
- elimination and common subexpression elimination. This is enabled by
- default at <samp>-O3</samp> and above.
- </p>
- </dd>
- <dt><code>-fsplit-ivs-in-unroller</code></dt>
- <dd><a name="index-fsplit_002divs_002din_002dunroller"></a>
- <p>Enables expression of values of induction variables in later iterations
- of the unrolled loop using the value in the first iteration. This breaks
- long dependency chains, thus improving efficiency of the scheduling passes.
- </p>
- <p>A combination of <samp>-fweb</samp> and CSE is often sufficient to obtain the
- same effect. However, that is not reliable in cases where the loop body
- is more complicated than a single basic block. It also does not work at all
- on some architectures due to restrictions in the CSE pass.
- </p>
- <p>This optimization is enabled by default.
- </p>
- </dd>
- <dt><code>-fvariable-expansion-in-unroller</code></dt>
- <dd><a name="index-fvariable_002dexpansion_002din_002dunroller"></a>
- <p>With this option, the compiler creates multiple copies of some
- local variables when unrolling a loop, which can result in superior code.
- </p>
- <p>This optimization is enabled by default for PowerPC targets, but disabled
- by default otherwise.
- </p>
- </dd>
- <dt><code>-fpartial-inlining</code></dt>
- <dd><a name="index-fpartial_002dinlining"></a>
- <p>Inline parts of functions. This option has an effect only
- when inlining itself is turned on by the <samp>-finline-functions</samp>
- or <samp>-finline-small-functions</samp> options.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fpredictive-commoning</code></dt>
- <dd><a name="index-fpredictive_002dcommoning"></a>
- <p>Perform predictive commoning optimization, i.e., reusing computations
- (especially memory loads and stores) performed in previous
- iterations of loops.
- </p>
- <p>This option is enabled at level <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-fprefetch-loop-arrays</code></dt>
- <dd><a name="index-fprefetch_002dloop_002darrays"></a>
- <p>If supported by the target machine, generate instructions to prefetch
- memory to improve the performance of loops that access large arrays.
- </p>
- <p>This option may generate better or worse code; results are highly
- dependent on the structure of loops within the source code.
- </p>
- <p>Disabled at level <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fno-printf-return-value</code></dt>
- <dd><a name="index-fno_002dprintf_002dreturn_002dvalue"></a>
- <a name="index-fprintf_002dreturn_002dvalue"></a>
- <p>Do not substitute constants for the known return value of formatted output
- functions such as <code>sprintf</code>, <code>snprintf</code>, <code>vsprintf</code>, and
- <code>vsnprintf</code> (but not <code>printf</code> or <code>fprintf</code>). This
- transformation allows GCC to optimize or even eliminate branches based
- on the known return value of these functions called with arguments that
- are either constant, or whose values are known to be in a range that
- makes determining the exact return value possible. For example, when
- <samp>-fprintf-return-value</samp> is in effect, both the branch and the
- body of the <code>if</code> statement (but not the call to <code>snprintf</code>)
- can be optimized away when <code>i</code> is a 32-bit or smaller integer
- because the return value is guaranteed to be at most 8.
- </p>
- <div class="smallexample">
- <pre class="smallexample">char buf[9];
- if (snprintf (buf, sizeof buf, "%08x", i) >= sizeof buf)
- …
- </pre></div>
-
- <p>The <samp>-fprintf-return-value</samp> option relies on other optimizations
- and yields best results with <samp>-O2</samp> and above. It works in tandem
- with the <samp>-Wformat-overflow</samp> and <samp>-Wformat-truncation</samp>
- options. The <samp>-fprintf-return-value</samp> option is enabled by default.
- </p>
- </dd>
- <dt><code>-fno-peephole</code></dt>
- <dt><code>-fno-peephole2</code></dt>
- <dd><a name="index-fno_002dpeephole"></a>
- <a name="index-fpeephole"></a>
- <a name="index-fno_002dpeephole2"></a>
- <a name="index-fpeephole2"></a>
- <p>Disable any machine-specific peephole optimizations. The difference
- between <samp>-fno-peephole</samp> and <samp>-fno-peephole2</samp> is in how they
- are implemented in the compiler; some targets use one, some use the
- other, a few use both.
- </p>
- <p><samp>-fpeephole</samp> is enabled by default.
- <samp>-fpeephole2</samp> is enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fno-guess-branch-probability</code></dt>
- <dd><a name="index-fno_002dguess_002dbranch_002dprobability"></a>
- <a name="index-fguess_002dbranch_002dprobability"></a>
- <p>Do not guess branch probabilities using heuristics.
- </p>
- <p>GCC uses heuristics to guess branch probabilities if they are
- not provided by profiling feedback (<samp>-fprofile-arcs</samp>). These
- heuristics are based on the control flow graph. If some branch probabilities
- are specified by <code>__builtin_expect</code>, then the heuristics are
- used to guess branch probabilities for the rest of the control flow graph,
- taking the <code>__builtin_expect</code> info into account. The interactions
- between the heuristics and <code>__builtin_expect</code> can be complex, and in
- some cases, it may be useful to disable the heuristics so that the effects
- of <code>__builtin_expect</code> are easier to understand.
- </p>
- <p>It is also possible to specify the expected probability of the expression
- with the <code>__builtin_expect_with_probability</code> built-in function.
- </p>
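<p>A short sketch of how <code>__builtin_expect</code> is typically used to
annotate an unlikely error path (an illustrative example, not from this
section of the manual):
</p>

```c
/* The second argument of __builtin_expect is the expected value of the
   first; here we tell the compiler the error branch is unlikely, so it
   lays out the hot path as fall-through code.  Requires GCC (or a
   compiler providing this built-in). */
int parse_len(int raw)
{
  if (__builtin_expect(raw < 0, 0))   /* error path: expected not taken */
    return -1;
  return raw * 2;
}
```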
- <p>The default is <samp>-fguess-branch-probability</samp> at levels
- <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-freorder-blocks</code></dt>
- <dd><a name="index-freorder_002dblocks"></a>
- <p>Reorder basic blocks in the compiled function in order to reduce the number
- of taken branches and improve code locality.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-freorder-blocks-algorithm=<var>algorithm</var></code></dt>
- <dd><a name="index-freorder_002dblocks_002dalgorithm"></a>
- <p>Use the specified algorithm for basic block reordering. The
- <var>algorithm</var> argument can be ‘<samp>simple</samp>’, which does not increase
- code size (except sometimes due to secondary effects like alignment),
- or ‘<samp>stc</samp>’, the “software trace cache” algorithm, which tries to
- put all often executed code together, minimizing the number of branches
- executed by making extra copies of code.
- </p>
- <p>The default is ‘<samp>simple</samp>’ at levels <samp>-O</samp>, <samp>-Os</samp>, and
- ‘<samp>stc</samp>’ at levels <samp>-O2</samp>, <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-freorder-blocks-and-partition</code></dt>
- <dd><a name="index-freorder_002dblocks_002dand_002dpartition"></a>
- <p>In addition to reordering basic blocks in the compiled function, in order
- to reduce the number of taken branches, this option partitions hot and cold
- basic blocks into separate sections of the assembly and <samp>.o</samp> files,
- to improve paging and cache locality performance.
- </p>
- <p>This optimization is automatically turned off in the presence of
- exception handling or unwind tables (on targets using setjmp/longjmp or a
- target-specific scheme), for linkonce sections, for functions with a
- user-defined section attribute and on any architecture that does not support
- named sections. When <samp>-fsplit-stack</samp> is used this option is not
- enabled by default (to avoid linker errors), but may be enabled
- explicitly (if using a working linker).
- </p>
- <p>Enabled for x86 at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-freorder-functions</code></dt>
- <dd><a name="index-freorder_002dfunctions"></a>
- <p>Reorder functions in the object file in order to
- improve code locality. This is implemented by using special
- subsections <code>.text.hot</code> for most frequently executed functions and
- <code>.text.unlikely</code> for unlikely executed functions. Reordering is done by
- the linker, so the object file format must support named sections and the
- linker must place them in a reasonable way.
- </p>
- <p>This option isn’t effective unless you either provide profile feedback
- (see <samp>-fprofile-arcs</samp> for details) or manually annotate functions with
- <code>hot</code> or <code>cold</code> attributes (see <a href="Common-Function-Attributes.html#Common-Function-Attributes">Common Function Attributes</a>).
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fstrict-aliasing</code></dt>
- <dd><a name="index-fstrict_002daliasing"></a>
- <p>Allow the compiler to assume the strictest aliasing rules applicable to
- the language being compiled. For C (and C++), this activates
- optimizations based on the type of expressions. In particular, an
- object of one type is assumed never to reside at the same address as an
- object of a different type, unless the types are almost the same. For
- example, an <code>unsigned int</code> can alias an <code>int</code>, but not a
- <code>void*</code> or a <code>double</code>. A character type may alias any other
- type.
- </p>
- <a name="Type_002dpunning"></a><p>Pay special attention to code like this:
- </p><div class="smallexample">
- <pre class="smallexample">union a_union {
- int i;
- double d;
- };
-
- int f() {
- union a_union t;
- t.d = 3.0;
- return t.i;
- }
- </pre></div>
- <p>The practice of reading from a different union member than the one most
- recently written to (called “type-punning”) is common. Even with
- <samp>-fstrict-aliasing</samp>, type-punning is allowed, provided the memory
- is accessed through the union type. So, the code above works as
- expected. See <a href="Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation">Structures unions enumerations and bit-fields implementation</a>. However, this code might not:
- </p><div class="smallexample">
- <pre class="smallexample">int f() {
- union a_union t;
- int* ip;
- t.d = 3.0;
- ip = &t.i;
- return *ip;
- }
- </pre></div>
-
- <p>Similarly, access by taking the address, casting the resulting pointer
- and dereferencing the result has undefined behavior, even if the cast
- uses a union type, e.g.:
- </p><div class="smallexample">
- <pre class="smallexample">int f() {
- double d = 3.0;
- return ((union a_union *) &d)->i;
- }
- </pre></div>
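<p>A well-defined alternative (a suggested sketch, not part of the manual's
example set) is to copy the object representation with <code>memcpy</code>;
GCC typically compiles a fixed-size <code>memcpy</code> to a single move
instruction:
</p>

```c
#include <string.h>
#include <stdint.h>

/* Inspecting a double's representation without an aliasing violation:
   memcpy the bytes into an integer instead of casting pointers. */
uint64_t double_bits(double d)
{
  uint64_t u;
  memcpy(&u, &d, sizeof u);   /* well-defined under -fstrict-aliasing */
  return u;
}
```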
-
- <p>The <samp>-fstrict-aliasing</samp> option is enabled at levels
- <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-falign-functions</code></dt>
- <dt><code>-falign-functions=<var>n</var></code></dt>
- <dt><code>-falign-functions=<var>n</var>:<var>m</var></code></dt>
- <dt><code>-falign-functions=<var>n</var>:<var>m</var>:<var>n2</var></code></dt>
- <dt><code>-falign-functions=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt>
- <dd><a name="index-falign_002dfunctions"></a>
- <p>Align the start of functions to the next power-of-two greater than or
- equal to <var>n</var>, skipping up to <var>m</var>-1 bytes. This ensures that at
- least the first <var>m</var> bytes of the function can be fetched by the CPU
- without crossing an <var>n</var>-byte alignment boundary.
- </p>
- <p>If <var>m</var> is not specified, it defaults to <var>n</var>.
- </p>
- <p>Examples: <samp>-falign-functions=32</samp> aligns functions to the next
- 32-byte boundary, <samp>-falign-functions=24</samp> aligns to the next
- 32-byte boundary only if this can be done by skipping 23 bytes or less,
- <samp>-falign-functions=32:7</samp> aligns to the next
- 32-byte boundary only if this can be done by skipping 6 bytes or less.
- </p>
- <p>The second pair of <var>n2</var>:<var>m2</var> values allows you to specify
- a secondary alignment: <samp>-falign-functions=64:7:32:3</samp> aligns to
- the next 64-byte boundary if this can be done by skipping 6 bytes or less,
- otherwise aligns to the next 32-byte boundary if this can be done
- by skipping 2 bytes or less.
- If <var>m2</var> is not specified, it defaults to <var>n2</var>.
- </p>
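- <p>The alignment decision can be sketched in C; the helper below is
- illustrative, not GCC source:
- </p>

```c
#include <stdint.h>

/* Sketch of -falign-functions=n:m (n a power of two): pad the current
   address to the next n-byte boundary, but only if that skips at most
   m-1 bytes; otherwise leave the address unchanged.  With a secondary
   n2:m2 pair, the same test would be retried with the smaller
   alignment when this one fails. */
static uintptr_t align_start(uintptr_t addr, uintptr_t n, uintptr_t m)
{
    uintptr_t aligned = (addr + n - 1) & ~(n - 1); /* next n-byte boundary */
    uintptr_t skip = aligned - addr;               /* padding bytes needed */
    return (skip < m) ? aligned : addr;
}
```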
- <p>Some assemblers only support this flag when <var>n</var> is a power of two;
- in that case, it is rounded up.
- </p>
- <p><samp>-fno-align-functions</samp> and <samp>-falign-functions=1</samp> are
- equivalent and mean that functions are not aligned.
- </p>
- <p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
- The maximum allowed <var>n</var> option value is 65536.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-flimit-function-alignment</code></dt>
- <dd><p>If this option is enabled, the compiler tries to avoid unnecessarily
- overaligning functions. It attempts to instruct the assembler to align
- by the amount specified by <samp>-falign-functions</samp>, but not to
- skip more bytes than the size of the function.
- </p>
- </dd>
- <dt><code>-falign-labels</code></dt>
- <dt><code>-falign-labels=<var>n</var></code></dt>
- <dt><code>-falign-labels=<var>n</var>:<var>m</var></code></dt>
- <dt><code>-falign-labels=<var>n</var>:<var>m</var>:<var>n2</var></code></dt>
- <dt><code>-falign-labels=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt>
- <dd><a name="index-falign_002dlabels"></a>
- <p>Align all branch targets to a power-of-two boundary.
- </p>
- <p>Parameters of this option are analogous to the <samp>-falign-functions</samp> option.
- <samp>-fno-align-labels</samp> and <samp>-falign-labels=1</samp> are
- equivalent and mean that labels are not aligned.
- </p>
- <p>If <samp>-falign-loops</samp> or <samp>-falign-jumps</samp> are applicable and
- are greater than this value, then their values are used instead.
- </p>
- <p>If <var>n</var> is not specified or is zero, use a machine-dependent default
- which is very likely to be ‘<samp>1</samp>’, meaning no alignment.
- The maximum allowed <var>n</var> option value is 65536.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-falign-loops</code></dt>
- <dt><code>-falign-loops=<var>n</var></code></dt>
- <dt><code>-falign-loops=<var>n</var>:<var>m</var></code></dt>
- <dt><code>-falign-loops=<var>n</var>:<var>m</var>:<var>n2</var></code></dt>
- <dt><code>-falign-loops=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt>
- <dd><a name="index-falign_002dloops"></a>
- <p>Align loops to a power-of-two boundary. If the loops are executed
- many times, this makes up for any execution of the dummy padding
- instructions.
- </p>
- <p>If <samp>-falign-labels</samp> is greater than this value, then its value
- is used instead.
- </p>
- <p>Parameters of this option are analogous to the <samp>-falign-functions</samp> option.
- <samp>-fno-align-loops</samp> and <samp>-falign-loops=1</samp> are
- equivalent and mean that loops are not aligned.
- The maximum allowed <var>n</var> option value is 65536.
- </p>
- <p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-falign-jumps</code></dt>
- <dt><code>-falign-jumps=<var>n</var></code></dt>
- <dt><code>-falign-jumps=<var>n</var>:<var>m</var></code></dt>
- <dt><code>-falign-jumps=<var>n</var>:<var>m</var>:<var>n2</var></code></dt>
- <dt><code>-falign-jumps=<var>n</var>:<var>m</var>:<var>n2</var>:<var>m2</var></code></dt>
- <dd><a name="index-falign_002djumps"></a>
- <p>Align branch targets to a power-of-two boundary, for branch targets
- that can only be reached by jumping. In this case,
- no dummy operations need be executed.
- </p>
- <p>If <samp>-falign-labels</samp> is greater than this value, then its value
- is used instead.
- </p>
- <p>Parameters of this option are analogous to the <samp>-falign-functions</samp> option.
- <samp>-fno-align-jumps</samp> and <samp>-falign-jumps=1</samp> are
- equivalent and mean that jump targets are not aligned.
- </p>
- <p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
- The maximum allowed <var>n</var> option value is 65536.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
- </p>
- </dd>
- <dt><code>-fno-allocation-dce</code></dt>
- <dd><a name="index-fno_002dallocation_002ddce"></a>
- <p>Do not remove unused C++ allocations in dead code elimination.
- </p>
- </dd>
- <dt><code>-fallow-store-data-races</code></dt>
- <dd><a name="index-fallow_002dstore_002ddata_002draces"></a>
- <p>Allow the compiler to introduce new data races on stores.
- </p>
- <p>Enabled at level <samp>-Ofast</samp>.
- </p>
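- <p>A typical case is a conditional store in a loop, which the compiler may
- turn into an unconditional store after the loop; the sketch below only
- illustrates the pattern (single-threaded behavior is unchanged):
- </p>

```c
/* With -fallow-store-data-races, GCC may keep 'last_positive' in a
   register and store it once after the loop, introducing a store --
   and hence a potential data race with other threads -- even when no
   element is positive.  Illustrative example, not from the manual. */
static int last_positive;

static int scan(const int *p, int n)
{
    for (int i = 0; i < n; i++)
        if (p[i] > 0)
            last_positive = p[i];
    return last_positive;
}
```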
- </dd>
- <dt><code>-funit-at-a-time</code></dt>
- <dd><a name="index-funit_002dat_002da_002dtime"></a>
- <p>This option is left for compatibility reasons. <samp>-funit-at-a-time</samp>
- has no effect, while <samp>-fno-unit-at-a-time</samp> implies
- <samp>-fno-toplevel-reorder</samp> and <samp>-fno-section-anchors</samp>.
- </p>
- <p>Enabled by default.
- </p>
- </dd>
- <dt><code>-fno-toplevel-reorder</code></dt>
- <dd><a name="index-fno_002dtoplevel_002dreorder"></a>
- <a name="index-ftoplevel_002dreorder"></a>
- <p>Do not reorder top-level functions, variables, and <code>asm</code>
- statements. Output them in the same order that they appear in the
- input file. When this option is used, unreferenced static variables
- are not removed. This option is intended to support existing code
- that relies on a particular ordering. For new code, it is better to
- use attributes when possible.
- </p>
- <p><samp>-ftoplevel-reorder</samp> is the default at <samp>-O1</samp> and higher, and
- also at <samp>-O0</samp> if <samp>-fsection-anchors</samp> is explicitly requested.
- Additionally <samp>-fno-toplevel-reorder</samp> implies
- <samp>-fno-section-anchors</samp>.
- </p>
- </dd>
- <dt><code>-fweb</code></dt>
- <dd><a name="index-fweb"></a>
- <p>Constructs webs as commonly used for register allocation purposes and assigns
- each web an individual pseudo register. This allows the register allocation pass
- to operate on pseudos directly, but also strengthens several other optimization
- passes, such as CSE, the loop optimizer, and the trivial dead-code remover. It can,
- however, make debugging impossible, since variables no longer stay in a
- “home register”.
- </p>
- <p>Enabled by default with <samp>-funroll-loops</samp>.
- </p>
- </dd>
- <dt><code>-fwhole-program</code></dt>
- <dd><a name="index-fwhole_002dprogram"></a>
- <p>Assume that the current compilation unit represents the whole program being
- compiled. All public functions and variables with the exception of <code>main</code>
- and those merged by attribute <code>externally_visible</code> become static functions
- and in effect are optimized more aggressively by interprocedural optimizers.
- </p>
- <p>This option should not be used in combination with <samp>-flto</samp>.
- Instead, rely on a linker plugin, which provides safer and more precise
- information.
- </p>
- </dd>
- <dt><code>-flto[=<var>n</var>]</code></dt>
- <dd><a name="index-flto"></a>
- <p>This option runs the standard link-time optimizer. When invoked
- with source code, it generates GIMPLE (one of GCC’s internal
- representations) and writes it to special ELF sections in the object
- file. When the object files are linked together, all the function
- bodies are read from these ELF sections and instantiated as if they
- had been part of the same translation unit.
- </p>
- <p>To use the link-time optimizer, <samp>-flto</samp> and optimization
- options should be specified at compile time and during the final link.
- It is recommended that you compile all the files participating in the
- same link with the same options and also specify those options at
- link time.
- For example:
- </p>
- <div class="smallexample">
- <pre class="smallexample">gcc -c -O2 -flto foo.c
- gcc -c -O2 -flto bar.c
- gcc -o myprog -flto -O2 foo.o bar.o
- </pre></div>
-
- <p>The first two invocations to GCC save a bytecode representation
- of GIMPLE into special ELF sections inside <samp>foo.o</samp> and
- <samp>bar.o</samp>. The final invocation reads the GIMPLE bytecode from
- <samp>foo.o</samp> and <samp>bar.o</samp>, merges the two files into a single
- internal image, and compiles the result as usual. Since both
- <samp>foo.o</samp> and <samp>bar.o</samp> are merged into a single image, this
- causes all the interprocedural analyses and optimizations in GCC to
- work across the two files as if they were a single one. This means,
- for example, that the inliner is able to inline functions in
- <samp>bar.o</samp> into functions in <samp>foo.o</samp> and vice-versa.
- </p>
- <p>Another (simpler) way to enable link-time optimization is:
- </p>
- <div class="smallexample">
- <pre class="smallexample">gcc -o myprog -flto -O2 foo.c bar.c
- </pre></div>
-
- <p>The above generates bytecode for <samp>foo.c</samp> and <samp>bar.c</samp>,
- merges them together into a single GIMPLE representation and optimizes
- them as usual to produce <samp>myprog</samp>.
- </p>
- <p>The important thing to keep in mind is that to enable link-time
- optimizations you need to use the GCC driver to perform the link step.
- GCC automatically performs link-time optimization if any of the
- objects involved were compiled with the <samp>-flto</samp> command-line option.
- You can always override
- the automatic decision to do link-time optimization
- by passing <samp>-fno-lto</samp> to the link command.
- </p>
- <p>To make whole program optimization effective, it is necessary to make
- certain whole program assumptions. The compiler needs to know
- what functions and variables can be accessed by libraries and runtime
- outside of the link-time optimized unit. When supported by the linker,
- the linker plugin (see <samp>-fuse-linker-plugin</samp>) passes information
- to the compiler about used and externally visible symbols. When
- the linker plugin is not available, <samp>-fwhole-program</samp> should be
- used to allow the compiler to make these assumptions, which leads
- to more aggressive optimization decisions.
- </p>
- <p>When a file is compiled with <samp>-flto</samp> without
- <samp>-fuse-linker-plugin</samp>, the generated object file is larger than
- a regular object file because it contains GIMPLE bytecodes and the usual
- final code (see <samp>-ffat-lto-objects</samp>). This means that
- object files with LTO information can be linked as normal object
- files; if <samp>-fno-lto</samp> is passed to the linker, no
- interprocedural optimizations are applied. Note that when
- <samp>-fno-fat-lto-objects</samp> is enabled the compile stage is faster
- but you cannot perform a regular, non-LTO link on them.
- </p>
- <p>When producing the final binary, GCC only
- applies link-time optimizations to those files that contain bytecode.
- Therefore, you can mix and match object files and libraries with
- GIMPLE bytecodes and final object code. GCC automatically selects
- which files to optimize in LTO mode and which files to link without
- further processing.
- </p>
- <p>Generally, options specified at link time override those
- specified at compile time, although in some cases GCC attempts to infer
- link-time options from the settings used to compile the input files.
- </p>
- <p>If you do not specify an optimization level option <samp>-O</samp> at
- link time, then GCC uses the highest optimization level
- used when compiling the object files. Note that it is generally
- ineffective to specify an optimization level option only at link time and
- not at compile time, for two reasons. First, compiling without
- optimization suppresses compiler passes that gather information
- needed for effective optimization at link time. Second, some early
- optimization passes can be performed only at compile time and
- not at link time.
- </p>
- <p>There are some code generation flags preserved by GCC when
- generating bytecodes, as they need to be used during the final link.
- Currently, the following options and their settings are taken from
- the first object file that explicitly specifies them:
- <samp>-fPIC</samp>, <samp>-fpic</samp>, <samp>-fpie</samp>, <samp>-fcommon</samp>,
- <samp>-fexceptions</samp>, <samp>-fnon-call-exceptions</samp>, <samp>-fgnu-tm</samp>
- and all the <samp>-m</samp> target flags.
- </p>
- <p>Certain ABI-changing flags are required to match in all compilation units,
- and trying to override this at link time with a conflicting value
- is ignored. This includes options such as <samp>-freg-struct-return</samp>
- and <samp>-fpcc-struct-return</samp>.
- </p>
- <p>Other options such as <samp>-ffp-contract</samp>, <samp>-fno-strict-overflow</samp>,
- <samp>-fwrapv</samp>, <samp>-fno-trapv</samp> or <samp>-fno-strict-aliasing</samp>
- are passed through to the link stage and merged conservatively for
- conflicting translation units. Specifically
- <samp>-fno-strict-overflow</samp>, <samp>-fwrapv</samp> and <samp>-fno-trapv</samp> take
- precedence; and for example <samp>-ffp-contract=off</samp> takes precedence
- over <samp>-ffp-contract=fast</samp>. You can override them at link time.
- </p>
- <p>Diagnostic options such as <samp>-Wstringop-overflow</samp> are passed
- through to the link stage and their setting matches that of the
- compile-step at function granularity. Note that this matters only
- for diagnostics emitted during optimization. Note that code
- transforms such as inlining can lead to warnings being enabled
- or disabled for regions of code whose setting is not consistent with
- the one used at compile time.
- </p>
- <p>When you need to pass options to the assembler via <samp>-Wa</samp> or
- <samp>-Xassembler</samp> make sure to either compile such translation
- units with <samp>-fno-lto</samp> or consistently use the same assembler
- options on all translation units. You can alternatively also
- specify assembler options at LTO link time.
- </p>
- <p>To enable debug info generation you need to supply <samp>-g</samp> at
- compile time. If any of the input files at link time were built
- with debug info generation enabled the link will enable debug info
- generation as well. Any elaborate debug info settings
- like the dwarf level <samp>-gdwarf-5</samp> need to be explicitly repeated
- at the linker command line and mixing different settings in different
- translation units is discouraged.
- </p>
- <p>If LTO encounters objects with C linkage declared with incompatible
- types in separate translation units to be linked together (undefined
- behavior according to ISO C99 6.2.7), a non-fatal diagnostic may be
- issued. The behavior is still undefined at run time. Similar
- diagnostics may be raised for other languages.
- </p>
- <p>Another feature of LTO is that it is possible to apply interprocedural
- optimizations on files written in different languages:
- </p>
- <div class="smallexample">
- <pre class="smallexample">gcc -c -flto foo.c
- g++ -c -flto bar.cc
- gfortran -c -flto baz.f90
- g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran
- </pre></div>
-
- <p>Notice that the final link is done with <code>g++</code> to get the C++
- runtime libraries and <samp>-lgfortran</samp> is added to get the Fortran
- runtime libraries. In general, when mixing languages in LTO mode, you
- should use the same link command options as when mixing languages in a
- regular (non-LTO) compilation.
- </p>
- <p>If object files containing GIMPLE bytecode are stored in a library archive, say
- <samp>libfoo.a</samp>, it is possible to extract and use them in an LTO link if you
- are using a linker with plugin support. To create static libraries suitable
- for LTO, use <code>gcc-ar</code> and <code>gcc-ranlib</code> instead of <code>ar</code>
- and <code>ranlib</code>;
- to show the symbols of object files with GIMPLE bytecode, use
- <code>gcc-nm</code>. Those commands require that <code>ar</code>, <code>ranlib</code>
- and <code>nm</code> have been compiled with plugin support. At link time, use the
- flag <samp>-fuse-linker-plugin</samp> to ensure that the library participates in
- the LTO optimization process:
- </p>
- <div class="smallexample">
- <pre class="smallexample">gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo
- </pre></div>
-
- <p>With the linker plugin enabled, the linker extracts the needed
- GIMPLE files from <samp>libfoo.a</samp> and passes them on to the running GCC
- to make them part of the aggregated GIMPLE image to be optimized.
- </p>
- <p>If you are not using a linker with plugin support and/or do not
- enable the linker plugin, then the objects inside <samp>libfoo.a</samp>
- are extracted and linked as usual, but they do not participate
- in the LTO optimization process. In order to make a static library suitable
- for both LTO optimization and usual linkage, compile its object files with
- <samp>-flto</samp> <samp>-ffat-lto-objects</samp>.
- </p>
- <p>Link-time optimizations do not require the presence of the whole program to
- operate. If the program does not require any symbols to be exported, it is
- possible to combine <samp>-flto</samp> and <samp>-fwhole-program</samp> to allow
- the interprocedural optimizers to use more aggressive assumptions which may
- lead to improved optimization opportunities.
- Use of <samp>-fwhole-program</samp> is not needed when the linker plugin is
- active (see <samp>-fuse-linker-plugin</samp>).
- </p>
- <p>The current implementation of LTO makes no
- attempt to generate bytecode that is portable between different
- types of hosts. The bytecode files are versioned and there is a
- strict version check, so bytecode files generated in one version of
- GCC do not work with an older or newer version of GCC.
- </p>
- <p>Link-time optimization does not work well with generation of debugging
- information on systems other than those using a combination of ELF and
- DWARF.
- </p>
- <p>If you specify the optional <var>n</var>, the optimization and code
- generation done at link time is executed in parallel using <var>n</var>
- parallel jobs by utilizing an installed <code>make</code> program. The
- environment variable <code>MAKE</code> may be used to override the program
- used.
- </p>
- <p>You can also specify <samp>-flto=jobserver</samp> to use GNU make’s
- job server mode to determine the number of parallel jobs. This
- is useful when the Makefile calling GCC is already executing in parallel.
- You must prepend a ‘<samp>+</samp>’ to the command recipe in the parent Makefile
- for this to work. This option likely only works if <code>MAKE</code> is
- GNU make. Even without the option value, GCC tries to automatically
- detect a running GNU make’s job server.
- </p>
- <p>Use <samp>-flto=auto</samp> to use GNU make’s job server, if available,
- or otherwise fall back to autodetection of the number of CPU threads
- present in your system.
- </p>
- </dd>
- <dt><code>-flto-partition=<var>alg</var></code></dt>
- <dd><a name="index-flto_002dpartition"></a>
- <p>Specify the partitioning algorithm used by the link-time optimizer.
- The value is either ‘<samp>1to1</samp>’ to specify a partitioning mirroring
- the original source files or ‘<samp>balanced</samp>’ to specify partitioning
- into equally sized chunks (whenever possible) or ‘<samp>max</samp>’ to create
- a new partition for every symbol where possible. The value ‘<samp>one</samp>’
- specifies that exactly one partition should be used, while ‘<samp>none</samp>’
- disables partitioning and streaming completely, executing the link-time
- optimization step directly from the WPA phase.
- The default value is ‘<samp>balanced</samp>’. While ‘<samp>1to1</samp>’ can be used
- as a workaround for various code-ordering issues, the ‘<samp>max</samp>’
- partitioning is intended for internal testing only.
- </p>
- </dd>
- <dt><code>-flto-compression-level=<var>n</var></code></dt>
- <dd><a name="index-flto_002dcompression_002dlevel"></a>
- <p>This option specifies the level of compression used for intermediate
- language written to LTO object files, and is only meaningful in
- conjunction with LTO mode (<samp>-flto</samp>). Valid
- values are 0 (no compression) to 9 (maximum compression). Values
- outside this range are clamped to either 0 or 9. If the option is not
- given, a default balanced compression setting is used.
- </p>
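- <p>The documented clamping behavior amounts to the following (illustrative
- helper, not GCC source):
- </p>

```c
/* -flto-compression-level=n: values outside the valid range 0..9
   are clamped to the nearest bound, per the documentation. */
static int clamp_compression_level(int n)
{
    if (n < 0) return 0;
    if (n > 9) return 9;
    return n;
}
```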
- </dd>
- <dt><code>-fuse-linker-plugin</code></dt>
- <dd><a name="index-fuse_002dlinker_002dplugin"></a>
- <p>Enables the use of a linker plugin during link-time optimization. This
- option relies on plugin support in the linker, which is available in gold
- or in GNU ld 2.21 or newer.
- </p>
- <p>This option enables the extraction of object files with GIMPLE bytecode out
- of library archives. This improves the quality of optimization by exposing
- more code to the link-time optimizer. This information specifies what
- symbols can be accessed externally (by a non-LTO object or during dynamic
- linking). Resulting code quality improvements on binaries (and shared
- libraries that use hidden visibility) are similar to <samp>-fwhole-program</samp>.
- See <samp>-flto</samp> for a description of the effect of this flag and how to
- use it.
- </p>
- <p>This option is enabled by default when LTO support in GCC is enabled
- and GCC was configured for use with
- a linker supporting plugins (GNU ld 2.21 or newer or gold).
- </p>
- </dd>
- <dt><code>-ffat-lto-objects</code></dt>
- <dd><a name="index-ffat_002dlto_002dobjects"></a>
- <p>Fat LTO objects are object files that contain both the intermediate language
- and the object code. This makes them usable for both LTO linking and normal
- linking. This option is effective only when compiling with <samp>-flto</samp>
- and is ignored at link time.
- </p>
- <p><samp>-fno-fat-lto-objects</samp> improves compilation time over plain LTO, but
- requires the complete toolchain to be aware of LTO. It requires a linker with
- linker plugin support for basic functionality. Additionally,
- <code>nm</code>, <code>ar</code> and <code>ranlib</code>
- need to support linker plugins to allow a full-featured build environment
- (capable of building static libraries etc). GCC provides the <code>gcc-ar</code>,
- <code>gcc-nm</code>, <code>gcc-ranlib</code> wrappers to pass the right options
- to these tools. With non-fat LTO, makefiles need to be modified to use them.
- </p>
- <p>Note that modern binutils provide a plugin auto-load mechanism.
- Installing the linker plugin into <samp>$libdir/bfd-plugins</samp> has the same
- effect as using the command wrappers (<code>gcc-ar</code>, <code>gcc-nm</code> and
- <code>gcc-ranlib</code>).
- </p>
- <p>The default is <samp>-fno-fat-lto-objects</samp> on targets with linker plugin
- support.
- </p>
- </dd>
- <dt><code>-fcompare-elim</code></dt>
- <dd><a name="index-fcompare_002delim"></a>
- <p>After register allocation and post-register allocation instruction splitting,
- identify arithmetic instructions that compute processor flags similar to a
- comparison operation based on that arithmetic. If possible, eliminate the
- explicit comparison operation.
- </p>
- <p>This pass only applies to certain targets that cannot explicitly represent
- the comparison operation before register allocation is complete.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fcprop-registers</code></dt>
- <dd><a name="index-fcprop_002dregisters"></a>
- <p>After register allocation and post-register allocation instruction splitting,
- perform a copy-propagation pass to try to reduce scheduling dependencies
- and occasionally eliminate the copy.
- </p>
- <p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-fprofile-correction</code></dt>
- <dd><a name="index-fprofile_002dcorrection"></a>
- <p>Profiles collected using an instrumented binary for multi-threaded programs may
- be inconsistent due to missed counter updates. When this option is specified,
- GCC uses heuristics to correct or smooth out such inconsistencies. By
- default, GCC emits an error message when an inconsistent profile is detected.
- </p>
- <p>This option is enabled by <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-fprofile-partial-training</code></dt>
- <dd><a name="index-fprofile_002dpartial_002dtraining"></a>
- <p>With <code>-fprofile-use</code>, all portions of programs not executed during the
- train run are optimized aggressively for size rather than speed. In some cases it is
- not practical to train all possible hot paths in the program. (For
- example, a program may contain functions specific to a given hardware variant,
- and training may not cover all hardware configurations the program is run on.) With
- <code>-fprofile-partial-training</code>, profile feedback is ignored for all
- functions not executed during the train run, so they are optimized as if
- they were compiled without profile feedback. This leads to better performance
- when the train run is not representative, but also to significantly bigger
- code.
- </p>
- </dd>
- <dt><code>-fprofile-use</code></dt>
- <dt><code>-fprofile-use=<var>path</var></code></dt>
- <dd><a name="index-fprofile_002duse"></a>
- <p>Enable profile feedback-directed optimizations,
- and the following optimizations, many of which
- are generally profitable only with profile feedback available:
- </p>
- <div class="smallexample">
- <pre class="smallexample">-fbranch-probabilities -fprofile-values
- -funroll-loops -fpeel-loops -ftracer -fvpt
- -finline-functions -fipa-cp -fipa-cp-clone -fipa-bit-cp
- -fpredictive-commoning -fsplit-loops -funswitch-loops
- -fgcse-after-reload -ftree-loop-vectorize -ftree-slp-vectorize
- -fvect-cost-model=dynamic -ftree-loop-distribute-patterns
- -fprofile-reorder-functions
- </pre></div>
-
- <p>Before you can use this option, you must first generate profiling information.
- See <a href="Instrumentation-Options.html#Instrumentation-Options">Instrumentation Options</a>, for information about the
- <samp>-fprofile-generate</samp> option.
- </p>
- <p>By default, GCC emits an error message if the feedback profiles do not
- match the source code. This error can be turned into a warning by using
- <samp>-Wno-error=coverage-mismatch</samp>. Note this may result in poorly
- optimized code. Additionally, by default, GCC also emits a warning message if
- the feedback profiles do not exist (see <samp>-Wmissing-profile</samp>).
- </p>
- <p>If <var>path</var> is specified, GCC looks at the <var>path</var> to find
- the profile feedback data files. See <samp>-fprofile-dir</samp>.
- </p>
- </dd>
- <dt><code>-fauto-profile</code></dt>
- <dt><code>-fauto-profile=<var>path</var></code></dt>
- <dd><a name="index-fauto_002dprofile"></a>
- <p>Enable sampling-based feedback-directed optimizations,
- and the following optimizations,
- many of which are generally profitable only with profile feedback available:
- </p>
- <div class="smallexample">
- <pre class="smallexample">-fbranch-probabilities -fprofile-values
- -funroll-loops -fpeel-loops -ftracer -fvpt
- -finline-functions -fipa-cp -fipa-cp-clone -fipa-bit-cp
- -fpredictive-commoning -fsplit-loops -funswitch-loops
- -fgcse-after-reload -ftree-loop-vectorize -ftree-slp-vectorize
- -fvect-cost-model=dynamic -ftree-loop-distribute-patterns
- -fprofile-correction
- </pre></div>
-
- <p><var>path</var> is the name of a file containing AutoFDO profile information.
- If omitted, it defaults to <samp>fbdata.afdo</samp> in the current directory.
- </p>
- <p>Producing an AutoFDO profile data file requires running your program
- with the <code>perf</code> utility on a supported GNU/Linux target system.
- For more information, see <a href="https://perf.wiki.kernel.org/">https://perf.wiki.kernel.org/</a>.
- </p>
- <p>E.g.
- </p><div class="smallexample">
- <pre class="smallexample">perf record -e br_inst_retired:near_taken -b -o perf.data \
- -- your_program
- </pre></div>
-
- <p>Then use the <code>create_gcov</code> tool to convert the raw profile data
- to a format that can be used by GCC. You must also supply the
- unstripped binary for your program to this tool.
- See <a href="https://github.com/google/autofdo">https://github.com/google/autofdo</a>.
- </p>
- <p>E.g.
- </p><div class="smallexample">
- <pre class="smallexample">create_gcov --binary=your_program.unstripped --profile=perf.data \
- --gcov=profile.afdo
- </pre></div>
- </dd>
- </dl>
-
- <p>The following options control compiler behavior regarding floating-point
- arithmetic. These options trade off between speed and
- correctness. All must be specifically enabled.
- </p>
- <dl compact="compact">
- <dt><code>-ffloat-store</code></dt>
- <dd><a name="index-ffloat_002dstore"></a>
- <p>Do not store floating-point variables in registers, and inhibit other
- options that might change whether a floating-point value is taken from a
- register or memory.
- </p>
- <a name="index-floating_002dpoint-precision"></a>
- <p>This option prevents undesirable excess precision on machines such as
- the 68000 where the floating registers (of the 68881) keep more
- precision than a <code>double</code> is supposed to have. Similarly for the
- x86 architecture. For most programs, the excess precision is harmless or
- even beneficial, but a few programs rely on the precise definition of IEEE
- floating point. Use <samp>-ffloat-store</samp> for such programs, after modifying
- them to store all pertinent intermediate computations into variables.
- </p>
- </dd>
- <dt><code>-fexcess-precision=<var>style</var></code></dt>
- <dd><a name="index-fexcess_002dprecision"></a>
- <p>This option allows further control over excess precision on machines
- where floating-point operations occur in a format with more precision or
- range than the IEEE standard and interchange floating-point types. By
- default, <samp>-fexcess-precision=fast</samp> is in effect; this means that
- operations may be carried out in a wider precision than the types specified
- in the source if that would result in faster code, and it is unpredictable
- when rounding to the types specified in the source code takes place.
- When compiling C, if <samp>-fexcess-precision=standard</samp> is specified then
- excess precision follows the rules specified in ISO C99; in particular,
- both casts and assignments cause values to be rounded to their
- semantic types (whereas <samp>-ffloat-store</samp> only affects
- assignments). This option is enabled by default for C if a strict
- conformance option such as <samp>-std=c99</samp> is used.
- <samp>-ffast-math</samp> enables <samp>-fexcess-precision=fast</samp> by default
- regardless of whether a strict conformance option is used.
- </p>
- <a name="index-mfpmath"></a>
- <p><samp>-fexcess-precision=standard</samp> is not implemented for languages
- other than C. On the x86, it has no effect if <samp>-mfpmath=sse</samp>
- or <samp>-mfpmath=sse+387</samp> is specified; in the former case, IEEE
- semantics apply without excess precision, and in the latter, rounding
- is unpredictable.
- </p>
- </dd>
- <dt><code>-ffast-math</code></dt>
- <dd><a name="index-ffast_002dmath"></a>
- <p>Sets the options <samp>-fno-math-errno</samp>, <samp>-funsafe-math-optimizations</samp>,
- <samp>-ffinite-math-only</samp>, <samp>-fno-rounding-math</samp>,
- <samp>-fno-signaling-nans</samp>, <samp>-fcx-limited-range</samp> and
- <samp>-fexcess-precision=fast</samp>.
- </p>
- <p>This option causes the preprocessor macro <code>__FAST_MATH__</code> to be defined.
- </p>
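- <p>Code can use the macro to detect the mode at compile time, e.g.:
- </p>

```c
/* Detect -ffast-math at compile time via __FAST_MATH__.  The function
   returns 0 when compiled with the default strict math semantics. */
static int fast_math_enabled(void)
{
#ifdef __FAST_MATH__
    return 1;   /* compiled with -ffast-math (or -Ofast) */
#else
    return 0;   /* default: strict IEEE/ISO math semantics assumed */
#endif
}
```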
- <p>This option is not turned on by any <samp>-O</samp> option besides
- <samp>-Ofast</samp> since it can result in incorrect output for programs
- that depend on an exact implementation of IEEE or ISO rules/specifications
- for math functions. It may, however, yield faster code for programs
- that do not require the guarantees of these specifications.
- </p>
- </dd>
- <dt><code>-fno-math-errno</code></dt>
- <dd><a name="index-fno_002dmath_002derrno"></a>
- <a name="index-fmath_002derrno"></a>
- <p>Do not set <code>errno</code> after calling math functions that are executed
- with a single instruction, e.g., <code>sqrt</code>. A program that relies on
- IEEE exceptions for math error handling may want to use this flag
- for speed while maintaining IEEE arithmetic compatibility.
- </p>
- <p>This option is not turned on by any <samp>-O</samp> option since
- it can result in incorrect output for programs that depend on
- an exact implementation of IEEE or ISO rules/specifications for
- math functions. It may, however, yield faster code for programs
- that do not require the guarantees of these specifications.
- </p>
- <p>The default is <samp>-fmath-errno</samp>.
- </p>
- <p>On Darwin systems, the math library never sets <code>errno</code>. There is
- therefore no reason for the compiler to consider the possibility that
- it might, and <samp>-fno-math-errno</samp> is the default.
- </p>
- </dd>
- <dt><code>-funsafe-math-optimizations</code></dt>
- <dd><a name="index-funsafe_002dmath_002doptimizations"></a>
-
- <p>Allow optimizations for floating-point arithmetic that (a) assume
- that arguments and results are valid and (b) may violate IEEE or
- ANSI standards. When used at link time, it may include libraries
- or startup files that change the default FPU control word, or perform other
- similar optimizations.
- </p>
- <p>This option is not turned on by any <samp>-O</samp> option since
- it can result in incorrect output for programs that depend on
- an exact implementation of IEEE or ISO rules/specifications for
- math functions. It may, however, yield faster code for programs
- that do not require the guarantees of these specifications.
- Enables <samp>-fno-signed-zeros</samp>, <samp>-fno-trapping-math</samp>,
- <samp>-fassociative-math</samp> and <samp>-freciprocal-math</samp>.
- </p>
- <p>The default is <samp>-fno-unsafe-math-optimizations</samp>.
- </p>
- </dd>
- <dt><code>-fassociative-math</code></dt>
- <dd><a name="index-fassociative_002dmath"></a>
-
- <p>Allow re-association of operands in series of floating-point operations.
- This violates the ISO C and C++ language standard by possibly changing
- computation result. NOTE: re-ordering may change the sign of zero as
- well as ignore NaNs and inhibit or create underflow or overflow (and
- thus cannot be used on code that relies on rounding behavior like
- <code>(x + 2**52) - 2**52</code>). May also reorder floating-point comparisons
- and thus may not be used when ordered comparisons are required.
- This option requires that both <samp>-fno-signed-zeros</samp> and
- <samp>-fno-trapping-math</samp> be in effect. Moreover, it doesn’t make
- much sense with <samp>-frounding-math</samp>. For Fortran the option
- is automatically enabled when both <samp>-fno-signed-zeros</samp> and
- <samp>-fno-trapping-math</samp> are in effect.
- </p>
- <p>The default is <samp>-fno-associative-math</samp>.
- </p>
- </dd>
- <dt><code>-freciprocal-math</code></dt>
- <dd><a name="index-freciprocal_002dmath"></a>
-
- <p>Allow the reciprocal of a value to be used instead of dividing by
- the value if this enables optimizations. For example <code>x / y</code>
- can be replaced with <code>x * (1/y)</code>, which is useful if <code>(1/y)</code>
- is subject to common subexpression elimination. Note that this loses
- precision and increases the number of flops operating on the value.
- </p>
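<p>A hand-written sketch (not compiler output) of the rewrite this option
permits, with the divisor hoisted out of a loop:</p>

```c
/* One division replaces n of them, at some cost in precision. */
void scale (float *a, int n, float y)
{
  float inv = 1.0f / y;        /* the common subexpression, computed once */
  for (int i = 0; i < n; i++)
    a[i] *= inv;               /* instead of a[i] /= y on every iteration */
}
```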
- <p>The default is <samp>-fno-reciprocal-math</samp>.
- </p>
- </dd>
- <dt><code>-ffinite-math-only</code></dt>
- <dd><a name="index-ffinite_002dmath_002donly"></a>
- <p>Allow optimizations for floating-point arithmetic that assume
- that arguments and results are not NaNs or +-Infs.
- </p>
- <p>This option is not turned on by any <samp>-O</samp> option since
- it can result in incorrect output for programs that depend on
- an exact implementation of IEEE or ISO rules/specifications for
- math functions. It may, however, yield faster code for programs
- that do not require the guarantees of these specifications.
- </p>
- <p>The default is <samp>-fno-finite-math-only</samp>.
- </p>
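<p>For example (an illustration, not from the manual), the usual NaN
self-comparison test stops working under this option, since the compiler
may fold it away:</p>

```c
/* Under -ffinite-math-only GCC may reduce this to "return 0;",
   because x is assumed never to be NaN.  Keep the default
   -fno-finite-math-only if such checks must keep working. */
int my_is_nan (double x)
{
  return x != x;               /* true only for NaN under IEEE rules */
}
```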
- </dd>
- <dt><code>-fno-signed-zeros</code></dt>
- <dd><a name="index-fno_002dsigned_002dzeros"></a>
- <a name="index-fsigned_002dzeros"></a>
- <p>Allow optimizations for floating-point arithmetic that ignore the
- signedness of zero. IEEE arithmetic specifies the behavior of
- distinct +0.0 and -0.0 values, which then prohibits simplification
- of expressions such as x+0.0 or 0.0*x (even with <samp>-ffinite-math-only</samp>).
- This option implies that the sign of a zero result isn’t significant.
- </p>
- <p>The default is <samp>-fsigned-zeros</samp>.
- </p>
- </dd>
- <dt><code>-fno-trapping-math</code></dt>
- <dd><a name="index-fno_002dtrapping_002dmath"></a>
- <a name="index-ftrapping_002dmath"></a>
- <p>Compile code assuming that floating-point operations cannot generate
- user-visible traps. These traps include division by zero, overflow,
- underflow, inexact result and invalid operation. This option requires
- that <samp>-fno-signaling-nans</samp> be in effect. Setting this option may
- allow faster code if one relies on “non-stop” IEEE arithmetic, for example.
- </p>
- <p>This option should never be turned on by any <samp>-O</samp> option since
- it can result in incorrect output for programs that depend on
- an exact implementation of IEEE or ISO rules/specifications for
- math functions.
- </p>
- <p>The default is <samp>-ftrapping-math</samp>.
- </p>
- </dd>
- <dt><code>-frounding-math</code></dt>
- <dd><a name="index-frounding_002dmath"></a>
- <p>Disable transformations and optimizations that assume default floating-point
- rounding behavior. This is round-to-zero for all floating point
- to integer conversions, and round-to-nearest for all other arithmetic
- truncations. This option should be specified for programs that change
- the FP rounding mode dynamically, or that may be executed with a
- non-default rounding mode. This option disables constant folding of
- floating-point expressions at compile time (which may be affected by
- rounding mode) and arithmetic transformations that are unsafe in the
- presence of sign-dependent rounding modes.
- </p>
- <p>The default is <samp>-fno-rounding-math</samp>.
- </p>
- <p>This option is experimental and does not currently guarantee to
- disable all GCC optimizations that are affected by rounding mode.
- Future versions of GCC may provide finer control of this setting
- using C99’s <code>FENV_ACCESS</code> pragma. This command-line option
- will be used to specify the default state for <code>FENV_ACCESS</code>.
- </p>
- </dd>
- <dt><code>-fsignaling-nans</code></dt>
- <dd><a name="index-fsignaling_002dnans"></a>
- <p>Compile code assuming that IEEE signaling NaNs may generate user-visible
- traps during floating-point operations. Setting this option disables
- optimizations that may change the number of exceptions visible with
- signaling NaNs. This option implies <samp>-ftrapping-math</samp>.
- </p>
- <p>This option causes the preprocessor macro <code>__SUPPORT_SNAN__</code> to
- be defined.
- </p>
- <p>The default is <samp>-fno-signaling-nans</samp>.
- </p>
- <p>This option is experimental and does not currently guarantee to
- disable all GCC optimizations that affect signaling NaN behavior.
- </p>
- </dd>
- <dt><code>-fno-fp-int-builtin-inexact</code></dt>
- <dd><a name="index-fno_002dfp_002dint_002dbuiltin_002dinexact"></a>
- <a name="index-ffp_002dint_002dbuiltin_002dinexact"></a>
- <p>Do not allow the built-in functions <code>ceil</code>, <code>floor</code>,
- <code>round</code> and <code>trunc</code>, and their <code>float</code> and <code>long
- double</code> variants, to generate code that raises the “inexact”
- floating-point exception for noninteger arguments. ISO C99 and C11
- allow these functions to raise the “inexact” exception, but ISO/IEC
- TS 18661-1:2014, the C bindings to IEEE 754-2008, as integrated into
- ISO C2X, does not allow these functions to do so.
- </p>
- <p>The default is <samp>-ffp-int-builtin-inexact</samp>, allowing the
- exception to be raised, unless C2X or a later C standard is selected.
- This option does nothing unless <samp>-ftrapping-math</samp> is in effect.
- </p>
- <p>Even if <samp>-fno-fp-int-builtin-inexact</samp> is used, if the functions
- generate a call to a library function then the “inexact” exception
- may be raised if the library implementation does not follow TS 18661.
- </p>
- </dd>
- <dt><code>-fsingle-precision-constant</code></dt>
- <dd><a name="index-fsingle_002dprecision_002dconstant"></a>
- <p>Treat floating-point constants as single precision instead of
- implicitly converting them to double-precision constants.
- </p>
- </dd>
- <dt><code>-fcx-limited-range</code></dt>
- <dd><a name="index-fcx_002dlimited_002drange"></a>
- <p>When enabled, this option states that a range reduction step is not
- needed when performing complex division. Also, the result of a complex
- multiplication or division is not checked for <code>NaN
- + I*NaN</code>, and no attempt is made to rescue the situation in that case. The
- default is <samp>-fno-cx-limited-range</samp>, but is enabled by
- <samp>-ffast-math</samp>.
- </p>
- <p>This option controls the default setting of the ISO C99
- <code>CX_LIMITED_RANGE</code> pragma. Nevertheless, the option applies to
- all languages.
- </p>
- </dd>
- <dt><code>-fcx-fortran-rules</code></dt>
- <dd><a name="index-fcx_002dfortran_002drules"></a>
- <p>Complex multiplication and division follow Fortran rules. Range
- reduction is done as part of complex division, but the result of a complex
- multiplication or division is not checked for <code>NaN
- + I*NaN</code>, and no attempt is made to rescue the situation in that case.
- </p>
- <p>The default is <samp>-fno-cx-fortran-rules</samp>.
- </p>
- </dd>
- </dl>
-
- <p>The following options control optimizations that may improve
- performance, but are not enabled by any <samp>-O</samp> options. This
- section includes experimental options that may produce broken code.
- </p>
- <dl compact="compact">
- <dt><code>-fbranch-probabilities</code></dt>
- <dd><a name="index-fbranch_002dprobabilities"></a>
- <p>After running a program compiled with <samp>-fprofile-arcs</samp>
- (see <a href="Instrumentation-Options.html#Instrumentation-Options">Instrumentation Options</a>),
- you can compile it a second time using
- <samp>-fbranch-probabilities</samp>, to improve optimizations based on
- the number of times each branch was taken. When a program
- compiled with <samp>-fprofile-arcs</samp> exits, it saves arc execution
- counts to a file called <samp><var>sourcename</var>.gcda</samp> for each source
- file. The information in this data file is very dependent on the
- structure of the generated code, so you must use the same source code
- and the same optimization options for both compilations.
- </p>
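<p>The whole cycle can be sketched as follows (file names are examples
only):</p>

```shell
gcc -O2 -fprofile-arcs foo.c -o foo           # instrumented build
./foo                                         # running it writes foo.gcda
gcc -O2 -fbranch-probabilities foo.c -o foo   # rebuild using the counts
```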
- <p>With <samp>-fbranch-probabilities</samp>, GCC puts a
- ‘<samp>REG_BR_PROB</samp>’ note on each ‘<samp>JUMP_INSN</samp>’ and ‘<samp>CALL_INSN</samp>’.
- These can be used to improve optimization. Currently, they are only
- used in one place: in <samp>reorg.c</samp>, instead of guessing which path a
- branch is most likely to take, the ‘<samp>REG_BR_PROB</samp>’ values are used to
- exactly determine which path is taken more often.
- </p>
- <p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-fprofile-values</code></dt>
- <dd><a name="index-fprofile_002dvalues"></a>
- <p>If combined with <samp>-fprofile-arcs</samp>, it adds code so that some
- data about values of expressions in the program is gathered.
- </p>
- <p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered
- from profiling values of expressions for usage in optimizations.
- </p>
- <p>Enabled by <samp>-fprofile-generate</samp>, <samp>-fprofile-use</samp>, and
- <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-fprofile-reorder-functions</code></dt>
- <dd><a name="index-fprofile_002dreorder_002dfunctions"></a>
- <p>Function reordering based on profile instrumentation collects the
- time of first execution of each function and orders the functions
- in ascending order of that time.
- </p>
- <p>Enabled with <samp>-fprofile-use</samp>.
- </p>
- </dd>
- <dt><code>-fvpt</code></dt>
- <dd><a name="index-fvpt"></a>
- <p>If combined with <samp>-fprofile-arcs</samp>, this option instructs the compiler
- to add code to gather information about values of expressions.
- </p>
- <p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered
- and actually performs the optimizations based on them.
- Currently the optimizations include specialization of division operations
- using the knowledge about the value of the denominator.
- </p>
- <p>Enabled with <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-frename-registers</code></dt>
- <dd><a name="index-frename_002dregisters"></a>
- <p>Attempt to avoid false dependencies in scheduled code by making use
- of registers left over after register allocation. This optimization
- most benefits processors with lots of registers. Depending on the
- debug information format adopted by the target, however, it can
- make debugging impossible, since variables no longer stay in
- a “home register”.
- </p>
- <p>Enabled by default with <samp>-funroll-loops</samp>.
- </p>
- </dd>
- <dt><code>-fschedule-fusion</code></dt>
- <dd><a name="index-fschedule_002dfusion"></a>
- <p>Performs a target-dependent pass over the instruction stream to schedule
- instructions of the same type together, because the target machine can execute
- them more efficiently if they are adjacent to each other in the instruction flow.
- </p>
- <p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
- </p>
- </dd>
- <dt><code>-ftracer</code></dt>
- <dd><a name="index-ftracer"></a>
- <p>Perform tail duplication to enlarge superblock size. This transformation
- simplifies the control flow of the function allowing other optimizations to do
- a better job.
- </p>
- <p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-funroll-loops</code></dt>
- <dd><a name="index-funroll_002dloops"></a>
- <p>Unroll loops whose number of iterations can be determined at compile time or
- upon entry to the loop. <samp>-funroll-loops</samp> implies
- <samp>-frerun-cse-after-loop</samp>, <samp>-fweb</samp> and <samp>-frename-registers</samp>.
- It also turns on complete loop peeling (i.e. complete removal of loops with
- a small constant number of iterations). This option makes code larger, and may
- or may not make it run faster.
- </p>
- <p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
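<p>As a hand-written illustration (not compiler output), complete peeling
of a loop with a small known trip count amounts to:</p>

```c
void add4 (int *a, const int *b)
{
  /* Original loop: for (int i = 0; i < 4; i++) a[i] += b[i];
     With the trip count known at compile time, the loop control
     disappears entirely: */
  a[0] += b[0];
  a[1] += b[1];
  a[2] += b[2];
  a[3] += b[3];
}
```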
- </dd>
- <dt><code>-funroll-all-loops</code></dt>
- <dd><a name="index-funroll_002dall_002dloops"></a>
- <p>Unroll all loops, even if their number of iterations is uncertain when
- the loop is entered. This usually makes programs run more slowly.
- <samp>-funroll-all-loops</samp> implies the same options as
- <samp>-funroll-loops</samp>.
- </p>
- </dd>
- <dt><code>-fpeel-loops</code></dt>
- <dd><a name="index-fpeel_002dloops"></a>
- <p>Peels loops for which there is enough information that they do not
- roll much (from profile feedback or static analysis). It also turns on
- complete loop peeling (i.e. complete removal of loops with small constant
- number of iterations).
- </p>
- <p>Enabled by <samp>-O3</samp>, <samp>-fprofile-use</samp>, and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-fmove-loop-invariants</code></dt>
- <dd><a name="index-fmove_002dloop_002dinvariants"></a>
- <p>Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
- at level <samp>-O1</samp> and higher, except for <samp>-Og</samp>.
- </p>
- </dd>
- <dt><code>-fsplit-loops</code></dt>
- <dd><a name="index-fsplit_002dloops"></a>
- <p>Split a loop into two if it contains a condition that’s always true
- for one side of the iteration space and false for the other.
- </p>
- <p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-funswitch-loops</code></dt>
- <dd><a name="index-funswitch_002dloops"></a>
- <p>Move branches with loop invariant conditions out of the loop, with duplicates
- of the loop on both branches (modified according to result of the condition).
- </p>
- <p>Enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
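<p>Written out by hand, the transformation looks like this (an
illustration, not compiler output):</p>

```c
/* Before unswitching, the invariant condition is tested on every
   iteration:
     for (int i = 0; i < n; i++) a[i] *= flag ? 2.0 : 3.0;
   After, the test is hoisted and the loop duplicated: */
void scale_all (double *a, int n, int flag)
{
  if (flag)
    for (int i = 0; i < n; i++) a[i] *= 2.0;
  else
    for (int i = 0; i < n; i++) a[i] *= 3.0;
}
```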
- </dd>
- <dt><code>-fversion-loops-for-strides</code></dt>
- <dd><a name="index-fversion_002dloops_002dfor_002dstrides"></a>
- <p>If a loop iterates over an array with a variable stride, create another
- version of the loop that assumes the stride is always one. For example:
- </p>
- <div class="smallexample">
- <pre class="smallexample">for (int i = 0; i < n; ++i)
- x[i * stride] = …;
- </pre></div>
-
- <p>becomes:
- </p>
- <div class="smallexample">
- <pre class="smallexample">if (stride == 1)
- for (int i = 0; i < n; ++i)
- x[i] = …;
- else
- for (int i = 0; i < n; ++i)
- x[i * stride] = …;
- </pre></div>
-
- <p>This is particularly useful for assumed-shape arrays in Fortran where
- (for example) it allows better vectorization assuming contiguous accesses.
- This flag is enabled by default at <samp>-O3</samp>.
- It is also enabled by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
- </p>
- </dd>
- <dt><code>-ffunction-sections</code></dt>
- <dt><code>-fdata-sections</code></dt>
- <dd><a name="index-ffunction_002dsections"></a>
- <a name="index-fdata_002dsections"></a>
- <p>Place each function or data item into its own section in the output
- file if the target supports arbitrary sections. The name of the
- function or the name of the data item determines the section’s name
- in the output file.
- </p>
- <p>Use these options on systems where the linker can perform optimizations to
- improve locality of reference in the instruction space. Most systems using the
- ELF object format have linkers with such optimizations. On AIX, the linker
- rearranges sections (CSECTs) based on the call graph. The performance impact
- varies.
- </p>
- <p>Together with a linker garbage collection (linker <samp>--gc-sections</samp>
- option) these options may lead to smaller statically-linked executables (after
- stripping).
- </p>
- <p>On ELF/DWARF systems these options do not degenerate the quality of the debug
- information. There could be issues with other object files/debug info formats.
- </p>
- <p>Only use these options when there are significant benefits from doing so. When
- you specify these options, the assembler and linker create larger object and
- executable files and are also slower. These options affect code generation.
- They prevent optimizations by the compiler and assembler using relative
- locations inside a translation unit since the locations are unknown until
- link time. An example of such an optimization is relaxing calls to short call
- instructions.
- </p>
- </dd>
- <dt><code>-fstdarg-opt</code></dt>
- <dd><a name="index-fstdarg_002dopt"></a>
- <p>Optimize the prologue of variadic argument functions with respect to usage of
- those arguments.
- </p>
- </dd>
- <dt><code>-fsection-anchors</code></dt>
- <dd><a name="index-fsection_002danchors"></a>
- <p>Try to reduce the number of symbolic address calculations by using
- shared “anchor” symbols to address nearby objects. This transformation
- can help to reduce the number of GOT entries and GOT accesses on some
- targets.
- </p>
- <p>For example, the implementation of the following function <code>foo</code>:
- </p>
- <div class="smallexample">
- <pre class="smallexample">static int a, b, c;
- int foo (void) { return a + b + c; }
- </pre></div>
-
- <p>usually calculates the addresses of all three variables, but if you
- compile it with <samp>-fsection-anchors</samp>, it accesses the variables
- from a common anchor point instead. The effect is similar to the
- following pseudocode (which isn’t valid C):
- </p>
- <div class="smallexample">
- <pre class="smallexample">int foo (void)
- {
- register int *xr = &x;
- return xr[&a - &x] + xr[&b - &x] + xr[&c - &x];
- }
- </pre></div>
-
- <p>Not all targets support this option.
- </p>
- </dd>
- <dt><code>--param <var>name</var>=<var>value</var></code></dt>
- <dd><a name="index-param"></a>
- <p>In some places, GCC uses various constants to control the amount of
- optimization that is done. For example, GCC does not inline functions
- that contain more than a certain number of instructions. You can
- control some of these constants on the command line using the
- <samp>--param</samp> option.
- </p>
- <p>The names of specific parameters, and the meaning of the values, are
- tied to the internals of the compiler, and are subject to change
- without notice in future releases.
- </p>
- <p>To get the minimum, maximum and default value of a parameter,
- use the <samp>--help=param -Q</samp> options.
- </p>
- <p>In each case, the <var>value</var> is an integer. The following choices
- of <var>name</var> are recognized for all targets:
- </p>
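<p>For example (the parameter values here are arbitrary, chosen only to
show the syntax):</p>

```shell
# Values are arbitrary, shown only to illustrate the --param syntax.
gcc -O2 --param max-inline-insns-single=400 \
        --param inline-unit-growth=40 -c foo.c
```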
- <dl compact="compact">
- <dt><code>predictable-branch-outcome</code></dt>
- <dd><p>When a branch is predicted to be taken with a probability lower than this
- threshold (in percent), it is considered well predictable.
- </p>
- </dd>
- <dt><code>max-rtl-if-conversion-insns</code></dt>
- <dd><p>RTL if-conversion tries to remove conditional branches around a block and
- replace them with conditionally executed instructions. This parameter
- gives the maximum number of instructions in a block which should be
- considered for if-conversion. The compiler will
- also use other heuristics to decide whether if-conversion is likely to be
- profitable.
- </p>
- </dd>
- <dt><code>max-rtl-if-conversion-predictable-cost</code></dt>
- <dt><code>max-rtl-if-conversion-unpredictable-cost</code></dt>
- <dd><p>RTL if-conversion will try to remove conditional branches around a block
- and replace them with conditionally executed instructions. These parameters
- give the maximum permissible cost for the sequence that would be generated
- by if-conversion depending on whether the branch is statically determined
- to be predictable or not. The units for this parameter are the same as
- those for the GCC internal seq_cost metric. The compiler will try to
- provide a reasonable default for this parameter using the BRANCH_COST
- target macro.
- </p>
- </dd>
- <dt><code>max-crossjump-edges</code></dt>
- <dd><p>The maximum number of incoming edges to consider for cross-jumping.
- The algorithm used by <samp>-fcrossjumping</samp> is <em>O(N^2)</em> in
- the number of edges incoming to each block. Increasing values mean
- more aggressive optimization, making the compilation time increase with
- probably small improvement in executable size.
- </p>
- </dd>
- <dt><code>min-crossjump-insns</code></dt>
- <dd><p>The minimum number of instructions that must be matched at the end
- of two blocks before cross-jumping is performed on them. This
- value is ignored in the case where all instructions in the block being
- cross-jumped from are matched.
- </p>
- </dd>
- <dt><code>max-grow-copy-bb-insns</code></dt>
- <dd><p>The maximum code size expansion factor when copying basic blocks
- instead of jumping. The expansion is relative to a jump instruction.
- </p>
- </dd>
- <dt><code>max-goto-duplication-insns</code></dt>
- <dd><p>The maximum number of instructions to duplicate to a block that jumps
- to a computed goto. To avoid <em>O(N^2)</em> behavior in a number of
- passes, GCC factors computed gotos early in the compilation process,
- and unfactors them as late as possible. Only computed jumps at the
- end of a basic block with no more than max-goto-duplication-insns are
- unfactored.
- </p>
- </dd>
- <dt><code>max-delay-slot-insn-search</code></dt>
- <dd><p>The maximum number of instructions to consider when looking for an
- instruction to fill a delay slot. If more than this arbitrary number of
- instructions are searched, the time savings from filling the delay slot
- are minimal, so stop searching. Increasing values mean more
- aggressive optimization, making the compilation time increase with probably
- small improvement in execution time.
- </p>
- </dd>
- <dt><code>max-delay-slot-live-search</code></dt>
- <dd><p>When trying to fill delay slots, the maximum number of instructions to
- consider when searching for a block with valid live register
- information. Increasing this arbitrarily chosen value means more
- aggressive optimization, increasing the compilation time. This parameter
- should be removed when the delay slot code is rewritten to maintain the
- control-flow graph.
- </p>
- </dd>
- <dt><code>max-gcse-memory</code></dt>
- <dd><p>The approximate maximum amount of memory that can be allocated in
- order to perform the global common subexpression elimination
- optimization. If more memory than specified is required, the
- optimization is not done.
- </p>
- </dd>
- <dt><code>max-gcse-insertion-ratio</code></dt>
- <dd><p>If the ratio of expression insertions to deletions is larger than this value
- for any expression, then RTL PRE inserts or removes the expression and thus
- leaves partially redundant computations in the instruction stream.
- </p>
- </dd>
- <dt><code>max-pending-list-length</code></dt>
- <dd><p>The maximum number of pending dependencies scheduling allows
- before flushing the current state and starting over. Large functions
- with few branches or calls can create excessively large lists which
- needlessly consume memory and resources.
- </p>
- </dd>
- <dt><code>max-modulo-backtrack-attempts</code></dt>
- <dd><p>The maximum number of backtrack attempts the scheduler should make
- when modulo scheduling a loop. Larger values can exponentially increase
- compilation time.
- </p>
- </dd>
- <dt><code>max-inline-insns-single</code></dt>
- <dd><p>Several parameters control the tree inliner used in GCC. This number sets the
- maximum number of instructions (counted in GCC’s internal representation) in a
- single function that the tree inliner considers for inlining. This only
- affects functions declared inline and methods implemented in a class
- declaration (C++).
- </p>
-
- </dd>
- <dt><code>max-inline-insns-auto</code></dt>
- <dd><p>When you use <samp>-finline-functions</samp> (included in <samp>-O3</samp>),
- a lot of functions that would otherwise not be considered for inlining
- by the compiler are investigated. To those functions, a different
- (more restrictive) limit compared to functions declared inline can
- be applied (<samp>--param max-inline-insns-auto</samp>).
- </p>
- </dd>
- <dt><code>max-inline-insns-small</code></dt>
- <dd><p>This bound is applied to calls which are considered relevant with
- <samp>-finline-small-functions</samp>.
- </p>
- </dd>
- <dt><code>max-inline-insns-size</code></dt>
- <dd><p>This bound is applied to calls which are optimized for size. Small growth
- may be desirable to anticipate optimization opportunities exposed by inlining.
- </p>
- </dd>
- <dt><code>uninlined-function-insns</code></dt>
- <dd><p>The number of instructions accounted by the inliner for function overhead such as
- function prologue and epilogue.
- </p>
- </dd>
- <dt><code>uninlined-function-time</code></dt>
- <dd><p>Extra time accounted by the inliner for function overhead, such as the time
- needed to execute the function prologue and epilogue.
- </p>
- </dd>
- <dt><code>inline-heuristics-hint-percent</code></dt>
- <dd><p>The scale (in percent) applied to <samp>inline-insns-single</samp>,
- <samp>inline-insns-single-O2</samp> and <samp>inline-insns-auto</samp>
- when the inline heuristics hint that inlining is
- very profitable (that is, it will enable later optimizations).
- </p>
- </dd>
- <dt><code>uninlined-thunk-insns</code></dt>
- <dt><code>uninlined-thunk-time</code></dt>
- <dd><p>Same as <samp>--param uninlined-function-insns</samp> and
- <samp>--param uninlined-function-time</samp>, but applied to function thunks.
- </p>
- </dd>
- <dt><code>inline-min-speedup</code></dt>
- <dd><p>When estimated performance improvement of caller + callee runtime exceeds this
- threshold (in percent), the function can be inlined regardless of the limit on
- <samp>--param max-inline-insns-single</samp> and <samp>--param
- max-inline-insns-auto</samp>.
- </p>
- </dd>
- <dt><code>large-function-insns</code></dt>
- <dd><p>The limit specifying really large functions. For functions larger than this
- limit after inlining, inlining is constrained by
- <samp>--param large-function-growth</samp>. This parameter is useful primarily
- to avoid extreme compilation time caused by non-linear algorithms used by the
- back end.
- </p>
- </dd>
- <dt><code>large-function-growth</code></dt>
- <dd><p>Specifies the maximal growth of a large function caused by inlining, in percent.
- For example, parameter value 100 limits large function growth to 2.0 times
- the original size.
- </p>
- </dd>
- <dt><code>large-unit-insns</code></dt>
- <dd><p>The limit specifying a large translation unit. Growth caused by inlining of
- units larger than this limit is limited by <samp>--param inline-unit-growth</samp>.
- For small units this might be too tight.
- For example, consider a unit consisting of function A
- that is inline and B that just calls A three times. If B is small relative to
- A, the growth of the unit is 300% and yet such inlining is very sane. For very
- large units consisting of small inlineable functions, however, the overall unit
- growth limit is needed to avoid exponential explosion of code size. Thus for
- smaller units, the size is increased to <samp>--param large-unit-insns</samp>
- before applying <samp>--param inline-unit-growth</samp>.
- </p>
- </dd>
- <dt><code>inline-unit-growth</code></dt>
- <dd><p>Specifies maximal overall growth of the compilation unit caused by inlining.
- For example, parameter value 20 limits unit growth to 1.2 times the original
- size. Cold functions (either marked cold via an attribute or by profile
- feedback) are not accounted into the unit size.
- </p>
- </dd>
- <dt><code>ipa-cp-unit-growth</code></dt>
- <dd><p>Specifies maximal overall growth of the compilation unit caused by
- interprocedural constant propagation. For example, parameter value 10 limits
- unit growth to 1.1 times the original size.
- </p>
- </dd>
- <dt><code>large-stack-frame</code></dt>
- <dd><p>The limit specifying large stack frames. While inlining, the algorithm
- tries not to grow past this limit too much.
- </p>
- </dd>
- <dt><code>large-stack-frame-growth</code></dt>
- <dd><p>Specifies the maximal growth of large stack frames caused by inlining, in percent.
- For example, parameter value 1000 limits large stack frame growth to 11 times
- the original size.
- </p>
- </dd>
- <dt><code>max-inline-insns-recursive</code></dt>
- <dt><code>max-inline-insns-recursive-auto</code></dt>
- <dd><p>Specifies the maximum number of instructions an out-of-line copy of a
- self-recursive inline
- function can grow into by performing recursive inlining.
- </p>
- <p><samp>--param max-inline-insns-recursive</samp> applies to functions
- declared inline.
- For functions not declared inline, recursive inlining
- happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is
- enabled; <samp>--param max-inline-insns-recursive-auto</samp> applies instead.
- </p>
- </dd>
- <dt><code>max-inline-recursive-depth</code></dt>
- <dt><code>max-inline-recursive-depth-auto</code></dt>
- <dd><p>Specifies the maximum recursion depth used for recursive inlining.
- </p>
- <p><samp>--param max-inline-recursive-depth</samp> applies to functions
- declared inline. For functions not declared inline, recursive inlining
- happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is
- enabled; <samp>--param max-inline-recursive-depth-auto</samp> applies instead.
- </p>
- </dd>
- <dt><code>min-inline-recursive-probability</code></dt>
- <dd><p>Recursive inlining is profitable only for functions having deep recursion
- on average and can hurt functions having little recursion depth by
- increasing the prologue size or the complexity of the function body to other
- optimizers.
- </p>
- <p>When profile feedback is available (see <samp>-fprofile-generate</samp>) the actual
- recursion depth can be guessed from the probability that function recurses
- via a given call expression. This parameter limits inlining only to call
- expressions whose probability exceeds the given threshold (in percent).
- </p>
- </dd>
- <dt><code>early-inlining-insns</code></dt>
- <dd><p>Specify growth that the early inliner can make. In effect it increases
- the amount of inlining for code having a large abstraction penalty.
- </p>
- </dd>
- <dt><code>max-early-inliner-iterations</code></dt>
- <dd><p>Limit on the number of iterations of the early inliner. This basically bounds
- the number of nested indirect calls the early inliner can resolve.
- Deeper chains are still handled by late inlining.
- </p>
- </dd>
- <dt><code>comdat-sharing-probability</code></dt>
- <dd><p>Probability (in percent) that C++ inline functions with comdat visibility
- are shared across multiple compilation units.
- </p>
- </dd>
- <dt><code>profile-func-internal-id</code></dt>
- <dd><p>A parameter to control whether to use the function's internal id in profile
- database lookup. If the value is 0, the compiler uses an id that
- is based on the function's assembler name and filename, which makes old profile
- data more tolerant to source changes such as function reordering.
- </p>
- </dd>
- <dt><code>min-vect-loop-bound</code></dt>
- <dd><p>The minimum number of iterations under which loops are not vectorized
- when <samp>-ftree-vectorize</samp> is used. The number of iterations after
- vectorization needs to be greater than the value specified by this option
- to allow vectorization.
- </p>
- </dd>
- <dt><code>gcse-cost-distance-ratio</code></dt>
- <dd><p>Scaling factor in calculation of maximum distance an expression
- can be moved by GCSE optimizations. This is currently supported only in the
- code hoisting pass. The bigger the ratio, the more aggressive code hoisting
- is with simple expressions, i.e., the expressions that have cost
- less than <samp>gcse-unrestricted-cost</samp>. Specifying 0 disables
- hoisting of simple expressions.
- </p>
- </dd>
- <dt><code>gcse-unrestricted-cost</code></dt>
- <dd><p>Cost, roughly measured as the cost of a single typical machine
- instruction, at which GCSE optimizations do not constrain
- the distance an expression can travel. This is currently
- supported only in the code hoisting pass. The smaller the cost,
- the more aggressive code hoisting is. Specifying 0
- allows all expressions to travel unrestricted distances.
- </p>
- </dd>
- <dt><code>max-hoist-depth</code></dt>
- <dd><p>The depth of search in the dominator tree for expressions to hoist.
- This is used to avoid quadratic behavior in the hoisting algorithm.
- A value of 0 does not limit the search, but may slow down compilation
- of huge functions.
- </p>
- </dd>
- <dt><code>max-tail-merge-comparisons</code></dt>
- <dd><p>The maximum number of similar basic blocks to compare a basic block with.
- This is used to avoid quadratic behavior in tree tail merging.
- </p>
- </dd>
- <dt><code>max-tail-merge-iterations</code></dt>
- <dd><p>The maximum number of iterations of the pass over the function. This is used to
- limit compilation time in tree tail merging.
- </p>
- </dd>
- <dt><code>store-merging-allow-unaligned</code></dt>
- <dd><p>Allow the store merging pass to introduce unaligned stores if it is legal to
- do so.
- </p>
- </dd>
- <dt><code>max-stores-to-merge</code></dt>
- <dd><p>The maximum number of stores to attempt to merge into wider stores in the store
- merging pass.
- </p>
- </dd>
- <dt><code>max-unrolled-insns</code></dt>
- <dd><p>The maximum number of instructions that a loop may have to be unrolled.
- If a loop is unrolled, this parameter also determines how many times
- the loop code is unrolled.
- </p>
- </dd>
- <dt><code>max-average-unrolled-insns</code></dt>
- <dd><p>The maximum number of instructions biased by probabilities of their execution
- that a loop may have to be unrolled. If a loop is unrolled,
- this parameter also determines how many times the loop code is unrolled.
- </p>
- </dd>
- <dt><code>max-unroll-times</code></dt>
- <dd><p>The maximum number of unrollings of a single loop.
- </p>
- </dd>
- <dt><code>max-peeled-insns</code></dt>
- <dd><p>The maximum number of instructions that a loop may have to be peeled.
- If a loop is peeled, this parameter also determines how many times
- the loop code is peeled.
- </p>
- </dd>
- <dt><code>max-peel-times</code></dt>
- <dd><p>The maximum number of peelings of a single loop.
- </p>
- </dd>
- <dt><code>max-peel-branches</code></dt>
- <dd><p>The maximum number of branches on the hot path through the peeled sequence.
- </p>
- </dd>
- <dt><code>max-completely-peeled-insns</code></dt>
- <dd><p>The maximum number of insns of a completely peeled loop.
- </p>
- </dd>
- <dt><code>max-completely-peel-times</code></dt>
- <dd><p>The maximum number of iterations of a loop to be suitable for complete peeling.
- </p>
- </dd>
- <dt><code>max-completely-peel-loop-nest-depth</code></dt>
- <dd><p>The maximum depth of a loop nest suitable for complete peeling.
- </p>
- </dd>
- <dt><code>max-unswitch-insns</code></dt>
- <dd><p>The maximum number of insns of an unswitched loop.
- </p>
- </dd>
- <dt><code>max-unswitch-level</code></dt>
- <dd><p>The maximum number of branches unswitched in a single loop.
- </p>
- </dd>
- <dt><code>lim-expensive</code></dt>
- <dd><p>The minimum cost of an expensive expression in the loop invariant motion.
- </p>
- </dd>
- <dt><code>min-loop-cond-split-prob</code></dt>
- <dd><p>When FDO profile information is available, <samp>min-loop-cond-split-prob</samp>
- specifies the minimum threshold for the probability of a semi-invariant condition
- statement to trigger loop splitting.
- </p>
- </dd>
- <dt><code>iv-consider-all-candidates-bound</code></dt>
- <dd><p>Bound on number of candidates for induction variables, below which
- all candidates are considered for each use in induction variable
- optimizations. If there are more candidates than this,
- only the most relevant ones are considered to avoid quadratic time complexity.
- </p>
- </dd>
- <dt><code>iv-max-considered-uses</code></dt>
- <dd><p>The induction variable optimizations give up on loops that contain more
- induction variable uses than this.
- </p>
- </dd>
- <dt><code>iv-always-prune-cand-set-bound</code></dt>
- <dd><p>If the number of candidates in the set is smaller than this value,
- always try to remove unnecessary ivs from the set
- when adding a new one.
- </p>
- </dd>
- <dt><code>avg-loop-niter</code></dt>
- <dd><p>Average number of iterations of a loop.
- </p>
- </dd>
- <dt><code>dse-max-object-size</code></dt>
- <dd><p>Maximum size (in bytes) of objects tracked bytewise by dead store elimination.
- Larger values may result in larger compilation times.
- </p>
- </dd>
- <dt><code>dse-max-alias-queries-per-store</code></dt>
- <dd><p>Maximum number of queries into the alias oracle per store.
- Larger values result in larger compilation times and may result in more
- removed dead stores.
- </p>
- </dd>
- <dt><code>scev-max-expr-size</code></dt>
- <dd><p>Bound on size of expressions used in the scalar evolutions analyzer.
- Large expressions slow the analyzer.
- </p>
- </dd>
- <dt><code>scev-max-expr-complexity</code></dt>
- <dd><p>Bound on the complexity of the expressions in the scalar evolutions analyzer.
- Complex expressions slow the analyzer.
- </p>
- </dd>
- <dt><code>max-tree-if-conversion-phi-args</code></dt>
- <dd><p>Maximum number of arguments in a PHI supported by tree if-conversion
- unless the loop is marked with the simd pragma.
- </p>
- </dd>
- <dt><code>vect-max-version-for-alignment-checks</code></dt>
- <dd><p>The maximum number of run-time checks that can be performed when
- doing loop versioning for alignment in the vectorizer.
- </p>
- </dd>
- <dt><code>vect-max-version-for-alias-checks</code></dt>
- <dd><p>The maximum number of run-time checks that can be performed when
- doing loop versioning for alias in the vectorizer.
- </p>
- </dd>
- <dt><code>vect-max-peeling-for-alignment</code></dt>
- <dd><p>The maximum number of loop peels to enhance access alignment
- for the vectorizer. A value of -1 means no limit.
- </p>
- </dd>
- <dt><code>max-iterations-to-track</code></dt>
- <dd><p>The maximum number of iterations of a loop the brute-force algorithm
- for analysis of the number of iterations of the loop tries to evaluate.
- </p>
- </dd>
- <dt><code>hot-bb-count-fraction</code></dt>
- <dd><p>The denominator n of fraction 1/n of the maximal execution count of a
- basic block in the entire program that a basic block needs to at least
- have in order to be considered hot. The default is 10000, which means
- that a basic block is considered hot if its execution count is greater
- than 1/10000 of the maximal execution count. 0 means that it is never
- considered hot. Used in non-LTO mode.
- </p>
- </dd>
- <dt><code>hot-bb-count-ws-permille</code></dt>
- <dd><p>The number of most executed permilles, ranging from 0 to 1000, of the
- profiled execution of the entire program of which the execution count
- of a basic block must be part in order to be considered hot. The
- default is 990, which means that a basic block is considered hot if
- its execution count contributes to the upper 990 permilles, or 99.0%,
- of the profiled execution of the entire program. 0 means that it is
- never considered hot. Used in LTO mode.
- </p>
- </dd>
- <dt><code>hot-bb-frequency-fraction</code></dt>
- <dd><p>The denominator n of fraction 1/n of the execution frequency of the
- entry block of a function that a basic block of this function needs
- to at least have in order to be considered hot. The default is 1000,
- which means that a basic block is considered hot in a function if it
- is executed more frequently than 1/1000 of the frequency of the entry
- block of the function. 0 means that it is never considered hot.
- </p>
- </dd>
- <dt><code>unlikely-bb-count-fraction</code></dt>
- <dd><p>The denominator n of fraction 1/n of the number of profiled runs of
- the entire program below which the execution count of a basic block
- must be in order for the basic block to be considered unlikely executed.
- The default is 20, which means that a basic block is considered unlikely
- executed if it is executed in fewer than 1/20, or 5%, of the runs of
- the program. 0 means that it is always considered unlikely executed.
- </p>
- </dd>
- <dt><code>max-predicted-iterations</code></dt>
- <dd><p>The maximum number of loop iterations we predict statically. This is useful
- in cases where a function contains a single loop with known bound and
- another loop with unknown bound.
- The known number of iterations is predicted correctly, while
- the unknown number of iterations averages to roughly 10. This means that the
- loop without bounds appears artificially cold relative to the other one.
- </p>
- </dd>
- <dt><code>builtin-expect-probability</code></dt>
- <dd><p>Control the probability of the expression having the specified value. This
- parameter takes a percentage (i.e. 0 ... 100) as input.
- </p>
- </dd>
- <dt><code>builtin-string-cmp-inline-length</code></dt>
- <dd><p>The maximum length of a constant string for a builtin string cmp call
- eligible for inlining.
- </p>
- </dd>
- <dt><code>align-threshold</code></dt>
- <dd>
- <p>Select fraction of the maximal frequency of executions of a basic block in
- a function to align the basic block.
- </p>
- </dd>
- <dt><code>align-loop-iterations</code></dt>
- <dd>
- <p>A loop expected to iterate at least the selected number of iterations is
- aligned.
- </p>
- </dd>
- <dt><code>tracer-dynamic-coverage</code></dt>
- <dt><code>tracer-dynamic-coverage-feedback</code></dt>
- <dd>
- <p>This value is used to limit superblock formation once the given percentage of
- executed instructions is covered. This limits unnecessary code size
- expansion.
- </p>
- <p>The <samp>tracer-dynamic-coverage-feedback</samp> parameter
- is used only when profile
- feedback is available. The real profiles (as opposed to statically estimated
- ones) are much less balanced, allowing the threshold to have a larger value.
- </p>
- </dd>
- <dt><code>tracer-max-code-growth</code></dt>
- <dd><p>Stop tail duplication once code growth has reached the given percentage. This is
- a rather artificial limit, as most of the duplicates are eliminated later by
- cross jumping, so it may be set to values much higher than the desired code
- growth.
- </p>
- </dd>
- <dt><code>tracer-min-branch-ratio</code></dt>
- <dd>
- <p>Stop reverse growth when the reverse probability of the best edge is less than this
- threshold (in percent).
- </p>
- </dd>
- <dt><code>tracer-min-branch-probability</code></dt>
- <dt><code>tracer-min-branch-probability-feedback</code></dt>
- <dd>
- <p>Stop forward growth if the best edge has probability lower than this
- threshold.
- </p>
- <p>Similarly to <samp>tracer-dynamic-coverage</samp> two parameters are
- provided. <samp>tracer-min-branch-probability-feedback</samp> is used for
- compilation with profile feedback and <samp>tracer-min-branch-probability</samp>
- compilation without. The value for compilation with profile feedback
- needs to be more conservative (higher) in order to make tracer
- effective.
- </p>
- </dd>
- <dt><code>stack-clash-protection-guard-size</code></dt>
- <dd><p>Specify the size of the operating system provided stack guard as
- 2 raised to <var>num</var> bytes. Higher values may reduce the
- number of explicit probes, but a value larger than the operating system
- provided guard will leave code vulnerable to stack clash style attacks.
- </p>
- </dd>
- <dt><code>stack-clash-protection-probe-interval</code></dt>
- <dd><p>Stack clash protection involves probing stack space as it is allocated. This
- param controls the maximum distance between probes into the stack as 2 raised
- to <var>num</var> bytes. Higher values may reduce the number of explicit probes, but a value
- larger than the operating system provided guard will leave code vulnerable to
- stack clash style attacks.
- </p>
- </dd>
- <dt><code>max-cse-path-length</code></dt>
- <dd>
- <p>The maximum number of basic blocks on a path that CSE considers.
- </p>
- </dd>
- <dt><code>max-cse-insns</code></dt>
- <dd><p>The maximum number of instructions CSE processes before flushing.
- </p>
- </dd>
- <dt><code>ggc-min-expand</code></dt>
- <dd>
- <p>GCC uses a garbage collector to manage its own memory allocation. This
- parameter specifies the minimum percentage by which the garbage
- collector’s heap should be allowed to expand between collections.
- Tuning this may improve compilation speed; it has no effect on code
- generation.
- </p>
- <p>The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when
- RAM >= 1GB. If <code>getrlimit</code> is available, the notion of “RAM” is
- the smallest of actual RAM and <code>RLIMIT_DATA</code> or <code>RLIMIT_AS</code>. If
- GCC is not able to calculate RAM on a particular platform, the lower
- bound of 30% is used. Setting this parameter and
- <samp>ggc-min-heapsize</samp> to zero causes a full collection to occur at
- every opportunity. This is extremely slow, but can be useful for
- debugging.
- </p>
- </dd>
- <dt><code>ggc-min-heapsize</code></dt>
- <dd>
- <p>Minimum size of the garbage collector’s heap before it begins bothering
- to collect garbage. The first collection occurs after the heap expands
- by <samp>ggc-min-expand</samp>% beyond <samp>ggc-min-heapsize</samp>. Again,
- tuning this may improve compilation speed, and has no effect on code
- generation.
- </p>
- <p>The default is the smaller of RAM/8, RLIMIT_RSS, or a limit that
- tries to ensure that RLIMIT_DATA or RLIMIT_AS are not exceeded, but
- with a lower bound of 4096 (four megabytes) and an upper bound of
- 131072 (128 megabytes). If GCC is not able to calculate RAM on a
- particular platform, the lower bound is used. Setting this parameter
- very large effectively disables garbage collection. Setting this
- parameter and <samp>ggc-min-expand</samp> to zero causes a full collection
- to occur at every opportunity.
- </p>
- </dd>
- <dt><code>max-reload-search-insns</code></dt>
- <dd><p>The maximum number of instructions reload should look backward for an equivalent
- register. Increasing values mean more aggressive optimization, making the
- compilation time increase with probably slightly better performance.
- </p>
- </dd>
- <dt><code>max-cselib-memory-locations</code></dt>
- <dd><p>The maximum number of memory locations cselib should take into account.
- Increasing values mean more aggressive optimization, making the compilation time
- increase with probably slightly better performance.
- </p>
- </dd>
- <dt><code>max-sched-ready-insns</code></dt>
- <dd><p>The maximum number of instructions ready to be issued the scheduler should
- consider at any given time during the first scheduling pass. Increasing
- values mean more thorough searches, making the compilation time increase
- with probably little benefit.
- </p>
- </dd>
- <dt><code>max-sched-region-blocks</code></dt>
- <dd><p>The maximum number of blocks in a region to be considered for
- interblock scheduling.
- </p>
- </dd>
- <dt><code>max-pipeline-region-blocks</code></dt>
- <dd><p>The maximum number of blocks in a region to be considered for
- pipelining in the selective scheduler.
- </p>
- </dd>
- <dt><code>max-sched-region-insns</code></dt>
- <dd><p>The maximum number of insns in a region to be considered for
- interblock scheduling.
- </p>
- </dd>
- <dt><code>max-pipeline-region-insns</code></dt>
- <dd><p>The maximum number of insns in a region to be considered for
- pipelining in the selective scheduler.
- </p>
- </dd>
- <dt><code>min-spec-prob</code></dt>
- <dd><p>The minimum probability (in percent) of reaching a source block
- for interblock speculative scheduling.
- </p>
- </dd>
- <dt><code>max-sched-extend-regions-iters</code></dt>
- <dd><p>The maximum number of iterations through CFG to extend regions.
- A value of 0 disables region extensions.
- </p>
- </dd>
- <dt><code>max-sched-insn-conflict-delay</code></dt>
- <dd><p>The maximum conflict delay for an insn to be considered for speculative motion.
- </p>
- </dd>
- <dt><code>sched-spec-prob-cutoff</code></dt>
- <dd><p>The minimal probability of speculation success (in percent), so that
- speculative insns are scheduled.
- </p>
- </dd>
- <dt><code>sched-state-edge-prob-cutoff</code></dt>
- <dd><p>The minimum probability an edge must have for the scheduler to save its
- state across it.
- </p>
- </dd>
- <dt><code>sched-mem-true-dep-cost</code></dt>
- <dd><p>Minimal distance (in CPU cycles) between a store and a load targeting the same
- memory location.
- </p>
- </dd>
- <dt><code>selsched-max-lookahead</code></dt>
- <dd><p>The maximum size of the lookahead window of selective scheduling. It is a
- depth of search for available instructions.
- </p>
- </dd>
- <dt><code>selsched-max-sched-times</code></dt>
- <dd><p>The maximum number of times that an instruction is scheduled during
- selective scheduling. This is the limit on the number of iterations
- through which the instruction may be pipelined.
- </p>
- </dd>
- <dt><code>selsched-insns-to-rename</code></dt>
- <dd><p>The maximum number of best instructions in the ready list that are considered
- for renaming in the selective scheduler.
- </p>
- </dd>
- <dt><code>sms-min-sc</code></dt>
- <dd><p>The minimum value of stage count that the swing modulo scheduler
- generates.
- </p>
- </dd>
- <dt><code>max-last-value-rtl</code></dt>
- <dd><p>The maximum size, measured as the number of RTLs, that can be recorded in an expression
- in the combiner for a pseudo register as the last known value of that register.
- </p>
- </dd>
- <dt><code>max-combine-insns</code></dt>
- <dd><p>The maximum number of instructions the RTL combiner tries to combine.
- </p>
- </dd>
- <dt><code>integer-share-limit</code></dt>
- <dd><p>Small integer constants can use a shared data structure, reducing the
- compiler’s memory usage and increasing its speed. This sets the maximum
- value of a shared integer constant.
- </p>
- </dd>
- <dt><code>ssp-buffer-size</code></dt>
- <dd><p>The minimum size of buffers (i.e. arrays) that receive stack smashing
- protection when <samp>-fstack-protector</samp> is used.
- </p>
- </dd>
- <dt><code>min-size-for-stack-sharing</code></dt>
- <dd><p>The minimum size of variables taking part in stack slot sharing when not
- optimizing.
- </p>
- </dd>
- <dt><code>max-jump-thread-duplication-stmts</code></dt>
- <dd><p>Maximum number of statements allowed in a block that needs to be
- duplicated when threading jumps.
- </p>
- </dd>
- <dt><code>max-fields-for-field-sensitive</code></dt>
- <dd><p>Maximum number of fields in a structure treated in
- a field sensitive manner during pointer analysis.
- </p>
- </dd>
- <dt><code>prefetch-latency</code></dt>
- <dd><p>Estimate of the average number of instructions that are executed before
- the prefetch finishes. The distance prefetched ahead is proportional
- to this constant. Increasing this number may also lead to fewer
- streams being prefetched (see <samp>simultaneous-prefetches</samp>).
- </p>
- </dd>
- <dt><code>simultaneous-prefetches</code></dt>
- <dd><p>Maximum number of prefetches that can run at the same time.
- </p>
- </dd>
- <dt><code>l1-cache-line-size</code></dt>
- <dd><p>The size of cache line in L1 data cache, in bytes.
- </p>
- </dd>
- <dt><code>l1-cache-size</code></dt>
- <dd><p>The size of L1 data cache, in kilobytes.
- </p>
- </dd>
- <dt><code>l2-cache-size</code></dt>
- <dd><p>The size of L2 data cache, in kilobytes.
- </p>
- </dd>
- <dt><code>prefetch-dynamic-strides</code></dt>
- <dd><p>Whether the loop array prefetch pass should issue software prefetch hints
- for strides that are non-constant. In some cases this may be
- beneficial, though the fact the stride is non-constant may make it
- hard to predict when there is clear benefit to issuing these hints.
- </p>
- <p>Set to 1 if the prefetch hints should be issued for non-constant
- strides. Set to 0 if prefetch hints should be issued only for strides that
- are known to be constant and below <samp>prefetch-minimum-stride</samp>.
- </p>
- </dd>
- <dt><code>prefetch-minimum-stride</code></dt>
- <dd><p>Minimum constant stride, in bytes, to start using prefetch hints for. If
- the stride is less than this threshold, prefetch hints will not be issued.
- </p>
- <p>This setting is useful for processors that have hardware prefetchers, in
- which case there may be conflicts between the hardware prefetchers and
- the software prefetchers. If the hardware prefetchers have a maximum
- stride they can handle, it should be used here to improve the use of
- software prefetchers.
- </p>
- <p>A value of -1 means we don’t have a threshold and therefore
- prefetch hints can be issued for any constant stride.
- </p>
- <p>This setting is only useful for strides that are known and constant.
- </p>
- </dd>
- <dt><code>loop-interchange-max-num-stmts</code></dt>
- <dd><p>The maximum number of stmts in a loop to be interchanged.
- </p>
- </dd>
- <dt><code>loop-interchange-stride-ratio</code></dt>
- <dd><p>The minimum ratio between stride of two loops for interchange to be profitable.
- </p>
- </dd>
- <dt><code>min-insn-to-prefetch-ratio</code></dt>
- <dd><p>The minimum ratio between the number of instructions and the
- number of prefetches to enable prefetching in a loop.
- </p>
- </dd>
- <dt><code>prefetch-min-insn-to-mem-ratio</code></dt>
- <dd><p>The minimum ratio between the number of instructions and the
- number of memory references to enable prefetching in a loop.
- </p>
- </dd>
- <dt><code>use-canonical-types</code></dt>
- <dd><p>Whether the compiler should use the “canonical” type system.
- Should always be 1, which uses a more efficient internal
- mechanism for comparing types in C++ and Objective-C++. However, if
- bugs in the canonical type system are causing compilation failures,
- set this value to 0 to disable canonical types.
- </p>
- </dd>
- <dt><code>switch-conversion-max-branch-ratio</code></dt>
- <dd><p>Switch initialization conversion refuses to create arrays that are
- bigger than <samp>switch-conversion-max-branch-ratio</samp> times the number of
- branches in the switch.
- </p>
- </dd>
- <dt><code>max-partial-antic-length</code></dt>
- <dd><p>Maximum length of the partial antic set computed during the tree
- partial redundancy elimination optimization (<samp>-ftree-pre</samp>) when
- optimizing at <samp>-O3</samp> and above. For some sorts of source code
- the enhanced partial redundancy elimination optimization can run away,
- consuming all of the memory available on the host machine. This
- parameter sets a limit on the length of the sets that are computed,
- which prevents the runaway behavior. Setting a value of 0 for
- this parameter allows an unlimited set length.
- </p>
- </dd>
- <dt><code>rpo-vn-max-loop-depth</code></dt>
- <dd><p>Maximum loop depth that is value-numbered optimistically.
- When the limit is hit, the innermost
- <var>rpo-vn-max-loop-depth</var> loops and the outermost loop in the
- loop nest are value-numbered optimistically and the remaining ones are not.
- </p>
- </dd>
- <dt><code>sccvn-max-alias-queries-per-access</code></dt>
- <dd><p>Maximum number of alias-oracle queries we perform when looking for
- redundancies for loads and stores. If this limit is hit the search
- is aborted and the load or store is not considered redundant. The
- number of queries is algorithmically limited to the number of
- stores on all paths from the load to the function entry.
- </p>
- </dd>
- <dt><code>ira-max-loops-num</code></dt>
- <dd><p>IRA uses regional register allocation by default. If a function
- contains more loops than the number given by this parameter, only at most
- the given number of the most frequently-executed loops form regions
- for regional register allocation.
- </p>
- </dd>
- <dt><code>ira-max-conflict-table-size</code></dt>
- <dd><p>Although IRA uses a sophisticated algorithm to compress the conflict
- table, the table can still require excessive amounts of memory for
- huge functions. If the conflict table for a function could be more
- than the size in MB given by this parameter, the register allocator
- instead uses a faster, simpler, and lower-quality
- algorithm that does not require building a pseudo-register conflict table.
- </p>
- </dd>
- <dt><code>ira-loop-reserved-regs</code></dt>
- <dd><p>IRA can be used to evaluate more accurate register pressure in loops
- for decisions to move loop invariants (see <samp>-O3</samp>). The number
- of available registers reserved for some other purposes is given
- by this parameter. The default value of the parameter
- is the best found from numerous experiments.
- </p>
- </dd>
- <dt><code>lra-inheritance-ebb-probability-cutoff</code></dt>
- <dd><p>LRA tries to reuse values reloaded in registers in subsequent insns.
- This optimization is called inheritance. EBB is used as a region to
- do this optimization. The parameter defines a minimal fall-through
- edge probability in percentage used to add BB to inheritance EBB in
- LRA. The default value was chosen
- from numerous runs of SPEC2000 on x86-64.
- </p>
- </dd>
- <dt><code>loop-invariant-max-bbs-in-loop</code></dt>
- <dd><p>Loop invariant motion can be very expensive, both in compilation time and
- in amount of needed compile-time memory, with very large loops. Loops
- with more basic blocks than this parameter won’t have loop invariant
- motion optimization performed on them.
- </p>
- </dd>
- <dt><code>loop-max-datarefs-for-datadeps</code></dt>
- <dd><p>Building data dependencies is expensive for very large loops. This
- parameter limits the number of data references in loops that are
- considered for data dependence analysis. These large loops are not
- handled by the optimizations using loop data dependencies.
- </p>
- </dd>
- <dt><code>max-vartrack-size</code></dt>
- <dd><p>Sets a maximum number of hash table slots to use during variable
- tracking dataflow analysis of any function. If this limit is exceeded
- with variable tracking at assignments enabled, analysis for that
- function is retried without it, after removing all debug insns from
- the function. If the limit is exceeded even without debug insns, var
- tracking analysis is completely disabled for the function. Setting
- the parameter to zero makes it unlimited.
- </p>
- </dd>
- <dt><code>max-vartrack-expr-depth</code></dt>
- <dd><p>Sets a maximum number of recursion levels when attempting to map
- variable names or debug temporaries to value expressions. This trades
- compilation time for more complete debug information. If this is set too
- low, value expressions that are available and could be represented in
- debug information may end up not being used; setting this higher may
- enable the compiler to find more complex debug expressions, but compile
- time and memory use may grow.
- </p>
- </dd>
- <dt><code>max-debug-marker-count</code></dt>
- <dd><p>Sets a threshold on the number of debug markers (e.g. begin stmt
- markers) to avoid complexity explosion at inlining or expanding to RTL.
- If a function has more such gimple stmts than the set limit, such stmts
- will be dropped from the inlined copy of a function, and from its RTL
- expansion.
- </p>
- </dd>
- <dt><code>min-nondebug-insn-uid</code></dt>
- <dd><p>Use uids starting at this parameter for nondebug insns. The range below
- the parameter is reserved exclusively for debug insns created by
- <samp>-fvar-tracking-assignments</samp>, but debug insns may get
- (non-overlapping) uids above it if the reserved range is exhausted.
- </p>
- </dd>
- <dt><code>ipa-sra-ptr-growth-factor</code></dt>
- <dd><p>IPA-SRA replaces a pointer to an aggregate with one or more new
- parameters only when their cumulative size is less than or equal to
- <samp>ipa-sra-ptr-growth-factor</samp> times the size of the original
- pointer parameter.
- </p>
- </dd>
- <dt><code>ipa-sra-max-replacements</code></dt>
- <dd><p>Maximum pieces of an aggregate that IPA-SRA tracks. As a
- consequence, it is also the maximum number of replacements of a formal
- parameter.
- </p>
- </dd>
- <dt><code>sra-max-scalarization-size-Ospeed</code></dt>
- <dt><code>sra-max-scalarization-size-Osize</code></dt>
- <dd><p>The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to
- replace scalar parts of aggregates with uses of independent scalar
- variables. These parameters control the maximum size, in storage units,
- of aggregate which is considered for replacement when compiling for
- speed
- (<samp>sra-max-scalarization-size-Ospeed</samp>) or size
- (<samp>sra-max-scalarization-size-Osize</samp>) respectively.
- </p>
- </dd>
- <dt><code>sra-max-propagations</code></dt>
- <dd><p>The maximum number of artificial accesses that Scalar Replacement of
- Aggregates (SRA) will track, per one local variable, in order to
- facilitate copy propagation.
- </p>
- </dd>
- <dt><code>tm-max-aggregate-size</code></dt>
- <dd><p>When making copies of thread-local variables in a transaction, this
- parameter specifies the size in bytes after which variables are
- saved with the logging functions as opposed to save/restore code
- sequence pairs. This option only applies when using
- <samp>-fgnu-tm</samp>.
- </p>
- </dd>
- <dt><code>graphite-max-nb-scop-params</code></dt>
- <dd><p>To avoid exponential effects in the Graphite loop transforms, the
- number of parameters in a Static Control Part (SCoP) is bounded.
- A value of zero can be used to lift
- the bound. A variable whose value is unknown at compilation time and
- defined outside a SCoP is a parameter of the SCoP.
- </p>
- </dd>
- <dt><code>loop-block-tile-size</code></dt>
- <dd><p>Loop blocking or strip mining transforms, enabled with
- <samp>-floop-block</samp> or <samp>-floop-strip-mine</samp>, strip mine each
- loop in the loop nest by a given number of iterations. The strip
- length can be changed using the <samp>loop-block-tile-size</samp>
- parameter.
- </p>
- </dd>
- <dt><code>ipa-cp-value-list-size</code></dt>
- <dd><p>IPA-CP attempts to track all possible values and types passed to a function’s
- parameter in order to propagate them and perform devirtualization.
- <samp>ipa-cp-value-list-size</samp> is the maximum number of values and types it
- stores per one formal parameter of a function.
- </p>
- </dd>
- <dt><code>ipa-cp-eval-threshold</code></dt>
- <dd><p>IPA-CP calculates its own score of cloning profitability heuristics
- and performs those cloning opportunities with scores that exceed
- <samp>ipa-cp-eval-threshold</samp>.
- </p>
- </dd>
- <dt><code>ipa-cp-max-recursive-depth</code></dt>
- <dd><p>Maximum depth of recursive cloning for a self-recursive function.
- </p>
- </dd>
- <dt><code>ipa-cp-min-recursive-probability</code></dt>
- <dd><p>Perform recursive cloning only when the probability of the call being
- executed exceeds the parameter.
- </p>
- </dd>
- <dt><code>ipa-cp-recursion-penalty</code></dt>
- <dd><p>Percentage penalty the recursive functions will receive when they
- are evaluated for cloning.
- </p>
- </dd>
- <dt><code>ipa-cp-single-call-penalty</code></dt>
- <dd><p>Percentage penalty functions containing a single call to another
- function will receive when they are evaluated for cloning.
- </p>
- </dd>
- <dt><code>ipa-max-agg-items</code></dt>
- <dd><p>IPA-CP is also capable of propagating a number of scalar values passed
- in an aggregate. <samp>ipa-max-agg-items</samp> controls the maximum
- number of such values per one parameter.
- </p>
- </dd>
- <dt><code>ipa-cp-loop-hint-bonus</code></dt>
- <dd><p>When IPA-CP determines that a cloning candidate would make the number
- of iterations of a loop known, it adds a bonus of
- <samp>ipa-cp-loop-hint-bonus</samp> to the profitability score of
- the candidate.
- </p>
- </dd>
- <dt><code>ipa-max-aa-steps</code></dt>
- <dd><p>During its analysis of function bodies, IPA-CP employs alias analysis
- in order to track values pointed to by function parameters. In order
- not to spend too much time analyzing huge functions, it gives up and
- considers all memory clobbered after examining
- <samp>ipa-max-aa-steps</samp> statements modifying memory.
- </p>
- </dd>
- <dt><code>ipa-max-switch-predicate-bounds</code></dt>
- <dd><p>Maximal number of boundary endpoints of the case ranges of a switch statement.
- For a switch exceeding this limit, IPA-CP does not construct a cloning cost
- predicate, which is used to estimate cloning benefit, for the default case
- of the switch statement.
- </p>
- </dd>
- <dt><code>ipa-max-param-expr-ops</code></dt>
- <dd><p>IPA-CP analyzes conditional statements that reference a function
- parameter to estimate the benefit of cloning for a certain constant value.
- But if the number of operations in a parameter expression exceeds
- <samp>ipa-max-param-expr-ops</samp>, the expression is treated as too
- complicated and is not handled by IPA analysis.
- </p>
- </dd>
- <dt><code>lto-partitions</code></dt>
- <dd><p>Specify the desired number of partitions produced during WHOPR compilation.
- The number of partitions should exceed the number of CPUs used for compilation.
- </p>
- </dd>
- <dt><code>lto-min-partition</code></dt>
- <dd><p>Size of minimal partition for WHOPR (in estimated instructions).
- This prevents expenses of splitting very small programs into too many
- partitions.
- </p>
- </dd>
- <dt><code>lto-max-partition</code></dt>
- <dd><p>Size of the maximal partition for WHOPR (in estimated instructions),
- intended to provide an upper bound on the size of an individual partition.
- Meant to be used only with balanced partitioning.
- </p>
- </dd>
- <dt><code>lto-max-streaming-parallelism</code></dt>
- <dd><p>Maximal number of parallel processes used for LTO streaming.
- </p>
- </dd>
- <dt><code>cxx-max-namespaces-for-diagnostic-help</code></dt>
- <dd><p>The maximum number of namespaces to consult for suggestions when C++
- name lookup fails for an identifier.
- </p>
- </dd>
- <dt><code>sink-frequency-threshold</code></dt>
- <dd><p>The maximum execution frequency (in percent) of the target block
- relative to a statement’s original block to allow statement sinking of a
- statement. Larger numbers result in more aggressive statement sinking.
- A small positive adjustment is applied for
- statements with memory operands, as those are even more profitable to sink.
- </p>
- </dd>
- <dt><code>max-stores-to-sink</code></dt>
- <dd><p>The maximum number of conditional store pairs that can be sunk. Set to 0
- if either vectorization (<samp>-ftree-vectorize</samp>) or if-conversion
- (<samp>-ftree-loop-if-convert</samp>) is disabled.
- </p>
- </dd>
- <dt><code>case-values-threshold</code></dt>
- <dd><p>The smallest number of different values for which it is best to use a
- jump-table instead of a tree of conditional branches. If the value is
- 0, use the default for the machine.
- </p>
- </dd>
- <dt><code>jump-table-max-growth-ratio-for-size</code></dt>
- <dd><p>The maximum code size growth ratio when expanding
- into a jump table (in percent). The parameter is used when
- optimizing for size.
- </p>
- </dd>
- <dt><code>jump-table-max-growth-ratio-for-speed</code></dt>
- <dd><p>The maximum code size growth ratio when expanding
- into a jump table (in percent). The parameter is used when
- optimizing for speed.
- </p>
- </dd>
- <dt><code>tree-reassoc-width</code></dt>
- <dd><p>Set the maximum number of instructions executed in parallel in a
- reassociated tree. If it has a nonzero value, this parameter overrides
- the target-dependent heuristics used by default.
- </p>
- </dd>
- <dt><code>sched-pressure-algorithm</code></dt>
- <dd><p>Choose between the two available implementations of
- <samp>-fsched-pressure</samp>. Algorithm 1 is the original implementation
- and is the more likely to prevent instructions from being reordered.
- Algorithm 2 was designed to be a compromise between the relatively
- conservative approach taken by algorithm 1 and the rather aggressive
- approach taken by the default scheduler. It relies more heavily on
- having a regular register file and accurate register pressure classes.
- See <samp>haifa-sched.c</samp> in the GCC sources for more details.
- </p>
- <p>The default choice depends on the target.
- </p>
- </dd>
- <dt><code>max-slsr-cand-scan</code></dt>
- <dd><p>Set the maximum number of existing candidates that are considered when
- seeking a basis for a new straight-line strength reduction candidate.
- </p>
- </dd>
- <dt><code>asan-globals</code></dt>
- <dd><p>Enable buffer overflow detection for global objects. This kind
- of protection is enabled by default when using the
- <samp>-fsanitize=address</samp> option.
- To disable protection of global objects use <samp>--param asan-globals=0</samp>.
- </p>
- </dd>
- <dt><code>asan-stack</code></dt>
- <dd><p>Enable buffer overflow detection for stack objects. This kind of
- protection is enabled by default when using <samp>-fsanitize=address</samp>.
- To disable stack protection use the <samp>--param asan-stack=0</samp> option.
- </p>
- </dd>
- <dt><code>asan-instrument-reads</code></dt>
- <dd><p>Enable buffer overflow detection for memory reads. This kind of
- protection is enabled by default when using <samp>-fsanitize=address</samp>.
- To disable memory reads protection use
- <samp>--param asan-instrument-reads=0</samp>.
- </p>
- </dd>
- <dt><code>asan-instrument-writes</code></dt>
- <dd><p>Enable buffer overflow detection for memory writes. This kind of
- protection is enabled by default when using <samp>-fsanitize=address</samp>.
- To disable memory writes protection use
- <samp>--param asan-instrument-writes=0</samp>.
- </p>
- </dd>
- <dt><code>asan-memintrin</code></dt>
- <dd><p>Enable detection for built-in functions. This kind of protection
- is enabled by default when using <samp>-fsanitize=address</samp>.
- To disable built-in functions protection use
- <samp>--param asan-memintrin=0</samp>.
- </p>
- </dd>
- <dt><code>asan-use-after-return</code></dt>
- <dd><p>Enable detection of use-after-return. This kind of protection
- is enabled by default when using the <samp>-fsanitize=address</samp> option.
- To disable it use <samp>--param asan-use-after-return=0</samp>.
- </p>
- <p>Note: By default the check is disabled at run time. To enable it,
- add <code>detect_stack_use_after_return=1</code> to the environment variable
- <code>ASAN_OPTIONS</code>.
- </p>
- </dd>
- <dt><code>asan-instrumentation-with-call-threshold</code></dt>
- <dd><p>If the number of memory accesses in the function being instrumented
- is greater than or equal to this number, use callbacks instead of inline checks.
- E.g. to disable inline code use
- <samp>--param asan-instrumentation-with-call-threshold=0</samp>.
- </p>
- </dd>
- <dt><code>use-after-scope-direct-emission-threshold</code></dt>
- <dd><p>If the size of a local variable in bytes is smaller than or equal to this
- number, directly poison (or unpoison) shadow memory instead of using
- run-time callbacks.
- </p>
- </dd>
- <dt><code>max-fsm-thread-path-insns</code></dt>
- <dd><p>Maximum number of instructions to copy when duplicating blocks on a
- finite state automaton jump thread path.
- </p>
- </dd>
- <dt><code>max-fsm-thread-length</code></dt>
- <dd><p>Maximum number of basic blocks on a finite state automaton jump thread
- path.
- </p>
- </dd>
- <dt><code>max-fsm-thread-paths</code></dt>
- <dd><p>Maximum number of new jump thread paths to create for a finite state
- automaton.
- </p>
- </dd>
- <dt><code>parloops-chunk-size</code></dt>
- <dd><p>Chunk size of the OpenMP schedule for loops parallelized by parloops.
- </p>
- </dd>
- <dt><code>parloops-schedule</code></dt>
- <dd><p>Schedule type of the OpenMP schedule for loops parallelized by parloops (static,
- dynamic, guided, auto, runtime).
- </p>
- </dd>
- <dt><code>parloops-min-per-thread</code></dt>
- <dd><p>The minimum number of iterations per thread of an innermost parallelized
- loop for which the parallelized variant is preferred over the single-threaded
- one. Note that for a parallelized loop nest the
- minimum number of iterations of the outermost loop per thread is two.
- </p>
- </dd>
- <dt><code>max-ssa-name-query-depth</code></dt>
- <dd><p>Maximum depth of recursion when querying properties of SSA names in things
- like fold routines. One level of recursion corresponds to following a
- use-def chain.
- </p>
- </dd>
- <dt><code>hsa-gen-debug-stores</code></dt>
- <dd><p>Enable emission of special debug stores within HSA kernels which are
- then read and reported by libgomp plugin. Generation of these stores
- is disabled by default, use <samp>--param hsa-gen-debug-stores=1</samp> to
- enable it.
- </p>
- </dd>
- <dt><code>max-speculative-devirt-maydefs</code></dt>
- <dd><p>The maximum number of may-defs we analyze when looking for a must-def
- specifying the dynamic type of an object that invokes a virtual call
- we may be able to devirtualize speculatively.
- </p>
- </dd>
- <dt><code>max-vrp-switch-assertions</code></dt>
- <dd><p>The maximum number of assertions to add along the default edge of a switch
- statement during VRP.
- </p>
- </dd>
- <dt><code>unroll-jam-min-percent</code></dt>
- <dd><p>The minimum percentage of memory references that must be optimized
- away for the unroll-and-jam transformation to be considered profitable.
- </p>
- </dd>
- <dt><code>unroll-jam-max-unroll</code></dt>
- <dd><p>The maximum number of times the outer loop should be unrolled by
- the unroll-and-jam transformation.
- </p>
- </dd>
- <dt><code>max-rtl-if-conversion-unpredictable-cost</code></dt>
- <dd><p>Maximum permissible cost for the sequence that would be generated
- by the RTL if-conversion pass for a branch that is considered unpredictable.
- </p>
- </dd>
- <dt><code>max-variable-expansions-in-unroller</code></dt>
- <dd><p>If <samp>-fvariable-expansion-in-unroller</samp> is used, the maximum number
- of times that an individual variable will be expanded during loop unrolling.
- </p>
- </dd>
- <dt><code>tracer-min-branch-probability-feedback</code></dt>
- <dd><p>Stop forward growth if the probability of best edge is less than
- this threshold (in percent). Used when profile feedback is available.
- </p>
- </dd>
- <dt><code>partial-inlining-entry-probability</code></dt>
- <dd><p>Maximum probability of the entry BB of split region
- (in percent relative to entry BB of the function)
- to make partial inlining happen.
- </p>
- </dd>
- <dt><code>max-tracked-strlens</code></dt>
- <dd><p>Maximum number of strings for which strlen optimization pass will
- track string lengths.
- </p>
- </dd>
- <dt><code>gcse-after-reload-partial-fraction</code></dt>
- <dd><p>The threshold ratio for performing partial redundancy
- elimination after reload.
- </p>
- </dd>
- <dt><code>gcse-after-reload-critical-fraction</code></dt>
- <dd><p>The threshold ratio of critical-edge execution count that
- permits performing redundancy elimination after reload.
- </p>
- </dd>
- <dt><code>max-loop-header-insns</code></dt>
- <dd><p>The maximum number of insns in loop header duplicated
- by the copy loop headers pass.
- </p>
- </dd>
- <dt><code>vect-epilogues-nomask</code></dt>
- <dd><p>Enable loop epilogue vectorization using a smaller vector size.
- </p>
- </dd>
- <dt><code>slp-max-insns-in-bb</code></dt>
- <dd><p>Maximum number of instructions in basic block to be
- considered for SLP vectorization.
- </p>
- </dd>
- <dt><code>avoid-fma-max-bits</code></dt>
- <dd><p>Maximum number of bits for which we avoid creating FMAs.
- </p>
- </dd>
- <dt><code>sms-loop-average-count-threshold</code></dt>
- <dd><p>A threshold on the average loop count considered by the swing modulo scheduler.
- </p>
- </dd>
- <dt><code>sms-dfa-history</code></dt>
- <dd><p>The number of cycles the swing modulo scheduler considers when checking
- conflicts using DFA.
- </p>
- </dd>
- <dt><code>max-inline-insns-recursive-auto</code></dt>
- <dd><p>The maximum number of instructions a non-inline function
- can grow to via recursive inlining.
- </p>
- </dd>
- <dt><code>graphite-allow-codegen-errors</code></dt>
- <dd><p>Whether codegen errors should be ICEs when <samp>-fchecking</samp> is enabled.
- </p>
- </dd>
- <dt><code>sms-max-ii-factor</code></dt>
- <dd><p>A factor for tuning the upper bound that swing modulo scheduler
- uses for scheduling a loop.
- </p>
- </dd>
- <dt><code>lra-max-considered-reload-pseudos</code></dt>
- <dd><p>The max number of reload pseudos which are considered during
- spilling a non-reload pseudo.
- </p>
- </dd>
- <dt><code>max-pow-sqrt-depth</code></dt>
- <dd><p>Maximum depth of sqrt chains to use when synthesizing exponentiation
- by a real constant.
- </p>
- </dd>
- <dt><code>max-dse-active-local-stores</code></dt>
- <dd><p>Maximum number of active local stores in RTL dead store elimination.
- </p>
- </dd>
- <dt><code>asan-instrument-allocas</code></dt>
- <dd><p>Enable asan allocas/VLAs protection.
- </p>
- </dd>
- <dt><code>max-iterations-computation-cost</code></dt>
- <dd><p>Bound on the cost of an expression to compute the number of iterations.
- </p>
- </dd>
- <dt><code>max-isl-operations</code></dt>
- <dd><p>Maximum number of isl operations, 0 means unlimited.
- </p>
- </dd>
- <dt><code>graphite-max-arrays-per-scop</code></dt>
- <dd><p>Maximum number of arrays per scop.
- </p>
- </dd>
- <dt><code>max-vartrack-reverse-op-size</code></dt>
- <dd><p>Maximum size of a location list for which reverse operations should be added.
- </p>
- </dd>
- <dt><code>tracer-dynamic-coverage-feedback</code></dt>
- <dd><p>The percentage of a function, weighted by execution frequency,
- that must be covered by trace formation.
- Used when profile feedback is available.
- </p>
- </dd>
- <dt><code>max-inline-recursive-depth-auto</code></dt>
- <dd><p>The maximum depth of recursive inlining for non-inline functions.
- </p>
- </dd>
- <dt><code>fsm-scale-path-stmts</code></dt>
- <dd><p>Scale factor to apply to the number of statements in a threading path
- when comparing to the number of (scaled) blocks.
- </p>
- </dd>
- <dt><code>fsm-maximum-phi-arguments</code></dt>
- <dd><p>Maximum number of arguments a PHI may have before the FSM threader
- will not try to thread through its block.
- </p>
- </dd>
- <dt><code>uninit-control-dep-attempts</code></dt>
- <dd><p>Maximum number of nested calls to search for control dependencies
- during uninitialized variable analysis.
- </p>
- </dd>
- <dt><code>fsm-scale-path-blocks</code></dt>
- <dd><p>Scale factor to apply to the number of blocks in a threading path
- when comparing to the number of (scaled) statements.
- </p>
- </dd>
- <dt><code>sched-autopref-queue-depth</code></dt>
- <dd><p>Hardware autoprefetcher scheduler model control flag.
- Number of lookahead cycles the model looks into; at ‘0’ only enable the
- instruction sorting heuristic.
- </p>
- </dd>
- <dt><code>loop-versioning-max-inner-insns</code></dt>
- <dd><p>The maximum number of instructions that an inner loop can have
- before the loop versioning pass considers it too big to copy.
- </p>
- </dd>
- <dt><code>loop-versioning-max-outer-insns</code></dt>
- <dd><p>The maximum number of instructions that an outer loop can have
- before the loop versioning pass considers it too big to copy,
- discounting any instructions in inner loops that directly benefit
- from versioning.
- </p>
- </dd>
- <dt><code>ssa-name-def-chain-limit</code></dt>
- <dd><p>The maximum number of SSA_NAME assignments to follow in determining
- a property of a variable such as its value. This limits the number
- of iterations or recursive calls GCC performs when optimizing certain
- statements or when determining their validity prior to issuing
- diagnostics.
- </p>
- </dd>
- <dt><code>store-merging-max-size</code></dt>
- <dd><p>Maximum size of a single store merging region in bytes.
- </p>
- </dd>
- <dt><code>hash-table-verification-limit</code></dt>
- <dd><p>The number of elements for which hash table verification is done
- for each searched element.
- </p>
- </dd>
- <dt><code>max-find-base-term-values</code></dt>
- <dd><p>Maximum number of VALUEs handled during a single find_base_term call.
- </p>
- </dd>
- <dt><code>analyzer-max-enodes-per-program-point</code></dt>
- <dd><p>The maximum number of exploded nodes per program point within
- the analyzer, before terminating analysis of that point.
- </p>
- </dd>
- <dt><code>analyzer-min-snodes-for-call-summary</code></dt>
- <dd><p>The minimum number of supernodes within a function for the
- analyzer to consider summarizing its effects at call sites.
- </p>
- </dd>
- <dt><code>analyzer-max-recursion-depth</code></dt>
- <dd><p>The maximum number of times a callsite can appear in a call stack
- within the analyzer, before terminating analysis of a call that would
- recurse deeper.
- </p>
- </dd>
- <dt><code>gimple-fe-computed-hot-bb-threshold</code></dt>
- <dd><p>The number of executions of a basic block which is considered hot.
- The parameter is used only in GIMPLE FE.
- </p>
- </dd>
- <dt><code>analyzer-bb-explosion-factor</code></dt>
- <dd><p>The maximum number of ’after supernode’ exploded nodes within the analyzer
- per supernode, before terminating analysis.
- </p>
- </dd>
- </dl>
-
- <p>The following choices of <var>name</var> are available on AArch64 targets:
- </p>
- <dl compact="compact">
- <dt><code>aarch64-sve-compare-costs</code></dt>
- <dd><p>When vectorizing for SVE, consider using “unpacked” vectors for
- smaller elements and use the cost model to pick the cheapest approach.
- Also use the cost model to choose between SVE and Advanced SIMD vectorization.
- </p>
- <p>Using unpacked vectors includes storing smaller elements in larger
- containers and accessing elements with extending loads and truncating
- stores.
- </p>
- </dd>
- <dt><code>aarch64-float-recp-precision</code></dt>
- <dd><p>The number of Newton iterations for calculating the reciprocal for float type.
- The precision of division is proportional to this param when division
- approximation is enabled. The default value is 1.
- </p>
- </dd>
- <dt><code>aarch64-double-recp-precision</code></dt>
- <dd><p>The number of Newton iterations for calculating the reciprocal for double type.
- The precision of division is proportional to this param when division
- approximation is enabled. The default value is 2.
- </p>
- </dd>
- </dl>
-
- </dd>
- </dl>
-
- <hr>
- <div class="header">
- <p>
- Next: <a href="Instrumentation-Options.html#Instrumentation-Options" accesskey="n" rel="next">Instrumentation Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="prev">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p>
- </div>
-
-
-
- </body>
- </html>