Next: WHOPR, Previous: LTO object file layout, Up: LTO [Contents][Index]
Programs are represented internally as a callgraph (a multi-graph where nodes are functions and edges are call sites) and a varpool (a list of static and external variables in the program).
The inter-procedural optimization is organized as a sequence of individual passes, which operate on the callgraph and the varpool. To make the implementation of WHOPR possible, every inter-procedural optimization pass is split into several stages that are executed at different times during WHOPR compilation:
generate_summary
in
struct ipa_opt_pass_d
). This stage analyzes every function
body and variable initializer is examined and stores relevant
information into a pass-specific data structure.
write_summary
in
struct ipa_opt_pass_d
). This stage writes all the
pass-specific information generated by generate_summary
.
Summaries go into their own LTO_section_*
sections that
have to be declared in lto-streamer.h:enum
lto_section_type
. A new section is created by calling
create_output_block
and data can be written using the
lto_output_*
routines.
read_summary
in
struct ipa_opt_pass_d
). This stage reads all the
pass-specific information in exactly the same order that it was
written by write_summary
.
execute
in struct
opt_pass
). This performs inter-procedural propagation. This
must be done without actual access to the individual function
bodies or variable initializers. Typically, this results in a
transitive closure operation over the summary information of all
the nodes in the callgraph.
write_optimization_summary
in struct
ipa_opt_pass_d
). This writes the result of the inter-procedural
propagation into the object file. This can use the same data
structures and helper routines used in write_summary
.
read_optimization_summary
in struct
ipa_opt_pass_d
). The counterpart to
write_optimization_summary
. This reads the interprocedural
optimization decisions in exactly the same format emitted by
write_optimization_summary
.
function_transform
and
variable_transform
in struct ipa_opt_pass_d
).
The actual function bodies and variable initializers are updated
based on the information passed down from the Execute stage.
The implementation of the inter-procedural passes are shared between LTO, WHOPR and classic non-LTO compilation.
To simplify development, the GCC pass manager differentiates
between normal inter-procedural passes (see Regular IPA passes),
small inter-procedural passes (see Small IPA passes)
and late inter-procedural passes (see Late IPA passes).
A small or late IPA pass (SIMPLE_IPA_PASS
) does
everything at once and thus cannot be executed during WPA in
WHOPR mode. It defines only the Execute stage and during
this stage it accesses and modifies the function bodies. Such
passes are useful for optimization at LGEN or LTRANS time and are
used, for example, to implement early optimization before writing
object files. The simple inter-procedural passes can also be used
for easier prototyping and development of a new inter-procedural
pass.
One of the main challenges of introducing the WHOPR compilation mode was addressing the interactions between optimization passes. In LTO compilation mode, the passes are executed in a sequence, each of which consists of analysis (or Generate summary), propagation (or Execute) and Transform stages. Once the work of one pass is finished, the next pass sees the updated program representation and can execute. This makes the individual passes dependent on each other.
In WHOPR mode all passes first execute their Generate summary stage. Then summary writing marks the end of the LGEN stage. At WPA time, the summaries are read back into memory and all passes run the Execute stage. Optimization summaries are streamed and sent to LTRANS, where all the passes execute the Transform stage.
Most optimization passes split naturally into analysis, propagation and transformation stages. But some do not. The main problem arises when one pass performs changes and the following pass gets confused by seeing different callgraphs between the Transform stage and the Generate summary or Execute stage. This means that the passes are required to communicate their decisions with each other.
To facilitate this communication, the GCC callgraph infrastructure implements virtual clones, a method of representing the changes performed by the optimization passes in the callgraph without needing to update function bodies.
A virtual clone in the callgraph is a function that has no associated body, just a description of how to create its body based on a different function (which itself may be a virtual clone).
The description of function modifications includes adjustments to the function’s signature (which allows, for example, removing or adding function arguments), substitutions to perform on the function body, and, for inlined functions, a pointer to the function that it will be inlined into.
It is also possible to redirect any edge of the callgraph from a function to its virtual clone. This implies updating of the call site to adjust for the new function signature.
Most of the transformations performed by inter-procedural optimizations can be represented via virtual clones. For instance, a constant propagation pass can produce a virtual clone of the function which replaces one of its arguments by a constant. The inliner can represent its decisions by producing a clone of a function whose body will be later integrated into a given function.
Using virtual clones, the program can be easily updated during the Execute stage, solving most of pass interactions problems that would otherwise occur during Transform.
Virtual clones are later materialized in the LTRANS stage and turned into real functions. Passes executed after the virtual clone were introduced also perform their Transform stage on new functions, so for a pass there is no significant difference between operating on a real function or a virtual clone introduced before its Execute stage.
Optimization passes then work on virtual clones introduced before their Execute stage as if they were real functions. The only difference is that clones are not visible during the Generate Summary stage.
To keep function summaries updated, the callgraph interface allows an optimizer to register a callback that is called every time a new clone is introduced as well as when the actual function or variable is generated or when a function or variable is removed. These hooks are registered in the Generate summary stage and allow the pass to keep its information intact until the Execute stage. The same hooks can also be registered during the Execute stage to keep the optimization summaries updated for the Transform stage.
GCC represents IPA references in the callgraph. For a function
or variable A
, the IPA reference is a list of all
locations where the address of A
is taken and, when
A
is a variable, a list of all direct stores and reads
to/from A
. References represent an oriented multi-graph on
the union of nodes of the callgraph and the varpool. See
ipa-reference.c:ipa_reference_write_optimization_summary
and
ipa-reference.c:ipa_reference_read_optimization_summary
for details.
Suppose that an optimization pass sees a function A
and it
knows the values of (some of) its arguments. The jump
function describes the value of a parameter of a given function
call in function A
based on this knowledge.
Jump functions are used by several optimizations, such as the inter-procedural constant propagation pass and the devirtualization pass. The inliner also uses jump functions to perform inlining of callbacks.
Next: WHOPR, Previous: LTO object file layout, Up: LTO [Contents][Index]