							- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 - <html>
 - <!-- This file documents the gprof profiler of the GNU system.
 - 
 - Copyright (C) 1988-2020 Free Software Foundation, Inc.
 - 
 - Permission is granted to copy, distribute and/or modify this document
 - under the terms of the GNU Free Documentation License, Version 1.3
 - or any later version published by the Free Software Foundation;
 - with no Invariant Sections, with no Front-Cover Texts, and with no
 - Back-Cover Texts.  A copy of the license is included in the
 - section entitled "GNU Free Documentation License".
 -  -->
 - <!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
 - <head>
 - <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 - <title>Implementation (GNU gprof)</title>
 - 
 - <meta name="description" content="Implementation (GNU gprof)">
 - <meta name="keywords" content="Implementation (GNU gprof)">
 - <meta name="resource-type" content="document">
 - <meta name="distribution" content="global">
 - <meta name="Generator" content="makeinfo">
 - <link href="index.html#Top" rel="start" title="Top">
 - <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
 - <link href="Details.html#Details" rel="up" title="Details">
 - <link href="File-Format.html#File-Format" rel="next" title="File Format">
 - <link href="Details.html#Details" rel="prev" title="Details">
 - <style type="text/css">
 - <!--
 - a.summary-letter {text-decoration: none}
 - blockquote.indentedblock {margin-right: 0em}
 - blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
 - blockquote.smallquotation {font-size: smaller}
 - div.display {margin-left: 3.2em}
 - div.example {margin-left: 3.2em}
 - div.lisp {margin-left: 3.2em}
 - div.smalldisplay {margin-left: 3.2em}
 - div.smallexample {margin-left: 3.2em}
 - div.smalllisp {margin-left: 3.2em}
 - kbd {font-style: oblique}
 - pre.display {font-family: inherit}
 - pre.format {font-family: inherit}
 - pre.menu-comment {font-family: serif}
 - pre.menu-preformatted {font-family: serif}
 - pre.smalldisplay {font-family: inherit; font-size: smaller}
 - pre.smallexample {font-size: smaller}
 - pre.smallformat {font-family: inherit; font-size: smaller}
 - pre.smalllisp {font-size: smaller}
 - span.nolinebreak {white-space: nowrap}
 - span.roman {font-family: initial; font-weight: normal}
 - span.sansserif {font-family: sans-serif; font-weight: normal}
 - ul.no-bullet {list-style: none}
 - -->
 - </style>
 - 
 - 
 - </head>
 - 
 - <body lang="en">
 - <a name="Implementation"></a>
 - <div class="header">
 - <p>
 - Next: <a href="File-Format.html#File-Format" accesskey="n" rel="next">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a>   [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
 - </div>
 - <hr>
 - <a name="Implementation-of-Profiling"></a>
 - <h3 class="section">9.1 Implementation of Profiling</h3>
 - 
 - <p>Profiling works by changing how every function in your program is compiled
 - so that when it is called, it will stash away some information about where
 - it was called from.  From this, the profiler can figure out what function
 - called it, and can count how many times it was called.  This change is made
 - by the compiler when your program is compiled with the ‘<samp>-pg</samp>’ option,
 - which causes every function to call <code>mcount</code>
 - (or <code>_mcount</code>, or <code>__mcount</code>, depending on the OS and compiler)
 - as one of its first operations.
 - </p>
 - <p>The <code>mcount</code> routine, included in the profiling library,
 - is responsible for recording in an in-memory call graph table
 - both the routine that called it (the child) and that routine’s caller (the parent).  This is
 - typically done by examining the stack frame to find both
 - the address of the child, and the return address in the original parent.
 - Since this is a very machine-dependent operation, <code>mcount</code>
 - itself is typically a short assembly-language stub routine
 - that extracts the required
 - information, and then calls <code>__mcount_internal</code>
 - (a normal C function) with two arguments: <code>frompc</code>, the return
 - address in the parent, and <code>selfpc</code>, the address of the child.
 - <code>__mcount_internal</code> is responsible for maintaining
 - the in-memory call graph, which records <code>frompc</code>, <code>selfpc</code>,
 - and the number of times each of these call arcs was traversed.
 - </p>
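<p>The bookkeeping described above can be pictured with a minimal sketch.
It assumes a small fixed-size table searched linearly; the names
<code>struct arc</code>, <code>record_arc</code> and <code>MAX_ARCS</code> are
hypothetical and do not come from the profiling library, which is free to use
a faster lookup structure.
</p>

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARCS 1024   /* hypothetical fixed capacity for this sketch */

struct arc {
    uintptr_t frompc;     /* return address inside the parent (caller) */
    uintptr_t selfpc;     /* address of the child (callee) */
    unsigned long count;  /* number of times this arc was traversed */
};

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Record one traversal of the arc frompc -> selfpc.
   Returns the updated count, or 0 if the table is full. */
unsigned long record_arc(uintptr_t frompc, uintptr_t selfpc)
{
    for (size_t i = 0; i < n_arcs; i++) {
        if (arcs[i].frompc == frompc && arcs[i].selfpc == selfpc)
            return ++arcs[i].count;
    }
    if (n_arcs == MAX_ARCS)
        return 0;
    arcs[n_arcs].frompc = frompc;
    arcs[n_arcs].selfpc = selfpc;
    arcs[n_arcs].count = 1;
    n_arcs++;
    return 1;
}
```

<p>Two calls from the same call site to the same function update a single
arc, while a call from a different call site creates a new one.
</p>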
 - <p>GCC, from version 2 onward, provides a builtin function (<code>__builtin_return_address</code>)
 - that allows a generic <code>mcount</code> function to extract the
 - required information from the stack frame.  However, on some
 - architectures, most notably the SPARC, using this builtin can be
 - very expensive at run time, so an assembly-language version
 - of <code>mcount</code> is used for performance reasons.
 - </p>
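<p>As a rough illustration of the builtin, the following sketch records the
return address of the current function, which is an address inside its caller.
The functions <code>note_caller</code> and <code>call_and_report</code> are
hypothetical stand-ins, not part of any profiling library, and the
<code>noinline</code> attribute is an assumption made so that a real call
takes place.
</p>

```c
#include <stdint.h>

/* Address most recently observed by note_caller(). */
static uintptr_t last_caller;

/* Hypothetical stand-in for a generic, C-only mcount. */
__attribute__((noinline))
void note_caller(void)
{
    /* Level 0 is the return address of the current function,
       that is, an address inside whoever called note_caller(). */
    last_caller = (uintptr_t)__builtin_return_address(0);
}

/* Calls note_caller() and returns the address it recorded. */
__attribute__((noinline))
uintptr_t call_and_report(void)
{
    note_caller();
    return last_caller;
}
```

<p>A real <code>mcount</code> also needs the return address one level further
up the stack; asking the builtin for levels above 0 is not reliable on every
target, which is one more reason machine-specific versions exist.
</p>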
 - <p>Number-of-calls information for library routines is collected by using a
 - special version of the C library.  The routines in it are the same as in
 - the usual C library, but they were compiled with ‘<samp>-pg</samp>’.  If you
 - link your program with ‘<samp>gcc … -pg</samp>’, it automatically uses the
 - profiling version of the library.
 - </p>
 - <p>Profiling also involves watching your program as it runs, and keeping a
 - histogram of where the program counter happens to be every now and then.
 - Typically the program counter is looked at around 100 times per second of
 - run time, but the exact frequency may vary from system to system.
 - </p>
 - <p>This is done in one of two ways.  Most UNIX-like operating systems
 - provide a <code>profil()</code> system call, which registers a memory
 - array with the kernel, along with a scale
 - factor that determines how the program’s address space maps
 - into the array.
 - Typical scaling values cause every 2 to 8 bytes of address space
 - to map into a single array slot.
 - On every tick of the system clock
 - (assuming the profiled program is running), the value of the
 - program counter is examined and the corresponding slot in
 - the memory array is incremented.  Since this is done in the kernel,
 - which had to interrupt the process anyway to handle the clock
 - interrupt, very little additional system overhead is required.
 - </p>
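<p>The mapping from program counter to array slot can be sketched as follows.
This is a simplified model that assumes a plain division by a
bytes-per-slot value; it does not reproduce the scale encoding that
<code>profil()</code> actually takes, and <code>struct histogram</code> and
<code>hist_sample</code> are hypothetical names.
</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical histogram covering the address range
   [lowpc, lowpc + nslots * bytes_per_slot). */
struct histogram {
    uintptr_t lowpc;         /* lowest profiled address */
    size_t bytes_per_slot;   /* e.g. 2 to 8 bytes of code per slot */
    size_t nslots;           /* number of entries in slots[] */
    unsigned short *slots;   /* one counter per slot */
};

/* Account one clock tick to the slot that contains pc.
   Returns the slot index, or -1 if pc lies outside the range. */
long hist_sample(struct histogram *h, uintptr_t pc)
{
    if (pc < h->lowpc)
        return -1;
    size_t slot = (pc - h->lowpc) / h->bytes_per_slot;
    if (slot >= h->nslots)
        return -1;
    h->slots[slot]++;
    return (long)slot;
}
```

<p>With four bytes per slot, addresses <code>lowpc</code> through
<code>lowpc + 3</code> share slot 0, so the histogram cannot distinguish
between instructions that fall within the same slot.
</p>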
 - <p>However, some operating systems, most notably Linux 2.0 (and earlier),
 - do not provide a <code>profil()</code> system call.  On such a system,
 - arrangements are made for the kernel to periodically deliver
 - a signal to the process (typically via <code>setitimer()</code>),
 - which then performs the same operation of examining the
 - program counter and incrementing a slot in the memory array.
 - Since this method requires a signal to be delivered to
 - user space every time a sample is taken, it incurs considerably
 - more overhead than kernel-based profiling.  Because of the
 - added delay in delivering the signal, it is also
 - less accurate.
 - </p>
 - <p>A special startup routine allocates memory for the histogram and
 - either calls <code>profil()</code> or sets up
 - a clock signal handler.
 - This routine (<code>monstartup</code>) can be invoked in several ways.
 - On Linux systems, a special profiling startup file <code>gcrt0.o</code>,
 - which invokes <code>monstartup</code> before <code>main</code>,
 - is used instead of the default <code>crt0.o</code>.
 - Use of this special startup file is one of the effects
 - of using ‘<samp>gcc … -pg</samp>’ to link.
 - On SPARC systems, no special startup files are used.
 - Rather, the <code>mcount</code> routine, when it is invoked for
 - the first time (typically when <code>main</code> is called),
 - calls <code>monstartup</code>.
 - </p>
 - <p>If the compiler’s ‘<samp>-a</samp>’ option was used, basic-block counting
 - is also enabled.  Each object file is then compiled with a static array
 - of counts, initially zero.
 - In the executable code, every time a new basic-block begins
 - (e.g., at the body of an <code>if</code> statement), an extra instruction
 - is inserted to increment the corresponding count in the array.
 - At compile time, a paired array is constructed that records
 - the starting address of each basic-block.  Taken together,
 - the two arrays record the starting address of every basic-block,
 - along with the number of times it was executed.
 - </p>
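<p>The effect of basic-block counting can be imitated by hand.  In this
sketch the increments are written out explicitly and text labels stand in for
the block start addresses that the compiler would record; the names
<code>bb_counts</code>, <code>bb_labels</code> and
<code>abs_instrumented</code> are hypothetical.
</p>

```c
#include <stddef.h>

enum { N_BLOCKS = 3 };

/* Static array of counts, initially zero. */
static unsigned long bb_counts[N_BLOCKS];

/* Paired array: labels stand in for the starting addresses
   that a compiler would fill in. */
static const char *const bb_labels[N_BLOCKS] = {
    "entry", "if-body", "join"
};

/* Return the label paired with block i, or NULL if i is out of range. */
const char *bb_label(size_t i)
{
    return i < N_BLOCKS ? bb_labels[i] : NULL;
}

/* Hand-instrumented function that returns the absolute value of x. */
int abs_instrumented(int x)
{
    bb_counts[0]++;        /* block 0: function entry */
    if (x < 0) {
        bb_counts[1]++;    /* block 1: body of the if */
        x = -x;
    }
    bb_counts[2]++;        /* block 2: code after the if */
    return x;
}
```

<p>After three calls, two of them with negative arguments, the counts are 3,
2 and 3, which shows how often the <code>if</code> body ran relative to the
function as a whole.
</p>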
 - <p>The profiling library also includes a function (<code>mcleanup</code>) which is
 - typically registered using <code>atexit()</code> to be called as the
 - program exits, and is responsible for writing the file <samp>gmon.out</samp>.
 - When it runs, profiling is turned off, various headers are output, and the histogram
 - is written, followed by the call-graph arcs and the basic-block counts.
 - </p>
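<p>The exit-time hook can be sketched with <code>atexit()</code>.  The output
layout and the file name <code>profile.txt</code> below are made up for
illustration and are not the <samp>gmon.out</samp> format; the function and
variable names are likewise hypothetical.
</p>

```c
#include <stdio.h>
#include <stdlib.h>

#define N_COUNTERS 4

static unsigned long counters[N_COUNTERS];
static int profiling_on = 1;

/* Stop collecting, then write a header followed by one record per
   counter in a made-up text layout (NOT the gmon.out format).
   Returns the number of records written. */
int dump_counters(FILE *out)
{
    int written = 0;
    profiling_on = 0;
    fprintf(out, "hypothetical-profile %d\n", N_COUNTERS);
    for (int i = 0; i < N_COUNTERS; i++) {
        if (fprintf(out, "%d %lu\n", i, counters[i]) > 0)
            written++;
    }
    return written;
}

/* Handler run as the program exits. */
static void cleanup_at_exit(void)
{
    FILE *out = fopen("profile.txt", "w");   /* hypothetical file name */
    if (out != NULL) {
        dump_counters(out);
        fclose(out);
    }
}

/* Register the handler; returns 0 on success, as atexit() does. */
int install_cleanup(void)
{
    return atexit(cleanup_at_exit);
}
```

<p>Handlers registered with <code>atexit()</code> run when the program
returns from <code>main</code> or calls <code>exit()</code>, but not when it
is killed by a signal or calls <code>_exit()</code>, so no output file is
produced in those cases.
</p>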
 - <p>The output from <code>gprof</code> gives no indication of parts of your program that
 - are limited by I/O or swapping bandwidth.  This is because samples of the
 - program counter are taken at fixed intervals of the program’s run time.
 - Therefore, the
 - time measurements in <code>gprof</code> output say nothing about time that your
 - program was not running.  For example, a part of the program that creates
 - so much data that it cannot all fit in physical memory at once may run very
 - slowly due to thrashing, but <code>gprof</code> will say it uses little time.  On
 - the other hand, sampling by run time has the advantage that the amount of
 - load due to other users won’t directly affect the output you get.
 - </p>
 - 
 - 
 - 
 - </body>
 - </html>
 
 