|
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188 |
- <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- <html>
- <!-- This file documents the gprof profiler of the GNU system.
-
- Copyright (C) 1988-2020 Free Software Foundation, Inc.
-
- Permission is granted to copy, distribute and/or modify this document
- under the terms of the GNU Free Documentation License, Version 1.3
- or any later version published by the Free Software Foundation;
- with no Invariant Sections, with no Front-Cover Texts, and with no
- Back-Cover Texts. A copy of the license is included in the
- section entitled "GNU Free Documentation License".
- -->
- <!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
- <title>Implementation (GNU gprof)</title>
-
- <meta name="description" content="Implementation (GNU gprof)">
- <meta name="keywords" content="Implementation (GNU gprof)">
- <meta name="resource-type" content="document">
- <meta name="distribution" content="global">
- <meta name="Generator" content="makeinfo">
- <link href="index.html#Top" rel="start" title="Top">
- <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
- <link href="Details.html#Details" rel="up" title="Details">
- <link href="File-Format.html#File-Format" rel="next" title="File Format">
- <link href="Details.html#Details" rel="prev" title="Details">
- <style type="text/css">
- <!--
- a.summary-letter {text-decoration: none}
- blockquote.indentedblock {margin-right: 0em}
- blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
- blockquote.smallquotation {font-size: smaller}
- div.display {margin-left: 3.2em}
- div.example {margin-left: 3.2em}
- div.lisp {margin-left: 3.2em}
- div.smalldisplay {margin-left: 3.2em}
- div.smallexample {margin-left: 3.2em}
- div.smalllisp {margin-left: 3.2em}
- kbd {font-style: oblique}
- pre.display {font-family: inherit}
- pre.format {font-family: inherit}
- pre.menu-comment {font-family: serif}
- pre.menu-preformatted {font-family: serif}
- pre.smalldisplay {font-family: inherit; font-size: smaller}
- pre.smallexample {font-size: smaller}
- pre.smallformat {font-family: inherit; font-size: smaller}
- pre.smalllisp {font-size: smaller}
- span.nolinebreak {white-space: nowrap}
- span.roman {font-family: initial; font-weight: normal}
- span.sansserif {font-family: sans-serif; font-weight: normal}
- ul.no-bullet {list-style: none}
- -->
- </style>
-
-
- </head>
-
- <body lang="en">
- <a name="Implementation"></a>
- <div class="header">
- <p>
- Next: <a href="File-Format.html#File-Format" accesskey="n" rel="next">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
- </div>
- <hr>
- <a name="Implementation-of-Profiling"></a>
- <h3 class="section">9.1 Implementation of Profiling</h3>
-
- <p>Profiling works by changing how every function in your program is compiled
- so that when it is called, it will stash away some information about where
- it was called from. From this, the profiler can figure out what function
- called it, and can count how many times it was called. This change is made
- by the compiler when your program is compiled with the ‘<samp>-pg</samp>’ option,
- which causes every function to call <code>mcount</code>
- (or <code>_mcount</code>, or <code>__mcount</code>, depending on the OS and compiler)
- as one of its first operations.
- </p>
- <p>The <code>mcount</code> routine, included in the profiling library,
- is responsible for recording in an in-memory call graph table
- both its parent routine (the child) and its parent’s parent. This is
- typically done by examining the stack frame to find both
- the address of the child, and the return address in the original parent.
- Since this is a very machine-dependent operation, <code>mcount</code>
- itself is typically a short assembly-language stub routine
- that extracts the required
- information, and then calls <code>__mcount_internal</code>
- (a normal C function) with two arguments—<code>frompc</code> and <code>selfpc</code>.
- <code>__mcount_internal</code> is responsible for maintaining
- the in-memory call graph, which records <code>frompc</code>, <code>selfpc</code>,
- and the number of times each of these call arcs was traversed.
- </p>
- <p>GCC Version 2 provides a magical function (<code>__builtin_return_address</code>),
- which allows a generic <code>mcount</code> function to extract the
- required information from the stack frame. However, on some
- architectures, most notably the SPARC, using this builtin can be
- very computationally expensive, and an assembly language version
- of <code>mcount</code> is used for performance reasons.
- </p>
- <p>Number-of-calls information for library routines is collected by using a
- special version of the C library. The programs in it are the same as in
- the usual C library, but they were compiled with ‘<samp>-pg</samp>’. If you
- link your program with ‘<samp>gcc … -pg</samp>’, it automatically uses the
- profiling version of the library.
- </p>
- <p>Profiling also involves watching your program as it runs, and keeping a
- histogram of where the program counter happens to be every now and then.
- Typically the program counter is looked at around 100 times per second of
- run time, but the exact frequency may vary from system to system.
- </p>
- <p>This is done is one of two ways. Most UNIX-like operating systems
- provide a <code>profil()</code> system call, which registers a memory
- array with the kernel, along with a scale
- factor that determines how the program’s address space maps
- into the array.
- Typical scaling values cause every 2 to 8 bytes of address space
- to map into a single array slot.
- On every tick of the system clock
- (assuming the profiled program is running), the value of the
- program counter is examined and the corresponding slot in
- the memory array is incremented. Since this is done in the kernel,
- which had to interrupt the process anyway to handle the clock
- interrupt, very little additional system overhead is required.
- </p>
- <p>However, some operating systems, most notably Linux 2.0 (and earlier),
- do not provide a <code>profil()</code> system call. On such a system,
- arrangements are made for the kernel to periodically deliver
- a signal to the process (typically via <code>setitimer()</code>),
- which then performs the same operation of examining the
- program counter and incrementing a slot in the memory array.
- Since this method requires a signal to be delivered to
- user space every time a sample is taken, it uses considerably
- more overhead than kernel-based profiling. Also, due to the
- added delay required to deliver the signal, this method is
- less accurate as well.
- </p>
- <p>A special startup routine allocates memory for the histogram and
- either calls <code>profil()</code> or sets up
- a clock signal handler.
- This routine (<code>monstartup</code>) can be invoked in several ways.
- On Linux systems, a special profiling startup file <code>gcrt0.o</code>,
- which invokes <code>monstartup</code> before <code>main</code>,
- is used instead of the default <code>crt0.o</code>.
- Use of this special startup file is one of the effects
- of using ‘<samp>gcc … -pg</samp>’ to link.
- On SPARC systems, no special startup files are used.
- Rather, the <code>mcount</code> routine, when it is invoked for
- the first time (typically when <code>main</code> is called),
- calls <code>monstartup</code>.
- </p>
- <p>If the compiler’s ‘<samp>-a</samp>’ option was used, basic-block counting
- is also enabled. Each object file is then compiled with a static array
- of counts, initially zero.
- In the executable code, every time a new basic-block begins
- (i.e., when an <code>if</code> statement appears), an extra instruction
- is inserted to increment the corresponding count in the array.
- At compile time, a paired array was constructed that recorded
- the starting address of each basic-block. Taken together,
- the two arrays record the starting address of every basic-block,
- along with the number of times it was executed.
- </p>
- <p>The profiling library also includes a function (<code>mcleanup</code>) which is
- typically registered using <code>atexit()</code> to be called as the
- program exits, and is responsible for writing the file <samp>gmon.out</samp>.
- Profiling is turned off, various headers are output, and the histogram
- is written, followed by the call-graph arcs and the basic-block counts.
- </p>
- <p>The output from <code>gprof</code> gives no indication of parts of your program that
- are limited by I/O or swapping bandwidth. This is because samples of the
- program counter are taken at fixed intervals of the program’s run time.
- Therefore, the
- time measurements in <code>gprof</code> output say nothing about time that your
- program was not running. For example, a part of the program that creates
- so much data that it cannot all fit in physical memory at once may run very
- slowly due to thrashing, but <code>gprof</code> will say it uses little time. On
- the other hand, sampling by run time has the advantage that the amount of
- load due to other users won’t directly affect the output you get.
- </p>
- <hr>
- <div class="header">
- <p>
- Next: <a href="File-Format.html#File-Format" accesskey="n" rel="next">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
- </div>
-
-
-
- </body>
- </html>
|