You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

3 年之前
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251
  1. <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
  2. <html>
  3. <!-- This file documents the gprof profiler of the GNU system.
  4. Copyright (C) 1988-2020 Free Software Foundation, Inc.
  5. Permission is granted to copy, distribute and/or modify this document
  6. under the terms of the GNU Free Documentation License, Version 1.3
  7. or any later version published by the Free Software Foundation;
  8. with no Invariant Sections, with no Front-Cover Texts, and with no
  9. Back-Cover Texts. A copy of the license is included in the
  10. section entitled "GNU Free Documentation License".
  11. -->
  12. <!-- Created by GNU Texinfo 6.5, http://www.gnu.org/software/texinfo/ -->
  13. <head>
  14. <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  15. <title>Internals (GNU gprof)</title>
  16. <meta name="description" content="Internals (GNU gprof)">
  17. <meta name="keywords" content="Internals (GNU gprof)">
  18. <meta name="resource-type" content="document">
  19. <meta name="distribution" content="global">
  20. <meta name="Generator" content="makeinfo">
  21. <link href="index.html#Top" rel="start" title="Top">
  22. <link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
  23. <link href="Details.html#Details" rel="up" title="Details">
  24. <link href="Debugging.html#Debugging" rel="next" title="Debugging">
  25. <link href="File-Format.html#File-Format" rel="prev" title="File Format">
  26. <style type="text/css">
  27. <!--
  28. a.summary-letter {text-decoration: none}
  29. blockquote.indentedblock {margin-right: 0em}
  30. blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
  31. blockquote.smallquotation {font-size: smaller}
  32. div.display {margin-left: 3.2em}
  33. div.example {margin-left: 3.2em}
  34. div.lisp {margin-left: 3.2em}
  35. div.smalldisplay {margin-left: 3.2em}
  36. div.smallexample {margin-left: 3.2em}
  37. div.smalllisp {margin-left: 3.2em}
  38. kbd {font-style: oblique}
  39. pre.display {font-family: inherit}
  40. pre.format {font-family: inherit}
  41. pre.menu-comment {font-family: serif}
  42. pre.menu-preformatted {font-family: serif}
  43. pre.smalldisplay {font-family: inherit; font-size: smaller}
  44. pre.smallexample {font-size: smaller}
  45. pre.smallformat {font-family: inherit; font-size: smaller}
  46. pre.smalllisp {font-size: smaller}
  47. span.nolinebreak {white-space: nowrap}
  48. span.roman {font-family: initial; font-weight: normal}
  49. span.sansserif {font-family: sans-serif; font-weight: normal}
  50. ul.no-bullet {list-style: none}
  51. -->
  52. </style>
  53. </head>
  54. <body lang="en">
  55. <a name="Internals"></a>
  56. <div class="header">
  57. <p>
  58. Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="prev">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
  59. </div>
  60. <hr>
  61. <a name="gprof_0027s-Internal-Operation"></a>
  62. <h3 class="section">9.3 <code>gprof</code>&rsquo;s Internal Operation</h3>
  63. <p>Like most programs, <code>gprof</code> begins by processing its options.
  64. During this stage, it may building its symspec list
  65. (<code>sym_ids.c:sym_id_add</code>), if
  66. options are specified which use symspecs.
  67. <code>gprof</code> maintains a single linked list of symspecs,
  68. which will eventually get turned into 12 symbol tables,
  69. organized into six include/exclude pairs&mdash;one
  70. pair each for the flat profile (INCL_FLAT/EXCL_FLAT),
  71. the call graph arcs (INCL_ARCS/EXCL_ARCS),
  72. printing in the call graph (INCL_GRAPH/EXCL_GRAPH),
  73. timing propagation in the call graph (INCL_TIME/EXCL_TIME),
  74. the annotated source listing (INCL_ANNO/EXCL_ANNO),
  75. and the execution count listing (INCL_EXEC/EXCL_EXEC).
  76. </p>
  77. <p>After option processing, <code>gprof</code> finishes
  78. building the symspec list by adding all the symspecs in
  79. <code>default_excluded_list</code> to the exclude lists
  80. EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is specified,
  81. EXCL_FLAT as well.
  82. These default excludes are not added to EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
  83. </p>
  84. <p>Next, the BFD library is called to open the object file,
  85. verify that it is an object file,
  86. and read its symbol table (<code>core.c:core_init</code>),
  87. using <code>bfd_canonicalize_symtab</code> after mallocing
  88. an appropriately sized array of symbols. At this point,
  89. function mappings are read (if the &lsquo;<samp>--file-ordering</samp>&rsquo; option
  90. has been specified), and the core text space is read into
  91. memory (if the &lsquo;<samp>-c</samp>&rsquo; option was given).
  92. </p>
  93. <p><code>gprof</code>&rsquo;s own symbol table, an array of Sym structures,
  94. is now built.
  95. This is done in one of two ways, by one of two routines, depending
  96. on whether line-by-line profiling (&lsquo;<samp>-l</samp>&rsquo; option) has been
  97. enabled.
  98. For normal profiling, the BFD canonical symbol table is scanned.
  99. For line-by-line profiling, every
  100. text space address is examined, and a new symbol table entry
  101. gets created every time the line number changes.
  102. In either case, two passes are made through the symbol
  103. table&mdash;one to count the size of the symbol table required,
  104. and the other to actually read the symbols. In between the
  105. two passes, a single array of type <code>Sym</code> is created of
  106. the appropriate length.
  107. Finally, <code>symtab.c:symtab_finalize</code>
  108. is called to sort the symbol table and remove duplicate entries
  109. (entries with the same memory address).
  110. </p>
  111. <p>The symbol table must be a contiguous array for two reasons.
  112. First, the <code>qsort</code> library function (which sorts an array)
  113. will be used to sort the symbol table.
  114. Also, the symbol lookup routine (<code>symtab.c:sym_lookup</code>),
  115. which finds symbols
  116. based on memory address, uses a binary search algorithm
  117. which requires the symbol table to be a sorted array.
  118. Function symbols are indicated with an <code>is_func</code> flag.
  119. Line number symbols have no special flags set.
  120. Additionally, a symbol can have an <code>is_static</code> flag
  121. to indicate that it is a local symbol.
  122. </p>
  123. <p>With the symbol table read, the symspecs can now be translated
  124. into Syms (<code>sym_ids.c:sym_id_parse</code>). Remember that a single
  125. symspec can match multiple symbols.
  126. An array of symbol tables
  127. (<code>syms</code>) is created, each entry of which is a symbol table
  128. of Syms to be included or excluded from a particular listing.
  129. The master symbol table and the symspecs are examined by nested
  130. loops, and every symbol that matches a symspec is inserted
  131. into the appropriate syms table. This is done twice, once to
  132. count the size of each required symbol table, and again to build
  133. the tables, which have been malloced between passes.
  134. From now on, to determine whether a symbol is on an include
  135. or exclude symspec list, <code>gprof</code> simply uses its
  136. standard symbol lookup routine on the appropriate table
  137. in the <code>syms</code> array.
  138. </p>
  139. <p>Now the profile data file(s) themselves are read
  140. (<code>gmon_io.c:gmon_out_read</code>),
  141. first by checking for a new-style &lsquo;<samp>gmon.out</samp>&rsquo; header,
  142. then assuming this is an old-style BSD &lsquo;<samp>gmon.out</samp>&rsquo;
  143. if the magic number test failed.
  144. </p>
  145. <p>New-style histogram records are read by <code>hist.c:hist_read_rec</code>.
  146. For the first histogram record, allocate a memory array to hold
  147. all the bins, and read them in.
  148. When multiple profile data files (or files with multiple histogram
  149. records) are read, the memory ranges of each pair of histogram records
  150. must be either equal, or non-overlapping. For each pair of histogram
  151. records, the resolution (memory region size divided by the number of
  152. bins) must be the same. The time unit must be the same for all
  153. histogram records. If the above containts are met, all histograms
  154. for the same memory range are merged.
  155. </p>
  156. <p>As each call graph record is read (<code>call_graph.c:cg_read_rec</code>),
  157. the parent and child addresses
  158. are matched to symbol table entries, and a call graph arc is
  159. created by <code>cg_arcs.c:arc_add</code>, unless the arc fails a symspec
  160. check against INCL_ARCS/EXCL_ARCS. As each arc is added,
  161. a linked list is maintained of the parent&rsquo;s child arcs, and of the child&rsquo;s
  162. parent arcs.
  163. Both the child&rsquo;s call count and the arc&rsquo;s call count are
  164. incremented by the record&rsquo;s call count.
  165. </p>
  166. <p>Basic-block records are read (<code>basic_blocks.c:bb_read_rec</code>),
  167. but only if line-by-line profiling has been selected.
  168. Each basic-block address is matched to a corresponding line
  169. symbol in the symbol table, and an entry made in the symbol&rsquo;s
  170. bb_addr and bb_calls arrays. Again, if multiple basic-block
  171. records are present for the same address, the call counts
  172. are cumulative.
  173. </p>
  174. <p>A gmon.sum file is dumped, if requested (<code>gmon_io.c:gmon_out_write</code>).
  175. </p>
  176. <p>If histograms were present in the data files, assign them to symbols
  177. (<code>hist.c:hist_assign_samples</code>) by iterating over all the sample
  178. bins and assigning them to symbols. Since the symbol table
  179. is sorted in order of ascending memory addresses, we can
  180. simple follow along in the symbol table as we make our pass
  181. over the sample bins.
  182. This step includes a symspec check against INCL_FLAT/EXCL_FLAT.
  183. Depending on the histogram
  184. scale factor, a sample bin may span multiple symbols,
  185. in which case a fraction of the sample count is allocated
  186. to each symbol, proportional to the degree of overlap.
  187. This effect is rare for normal profiling, but overlaps
  188. are more common during line-by-line profiling, and can
  189. cause each of two adjacent lines to be credited with half
  190. a hit, for example.
  191. </p>
  192. <p>If call graph data is present, <code>cg_arcs.c:cg_assemble</code> is called.
  193. First, if &lsquo;<samp>-c</samp>&rsquo; was specified, a machine-dependent
  194. routine (<code>find_call</code>) scans through each symbol&rsquo;s machine code,
  195. looking for subroutine call instructions, and adding them
  196. to the call graph with a zero call count.
  197. A topological sort is performed by depth-first numbering
  198. all the symbols (<code>cg_dfn.c:cg_dfn</code>), so that
  199. children are always numbered less than their parents,
  200. then making a array of pointers into the symbol table and sorting it into
  201. numerical order, which is reverse topological
  202. order (children appear before parents).
  203. Cycles are also detected at this point, all members
  204. of which are assigned the same topological number.
  205. Two passes are now made through this sorted array of symbol pointers.
  206. The first pass, from end to beginning (parents to children),
  207. computes the fraction of child time to propagate to each parent
  208. and a print flag.
  209. The print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH,
  210. with a parent&rsquo;s include or exclude (print or no print) property
  211. being propagated to its children, unless they themselves explicitly appear
  212. in INCL_GRAPH or EXCL_GRAPH.
  213. A second pass, from beginning to end (children to parents) actually
  214. propagates the timings along the call graph, subject
  215. to a check against INCL_TIME/EXCL_TIME.
  216. With the print flag, fractions, and timings now stored in the symbol
  217. structures, the topological sort array is now discarded, and a
  218. new array of pointers is assembled, this time sorted by propagated time.
  219. </p>
  220. <p>Finally, print the various outputs the user requested, which is now fairly
  221. straightforward. The call graph (<code>cg_print.c:cg_print</code>) and
  222. flat profile (<code>hist.c:hist_print</code>) are regurgitations of values
  223. already computed. The annotated source listing
  224. (<code>basic_blocks.c:print_annotated_source</code>) uses basic-block
  225. information, if present, to label each line of code with call counts,
  226. otherwise only the function call counts are presented.
  227. </p>
  228. <p>The function ordering code is marginally well documented
  229. in the source code itself (<code>cg_print.c</code>). Basically,
  230. the functions with the most use and the most parents are
  231. placed first, followed by other functions with the most use,
  232. followed by lower use functions, followed by unused functions
  233. at the end.
  234. </p>
  235. <hr>
  236. <div class="header">
  237. <p>
  238. Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="prev">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
  239. </div>
  240. </body>
  241. </html>