|  | IMPORTANT NOTE FOR 64-BIT USERS | 
|  | ------------------------------- | 
|  | There are known issues with some perftools functionality on x86_64 | 
|  | systems.  See 64-BIT ISSUES, below. | 
|  |  | 
|  |  | 
|  | TCMALLOC | 
|  | -------- | 
|  | Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of | 
|  | tcmalloc -- a replacement for malloc and new.  See below for some | 
|  | environment variables you can use with tcmalloc, as well. | 
|  |  | 
|  | tcmalloc functionality is available on all systems we've tested; see | 
|  | INSTALL for more details.  See README_windows.txt for instructions on | 
|  | using tcmalloc on Windows. | 
|  |  | 
|  | NOTE: When compiling programs with gcc that you plan to link with | 
|  | libtcmalloc, it's safest to pass in the flags | 
|  |  | 
|  | -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free | 
|  |  | 
|  | when compiling.  gcc makes some optimizations assuming it is using its | 
|  | own, built-in malloc; that assumption obviously isn't true with | 
|  | tcmalloc.  In practice, we haven't seen any problems with this, but | 
|  | the expected risk is highest for users who register their own malloc | 
|  | hooks with tcmalloc (using gperftools/malloc_hook.h).  The risk is | 
|  | lowest for folks who use tcmalloc_minimal (or, of course, who pass in | 
|  | the above flags :-) ). | 
|  |  | 
|  |  | 
|  | HEAP PROFILER | 
|  | ------------- | 
|  | See doc/heapprofile.html for information about how to use tcmalloc's | 
|  | heap profiler and analyze its output. | 
|  |  | 
|  | As a quick-start, do the following after installing this package: | 
|  |  | 
|  | 1) Link your executable with -ltcmalloc | 
|  | 2) Run your executable with the HEAPPROFILE environment var set: | 
|  | $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args] | 
|  | 3) Run pprof to analyze the heap usage | 
|  | $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options | 
|  | $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap | 
|  |  | 
|  | You can also use LD_PRELOAD to heap-profile an executable that you | 
|  | didn't compile. | 
|  |  | 
|  | There are other environment variables, besides HEAPPROFILE, you can | 
|  | set to adjust the heap-profiler behavior; see "ENVIRONMENT VARIABLES" | 
|  | below. | 
|  |  | 
|  | The heap profiler is available on all unix-based systems we've tested; | 
|  | see INSTALL for more details.  It is not currently available on Windows. | 
|  |  | 
|  |  | 
|  | HEAP CHECKER | 
|  | ------------ | 
|  | See doc/heap_checker.html for information about how to use tcmalloc's | 
|  | heap checker. | 
|  |  | 
|  | In order to catch all heap leaks, tcmalloc must be linked *last* into | 
|  | your executable.  The heap checker may mischaracterize some memory | 
|  | accesses in libraries listed after it on the link line.  For instance, | 
|  | it may report these libraries as leaking memory when they're not. | 
|  | (See the source code for more details.) | 
|  |  | 
|  | As a quick-start, do the following after installing this package: | 
|  |  | 
|  | 1) Link your executable with -ltcmalloc | 
|  | 2) Run your executable with the HEAPCHECK environment var set: | 
|  | $ HEAPCHECK=1 <path/to/binary> [binary args] | 
|  |  | 
|  | Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian | 
|  |  | 
|  | You can also use LD_PRELOAD to heap-check an executable that you | 
|  | didn't compile. | 
|  |  | 
|  | The heap checker is only available on Linux at this time; see INSTALL | 
|  | for more details. | 
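
To try the heap-checker quick-start above, it helps to have a program with a known leak. The following is a minimal sketch of such a workload, not part of perftools: leak_blocks() is a hypothetical helper that allocates memory and deliberately never frees it. What the checker reports for it depends on the HEAPCHECK mode and is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test workload (not part of perftools): allocates `count`
 * blocks of `size` bytes and deliberately never frees them.
 * Returns the number of bytes leaked. */
size_t leak_blocks(size_t count, size_t size)
{
    size_t leaked = 0;

    assert(size > 0);
    for (size_t i = 0; i < count; i++) {
        char *p = malloc(size);
        if (p == NULL)
            break;                /* out of memory: stop early */
        memset(p, 0xAB, size);    /* touch the block so it is really used */
        leaked += size;
        /* intentionally no free(p): this is the leak */
    }
    return leaked;
}
```

Calling leak_blocks(4, 256) from main() and running the binary with HEAPCHECK=1 set, as in step 2 above, gives the checker 1024 leaked bytes to look for.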
|  |  | 
|  |  | 
|  | CPU PROFILER | 
|  | ------------ | 
|  | See doc/cpuprofile.html for information about how to use the CPU | 
|  | profiler and analyze its output. | 
|  |  | 
|  | As a quick-start, do the following after installing this package: | 
|  |  | 
|  | 1) Link your executable with -lprofiler | 
|  | 2) Run your executable with the CPUPROFILE environment var set: | 
|  | $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args] | 
|  | 3) Run pprof to analyze the CPU usage | 
|  | $ pprof <path/to/binary> /tmp/prof.out      # -pg-like text output | 
|  | $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output | 
|  |  | 
|  | There are other environment variables, besides CPUPROFILE, you can set | 
|  | to adjust the cpu-profiler behavior; see "ENVIRONMENT VARIABLES" below. | 
|  |  | 
|  | The CPU profiler is available on all unix-based systems we've tested; | 
|  | see INSTALL for more details.  It is not currently available on Windows. | 
|  |  | 
|  | NOTE: CPU profiling doesn't work after fork (unless you immediately | 
|  | do an exec()-like call afterwards).  Furthermore, if you do | 
|  | fork, and the child calls exit(), it may corrupt the profile | 
|  | data.  You can use _exit() to work around this.  We hope to have | 
|  | a fix for both problems in the next release of perftools | 
|  | (hopefully perftools 1.2). | 
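
The _exit() workaround in the note above can be sketched as follows. This is a minimal illustration using only POSIX calls. run_in_child() and child_work() are hypothetical names introduced for the example and are not part of perftools.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs work() in a forked child and returns the
 * child's exit status, or -1 on error. */
int run_in_child(int (*work)(void))
{
    assert(work != NULL);

    fflush(NULL);                 /* flush stdio before fork so buffered
                                     output is not duplicated in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed */

    if (pid == 0) {
        /* Child: leave with _exit() rather than exit().  _exit() ends the
         * process without running atexit handlers or flushing stdio. */
        _exit(work());
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical example workload for the child process. */
int child_work(void)
{
    return 7;
}
```

Because _exit() does not flush stdio buffers, a child that writes buffered output should call fflush() itself before returning from its work function.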
|  |  | 
|  |  | 
|  | EVERYTHING IN ONE | 
|  | ----------------- | 
|  | If you want the CPU profiler, heap profiler, and heap leak-checker to | 
|  | all be available for your application, you can do: | 
|  | gcc -o myapp ... -lprofiler -ltcmalloc | 
|  |  | 
|  | However, if you have a reason to use the static versions of the | 
|  | library, this two-library linking won't work: | 
|  | gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors! | 
|  |  | 
|  | Instead, use the special libtcmalloc_and_profiler library, which we | 
|  | make for just this purpose: | 
|  | gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a | 
|  |  | 
|  |  | 
|  | CONFIGURATION OPTIONS | 
|  | --------------------- | 
|  | For advanced users, there are several flags you can pass to | 
|  | './configure' that tweak tcmalloc performance.  (These are in addition | 
|  | to the environment variables you can set at runtime to affect | 
|  | tcmalloc, described below.)  See the INSTALL file for details. | 
|  |  | 
|  |  | 
|  | ENVIRONMENT VARIABLES | 
|  | --------------------- | 
|  | The cpu profiler, heap checker, and heap profiler will lie dormant, | 
|  | using no memory or CPU, until you turn them on.  (Thus, there's no | 
|  | harm in linking -lprofiler into every application, and also -ltcmalloc | 
|  | assuming you're ok using the non-libc malloc library.) | 
|  |  | 
|  | The easiest way to turn them on is by setting the appropriate | 
|  | environment variables.  We have several variables that let you | 
|  | enable/disable features as well as tweak parameters. | 
|  |  | 
|  | Here are some of the most important variables: | 
|  |  | 
|  | HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix | 
|  | HEAPCHECK=<type>  -- turns on heap checking with strictness 'type' | 
|  | CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file. | 
|  | PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code | 
|  | surrounded with ProfilerEnable()/ProfilerDisable(). | 
|  | PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples. | 
|  |  | 
|  | TCMALLOC_DEBUG=<level> -- the higher level, the more messages malloc emits | 
|  | MALLOCSTATS=<level>    -- prints memory-use stats at program-exit | 
|  |  | 
|  | For a full list of variables, see the documentation pages: | 
|  | doc/cpuprofile.html | 
|  | doc/heapprofile.html | 
|  | doc/heap_checker.html | 
|  |  | 
|  |  | 
|  | COMPILING ON NON-LINUX SYSTEMS | 
|  | ------------------------------ | 
|  |  | 
|  | Perftools was developed and tested on x86 Linux systems, and it works | 
|  | in its full generality only on those systems.  However, we've | 
|  | successfully ported much of the tcmalloc library to FreeBSD, Solaris | 
|  | x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic | 
|  | functionality in tcmalloc_minimal to Windows.  See INSTALL for details. | 
|  | See README_windows.txt for details on the Windows port. | 
|  |  | 
|  |  | 
|  | PERFORMANCE | 
|  | ----------- | 
|  |  | 
|  | If you're interested in some third-party comparisons of tcmalloc to | 
|  | other malloc libraries, here are a few web pages that have been | 
|  | brought to our attention.  The first discusses the effect of using | 
|  | various malloc libraries on OpenLDAP.  The second compares tcmalloc to | 
|  | win32's malloc. | 
|  | http://www.highlandsun.com/hyc/malloc/ | 
|  | http://gaiacrtn.free.fr/articles/win32perftools.html | 
|  |  | 
|  | It's possible to build tcmalloc in a way that trades more memory | 
|  | fragmentation (that is, more unusable memory on your system) for faster | 
|  | performance (particularly for deletes).  See the INSTALL file for details. | 
|  |  | 
|  |  | 
|  | OLD SYSTEM ISSUES | 
|  | ----------------- | 
|  |  | 
|  | When compiling perftools on some old systems, like RedHat 8, you may | 
|  | get an error like this: | 
|  | ___tls_get_addr: symbol not found | 
|  |  | 
|  | This means that you have a system where some parts are updated enough | 
|  | to support Thread Local Storage, but others are not.  The perftools | 
|  | configure script can't always detect this kind of case, leading to | 
|  | that error.  To fix it, just comment out (or delete) the line | 
|  | #define HAVE_TLS 1 | 
|  | in your config.h file before building. | 
|  |  | 
|  |  | 
|  | 64-BIT ISSUES | 
|  | ------------- | 
|  |  | 
|  | There are two issues that can cause program hangs or crashes on x86_64 | 
|  | 64-bit systems, which use the libunwind library to get stack-traces. | 
|  | Neither issue should affect the core tcmalloc library; they both | 
|  | affect the perftools tools such as cpu-profiler, heap-checker, and | 
|  | heap-profiler. | 
|  |  | 
|  | 1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the | 
|  | libc function dl_iterate_phdr() acquires its locks in the wrong | 
|  | order.  This bug should not affect tcmalloc, but may cause occasional | 
|  | deadlock with the cpu-profiler, heap-profiler, and heap-checker. | 
|  | The likelihood of deadlock grows with the number of dlopen() calls an | 
|  | executable makes.  Most executables don't make any, though several | 
|  | library routines like getgrgid() call dlopen() behind the scenes. | 
|  |  | 
|  | 2) On x86_64 systems, while tcmalloc itself works fine, the | 
|  | cpu-profiler tool is unreliable: it will sometimes work, but sometimes | 
|  | cause a segfault.  I'll explain the problem first, and then some | 
|  | workarounds. | 
|  |  | 
|  | Note that this only affects the cpu-profiler, which is a | 
|  | gperftools feature you must turn on manually by setting the | 
|  | CPUPROFILE environment variable.  If you do not turn on cpu-profiling, | 
|  | you shouldn't see any crashes due to perftools. | 
|  |  | 
|  | The gory details: The underlying problem is in the backtrace() | 
|  | function, which is a built-in function in libc. | 
|  | Backtracing is fairly straightforward in the normal case, but can run | 
|  | into problems when having to backtrace across a signal frame. | 
|  | Unfortunately, the cpu-profiler uses signals in order to register a | 
|  | profiling event, so every backtrace that the profiler does crosses a | 
|  | signal frame. | 
|  |  | 
|  | In our experience, the only time there is trouble is when the signal | 
|  | fires in the middle of pthread_mutex_lock.  pthread_mutex_lock is | 
|  | called quite a bit from system libraries, particularly at program | 
|  | startup and when creating a new thread. | 
|  |  | 
|  | The solution: The dwarf debugging format has support for 'cfi | 
|  | annotations', which make it easy to recognize a signal frame.  Some OS | 
|  | distributions, such as Fedora and Gentoo 2007.0, already have added | 
|  | cfi annotations to their libc.  A future version of libunwind should | 
|  | recognize these annotations; these systems should not see any | 
|  | crashes. | 
|  |  | 
|  | Workarounds: If you see problems with crashes when running the | 
|  | cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into | 
|  | your code, rather than setting CPUPROFILE.  This will profile only | 
|  | those sections of the codebase.  Though we haven't done much testing, | 
|  | in theory this should reduce the chance of crashes by limiting the | 
|  | signal generation to only a small part of the codebase.  Ideally, you | 
|  | would not use ProfilerStart()/ProfilerStop() around code that spawns | 
|  | new threads, or is otherwise likely to cause a call to | 
|  | pthread_mutex_lock! | 
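
A minimal sketch of this workaround follows. ProfilerStart() and ProfilerStop() are declared in gperftools/profiler.h. Everything else here is an assumption made for illustration: the USE_GPERFTOOLS macro, the stand-in stubs that let the sketch build without perftools installed, and the sum_of_squares() workload.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_GPERFTOOLS                  /* hypothetical build switch */
#include <gperftools/profiler.h>       /* real ProfilerStart()/ProfilerStop() */
#else
/* Stand-in stubs so the sketch builds without perftools; they do nothing. */
static int ProfilerStart(const char *fname) { (void)fname; return 1; }
static void ProfilerStop(void) {}
#endif

/* Hypothetical CPU-bound workload: no thread creation, no locking. */
long sum_of_squares(long n)
{
    long total = 0;
    for (long i = 1; i <= n; i++)
        total += i * i;
    return total;
}

/* Profiles only the call to sum_of_squares(), writing to profile_path. */
long profiled_sum_of_squares(const char *profile_path, long n)
{
    assert(profile_path != NULL);

    ProfilerStart(profile_path);       /* profiling signals begin here */
    long result = sum_of_squares(n);
    ProfilerStop();                    /* ...and end here */
    return result;
}
```

When built with -DUSE_GPERFTOOLS and linked with -lprofiler, a call such as profiled_sum_of_squares("/tmp/prof.out", 1000000) should leave a profile at that path for pprof to read. Thread creation and other startup work stay outside the profiled region, as the paragraph above advises.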
|  |  | 
|  | --- | 
|  | 17 May 2011 |