【gperftools】1——CPU profiler

1 gperftools 简介

gperftools 是一款 Google 的开源高性能内存相关工具集，包括 tcmalloc 内存管理工具，还有一些例如 cpu profiler、heap profiler 等性能分析工具，本系列将逐一介绍。

2 CPU profiler 简介

CPU profiler 主要是通过采样的的方式，给出一段时间内程序实际占用cpu时间偏进行统计和分析，优点是使用起来简洁方便。

性能分析通过抽样方法完成，默认是1秒100个样本，一个样本是10毫秒，所以如果程序运行时间不到10ms，那么得到的结果可能会和开始执行的时候不同。

3 本次测试简介

使用的是 ubuntu 20.04 系统。
采用直接调用提供的 API（在需要测试的代码的前后分别调用 ProfilerStart() 和ProfilerStop()）的方式进行测试。

4 安装环境并测试

安装 unwind

sudo apt install libunwind-dev

安装 gperftools

cd ~/Download
git clone https://github.com/gperftools/gperftools.git
cd gperftools
sh autogen.sh
./configure
make all
sudo make install

编写测试代码，监控开始，参数为需要生成的文件名：

/* Start profiling and write profile info into fname, discarding any
 * existing profiling data in that file.
 *
 * This is equivalent to calling ProfilerStartWithOptions(fname, NULL).
 */
PERFTOOLS_DLL_DECL int ProfilerStart(const char* fname);

监控结束：

/* Stop profiling. Can be started again with ProfilerStart(), but
 * the currently accumulated profiling data will be cleared.
 */
PERFTOOLS_DLL_DECL void ProfilerStop(void);

如果开启了新的线程，需要在线程起始添加如下函数进行线程的的注册，但测试发现有无该语句并不造成影响。

/* Routine for registering new threads with the profiler.
 */
PERFTOOLS_DLL_DECL void ProfilerRegisterThread(void);

使用时需要包含的头文件：

#include <gperftools/profiler.h>

测试代码
https://github.com/ixiaolonglong/memory_tool/blob/master/gperftools/tests/cpu_profiler/cpu_profiler_test.cpp

编译测试代码

g++ cpu_profiler_test.cpp -o cpu_profiler_test --lprofiler

执行 cpu_profiler_test，生成 .profile 文件。

5 无环境情况下测试

有时候并不能要求所有编译程序的环境都安装一遍 gperftools 和 unwind，更多的是直接编译运行，所以需要在编译可执行文件时就要准备好相关的源文件。

在这个项目中本人将源码作为 submodule，编译时本地生成 lib 文件，可以直接进行链接。

工程如下，使用流程详见 README.md 文件：
https://github.com/ixiaolonglong/memory_tool/tree/master/gperftools

在编译工程时需要添加如下编译选项，避免被编译器优化：

# tcmalloc options
add_compile_options(
    -fno-builtin-malloc
    -fno-builtin-calloc
    -fno-builtin-realloc
    -fno-builtin-free)

这里记录一下静态库与动态库链接，对于静态库来说，需要描述所有递归用到的 lib：

target_link_libraries(cpu_profiler_test PRIVATE
                                profiler
                                fake_stacktrace_scope
                                sysinfo
                                spinlock
                                maybe_threads
                                logging
                                unwind
                                pthread)

而如果是动态库，则链接就会简单很多：

target_link_libraries(cpu_profiler_test PRIVATE
                                profiler
                                pthread)

https://www.zhihu.com/question/277160878
这个不是cmake的坑，应该是你glog库的坑，我猜想glog库是你自行编译，而glog库编译时没有动态链接gflags导致的。如果你生成动态库时，就使用target link library生成，再配合上rpath寻找路径，是可以支持a依赖b，b依赖c，而你在a中只要写b的依赖而不用写c的依赖。

6 报告

执行程序的环境不一定非要安装 gperftools，但生成的 profile 文件时必须要安装 gperftools 使用其 pprof 工具进行解析。

需要安装图形工具 Graphviz：

sudo apt-get install graphviz

生成不同类型的报告命令：

# 生成性能报告（层次调用节点有向图）输出到web浏览器显示
pprof cpu_profiler_test cpu_test.profile --web

# 生成pdf格式的性能报告（层次调用节点有向图）
pprof cpu_profiler_test cpu_test.profile --pdf > prof.pdf

# 生成文本格式的性能报告输出到控制台
pprof cpu_profiler_test cpu_test.profile --text

6.1 文本

Total: 30 samples
       6  20.0%  20.0%       16  53.3% psiginfo
       5  16.7%  36.7%        5  16.7% __nss_database_lookup
       4  13.3%  50.0%        4  13.3% _IO_default_xsputn
       4  13.3%  63.3%       28  93.3% __snprintf
       3  10.0%  73.3%        3  10.0% _IO_enable_locks
       3  10.0%  83.3%       24  80.0% vscanf
       2   6.7%  90.0%        2   6.7% _IO_str_pbackfail
       1   3.3%  93.3%        1   3.3% cuserid
       1   3.3%  96.7%        9  30.0% test_main_thread
       1   3.3% 100.0%       21  70.0% test_other_thread
       0   0.0% 100.0%       21  70.0% RunFunctionInThread
       0   0.0% 100.0%        9  30.0% __libc_start_main
       0   0.0% 100.0%        9  30.0% _start
       0   0.0% 100.0%       21  70.0% clone
       0   0.0% 100.0%        9  30.0% main
       0   0.0% 100.0%       21  70.0% start_thread

上面文本中输出的内容是对程序中每一个函数的CPU使用时间分析，数据有两大列：

左：不包含内部其他函数调用所消耗的CPU时间（内联函数除外）如果函数内部没有任何调用，那么就和右列相等
右：整个函数消耗的CPU时间，包括函数内部其他函数调用所消耗的CPU时间

每行按照数据顺序：

分析样本数量（不包含其他函数调用）
分析样本百分比（不包含其他函数调用）
目前为止的分析样本百分比（不包含其他函数调用）
分析样本数量（包含其他函数调用）
分析样本百分比（包含其他函数调用）
函数名

6.2 图形

profile.png

每个节点代表一个函数，节点数据格式：

Class Name
Method Name
local (percentage) ，不包含内部其他函数调用所消耗的CPU时间（内联函数除外）
of cumulative (percentage) ，整个函数消耗的CPU时间，包括函数内部其他函数调用所消耗的CPU时间，如果与local相同，则不打印
有向边由调用者指向被调用者，有向边上的时间表示被调用者所消耗的CPU时间

meta 信息（图左上角）：

Total samples，总采样数
Focusing on，--focus option 所包含的采样数
Dropped nodes，忽略的节点
Dropped edges，忽略的边

focus 某些函数：

pprof --gv --focus=vsnprintf cpu_profiler_test cpu_test.profile

ignore 某些函数：

pprof --gv --ignore=snprintf cpu_profiler_test cpu_test.profile

更多操作可参考：https://gperftools.github.io/gperftools/cpuprofile.html

6.3 Kcachegrind

安装 Kcachegrind

sudo apt-get install kcachegrind

生成 .callgrind 文件

pprof --callgrind cpu_profiler_test cpu_test.profile > cpu_test.callgrind

分析命令

kcachegrind cpu_test.callgrind

分析结果如下图所示，相对来说功能比较强：

callgrind.png

7 控制监控开关

如果是server上的程序，启动后一般不会主动退出，即使退出，也一般不会正常退出，而 gperftools 必须在程序正常退出的情况下才能够正常收集或者收集完整数据。

7.1 请求服务

#include <gperftools/profiler.h>

void on_request(Request* req) {
    static bool is_profile_started = false;
    if (req->type == START_PROFILE && !is_profile_started) {
        ProfilerStart("xxx.profile");
        is_profile_started = true;
    } else if (req->type == STOP_PROFILE && is_profile_started) {
        ProfilerStop();
        is_profile_started = false;
    } else {
        // normal request processing here
    }
}

7.2 信号

static void gprof_callback(int signum) {
    if (signum == SIGUSR1) {
        printf("Catch the signal ProfilerStart\n");
        ProfilerStart("bs.prof");
    }
    else if (signum == SIGUSR2) {
        printf("Catch the signal ProfilerStop\n");
        ProfilerStop();
    }
}

static void setup_signal() {
    struct sigaction profstat;
    profstat.sa_handler = gprof_callback;
    profstat.sa_flags = 0;
    sigemptyset(&profstat.sa_mask);                                        
    sigaddset(&profstat.sa_mask, SIGUSR1);
    sigaddset(&profstat.sa_mask, SIGUSR2);

    if (sigaction(SIGUSR1, &profstat,NULL) < 0)
        fprintf(stderr, "SIGUSR1 Fail !");

    if (sigaction(SIGUSR2, &profstat,NULL) < 0)
        fprintf(stderr, "SIGUSR2 Fail !");
}

8 原理

如果只关心如何使用，则到这里就可以编写自己的工程了，下面对 CPU profiler 的源码进行简单的剖析。

站在巨人的肩膀上：http://www.tealcode.com/gperftool_source_analysis/

入口：

extern “C” PERFTOOLS_DLL_DECL int ProfilerStart(const char* fname) {
    return CpuProfiler::instance_.Start(fname, NULL);
}

bool CpuProfiler::Start(const char* fname, const ProfilerOptions* options) {
    collector_.Start(fname, collector_options);
    // Setup handler for SIGPROF interrupts
    EnableHandler();
    return true;
}

CPU profiler 启动的时候，核心功能就是启动数据收集器（collector_），这个数据收集器的 Start() 函数的功能就是初始化数据收集需要的数据结构，并创建数据收集文件：

bool ProfileData::Start(const char* fname, const ProfileData::Options& options) {
    // Open output file and initialize various data structures
    int fd =open(fname, O_CREAT | O_WRONLY | O_TRUNC, 0666);
    start_time_ = time(NULL);
    fname_ = strdup(fname);
    // Reset counters
    num_evicted_ = 0;
    count_ = 0;
    evictions_ = 0;
    total_bytes_ = 0;
    hash_ = new Bucket[kBuckets];
    evict_ = new Slot[kBufferLength];
    memset(hash_, 0, sizeof(hash_[0]) * kBuckets);
    // Record special entries
    evict_[num_evicted_++] = 0; // count for header
    evict_[num_evicted_++] = 3; // depth for header
    evict_[num_evicted_++] = 0; // Version number
    CHECK_NE(0, options.frequency());
    int period =1000000/ options.frequency();
    evict_[num_evicted_++] = period; // Period (microseconds)
    evict_[num_evicted_++] = 0; // Padding
    out_ = fd;
    return true;
}

然后就是开启 CPU profiler 的一个处理函数，这个函数就是把 prof_handler() 注册到了某个地方：

void CpuProfiler::EnableHandler() {
    prof_handler_token_ = ProfileHandlerRegisterCallback(prof_handler, this);
}

ProfileHandlerToken* ProfileHandlerRegisterCallback(
    ProfileHandlerCallback callback, void* callback_arg) {
    return ProfileHandler::Instance()->RegisterCallback(callback, callback_arg);
}

功能都在 ProfileHandler 里面，其为一个单例类，构造函数如下：

ProfileHandler::ProfileHandler() {
    timer_type_ = (getenv(“CPUPROFILE_REALTIME”) ? ITIMER_REAL : ITIMER_PROF);
    signal_number_ = (timer_type_ == ITIMER_PROF ? SIGPROF : SIGALRM);

    // Get frequency of interrupts (if specified)
    char junk;
    constchar* fr =getenv(“CPUPROFILE_FREQUENCY”);

    if (fr != NULL && (sscanf(fr, “%u%c”, &frequency_, &junk) == 1) && (frequency_ > 0)) {
        // Limit to kMaxFrequency
        frequency_ = (frequency_ > kMaxFrequency) ? kMaxFrequency : frequency_;
    } else {
        frequency_ = kDefaultFrequency;
    }

    // Install the signal handler.

    structsigaction sa;
    sa.sa_sigaction = SignalHandler;
    sa.sa_flags = SA_RESTART | SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(signal_number_, &sa, NULL);
}

构造函数中，根据环境变量 CPUPROFILE_REALTIME 的配置，来决定让 SIGPROF 还是 SIGALRM 信号来触发 SignalHandler 信号处理函数，并根据环境变量 CPUPROFILE_FREQUENCY 的配置来设置自己的一个频率变量 frequency_ ，如果没有设置，就使用默认值，这个默认值是100，而最大值是4000。

然后 ProfileHandler 的 RegisterCallback() 函数的实现如下：

ProfileHandlerToken* ProfileHandler::RegisterCallback(ProfileHandlerCallback callback, void* callback_arg) {
    ProfileHandlerToken* token = new ProfileHandlerToken(callback, callback_arg);
    SpinLockHolder cl(&control_lock_);
    DisableHandler();
    {
        SpinLockHolder sl(&signal_lock_);
        callbacks_.push_back(token);
    }

    // Start the timer if timer is shared and this is a first callback.
    if ((callback_count_ == 0) && (timer_sharing_ == TIMERS_SHARED)) {
        StartTimer();
    }

    ++callback_count_;
    EnableHandler();
    return token;
}

这个函数就如其函数名字，把指定的回调函数添加到 callbacks_里面去，然后在加入第一个 callback 的时候调用 StartTimer() 函数来启动定时器，然后调用 EnableHander() 函数来开启回调。StartTimer() 的实现如下：

void ProfileHandler::StartTimer() {
    struct itimerval timer;
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 1000000 / frequency_;
    timer.it_value = timer.it_interval;
    setitimer(timer_type_, &timer, 0);
}

EnableHandler() 的实现如下：

void ProfileHandler::EnableHandler() {
    struct sigaction sa;
    sa.sa_sigaction = SignalHandler;
    sa.sa_flags = SA_RESTART | SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    const int signal_number = (timer_type_ == ITIMER_PROF ? SIGPROF : SIGALRM);
    RAW_CHECK(sigaction(signal_number, &sa, NULL) == 0, “sigprof (enable)”);
}

到这里，这个工具的基本工作原理已经可以猜出个大概了。它用 setitimer() 启动一个系统定时器，这个定时器会每秒钟执行触发 frequency 次 SIGPROF 或者 SIGALRM 信号，从而去触发上面注册的信号处理函数。那么猜想，信号处理函数里面应该会用 backtrace 去检查一下目标程序执行到什么位置了。信号处理函数如下：

void CpuProfiler::prof_handler(int sig, siginfo_t*, void* signal_ucontext, void* cpu_profiler) {
    CpuProfiler* instance = static_cast<CpuProfiler*>(cpu_profiler);
    if (instance->filter_==NULL||(*instance->filter_)(instance->filter_arg_)) {
    void* stack[ProfileData::kMaxStackDepth];
    // Under frame-pointer-based unwinding at least on x86, the
    // top-most active routine doesn’t show up as a normal frame, but
    // as the “pc” value in the signal handler context.
    stack[0] = GetPC(*reinterpret_cast<ucontext_t*>(signal_ucontext));
    // We skip the top three stack trace entries (this function,
    // SignalHandler::SignalHandler and one signal handler frame)
    // since they are artifacts of profiling and should not be
    // measured. Other profiling related frames may be removed by
    // “pprof” at analysis time. Instead of skipping the top frames,
    // we could skip nothing, but that would increase the profile size
    // unnecessarily.
    int depth = GetStackTraceWithContext(stack +1, arraysize(stack) -1, 3, signal_ucontext);
    void**used_stack;
    if (depth >0&& stack[1] == stack[0]) {
        // in case of non-frame-pointer-based unwinding we will get
        // duplicate of PC in stack[1], which we don’t want
        used_stack = stack + 1;
    } else {
        used_stack = stack;
        depth++; // To account for pc value in stack[0];
    }
        instance->collector_.Add(depth, used_stack);
    }
}

果然是获取backtrace，然后记录到collector_里面去。

总结：

这个工具是用系统定时器定时产生信号的方式，在信号处理函数里面获取当前的调用堆栈来确定当前落在哪个函数里面的。获取频率默认是每10ms采样一次，参数是可调的，但是最大频率是4000，也就是支持的最小采样间隔是250微秒；
这个工具获取到的性能数据是基于统计数据的，也就是他并不真正跟踪函数的每一次调用过程，而是均匀地采样并记录采样点所落在的函数调用位置，用这些统计数据来计算每个函数的执行时间占比。这个数据并不是准确的数据，但是只要运行时间相对比较长，统计数据还是能比较准确地说明问题的。而这也是为什么说这个工具是比较好的服务器程序性能分析工具，而对一些客户端程序，比如游戏客户端并不是非常合适。因为游戏客户端上，相比长时间的统计数据，它们通常更加关心的是某些帧内的具体负载情况。
这个工具不工作的时候，就会把系统定时器取消掉，不会定时产生中断信号，不会触发中断处理程序，所以对运行程序的影响真的是很小，运行效率上可以说完全没有影响。而对产品的影响只是多占用一些链接 profiler 库的内存而已。

参考链接：

https://gperftools.github.io/gperftools/cpuprofile.html
http://airekans.github.io/cpp/2014/07/04/gperftools-profile
https://blog.csdn.net/aganlengzi/article/details/62893533
https://blog.csdn.net/10km/article/details/83820080
https://www.zhihu.com/question/277160878
http://www.tealcode.com/gperftool_source_analysis/

最后编辑于：2022.04.21 14:27:36

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 205,132评论 6赞 478
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 87,802评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 151,566评论 0赞 338
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,858评论 1赞 277
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,867评论 5赞 368
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,695评论 1赞 282
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 38,064评论 3赞 399
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,705评论 0赞 258
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 42,915评论 1赞 300
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,677评论 2赞 323
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,796评论 1赞 333
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,432评论 4赞 322
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 39,041评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,992评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,223评论 1赞 260
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 45,185评论 2赞 352
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,535评论 2赞 343