Introduction
is a great tool to analyse performances on Linux boxes. For example,
perf top
will give you this type of output on a box running
on a high speed network:
Events: 32K cycles
28.41% suricata [.] SCACSearch
19.86% libc-2.15.so [.] tolower
17.83% suricata [.] SigMatchSignaturesBuildMatchArray
6.11% suricata [.] SigMatchSignaturesBuildMatchArrayAddSignature
2.06% suricata [.] tolower@plt
1.70% libpthread-2.15.so [.] pthread_mutex_trylock
1.17% suricata [.] StreamTcpGetFlowState
1.10% libc-2.15.so [.] __memcpy_ssse3_back
0.90% libpthread-2.15.so [.] pthread_mutex_lock
The functions are sorted by CPU consumption. Using arrow key it is possible to jump into the annotated code to see where most CPU cycles are used.
This is really useful but in the case of a function like
pthread_mutex_trylock, the interesting part is to be able to find where this function is called.
Getting function call graph in perf
Once suricata was restarted (with pid being 9366), I was then able to record the data:
sudo perf record -a --call-graph -p 9366
Extracting the call graph was then possible by running:
sudo perf report --call-graph --stdio
The result is a huge detailed report. For example, here’s the part on pthread_mutex_lock:
0.94% Suricata-Main libpthread-2.15.so [.] pthread_mutex_lock
|
--- pthread_mutex_lock
|
|--48.69%-- FlowHandlePacket
| |
| |--53.04%-- DecodeUDP
| | |
| | |--95.84%-- DecodeIPV4
| | | |
| | | |--99.97%-- DecodeVLAN
| | | | DecodeEthernet
| | | | DecodePfring
| | | | TmThreadsSlotVarRun
| | | | TmThreadsSlotProcessPkt
| | | | ReceivePfringLoop
| | | | TmThreadsSlotPktAcqLoop
| | | | start_thread
| | | --0.03%-- [...]
| | |
| | --4.16%-- DecodeIPV6
| | |
| | |--97.59%-- DecodeTunnel
| | | |
| | | |--99.18%-- DecodeTeredo
| | | | DecodeUDP
| | | | DecodeIPV4
| | | | DecodeVLAN
| | | | DecodeEthernet
| | | | DecodePfring
| | | | TmThreadsSlotVarRun
| | | | TmThreadsSlotProcessPkt
| | | | ReceivePfringLoop
| | | | TmThreadsSlotPktAcqLoop
| | | | start_thread
| | | |
| | | --0.82%-- DecodeIPV4
| | | DecodeVLAN
| | | DecodeEthernet
| | | DecodePfring
| | | TmThreadsSlotVarRun
| | | TmThreadsSlotProcessPkt
| | | ReceivePfringLoop
| | | TmThreadsSlotPktAcqLoop
| | | start_thread
| | |
| | --2.41%-- DecodeIPV6
| | DecodeTunnel
| | DecodeTeredo
| | DecodeUDP
| | DecodeIPV4
| | DecodeVLAN
| | DecodeEthernet
| | DecodePfring
| | TmThreadsSlotVarRun
| | TmThreadsSlotProcessPkt
| | ReceivePfringLoop
| | TmThreadsSlotPktAcqLoop
| | start_thread