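1. Section 1

1. perf top

Commonly used keys in the perf top interface:

h: show the detailed help screen.
UP/DOWN/PGUP/PGDN/SPACE: scroll up and down, or page through the list.
a: annotate the current symbol. This shows the disassembly together with the sample percentage of each instruction.
d: filter out every symbol that does not belong to the current DSO, which makes it easy to view symbols of the same kind.
P: save the current view to perf.hist.N.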

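Commonly used perf top options:

-e <event>: the performance event to analyze.
-p <pid>: profile events on existing process IDs (comma-separated list). Only the target process and the threads it creates are analyzed.
-k <path>: path to vmlinux, a kernel image that carries its symbol table. Required for annotation.
-K: hide symbols that belong to the kernel or its modules.
-U: hide symbols that belong to user-space programs.
-d <n>: refresh interval of the interface. The default is 2 s, because perf top reads performance data from the mmap'd memory region every 2 s by default.
-g: collect the function call graph.

perf top --call-graph [fractal]: path percentages are relative and add up to 100%; the call order reads from bottom to top.
perf top --call-graph graph: path percentages are absolute and add up to the overhead of the function itself.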
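
The difference between the two modes comes down to the denominator. A minimal sketch of the arithmetic follows; the function name, the caller paths and every percentage are made up for illustration and do not come from a real profile.

```python
# Hypothetical profile: funcA accounts for 40% of all samples, and those
# samples arrive through two caller paths. None of these numbers come from
# a real perf run.
TOTAL_OVERHEAD = 40.0  # percent of all samples attributed to funcA

# "graph" mode: absolute share of all samples per caller path.
graph = {"main -> funcA": 30.0, "main -> worker -> funcA": 10.0}

# "fractal" mode: each path divided by the overhead of the function itself.
fractal = {path: 100.0 * pct / TOTAL_OVERHEAD for path, pct in graph.items()}

graph_sum = sum(graph.values())      # adds up to the function's overhead
fractal_sum = sum(fractal.values())  # adds up to 100

for path in graph:
    print(f"{path}: graph {graph[path]:.2f}%  fractal {fractal[path]:.2f}%")
```

Under these assumed numbers the graph column sums to 40 and the fractal column sums to 100.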

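2. perf stat

perf stat runs a command and reports counter statistics for it. perf top can also be given a pid, but the application has to be running already before anything can be observed.
perf stat, by contrast, can collect statistics over the application's entire lifetime.
The command format is:

perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
perf stat [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]

A quick look at the output of perf stat: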

al@al-System-Product-Name:~/perf$ sudo perf stat
^C
  Performance counter stats for 'system wide':

      40904.820871      cpu-clock (msec)          #    5.000 CPUs utilized          
             18,132      context-switches          #    0.443 K/sec                  
              1,053      cpu-migrations            #    0.026 K/sec                  
              2,420      page-faults               #    0.059 K/sec                  
      3,958,376,712      cycles                    #    0.097 GHz                      (49.99%)
        574,598,403      stalled-cycles-frontend   #   14.52% frontend cycles idle     (49.98%)
      9,392,982,910      stalled-cycles-backend    #  237.29% backend cycles idle      (50.00%)
      1,653,185,883      instructions              #    0.42  insn per cycle         
                                                   #    5.68  stalled cycles per insn  (50.01%)
        237,061,366      branches                  #    5.795 M/sec                    (50.02%)
         18,333,168      branch-misses             #    7.73% of all branches          (50.00%)
       8.181521203 seconds time elapsed

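The output fields are:

cpu-clock: the processor time actually consumed, in ms (reported as task-clock when a single command is measured). CPUs utilized = cpu-clock / time elapsed, i.e. the CPU utilization.
context-switches: the number of context switches during the run.
cpu-migrations: the number of times a task migrated between processors during the run. To keep the load balanced across processors, Linux moves a task from one CPU to another under certain conditions.
CPU migrations versus context switches: a context switch does not necessarily involve a CPU migration, whereas a CPU migration always involves a context switch. A context switch may simply swap the context out of the current CPU, and the scheduler may place the process on the same CPU the next time it runs.
page-faults: the number of page faults. A page fault is raised when the requested page has not been set up yet, when it is not in memory, or when it is in memory but the virtual-to-physical mapping has not been established. An access that violates the page's permissions also raises a page fault. On hardware that walks the page tables itself, such as x86, a TLB miss on its own does not; it turns into a page fault only when the walk finds no valid mapping.
cycles: the number of processor cycles consumed. The GHz figure is cycles / cpu-clock, i.e. the clock rate obtained when all counted cycles are attributed to a single processor. In the output above, 3,958,376,712 cycles / 40904.82 ms is about 0.097 GHz.
stalled-cycles-frontend: cycles in which the instruction fetch or decode stage stalled, so the front end could not reach its ideal parallelism.
stalled-cycles-backend: cycles in which the instruction execution stage stalled.
instructions: the number of instructions executed. IPC (insn per cycle) is the average number of instructions executed per CPU cycle.
branches: the number of branch instructions encountered. branch-misses is the number of mispredicted branches.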
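
The derived column on the right of the report can be recomputed from the raw counters. The sketch below does this for the system-wide output shown earlier; the variable names are illustrative, and the formulas are the ones implied by the printed values, not a copy of perf's own code.

```python
# Raw counters copied from the system-wide `perf stat` output above.
cpu_clock_ms = 40904.820871
elapsed_s = 8.181521203
cycles = 3_958_376_712
stalled_frontend = 574_598_403
stalled_backend = 9_392_982_910
instructions = 1_653_185_883
branches = 237_061_366
branch_misses = 18_333_168

cpu_clock_s = cpu_clock_ms / 1000.0

cpus_utilized = cpu_clock_s / elapsed_s          # "CPUs utilized"
ghz = cycles / cpu_clock_s / 1e9                 # "GHz"
ipc = instructions / cycles                      # "insn per cycle"
frontend_idle_pct = 100.0 * stalled_frontend / cycles
backend_idle_pct = 100.0 * stalled_backend / cycles
# The printed "stalled cycles per insn" matches the larger of the two
# stall counters divided by the instruction count.
stalled_per_insn = max(stalled_frontend, stalled_backend) / instructions
branch_miss_pct = 100.0 * branch_misses / branches
branches_m_per_s = branches / cpu_clock_s / 1e6  # "M/sec"

print(f"CPUs utilized      {cpus_utilized:.3f}")
print(f"GHz                {ghz:.3f}")
print(f"insn per cycle     {ipc:.2f}")
print(f"frontend idle      {frontend_idle_pct:.2f}%")
print(f"backend idle       {backend_idle_pct:.2f}%")
print(f"stalled per insn   {stalled_per_insn:.2f}")
print(f"branch misses      {branch_miss_pct:.2f}%")
print(f"branches           {branches_m_per_s:.3f} M/sec")
```

Note that the backend figure exceeds 100% because it is a plain ratio of two counters, each of which was only measured for about half of the run, as the percentages in parentheses indicate.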

Other commonly used options

    -a, --all-cpus        show statistics for all CPUs
    -C, --cpu <cpu>       show statistics for the specified CPUs
    -c, --scale           scale/normalize counters
    -D, --delay <n>       ms to wait before starting measurement after program start
    -d, --detailed        detailed run - start a lot of events
    -e, --event <event>   event selector. use 'perf list' to list available events
    -G, --cgroup <name>   monitor event in cgroup name only
    -g, --group           put the counters into a counter group
    -I, --interval-print <n>
                          print counts at regular interval in ms (>= 10)
    -i, --no-inherit      child tasks do not inherit counters
    -n, --null            null run - dont start any counters
    -o, --output <file>   write the statistics to a file
    -p, --pid <pid>       stat events on existing process id
    -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)
    -S, --sync            call sync() before starting a run
    -t, --tid <tid>       stat events on existing thread id
...

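Examples
The earlier example collected system-wide statistics; the next one restricts them to a single CPU.
Run sudo perf stat -C 0 to collect statistics for CPU 0, and press Ctrl+C to stop. The reported items are the same; only the measured target changes.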

al@al-System-Product-Name:~/perf$ sudo perf stat -C 0
^C
  Performance counter stats for 'CPU(s) 0':

       2517.107315      cpu-clock (msec)          #    1.000 CPUs utilized          
              2,941      context-switches          #    0.001 M/sec                  
                109      cpu-migrations            #    0.043 K/sec                  
                 38      page-faults               #    0.015 K/sec                  
        644,094,340      cycles                    #    0.256 GHz                      (49.94%)
         70,425,076      stalled-cycles-frontend   #   10.93% frontend cycles idle     (49.94%)
        965,270,543      stalled-cycles-backend    #  149.86% backend cycles idle      (49.94%)
        623,284,864      instructions              #    0.97  insn per cycle         
                                                   #    1.55  stalled cycles per insn  (50.06%)
         65,658,190      branches                  #   26.085 M/sec                    (50.06%)
          3,276,104      branch-misses             #    4.99% of all branches          (50.06%)
       2.516996126 seconds time elapsed

To count additional events, list them with -e, for example:

perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

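The result is shown below; the additional events of interest are now part of the report. Events marked <not counted> were requested but never actually counted, which typically happens when a command is this short-lived (about 3 ms here) and more events are requested than the hardware can count at once.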

al@al-System-Product-Name:~/perf$ sudo perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls
Performance counter stats for 'ls':

          2.319422      task-clock (msec)         #    0.719 CPUs utilized          
                  0      context-switches          #    0.000 K/sec                  
                  0      cpu-migrations            #    0.000 K/sec                  
                 89      page-faults               #    0.038 M/sec                  
          2,142,386      cycles                    #    0.924 GHz                    
            659,800      stalled-cycles-frontend   #   30.80% frontend cycles idle   
            725,343      stalled-cycles-backend    #   33.86% backend cycles idle    
          1,344,518      instructions              #    0.63  insn per cycle         
                                                   #    0.54  stalled cycles per insn
      <not counted>      branches                                                    
      <not counted>      branch-misses                                               
      <not counted>      L1-dcache-loads                                             
      <not counted>      L1-dcache-load-misses                                       
      <not counted>      LLC-loads                                                   
      <not counted>      LLC-load-misses                                             
      <not counted>      dTLB-loads                                                  
      <not counted>      dTLB-load-misses                                           
       0.003227507 seconds time elapsed
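
When such a report has to be processed by a script, the counter lines can be picked apart with a regular expression. The following is a minimal sketch that assumes the human-readable layout shown above; `parse_perf_stat`, `LINE_RE` and `SAMPLE` are illustrative names, and the parser is not part of perf.

```python
import re

# A few lines in the layout of the `perf stat` output above.
SAMPLE = """\
          2.319422      task-clock (msec)         #    0.719 CPUs utilized
                 89      page-faults               #    0.038 M/sec
          2,142,386      cycles                    #    0.924 GHz
                                                   #    0.54  stalled cycles per insn
      <not counted>      branches
       0.003227507 seconds time elapsed
"""

# A value (number with optional thousands separators, or a <...> marker)
# followed by the event name.
LINE_RE = re.compile(r"^\s*(<[^>]+>|[\d,]+(?:\.\d+)?)\s+([A-Za-z][\w-]*)")


def parse_perf_stat(text):
    """Return {event: float or None}; None marks events perf did not count."""
    counters = {}
    for line in text.splitlines():
        if "time elapsed" in line:
            continue  # summary line, not a counter
        match = LINE_RE.match(line)
        if not match:
            continue  # blank lines, headers and '#' continuation lines
        value, event = match.groups()
        if value.startswith("<"):
            counters[event] = None
        else:
            counters[event] = float(value.replace(",", ""))
    return counters


counters = parse_perf_stat(SAMPLE)
missing = [name for name, value in counters.items() if value is None]
print(counters)
print("not counted:", missing)
```

Keeping the uncounted events as None rather than 0 avoids mistaking "no data" for "no events".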

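3. perf record

sudo perf record -a -g ./test generates a perf.data file in the current directory.
Commonly used perf record options:

-e  select the PMU event to record
    --filter  event filter
-a  record events on all CPUs
-p  record events of the process with the given pid
-o  name of the file the recorded data is saved to
-g  enable call-graph recording
-C  record events on the specified CPUs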
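
When recordings are started from a script, the options above can be assembled programmatically. This is a small sketch under that assumption; `build_perf_record_argv` is a hypothetical helper, not part of perf, and it only emits the flags listed above.

```python
import shlex


def build_perf_record_argv(command=None, event=None, pid=None, cpus=None,
                           all_cpus=False, call_graph=False, output=None):
    """Assemble a `perf record` command line from the options listed above."""
    argv = ["perf", "record"]
    if event:
        argv += ["-e", event]
    if all_cpus:
        argv.append("-a")
    if pid is not None:
        argv += ["-p", str(pid)]
    if cpus:
        argv += ["-C", cpus]
    if call_graph:
        argv.append("-g")
    if output:
        argv += ["-o", output]
    if command:
        # "--" separates perf's own options from the profiled command.
        argv += ["--"] + list(command)
    return argv


argv = build_perf_record_argv(command=["./test"], all_cpus=True, call_graph=True)
print(shlex.join(argv))  # perf record -a -g -- ./test
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; building a list instead of a string avoids shell quoting problems.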

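4. perf report

perf report -i perf.data shows the percentage taken by main, as well as the percentages taken by funcA and funcB.

The output of sudo perf report --call-graph none is analyzed later together with perf timechart.
That output is cluttered; to see only the samples produced by test:
sudo perf report --call-graph none -c test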

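2. Section 2

System-wide overview:

perf list: list the performance events supported by the current system;
perf bench: benchmark the system to get a performance baseline;
perf test: run sanity tests on the system;
perf stat: collect overall performance statistics;

System-wide details:

perf top: show, in real time, how much each function of each process on the system is consuming;
perf probe: define custom dynamic events;

Subsystem-specific analysis:

perf kmem: analyze the performance of the slab subsystem;
perf kvm: analyze KVM virtualization;
perf lock: analyze lock performance;
perf mem: analyze memory access performance;
perf sched: analyze kernel scheduler performance;
perf trace: record system call traces;

The most frequently used feature is perf record. It can cover the whole system, a single process, or even a single event of a single process, so it works at both the macro and the micro level.

perf record: record data to perf.data;
perf report: generate a report;
perf diff: diff two recordings;
perf evlist: list the performance events in a recording;
perf annotate: show the code of the functions in perf.data;
perf archive: pack the relevant symbols so the data can be analyzed on another machine;
perf script: turn perf.data into readable text;

Visualization tool: perf timechart

perf timechart record: record events;
perf timechart: generate the output.svg document;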

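3. Section 3

1. Brief introduction to the perf commands

When tuning performance, we usually need to find the hot code paths that account for a high percentage of the program's time. This is done by recording function-level statistics with perf record and displaying them with perf report:
perf record
perf report
Example:
sudo perf record -e cpu-clock -g -p 2548
-g tells perf record to also record the function call relationships
-e cpu-clock makes perf record sample on the cpu-clock event, a software timer event rather than the hardware cycles counter
-p specifies the pid of the process to record

When recording ends, perf record writes a file named perf.data. If a perf.data already exists, it is renamed to perf.data.old before the new file is written.
Once perf.data is available, view it with perf report:
perf report -i perf.data
-i specifies the file to view
Taking a mysql diagnosis as an example, the report is produced with:
$sudo perf report -i perf.data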

2. Displaying the results as a flame graph

1. The Flame Graph project is hosted on GitHub: https://github.com/brendangregg/FlameGraph
2. Clone it with git: git clone https://github.com/brendangregg/FlameGraph.git

Taking perf as an example, flamegraph is used as follows:
1. Step 1
$sudo perf record -e cpu-clock -g -p 28591
After stopping it with Ctrl+C, the sample data file perf.data is generated in the current directory.
2. Step 2
Parse perf.data with the perf script tool:
perf script -i perf.data > perf.unfold
3. Step 3
Fold the stacks in perf.unfold:
./stackcollapse-perf.pl perf.unfold > perf.folded
4. Finally, generate the SVG:
./flamegraph.pl perf.folded > perf.svg
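
To make the folding step less opaque, here is a simplified sketch of the idea: each sample's stack is reversed into root-to-leaf order, joined with semicolons, and identical stacks are counted. It is not the implementation of stackcollapse-perf.pl, which handles many more cases; `fold_stacks` and the sample text (including its addresses and path) are hypothetical.

```python
from collections import Counter

# Hypothetical two-sample excerpt in the layout `perf script` prints for
# call-graph data: a header line per sample, then one indented frame per
# line (innermost frame first), then a blank line.
SAMPLE = """\
test 28591 1234.000001:     250000 cpu-clock:
\t          4005d6 funcA+0x10 (/home/al/perf/test)
\t          400611 main+0x1b (/home/al/perf/test)

test 28591 1234.000251:     250000 cpu-clock:
\t          4005d6 funcA+0x10 (/home/al/perf/test)
\t          400611 main+0x1b (/home/al/perf/test)

"""


def fold_stacks(text):
    """Count identical stacks and return {"comm;root;...;leaf": samples}."""
    folded = Counter()
    comm, frames = None, []
    # The extra "" guarantees that the last sample is flushed.
    for line in text.splitlines() + [""]:
        if not line.strip():              # blank line ends the current sample
            if comm is not None and frames:
                # perf lists the leaf first; folded lines run root -> leaf.
                folded[";".join([comm] + frames[::-1])] += 1
            comm, frames = None, []
        elif not line[0].isspace():       # header line: "comm pid time: ..."
            comm = line.split()[0]
        else:                             # frame line: "addr symbol+off (dso)"
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                symbol = parts[1].rsplit(" (", 1)[0]
                frames.append(symbol.split("+0x")[0])
    return folded


for stack, count in fold_stacks(SAMPLE).items():
    print(stack, count)
```

Each printed line has the "stack count" shape that flamegraph.pl reads: the width of a frame in the SVG is proportional to the summed counts of the stacks that contain it.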

