Measuring a Program's FLOPs with Perf

FLOPs

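FLOPs is a key metric for the computational cost of scientific programs: it is the number of floating-point operations a program performs over one complete run. Here I use the Linux performance analysis tool Perf to measure a program's FLOPs.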
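
To make the definition concrete, here is a minimal sketch that counts FLOPs by hand for two common kernels, which gives a reference value to compare a measured count against. The function names are my own and purely illustrative, and the count assumes every multiply and every add is one operation (no fused multiply-add).

```python
def dot_product_flops(n: int) -> int:
    """FLOPs for a dot product of two length-n vectors: n multiplies plus (n - 1) adds."""
    if n <= 0:
        return 0
    return 2 * n - 1


def matmul_flops(rows: int, inner: int, cols: int) -> int:
    """FLOPs for a (rows x inner) by (inner x cols) matrix product,
    computed as rows * cols independent dot products of length `inner`."""
    return rows * cols * dot_product_flops(inner)


if __name__ == "__main__":
    print(dot_product_flops(4))    # 4 multiplies + 3 adds
    print(matmul_flops(2, 3, 2))   # 4 output entries, 5 FLOPs each
```

A hardware counter will rarely match such a hand count exactly, since the compiler and libraries decide which instructions are actually executed.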

Installing Perf

Ubuntu/Debian

apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

CentOS

yum install perf

Listing supported events and their codes

Installing libpfm4

git clone git://perfmon2.git.sourceforge.net/gitroot/perfmon2/libpfm4
cd libpfm4
make

Viewing events

Go into the examples directory and run the showevtinfo program to see which events relate to FLOPs. On my machine I found the following:

IDX	 : 419430470
PMU name : skl (Intel Skylake)
Name     : FP_ARITH_INST_RETIRED
Equiv	 : None
Flags    : None
Desc     : Floating-point instructions retired
Code     : 0xc7
Umask-00 : 0x01 : PMU : [SCALAR_DOUBLE] : None : Number of scalar double precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-01 : 0x02 : PMU : [SCALAR_SINGLE] : None : Number of scalar single precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-02 : 0x04 : PMU : [128B_PACKED_DOUBLE] : None : Number of scalar 128-bit packed double precision floating-point arithmetic instructions (multiply by 2 to get flops)
Umask-03 : 0x08 : PMU : [128B_PACKED_SINGLE] : None : Number of scalar 128-bit packed single precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-04 : 0x10 : PMU : [256B_PACKED_DOUBLE] : None : Number of scalar 256-bit packed double precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-05 : 0x20 : PMU : [256B_PACKED_SINGLE] : None : Number of scalar 256-bit packed single precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-06 : 0x40 : PMU : [512B_PACKED_DOUBLE] : None : Number of scalar 512-bit packed double precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-07 : 0x80 : PMU : [512B_PACKED_SINGLE] : None : Number of scalar 512-bit packed single precision floating-point arithmetic instructions (multiply by 16 to get flops)
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
#-----------------------------
IDX	 : 419430469
PMU name : skl (Intel Skylake)
Name     : FP_ARITH
Equiv	 : FP_ARITH_INST_RETIRED
Flags    : None
Desc     : Floating-point instructions retired
Code     : 0xc7
Umask-00 : 0x01 : PMU : [SCALAR_DOUBLE] : None : Number of scalar double precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-01 : 0x02 : PMU : [SCALAR_SINGLE] : None : Number of scalar single precision floating-point arithmetic instructions (multiply by 1 to get flops)
Umask-02 : 0x04 : PMU : [128B_PACKED_DOUBLE] : None : Number of scalar 128-bit packed double precision floating-point arithmetic instructions (multiply by 2 to get flops)
Umask-03 : 0x08 : PMU : [128B_PACKED_SINGLE] : None : Number of scalar 128-bit packed single precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-04 : 0x10 : PMU : [256B_PACKED_DOUBLE] : None : Number of scalar 256-bit packed double precision floating-point arithmetic instructions (multiply by 4 to get flops)
Umask-05 : 0x20 : PMU : [256B_PACKED_SINGLE] : None : Number of scalar 256-bit packed single precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-06 : 0x40 : PMU : [512B_PACKED_DOUBLE] : None : Number of scalar 512-bit packed double precision floating-point arithmetic instructions (multiply by 8 to get flops)
Umask-07 : 0x80 : PMU : [512B_PACKED_SINGLE] : None : Number of scalar 512-bit packed single precision floating-point arithmetic instructions (multiply by 16 to get flops)
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)
#-----------------------------
IDX	 : 419430414
PMU name : skl (Intel Skylake)
Name     : FP_ASSIST
Equiv	 : None
Flags    : None
Desc     : X87 floating-point assists
Code     : 0xca
Umask-00 : 0x1001e : PMU : [ANY] : [default] : Cycles with any input/output SEE or FP assists
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-06 : 0x07 : PMU : [intx] : monitor only inside transactional memory region (boolean)
Modif-07 : 0x08 : PMU : [intxcp] : do not count occurrences inside aborted transactional memory region (boolean)

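Getting the codes

In the same directory, run the check_events program to get the code for each event. Its arguments are the Name and Umask values from the previous step, written as Name:Umask. My command was: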

./check_events FP_ARITH_INST_RETIRED:SCALAR_SINGLE FP_ARITH:SCALAR_SINGLE FP_ASSIST

It produces the following output:

Requested Event: FP_ARITH_INST_RETIRED:SCALAR_SINGLE
Actual    Event: skl::FP_ARITH_INST_RETIRED:SCALAR_SINGLE:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0
PMU            : Intel Skylake
IDX            : 419430470
Codes          : 0x5302c7
Requested Event: FP_ARITH:SCALAR_SINGLE
Actual    Event: skl::FP_ARITH_INST_RETIRED:SCALAR_SINGLE:k=1:u=1:e=0:i=0:c=0:t=0:intx=0:intxcp=0
PMU            : Intel Skylake
IDX            : 419430470
Codes          : 0x5302c7
Requested Event: FP_ASSIST
Actual    Event: skl::FP_ASSIST:ANY:k=1:u=1:e=0:i=0:c=1:t=0:intx=0:intxcp=0
PMU            : Intel Skylake
IDX            : 419430414
Codes          : 0x1531eca

The Codes value in each result is the code we need.
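
To see what such a code contains, the sketch below splits it into its fields. It assumes the bit layout of Intel's IA32_PERFEVTSELx register (event select in bits 0-7, umask in bits 8-15, flag bits 16-23, counter mask in bits 24-31); the function names are hypothetical and are not part of libpfm4 or perf.

```python
def decode_raw_event(code: int) -> dict:
    """Split a raw Intel event code into fields, assuming the IA32_PERFEVTSELx layout."""
    return {
        "event": code & 0xFF,                   # event select, e.g. 0xc7
        "umask": (code >> 8) & 0xFF,            # unit mask, e.g. 0x02 for SCALAR_SINGLE
        "usr": bool(code & (1 << 16)),          # count at privilege levels 1-3 (u=1)
        "os": bool(code & (1 << 17)),           # count at privilege level 0 (k=1)
        "edge": bool(code & (1 << 18)),         # edge detect (e)
        "any_thread": bool(code & (1 << 21)),   # measure any thread (t)
        "enable": bool(code & (1 << 22)),       # counter enable bit
        "invert": bool(code & (1 << 23)),       # invert counter mask (i)
        "cmask": (code >> 24) & 0xFF,           # counter mask (c)
    }


def perf_raw_name(code: int) -> str:
    """Format a code the way perf expects raw events: 'r' plus lowercase hex."""
    return f"r{code:x}"


if __name__ == "__main__":
    for code in (0x5302C7, 0x1531ECA):
        fields = decode_raw_event(code)
        print(perf_raw_name(code), hex(fields["event"]), hex(fields["umask"]), fields["cmask"])
```

Decoded this way, 0x5302c7 is event 0xc7 with umask 0x02, and 0x1531eca is event 0xca with umask 0x1e and a counter mask of 1, which agrees with the k, u and c modifiers check_events printed.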

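Measuring a program's FLOPs

Pick the program to measure, then run it under perf stat, passing each event code in hexadecimal with an r prefix (and without the leading 0x). For example: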

sudo perf stat -e r5302c7 -e r1531eca  ./example.py

The output is:

Performance counter stats for './example.py':

        13,061,638      r5302c7
        1      r1531eca
        
        1.834101748 seconds time elapsed
        
        1.888016000 seconds user
        0.231023000 seconds sys

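Here the value for r5302c7 is the number of retired scalar single-precision floating-point arithmetic instructions, each of which counts as one FLOP according to the event listing. It equals the program's total FLOPs only if the program executes no double-precision or packed (SIMD) floating-point instructions; otherwise the other FP_ARITH_INST_RETIRED umasks must be counted as well, with each count multiplied by the factor given in the listing. The r1531eca count (FP_ASSIST) does not contribute to FLOPs.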
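
A total over all precisions and vector widths could be computed along the following lines. This is a sketch under assumptions: the raw codes are built by reusing the flag bits of 0x5302c7 with each umask from the showevtinfo listing, the weights are the "multiply by N" factors from that listing, and the helper names are hypothetical. Whether every umask can be counted on a particular CPU is not something I have verified.

```python
import re

EVENT_CODE = 0xC7      # FP_ARITH_INST_RETIRED
FLAGS = 0x53 << 16     # same flag bits as in 0x5302c7 (k=1, u=1)

# umask -> (name, FLOPs per instruction), copied from the showevtinfo listing
UMASKS = {
    0x01: ("SCALAR_DOUBLE", 1),
    0x02: ("SCALAR_SINGLE", 1),
    0x04: ("128B_PACKED_DOUBLE", 2),
    0x08: ("128B_PACKED_SINGLE", 4),
    0x10: ("256B_PACKED_DOUBLE", 4),
    0x20: ("256B_PACKED_SINGLE", 8),
    0x40: ("512B_PACKED_DOUBLE", 8),
    0x80: ("512B_PACKED_SINGLE", 16),
}


def raw_event(umask: int) -> str:
    """Build the perf raw event name (e.g. 'r5302c7') for one umask."""
    return f"r{FLAGS | (umask << 8) | EVENT_CODE:x}"


FLOPS_PER_EVENT = {raw_event(u): weight for u, (_, weight) in UMASKS.items()}


def perf_command(program: str) -> str:
    """Return a perf stat command line that counts all eight events."""
    return f"perf stat -e {','.join(FLOPS_PER_EVENT)} {program}"


def parse_perf_stat(text: str) -> dict:
    """Extract {raw event name: count} from perf stat's text output."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+(r[0-9a-f]+)\b", line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


def total_flops(counts: dict) -> int:
    """Weight each known event count by its FLOPs per instruction; ignore other events."""
    return sum(n * FLOPS_PER_EVENT.get(event, 0) for event, n in counts.items())


if __name__ == "__main__":
    print(perf_command("./example.py"))
    sample = "        13,061,638      r5302c7\n        1      r1531eca\n"
    print(total_flops(parse_perf_stat(sample)))
```

perf stat writes its report to stderr, so that stream is the one to capture before parsing. Requesting eight events at once may exceed the number of hardware counters, in which case perf time-multiplexes them and reports scaled estimates.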
