Introduction to CMCI and Parsing Common Log Messages

CMCI


Starting with 45 nm Intel 64 processor on which CPUID reports DisplayFamily_DisplayModel as 06H_1AH (see CPUID instruction in Chapter 3, “Instruction Set Reference, A-L” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A), the processor can report information on corrected machine-check errors and deliver a programmable interrupt for software to respond to MC errors, referred to as corrected machine-check error interrupt (CMCI). See Section 15.5 for detail.

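CMCI is a mechanism for reporting corrected errors on 64-bit Intel processors, starting with the 45 nm generation. The processor counts the corrected errors that occur and signals an error report once the count reaches a threshold. There are two modes of handling: interrupt mode and poll mode.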
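The threshold mentioned above is configured per machine-check bank. According to the Intel SDM (Vol. 3B, Chapter 15), the IA32_MCi_CTL2 register holds the corrected error count threshold in bits 14:0 and the CMCI enable flag (CMCI_EN) in bit 30. The sketch below only performs the bit arithmetic on a raw register value. The helper names are hypothetical, and actually reading or writing the MSR (for example through Linux's /dev/cpu/N/msr) requires root and is not shown.

```python
# IA32_MCi_CTL2 field layout, per the Intel SDM Vol. 3B, Chapter 15.
THRESHOLD_MASK = 0x7FFF   # bits 14:0: corrected error count threshold
CMCI_EN_BIT = 1 << 30     # bit 30: CMCI_EN, signal a CMCI for this bank


def decode_ctl2(value: int) -> dict:
    """Extract the CMCI-related fields from a raw IA32_MCi_CTL2 value."""
    return {
        "threshold": value & THRESHOLD_MASK,
        "cmci_enabled": bool(value & CMCI_EN_BIT),
    }


def encode_ctl2(threshold: int, enable: bool) -> int:
    """Build a CTL2 value from scratch (hypothetical helper).

    Reserved bits are simply left at 0 here; real code should read the
    current MSR value and modify only these fields.
    """
    if not 0 <= threshold <= THRESHOLD_MASK:
        raise ValueError("threshold must fit in 15 bits (0..0x7FFF)")
    return threshold | (CMCI_EN_BIT if enable else 0)


if __name__ == "__main__":
    raw = encode_ctl2(threshold=1, enable=True)
    print(hex(raw), decode_ctl2(raw))  # 0x40000001 {'threshold': 1, 'cmci_enabled': True}
```

A threshold of 1 means every corrected error in that bank triggers an interrupt; larger values trade reporting latency for fewer interrupts.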


Where error information is stored


The machine-check error reporting mechanism that Pentium processors use is similar to that used in Pentium 4, Intel Xeon, Intel Atom, and P6 family processors. When an error is detected, it is recorded in P5_MC_TYPE and P5_MC_ADDR; the processor then generates a machine-check exception (#MC).

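The passage above describes Pentium processors: when an error is detected, it is recorded in the P5_MC_TYPE and P5_MC_ADDR registers. On processors that support CMCI, corrected errors are instead logged in the per-bank machine-check registers IA32_MCi_STATUS, IA32_MCi_ADDR and IA32_MCi_MISC.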
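On CMCI-capable processors, each logged error is described by the bank's IA32_MCi_STATUS register, whose architectural fields are defined in the Intel SDM (Vol. 3B, Chapter 15). The sketch below decodes a few of those fields from a raw 64-bit value, for example one copied from an mcelog record. The function names are hypothetical, and the code assumes a CMCI-capable processor, where bits 52:38 hold the corrected error count.

```python
def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status: int) -> dict:
    """Decode selected architectural fields of an IA32_MCi_STATUS value."""
    return {
        "valid": bit(status, 63),        # VAL: the register holds a valid error
        "overflow": bit(status, 62),     # OVER: an earlier error was lost
        "uncorrected": bit(status, 61),  # UC: False means the error was corrected
        "enabled": bit(status, 60),      # EN: reporting of this error was enabled
        "misc_valid": bit(status, 59),   # MISCV: IA32_MCi_MISC is valid
        "addr_valid": bit(status, 58),   # ADDRV: IA32_MCi_ADDR is valid
        "pcc": bit(status, 57),          # PCC: processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "model_specific": (status >> 16) & 0xFFFF,   # bits 31:16
        "mca_error_code": status & 0xFFFF,           # bits 15:0
    }


if __name__ == "__main__":
    for key, val in decode_status(0x8C00004000010091).items():
        print(f"{key:16} {val}")
```

For the example value 0x8c00004000010091, this reports a valid, corrected (UC = 0) error with ADDR and MISC valid, a corrected error count of 1 and MCA error code 0x0091.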


How it shows up in the logs


kernel: CMCI storm subsided: switching to interrupt mode
kernel: CMCI storm detected: switching to poll mode

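The lines above are kernel messages written to the system log (/var/log/messages). In interrupt mode, CMCI raises an interrupt each time a corrected error is reported. If such errors arrive too frequently (a "CMCI storm"), the kernel switches to poll mode and checks for errors every few seconds instead, to reduce the load on the CPU. Once the error rate drops, it switches back to interrupt mode.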
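The switching behavior described above can be modeled as a small state machine: count interrupts in a sliding time window, enter poll mode when the count exceeds a storm threshold, and return to interrupt mode after several error-free polls. This is a simplified illustration, not the Linux kernel's implementation. STORM_THRESHOLD, STORM_WINDOW and QUIET_POLLS are hypothetical values, and the real constants and logic differ between kernel versions.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical tuning values, for illustration only.
STORM_THRESHOLD = 15   # more interrupts than this within the window = storm
STORM_WINDOW = 1.0     # sliding window length in seconds
QUIET_POLLS = 3        # consecutive error-free polls before leaving poll mode


@dataclass
class CmciModeTracker:
    mode: str = "interrupt"
    recent: deque = field(default_factory=deque)  # timestamps of recent CMCIs
    quiet_polls: int = 0
    log: list = field(default_factory=list)

    def on_interrupt(self, t: float) -> None:
        """Record a CMCI at time t (seconds); switch to poll mode on a storm."""
        if self.mode != "interrupt":
            return
        self.recent.append(t)
        # Drop interrupts that have fallen out of the sliding window.
        while self.recent and t - self.recent[0] > STORM_WINDOW:
            self.recent.popleft()
        if len(self.recent) > STORM_THRESHOLD:
            self.mode = "poll"
            self.recent.clear()
            self.quiet_polls = 0
            self.log.append("CMCI storm detected: switching to poll mode")

    def on_poll(self, errors_found: int) -> None:
        """One periodic poll; return to interrupt mode once errors stop."""
        if self.mode != "poll":
            return
        self.quiet_polls = 0 if errors_found else self.quiet_polls + 1
        if self.quiet_polls >= QUIET_POLLS:
            self.mode = "interrupt"
            self.log.append("CMCI storm subsided: switching to interrupt mode")


if __name__ == "__main__":
    tracker = CmciModeTracker()
    for i in range(16):              # 16 interrupts within 0.75 s
        tracker.on_interrupt(i * 0.05)
    for _ in range(QUIET_POLLS):     # three polls that find no errors
        tracker.on_poll(errors_found=0)
    print(tracker.log)
```

Running it prints the "detected" message followed by the "subsided" message, mirroring the two kernel log lines shown earlier.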

The details of the reported errors can usually be found in /var/log/mcelog.
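To see which CPU and machine-check bank the errors come from, one option is to count the records in /var/log/mcelog. The sketch below assumes each record contains a line of the form "CPU <n> BANK <m>", which is typical of mcelog's text output but may differ between versions; count_by_bank and report are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape in mcelog text output, e.g. "CPU 2 BANK 8".
CPU_BANK_RE = re.compile(r"^CPU (\d+) BANK (\d+)")


def count_by_bank(lines: Iterable[str]) -> Counter:
    """Count machine-check records per (cpu, bank) pair."""
    counts: Counter = Counter()
    for line in lines:
        m = CPU_BANK_RE.match(line.strip())
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return counts


def report(path: str = "/var/log/mcelog") -> None:
    """Print (cpu, bank) pairs, busiest first; needs read access to path."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for (cpu, bank), n in count_by_bank(f).most_common():
            print(f"CPU {cpu} BANK {bank}: {n} record(s)")
```

Calling report() on the affected host lists the (CPU, BANK) pairs with the most records first, which helps narrow down where a CMCI storm originates.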

