CMCI
Starting with 45 nm Intel 64 processor on which CPUID reports DisplayFamily_DisplayModel as 06H_1AH (see CPUID instruction in Chapter 3, “Instruction Set Reference, A-L” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A), the processor can report information on corrected machine-check errors and deliver a programmable interrupt for software to respond to MC errors, referred to as corrected machine-check error interrupt (CMCI). See Section 15.5 for detail.
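The "06H_1AH" in the quote is derived from the CPUID leaf 01H signature in EAX, and per the SDM the definitive CMCI capability bit is IA32_MCG_CAP[10] (MCG_CMCI_P). The sketch below only decodes raw values you already have. How you read them (the CPUID instruction, an MSR read of IA32_MCG_CAP) is left out, and the sample signature 0x000106A5 is just an illustrative input whose fields yield 06H_1AH.

```python
def display_family_model(eax: int) -> tuple[int, int]:
    """Compute (DisplayFamily, DisplayModel) from the CPUID.01H:EAX signature."""
    model = (eax >> 4) & 0xF          # bits 7:4
    family = (eax >> 8) & 0xF         # bits 11:8
    ext_model = (eax >> 16) & 0xF     # bits 19:16
    ext_family = (eax >> 20) & 0xFF   # bits 27:20

    # Extended family is only added for family 0FH.
    display_family = family + ext_family if family == 0xF else family
    # Extended model is only prepended for families 06H and 0FH.
    if family in (0x6, 0xF):
        display_model = (ext_model << 4) + model
    else:
        display_model = model
    return display_family, display_model


def signature_string(eax: int) -> str:
    """Format the signature the way the SDM writes it, e.g. '06H_1AH'."""
    family, model = display_family_model(eax)
    return f"{family:02X}H_{model:02X}H"


def supports_cmci(mcg_cap: int) -> bool:
    """IA32_MCG_CAP bit 10 (MCG_CMCI_P) indicates CMCI support."""
    return bool((mcg_cap >> 10) & 1)


def bank_count(mcg_cap: int) -> int:
    """IA32_MCG_CAP bits 7:0 give the number of machine-check banks."""
    return mcg_cap & 0xFF


if __name__ == "__main__":
    print(signature_string(0x000106A5))  # 06H_1AH
```

On Linux, the `cpu family` and `model` fields in /proc/cpuinfo already hold these display values in decimal, so a 06H_1AH part shows `model : 26`.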
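CMCI is the mechanism that 45 nm (and later) Intel 64 CPUs use to report corrected errors. It is a hardware reporting mechanism, not a software tool. The processor counts corrected errors, and when the count reaches a programmable threshold it raises an interrupt. CMCI handling runs in one of two modes: interrupt mode and poll mode.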
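The counting behaviour can be modeled in a few lines. Per the SDM, each bank's IA32_MCi_CTL2 holds the threshold in bits 14:0 and a CMCI enable flag in bit 30, and the interrupt is signaled when the corrected-error count reaches that threshold. `SimulatedBank` is a hypothetical software model, not real hardware access. Its reset of the count after each signal stands in for a handler that logs the error and clears the bank.

```python
CMCI_EN = 1 << 30        # IA32_MCi_CTL2 bit 30: enable CMCI for this bank
THRESHOLD_MASK = 0x7FFF  # IA32_MCi_CTL2 bits 14:0: corrected-error threshold


def make_ctl2(threshold: int, enable: bool = True) -> int:
    """Build an IA32_MCi_CTL2 value from a threshold and an enable flag."""
    value = threshold & THRESHOLD_MASK
    if enable:
        value |= CMCI_EN
    return value


class SimulatedBank:
    """Hypothetical model of one bank's corrected-error counter."""

    def __init__(self, ctl2: int):
        self.ctl2 = ctl2
        self.count = 0    # stands in for IA32_MCi_STATUS bits 52:38
        self.signals = 0  # how many CMCIs this model has raised

    def corrected_error(self) -> bool:
        """Record one corrected error; return True if a CMCI would fire."""
        self.count += 1
        threshold = self.ctl2 & THRESHOLD_MASK
        if self.ctl2 & CMCI_EN and self.count == threshold:
            self.signals += 1
            self.count = 0  # model a handler that logs and clears the bank
            return True
        return False


if __name__ == "__main__":
    bank = SimulatedBank(make_ctl2(3))
    print([bank.corrected_error() for _ in range(7)])
```

With a threshold of 3, every third corrected error fires. A threshold of 1 would turn every corrected error into an interrupt, which is what makes interrupt storms possible.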
Where the error information is stored
The machine-check error reporting mechanism that Pentium processors use is similar to that used in Pentium 4, Intel Xeon, Intel Atom, and P6 family processors. When an error is detected, it is recorded in P5_MC_TYPE and P5_MC_ADDR; the processor then generates a machine-check exception (#MC).
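In other words, on a Pentium processor a detected error is recorded in the P5_MC_TYPE and P5_MC_ADDR registers. These two registers belong to the Pentium's machine-check mechanism, not to CMCI. On processors that support CMCI, errors (including corrected ones) are logged in the IA32_MCi_STATUS, IA32_MCi_ADDR and IA32_MCi_MISC registers of each machine-check bank.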
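On CMCI-capable processors, the per-bank IA32_MCi_STATUS register (not the Pentium-era P5_* registers) is where a corrected error shows up. The sketch below decodes the architectural fields of a raw 64-bit status value as the SDM lays them out. The sample values in the test are constructed inputs, not real error records, and reading the MSR itself is out of scope.

```python
def decode_mci_status(status: int) -> dict:
    """Split a raw IA32_MCi_STATUS value into its architectural fields."""
    return {
        "VAL": bool((status >> 63) & 1),    # register holds a valid error
        "OVER": bool((status >> 62) & 1),   # an earlier error was overwritten
        "UC": bool((status >> 61) & 1),     # error was NOT corrected
        "EN": bool((status >> 60) & 1),     # error reporting was enabled
        "MISCV": bool((status >> 59) & 1),  # IA32_MCi_MISC holds valid info
        "ADDRV": bool((status >> 58) & 1),  # IA32_MCi_ADDR holds valid info
        "PCC": bool((status >> 57) & 1),    # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "model_specific": (status >> 16) & 0xFFFF,   # bits 31:16
        "mca_error_code": status & 0xFFFF,           # bits 15:0
    }


def is_corrected(fields: dict) -> bool:
    """A valid record with UC clear describes an error the CPU corrected."""
    return fields["VAL"] and not fields["UC"]
```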
How it shows up in the logs
kernel: CMCI storm subsided: switching to interrupt mode
kernel: CMCI storm detected: switching to poll mode
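To see how often a machine flips between the two modes, you can scan the system log for these two kernel messages. The sketch below matches them with a regular expression and uses `finditer`, so it also copes with a line where both messages ended up run together. The log path depends on the distribution (for example /var/log/messages or /var/log/syslog), so the sketch takes any iterable of lines.

```python
import re

# Matches both kernel messages shown above.
STORM_RE = re.compile(r"CMCI storm (detected|subsided): switching to (poll|interrupt) mode")


def cmci_mode_changes(lines):
    """Return (event, new_mode) tuples in the order they appear."""
    changes = []
    for line in lines:
        for m in STORM_RE.finditer(line):
            changes.append((m.group(1), m.group(2)))
    return changes


if __name__ == "__main__":
    with open("/var/log/messages") as f:  # adjust the path for your distribution
        for event, mode in cmci_mode_changes(f):
            print(event, "->", mode)
```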
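These are the messages written to the system log (/var/log/messages). CMCI raises an interrupt each time a bank's corrected-error count reaches its threshold, so a burst of corrected errors turns into a burst of interrupts (a "CMCI storm"). When the kernel's machine-check code detects such a storm, it switches to poll mode and checks the machine-check banks periodically instead of taking an interrupt per error, which limits the load on the CPU. Once the error rate drops again, it switches back to interrupt mode. The switching is done by the Linux kernel, not by the processor itself.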
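The switching logic can be sketched as a small state machine: count errors inside a sliding time window while in interrupt mode, switch to poll mode when the count hits a storm threshold, and switch back once no error has been seen for a quiet period. This is a hypothetical model of the idea described above, not the kernel's actual code. The thresholds and intervals are caller-supplied illustrative parameters, not values taken from any kernel version.

```python
from collections import deque


class StormDetector:
    """Hypothetical interrupt/poll mode switcher (illustrative, not kernel code)."""

    def __init__(self, storm_threshold: int, window: float, quiet_period: float):
        self.storm_threshold = storm_threshold  # errors per window that count as a storm
        self.window = window                    # sliding window length, in seconds
        self.quiet_period = quiet_period        # error-free time needed to leave poll mode
        self.mode = "interrupt"
        self.recent = deque()                   # error timestamps inside the window
        self.last_error = None

    def on_error(self, t: float) -> str:
        """Record an error seen at time t (via interrupt or via a poll)."""
        self.last_error = t
        if self.mode == "interrupt":
            self.recent.append(t)
            # Drop timestamps that have fallen out of the sliding window.
            while self.recent and t - self.recent[0] > self.window:
                self.recent.popleft()
            if len(self.recent) >= self.storm_threshold:
                self.mode = "poll"  # storm detected
                self.recent.clear()
        return self.mode

    def on_poll(self, t: float) -> str:
        """Periodic poll at time t; leave poll mode once errors have stopped."""
        if (self.mode == "poll" and self.last_error is not None
                and t - self.last_error >= self.quiet_period):
            self.mode = "interrupt"  # storm subsided
        return self.mode
```

Sparse errors never fill the window, so they stay in interrupt mode; only a dense burst trips the switch.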
The detailed error records can usually be found in /var/log/mcelog.