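Linux is the key platform underneath all kinds of servers and even whole infrastructures. For anyone who maintains or uses Linux, quickly identifying the cause of a failure is essential.
1. Checking hardware information
First, check the hardware: only after hardware faults have been ruled out does it make sense to look for errors in running programs.
lsblk (block devices such as disks, partitions and optical drives) and lscpu (CPU architecture and model) both print hardware information; here we use lsblk as an example: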
lmh@ubuntu:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
loop0 7:0 0 44.9M 1 loop /snap/gtk-common-themes/1440
loop1 7:1 0 14.8M 1 loop /snap/gnome-characters/399
loop2 7:2 0 91.3M 1 loop /snap/core/8592
loop3 7:3 0 54.7M 1 loop /snap/core18/1668
loop4 7:4 0 3.7M 1 loop /snap/gnome-system-monitor/127
loop5 7:5 0 4.2M 1 loop /snap/gnome-calculator/544
loop6 7:6 0 91.4M 1 loop /snap/core/8689
loop7 7:7 0 14.8M 1 loop /snap/gnome-characters/296
loop8 7:8 0 3.7M 1 loop /snap/gnome-system-monitor/100
loop9 7:9 0 1008K 1 loop /snap/gnome-logs/61
loop10 7:10 0 160.2M 1 loop /snap/gnome-3-28-1804/116
loop11 7:11 0 42.8M 1 loop /snap/gtk-common-themes/1313
loop12 7:12 0 956K 1 loop /snap/gnome-logs/81
loop13 7:13 0 149.9M 1 loop /snap/gnome-3-28-1804/67
loop14 7:14 0 54.4M 1 loop /snap/core18/1066
loop15 7:15 0 4M 1 loop /snap/gnome-calculator/406
sda 8:0 0 70G 0 disk
└─sda1 8:1 0 70G 0 part /
sr0 11:0 1 2G 0 rom /media/lmh/Ubuntu 18.04.3 LTS amd641
sr1 11:1 1 2G 0 rom /media/lmh/Ubuntu 18.04.3 LTS amd64
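From this output we can confirm whether the system recognizes the expected disks, partitions and drives. A device that is missing here often points to a hardware or driver problem worth investigating first.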
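The table above mixes the real disks with the many read-only loop devices that snap packages mount, which makes the interesting rows easy to miss. A minimal sketch, assuming lsblk's default column layout shown above (TYPE is the sixth field); filter_lsblk and list_disks are hypothetical helper names:

```shell
# Hypothetical helpers for reading lsblk's default table output.
# Assumption: columns are NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT,
# so TYPE is the 6th whitespace-separated field.

# Print the header plus every row that is not a loop device
# (snap packages each mount a read-only loop device).
filter_lsblk() {
  awk 'NR == 1 || $6 != "loop"'
}

# Print only the names of whole disks (TYPE = disk), e.g. to confirm
# that an expected drive has been detected at all.
list_disks() {
  awk 'NR > 1 && $6 == "disk" { print $1 }'
}

# Usage on a live system:
#   lsblk | filter_lsblk
#   lsblk | list_disks
```

Applied to the output above, filter_lsblk would keep only the header and the fd0, sda, sda1, sr0 and sr1 rows. lsblk can also drop loop devices by itself with lsblk -e 7, since 7 is the loop major number visible in the MAJ:MIN column.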
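2. Finding errors and warnings in the logs
While it runs, a Linux system keeps logs, and analyzing them helps find the cause of an error. dmesg prints the kernel's message buffer (boot, hardware and driver messages); piping it through more, as in dmesg | more, lets us page through it and look for errors and warnings: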
lmh@ubuntu:~$ dmesg | more
[ 0.000000] Linux version 5.3.0-40-generic (buildd@lcy01-amd64-024) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #32~18.04.1-Ubuntu SMP Mon Feb 3 14:05:59 UTC 2020 (Ubuntu 5.3.0-40.32~18.04.1-generic 5.3.18)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-40-generic root=UUID=e7ca2622-528b-400f-9b21-ac56ff834cd2 ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Hygon HygonGenuine
[ 0.000000] Centaur CentaurHauls
[ 0.000000] zhaoxin Shanghai
[ 0.000000] Disabled fast string operations
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] BIOS-provided physical RAM map:
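Paging through the whole buffer with more works, but most lines are purely informational. A minimal sketch that keeps only lines containing common problem keywords; the function name grep_kernel_problems and the keyword list are assumptions, so adjust them for your system:

```shell
# Hypothetical helper: keep only kernel log lines containing common problem
# keywords (case-insensitive). The keyword list is an assumption; extend it
# for the devices and drivers you care about.
grep_kernel_problems() {
  # "warn" also matches "warning"; "fail" also matches "failed"/"failure".
  grep -iE 'error|warn|fail'
}

# Usage on a live system:
#   dmesg | grep_kernel_problems | more
```

Keyword matching is crude and can miss or over-match lines. util-linux dmesg can instead filter by the kernel's own log level, for example dmesg --level=err,warn.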
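3. Checking whether the network works
Since Linux is a network-centric system, checking its network connectivity is another major checkpoint. Tools such as ip addr, dig and ping help here. Below we run ping localhost, which tests the loopback interface (the local network stack) rather than external connectivity. On Linux, ping keeps sending until interrupted with Ctrl+C; ping -c 4 localhost would stop after four requests: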
lmh@ubuntu:~$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.033 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.029 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.037 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=0.035 ms
64 bytes from localhost (127.0.0.1): icmp_seq=5 ttl=64 time=0.032 ms
64 bytes from localhost (127.0.0.1): icmp_seq=6 ttl=64 time=0.035 ms
64 bytes from localhost (127.0.0.1): icmp_seq=7 ttl=64 time=0.038 ms
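Because ping localhost only exercises the loopback interface, testing targets layer by layer (loopback, then the default gateway, then a host name that needs DNS) helps narrow down where connectivity breaks. A minimal sketch; check_targets and the PING_CMD override are hypothetical names, and -W is the iputils ping reply timeout in seconds:

```shell
# Hypothetical helper: ping each target once and report which ones answer.
#   -c 1 : send a single echo request (ping otherwise runs until Ctrl+C)
#   -W 1 : wait at most 1 second for a reply (iputils ping on Linux)
# PING_CMD is an assumed override (default: ping), handy for testing.
check_targets() {
  local ping_cmd failed host
  ping_cmd=${PING_CMD:-ping}
  failed=0
  for host in "$@"; do
    # $ping_cmd is left unquoted so an override such as "ping -4" still works.
    if $ping_cmd -c 1 -W 1 "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=$((failed + 1))
    fi
  done
  # Exit status is the number of unreachable targets (0 = all reachable).
  return "$failed"
}

# Usage on a live system: loopback first, then the default gateway taken
# from the routing table, then a name that requires DNS (example.com is
# only a placeholder target):
#   gw=$(ip route show default | awk '{ print $3; exit }')
#   check_targets localhost "$gw" example.com
```

If localhost answers but the gateway does not, suspect the local interface or link (ip addr shows interface state and addresses). If the gateway answers but a host name fails, DNS is a likely suspect, which dig can confirm.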