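I have wanted to set this up for a long time, so that I no longer have to walk to the server room every day to check the hardware status LEDs on each server. This post is written live: I configure as I write, and only the steps that were verified to work are pasted here for future reference. Enough talk, let's get straight to it.
1. Download the check_dell_openmanage plugin from: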
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=85&cf_id=24
2. Copy the plugin to the following directory:
/usr/local/nagios/libexec
3. First, take a look at the plugin's usage:
[root@pcnnagios libexec]# ./check_dell_openmanage.pl -h
SNMP Dell OpenManage Monitor for Nagios version 1.3
by Jason Ellison - infotek(at)gmail.com

Usage: ./check_dell_openmanage.pl [-v] -H <host> -C <snmp_community> [-2] | (-l login -x passwd) [-P <port>] -T test|dellom|dellom_storage|blade|global|chassis|custom [-t <timeout>] [-V] [-u <unknown_default>]

-v, --verbose
    print extra debugging information
-h, --help
    print this help message
-H, --hostname=HOST
    name or IP address of host to check
-C, --community=COMMUNITY NAME
    community name for the host's SNMP agent (implies v1 protocol)
-2, --v2c
    use SNMP v2 (instead of SNMP v1)
-P, --port=PORT
    SNMPd port (Default 161)
-t, --timeout=INTEGER
    timeout for SNMP in seconds (Default: 5)
-V, --version
    prints version number
-u, --unknown_default=INT
    If attribute is not found then report the output as this number (i.e. -u 0)
-T, --type=test|dellom|dellom_storage|blade|global|chassis|custom
    This allows to use pre-defined system type
    Currently support systems types are:
      test (tries all OID's in verbose mode can be used to generate new system type)
      dellom (Dell OpenManage general detailed)
      dellom_storage (Dell OpenManage plus Storage Management detailed)
      blade (some features are on the chassis not the blade)
      global (only check the global health status)
      chassis (only check the system chassis health status)
      custom (intended for customization)
4. Test the environment:
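./check_dell_openmanage.pl -v 2C -C public -H 192.168.0.1 -T test
(Going by the usage text above, -v only turns on verbose output and SNMP v2c is selected with -2, so the "2C" here most likely has no effect and the query goes out as SNMP v1.)
This failed with the message: On the nagios server that will be running the plugin you must have the perl "Net::SNMP" module installed. The message also gives the installation method: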
perl -MCPAN -e shell
cpan> install Net::SNMP
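This part took a bit of effort: CPAN.pm was not installed by default on my system, and I only got it working after some fiddling. The steps below are the installation steps posted by others online, not a transcript from my own environment:
References: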
http://blog.haohtml.com/archives/12708
http://www.cnblogs.com/mopmoq/archive/2009/04/06/1430210.html
[root@GM ~]# wget http://cpan.communilink.net/authors/id/A/AN/ANDK/CPAN-1.9600.tar.gz
[root@GM ~]# tar -zxvf CPAN-1.9600.tar.gz
[root@GM ~]# cd CPAN-1.9600
[root@GM CPAN-1.9600]# perl Makefile.PL
[root@GM CPAN-1.9600]# make
[root@GM CPAN-1.9600]# make install
[root@GM CPAN-1.9600]# perl -MCPAN -e shell
(n lines omitted here; from a quick look, I accepted the defaults and the automatic configuration throughout)
cpan(1)> install Net::SNMP
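Before re-running the plugin, it can save a round trip to confirm that Perl can actually load the module. The following is a minimal sketch, not something from the original session; has_perl_module is a hypothetical helper name, and it only assumes that perl is on the PATH.

```shell
#!/bin/sh
# has_perl_module is a hypothetical helper (not part of the plugin).
# It returns 0 if the named Perl module can be loaded, non-zero otherwise.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if has_perl_module Net::SNMP; then
    echo "Net::SNMP is installed"
else
    echo "Net::SNMP is missing; install it from the CPAN shell first"
fi
```

The same helper can be pointed at any other module name the plugin might complain about.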
5. OK, with the module installed, test again:
[root@pcnnagios libexec]# ./check_dell_openmanage.pl -v 2C -C public -H 10.40.1.131 -T test
TEST MODE:
Alarm at 5
The Net::SNMP library is available on your server
Trying all preconfigured Dell OID's against target...
StorageManagementGlobalSystemStatus (.1.3.6.1.4.1.674.10893.1.20.110.13.0) RESULT: 3(ok)
chassisManufacturerName (1.3.6.1.4.1.674.10892.1.300.10.1.8.1) RESULT: Dell Inc.
chassisModelName (1.3.6.1.4.1.674.10892.1.300.10.1.9.1) RESULT: PowerEdge 2950
chassisServiceTagName (1.3.6.1.4.1.674.10892.1.300.10.1.11.1) RESULT: 53DG52X
chassisSystemName (1.3.6.1.4.1.674.10892.1.300.10.1.15.1) RESULT: pcnexconn
operatingSystemOperatingSystemName (1.3.6.1.4.1.674.10892.1.400.10.1.6.1) RESULT: Microsoft Windows Server 2003, Enterprise Edition
operatingSystemOperatingSystemVersionName (1.3.6.1.4.1.674.10892.1.400.10.1.7.1) RESULT: Version 5.2 (Build 3790 : Service Pack 2) (x86)
systemStateACPowerCordStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.36.1) RESULT: NO RESPONSE
systemStateACPowerSwitchStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.46.1) RESULT: NO RESPONSE
systemStateAmperageStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.15.1) RESULT: NO RESPONSE
systemStateBatteryStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.52.1) RESULT: 3(ok)
systemStateChassisIntrusionStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.30.1) RESULT: 3(ok)
systemStateChassisStatus (.1.3.6.1.4.1.674.10892.1.200.10.1.4.1) RESULT: 3(ok)
systemStateCoolingDeviceStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.21.1) RESULT: 3(ok)
systemStateCoolingUnitStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.44.1) RESULT: 3(ok)
systemStateEventLogStatus (.1.3.6.1.4.1.674.10892.1.200.10.1.41.1) RESULT: 3(ok)
systemStateGlobalSystemStatus (.1.3.6.1.4.1.674.10892.1.200.10.1.2.1) RESULT: 3(ok)
systemStateMemoryDeviceStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.27.1) RESULT: 3(ok)
systemStatePowerSupplyStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.9.1) RESULT: 3(ok)
systemStatePowerUnitStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.42.1) RESULT: 3(ok)
systemStateProcessorDeviceStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.50.1) RESULT: 3(ok)
systemStateTemperatureStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.24.1) RESULT: 3(ok)
systemStateVoltageStatusCombined (.1.3.6.1.4.1.674.10892.1.200.10.1.12.1) RESULT: 3(ok)
Please email the results to Jason Ellison - infotek(at)gmail.com
To add this system to check_dell_openmanage, use something like the following:
"pexxxx" => [
'StorageManagementGlobalSystemStatus',
'systemStateBatteryStatusCombined'
'systemStateChassisIntrusionStatusCombined'
'systemStateChassisStatus'
'systemStateCoolingDeviceStatusCombined'
'systemStateCoolingUnitStatusCombined'
'systemStateEventLogStatus'
'systemStateGlobalSystemStatus'
'systemStateMemoryDeviceStatusCombined'
'systemStatePowerSupplyStatusCombined',
'systemStatePowerUnitStatusCombined',
'systemStateProcessorDeviceStatusCombined',
'systemStateTemperatureStatusCombined',
'systemStateVoltageStatusCombined'
],
[root@pcnnagios libexec]#
6. Everything looks OK, so let's put this thing to work in Nagios:
# 'check_dell_openmanage' command definition
define command{
        command_name    check_dell_openmanage
        command_line    $USER1$/check_dell_openmanage.pl -v 2C -H $HOSTADDRESS$ -C public -T $ARG1$
        }

define service{
        use                     generic-service
        host_name               hostname
        service_description     Check Dell Hardware
        check_command           check_dell_openmanage!dellom_storage
        }
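7. Check the configuration and reload (nagioscheck is not a stock Nagios command; it is presumably a local alias or wrapper around the nagios -v configuration check):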
nagioscheck ;service nagios reload
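One caveat with the one-liner above: the ";" means the reload runs even when the configuration check fails. A slightly safer variant is sketched below. The binary and config paths are assumptions based on the /usr/local/nagios prefix used in this post, and verify_and_reload is a hypothetical helper name; nagios -v itself is the standard pre-flight check.

```shell
#!/bin/sh
# Assumed locations, derived from the /usr/local/nagios prefix used in this post.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# verify_and_reload is a hypothetical helper: it runs the pre-flight check
# (nagios -v) and reloads the service only when that check succeeds.
verify_and_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        service nagios reload
    else
        echo "Configuration check failed; not reloading" >&2
        return 1
    fi
}
```

Calling verify_and_reload as root then replaces the two-command line; if the check fails, run the nagios -v command by hand to see the full error report.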
8. Everything works. Here is the result:
(HARD state)
Current Status: OK
Status Information:
Alarm at 5
The Net::SNMP library is available on your server
SNMP responses...
RESULT: systemStateChassisStatus .1.3.6.1.4.1.674.10892.1.200.10.1.4.1 = 3(ok)
RESULT: systemStatePowerSupplyStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.9.1 = 3(ok)
RESULT: systemStateVoltageStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.12.1 = 3(ok)
RESULT: systemStateCoolingDeviceStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.21.1 = 3(ok)
RESULT: systemStateTemperatureStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.24.1 = 3(ok)
RESULT: systemStateMemoryDeviceStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.27.1 = 3(ok)
RESULT: systemStateChassisIntrusionStatusCombined .1.3.6.1.4.1.674.10892.1.200.10.1.30.1 = 3(ok)
RESULT: systemStateEventLogStatus .1.3.6.1.4.1.674.10892.1.200.10.1.41.1 = 3(ok)
RESULT: StorageManagementGlobalSystemStatus .1.3.6.1.4.1.674.10893.1.20.110.13.0 = 3(ok)
Dell Status to Nagios Status mapping...
systemStateTemperatureStatusCombined: statuscode = OK
StorageManagementGlobalSystemStatus: statuscode = OK
systemStateEventLogStatus: statuscode = OK
systemStateMemoryDeviceStatusCombined: statuscode = OK
systemStatePowerSupplyStatusCombined: statuscode = OK
systemStateVoltageStatusCombined: statuscode = OK
systemStateCoolingDeviceStatusCombined: statuscode = OK
systemStateChassisIntrusionStatusCombined: statuscode = OK
systemStateChassisStatus: statuscode = OK
OK: EXIT CODE: 0 STATUS CODE: OK
Performance Data:
Current Attempt: 1/3
Last Check Time: 2013-06-22 23:04:23
Check Type: ACTIVE
Check Latency / Duration: 0.770 / 0.195 seconds
Next Scheduled Check: 2013-06-22 23:06:23
Last State Change: 2013-06-22 22:56:22
Last Notification: N/A (notification 0)
Is This Service Flapping? NO (0.00% state change)
In Scheduled Downtime? NO
Last Update: 2013-06-22 23:04:51
Active Checks: ENABLED
Passive Checks: ENABLED
Obsessing: ENABLED
Notifications: ENABLED
Event Handler: ENABLED
Flap Detection: ENABLED
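The "Dell Status to Nagios Status mapping" part of that output is worth a quick illustration. Dell's SNMP status values run from 1 to 6 (other, unknown, ok, nonCritical, critical, nonRecoverable), while a Nagios plugin exits with 0 to 3 (OK, WARNING, CRITICAL, UNKNOWN). The sketch below shows one plausible way to translate between them; the exact table is my assumption for illustration, not the plugin's actual code, and dell_to_nagios is a hypothetical name.

```shell
#!/bin/sh
# dell_to_nagios is a hypothetical helper. The mapping is an illustrative
# assumption, not copied from check_dell_openmanage.pl.
dell_to_nagios() {
    case "$1" in
        3)   echo 0 ;;  # ok -> OK
        4)   echo 1 ;;  # nonCritical -> WARNING
        5|6) echo 2 ;;  # critical / nonRecoverable -> CRITICAL
        *)   echo 3 ;;  # other, unknown, or no response -> UNKNOWN
    esac
}

# Pull the numeric status out of one verbose RESULT line and map it.
line='RESULT: systemStateChassisStatus .1.3.6.1.4.1.674.10892.1.200.10.1.4.1 = 3(ok)'
code=${line##*= }   # strip everything up to "= ", leaving "3(ok)"
code=${code%%(*}    # strip the "(ok)" suffix, leaving "3"
echo "nagios exit code: $(dell_to_nagios "$code")"
```

Treating anything unrecognised as UNKNOWN rather than OK is a deliberate choice here, so that a sensor that stops answering does not silently look healthy.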
That's a wrap. Time to wash up and go to bed!