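Project repository: https://github.com/zhangrj/EMC-VNX-Storage-Zabbix-Monitor
Background
The EMC VNX5500 is the company's most critical storage device; if it fails, the whole platform goes down. Before I joined, inspection of the EMC storage depended entirely on manual checks, done remotely and by on-site maintenance contractors. In May of this year I set out to fix that.
The obvious first choice for monitoring was SNMP/SNMPTRAP, but after searching for most of a day I could find neither a place to configure SNMP or SNMPTRAP nor a MIB reference for the device. While reading related material I came across Navisphere, a management tool for configuring the storage device from the command line. It can report the storage status, so a small program built on it, combined with Zabbix, is enough to implement monitoring.
Installing the Navisphere command-line tool
Install the package with rpm as usual: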
[root@localhost ~]# rpm -ivh NaviCLI-Linux-64-x86-en_US-7.33.9.2.36-1.x86_64.rpm
Preparing... ########################################### [100%]
1:NaviCLI-Linux-64-x86-en########################################### [100%]
Run the script /opt/Navisphere/bin/setlevel_cli.sh to set the security level before you proceed.
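Follow the prompt and set the security level to medium. The session below enters 2, which is not one of the listed choices (low|medium|l|m), so the tool applies its default, medium; entering m should select it explicitly.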
[root@localhost ~]# /opt/Navisphere/bin/setlevel_cli.sh
Please enter the verifying level(low|medium|l|m) to set?
2
Setting (default) medium verifying level.....
Verification level medium has been set SUCCESSFULLY!!!
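Create a security file so that you do not have to enter the user name and password on every call. The security file is encrypted and bound to the local machine. The user argument is the EMC management user name, password is its password, and scope takes one of <0 – global; 1 – local; 2 – LDAP>: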
[root@localhost ~]# cd /opt/Navisphere/bin/
[root@localhost bin]# ls
admsnap naviseccli setlevel_cli.sh setlevel.log
[root@localhost bin]# ./naviseccli -AddUserSecurity -user emc_username -password emc_passwd -scope 0
[root@localhost bin]# cd /root
[root@localhost ~]# ls
SecuredCLISecurityFile.xml
SecuredCLIXMLEncrypted.key
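The first query command prompts you to store the array's certificate. Choose 2 to accept and store it; subsequent runs then print the information directly: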
[root@localhost ~]# cd /opt/Navisphere/bin/
[root@localhost bin]# ls
admsnap naviseccli setlevel_cli.sh setlevel.log
[root@localhost bin]# ./naviseccli -h 192.168.130.75 getcrus
Unable to validate the identity of the server. There are issues with the certificate presented.
Only import this certificate if you have reason to believe it was sent by a trusted source.
Certificate details:
Subject: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Issuer: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Serial#: fe91a4ec
Valid From: 20121126045806Z
Valid To: 20271123045806Z
Would you like to [1]Accept the certificate for this session, [2] Accept and store, [3] Reject the certificate?
Please input your selection(The default selection is [1]):
2
DPE7 Bus 0 Enclosure 0
SP A State: Present
SP B State: Present
......
[root@localhost bin]# ./naviseccli -h 192.168.130.75 getcrus
DPE7 Bus 0 Enclosure 0
SP A State: Present
SP B State: Present
......
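The getcrus output shown above is line-oriented, so turning it into component/state pairs for monitoring takes little code. The following is a minimal sketch, not the repository's actual script. It assumes Python 3, assumes that a line without a colon is an enclosure header, and uses only the three lines visible in the transcript as sample data; real output contains more components and may need extra rules.

```python
import re

# Sample taken from the transcript above; real getcrus output is longer.
SAMPLE = """\
DPE7 Bus 0 Enclosure 0
SP A State: Present
SP B State: Present
"""

# Matches lines such as "SP A State: Present".
STATE_LINE = re.compile(r"^(.*?)\s+State:\s*(.+)$")


def parse_getcrus(text):
    """Return {enclosure: {component: state}} from getcrus-style text.

    Assumption: any non-empty line without a colon starts a new enclosure.
    """
    result = {}
    enclosure = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STATE_LINE.match(line)
        if match and enclosure is not None:
            result[enclosure][match.group(1)] = match.group(2)
        elif ":" not in line:
            enclosure = line
            result[enclosure] = {}
    return result


states = parse_getcrus(SAMPLE)
```

Any component whose state is not the expected value (for example, not Present) can then be reported to Zabbix as a problem.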
List the stored certificates:
[root@localhost ~]# /opt/Navisphere/bin/naviseccli security -certificate -list
--------------------------------------------
Subject: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Issuer: CN=192.168.130.75,CN=A-IMAGE,C=US,ST=Massachusetts,L=Southboro,O=EMC Corporation,OU=CLARiiON
Serial#: fe91a4ec
Valid From: 20121126045806Z
Valid To: 20271123045806Z
--------------------------------------------
Common NaviSecCLI commands
Show the state of each component in the system:
naviseccli -h <ip> getcrus
Show each LUN's default owner SP and current owner SP:
naviseccli -h <ip> getlun -default -owner
Show a given number of SP log entries (for example, 200):
naviseccli -h <ip> getlog -200
Or save the output to a local file:
naviseccli -h <ip> getlog -200 > getlog_spa.txt
Check the SP Agent status:
naviseccli -h <ip> getagent
Show host LUN and array LUN information:
naviseccli -h <ip> storagegroup -list
Show basic information about a RAID group:
naviseccli -h <ip> getrg 0
Show disk information (all disks, or a single disk by ID):
naviseccli -h <ip> getdisk
naviseccli -h <ip> getdisk 0_0_5
Find which LUNs have dirty cache:
naviseccli -h <ip> getlun -luncache
Show rebuild progress:
naviseccli -h <ip> getlun [lun] -prb
Collect the SPCollect logs:
naviseccli -h <ip> spcollect
naviseccli -h <ip> managefiles -retrieve
List which HBAs are logged in to the system:
naviseccli -h <ip> port -list
List component part numbers:
naviseccli -h <ip> getresume
Show whether cache is enabled and how it is configured:
naviseccli -h <ip> getcache
List the enabled system feature packages:
naviseccli -h <ip> ndu -list
Trespass a LUN:
naviseccli -h <ip> trespass <lun>
Start a background sniffer check:
naviseccli -h <ip> setsniffer <lun> -bv -bvtime high -cr
Get the sniffer report:
naviseccli -h <ip> getsniffer <lun>
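All of the commands above share the form naviseccli -h <ip> <subcommand> [arguments], so a script can wrap them in one helper. This is a rough sketch under a few assumptions: Python 3.7 or later, the install path /opt/Navisphere/bin/naviseccli from the installation step, and a security file and stored certificate already in place so that no prompt appears. The function names are illustrative and are not taken from the repository.

```python
import subprocess

# Install location shown in the installation step of this article.
NAVISECCLI = "/opt/Navisphere/bin/naviseccli"


def build_command(ip, *args):
    """Build the argument list for: naviseccli -h <ip> <subcommand> [args]."""
    return [NAVISECCLI, "-h", ip, *args]


def run_naviseccli(ip, *args, timeout=60):
    """Run a read-only naviseccli query and return its standard output.

    Raises subprocess.CalledProcessError on a non-zero exit status and
    subprocess.TimeoutExpired if the array does not answer in time.
    """
    completed = subprocess.run(
        build_command(ip, *args),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode bytes to str
        timeout=timeout,
        check=True,
    )
    return completed.stdout


# Example (requires naviseccli and a reachable array):
#   output = run_naviseccli("192.168.130.75", "getcrus")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a subcommand takes several options.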
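Monitoring scripts and how to use them
emc_discovery.py builds the JSON data for Zabbix automatic (low-level) discovery. It can discover CPU, DIMM, Disk, I/O, LCC, Power, SP, SPS, and SPS Cable components.
emc_state.py collects the values for the monitored items.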
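To illustrate the kind of JSON that Zabbix discovery consumes, here is a small sketch in Python 3. It is not the code of emc_discovery.py. The macro name {#EMCNAME} is a hypothetical choice and has to match whatever macro the discovery rule and item prototypes in your Zabbix template use.

```python
import json


def build_lld(names, macro="{#EMCNAME}"):
    """Return a Zabbix low-level discovery payload for the given names.

    The payload has the classic form {"data": [{"{#MACRO}": "value"}, ...]}.
    The default macro name is a hypothetical example.
    """
    return json.dumps({"data": [{macro: name} for name in names]})


# Component names as they might be extracted from getcrus output.
payload = build_lld(["SP A", "SP B"])
```

One payload per component type (disks, power supplies, and so on) keeps each discovery rule and its item prototypes independent of the others.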
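Note the following:
- All data is passed to zabbix_server through zabbix_sender.
- The EMC storage address and the zabbix_server address in the scripts must be changed to match your environment.
- The two scripts may not work as-is on EMC storage with a different configuration, but the basic approach and the data handling are the same, so readers can adapt them to their own arrays.
- Day-to-day work left no time to polish the scripts (for example, making discovery more generic or splitting the processing into functions), so they will have to do for now.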
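Because every value goes through zabbix_sender, the sending step can be sketched as below. This is an assumption-laden illustration rather than the repository's code: it assumes Python 3.7 or later and zabbix_sender on the PATH, and the server address, host name, and item key in the usage comment are hypothetical. The options -z (server), -s (host name as configured in Zabbix), -k (item key), and -o (value) are standard zabbix_sender options.

```python
import subprocess


def build_sender_command(server, host, key, value, sender="zabbix_sender"):
    """Build a zabbix_sender call that sends one value for one item key."""
    return [sender, "-z", server, "-s", host, "-k", key, "-o", str(value)]


def send_value(server, host, key, value):
    """Send a single value; return True if zabbix_sender exited with 0."""
    completed = subprocess.run(
        build_sender_command(server, host, key, value),
        capture_output=True,
        text=True,
    )
    return completed.returncode == 0


# Example with hypothetical names (requires zabbix_sender and a server):
#   send_value("192.0.2.10", "EMC-VNX5500", "emc.state[SP A]", "Present")
```

The receiving items in Zabbix need to be of the Zabbix trapper type for values sent this way to be accepted.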
Two crontab entries are all that is needed, for example:
0 23 * * 6 /usr/bin/python /root/EMC/emc_discovery.py > /tmp/emc_discovery.log
5 * * * * /usr/bin/python /root/EMC/emc_state.py > /tmp/emc_state.log
Discovery runs every Saturday at 23:00, and item data is collected once an hour, at five minutes past the hour.
Zabbix web configuration
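Create a new host. Its host name must match the host name the scripts pass to zabbix_sender with the -s parameter (the -z parameter is the Zabbix server address).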
Run the scripts manually once and check that the monitoring data is refreshed.