Zabbix troubleshooting notes

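After Zabbix had been deployed and running for a while, quite a few errors showed up. Here is a brief record of them.
1) Fixing the "Lack of free swap space on Zabbix server" error in the Zabbix web interface
On the Zabbix 3.0 instance deployed in our production environment, the dashboard reported "Lack of free swap space on Zabbix server" for a host that has no swap space at all.
To fix it:
Go to Configuration -> Templates, click Triggers next to Template OS Linux, and open the trigger "Lack of free swap space on {HOST.NAME}". In the trigger edit page, change the Expression from the original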
{Template OS Linux:system.swap.size[,pfree].last(0)}<50
to
{Template OS Linux:system.swap.size[,pfree].last(0)}<50 and {Template OS Linux:system.swap.size[,free].last(0)}<>0
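The added condition " and {Template OS Linux:system.swap.size[,free].last(0)}<>0" requires the host to have free swap space: on a host without swap, {Template OS Linux:system.swap.size[,free].last(0)} is 0, so the expression is false and the trigger no longer fires. After saving, the "Lack of free swap space" problem Zabbix reported earlier is automatically marked as Resolved within the next update interval. Note that this condition also silences the alert on a host whose swap is completely used up (free swap is 0 there as well); testing {Template OS Linux:system.swap.size[,total].last(0)}>0 instead avoids that.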
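To see how the two variants differ, here is a small shell sketch that evaluates each condition for sample item values. The function names are hypothetical, and the arguments only stand in for what system.swap.size[,pfree], [,free] and [,total] would return; Zabbix evaluates the real expression itself.

```shell
#!/bin/bash
# Hypothetical helpers that mimic the trigger logic for given item values.
# Each prints 1 if the trigger would fire, 0 otherwise.

# Variant used above: pfree < 50 and free swap <> 0
swap_alert_free() {
    awk -v pfree="$1" -v free="$2" 'BEGIN { fire = (pfree + 0 < 50 && free + 0 != 0); print fire }'
}

# Variant keyed on total swap size: pfree < 50 and total swap > 0
swap_alert_total() {
    awk -v pfree="$1" -v total="$2" 'BEGIN { fire = (pfree + 0 < 50 && total + 0 > 0); print fire }'
}

# Swap configured (2 GB) but completely used up:
swap_alert_free 0 0             # prints 0, the alert is silenced
swap_alert_total 0 2147483648   # prints 1, the alert still fires
```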

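2) "Zabbix poller processes more than 75% busy" alert in the Zabbix web interface
After running for a while, our production Zabbix suddenly raised the alert "Zabbix poller processes more than 75% busy".
Zabbix can raise many kinds of alerts; among the most common are memory exhaustion, unreachable networks, slow I/O, and this "Zabbix poller processes more than 75% busy". At first I paid little attention to it, because it did not affect anything and cleared by itself after a while. But as the database grew, Zabbix used more and more memory, and the poller processes became busy every day.
In the end, the fix turned out to be simple:
Increase the number of poller processes that Zabbix Server starts at launch. This does increase the load that polling puts on the server, but with enough memory it is a perfectly reasonable thing to do.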

Edit the Zabbix Server configuration file /etc/zabbix/zabbix_server.conf and find the StartPollers section:
### Option: StartPollers
# Number of pre-forked instances of pollers.
#
# Mandatory: no
# Range: 0-1000
# Default:
# StartPollers=5

Remove the # in front of StartPollers and change 5 to 10 or more (our production machine has 64 GB of RAM, so I set it to 60 or 80).
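If you would rather script this edit than open an editor, a minimal sketch follows. set_zbx_param is a hypothetical helper; it assumes GNU sed (for -i and -E), as on CentOS, and a config line of the form "# StartPollers=5" or "StartPollers=5".

```shell
#!/bin/bash
# Hypothetical helper: set KEY=VALUE in a Zabbix config file.
# Every line of the form "KEY=..." or "# KEY=..." is rewritten to "KEY=VALUE";
# if no such line exists, "KEY=VALUE" is appended. Requires GNU sed.
set_zbx_param() {
    local file=$1 key=$2 value=$3
    if grep -Eq "^#? *${key}=" "$file"; then
        sed -i -E "s|^#? *${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (path from this article; adjust to your installation):
# set_zbx_param /etc/zabbix/zabbix_server.conf StartPollers 60
```

Because the pattern requires "=" right after the key, lines such as "### Option: StartPollers" or "StartPollersUnreachable=..." are left alone.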
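After the change, restart zabbix_server (let the old processes exit before starting it again):
# pkill zabbix_server
# /usr/local/zabbix/sbin/zabbix_server
After a while, the warning was gone from the triggers.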

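Alternatively, you can schedule a script that restarts zabbix_server periodically to bring the load down.
The script /root/zabbix-restart.sh:
#!/bin/bash
# Stop zabbix_server with SIGTERM so it can flush its caches and exit cleanly
/usr/bin/pkill zabbix_server
# Wait until every zabbix_server process has exited before starting a new one
while /usr/bin/pgrep zabbix_server > /dev/null; do sleep 1; done
/usr/local/zabbix/sbin/zabbix_server
Then schedule it in crontab (here daily at 03:00):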
0 3 * * * /bin/bash -x /root/zabbix-restart.sh > /dev/null 2>&1

3) Zabbix "Too many processes on ..." alert

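Fix: raise the thresholds of the corresponding triggers (the default is 300; it can be raised to 3000).

Open the two process-count triggers of the template and raise their thresholds (300 and 30) to, for example, 3000 and 300 respectively.

After saving these changes and refreshing, the alert disappears after a while.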

4) No data in the monitoring graphs
First, query the item from the command line on the server:
# /usr/local/zabbix/bin/zabbix_get -s 192.168.1.10 -p 10050 -k "mysql.status[Uptime]"
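Here -s is the IP address of the monitored host, -p the agent port (10050 by default), and -k the item key, which you can look up in the item's configuration in the Zabbix web interface. If the server gets data with this command but the graph in the web interface still shows nothing, the problem most likely lies in the web-side configuration.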
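When several items need checking, the same zabbix_get call can be wrapped so that an empty reply or a ZBX_NOTSUPPORTED answer is flagged. A minimal sketch: check_item is hypothetical, and ZABBIX_GET defaults to the path used above but is an assumption about your installation.

```shell
#!/bin/bash
# Hypothetical wrapper around zabbix_get: query one item key on one agent and
# report whether usable data came back. ZABBIX_GET can be overridden.
ZABBIX_GET=${ZABBIX_GET:-/usr/local/zabbix/bin/zabbix_get}

check_item() {
    local host=$1 key=$2 out
    # -s host, -p agent port, -k item key (same options as the command above)
    if ! out=$("$ZABBIX_GET" -s "$host" -p 10050 -k "$key" 2>&1) || [ -z "$out" ]; then
        echo "FAIL: no data for $key on $host: $out"
        return 1
    fi
    case $out in
        ZBX_NOTSUPPORTED*)
            echo "FAIL: $key is not supported on $host: $out"
            return 1 ;;
    esac
    echo "OK: $key = $out"
}

# Usage: check_item 192.168.1.10 "mysql.status[Uptime]"
```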

5) zabbix_server shuts down because it runs out of memory
138401:20170630:172159.850 using configuration file: /data/zabbix/etc/zabbix_server.conf
138401:20170630:172159.854 current database version (mandatory/optional): 03020000/03020000
138401:20170630:172159.854 required mandatory version: 03020000
138401:20170630:172200.238 __mem_malloc: skipped 0 asked 48 skip_min 4294967295 skip_max 0
138401:20170630:172200.238 [file:strpool.c,line:53] zbx_mem_malloc(): out of memory (requested 42 bytes)
138401:20170630:172200.238 [file:strpool.c,line:53] zbx_mem_malloc(): please increase CacheSize configuration parameter

Fix:
Open zabbix_server.conf and find Option: CacheSize.
Remove the # in front of the original # CacheSize=8M and change 8M to 1024M; choose the value according to the server's resources.

# vim /data/zabbix/etc/zabbix_server.conf
......
CacheSize=1024M

Then restart zabbix_server.
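The log itself names the parameter to raise. A small sketch that pulls those names out of a log in the format shown above; cache_hints is hypothetical, and the log path in the usage line is an assumption (use the LogFile set in your zabbix_server.conf).

```shell
#!/bin/bash
# Hypothetical helper: list the configuration parameters that zabbix_server
# asks to increase in its log (lines like
# "... please increase CacheSize configuration parameter"), once each.
cache_hints() {
    sed -n -E 's/.*please increase ([A-Za-z]+) configuration parameter.*/\1/p' "$1" | sort -u
}

# Usage: cache_hints /data/zabbix/logs/zabbix_server.log
```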

6) Zabbix fails to connect to the database because the connection limit is exceeded

mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 152   |
+-----------------+-------+
1 row in set (0.00 sec)
 
Here the limit is 152 connections (the MySQL default is 151). To change it:
1) Temporary change (lost when mysqld restarts)
mysql> set GLOBAL max_connections=1024;
mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1024  |
+-----------------+-------+
1 row in set (0.00 sec)
 
2) Permanent change
Add the following to my.cnf:
[mysqld]
# add this line
max_connections=1000
 
Then restart the MySQL service.
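To see how close the server comes to the limit before picking a new value, compare the peak connection count with max_connections. A minimal sketch: conn_usage_pct is hypothetical, and the mysql commands in the comments (which assume the client can log in without extra options) are one way to obtain the two numbers.

```shell
#!/bin/bash
# Hypothetical helper: integer percentage of max_connections in use.
# The two inputs could be collected with, for example:
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'" | awk '{print $2}'
#   mysql -N -B -e "SELECT @@GLOBAL.max_connections"
conn_usage_pct() {
    local used=$1 max=$2
    echo $(( used * 100 / max ))
}

conn_usage_pct 150 152   # prints 98: a peak of 150 against the limit of 152
```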

7) The CPU load graph in the Zabbix web interface shows values such as 0.002-0.0014, which do not match the CPU load reported by uptime on the server.

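Fix:
In Template OS Linux -> Items -> Processor load (1 min average per core), change the key
from system.cpu.load[percpu,avg1] to system.cpu.load[all,avg1].
The percpu mode divides the load by the number of CPU cores, so on a server with many cores the value looks close to zero; all reports the total load, which is what uptime shows.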
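The relationship between the two modes can be reproduced locally. A sketch assuming Linux (/proc/loadavg) and glibc's getconf; the helper names are hypothetical and only approximate what the agent computes.

```shell
#!/bin/bash
# Local illustration of the two item keys (Linux, reads /proc/loadavg):
#   system.cpu.load[all,avg1]    ~ the 1-minute load average, as shown by uptime
#   system.cpu.load[percpu,avg1] ~ that value divided by the number of online CPUs
load_all() {
    awk '{print $1}' /proc/loadavg
}

load_percpu() {
    awk -v n="$(getconf _NPROCESSORS_ONLN)" '{printf "%.4f\n", $1 / n}' /proc/loadavg
}

echo "all: $(load_all)  percpu: $(load_percpu)"
```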

8) The following errors appear in zabbix_server.log:
95213:20180101:154323.271 cannot send list of active checks to "10.0.8.20": host [jumpserver01.kevin.cn] not found
95212:20180101:154323.549 cannot send list of active checks to "10.0.56.21": host [cx-app02.kevin.cn] not found
95216:20180101:154324.768 cannot send list of active checks to "10.0.54.21": host [bl2-app02.kevin.cn] not found
95212:20180101:154325.072 cannot send list of active checks to "10.0.52.22": host [nc-app02.kevin.cn] not found

Cause:
The Hostname configured in zabbix_agentd.conf does not match the host name configured in the web interface under Configuration -> Hosts. Making the two identical fixes it.
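To compare the two names, read the value the agent actually uses from its config file and check it against the name in the brackets of the log message. A sketch; agent_hostname is hypothetical and only looks at uncommented Hostname= lines (the last one wins).

```shell
#!/bin/bash
# Hypothetical helper: print the Hostname= value from zabbix_agentd.conf.
# Commented lines and HostnameItem= are ignored; surrounding blanks are trimmed.
agent_hostname() {
    awk -F= '/^[ \t]*Hostname[ \t]*=/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); h = $2 } END { print h }' "$1"
}

# Usage: agent_hostname /usr/local/zabbix/etc/zabbix_agentd.conf
```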

9) The following errors appear in zabbix_server.log:

95219:20180101:162139.869 fping failed: /usr/local/sbin/fping: can't create raw socket (must run as root?) : Operation not permitted
95219:20180101:162140.871 fping failed: /usr/local/sbin/fping: can't create raw socket (must run as root?) : Operation not permitted
95219:20180101:162141.874 fping failed: /usr/local/sbin/fping: can't create raw socket (must run as root?) : Operation not permitted

Fix:
1) Make sure the zabbix user on the agent hosts has sudo privileges:
[root@web01 ~]# chattr -i /etc/sudoers
[root@web01 ~]# chmod 640 /etc/sudoers
[root@web01 ~]# echo "zabbix  ALL=(ALL)      NOPASSWD: ALL" >> /etc/sudoers
[root@web01 ~]# chmod 440 /etc/sudoers
[root@web01 ~]# chattr +i /etc/sudoers
 
2) Change the permissions of fping on the Zabbix server. This step is the important one!
[root@zabbix01 ~]# ll /usr/local/sbin/fping
-rwxr-xr-x 1 root root 67110 12月 11 17:18 /usr/local/sbin/fping
[root@zabbix01 ~]# chmod u+s /usr/local/sbin/fping
 
Then switch to the zabbix user and test:
[root@zabbix01 ~]# su - zabbix
[zabbix@zabbix01 ~]$ /usr/local/sbin/fping -s oa-mob01.kevin.cn
oa-mob01.kevin.cn is alive
 
       1 targets
       1 alive
       0 unreachable
       0 unknown addresses
 
       0 timeouts (waiting for response)
       1 ICMP Echos sent
       1 ICMP Echo Replies received
       0 other ICMP received
 
 0.58 ms (min round trip time)
 0.58 ms (avg round trip time)
 0.58 ms (max round trip time)
        0.001 sec (elapsed real time)
 
If it returns "XX.XX.XX.XX is alive", everything is OK.
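The permission fix can also be verified without sending a ping. A sketch, assuming GNU stat (-c %U); the helper names are hypothetical. The setuid bit only helps if the file is owned by root, so both are checked.

```shell
#!/bin/bash
# Hypothetical helpers for the fping permission fix.

# True if the file has the setuid bit set (what "chmod u+s" adds)
has_setuid() {
    [ -u "$1" ]
}

# True if fping is executable, setuid and owned by root, so that it can open
# a raw ICMP socket when run by the zabbix user.
fping_ready() {
    local f=${1:-/usr/local/sbin/fping}
    [ -x "$f" ] && has_setuid "$f" && [ "$(stat -c %U "$f")" = "root" ]
}

if fping_ready; then echo "fping OK"; else echo "fping is missing root ownership or the setuid bit"; fi
```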

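10) Problem: on a monitored server (64-bit CentOS 6.8, 64 GB of RAM), zabbix_agent would not start and port 10050 was not listening.

Starting zabbix_agentd printed no error, but no agent process stayed up and nothing listened on port 10050: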
[root@ctl ~]# /usr/local/zabbix/sbin/zabbix_agentd
[root@ctl ~]# ps -ef|grep zabbix_agent
root 27506 27360 0 11:07 pts/5 00:00:00 grep --color zabbix
[root@ctl etc]# lsof -i:10050

The log /usr/local/zabbix/logs/zabbix_agentd.log shows the following errors:
................
27667:20161027:111554.851 cannot allocate shared memory of size 657056: [28] No space left on device
27667:20161027:111554.851 cannot allocate shared memory for collector
..............

Cause:
This is caused by the kernel's limits on shared memory.

Troubleshooting steps:
[root@ctl logs]# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 1940588
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

The output above shows the limits in effect:
max seg size = 1940588 KB (about 1.85 GB) and max total shared memory = 8388608 KB (8 GB).

Check the current shared memory settings:
[root@ctl logs]# sysctl -a|grep shm
kernel.shmmax = 1987162112
kernel.shmall = 2097152
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0

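kernel.shmmax is the maximum size of a single segment in bytes, here 1987162112 bytes (about 1.85 GB). kernel.shmall is the total amount of shared memory that can be allocated, counted in pages, here 2097152 pages, which is 8 GB with 4 KB pages. The agent only requested 657056 bytes, so the per-segment limit was not what stopped it. shmget() fails with ENOSPC ("No space left on device") when the request would exceed the system-wide total (shmall) or the maximum number of segments (shmmni), so the segments already allocated on this host had most likely used up the allowance; ipcs -m lists the segments currently in use.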
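Because the two parameters use different units, it is easy to misread them. A small sketch that converts them, assuming 4096-byte pages when no page size is given explicitly (getconf PAGE_SIZE reports the real one); shmall_bytes is hypothetical.

```shell
#!/bin/bash
# Convert the sysctl shared-memory limits into comparable units.
# kernel.shmmax is already in bytes; kernel.shmall is in pages.
shmall_bytes() {
    local pages=$1 pagesize=${2:-$(getconf PAGE_SIZE)}
    echo $(( pages * pagesize ))
}

# Values from this host, with 4096-byte pages
echo "shmall: $(( $(shmall_bytes 2097152 4096) / 1024 / 1024 / 1024 )) GB"   # 8 GB
echo "shmmax: $(( 1987162112 / 1024 / 1024 )) MB"                           # 1895 MB
```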

Then look at /etc/sysctl.conf:
[root@ctl logs]# cat /etc/sysctl.conf
........
kernel.shmall = 2097152
kernel.shmmax = 1987162112

The kernel.shmall and kernel.shmmax values set in sysctl.conf are clearly too small for this host.

--------------------------------------------------------------------------------------------------------------------------------------------------
This host runs 64-bit CentOS 6.8 with 64 GB of RAM. On other monitored servers running the same system, the settings are:
[root@bastion-IDC ~]# cat /etc/sysctl.conf 
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296

[root@ctl logs]# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

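So on 64-bit CentOS 6 with 64 GB of RAM, these two parameters default to kernel.shmmax = 68719476736 bytes (64 GB, the installed memory) and kernel.shmall = 4294967296 pages (16 TB with 4 KB pages, the 17179869184 KB shown by ipcs), so neither limits the memory the system can actually use.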
---------------------------------------------------------------------------------------------------------------------------------------------------

Raising these two parameters on this host is all it takes to fix the problem:
[root@ctl logs]# cat /etc/sysctl.conf
........
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.msgmnb = 65536 
kernel.msgmax = 65536

Apply the changes with sysctl -p:
[root@ctl logs]# sysctl -p

Checking again shows that the change has taken effect:
[root@ctl logs]# sysctl -a|grep shm
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.shm_rmid_forced = 0
vm.hugetlb_shm_group = 0
[root@ctl logs]# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 67108864
max total shared memory (kbytes) = 17179869184
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 100
semaphore max value = 32767

------ Messages: Limits --------
max queues system wide = 32768
max size of message (bytes) = 65536
default max size of queue (bytes) = 65536

Finally, start the Zabbix agent again; port 10050 now comes up:
[root@ctl ~]# /usr/local/zabbix/sbin/zabbix_agentd
[root@ctl logs]# ps -ef|grep zabbix
zabbix 27776 1 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd
zabbix 27777 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix 27778 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix 27779 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix 27780 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix 27781 27776 0 11:22 ? 00:00:00 /usr/local/zabbix/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root 28188 27360 0 11:48 pts/5 00:00:00 grep --color zabbix
[root@ctl logs]# lsof -i:10050
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
zabbix_ag 27776 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27777 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27778 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27779 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27780 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
zabbix_ag 27781 zabbix 4u IPv4 112357384 0t0 TCP *:zabbix-agent (LISTEN)
[root@ctl logs]#

Summary:
This problem is not limited to Zabbix: many programs fail with the same error because of kernel resource limits, and the same fix applies to them.
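When adjusting these limits on other hosts, one common rule of thumb (an assumption here, not what the hosts above use) is to let a single segment be as large as physical memory and to set shmall to physical memory divided by the page size. A sketch that prints such values; suggest_shm is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print sysctl lines sized to physical memory.
# Arguments: total memory in KB (as in /proc/meminfo) and page size in bytes.
suggest_shm() {
    local mem_kb=$1 pagesize=$2
    local mem_bytes=$(( mem_kb * 1024 ))
    echo "kernel.shmmax = $mem_bytes"
    echo "kernel.shmall = $(( mem_bytes / pagesize ))"
}

# On the local machine:
suggest_shm "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(getconf PAGE_SIZE)"
```

MemTotal is slightly below the installed RAM, so the values come out a little lower than the nominal memory size.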

11) Zabbix agent fails to start with: /usr/local/zabbix/sbin/zabbix_agentd: error while loading shared libraries: libpcre.so.0: cannot open shared object file: No such file or directory

[root@test ~]# /usr/local/zabbix/sbin/zabbix_agentd -c /usr/local/zabbix/etc/zabbix_agentd.conf
/usr/local/zabbix/sbin/zabbix_agentd: error while loading shared libraries: libpcre.so.0: cannot open shared object file: No such file or directory
 
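Fix: the system only ships libpcre.so.1, so create a libpcre.so.0 symlink pointing to it and rebuild the linker cache with ldconfig. (ld.so.conf entries are directories, not library files, and /usr/lib64 is searched by default, so the extra /usr/lib64/libpcre.so.0 line in ld.so.conf below is not needed.)
[root@test ~]# find / -name 'libpcre.so*'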
/usr/lib64/libpcre.so.1.2.0
/usr/lib64/libpcre.so.1
/usr/lib/libpcre.so.1.2.0
/usr/lib/libpcre.so.1
[root@test ~]# ln -s /usr/lib64/libpcre.so.1 /usr/lib64/libpcre.so.0
[root@test ~]# cat /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/lib64/libpcre.so.0
[root@test ~]# ldconfig
[root@test ~]# /usr/local/zabbix/sbin/zabbix_agentd -c /usr/local/zabbix/etc/zabbix_agentd.conf
[root@test ~]# lsof -i:10050
COMMAND     PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
zabbix_ag 21405 zabbix    4u  IPv4  83991      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 21406 zabbix    4u  IPv4  83991      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 21407 zabbix    4u  IPv4  83991      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 21408 zabbix    4u  IPv4  83991      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 21409 zabbix    4u  IPv4  83991      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 21410 zabbix    4u  IPv4  83991      0t0  TCP *:zabbix-agent (LISTEN)
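To spot this kind of problem before starting a binary, list the libraries the dynamic linker cannot resolve. A sketch built on ldd; the helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: list the shared libraries a binary cannot find.

# Reads ldd output on stdin and prints the names marked "not found"
filter_not_found() {
    awk '/=> not found/ {print $1}'
}

missing_libs() {
    ldd "$1" | filter_not_found
}

# Usage: missing_libs /usr/local/zabbix/sbin/zabbix_agentd
# Before the fix above, this should have printed libpcre.so.0.
```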

 

 
