nagios

Nagios official site: http://www.nagios.org

1. Installing Nagios - server (192.168.1.122)

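The default yum repositories on CentOS 6 do not carry the Nagios RPM packages, but they are available from the EPEL repository, which can be added with: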

yum install -y epel-release

Then install the Nagios packages:

yum install -y httpd nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

Set the user name and password for logging in to the Nagios web interface: htpasswd -c /etc/nagios/passwd nagiosadmin

vim /etc/nagios/nagios.cfg

Check the configuration files: nagios -v /etc/nagios/nagios.cfg

Start the services: service httpd start; service nagios start

Open in a browser: http://ip/nagios
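Because a configuration error can keep Nagios from starting, the check and the restart can be chained so that the service is only restarted when verification passes. The following is a minimal sketch, not part of the original setup: safe_restart, NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: restart Nagios only if "nagios -v" accepts the configuration.
# NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD are hypothetical, overridable
# settings for this example; they are not Nagios options.
NAGIOS_BIN="${NAGIOS_BIN:-nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-service nagios restart}"

safe_restart() {
    # "nagios -v" exits non-zero when the pre-flight check finds errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        $RESTART_CMD
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

After sourcing the file, running safe_restart as root should behave like the two manual steps above, except that a failed check leaves the running service untouched.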

2. Installing Nagios - client (192.168.1.123)

On the client machine:

yum install -y epel-release

yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

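vim /etc/nagios/nrpe.cfg and find "allowed_hosts=127.0.0.1"; change it to "allowed_hosts=127.0.0.1,192.168.1.122", where the second IP is the server's IP. Then find "dont_blame_nrpe=0" and change it to "dont_blame_nrpe=1", which allows the server to pass arguments to NRPE commands.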

Start the client: /etc/init.d/nrpe start
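The two nrpe.cfg edits above can also be applied without opening an editor. This is a sketch that assumes GNU sed and that both settings appear exactly as quoted above (no trailing spaces or comments on those lines); patch_nrpe_cfg is a hypothetical helper name.

```shell
#!/bin/sh
# patch_nrpe_cfg FILE SERVER_IP   (hypothetical helper for this example)
# Adds SERVER_IP to allowed_hosts and sets dont_blame_nrpe=1 in FILE.
patch_nrpe_cfg() {
    cfg="$1"
    server_ip="$2"
    cp "$cfg" "$cfg.bak"    # keep a backup of the original file
    sed -i \
        -e 's/^allowed_hosts=127\.0\.0\.1$/allowed_hosts=127.0.0.1,'"$server_ip"'/' \
        -e 's/^dont_blame_nrpe=0$/dont_blame_nrpe=1/' \
        "$cfg"
}
```

For the setup described here the call would be patch_nrpe_cfg /etc/nagios/nrpe.cfg 192.168.1.122. Both patterns are anchored to the whole line, so running it a second time changes nothing.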

3. Adding the monitored host (192.168.1.123) on the monitoring server (192.168.1.122)

cd /etc/nagios/conf.d/

vim 192.168.1.123.cfg // add:

define host{

use linux-server

host_name 192.168.1.123

alias 1.123

address 192.168.1.123

}

define service{

use generic-service

host_name 192.168.1.123

service_description check_ping

check_command check_ping!100.0,20%!200.0,50%

max_check_attempts 5

normal_check_interval 1

}

define service{

use generic-service

host_name 192.168.1.123

service_description check_ssh

check_command check_ssh

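max_check_attempts 5 ; when Nagios detects a problem it alerts only after 5 checks in a row have failed; with a value of 1 it alerts as soon as a problem is detected

normal_check_interval 1 ; interval between regular checks, in minutes; the default is 3 minutes

notification_interval 60 ; how long Nagios waits, in minutes, before notifying again while a problem stays unresolved; set it to 0 if one notification per event is enough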

}

define service{

use generic-service

host_name 192.168.1.123

service_description check_http

check_command check_http

max_check_attempts 5

normal_check_interval 1

}
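The effect of max_check_attempts on alert timing can be estimated with a little arithmetic. The sketch below assumes that, after the first failed check, Nagios reschedules the service at its retry interval (retry_check_interval, which the services above would inherit from the generic-service template) until max_check_attempts is reached. The function name and the retry value of 2 minutes used in the example are assumptions, not values from this setup.

```shell
#!/bin/sh
# minutes_to_alert MAX_CHECK_ATTEMPTS RETRY_INTERVAL_MINUTES
# Rough delay between the first failed check and the alert, assuming
# every retry also fails. Hypothetical helper for illustration only.
minutes_to_alert() {
    echo $(( ($1 - 1) * $2 ))
}
```

With max_check_attempts 5 and an assumed retry interval of 2 minutes, minutes_to_alert 5 2 gives 8; with max_check_attempts 1 the result is 0, matching the "alert immediately" behaviour described in the comments above.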

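None of the services above depend on the NRPE service on the client. Just as we can use ping or telnet from our own computer to find out whether a remote machine is alive or has a given port or service open, the Nagios server can run these checks by itself. To check something that is only visible on the client itself, such as its load or disk usage, we need NRPE.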

4. Adding more services

On the server: vim /etc/nagios/objects/commands.cfg

Add:

define command{

command_name check_nrpe

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

}

Then edit: vim /etc/nagios/conf.d/192.168.1.123.cfg

Add the following:

define service{

use generic-service

host_name 192.168.1.123

service_description check_load

check_command check_nrpe!check_load

max_check_attempts 5

normal_check_interval 1

}

define service{

use generic-service

host_name 192.168.1.123

service_description check_disk_sda1

check_command check_nrpe!check_sda1

max_check_attempts 5

normal_check_interval 1

}

define service{

use generic-service

host_name 192.168.1.123

service_description check_disk_sda3

check_command check_nrpe!check_sda3

max_check_attempts 5

normal_check_interval 1

}

Note: in check_nrpe!check_load, check_nrpe is the command just defined in commands.cfg, and check_load is the name of a check command defined on the remote host.
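To make the check_nrpe!check_load notation concrete, the sketch below imitates how the text after the "!" is substituted for $ARG1$ in the command_line defined above. It is only an illustration of the macro substitution, not how Nagios is implemented; expand_check_command and PLUGIN_DIR are hypothetical names, and the default plugin directory (standing in for $USER1$) is an assumption that may be /usr/lib64/nagios/plugins on 64-bit systems.

```shell
#!/bin/sh
# expand_check_command COMMAND_LINE HOST_ADDRESS CHECK_COMMAND
# Prints COMMAND_LINE with $USER1$, $HOSTADDRESS$ and $ARG1$ filled in.
# Hypothetical helper; handles only the first argument after "!".
expand_check_command() {
    template="$1"
    address="$2"
    check="$3"
    # Assumed value of $USER1$ (the plugin directory)
    plugin_dir="${PLUGIN_DIR:-/usr/lib/nagios/plugins}"
    rest="${check#*!}"      # drop the command name and the first "!"
    arg1="${rest%%!*}"      # keep only the first argument
    printf '%s\n' "$template" | sed \
        -e 's|\$USER1\$|'"$plugin_dir"'|g' \
        -e 's|\$HOSTADDRESS\$|'"$address"'|g' \
        -e 's|\$ARG1\$|'"$arg1"'|g'
}
```

Calling expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' 192.168.1.123 'check_nrpe!check_load' prints /usr/lib/nagios/plugins/check_nrpe -H 192.168.1.123 -c check_load, which is the kind of command the server ends up running for that service.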

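On the remote host, vim /etc/nagios/nrpe.cfg and search for check_load. That line is the command NRPE runs on the client when the server asks for check_load, and we can also run it by hand.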
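When a plugin is run by hand, its exit code carries the state that Nagios would record: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The following sketch wraps a manual run and prints the state name; status_name and run_check are hypothetical helper names, and any other exit code is treated as UNKNOWN here.

```shell
#!/bin/sh
# status_name EXIT_CODE: map a plugin exit code to its Nagios state name
status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_check COMMAND [ARGS...]: run a plugin (its own output is shown
# as usual), then print the state name derived from its exit code
run_check() {
    rc=0
    "$@" || rc=$?
    status_name "$rc"
}
```

On the client, the command copied from the check_load line in nrpe.cfg can be passed to run_check as is, for example run_check followed by the full plugin path and its -w and -c thresholds.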

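In the check_hda1 line, rename the command to check_sda1 and change /dev/hda1 to /dev/sda1, so that the name matches the check_nrpe!check_sda1 service defined on the server.

Add one more line for the other partition, named check_sda3 to match the corresponding service (on 64-bit systems the plugins are usually under /usr/lib64/nagios/plugins instead):

command[check_sda3]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda3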

Restart the nrpe service on the client: service nrpe restart

Restart the nagios service on the server as well: service nagios restart
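When more machines need to be monitored, the per-host file from sections 3 and 4 can be generated rather than typed again. This is a sketch under the assumption that every new host uses the same templates and thresholds as 192.168.1.123; gen_host_cfg is a hypothetical helper, and 192.168.1.124 in the usage note is a made-up address.

```shell
#!/bin/sh
# gen_host_cfg IP: print a host definition plus a ping check and an
# NRPE load check, modelled on the 192.168.1.123.cfg file shown earlier.
# Hypothetical helper for illustration.
gen_host_cfg() {
    ip="$1"
    cat <<EOF
define host{
    use                    linux-server
    host_name              $ip
    alias                  $ip
    address                $ip
}
define service{
    use                    generic-service
    host_name              $ip
    service_description    check_ping
    check_command          check_ping!100.0,20%!200.0,50%
    max_check_attempts     5
    normal_check_interval  1
}
define service{
    use                    generic-service
    host_name              $ip
    service_description    check_load
    check_command          check_nrpe!check_load
    max_check_attempts     5
    normal_check_interval  1
}
EOF
}
```

For example, gen_host_cfg 192.168.1.124 > /etc/nagios/conf.d/192.168.1.124.cfg would create the file for a second client; the configuration should still be checked with nagios -v before restarting the service.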

5. Configuring alerts

vim /etc/nagios/objects/contacts.cfg // add:

define contact{

contact_name 123

use generic-contact

alias ddb

email [email protected]

}

define contact{

contact_name 456

use generic-contact

alias aaa

email [email protected]

}

define contactgroup{

contactgroup_name common

alias common

members 123,456

}

Then add the contact group to each service that should send alerts:

define service{

use generic-service

host_name 192.168.1.123

service_description check_load

check_command check_nrpe!check_load

max_check_attempts 5

normal_check_interval 1

contact_groups common

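notifications_enabled 1 ; whether notifications are enabled: 1 = on, 0 = off. This is usually also set in the main configuration file (nagios.cfg), with the same effect.

notification_period 24x7 ; time period during which notifications are sent. I use 24x7 for very important hosts (services) and working hours for ordinary ones. Outside the defined period no notification is sent, whatever happens.

notification_options w,u,c,r ; service states that trigger a notification: w = warning, u = unknown, c = critical, r = recovery. Hosts have a similar option with d,u,r (d = DOWN, u = UNREACHABLE, r = recovery), which belongs in the host definition.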

}
