Installing Nagios Monitoring on CentOS Linux

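1. Introduction to Nagios

Nagios is a free, open-source network monitoring tool. It can monitor the state of Windows, Linux, and Unix hosts as well as network devices such as switches and routers.

Nagios exists to monitor hosts and services, but the core itself contains no check logic: all monitoring and checking is done by plugins.

Once started, Nagios periodically invokes plugins to check the state of each server. It also maintains a queue: every status result returned by a plugin is placed on the queue, and Nagios reads results from the head of the queue, processes them, and shows the resulting state in the web interface.

Nagios ships with many plugins that make it easy to monitor the state of common services. After installation, the libexec directory under the Nagios home directory holds all of the bundled plugins (with the yum packages used in this article on a 64-bit system, they are in /usr/lib64/nagios/plugins). For example, check_disk checks disk space and check_load checks CPU load. Run ./check_xxx -h to see what a plugin does and how to use it.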

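Monitoring the state of a remote host, such as its disks or a service on a particular port, requires the NRPE service. NRPE consists of two parts:

(1) the check_nrpe plugin, which lives on the monitoring host;

(2) the NRPE daemon, which runs on the remote Linux host (usually the monitored machine).

Nagios defines four monitoring states representing different severity levels. OK means normal and needs no attention; all the others call for attention.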

State            Code         Color
Normal           OK           Green
Warning          WARNING      Yellow
Critical         CRITICAL     Red
Unknown error    UNKNOWN      Dark yellow
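
Plugins report these states to Nagios through their exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The shell sketch below shows that mapping. The helper names state_name and run_check are made up for illustration and are not part of Nagios.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not shipped with Nagios).

# Map a plugin exit code to the state name Nagios uses for it.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID" ;;   # anything else is not a valid plugin state
    esac
}

# Run any command as if it were a plugin and print the resulting state.
run_check() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    state_name "$rc"
}

# Example (path assumes the 64-bit yum packages):
#   run_check /usr/lib64/nagios/plugins/check_load -w 5,4,3 -c 10,6,4
```

Running a plugin by hand and looking at its exit code this way is a quick check before wiring the plugin into a service definition.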

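2. Deploying the Nagios monitoring platform

Preparation before installing:

1) Add firewall rules

vim /etc/sysconfig/iptables

# web access to the monitoring console
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT
# NRPE communication port
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT

Save and exit, then restart the firewall so the rules take effect: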

/etc/init.d/iptables restart
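
A missing space in a rule (for example tcp--dport instead of tcp --dport) makes the rule invalid, so it can be worth checking the file before restarting the firewall. This is a small sketch that assumes rules written as "-p tcp --dport PORT -j ACCEPT"; has_port_rule is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains an ACCEPT rule for TCP PORT
# written as "-p tcp --dport PORT -j ACCEPT".
has_port_rule() {
    file=$1
    port=$2
    grep -Fq -- "-p tcp --dport $port -j ACCEPT" "$file"
}

# Example:
#   for p in 80 5666; do
#       has_port_rule /etc/sysconfig/iptables "$p" || echo "no rule for port $p"
#   done
```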

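2) Disable SELinux

vim /etc/selinux/config

# comment out these two settings:
#SELINUX=enforcing
#SELINUXTYPE=targeted
# and add this one:
SELINUX=disabled

Save and exit. The change becomes permanent after a reboot. To relax SELinux immediately without rebooting, run the following command, which switches it to permissive mode until the next boot: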

setenforce 0

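3) Monitoring environment:

Role                 Operating system      IP address       Software
Monitoring server    CentOS 6.7 x86_64     192.168.17.10    Apache, PHP, nagios, nagios-plugins
Monitored client     CentOS 6.7 x86_64     192.168.17.20    nagios-plugins, nrpe
Monitored client     Windows 7             192.168.17.1     NSClient++

The LAN has two hosts to monitor, one Linux and one Windows. The goal is to set up one Nagios monitoring server that monitors both.

The following steps are performed on the Nagios monitoring server (192.168.17.10):

1) The installation uses yum and needs the EPEL repository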

yum install -y epel-release

2) Install the LAMP stack with yum (MySQL is optional, depending on your environment; the author recommends building from source)

yum install -y httpd php php-mysql mysql mysql-server mysql-devel php-gd libjpeg libjpeg-devel libpng libpng-devel

3) Install the Nagios packages (Nagios plugins and NRPE)

yum install -y nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

4) Set up access control for the Nagios web interface (using Apache's htpasswd tool)

htpasswd -c /etc/nagios/passwd nagiosadmin   # then enter the nagiosadmin password twice

5) Start the services

service httpd start; service nagios start

6) Open http://ip/nagios in a browser (here, http://192.168.17.10/nagios)

The default global configuration file is /etc/nagios/nagios.cfg. It references a number of object files; entries starting with # are disabled. The trailing notes after each cfg_file entry below are explanatory and not part of the file.

cfg_file=/etc/nagios/objects/commands.cfg         # command definitions
cfg_file=/etc/nagios/objects/contacts.cfg         # contacts and contact groups
cfg_file=/etc/nagios/objects/timeperiods.cfg      # monitoring time periods
cfg_file=/etc/nagios/objects/templates.cfg        # host and service templates

# Definitions for monitoring the local (Linux) host
cfg_file=/etc/nagios/objects/localhost.cfg        # monitoring of the local machine

# Definitions for monitoring a Windows machine
#cfg_file=/etc/nagios/objects/windows.cfg         # Windows template definitions

# Definitions for monitoring a router/switch
#cfg_file=/etc/nagios/objects/switch.cfg          # switch template definitions

# Definitions for monitoring a network printer
#cfg_file=/etc/nagios/objects/printer.cfg         # printer template definitions

To check the Nagios configuration files for errors, run:

nagios -v /etc/nagios/nagios.cfg
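
Because a configuration error prevents Nagios from starting, it is convenient to run this verification before every restart. This is a minimal sketch that assumes the nagios binary and the service command are on the PATH, as on CentOS 6; safe_restart is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: restart Nagios only when the configuration verifies.
# Usage: safe_restart [MAIN_CONFIG_FILE]
safe_restart() {
    cfg=${1:-/etc/nagios/nagios.cfg}
    if nagios -v "$cfg" >/dev/null; then
        service nagios restart
    else
        echo "configuration check failed; Nagios was not restarted" >&2
        return 1
    fi
}
```

When the check fails, running nagios -v by hand shows the full error report.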

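3. Configuring the monitored hosts (clients)

1) Configuring the Linux client

The Linux client needs the Nagios plugins and NRPE installed, and TCP port 5666 must be open in its firewall.

vim /etc/sysconfig/iptables     # edit the firewall configuration

-A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT

/etc/init.d/iptables restart    # restart the firewall so the rule takes effect

The packages required on the Linux client are nagios-plugins, nagios-plugins-all, nrpe, and nagios-plugins-nrpe.

(1) Install the Nagios components (on 192.168.17.20)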

yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe

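(2) Edit the nrpe.cfg configuration file

vim /etc/nagios/nrpe.cfg

Find "allowed_hosts=127.0.0.1" and change it to "allowed_hosts=127.0.0.1,192.168.17.10"   ## i.e. add the monitoring server's IP
Find "dont_blame_nrpe=0" and change it to "dont_blame_nrpe=1"   ## lets the server pass arguments to NRPE commands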
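
The same two changes can be made without an editor, which helps when several clients need them. This sketch assumes both settings already exist uncommented at the start of a line, as in the packaged nrpe.cfg; configure_nrpe is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: apply the two nrpe.cfg changes non-interactively.
# Usage: configure_nrpe FILE SERVER_IP
configure_nrpe() {
    file=$1
    server_ip=$2
    # Keep a backup, then rewrite the original file from it.
    cp -p "$file" "$file.bak" || return 1
    sed -e "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,$server_ip/" \
        -e "s/^dont_blame_nrpe=.*/dont_blame_nrpe=1/" \
        "$file.bak" > "$file"
}

# Example:
#   configure_nrpe /etc/nagios/nrpe.cfg 192.168.17.10
```

The NRPE daemon only reads the file at startup, so it has to be restarted afterwards.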

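2) Configuring the Windows client

The Windows client needs NSClient++, available from http://www.nsclient.org/. After downloading, run the installer.

When asked for the installation type, choose whatever suits your needs; the author chose "Typical". During installation there is a configuration step with the following main settings:

Allowed hosts: the hosts allowed to connect. Add the monitoring server's IP (192.168.17.10) here; this can also be changed in the configuration file after installation.

Password: the password used for communication.

Modules to load: the modules to enable. Tick the ones you need.

After installation, NSClient++ runs as a Windows service. Run services.msc to open the Services console and check that NSClient++ is running. It listens on TCP port 12489.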

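4. Configuring the monitored clients on the monitoring server

1) Setting up the Linux client

(1) To monitor the Linux host (192.168.17.20) from the monitoring server, start from a template that already exists on the system and modify it. Store the configuration file in /etc/nagios/conf.d/ and name it after the host type plus IP address, for example linux192.168.17.20.cfg.

The modified file looks like this:

vim /etc/nagios/conf.d/linux192.168.17.20.cfg

# Define a host for the 192.168.17.20 machine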
 
define host{
       use                    linux-server          
       host_name              192.168.17.20                  
       alias                   17.20
       address                192.168.17.20
       }
 
# Define a service to "ping" the local machine

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        max_check_attempts      5       ; alert only after 5 failed checks
        normal_check_interval   1       ; check interval in minutes
        }
 
 
# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        max_check_attempts      5
        normal_check_interval   1
        }
 
 
 
# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 20 users, critical
# if > 50 users.

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     Current Users
        check_command           check_local_users!20!50
        max_check_attempts      5
        normal_check_interval   1
        }
 
 
# Define a service to check the number of currently running procs
# on the local machine.  Warning if > 250 processes, critical if
# > 400 processes.

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     Total Processes
        check_command           check_local_procs!250!400!RSZDT
        max_check_attempts      5
        normal_check_interval   1
        }
 
 
 
# Define a service to check the load on the local machine.

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        max_check_attempts      5
        normal_check_interval   1
        }
 
 
 
# Define a service to check the swap usage on the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     Swap Usage
        check_command           check_local_swap!20!10
        max_check_attempts      5
        normal_check_interval   1
        }
 
 
 
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     SSH
        check_command           check_ssh
        notifications_enabled   0
        max_check_attempts      5
        normal_check_interval   1
        }
 
 
 
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                     local-service
        host_name               192.168.17.20
        service_description     HTTP
        check_command           check_http
        notifications_enabled   0
        max_check_attempts      5
        normal_check_interval   1
        }

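Among the services defined here, checks such as disk (check_local_disk) and load (check_local_load) must use NRPE to read the client's state. The check_local_* commands from the template run the plugin on the monitoring server itself, so for a remote host the check has to go through check_nrpe (for example check_nrpe!check_load), and a command with that name must be defined in the client's configuration file (/etc/nagios/nrpe.cfg). If no such command is defined there, you have to write it yourself.

(2) Custom monitoring items

The default Nagios templates do not monitor memory, so it has to be defined by hand. The following uses a custom check to monitor memory usage on the remote server through NRPE.

a. On the monitored client

Download the memory-checking script:

cd /usr/lib64/nagios/plugins/           # use the plugin directory that matches your system (lib or lib64)
wget   # download the script (the original post does not give the URL)
mv check_mem.pl check_mem
chmod +x check_mem

Test that the script works with the following command:

./check_mem -f -w 30 -c 20   # warn when free memory drops to 30%, critical at 20%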
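
To make the threshold logic concrete, here is a rough shell stand-in for a free-memory check. It is not the check_mem.pl source: the name mem_check, the use of MemFree from /proc/meminfo, and the output wording are all assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical stand-in for a free-memory check; NOT the check_mem.pl source.
# Usage: mem_check WARN_PCT CRIT_PCT [MEMINFO_FILE]
mem_check() {
    warn=$1
    crit=$2
    file=${3:-/proc/meminfo}

    # Free memory as an integer percentage of MemTotal (both values in kB).
    pct=$(awk '/^MemTotal:/ { total = $2 }
               /^MemFree:/  { free = $2 }
               END { if (total > 0) printf "%d", free * 100 / total; else exit 1 }' \
              "$file" 2>/dev/null) || {
        echo "UNKNOWN - cannot read memory figures from $file"
        return 3
    }

    # Lower is worse here, so test the critical threshold first.
    if [ "$pct" -le "$crit" ]; then
        echo "CRITICAL - ${pct}% free"; return 2
    elif [ "$pct" -le "$warn" ]; then
        echo "WARNING - ${pct}% free"; return 1
    fi
    echo "OK - ${pct}% free"; return 0
}
```

Unlike load or process counts, the critical threshold for free memory is the smaller number, which is why -w 30 -c 20 is the right order.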

b. On the monitoring host

vim /etc/nagios/objects/commands.cfg   # edit the Nagios command file and append the check_nrpe command used for the memory check

define command{
    command_name    check_nrpe
   command_line   /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
       }
 
## This form also works (define check_nrpe only once, using one of the two forms):
define command{
      command_name    check_nrpe
      command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
      }
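
When a service uses check_nrpe!check_mem, Nagios splits the string at "!", uses the part after it as $ARG1$, and fills in $HOSTADDRESS$ from the host definition; $USER1$ is the plugin directory set in resource.cfg. The sketch below only imitates that substitution in shell to show the resulting command line. expand_check_nrpe is a hypothetical helper, and the default plugin path is the one used in this article.

```shell
#!/bin/sh
# Hypothetical illustration: print the command line that results from
#   command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
# for a check_command such as "check_nrpe!check_mem".
# Usage: expand_check_nrpe CHECK_COMMAND HOST_ADDRESS [PLUGIN_DIR]
expand_check_nrpe() {
    check_command=$1
    host_address=$2
    user1=${3:-/usr/lib64/nagios/plugins}   # assumed value of $USER1$

    case "$check_command" in
        *!*!*) echo "only one argument is handled in this sketch" >&2; return 1 ;;
        check_nrpe!?*) ;;                    # expected form: check_nrpe!<arg1>
        *) echo "unsupported check_command: $check_command" >&2; return 1 ;;
    esac

    arg1=${check_command#check_nrpe!}        # the text after the "!"
    printf '%s/check_nrpe -H %s -c %s\n' "$user1" "$host_address" "$arg1"
}

# Example:
#   expand_check_nrpe 'check_nrpe!check_mem' 192.168.17.20
```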

Then continue editing the configuration file of the Linux host set up earlier:

vim /etc/nagios/conf.d/linux192.168.17.20.cfg   # edit the file and add a service

define service{
       use                   local-service        
       host_name               192.168.17.20
       service_description        Check RAM
       check_command           check_nrpe!check_mem
       notifications_enabled       0
       max_check_attempts         5
       normal_check_interval       1
       }

Restart the Nagios service:

/etc/init.d/nagios restart

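c. On the monitored client

vim /etc/nagios/nrpe.cfg   # add the check_mem command

command[check_mem]=/usr/lib64/nagios/plugins/check_mem -f -w 20 -c 10

Restart the NRPE service:

/etc/init.d/nrpe restart

On the monitoring host you can also run check_nrpe by hand to confirm that it returns the memory status:

/usr/lib64/nagios/plugins/check_nrpe -H 192.168.17.20 -c check_mem

The newly configured host and its services now appear in the monitoring console.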

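2) Setting up the Windows client

To monitor the Windows host (192.168.17.1) from the monitoring server, again start from an existing template (the Windows template) and modify it. Store the configuration file in /etc/nagios/conf.d/ and name it after the host type plus IP address, for example windows192.168.17.1.cfg. Also enable the windows.cfg entry in /etc/nagios/nagios.cfg.

Find:
#cfg_file=/etc/nagios/objects/windows.cfg
and change it to:
cfg_file=/etc/nagios/objects/windows.cfg

vim /etc/nagios/conf.d/windows192.168.17.1.cfg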

define host{
         use            windows-server      
         host_name       192.168.17.1  
         alias          My Windows Server        
         address             192.168.17.1  
         }
 
define service{
         use                     generic-service
         host_name                192.168.17.1
         service_description NSClient++ Version
         check_command               check_nt!CLIENTVERSION
         }
 
define service{
         use                     generic-service
         host_name                192.168.17.1
         service_description Uptime
         check_command               check_nt!UPTIME
         }
 
define service{
         use                     generic-service
         host_name                192.168.17.1
         service_description CPU Load
         check_command               check_nt!CPULOAD!-l 5,80,90
         }
 
define service{
         use                     generic-service
         host_name                192.168.17.1  
         service_description Memory Usage
         check_command               check_nt!MEMUSE!-w 80 -c 90
         }
 
define service{
         use                     generic-service
         host_name                192.168.17.1
         service_description C:\ Drive Space
         check_command              check_nt!USEDDISKSPACE!-l c -w 80 -c 90
         }
 
 
define service{
         use                     generic-service
         host_name                192.168.17.1
         service_description W3SVC
         check_command          check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
         }
 
define service{
         use                     generic-service
         host_name                192.168.17.1
         service_description Explorer
         check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
         }

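In this template the main things to change are host_name and address.

The check_nt command in /etc/nagios/objects/commands.cfg also needs to be changed.

# Find:
define command{
       command_name    check_nt
       command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
       }

# Change to:
define command{
       command_name    check_nt
       command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s frAQBc8Wsa1xVPfv -v $ARG1$ $ARG2$
       }

That is, add -s password to enable password authentication. The password must match the one configured on the client, where it can be changed.

Save the configuration file and restart the Nagios service:

/etc/init.d/nagios restart

The newly added Windows client now appears in the Nagios console.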

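5. Configuring email alerts

Nagios can raise an alert when a configured threshold is crossed, and this can be used to send email or SMS messages to the administrator.

1) Check whether the sendmail service is installed on the server; if not, install it:

yum install -y sendmail
/etc/init.d/sendmail start        # start the sendmail service

2) Send a test email. Format: mail -s "subject" address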

echo "from balich nagios server"| mail -s "from balich" [email protected]

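3) Configure alerting

Edit the contacts configuration file and append the following:

vim /etc/nagios/objects/contacts.cfg

define contact{
       contact_name        balich                  ; contact name
       use                 generic-contact
       alias               balich Admin
       email               [email protected]       ; email address
       }

define contactgroup{
       contactgroup_name   balichs
       alias               balich Administrators
       members             balich
       }

Then edit the configuration file of the host that should send alerts, for example linux192.168.17.20.cfg, and enable notifications for the services that need them.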

define service{
       use                     local-service
       host_name               192.168.17.20
       service_description     HTTP
       check_command           check_http
       notifications_enabled   1            ; enable notifications: 1 = on, 0 = off
       notification_interval   5
       max_check_attempts      5
       normal_check_interval   1
       contact_groups          balichs      ; contact group that receives notifications
       notification_period     24x7         ; time period in which notifications are sent
       notification_options    w,u,c,r      ; states that trigger a notification
       }

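notifications_enabled: whether notifications are enabled for this service; 1 enables them, 0 disables them. Notifications can also be switched on or off globally in the main configuration file (nagios.cfg).

contact_groups: the contact groups that receive notifications.

notification_interval: the minimum interval between repeated notifications while the problem persists. The default interval is 60 minutes. If set to 0, no repeat notifications are sent.

notification_period: the time period during which notifications are sent. The author uses 24x7 for very important hosts (services) and working hours for ordinary ones. Outside the defined period no notification is sent, whatever happens.

notification_options: the states for which notifications are sent. For services: w = WARNING, u = UNKNOWN, c = CRITICAL, r = recovery to OK, f = flapping, n = send no notifications. For hosts: d = DOWN, u = UNREACHABLE, with r, f, and n as above.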
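
As a quick way to read a notification_options value for a service, the following sketch expands the letters into names. The function name describe_service_options and the labels it prints are illustrative only; the letter meanings are the service-level ones (s, not used in this article, covers scheduled downtime).

```shell
#!/bin/sh
# Hypothetical helper: expand a service notification_options value,
# e.g. "w,u,c,r", into one descriptive word per line.
describe_service_options() {
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case "$opt" in
            w) echo "WARNING" ;;
            u) echo "UNKNOWN" ;;
            c) echo "CRITICAL" ;;
            r) echo "RECOVERY" ;;
            f) echo "FLAPPING" ;;
            s) echo "DOWNTIME" ;;
            n) echo "NONE" ;;
            *) echo "invalid option: $opt" >&2; IFS=$old_ifs; return 1 ;;
        esac
    done
    IFS=$old_ifs
}

# Example:
#   describe_service_options w,u,c,r
```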

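Only the web service is configured here; set up others as needed.

Restart the Nagios service, then stop the web service to test notifications:

/etc/init.d/nagios restart

Then check that the email alert arrives.

The content of the alert email: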
***** Nagios *****
 
Notification Type: PROBLEM
 
Service: check_http
Host: 17.20
Address: 192.168.17.20
State: CRITICAL
 
Date/Time: Wed Oct 14 12:17:10 CST 2015
 
Additional Info:
 
connect to address 192.168.17.20 and port 80: Connection refused

This completes the Nagios monitoring installation.
