Grafana + Prometheus + Exporter + cAdvisor: Monitoring Servers and docker Runtime Status (Part 1)

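1. Summary

This article shows how to use node_exporter to collect information about a Linux system and cAdvisor to collect information about docker, and how to display both as dashboards in Grafana with the help of Prometheus.

2. Results

2.1 Servers

2.2 docker containers

3. Introduction

Background material on the four components (Grafana, Prometheus, Exporter, and cAdvisor) is easy to find with a search, so it is not repeated here. This section focuses on how the four relate to one another.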

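3.1 Prerequisites

When writing an application, we usually record logs for later analysis. In most cases the logs are only read after a problem has occurred, which makes this a static, after-the-fact analysis. Often, though, we need to know how the whole system is running right now or at a given moment: how many requests it has served, what the response times are, how these change over time, and how often errors occur. This dynamic, near-real-time information is important for monitoring the health of the whole system.

This is where metrics data comes in. It looks like this: https://monitor.lucien.ink/metrics .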
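To give a feel for the format, here is a small shell sketch. The sample text is hand-written in the Prometheus text exposition format with made-up values, and strip_metric_comments and sample_metrics are hypothetical helper names, not part of any tool.

```shell
# Print only the sample lines from Prometheus text-format metrics on stdin,
# dropping the "# HELP" / "# TYPE" comment lines and blank lines.
strip_metric_comments() {
  grep -v -e '^#' -e '^[[:space:]]*$'
}

# Hand-written sample in the exposition format (values are made up).
sample_metrics() {
  cat <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
EOF
}

sample_metrics | strip_metric_comments
```

Against a live exporter, the same filter could be fed with the output of curl -s localhost:9100/metrics instead of the sample.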

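3.2 Relationships
The main job of an Exporter is to provide metrics.
Raw metrics are hard for most people to read, so Prometheus provides the Prometheus Query Language (PromQL) for this format. It supports database-like operations such as joins and filters, which makes it possible to extract what we actually want, for example memory usage or load. The rough flow is: scrape metrics from remote endpoints (there can be several) to the local machine, then refine them with queries.
PromQL is very powerful, but it has a steep learning curve for most people, so Grafana wraps the PromQL queries and displays their results as charts.
Roughly, it is a pipeline of production, processing, and secondary processing.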
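As an illustration of the "refine" step, the sketch below sends one PromQL expression to the Prometheus HTTP API (the /api/v1/query endpoint). It assumes a Prometheus instance on the default port 9090 that already scrapes a recent node_exporter, which is what the later sections set up. PROM_URL, promql_query, and MEMORY_USED_PERCENT are names chosen for this example.

```shell
# Base URL of Prometheus. Assumption: default port on the local machine.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant PromQL query through the Prometheus HTTP API.
promql_query() {
  # -G turns the URL-encoded "query" parameter into part of a GET request.
  curl -s -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Percentage of memory in use, computed from two node_exporter gauges.
MEMORY_USED_PERCENT='100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)'

# Example call (needs a running Prometheus that scrapes node_exporter):
#   promql_query "$MEMORY_USED_PERCENT"
```

The response is JSON. Grafana issues queries of this kind on our behalf and draws the result, which is why a dashboard can be used without writing PromQL by hand.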

Of course, Prometheus and Grafana can do far more than this. Alerting is an even more powerful feature, but it is not the subject of this article.

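3.3 Exporter
It is worth noting that Exporter is a class of components whose main purpose is to provide metrics for further processing.

Some components provide metrics themselves, for example Grafana, Prometheus, and Etcd. The metrics linked in section 3.1 are produced by Grafana itself.

Some components do not provide metrics, for example programs we write ourselves.

And some are not even components, for example the Linux system itself. For cases like these, an Exporter such as node_exporter collects the information and exposes it as metrics.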
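For a program that exposes no metrics of its own, one possible route is node_exporter's textfile collector, which reads *.prom files from a directory passed with --collector.textfile.directory. The sketch below is only an illustration of that idea: write_textfile_metric and the metric myapp_queue_length are made up, and the demo writes into a throwaway directory rather than a real collector directory.

```shell
# Write one gauge sample into a *.prom file for node_exporter's textfile
# collector (enabled with --collector.textfile.directory=<dir>).
write_textfile_metric() {
  local dir="$1" name="$2" value="$3"
  local tmp
  tmp="$(mktemp "${dir}/.${name}.XXXXXX")" || return 1
  {
    printf '# TYPE %s gauge\n' "$name"
    printf '%s %s\n' "$name" "$value"
  } > "$tmp"
  # mktemp creates the file readable by its owner only; open it up so that
  # node_exporter running as another user can read it.
  chmod 644 "$tmp"
  # Rename into place so the collector never reads a half-written file.
  mv "$tmp" "${dir}/${name}.prom"
}

# Demo against a throwaway directory; myapp_queue_length is a made-up metric.
demo_dir="$(mktemp -d)"
write_textfile_metric "$demo_dir" myapp_queue_length 7
cat "${demo_dir}/myapp_queue_length.prom"
```

A cron job or the program itself could call such a helper periodically, and the value would then show up under localhost:9100/metrics next to the system metrics.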

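3.4 cAdvisor
cAdvisor is an open-source tool developed by Google for analyzing the resource usage and performance characteristics of running containers. It runs as a daemon that collects, aggregates, processes, and exports information about running containers.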

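4. Deployment

Everything in this article is installed as a binary managed by systemd. VPSes built on architectures such as OpenVZ cannot run docker, so this more widely applicable method is used.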

4.1 Download

node_exporter: https://github.com/prometheus/node_exporter/releases
Prometheus: https://github.com/prometheus/prometheus/releases
Grafana (choose the Standalone Linux Binaries version): https://grafana.com/grafana/download

4.2 Extract and install
Create a new empty directory and move the downloaded tar.gz files into it.

Make sure the directory structure looks like this:

dir
├── grafana-x.x.x.linux-amd64.tar.gz
├── node_exporter-x.x.x.linux-amd64.tar.gz
└── prometheus-x.x.x.linux-amd64.tar.gz

Then run the following inside that directory:

curl api.pasteme.cn/8413 | bash

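The commands that this script runs can be reviewed at https://pasteme.cn/8413 .

At this point the installation is complete. The systemd service names of the three components are grafana-server, prometheus, and node_exporter.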
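The install script itself is not reproduced in this article. As a rough idea of what "managed by systemd" means, the sketch below writes a minimal unit file for node_exporter. It is an assumption, not the content of the script: the ExecStart path is guessed from the /usr/local/node_exporter directory that appears in section 6.1.5, and write_node_exporter_unit is a hypothetical helper.

```shell
# Write a minimal systemd unit for node_exporter into the given directory.
# Assumption: the binary lives at /usr/local/node_exporter/node_exporter.
write_node_exporter_unit() {
  local unit_dir="$1"
  cat > "${unit_dir}/node_exporter.service" <<'EOF'
[Unit]
Description=Prometheus node_exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Preview the unit in a scratch directory instead of touching the system.
unit_preview_dir="$(mktemp -d)"
write_node_exporter_unit "$unit_preview_dir"
cat "${unit_preview_dir}/node_exporter.service"
```

On a real host such a file would go into /lib/systemd/system, followed by systemctl daemon-reload, before systemctl start node_exporter can find it.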

4.3 Verification

4.3.1 systemctl status xxx
The systemctl status command shows the running state of each component.

systemctl status node_exporter
systemctl status prometheus
systemctl status grafana-server

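4.3.2 Viewing metrics
The default ports of node_exporter, Prometheus, and Grafana are 9100, 9090, and 3000 respectively. The following commands fetch the metrics of each; any output means the component is running.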

curl localhost:9100/metrics
curl localhost:9090/metrics
curl localhost:3000/metrics
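The three checks can also be wrapped in a small loop that prints one status line per port instead of a full metrics dump. This is a sketch; metrics_endpoint_ok and report_metrics_endpoints are names made up for the example, and the ports are the defaults listed above.

```shell
# Return success if host:port answers /metrics with a successful HTTP status.
metrics_endpoint_ok() {
  # -f makes curl fail on HTTP errors; --max-time bounds a hanging connection.
  curl -sf --max-time 3 -o /dev/null "http://$1:$2/metrics"
}

# Report on the three default ports used in this article.
report_metrics_endpoints() {
  local port
  for port in 9100 9090 3000; do
    if metrics_endpoint_ok localhost "$port"; then
      echo "port ${port}: ok"
    else
      echo "port ${port}: no response"
    fi
  done
}

# Run on the monitoring host:
#   report_metrics_endpoints
```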

4.4 Start on boot
This is standard systemd usage: enable each service.

systemctl enable node_exporter
systemctl enable prometheus
systemctl enable grafana-server

4.5 Uninstall

curl api.pasteme.cn/8414 | bash

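The commands that this script runs can be reviewed at https://pasteme.cn/8414 .

5. Configuration

The three components are now installed, but they are still independent of one another, so some configuration is needed.

5.1 prometheus
Edit /usr/local/prometheus/prometheus.yml

Its contents look like this: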

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090'] ############### this is the line we need to change

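Change the targets line to the following. Mind the indentation: YAML is strict about whitespace, so the line must keep the same indentation it has in the file.

    - targets: ['localhost:9100']

With this change Prometheus reads metrics from localhost:9100/metrics, that is, from node_exporter. The default target, localhost:9090, serves the metrics of Prometheus itself.

Save the file and restart the prometheus service.

systemctl restart prometheus

Use the method from section 4.3 to verify that it started successfully. If it did not, check the formatting of the yml file.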
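Because a formatting mistake only shows up after the restart, it can help to validate the file first. The sketch below uses promtool, which ships in the Prometheus release tarball, and its check config subcommand. The location /usr/local/prometheus/promtool is an assumption about the install layout, and restart_prometheus_if_valid is a hypothetical helper.

```shell
# Directory that holds promtool and prometheus.yml.
# Assumption: both sit next to each other in /usr/local/prometheus.
PROMETHEUS_DIR="${PROMETHEUS_DIR:-/usr/local/prometheus}"

# Restart Prometheus only if promtool accepts the configuration file.
restart_prometheus_if_valid() {
  if "${PROMETHEUS_DIR}/promtool" check config "${PROMETHEUS_DIR}/prometheus.yml"; then
    systemctl restart prometheus
  else
    echo "prometheus.yml failed validation; not restarting" >&2
    return 1
  fi
}

# Run as root on the monitoring host:
#   restart_prometheus_if_valid
```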

5.2 Grafana

First install a pie chart panel plugin:

cd /usr/local/grafana/bin
chmod +x grafana-cli
./grafana-cli plugins install grafana-piechart-panel
systemctl restart grafana-server

Note: grafana-cli installs the plugin into /var/lib/grafana/plugins by default. It has to be copied into the Grafana data directory before it can be used; restart grafana-server again after copying.

[root@iZuf6ioqjurm6w0x1o7exjZ bin]# ll /var/lib/grafana/plugins/
total 4
drwxr-xr-x 6 root root 4096 Oct 30 17:32 grafana-piechart-panel
[root@iZuf6ioqjurm6w0x1o7exjZ plugins]# cp -r /var/lib/grafana/plugins/grafana-piechart-panel /usr/local/grafana/data/plugins

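Then open http://<YOUR_IP>:3000 . The default username and password are both admin.

Click Add data source.

Select Prometheus.

In HTTP URL, enter http://localhost:9090 , which is the interface that prometheus provides.

Then click Save & Test.

Next, hover the mouse over the + in the upper-left corner (hover, do not click), and click Import in the menu that pops up.

Here we can import Dashboards that others have already written for various Exporters; they can be searched at https://grafana.com/dashboards . This article uses a Dashboard written for node_exporter by a Chinese developer, whose page is https://grafana.com/dashboards/8919 .

Enter 8919 in the Grafana.com Dashboard field, then click on the blank area next to it.

After the click, the corresponding Dashboard is loaded automatically and asks for a data source. Under Options, in the prometheus_111 field, select the Prometheus data source added a moment ago, then click Import.

5.3 Configuration complete
Grafana, Prometheus, and node_exporter are now connected to one another.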

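6. Monitoring multiple nodes

Completing the deployment and configuration sections above (4 and 5) only sets up monitoring of the local machine. To monitor other nodes, the corresponding Exporter has to be installed on each monitored node. The following uses node_exporter, introduced earlier, as an example of how to add a node.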

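6.1 Deployment

6.1.1 Download the Exporter

node_exporter: https://github.com/prometheus/node_exporter/releases

6.1.2 Extract and install
Create a new empty directory and move the downloaded tar.gz file into it.

Make sure the directory structure looks like this: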

dir
└── node_exporter-x.x.x.linux-amd64.tar.gz

Then run the following inside that directory:

curl api.pasteme.cn/8416 | bash

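The commands that this script runs can be reviewed at https://pasteme.cn/8416 .

node_exporter is now installed. The corresponding systemd service name is node_exporter.

6.1.3 Verification
See section 4.3; the steps are the same.

6.1.4 Start on boot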

systemctl enable node_exporter

6.1.5 Uninstall

systemctl disable node_exporter
systemctl stop node_exporter
rm -f /lib/systemd/system/node_exporter.service
rm -rf /usr/local/node_exporter

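6.2 Configuring Prometheus
On the monitoring node, edit the Prometheus configuration file /usr/local/prometheus/prometheus.yml.

Change the targets line to the following. Mind the indentation: YAML is strict about whitespace, so the line must keep the same indentation it has in the file.

    - targets: ['localhost:9100', 'addr:9100']

Here addr is the IP address or domain name of the monitored node.

Then restart Prometheus, and the new node appears in the Grafana Dashboard.

systemctl restart prometheus

6.2.1 About targets
Note that targets takes an array. Prometheus collects the metrics of every element in the array, and Grafana then processes that data.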
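With more than a handful of nodes, typing the array by hand gets error-prone. As a small sketch, the hypothetical helper targets_line below builds the line from a list of hosts, assuming every host runs node_exporter on the default port 9100. The output still has to be pasted into prometheus.yml at the correct indentation.

```shell
# Build a static_configs "targets" line from a list of hosts.
# Assumption: every host runs node_exporter on the default port 9100.
targets_line() {
  local out="" host
  for host in "$@"; do
    # Append a comma separator only when out already holds an entry.
    out="${out:+$out, }'${host}:9100'"
  done
  printf -- "- targets: [%s]\n" "$out"
}

targets_line localhost addr
# prints: - targets: ['localhost:9100', 'addr:9100']
```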

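7. Monitoring docker

7.1 cAdvisor setup

On the server that runs the docker containers, install cAdvisor so that it can collect docker monitoring data for that machine.

docker pull google/cadvisor:v0.30.0
# run cAdvisor in docker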
docker run \
  --privileged=true \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=18081:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:v0.30.0

7.2 A small problem you may run into

The cadvisor container may stop shortly after it starts. Check its logs:

docker logs cadvisor
# log output:
I0119 06:48:46.106313 1 manager.go:231] Version: {KernelVersion:3.10.0-514.2.2.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:17.05.0-ce DockerAPIVersion:1.29 CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
I0119 06:48:46.188502 1 factory.go:356] Registering Docker factory
I0119 06:48:48.189502 1 factory.go:54] Registering systemd factory
I0119 06:48:48.190978 1 factory.go:86] Registering Raw factory
I0119 06:48:48.192401 1 manager.go:1178] Started watching for new ooms in manager
W0119 06:48:48.192473 1 manager.go:313] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
I0119 06:48:48.200747 1 manager.go:329] Starting recovery of all containers
I0119 06:48:48.410494 1 manager.go:334] Recovery completed
F0119 06:48:48.461768 1 cadvisor.go:156] Failed to start container manager: inotify_add_watch /sys/fs/cgroup/cpuacct,cpu: no such file or directory

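Searching for the cause of the last line turns up other reports of the same error. The name is reversed: cAdvisor looks for /sys/fs/cgroup/cpuacct,cpu, while the directory on the host is named cpu,cpuacct.

Once the workaround from those reports is applied, the problem is solved and cAdvisor is ready to be accessed.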
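The workaround commonly reported for this error is to make the expected name available as a symlink to the existing directory. The sketch below is that assumption written out, not necessarily the exact commands the original post used. link_cpuacct_cpu is a hypothetical helper, and the cgroup root is a parameter so that it can be tried on a scratch directory first.

```shell
# Create the cpuacct,cpu name that cAdvisor looks for as a symlink to the
# existing cpu,cpuacct directory. Does nothing if the link target is missing
# or the name already exists.
link_cpuacct_cpu() {
  local cgroup_root="${1:-/sys/fs/cgroup}"
  if [ -d "${cgroup_root}/cpu,cpuacct" ] && [ ! -e "${cgroup_root}/cpuacct,cpu" ]; then
    ln -s "${cgroup_root}/cpu,cpuacct" "${cgroup_root}/cpuacct,cpu"
  fi
}

# On the real host (as root) /sys/fs/cgroup is usually mounted read-only,
# so it would have to be remounted writable before creating the link:
#   mount -o remount,rw /sys/fs/cgroup
#   link_cpuacct_cpu
#   docker start cadvisor
```

The link lives on a tmpfs, so it would presumably have to be recreated after a reboot.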

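7.3 Access

Open the cAdvisor web interface in a browser at http://IP:18081 , the host port published in 7.1. You may need a firewall rule for this. The web interface refreshes its data in real time but does not store it.

The metrics, in Prometheus text format, are available at http://IP:18081/metrics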
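To show what can be read out of that endpoint, the sketch below extracts per-container memory usage from the metric container_memory_usage_bytes. The sample lines are hand-written with made-up values, including the trailing timestamp that cAdvisor appends to its samples; container_memory_mib and sample_cadvisor_metrics are hypothetical helpers, and the parsing assumes that label values contain no "} " sequence.

```shell
# Print "<container name> <memory usage in MiB>" for every named container
# found in cAdvisor text-format metrics on stdin.
container_memory_mib() {
  awk '
    /^container_memory_usage_bytes[{]/ {
      # Skip entries whose name label is empty (raw cgroups, not containers).
      if (!match($0, /[{,]name="[^"]+"/)) next
      name = substr($0, RSTART + 7, RLENGTH - 8)
      # The value is the first token after the closing brace of the labels;
      # a timestamp may follow it.
      rest = substr($0, index($0, "} ") + 2)
      split(rest, parts, " ")
      printf "%s %.1f\n", name, parts[1] / 1048576
    }'
}

# Hand-written sample (values are made up).
sample_cadvisor_metrics() {
  cat <<'EOF'
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/docker/abc",image="google/cadvisor:v0.30.0",name="cadvisor"} 52428800 1547880528000
container_memory_usage_bytes{id="/",image="",name=""} 1073741824 1547880528000
EOF
}

sample_cadvisor_metrics | container_memory_mib
# prints: cadvisor 50.0
```

On the real host the input could come from curl -s http://IP:18081/metrics instead of the sample.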

7.4 Configuring prometheus

Edit prometheus.yml and add the docker job, so that scrape_configs looks like this:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9100','172.19.14.253:9100']

  - job_name: 'docker'
    static_configs:
    - targets: ['172.19.14.253:18081']

Restart the prometheus service.

systemctl restart prometheus

7.5 Configuring grafana

Import a dashboard template and configure it: https://grafana.com/grafana/dashboards/193

Finally: nginx configuration for proxying grafana

upstream grafana_server{
    server  172.19.14.254:3000;
}

server {
    listen   80;
    server_name grafana.ssssss.cn;
    access_log /etc/nginx/log/grafana.access.log;
    error_log /etc/nginx/log/grafana.error.log;

    proxy_set_header X-Forwarded-For $remote_addr;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        add_header Cache-Control  "no-cache";
    
        proxy_pass http://grafana_server;
        limit_rate 256m;
        client_max_body_size 0;
    }
}

Appendix: other configuration files

1. prometheus

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9100','172.19.14.253:9100']

  - job_name: 'docker'
    static_configs:
    - targets: ['172.19.14.253:18081']

2. grafana configuration

#################################### Server ##############################
[server]
# Protocol (http, https, h2, socket)
protocol = http
# The ip address to bind to, empty will bind to all interfaces
http_addr =
# The http port to use
http_port = 3000
# The public facing domain name used to access grafana from a browser
domain = grafana.sssssss.cn
# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
enforce_domain = false
# The full public facing url
root_url = %(protocol)s://%(domain)s:%(http_port)s/
#root_url = grafana.sanshaoxingqiu.cn
# Serve Grafana from subpath specified in `root_url` setting. By default it is set to `false` for compatibility reasons.
serve_from_sub_path = false
# Log web requests
router_logging = true
# the path relative working path
static_root_path = public
# enable gzip
enable_gzip = false
# https certs & key file
cert_file =
cert_key =
# Unix socket path
socket = /tmp/grafana.sock

References: https://www.cnblogs.com/Dev0ps/p/10546276.html
https://blog.lucien.ink/archives/449/
https://www.linuxea.com/1922.html
https://blog.csdn.net/BJUT_bluecat/article/details/84072966
