The Beauty of Monitoring--Dynamic Management of Prometheus Configuration Files

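Prometheus is an open-source monitoring and alerting solution developed at SoundCloud. Work on it began in 2012, and since it was open-sourced in 2015 the project has had a very active community of users and developers. On GitHub (jokingly called the world's largest dating site for men) it already has more than 11,000 stars. In 2016 Prometheus became the second project to join the CNCF (Cloud Native Computing Foundation), after k8s.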

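The Google SRE book also names Prometheus as the open-source system most similar to Google's own Borgmon monitoring system. As a new-generation open-source solution, many of its ideas line up closely with Google's approach to SRE. Today it is most often used together with the Kubernetes container management system, but don't mistake it for a container-only monitoring tool: once you get to know it better, you'll find it can do a great deal more.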

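I'd like to say a bit more here. For a long time I couldn't decide between Prometheus and Open-Falcon. Both are excellent new-generation monitoring solutions. Open-Falcon was open-sourced by Xiaomi and is used by Xiaomi, Kingsoft Cloud, Meituan, JD Finance, Ganji and others. The biggest difference is that Prometheus pulls data while Open-Falcon pushes it; I'll leave the pros and cons of the two models aside for now. Briefly, there are about five reasons I like Prometheus:
1. It works out of the box and is very easy to deploy and operate.
2. The Prometheus community is very active.
3. It has built-in service discovery.
4. Its simple text format makes secondary development very easy.
5. Most important of all, I really like its alerting component, which supports grouping, inhibition and silencing.
None of this is meant to disparage Open-Falcon; as the old saying goes, the best tool is the one that fits you.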

Automatically Refreshing Configuration Files with consul-template

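Because Prometheus actively "pulls" metrics, the server has to be given the list of monitored nodes. As the number of nodes grows, editing the configuration file every time a node is added becomes tedious, so I use consul-template + Consul to generate the configuration files dynamically. The same approach works for any other service whose configuration changes frequently. The other common solution is etcd + confd; today's mainstream dynamic configuration systems fall roughly into these two camps. consul-template plays much the same role as confd, but it is the templating tool published by the makers of Consul themselves.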

Implementation

First, here is a sample Prometheus configuration:

- job_name: 'node-exporter'
  static_configs:
    - targets: ['172.30.100.10:9100']
      labels:
        hostname: 'web1'
    - targets: ['172.30.100.11:9100']
      labels:
        hostname: 'web2'
    - targets: ['172.30.100.12:9100']
      labels:
        hostname: 'web3'

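Each time a node is added, you only need to add a new targets entry; "hostname" is a custom label I use to tell hosts apart. The problem is that once there are hundreds or thousands of targets, the configuration file becomes extremely bloated. Experienced ops engineers will think of includes: pull other files into the main one, so that one large, redundant main configuration file is split into many small ones. Prometheus's answer to this is file-based service discovery, "file_sd_config". I created a conf.d directory under the Prometheus directory containing two sub-configuration files, one for Linux machines and one for Windows machines: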

[Figure: file_sd_config reference example]

Sub-configuration files can be in YAML or JSON format. I use JSON; here is an example:

cat conf.d/lnode-discovery.json
[
  {
    "targets": ["172.30.100.2:9100"],
    "labels": {
      "hostname": "consul02"
    }
  },
  {
    "targets": ["172.30.100.1:9100"],
    "labels": {
      "hostname": "consul01"
    }
  }
]
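If you ever need to produce such a file from a script rather than by hand, it is safer to build it as data and let a JSON serializer handle the commas and quoting. The Python sketch below is one way to do that; the function names and the hostname-to-address mapping are hypothetical. It writes to a temporary file in the same directory and then renames it with `os.replace`, so anything watching the directory sees either the old file or the complete new one, never a half-written one (the rename is atomic on POSIX when both paths are on the same filesystem).

```python
import json
import os
import tempfile


def build_target_groups(hosts):
    """Turn {"hostname": "ip:port"} into a file_sd-style list of target groups."""
    return [
        {"targets": [address], "labels": {"hostname": name}}
        for name, address in sorted(hosts.items())
    ]


def write_atomically(path, groups):
    """Write JSON to a temp file next to `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the half-written file out of a "*.json" glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(groups, f, indent=2)
            f.write("\n")
        os.replace(tmp_path, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp_path)
        raise


hosts = {"consul02": "172.30.100.2:9100", "consul01": "172.30.100.1:9100"}
groups = build_target_groups(hosts)
print(groups[0])  # {'targets': ['172.30.100.1:9100'], 'labels': {'hostname': 'consul01'}}
```

Sorting by hostname keeps the output stable between runs, so regenerating the file with unchanged data produces identical content.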

Combining Service Discovery for Dynamic File Updates

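With sub-configuration files in place, adding a monitored node only means changing the content of a sub-configuration file. We can predefine a template for that file and have consul-template render it, which gives us dynamic file updates. The steps are as follows: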

1. Download consul-template

Find the build for your operating system at https://releases.hashicorp.com/consul-template/, then download and unzip it:

# cd /data/consul_template    # installation directory
# wget -c https://releases.hashicorp.com/consul-template/0.19.3/consul-template_0.19.3_linux_amd64.zip
# unzip consul-template_0.19.3_linux_amd64.zip
# mkdir templates    # directory for consul-template template files

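consul-template inherits Consul's minimalist style: the archive contains just a single binary. We also create a directory to hold template files, which makes later use easier.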
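Because the version number appears both in the download path and in the archive name, it is easy for the two to drift apart when you upgrade. A small Python sketch that derives both from a single version string is shown below. The naming pattern `consul-template_<version>_<os>_<arch>.zip` is inferred from the one download URL above, so treat it as an assumption rather than a documented contract.

```python
BASE_URL = "https://releases.hashicorp.com/consul-template"


def release_archive(version, os_name="linux", arch="amd64"):
    """Return (archive_name, download_url) for one consul-template release.

    Assumes the pattern consul-template_<version>_<os>_<arch>.zip, as seen
    in the 0.19.3 linux/amd64 URL; other platforms are not verified here.
    """
    name = f"consul-template_{version}_{os_name}_{arch}.zip"
    return name, f"{BASE_URL}/{version}/{name}"


name, url = release_archive("0.19.3")
print(name)  # consul-template_0.19.3_linux_amd64.zip
print(url)
```

The printed name and URL can then be pasted into the `wget` and `unzip` commands, guaranteeing they refer to the same release.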

2. Create the consul-template configuration file

The configuration file is written in HCL (HashiCorp Configuration Language). Mine looks like this:

# cat consul-template.conf
log_level = "warn"
syslog {
  # This enables syslog logging.
  enabled = true
  # This is the name of the syslog facility to log to.
  facility = "LOCAL5"
}
consul {
  # auth {
  #   enabled  = true
  #   username = "test"
  #   password = "test"
  # }
  address = "172.30.100.45:8500"
  # token = "abcd1234"
  retry {
    enabled = true
    attempts = 12
    backoff = "250ms"
    # If max_backoff is set to 10s and backoff is set to 1s, sleep times
    # would be: 1s, 2s, 4s, 8s, 10s, 10s, ...
    max_backoff = "3m"
  }
}
# This block defines the configuration for a template. Unlike other blocks,
# this block may be specified multiple times to configure multiple templates.
template {
  # This is the source file on disk to use as the input template. This is often
  # called the "Consul Template template". This option is required if not using
  # the `contents` option.
  # source = "/path/on/disk/to/template.ctmpl"
  source = "/data/consul_template/templates/lnode-discovery.ctmpl"
  # This is the destination path on disk where the source template will render.
  # If the parent directories do not exist, Consul Template will attempt to
  # create them.
  # destination = "/path/on/disk/where/template/will/render.txt"
  destination = "/data/prometheus/prometheus-1.7.1.linux-amd64/conf.d/lnode-discovery.json"
  # This is the optional command to run when the template is rendered. The
  # command will only run if the resulting template changes. The command must
  # return within 30s (configurable), and it must have a successful exit code.
  # Consul Template is not a replacement for a process monitor or init system.
  command = ""
  # This is the maximum amount of time to wait for the optional command to
  # return. Default is 30s.
  command_timeout = "60s"
  # This option backs up the previously rendered template at the destination
  # path before writing a new one. It keeps exactly one backup. This option is
  # useful for preventing accidental changes to the data without having a
  # rollback strategy.
  backup = true
  # Delimiters used inside the template file (the default is "{{" and "}}").
  left_delimiter = "{$"
  right_delimiter = "$}"
  # This is the `minimum(:maximum)` to wait before rendering a new template to
  # disk and triggering a command, separated by a colon (`:`). If the optional
  # maximum value is omitted, it is assumed to be 4x the required minimum value.
  # This is a numeric time with a unit suffix ("5s"). There is no default value.
  # The wait value for a template takes precedence over any globally-configured
  # wait.
  wait {
    min = "2s"
    max = "20s"
  }
}
template {
  source = "/data/consul_template/templates/wnode-discovery.ctmpl"
  destination = "/data/prometheus/prometheus-1.7.1.linux-amd64/conf.d/wnode-discovery.json"
  command = ""
  backup = true
  command_timeout = "60s"
  left_delimiter = "{$"
  right_delimiter = "$}"
  wait {
    min = "2s"
    max = "20s"
  }
}

Main configuration parameters:

syslog: enables syslog, so the service's logs are written to syslog.

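consul: sets the address of the Consul service used for discovery. Mine needs no authentication, so the auth block is commented out; for setting up Consul itself, see my earlier article. The backoff and max_backoff options deserve a mention. When consul-template fails to get data from Consul it retries, and backoff sets the initial wait, which doubles on each retry. For example, with backoff set to 250ms and 5 retries, the waits are 250ms, 500ms, 1s, 2s, 4s, and they keep growing until they reach the max_backoff ceiling. If max_backoff is 2s, the fifth retry still waits only 2s, giving 250ms, 500ms, 1s, 2s, 2s.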
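The doubling-with-a-cap rule described above is easy to check with a few lines of Python. This is a sketch of the rule as stated in the text and in the config comment, not consul-template's own source; times are in seconds.

```python
def backoff_schedule(backoff, attempts, max_backoff=None):
    """Sleep before each retry: backoff * 2**n, capped at max_backoff."""
    sleeps = []
    for n in range(attempts):
        sleep = backoff * (2 ** n)
        if max_backoff is not None and sleep > max_backoff:
            sleep = max_backoff
        sleeps.append(sleep)
    return sleeps


print(backoff_schedule(0.25, 5))       # [0.25, 0.5, 1.0, 2.0, 4.0]
print(backoff_schedule(0.25, 5, 2.0))  # [0.25, 0.5, 1.0, 2.0, 2.0]
print(backoff_schedule(1, 6, 10))      # [1, 2, 4, 8, 10, 10]
```

The last call reproduces the example from the config comment (backoff 1s, max_backoff 10s). With the settings actually used above (250ms, 12 attempts, max 3m), the cap is never reached: the twelfth wait is 0.25 × 2¹¹ = 512s? No, it is reached, because 512s exceeds 180s, so the later retries settle at 3m.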

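template: defines a template. The main options are source, destination and command. With backup = true, the previously rendered file is backed up with a .bak suffix.

source: the consul-template template file, i.e. the source that gets rendered.
destination: where the rendered template is written. Here it is a file under /data/prometheus/prometheus-1.7.1.linux-amd64/conf.d/, the directory holding my Prometheus file-based discovery sub-configuration files.
command: the command to run after the file has been rendered. Prometheus picks up file changes on its own, so I don't need a command and leave it empty. Services such as nginx or haproxy usually need a reload or restart after their configuration changes; for those you can set something like "nginx -s reload".
command_timeout: the timeout for the command above.
left_delimiter and right_delimiter: the delimiters used in template files. The default is "{{ }}", and you can change it when that clashes with something else. In my case the template files are pushed with Ansible, and "{{ }}" conflicts with Jinja2 syntax, so I switched to "{$ $}".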

When several templates need to be rendered, define one template block for each.
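The `wait` block at the end of each template (explained in the comments of the configuration above) acts as a quiescence timer: a render waits until the data has stopped changing for `min`, but never longer than `max` after the first pending change. The Python sketch below is a simplified offline model of that idea over a list of change timestamps in seconds; the function name is hypothetical and it does not reproduce consul-template's timers exactly.

```python
def render_times(changes, min_wait, max_wait):
    """Return the times at which a template would be rendered.

    `changes` are sorted timestamps (seconds) at which watched data changed.
    A render fires once no change has arrived for `min_wait` seconds, or
    `max_wait` seconds after the first change of the batch, whichever is sooner.
    """
    renders = []
    i = 0
    while i < len(changes):
        start = changes[i]
        hard_deadline = start + max_wait
        fire_at = min(start + min_wait, hard_deadline)
        i += 1
        # Every change that arrives before the render fires pushes it back,
        # but never past the hard deadline.
        while i < len(changes) and changes[i] < fire_at:
            fire_at = min(changes[i] + min_wait, hard_deadline)
            i += 1
        renders.append(fire_at)
    return renders


# With wait { min = "2s"  max = "20s" } as in the configuration above:
print(render_times([0, 1, 2.5], 2, 20))      # [4.5]
print(render_times(list(range(31)), 2, 20))  # [20, 32]
```

The second call shows why `max` matters: a key that changes every second would otherwise postpone the render indefinitely, but here it is forced out at 20s.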

3. Start the service

Start consul-template, pointing it at the configuration file:

#./consul-template -config ./consul-template.conf

4. Render the template

Write the consul-template template in the format of the target file. For example, my template for Prometheus file-based service discovery is:

cat templates/lnode-discovery.ctmpl
[
{$ range tree "prometheus/linux" $}
  {
    "targets": ["{$ .Value $}"],
    "labels": {
      "hostname": "{$ .Key $}"
    }
  },
{$ end $}
  {
    "targets": ["172.30.100.1:9100"],
    "labels": {
      "hostname": "consul01"
    }
  }
]

The template loops over the key/value pairs stored under prometheus/linux/ in Consul's K/V store: "targets" takes each pair's value, and "hostname" takes its key.

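An example of the Consul K/V store is shown below. Each entry corresponds to one "hostname: target" pair in the Prometheus configuration, with the hostname as the key and the target address as the value: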

[Figure: Consul K/V example]
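To see what the template is doing without running consul-template, the Python sketch below mimics it on data in the shape returned by Consul's HTTP API for `GET /v1/kv/prometheus/linux/?recurse`: a JSON array of objects with a `Key` and a base64-encoded `Value` (null for "folder" keys). The helper name is hypothetical, and stripping the prefix is an assumption meant to match how `tree` reports keys relative to the path it was given.

```python
import base64
import json

PREFIX = "prometheus/linux/"


def groups_from_consul_kv(entries, prefix=PREFIX):
    """One target group per key under `prefix`: key -> hostname, value -> target."""
    groups = []
    for entry in entries:
        key, raw = entry["Key"], entry.get("Value")
        if raw is None or key.endswith("/"):
            continue  # skip folder keys, which carry no value
        name = key[len(prefix):] if key.startswith(prefix) else key
        target = base64.b64decode(raw).decode("utf-8")
        groups.append({"targets": [target], "labels": {"hostname": name}})
    return groups


sample = [
    {"Key": "prometheus/linux/", "Value": None},
    {"Key": "prometheus/linux/consul02",
     "Value": base64.b64encode(b"172.30.100.2:9100").decode("ascii")},
]
print(json.dumps(groups_from_consul_kv(sample)))
# [{"targets": ["172.30.100.2:9100"], "labels": {"hostname": "consul02"}}]
```

Against a live agent, the same `entries` list could be obtained with `json.load(urllib.request.urlopen("http://172.30.100.45:8500/v1/kv/prometheus/linux/?recurse"))`, using the Consul address from the configuration above.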

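A small trick here: in the JSON sub-configuration file, the target groups are separated by commas, and the last one must not be followed by a comma, because JSON does not allow trailing commas. So I write one fixed target group by hand at the end of the template, after the loop, and never have to handle that special case. The trade-off is that this static entry always appears in the output.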
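A quick way to confirm that a rendered file is valid is simply to parse it. The Python sketch below (the helper name is hypothetical) shows that Python's `json` module rejects the dangling comma that the loop alone would leave behind, while the shape produced by the template, with its static final entry, parses fine.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# What the range loop alone would produce: every entry is followed by a comma.
loop_only = '[{"targets": ["172.30.100.2:9100"]},]'
# With a static last entry, the final comma is followed by one more element.
with_static_tail = (
    '[{"targets": ["172.30.100.2:9100"]}, '
    '{"targets": ["172.30.100.1:9100"]}]'
)

print(is_valid_json(loop_only))         # False
print(is_valid_json(with_static_tail))  # True
```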

5. Add data online to update the configuration file dynamically

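Now open the Consul web UI (port 8500 by default) and, under prometheus/linux/ in KEY/VALUE, add consul02, consul03, and so on. The resulting configuration file looks like this:
[Figure: the generated configuration file]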

At this point, file-based service discovery for Prometheus is basically up and running.
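Before relying on the setup, it can be worth checking that every file in conf.d/ still parses and has the expected shape. The Python sketch below is one way to do that, and it is a standalone check rather than something Prometheus or consul-template provides: it globs `*.json` in a directory you pass in, parses each file, and flattens the target groups into (target, labels) pairs, raising `ValueError` (which `json.JSONDecodeError` subclasses) when a file is malformed. The function name is hypothetical.

```python
import glob
import json
import os


def load_target_groups(directory):
    """Parse every *.json file in `directory` as a file_sd-style target list.

    Returns (target, labels) pairs in file-name order; raises ValueError
    (including json.JSONDecodeError) when a file is malformed.
    """
    pairs = []
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        with open(path, encoding="utf-8") as f:
            groups = json.load(f)
        if not isinstance(groups, list):
            raise ValueError(f"{path}: top level must be a JSON array")
        for group in groups:
            if not isinstance(group, dict) or not isinstance(group.get("targets"), list):
                raise ValueError(f"{path}: every entry needs a 'targets' list")
            labels = dict(group.get("labels", {}))  # labels are optional
            for target in group["targets"]:
                pairs.append((target, labels))
    return pairs
```

For example, `load_target_groups("/data/prometheus/prometheus-1.7.1.linux-amd64/conf.d")` would list every target consul-template has rendered so far, or fail loudly on the first broken file.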
