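Monitoring Consul with a Custom Grafana Dashboard

When Consul is monitored with Prometheus and Grafana, the ready-made dashboards mostly show Consul's own state. Business-related metrics are needed as well, such as the number of currently registered services, the number of healthy and unhealthy services, and the response time of service lookup requests.

Using an existing Dashboard

Take the consul server dashboard as an example. It is very comprehensive, but after the scrape job is added to Prometheus, many of its panels show no data. One example is consul_serf_lan_members, the number of servers in the cluster. It is also missing when the metrics are pulled directly from Consul at http://localhost:8500/v1/agent/metrics?format=prometheus, because Consul itself does not report this metric.

To solve this, use the consul_exporter project. It pulls the relevant data through Consul's API, processes it, and exposes the resulting metrics on its own endpoint.

Starting it with Docker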

docker run --name exporter -d -p 9107:9107 prom/consul-exporter --consul.server=host.docker.internal:8500

Checking the data

curl localhost:9107/metrics

This returns the monitoring data, so metrics that Consul does not provide itself can now be added to Prometheus.
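To confirm that a specific metric is present without reading the whole response, the payload can be scanned for metric names. The following is a minimal sketch, not part of consul_exporter: metricNames is a hypothetical helper, and the sample payload (including the value of consul_serf_lan_members) is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// metricNames returns the set of metric names found in a Prometheus
// text-format payload. Comment lines (# HELP, # TYPE) and blank lines
// are skipped.
func metricNames(payload string) map[string]bool {
	names := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(payload))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The name ends at the first '{' (labels) or space (value).
		end := strings.IndexAny(line, "{ ")
		if end < 0 {
			end = len(line)
		}
		names[line[:end]] = true
	}
	return names
}

func main() {
	// Hypothetical sample of what `curl localhost:9107/metrics` might return.
	sample := `# HELP consul_response_time Time spend for a request
# TYPE consul_response_time gauge
consul_response_time{node="leader",server_ip="172.19.0.2:8300"} 2.238e+06
consul_serf_lan_members 1
`
	names := metricNames(sample)
	for _, want := range []string{"consul_serf_lan_members", "consul_up"} {
		fmt.Printf("%s present: %v\n", want, names[want])
	}
}
```

In practice the payload would come from an HTTP GET against the exporter's /metrics endpoint instead of a string literal.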

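Custom metrics

If the data is still not sufficient, consul_exporter can be extended with custom metrics. For example, to track the cluster's response time, measure how long a request to Consul takes:

Adding a custom metric

Add a new metric descriptor next to the existing metric declarations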

    responseTime = prometheus.NewDesc(
        prometheus.BuildFQName(namespace, "", "response_time"),
        "Time spend for a request ",
        []string{"node", "server_ip"}, nil,
    )

Implement the collection method

func (e *Exporter) collectResponseTime(ch chan<- prometheus.Metric) bool {
    // Record the start time. time.Now().Nanosecond() would only return the
    // nanosecond offset within the current second, not a usable timestamp.
    start := time.Now()
    serverIp, err := e.client.Status().Leader()
    if err != nil {
        _ = level.Error(e.logger).Log("msg", "Failed to query leader data", "err", err)
        return false
    }
    // Elapsed time of the leader query, in nanoseconds.
    costTime := time.Since(start).Nanoseconds()

    ch <- prometheus.MustNewConstMetric(responseTime, prometheus.GaugeValue, float64(costTime), "leader", serverIp)

    return true
}
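The timing step can be tried in isolation. time.Now().Nanosecond() returns only the nanosecond offset within the current second (0 to 999999999), so subtracting two such values gives a wrong or even negative result whenever a second boundary is crossed; time.Since on a time.Time value avoids this. The sketch below is self-contained and uses hypothetical helpers (timeCall, slowCall), with slowCall standing in for the real e.client.Status().Leader() call.

```go
package main

import (
	"fmt"
	"time"
)

// timeCall runs fn and returns how long it took, together with fn's error.
// time.Since uses the monotonic clock reading stored in start, so the result
// is not affected by wall-clock adjustments.
func timeCall(fn func() error) (time.Duration, error) {
	start := time.Now()
	err := fn()
	return time.Since(start), err
}

// slowCall returns a hypothetical stand-in for the Consul leader query:
// a function that simply sleeps for the given number of milliseconds.
func slowCall(ms int) func() error {
	return func() error {
		time.Sleep(time.Duration(ms) * time.Millisecond)
		return nil
	}
}

func main() {
	elapsed, err := timeCall(slowCall(20))
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	// Nanoseconds() is the unit the consul_response_time gauge is meant to carry.
	fmt.Printf("elapsed: %d ns (%.3f ms)\n",
		elapsed.Nanoseconds(), float64(elapsed.Nanoseconds())/1e6)
}
```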

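Add the metric to Collect and Describe

func (e *Exporter) Describe(ch chan<- *prometheus.Desc) {
    // ... existing descriptors stay unchanged ...
    ch <- responseTime
}

func (e *Exporter) Collect(ch chan<- prometheus.Metric) {
    // ... existing collector calls stay unchanged; ok is assumed to be
    // declared by them ...
    ok = e.collectResponseTime(ch) && ok
}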

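With this in place, the exporter collects the new metric once it starts, and the data becomes visible in Prometheus and Grafana.

Custom Dashboard

A custom Dashboard works by displaying the results of PromQL queries.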

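Take the custom metric consul_response_time as an example. It is a gauge holding the time taken by the most recent request to Consul.

Raw data:

# HELP consul_response_time Time spend for a request
# TYPE consul_response_time gauge
consul_response_time{node="leader",server_ip="172.19.0.2:8300"} 2.238e+06

To chart the response time, first run the query in the Prometheus query panel:

consul_response_time

This returns the response-time data. All that remains is to display it in Grafana.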

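Adding a panel

Create a Dashboard, add a Panel to it, and enter the query above in the Panel's Metrics field.

After the query runs, a graph is generated. The name of each series is set through the Legend field. By default it is the full label set, here {instance="host.docker.internal:9107", job="consul-exporter", node="leader", server_ip="172.19.0.2:8300"}. To show only the IP, that is, the value of server_ip, set Legend to {{server_ip}} and the series is named accordingly.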
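The effect of a Legend format can be illustrated with a small substitution routine. This is only a sketch of the idea, not Grafana's implementation: renderLegend is a hypothetical helper, and leaving unknown placeholders untouched is an assumption made here for simplicity.

```go
package main

import (
	"fmt"
	"regexp"
)

// legendVar matches placeholders of the form {{label_name}}.
var legendVar = regexp.MustCompile(`\{\{\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\}\}`)

// renderLegend replaces every {{label}} placeholder in format with the value
// of that label. Placeholders without a matching label are left as they are.
func renderLegend(format string, labels map[string]string) string {
	return legendVar.ReplaceAllStringFunc(format, func(m string) string {
		name := legendVar.FindStringSubmatch(m)[1]
		if value, ok := labels[name]; ok {
			return value
		}
		return m
	})
}

func main() {
	// Label set taken from the series shown in the text.
	labels := map[string]string{
		"instance":  "host.docker.internal:9107",
		"job":       "consul-exporter",
		"node":      "leader",
		"server_ip": "172.19.0.2:8300",
	}
	fmt.Println(renderLegend("{{server_ip}}", labels))
	fmt.Println(renderLegend("{{node}} @ {{server_ip}}", labels))
}
```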

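Other options, such as the display unit, the visual style, and the panel title, can be configured in the settings next to the query.

Monitoring service information

Service status can be monitored with the data from Consul and consul_exporter. It only takes the right aggregation over each metric.

Node information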

sum(consul_health_node_status)

Healthy nodes

sum(consul_health_node_status{status="passing"})

Unhealthy nodes

sum(consul_health_node_status{status!="passing"})

Number of services

count(sum(consul_health_service_status) by (service_name))

Number of instances

sum(consul_health_service_status)

Number of healthy instances

sum(consul_health_service_status{status="passing"})

Number of unhealthy instances

sum(consul_health_service_status{status!="passing"})
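What these aggregations compute can be reproduced on a handful of samples. The sketch below assumes that the exporter emits one series per check and status, with value 1 for the current status and 0 otherwise; the sample type, the helper functions, and the data are all hypothetical and only mirror a subset of the real labels.

```go
package main

import "fmt"

// sample mirrors one consul_health_service_status series (subset of labels).
type sample struct {
	serviceName string
	serviceID   string
	status      string
	value       float64
}

// sumWhere emulates sum(metric{...}): it adds up the values of all samples
// accepted by keep, which plays the role of the label matcher.
func sumWhere(samples []sample, keep func(sample) bool) float64 {
	total := 0.0
	for _, s := range samples {
		if keep(s) {
			total += s.value
		}
	}
	return total
}

// countServices emulates count(sum(metric) by (service_name)): the number of
// distinct service_name values.
func countServices(samples []sample) int {
	seen := make(map[string]struct{})
	for _, s := range samples {
		seen[s.serviceName] = struct{}{}
	}
	return len(seen)
}

// demoSamples returns made-up data: two web instances (one failing) and one db.
func demoSamples() []sample {
	return []sample{
		{"web", "web-1", "passing", 1}, {"web", "web-1", "critical", 0},
		{"web", "web-2", "passing", 0}, {"web", "web-2", "critical", 1},
		{"db", "db-1", "passing", 1}, {"db", "db-1", "critical", 0},
	}
}

func main() {
	data := demoSamples()
	all := func(sample) bool { return true }
	passing := func(s sample) bool { return s.status == "passing" }
	notPassing := func(s sample) bool { return s.status != "passing" }

	fmt.Println("services:", countServices(data))
	fmt.Println("instances:", sumWhere(data, all))
	fmt.Println("healthy instances:", sumWhere(data, passing))
	fmt.Println("unhealthy instances:", sumWhere(data, notPassing))
}
```

For this data the four values are 2, 3, 2 and 1. Because only the series for the current status carries the value 1, summing over all statuses counts each instance exactly once.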

Response latency (milliseconds)

consul_response_time/1000000
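The division converts the gauge from nanoseconds to milliseconds. As a quick arithmetic check with the raw value 2.238e+06 from the earlier sample output, using a hypothetical helper nanosToMillis:

```go
package main

import "fmt"

// nanosToMillis converts a reading in nanoseconds to milliseconds, which is
// what dividing by 1000000 in the query does.
func nanosToMillis(ns float64) float64 {
	return ns / 1e6
}

func main() {
	// 2.238e+06 ns is the sample value of consul_response_time shown earlier.
	fmt.Printf("%.3f ms\n", nanosToMillis(2.238e6)) // prints "2.238 ms"
}
```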

Service status

sum(consul_health_service_status{status!="passing"}) by (service_name)

sum(consul_health_service_status) by (service_name)

Service registration information

sum(consul_health_service_status)

sum(consul_health_service_status{status="passing"})

sum(consul_health_service_status{status!="passing"})

Node information

sum(consul_health_node_status)

sum(consul_health_node_status{status="passing"})

sum(consul_health_node_status{status!="passing"})

Final result

Dashboard JSON file

Import the Dashboard's JSON configuration file to start using this Dashboard right away.
