How to Monitor K8S Pods

Original

2020-06-28 07:34

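Background

As more workloads move into containers, services running on K8S need the same thorough monitoring coverage they had on physical or virtual machines. K8S is not a physical or virtual machine, though, and because the underlying implementation differs, monitoring differs in some respects as well.
The Pod is the smallest schedulable unit in K8S. This article explains how to monitor K8S Pods.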

CPU

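On physical or virtual machines, CPU monitoring usually focuses on CPU utilization, CPU load, and similar metrics. In K8S the metrics of interest are somewhat different: CPU usage, and the time or proportion of time the CPU is throttled (CPU Throttled).

CPU Usage

In K8S, CPU usage is measured as CPU time consumed relative to CPU cores. For example, if an application requests 1 CPU core but consumes only 0.5 core at runtime, its CPU usage is 50%.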

The metrics related to CPU time are as follows:

Metric	Type	Description
container_cpu_usage_seconds_total	counter	Cumulative cpu time consumed in seconds
container_cpu_system_seconds_total	counter	Cumulative system cpu time consumed in seconds
container_cpu_user_seconds_total	counter	Cumulative user cpu time consumed in seconds

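Note the following:

- The three metrics above record cumulative CPU time of different kinds; they are not CPU usage. To compute usage, divide the per-second rate by the CPU quota of the application.
- Because the metrics are counters, apply rate or irate before using them in a calculation.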

The CPU quota can be obtained by dividing container_spec_cpu_quota by container_spec_cpu_period. Alternatively, use the CPU limit value from kube_pod_container_resource_limits_cpu_cores.
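
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `cpu_cores_used` and `cpu_usage_percent` are hypothetical, and the inputs stand for two samples of container_cpu_usage_seconds_total plus the values of container_spec_cpu_quota and container_spec_cpu_period. In PromQL the same idea would roughly be rate() of the usage counter divided by quota over period.

```python
def cpu_cores_used(v0, t0, v1, t1):
    """Average number of cores used between two counter samples.

    v0, v1 are values of container_cpu_usage_seconds_total (CPU seconds),
    t0, t1 are the sample timestamps in seconds. This mimics what rate()
    does over two points.
    """
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    delta = v1 - v0
    if delta < 0:
        # Counter reset (for example a container restart): count the
        # post-reset value as the increase.
        delta = v1
    return delta / (t1 - t0)


def cpu_usage_percent(cores_used, quota_us, period_us):
    """CPU usage as a percentage of the quota (quota_us / period_us cores)."""
    if quota_us <= 0 or period_us <= 0:
        raise ValueError("no CPU limit set; usage percentage is undefined")
    limit_cores = quota_us / period_us
    return 100.0 * cores_used / limit_cores


# 30 CPU seconds consumed over a 60 second window is 0.5 core.
cores = cpu_cores_used(120.0, 0.0, 150.0, 60.0)
# With a quota of one full core (100000 / 100000) this is 50% usage.
usage = cpu_usage_percent(cores, quota_us=100000, period_us=100000)
print(cores, usage)
```

The example reproduces the case mentioned earlier: an application limited to 1 core that consumes 0.5 core is at 50% usage.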

[Figure: CPU usage monitoring graph]

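CPU Throttling

CPU throttling has no counterpart in the usual physical or virtual machine setup. K8S relies on the Linux Completely Fair Scheduler (CFS) and limits a Pod's CPU resources by configuring cgroup bandwidth control. The bandwidth control group defines a period (cfs_period_us), usually 100000 microseconds (1/10 second), and a quota (cfs_quota_us), which is the amount of CPU time processes may use within each period. Together the two files set the upper bound on CPU usage. Both are expressed in microseconds (us). cfs_period_us ranges from 1 millisecond (ms) to 1 second (s). cfs_quota_us only needs to be at least 1 ms, and a value of -1 (the default) means CPU time is not limited.

When writing the YAML file for a K8S resource, setting a Pod's CPU limit to 100m means it may use 100/1000 of a CPU core, that is, 10000 microseconds out of each 100000-microsecond period. Once a container has used up its quota within a period, its CPU time is throttled.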
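
The arithmetic that maps a CPU limit to a CFS quota can be written out as a small Python sketch. The function name `millicores_to_cfs_quota_us` is hypothetical and exists only for illustration; the default period of 100000 microseconds is the usual value mentioned above, and an actual cluster may be configured differently.

```python
def millicores_to_cfs_quota_us(millicores, period_us=100_000):
    """Convert a CPU limit in millicores to a CFS quota in microseconds.

    1000 millicores equal one full core, i.e. one full period of CPU time
    per period. The result is the CPU time allowed within each period.
    """
    if millicores <= 0:
        raise ValueError("millicores must be positive")
    return millicores * period_us // 1000


# A limit of 100m allows 10000 us of CPU time in every 100000 us period.
print(millicores_to_cfs_quota_us(100))
# A limit of 2 cores (2000m) allows 200000 us per period, spread over cores.
print(millicores_to_cfs_quota_us(2000))
```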

The metrics related to CPU throttling are as follows:

Metric	Type	Description
container_cpu_cfs_periods_total	counter	Number of elapsed enforcement period intervals
container_cpu_cfs_throttled_periods_total	counter	Number of throttled period intervals
container_cpu_cfs_throttled_seconds_total	counter	Total time duration the container has been throttled

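Note: CPU limits behave differently from memory limits. When a container's memory usage exceeds its limit, it becomes a candidate for the OOM killer. When a container's CPU usage reaches its quota, the container is not evicted or otherwise penalized; its CPU usage is simply throttled.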
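
One way to turn these counters into the throttled proportion mentioned at the start of the CPU section is to divide the increase in throttled periods by the increase in elapsed periods over the same window. The sketch below assumes two readings of container_cpu_cfs_periods_total and container_cpu_cfs_throttled_periods_total; the function name `throttled_ratio` is hypothetical.

```python
def throttled_ratio(periods_start, periods_end, throttled_start, throttled_end):
    """Fraction of CFS periods in a window during which the container was throttled.

    Arguments are counter readings at the start and end of the window.
    Returns 0.0 when no enforcement period elapsed in the window.
    """
    elapsed = periods_end - periods_start
    if elapsed <= 0:
        return 0.0
    return (throttled_end - throttled_start) / elapsed


# 600 periods elapsed and 150 of them were throttled: a ratio of 0.25.
print(throttled_ratio(1000, 1600, 50, 200))
```

A ratio that stays high over time suggests the CPU limit is too tight for the workload, even when average CPU usage looks moderate.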

Memory

The metrics related to memory monitoring are as follows:

Metric	Type	Description
container_memory_cache	gauge	Number of bytes of page cache memory
container_memory_max_usage_bytes	gauge	Maximum memory usage recorded in bytes
container_memory_rss	gauge	Size of RSS in bytes (resident memory)
container_memory_swap	gauge	Container swap usage in bytes
container_memory_usage_bytes	gauge	Current memory usage in bytes, including all memory regardless of when it was accessed (includes all cache)
container_memory_working_set_bytes	gauge	Current working set in bytes (memory currently in use, excluding cache that has not been accessed for a long time)
container_memory_failcnt	counter	Number of memory usage hits limits
container_memory_failures_total	counter	Cumulative count of memory allocation failures
container_memory_mapped_file	gauge	Size of memory mapped files in bytes

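As with physical or virtual machines, the metrics usually watched are:

- Current memory usage: container_memory_usage_bytes or container_memory_working_set_bytes
  - container_memory_usage_bytes includes cache that has not been used for a long time, so its value is larger than container_memory_working_set_bytes
- Resident memory: container_memory_rss
- Cache: container_memory_cache
- Swap: container_memory_swap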
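
To make the relationship between the two usage metrics concrete, the following sketch summarizes one sample of the gauges listed above. It assumes the sample is a plain dictionary keyed by metric name with values in bytes; the function name `memory_summary` and the output keys are hypothetical. The gap between usage and working set is treated here as cache that has not been accessed recently, following the descriptions in the table.

```python
MIB = 2 ** 20


def memory_summary(sample):
    """Summarize one sample of container memory gauges, in MiB.

    `sample` maps metric names to values in bytes.
    """
    usage = sample["container_memory_usage_bytes"]
    working_set = sample["container_memory_working_set_bytes"]
    return {
        "usage_mib": usage / MIB,
        "working_set_mib": working_set / MIB,
        # Memory counted in usage but not in the working set.
        "long_unused_cache_mib": (usage - working_set) / MIB,
        "rss_mib": sample["container_memory_rss"] / MIB,
        "cache_mib": sample["container_memory_cache"] / MIB,
    }


sample = {
    "container_memory_usage_bytes": 512 * MIB,
    "container_memory_working_set_bytes": 384 * MIB,
    "container_memory_rss": 256 * MIB,
    "container_memory_cache": 200 * MIB,
}
print(memory_summary(sample))
```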

[Figure: memory monitoring graph]

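Disk

Fewer disk metrics matter at the container level. If persistent volumes are not used, there is usually no need to track available disk space, and disk I/O is monitored on the host anyway.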

Metric	Type	Description
container_fs_usage_bytes	gauge	Number of bytes that are consumed by the container on this filesystem (disk space used by the container)
container_fs_writes_bytes_total	counter	Cumulative count of bytes written (apply rate to get write throughput)
container_fs_reads_bytes_total	counter	Cumulative count of bytes read (apply rate to get read throughput)
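
Because the read and write metrics are cumulative counters, throughput has to be derived from the increase over time. The sketch below shows one way to do that from a series of (timestamp, value) samples of container_fs_writes_bytes_total or container_fs_reads_bytes_total. The function name `counter_rate` is hypothetical, and the reset handling is a simplification of what a monitoring system does.

```python
def counter_rate(samples):
    """Average per-second increase of a counter.

    `samples` is a list of (timestamp_seconds, value) pairs in increasing
    time order. A drop in value is treated as a counter reset, and the
    post-reset value is counted as the increase for that step.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


# 6,000,000 bytes written over 60 seconds: 100000 bytes per second.
writes = [(0, 1_000_000), (30, 4_000_000), (60, 7_000_000)]
print(counter_rate(writes))
```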

[Figure: disk monitoring graph]

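Network

As with physical or virtual machines, container network monitoring focuses mainly on inbound and outbound traffic, packet counts, drop rate, and so on.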

Metric	Type	Description
container_network_receive_bytes_total	counter	Cumulative count of bytes received
container_network_receive_packets_dropped_total	counter	Cumulative count of packets dropped while receiving
container_network_receive_packets_total	counter	Cumulative count of packets received
container_network_transmit_bytes_total	counter	Cumulative count of bytes transmitted
container_network_transmit_packets_dropped_total	counter	Cumulative count of packets dropped while transmitting
container_network_transmit_packets_total	counter	Cumulative count of packets transmitted
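
A drop rate can be derived from these counters by comparing the increase in dropped packets with the increase in packets over the same window. The sketch below uses one possible definition, dropped divided by packets, and assumes two readings of each counter; the function name `drop_ratio` is hypothetical, and whether dropped packets are also included in the packet counter may depend on the environment.

```python
def drop_ratio(packets_start, packets_end, dropped_start, dropped_end):
    """Dropped packets per counted packet over a window.

    Arguments are counter readings at the start and end of the window, for
    example container_network_receive_packets_total and
    container_network_receive_packets_dropped_total.
    Returns 0.0 when no packets were counted in the window.
    """
    packets = packets_end - packets_start
    if packets <= 0:
        return 0.0
    return (dropped_end - dropped_start) / packets


# 100 packets dropped while 10000 were received: a ratio of 0.01 (1%).
print(drop_ratio(10_000, 20_000, 5, 105))
```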

These network metrics are straightforward and can be used directly.

[Figure: network monitoring graph]
