
How to Estimate the Resource Consumption of Monitoring Components in a Kubernetes Cluster

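The metrics discussed in this article cover only the basic Kubernetes metrics and exclude any business-related metrics. The components involved are prometheus-server, kube-state-metrics, and node-exporter, and data is retained for 3 days.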

1. Monitoring components in the cluster

helm -n monitor list

NAME      NAMESPACE  REVISION  UPDATED                                  STATUS    CHART              APP VERSION
prom-k8s  monitor    1         2022-05-12 16:47:53.789549796 +0800 CST  deployed  prometheus-15.0.1  2.32.0

kubectl -n monitor get deploy,daemonset

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prom-k8s-kube-state-metrics   1/1     1            1           102d
deployment.apps/prom-k8s-prometheus-server    1/1     1            1           102d

NAME                                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prom-k8s-prometheus-node-exporter   20        20        20      20           20          <none>          102d

2. How the metrics are calculated

  • Total number of nodes in the cluster

count(kube_node_created{app_kubernetes_io_instance="prom-k8s"}) by (cluster)

  • Cluster version

sum by (cluster,kubelet_version)(kube_node_info{app_kubernetes_io_instance="prom-k8s"})

  • Number of running Pods

sum(kube_pod_info) by (cluster)

  • Memory used by the monitoring components

sum(container_memory_working_set_bytes{image!="", namespace="monitor"}) by (cluster)

  • CPU used by the monitoring components

sum (rate (container_cpu_usage_seconds_total{namespace="monitor"}[5m])) by (cluster)
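
As a rough illustration, the expressions above could be collected through the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The sketch below only builds the request URL and parses a response body, so it runs without a live server. The `QUERIES` key names, the base URL, and the sample payload (including its values) are hypothetical and not taken from the clusters described here.

```python
import json
from urllib.parse import urlencode

# The five expressions from this section, keyed by hypothetical short names.
QUERIES = {
    "nodes": 'count(kube_node_created{app_kubernetes_io_instance="prom-k8s"}) by (cluster)',
    "versions": 'sum by (cluster,kubelet_version)(kube_node_info{app_kubernetes_io_instance="prom-k8s"})',
    "pods": "sum(kube_pod_info) by (cluster)",
    "memory_bytes": 'sum(container_memory_working_set_bytes{image!="", namespace="monitor"}) by (cluster)',
    "cpu_cores": 'sum (rate (container_cpu_usage_seconds_total{namespace="monitor"}[5m])) by (cluster)',
}


def build_query_url(base_url, expr):
    """Return the instant-query URL for one PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def parse_instant_vector(payload):
    """Map each series' `cluster` label to its value.

    Suitable for queries that return one series per cluster; the
    "versions" query returns several and would need a richer key.
    """
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    return {
        series["metric"].get("cluster", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


# Hand-written sample in the shape the API returns; the values are made up.
SAMPLE = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"cluster":"demo"},"value":[1652345273.0,"306"]}]}}'
)

if __name__ == "__main__":
    # Assumed address; replace with wherever prometheus-server is reachable.
    print(build_query_url("http://localhost:9090", QUERIES["pods"]))
    print(parse_instant_vector(SAMPLE))
```

Fetching the URL with any HTTP client and passing the body to `parse_instant_vector` would give one number per cluster, which is the shape of the data analysed in the next section.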

3. Analysis of the collected data

Nodes  Kubernetes version  Pods  Monitoring memory  Monitoring CPU (cores)
19     v1.16.8             1046  7.04 GB            0.289
22     v1.19.15            1014  4.56 GB            0.239
28     v1.18.20            974   3.23 GB            0.187
12     v1.16.11            600   2.94 GB            0.134
8      v1.20.12            195   2.48 GB            0.598
9      v1.16.9             616   2.11 GB            0.116
13     v1.16.11            558   1.87 GB            0.096
20     v1.16.11            452   1.78 GB            0.099
14     v1.16.11            378   1.57 GB            0.064
9      v1.19.15            284   1.52 GB            0.090
6      v1.16.11            428   1.48 GB            0.102
12     v1.16.8             324   1.36 GB            0.052
9      v1.16.11            382   1.09 GB            0.065
4      v1.20.12            220   1.08 GB            0.052
12     v1.16.11            504   1.04 GB            0.060
4      v1.20.12            156   1.01 GB            0.042
4      v1.16.11            170   937.66 MB          0.055
8      v1.16.11            254   914.81 MB          0.053
10     v1.16.8             435   907.60 MB          0.032
10     v1.16.11            278   852.69 MB          0.034
11     v1.16.11            420   839.70 MB          0.047
7      v1.16.11            268   815.48 MB          0.046
4      v1.16.11            248   804.21 MB          0.032
9      v1.16.11            256   784.44 MB          0.034
10     v1.16.11            284   776.07 MB          0.047
6      v1.16.11            222   745.46 MB          0.034
8      v1.16.11            252   709.10 MB          0.034
5      v1.16.11            200   678.32 MB          0.036
7      v1.16.11            202   642.48 MB          0.037
5      v1.16.8             174   640.41 MB          0.030
7      v1.20.12            97    624.49 MB          0.025
6      v1.16.11            198   590.52 MB          0.029
6      v1.20.12            95    578.39 MB          0.034
7      v1.16.11            222   569.38 MB          0.032
6      v1.16.11            140   560.03 MB          0.027
8      v1.16.11            166   557.45 MB          0.028
5      v1.20.12            70    494.62 MB          0.019
5      v1.16.11            120   455.13 MB          0.024
6      v1.16.8             112   449.50 MB          0.022
7      v1.16.11            128   448.15 MB          0.026
6      v1.16.11            112   441.64 MB          0.022
6      v1.16.11            112   437.66 MB          0.026
2      v1.16.11            84    326.72 MB          0.024
2      v1.20.15            11    208.16 MB          0.012

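On average, each cluster runs 306 Pods and its monitoring components use 1246 MB of memory, which works out to roughly 4 MB of monitoring memory per Pod (1246 MB / 306 ≈ 4.07 MB). CPU consumption stays below 0.6 cores in every cluster and can mostly be ignored.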
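
The averages can be rechecked from the table with a short script. This is a minimal sketch, not the author's method: it assumes the GB rows convert at 1 GB = 1024 MB (which reproduces the 1246 MB average), and the helper `estimate_monitoring_memory_mb` is a hypothetical proportional model that ignores fixed and per-node overheads.

```python
GB = 1024.0  # assumption: GB rows are converted as 1 GB = 1024 MB

# (pods, monitoring memory in MB) for each of the 44 clusters in the table.
CLUSTERS = [
    (1046, 7.04 * GB), (1014, 4.56 * GB), (974, 3.23 * GB), (600, 2.94 * GB),
    (195, 2.48 * GB), (616, 2.11 * GB), (558, 1.87 * GB), (452, 1.78 * GB),
    (378, 1.57 * GB), (284, 1.52 * GB), (428, 1.48 * GB), (324, 1.36 * GB),
    (382, 1.09 * GB), (220, 1.08 * GB), (504, 1.04 * GB), (156, 1.01 * GB),
    (170, 937.66), (254, 914.81), (435, 907.60), (278, 852.69),
    (420, 839.70), (268, 815.48), (248, 804.21), (256, 784.44),
    (284, 776.07), (222, 745.46), (252, 709.10), (200, 678.32),
    (202, 642.48), (174, 640.41), (97, 624.49), (198, 590.52),
    (95, 578.39), (222, 569.38), (140, 560.03), (166, 557.45),
    (70, 494.62), (120, 455.13), (112, 449.50), (128, 448.15),
    (112, 441.64), (112, 437.66), (84, 326.72), (11, 208.16),
]

# Per-cluster averages across all rows.
avg_pods = sum(pods for pods, _ in CLUSTERS) / len(CLUSTERS)
avg_mem_mb = sum(mem for _, mem in CLUSTERS) / len(CLUSTERS)

# Ratio of the two averages: monitoring memory attributable to one Pod.
mb_per_pod = avg_mem_mb / avg_pods


def estimate_monitoring_memory_mb(pod_count: int) -> float:
    """Rough proportional estimate of monitoring memory for a cluster."""
    return pod_count * mb_per_pod


if __name__ == "__main__":
    print(f"average Pods per cluster: {avg_pods:.0f}")
    print(f"average monitoring memory: {avg_mem_mb:.0f} MB")
    print(f"memory per Pod: {mb_per_pod:.2f} MB")
    print(f"estimate for 500 Pods: {estimate_monitoring_memory_mb(500):.0f} MB")
```

The individual rows scatter widely around this ratio (for example, the 195-Pod cluster uses 2.48 GB), so the result is better treated as a starting point for sizing requests and limits than as a prediction.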

