
How to View Tekton Pipeline Metrics


1. Scraping Tekton Metrics

  • Add a ConfigMap configuration file
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-observability
  namespace: tekton-pipelines
  labels:
    app.kubernetes.io/instance: default
    app.kubernetes.io/part-of: tekton-pipelines
data:
  metrics.backend-destination: prometheus
  metrics.taskrun.level: "task"
  metrics.taskrun.duration-type: "histogram"
  metrics.pipelinerun.level: "pipeline"
  metrics.pipelinerun.duration-type: "histogram"
EOF

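Changing the settings under data changes the granularity of the reported metrics and can severely affect Prometheus performance, so modify them with care.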
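To make the performance concern concrete, the sketch below estimates how the number of Prometheus time series grows as the reporting granularity gets finer. It relies only on the general Prometheus rule that a histogram exposes one series per bucket (including +Inf) plus _sum and _count for every distinct label combination. The bucket count and the numbers of pipelines and runs are made-up illustration values, not Tekton defaults.

```python
def histogram_series(label_combinations: int, finite_buckets: int) -> int:
    """Estimate the number of series created by one Prometheus histogram.

    Every label combination yields one series per finite bucket, one for
    the +Inf bucket, and one each for _sum and _count.
    """
    return label_combinations * (finite_buckets + 1 + 2)


# Hypothetical numbers, for illustration only.
FINITE_BUCKETS = 10   # assumed bucket count, not a Tekton default
PIPELINES = 50        # coarse granularity: one label set per pipeline
PIPELINERUNS = 7500   # fine granularity: one label set per individual run

per_pipeline = histogram_series(PIPELINES, FINITE_BUCKETS)
per_run = histogram_series(PIPELINERUNS, FINITE_BUCKETS)
print(per_pipeline, per_run)
```

Under these assumptions the per-run granularity produces 150 times as many series as the per-pipeline one, and the number keeps growing with every new run.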

  • Restart Tekton
kubectl -n tekton-pipelines rollout restart deployment tekton-pipelines-controller
  • [Optional] Expose tekton-pipelines-controller as a NodePort Service to view the metrics
kubectl -n tekton-pipelines patch svc tekton-pipelines-controller -p '{"spec": {"type": "NodePort"}}'

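Now look up the assigned port with kubectl -n tekton-pipelines get svc tekton-pipelines-controller; the metrics can then be viewed at <node IP>:<NodePort>. If the Prometheus instance that scrapes the metrics runs outside the cluster, it can use <node IP>:<NodePort> directly.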
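As a quick check of what the endpoint returns, the following sketch lists the Tekton metric names found in a Prometheus text-format payload. The SAMPLE text is hand-written for illustration (its values are borrowed from the examples in this post), and the helper name tekton_metric_names is hypothetical.

```python
from typing import List


def tekton_metric_names(exposition_text: str,
                        prefix: str = "tekton_pipelines_controller_") -> List[str]:
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Hand-written sample payload, not real controller output.
SAMPLE = """\
# TYPE tekton_pipelines_controller_running_taskruns_count gauge
tekton_pipelines_controller_running_taskruns_count 1
tekton_pipelines_controller_pipelinerun_count{status="success"} 7540
go_goroutines 120
"""

print(tekton_metric_names(SAMPLE))
```

Against a real cluster, the text would come from http://<node IP>:<NodePort>/metrics, for example via urllib.request.urlopen, instead of the SAMPLE string.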

  • Deploy a Prometheus instance inside the cluster with Helm

See: Setting Up Kubernetes Monitoring with Prometheus and Grafana

helm -n monitor list

NAME      	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART            	APP VERSION
prometheus	monitor  	1       	2022-03-17 14:39:38.743741 +0800 CST	deployed	prometheus-15.3.0	2.31.1     
  • Configure the Service so that Prometheus scrapes it automatically
kubectl -n tekton-pipelines edit svc tekton-pipelines-controller
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"

prometheus.io/path: /metrics and prometheus.io/port: "9090" are the default values and can be omitted from the annotations.

  • View the metrics in Prometheus

tekton_pipelines_controller_client_latency_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", le="+Inf", namespace="tekton-pipelines", node="node4", pipeline_tekton_dev_release="v0.24.1", service="tekton-pipelines-controller", version="v0.24.1"}

The above is a simple example. The metrics carry labels for the namespace and the pipeline, which can be used for filtering.
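As an illustration of filtering by label, the sketch below builds a PromQL selector with exact-match label filters. The helper name selector is hypothetical, and it does not escape special characters in label values, so it is only suitable for simple values such as the namespace and status seen in this post.

```python
def selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector with exact-match label filters."""
    if not labels:
        return metric
    # Sort the labels so the generated query is deterministic.
    matchers = ",".join(
        f'{name}="{value}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


query = selector(
    "tekton_pipelines_controller_pipelinerun_duration_seconds_count",
    namespace="asimov",
    status="success",
)
print(query)
```

The resulting string can be pasted into the Prometheus expression browser to show only the series for that namespace and status.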

2. Which Metrics Tekton Exposes

2.1 tekton_pipelines_controller_pipelinerun_duration_seconds_[bucket, sum, count]

tekton_pipelines_controller_pipelinerun_duration_seconds_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="+Inf", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 13

The le="+Inf" bucket is cumulative and counts every observation, so this value means that 13 duration observations were recorded for pipelinerun p-caa3ljeb23td2d6v8t7g. It is a count, not a duration of 13 seconds.

tekton_pipelines_controller_pipelinerun_duration_seconds_sum{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 494

The durations observed for pipelinerun p-caa3ljeb23td2d6v8t7g add up to 494 seconds.

tekton_pipelines_controller_pipelinerun_duration_seconds_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline="p-c8tetchin6qsrnm7bqog", pipeline_tekton_dev_release="v0.24.1", pipelinerun="p-caa3ljeb23td2d6v8t7g", status="success", version="v0.24.1"} 13

This means that 13 duration observations were recorded for this series (pipeline p-c8tetchin6qsrnm7bqog, pipelinerun p-caa3ljeb23td2d6v8t7g).
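Because _sum holds the total observed seconds and _count holds the number of observations, dividing one by the other gives the mean duration. A minimal sketch using the two sample values above (the helper name mean_duration_seconds is hypothetical):

```python
def mean_duration_seconds(duration_sum: float, duration_count: int) -> float:
    """Mean observed duration, computed from a histogram's _sum and _count."""
    if duration_count == 0:
        raise ValueError("no observations recorded")
    return duration_sum / duration_count


# Values taken from the _sum and _count samples shown above.
avg = mean_duration_seconds(494, 13)
print(avg)  # 38.0
```

In Prometheus itself, the same idea is usually expressed by dividing the _sum series by the matching _count series.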

2.2 tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_[bucket, sum, count]

tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="+Inf", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 1

This is the cumulative le="+Inf" bucket, so the value is a count of observations rather than a duration: 1 observation had been recorded for taskrun pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg when this sample was taken.

tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_sum{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 1461438

The durations observed for taskrun pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg add up to 1461438 seconds.

tekton_pipelines_controller_pipelinerun_taskrun_duration_seconds_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 18

This means that 18 duration observations were recorded for taskrun pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg.

2.3 tekton_pipelines_controller_pipelinerun_count

tekton_pipelines_controller_pipelinerun_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", status="success", version="v0.24.1"} 7540

Across the whole cluster, pipelines have run successfully 7540 times in total.

2.4 tekton_pipelines_controller_running_pipelineruns_count

tekton_pipelines_controller_running_pipelineruns_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 1

Across the whole cluster, 1 pipeline is currently running.

2.5 tekton_pipelines_controller_taskrun_duration_seconds_[bucket, sum, count]

These metrics are only present when tasks are run directly as a taskrun rather than through a pipelinerun.

2.6 tekton_pipelines_controller_taskrun_count

tekton_pipelines_controller_taskrun_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", status="success", version="v0.24.1"} 43423

Across the whole cluster, taskruns have run successfully 43423 times.

2.7 tekton_pipelines_controller_running_taskruns_count

tekton_pipelines_controller_running_taskruns_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 1

Across the whole cluster, 1 taskrun is currently running.

2.8 tekton_pipelines_controller_taskruns_pod_latency

tekton_pipelines_controller_taskruns_pod_latency{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="asimov", pipeline_tekton_dev_release="v0.24.1", pod="p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k-pod-sh4vp", task="git-clone", taskrun="p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k", version="v0.24.1"} 3000000000

The Pod startup latency for taskrun p-caa3ljeb23td2d6v8t7g-fetch-main-repo-fr62k is 3000000000. A delay like this is on the order of seconds, so the unit is presumably nanoseconds, which makes it 3 seconds.
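The unit conversion can be sketched as follows. It assumes, as inferred in this post rather than confirmed from the Tekton source, that the raw value is in nanoseconds; the helper name pod_latency is hypothetical.

```python
from datetime import timedelta

NANOSECONDS_PER_SECOND = 1_000_000_000


def pod_latency(raw_value_ns: int) -> timedelta:
    """Convert a raw latency value, assumed to be nanoseconds, to a timedelta."""
    return timedelta(seconds=raw_value_ns / NANOSECONDS_PER_SECOND)


# Raw value taken from the sample above.
latency = pod_latency(3_000_000_000)
print(latency.total_seconds())  # 3.0
```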

  • tekton_pipelines_controller_cloudevent_count

tekton_pipelines_controller_cloudevent_count{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", namespace="account", pipeline="pay-c9tn0h6b23t28qjnp5mg", pipeline_tekton_dev_release="v0.24.1", pipelinerun="pay-c9tno3mb23t28qjnp660", status="failed", task="approve", taskrun="pay-c9tno3mb23t28qjnp660-approve-huawei-pzwzg", version="v0.24.1"} 0

Tekton can integrate with CloudEvents and send its events to a CloudEvents receiver for broadcasting.

2.9 tekton_pipelines_controller_client_latency_[bucket, sum, count]

Tekton uses a client to interact with the Kubernetes API server.

tekton_pipelines_controller_client_latency_bucket{app="tekton-pipelines-controller", app_kubernetes_io_component="controller", app_kubernetes_io_instance="default", app_kubernetes_io_name="controller", app_kubernetes_io_part_of="tekton-pipelines", app_kubernetes_io_version="v0.24.1", instance="x.x.x.x:9090", job="kubernetes-service-endpoints", kubernetes_name="tekton-pipelines-controller", kubernetes_namespace="tekton-pipelines", kubernetes_node="node2", le="1", pipeline_tekton_dev_release="v0.24.1", version="v0.24.1"} 11627

le="0.1" 10019: 10019 requests completed within 0.1 seconds
le="1" 11627: 11627 requests completed within 1 second
le="10" 11633: 11633 requests completed within 10 seconds
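Histogram buckets are cumulative, so each le value also includes all faster requests. The sketch below converts the three sample values above into per-interval counts; the helper name per_interval_counts is hypothetical.

```python
from typing import Dict, List, Tuple


def per_interval_counts(cumulative: Dict[float, int]) -> List[Tuple[float, float, int]]:
    """Turn cumulative bucket counts into (lower, upper, count) intervals."""
    result = []
    lower, seen = 0.0, 0
    for upper in sorted(cumulative):
        # Requests in (lower, upper] = this bucket minus the previous one.
        result.append((lower, upper, cumulative[upper] - seen))
        lower, seen = upper, cumulative[upper]
    return result


# Upper bound in seconds -> cumulative request count, from the samples above.
BUCKETS = {0.1: 10019, 1.0: 11627, 10.0: 11633}

intervals = per_interval_counts(BUCKETS)
for low, high, count in intervals:
    print(f"({low}s, {high}s]: {count} requests")
```

For these samples, 1608 requests took between 0.1 and 1 second, and only 6 took between 1 and 10 seconds.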

Other related metrics:

tekton_pipelines_controller_client_latency_sum
tekton_pipelines_controller_client_latency_count


