监控 – 陈少文的网站

常用的各类资源 Prometheus 告警语句

📅 2022年08月21日 · ☕ 4 分钟

主机主机内存使用率超过阈值 1 - node_memory_MemAvailable_bytes{mode!="idle"} / node_memory_MemTotal_bytes 阈值：0.9 主机 CPU 使用率超过阈值 1 - avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (host_name) 阈值：0.85 主机硬盘使用率超过阈值 1 - avg without (fstype)(node_filesystem_free_bytes{fstype!='rootfs',mountpoint!~'/(run|var|snap).*'} / node_filesystem_size_bytes{fstype!='rootfs',mountpoint!~'/(run|var|snap).*'}) 阈值：0.8 Windows Windows 主机内存使用率超过阈值 1 - 1 * windows_os_physical_memory_free_bytes{job="windows_exporter",mode!="idle"} / windows_cs_physical_memory_bytes 阈值：0.9 Windows 主机 CPU 使用率超过阈值 1 - (avg by (host_ip,host_name) (irate(windows_cpu_time_total{job="windows_exporter",mode="idle"}[1m]))) 阈值：0.85

在 Kubernetes 集群上部署 Elasticsearch 栈

📅 2022年07月06日 · ☕ 3 分钟

如果采用 Logstash 集中接收 Filebeat 的日志输入，容易造成单点瓶颈；如果采用 Kafka 接收 Filebeat 的日志输入，日志的时效性又得不到保障。这里直接将 Filebeat 采集的日志直接输出到 Elasticsearch。 1. 准备工作节点规划这里没有区分 master、data、client 节点，而是

使用 Blackbox Exporter 测试网络连通性

📅 2022年06月25日 · ☕ 3 分钟

如果你需要监控两个主机、主机与外部服务之间的网络状况，那么就可以试一试本文提到的 Blackbox Exporter。 1. 安装 Blackbox 1.1 在主机上部署下载二进制包 1 2 3 4 5 wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.0/blackbox_exporter-0.21.0.linux-amd64.tar.gz tar -xzvf blackbox_exporter-0.21.0.linux-amd64.tar.gz mv blackbox_exporter-0.21.0.linux-amd64/blackbox_exporter /usr/bin/ mkdir /etc/prometheus mv blackbox_exporter-0.21.0.linux-amd64/blackbox.yml /etc/prometheus/ 清理安装包 1 rm -rf blackbox_exporter-0.21.0.linux-amd64* 新建 Systemd 服务 1 vim /usr/lib/systemd/system/blackbox_exporter.service 新增如下内容: [Unit] Description=blackbox_exporter After=network.target [Service] Restart=on-failure ExecStart=/usr/bin/blackbox_exporter \ --config.file=/etc/prometheus/blackbox.yml Restart=on-failure [Install] WantedBy=multi-user.target

如何查看 Tekton 的流水线指标

📅 2022年06月07日 · ☕ 3 分钟

1. 抓取 Tekton Metrics 新增 ConfigMap 配置文件 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 cat <<EOF | kubectl apply -f - apiVersion: v1 kind: ConfigMap metadata: name: config-observability namespace: tekton-pipelines labels: app.kubernetes.io/instance: default app.kubernetes.io/part-of: tekton-pipelines data: metrics.backend-destination: prometheus metrics.taskrun.level: "task" metrics.taskrun.duration-type: "histogram" metrics.pipelinerun.level: "pipeline" metrics.pipelinerun.duration-type: "histogram" EOF 修改 data 中的配置，会改变上报指标的粒度，甚至会严重影响 Prometheus 的性能，需要谨慎修改。重启 Tekton 1 kubectl -n tekton-pipelines rollout restart deployment tekton-pipelines-controller [可选] 将 tekton-pipelines-controller 设置为 NodePort

如何采集 Kubernetes 对象的 labels 和 annotations

📅 2022年06月02日 · ☕ 2 分钟

1. 为什么需要 kube-status-metrics Kubernetes 的监控主要关注两类指标: 基础性能指标 CPU、内存、磁盘、网络等指标，可以通过 DaemonSet 部署 node-exporter，由 Prometheus 抓取相关指标。资源对象指标 Deployment 的副本数量、Pod 的运行状态等。这些指标需要 kube-status-metrics 轮询 Kubernetes 的 API 查询，并暴露给 Prometheus 才能够看到