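The Prometheus community provides three ways to deploy on Kubernetes:

  1. Deploy Prometheus-Operator and build a custom setup on top of it
  2. Deploy a general-purpose monitoring environment with kube-prometheus (YAML manifests; the approach used in this article)
  3. Deploy with kube-prometheus-stack (Helm chart; a simplified form of the second approach)

The latter two both build on the first, Prometheus-Operator, and ship with built-in monitoring rules and templates.

This article starts with the second approach for a quick deployment.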

Getting started

Version information

kube-prometheus stack Kubernetes 1.27 Kubernetes 1.28 Kubernetes 1.29 Kubernetes 1.30 Kubernetes 1.31 Kubernetes 1.32 Kubernetes 1.33
release-0.13 x x x x x
release-0.14 x x x x
release-0.15 x x x x
main x x x x

Fetch the code

Manual download: download link

git clone https://github.com/prometheus-operator/kube-prometheus.git -b release-0.14
cd kube-prometheus/

Deploy kube-prometheus

kubectl apply --server-side -f manifests/setup
# On Kubernetes versions older than 1.22, use: kubectl create -f manifests/setup
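
Applying the rest of the manifests right after manifests/setup can fail if the CustomResourceDefinitions have not been registered yet. A minimal sketch of waiting for them first; `wait_for_crds` is a hypothetical helper name and the 120s default timeout is an arbitrary assumption:

```shell
# Hypothetical helper: block until every CRD in the cluster reports Established.
# $1 is an optional timeout (assumed default: 120s).
wait_for_crds() {
  kubectl wait --for=condition=Established --all CustomResourceDefinition \
    --timeout="${1:-120s}"
}
```

For example, `wait_for_crds 180s && kubectl apply -f manifests/` would only continue once the CRDs are ready.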

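Deploy Prometheus, Alertmanager, Grafana, kube-state-metrics, prometheus-adapter, blackbox-exporter and node-exporter.

This is the complete monitoring stack that kube-prometheus, built on Prometheus-Operator, provides for general-purpose Kubernetes environments.

The whole stack comes with PrometheusRule alerting rules preconfigured.

kube-prometheus also ships Grafana with a number of built-in dashboards.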

kubectl apply -f manifests/
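
After applying the manifests it can help to confirm that everything in the monitoring namespace actually came up. A rough sketch, assuming the default `monitoring` namespace; `check_monitoring_pods` is a hypothetical name and the 300s default timeout is arbitrary:

```shell
# Hypothetical helper: list the monitoring pods, then wait for them to be Ready.
check_monitoring_pods() {
  # List the pods first so slow or failing ones are visible.
  kubectl get pods -n monitoring
  # Then block until all existing pods are Ready, or fail after the timeout.
  kubectl wait --for=condition=Ready pods --all -n monitoring \
    --timeout="${1:-300s}"
}
```

Note that `kubectl wait` only covers pods that already exist, so running the check again a little later is reasonable while the operator is still creating the Prometheus and Alertmanager pods.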

Image issues

Image lookup: https://docker.aityp.com/

Retagging approach

crictl pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
crictl pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0

# Retag the mirrored images with their original names
ctr -n k8s.io i tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0 registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
ctr -n k8s.io i tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0 registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
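
When several images need the same treatment, the pull and retag steps can be wrapped in a small function. This is a sketch that assumes the mirror keeps the original image path under the prefix shown in the commands above; `pull_and_retag` and `MIRROR_PREFIX` are names introduced here for illustration:

```shell
# Mirror prefix taken from the commands above; adjust it for another mirror.
MIRROR_PREFIX="swr.cn-north-4.myhuaweicloud.com/ddn-k8s"

# Hypothetical helper: pull an image through the mirror, then tag it with its
# original name so the unmodified manifests can find it locally.
pull_and_retag() {
  image="$1"  # original reference, e.g. registry.k8s.io/<path>:<tag>
  crictl pull "${MIRROR_PREFIX}/${image}" || return 1
  ctr -n k8s.io i tag "${MIRROR_PREFIX}/${image}" "${image}"
}
```

It would be called once per image, on every node that may run the pod, for example `pull_and_retag registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0`.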

Changing the image address in the YAML

Alternatively, edit the image references in the corresponding YAML files under kube-prometheus/manifests.
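
Instead of editing each file by hand, the registry prefix can be rewritten in bulk. A minimal sketch, assuming the affected images are referenced as `image: registry.k8s.io/...` and that the mirror serves them under the same path; `rewrite_registry` is a hypothetical helper, and only registry.k8s.io references are touched:

```shell
# Hypothetical helper: prefix registry.k8s.io image references with a mirror.
# $1: directory containing the manifests; $2: optional mirror prefix.
rewrite_registry() {
  dir="$1"
  mirror="${2:-swr.cn-north-4.myhuaweicloud.com/ddn-k8s}"
  for f in "$dir"/*.yaml; do
    [ -f "$f" ] || continue
    # Write to a temp file and move it back, which sidesteps the GNU/BSD
    # differences of `sed -i`. Already-rewritten lines no longer match the
    # pattern, so running the function twice is harmless.
    sed "s#image: registry\.k8s\.io/#image: ${mirror}/registry.k8s.io/#" "$f" > "$f.tmp" \
      && mv "$f.tmp" "$f"
  done
}
```

Running `rewrite_registry manifests` from the kube-prometheus/ directory and reviewing the result with `git diff` before applying keeps the change easy to audit and revert.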


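External access

The stack applies NetworkPolicy resources that restrict access to its components; delete them to allow access from outside. Run the command from the manifests directory: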

kubectl delete -f prometheus-networkPolicy.yaml -f grafana-networkPolicy.yaml -f alertmanager-networkPolicy.yaml

Modify the Service

kubectl edit svc -n monitoring prometheus-k8s

In the editor, change the Service type:

type: ClusterIP
# change to
type: NodePort
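
The same change can be made non-interactively with `kubectl patch`, which is easier to script or to repeat for other Services. A sketch, where `expose_as_nodeport` is a hypothetical helper and the node port itself is left for Kubernetes to allocate:

```shell
# Hypothetical helper: switch a Service in the monitoring namespace to NodePort.
expose_as_nodeport() {
  # $1: Service name, e.g. prometheus-k8s
  kubectl patch svc "$1" -n monitoring -p '{"spec":{"type":"NodePort"}}'
}
```

After `expose_as_nodeport prometheus-k8s`, running `kubectl get svc -n monitoring prometheus-k8s` shows the allocated port.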

Persistent storage

Prometheus

Edit kube-prometheus/manifests/prometheus-prometheus.yaml and add the following under spec:

  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: nfs-storage # name of your actual StorageClass
        resources:
          requests:
            storage: 50Gi # adjust as needed

Alertmanager

The procedure is the same as for Prometheus.

Edit kube-prometheus/manifests/alertmanager-alertmanager.yaml and add the following under spec:

  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: nfs-storage # name of your actual StorageClass
        resources:
          requests:
            storage: 50Gi # adjust as needed

Grafana

Edit kube-prometheus/manifests/grafana-deployment.yaml:

      volumes:
      - emptyDir: {}
        name: grafana-storage
      # change to
      volumes:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana

Append the following at the bottom of the same file:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: monitoring
spec:
  storageClassName: nfs-storage # name of your actual StorageClass
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi # adjust as needed

Apply the changes

kubectl apply -f prometheus-prometheus.yaml
kubectl apply -f alertmanager-alertmanager.yaml
kubectl apply -f grafana-deployment.yaml
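
Once the changes are applied, the claims should show up in the monitoring namespace. A small sketch for checking them, assuming a kubectl release recent enough to support JSONPath conditions in `kubectl wait` and the `grafana` claim name used above; `check_monitoring_pvcs` is a hypothetical name and the 120s default timeout is arbitrary:

```shell
# Hypothetical helper: list the claims and wait for the Grafana one to bind.
check_monitoring_pvcs() {
  # Show all claims, including those created from the volumeClaimTemplates.
  kubectl get pvc -n monitoring
  # Block until the Grafana claim is Bound, or fail after the timeout.
  kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/grafana \
    -n monitoring --timeout="${1:-120s}"
}
```

If the StorageClass uses WaitForFirstConsumer volume binding, the claim only binds once the Grafana pod has been scheduled, so a short delay here is expected.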