Introduction

The second article in this series (original link) covered installing Alertmanager. This article builds on that setup.

AlertManager Config

When Alertmanager is created through the Alertmanager CRD, there are two ways to load configuration from AlertmanagerConfig objects:

Official documentation: https://prometheus-operator.dev/docs/developer/alerting/

    alertmanagerConfiguration:
      name: alert-global
    alertmanagerConfigSelector:
      matchLabels:
        alert: alert-config

alertmanagerConfiguration: global configuration

alertmanagerConfigSelector: namespace-scoped configuration
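To show where these two fields live, here is a minimal sketch of an Alertmanager resource. The resource name `main` and the replica count are assumptions made for illustration; the `alert-global` name and the `alert: alert-config` label come from the fragment above.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main                # hypothetical name, not from the original setup
  namespace: monitoring
spec:
  replicas: 1               # assumed value
  # Global configuration: references one AlertmanagerConfig by name,
  # expected to live in the same namespace as this Alertmanager.
  alertmanagerConfiguration:
    name: alert-global
  # Namespace-scoped configuration: selects every AlertmanagerConfig
  # that carries this label.
  alertmanagerConfigSelector:
    matchLabels:
      alert: alert-config
```

As far as the operator API describes it, only the Alertmanager's own namespace is searched for AlertmanagerConfig objects unless `alertmanagerConfigNamespaceSelector` is also set.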

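With the non-global (namespace-scoped) option, the operator adds a namespace matcher to the generated routes and inhibit rules, so an alert must carry the matching `namespace` label to be handled. The Alertmanager configuration generated by the two options differs as follows: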

    route:
    <<<<<<< namespace-scoped configuration
      receiver: "null"
      continue: false
      routes:
      - receiver: monitoring/config-example/email
        group_by:
        - alertname
        matchers:
        - namespace="monitoring"
        continue: true
        group_wait: 10s
        group_interval: 5m
        repeat_interval: 12h
    >>>>>>> global configuration
      receiver: monitoring/alert-global/email
      group_by:
      - alertname
      continue: false
      group_wait: 10s
      group_interval: 1m
      repeat_interval: 1m

    inhibit_rules:
    <<<<<<< namespace-scoped configuration
    - source_matchers:
      - namespace="monitoring"
      - severity="critical"
      target_matchers:
      - namespace="monitoring"
      - severity="warning"
      equal:
      - instance
    >>>>>>> global configuration
    - source_matchers:
      - severity="critical"
      target_matchers:
      - severity="warning"
      equal:
      - instance
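For reference, the global branch of the comparison above could come from an AlertmanagerConfig roughly like the following sketch. The name, namespace, receiver name, and timing values are read off the generated output; the email addresses and SMTP host are placeholders, not values from the original setup.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alert-global        # referenced by spec.alertmanagerConfiguration.name
  namespace: monitoring     # same namespace as the Alertmanager resource
spec:
  route:
    receiver: email         # rendered as monitoring/alert-global/email
    groupBy: ['alertname']
    groupWait: 10s
    groupInterval: 1m
    repeatInterval: 1m
  receivers:
  - name: email
    emailConfigs:
    - to: 'alerts@example.com'            # placeholder address
      from: 'alertmanager@example.com'    # placeholder address
      smarthost: 'smtp.example.com:465'   # placeholder host
  inhibitRules:
  - sourceMatch:
    - name: severity
      value: critical
    targetMatch:
    - name: severity
      value: warning
    equal:
    - instance
```

Because this object is loaded as the global configuration, no `namespace` matcher is injected, which matches the global side of the comparison.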

Email alerts

  1. Create a sender mailbox

  2. Test sending mail from the mailbox (Alibaba Mail)

    curl --url 'smtps://smtp.vsoul.cn' \
    --mail-from '[email protected]' \
    --mail-rcpt '[email protected]' \
    --user '[email protected]:YOUR_PASSWORD' \
    --ssl-reqd --insecure \
    -T <(printf 'From: VSoul Notifications <[email protected]>\nTo: [email protected]\nSubject: Test email')

Create the AlertmanagerConfig

    apiVersion: monitoring.coreos.com/v1alpha1
    kind: AlertmanagerConfig
    metadata:
      name: email
      namespace: monitoring
      labels: # must match alertmanagerConfigSelector.matchLabels on the Alertmanager resource
        alert: alert-config
    spec:
      # Routing
      # Official API: https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1alpha1.Route
      route:
        groupBy: ['alertname'] # Grouping key; alerts with the same alertname are merged into one notification
        groupWait: 10s # Initial wait: after the first alert of a group fires, wait 10s to collect others in the same group
        groupInterval: 5m # Interval for a group: new alerts joining the group are sent at least 5m after the previous notification
        repeatInterval: 12h # Repeat interval: an unresolved alert is re-sent every 12h
        receiver: 'email' # Default receiver name (must match a name under receivers)
      # Receivers
      # Official API: https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1alpha1.Receiver
      receivers:
      - name: 'email'
        # Email settings
        emailConfigs:
        - to: '[email protected]'
          from: '[email protected]'
          smarthost: 'smtp.qiye.aliyun.com:465'
          authUsername: '[email protected]'
          authPassword:
            name: smtp-secret
            key: password
          requireTLS: false
          sendResolved: true
          tlsConfig:
            insecureSkipVerify: false
      # Inhibit rules
      # Official API: https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1alpha1.InhibitRule
      inhibitRules:
      - sourceMatch:
        - name: severity
          value: critical
        targetMatch:
        - name: severity
          value: warning
        equal:
        - instance

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: smtp-secret
      namespace: monitoring
    type: Opaque
    stringData:
      password: YOUR_PASSWORD
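The receiver above pairs port 465 with `requireTLS: false`. To my understanding, Alertmanager opens an implicit TLS connection when the smarthost port is 465, while `requireTLS` only controls whether STARTTLS is mandatory on other ports. For a provider that expects STARTTLS on port 587, the receiver fragment under `spec` might look like the sketch below; the host and addresses are placeholders, and only the `smtp-secret` reference is taken from this article.

```yaml
receivers:
- name: 'email'
  emailConfigs:
  - to: 'alerts@example.com'             # placeholder address
    from: 'alertmanager@example.com'     # placeholder address
    smarthost: 'smtp.example.com:587'    # placeholder host, STARTTLS port
    authUsername: 'alertmanager@example.com'
    authPassword:
      name: smtp-secret                  # Secret defined above
      key: password
    requireTLS: true                     # refuse to send if STARTTLS is unavailable
    sendResolved: true
```

Check which port and TLS mode your mail provider actually supports before switching.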

Create an alert rule

Test rule

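    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: test-email-alert
      namespace: monitoring
    spec:
      groups:
      - name: test-email
        rules:
        - alert: TestEmailAlert
          expr: vector(1) # always true
          for: 0m
          labels:
            # With namespace-scoped configuration the alert needs a namespace label.
            # This test expression is always true and is not based on any metric,
            # so no namespace label is inherited and it must be set by hand.
            # The value has to equal the AlertmanagerConfig namespace (monitoring here).
            namespace: monitoring
            severity: critical
          annotations:
            summary: "Test email alert"
            description: "This is a test alert from a PrometheusRule"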

If the alert email arrives, the setup works.
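The test rule sets the `namespace` label by hand only because `vector(1)` has no labels of its own. A rule built on a real metric normally inherits the label from the series. The following sketch assumes targets are scraped through ServiceMonitor or PodMonitor objects, so the `up` series already carries `namespace`, `job`, and `instance` labels; the rule and group names are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: target-down          # hypothetical name
  namespace: monitoring
spec:
  groups:
  - name: availability       # hypothetical group name
    rules:
    - alert: TargetDown
      expr: up == 0          # namespace label comes from the scraped series
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Target {{ $labels.instance }} is down"
        description: "Job {{ $labels.job }} in namespace {{ $labels.namespace }} has been unreachable for 5 minutes"
```

With the namespace-scoped configuration, such an alert reaches the `email` receiver only when the target's namespace is `monitoring`. The Prometheus resource also has to select this PrometheusRule through its `ruleSelector`.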