Integrating DingTalk alerts with Prometheus via Alertmanager

Download the binary packages: Download | Prometheus

Prerequisites

  • Prometheus
  • Alertmanager
  • Alerting rules already configured
  • At least one monitored node or service

Create the alert robot

Create a group chat


Add a robot


Set the security setting to "Sign" (加签), then record the Webhook URL and the signing secret.
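With signing enabled, every request to the robot must carry a timestamp and a signature. prometheus-webhook-dingtalk computes both from the configured secret, so the following is only a sketch for sending one manual test message to confirm that the Webhook and secret were recorded correctly. It assumes the signing scheme from DingTalk's custom robot documentation (HMAC-SHA256 over "timestamp\nsecret", Base64-encoded, then URL-encoded) and that openssl, base64, and sed are installed. The function name dingtalk_sign and the placeholder values are illustrative.

```shell
# Sketch: compute the URL-encoded signature for a DingTalk robot request.
# Arguments: $1 = timestamp in milliseconds, $2 = signing secret.
dingtalk_sign() {
  # The string to sign is "<timestamp>\n<secret>", keyed with the secret itself.
  printf '%s\n%s' "$1" "$2" \
    | openssl dgst -sha256 -hmac "$2" -binary \
    | base64 \
    | sed -e 's/+/%2B/g' -e 's/\//%2F/g' -e 's/=/%3D/g'
}

# Example usage (placeholders, not real credentials):
#   webhook='https://oapi.dingtalk.com/robot/send?access_token=xxxx'
#   ts=$(( $(date +%s) * 1000 ))
#   sign=$(dingtalk_sign "$ts" 'SECxxxx')
#   curl -sS -H 'Content-Type: application/json' \
#     -d '{"msgtype":"text","text":{"content":"signature test"}}' \
#     "${webhook}&timestamp=${ts}&sign=${sign}"
```

If the robot rejects the request, the usual suspects are a stale timestamp or a secret copied with extra whitespace.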


Install dingtalk-webhook

Download: Releases · timonwong/prometheus-webhook-dingtalk (github.com)
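As a sketch of fetching the release from the command line instead of the browser, the function below builds the asset URL for a given version. It assumes GitHub's usual releases/download/v<version>/<asset> layout and the linux-amd64 asset name used in the install step. release_url is a hypothetical helper, so check the Releases page if the download fails.

```shell
# Sketch: build the download URL for a given release version (assumed layout).
release_url() {
  version="$1"
  echo "https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v${version}/prometheus-webhook-dingtalk-${version}.linux-amd64.tar.gz"
}

# Example usage:
#   curl -fL -O "$(release_url 2.1.0)"
```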

Install

tar zxvf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz -C /usr/local/prometheus
mv /usr/local/prometheus/prometheus-webhook-dingtalk-2.1.0.linux-amd64 /usr/local/prometheus/dingtalk

Modify the configuration files

Configure the alert message template

vim /usr/local/prometheus/dingtalk/default.tmpl

{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}


{{ define "__alert_list" }}{{ range . }}
---

{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**Summary**: {{ .Annotations.summary }}

**Alert name**: {{ .Labels.alertname }}

**Severity**: {{ .Labels.severity }}

**Instance**: {{ .Labels.instance }}

**Description**: {{ index .Annotations "description" }}

**Started at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}

{{ define "__resolved_list" }}{{ range . }}
---

{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**Summary**: {{ .Annotations.summary }}

**Alert name**: {{ .Labels.alertname }}

**Severity**: {{ .Labels.severity }}

**Instance**: {{ .Labels.instance }}

**Description**: {{ index .Annotations "description" }}

**Started at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}

**Resolved at**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}


{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}

{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**====Detected {{ .Alerts.Firing | len }} firing alert(s)====**

{{ template "__alert_list" .Alerts.Firing }}
---

{{ end }}

{{ if gt (len .Alerts.Resolved) 0 }}
**====Resolved {{ .Alerts.Resolved | len }} alert(s)====**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}


{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
{{ template "default.title" . }}
{{ template "default.content" . }}

Configure the DingTalk robot

vim /usr/local/prometheus/dingtalk/config.yml

## Request timeout
# timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
templates:
- /usr/local/prometheus/dingtalk/default.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
# title: '{{ template "legacy.title" . }}'
# text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:
    # Webhook URL recorded when the robot was created (contains the access token)
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxx
    # Signing secret
    secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
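The steps so far configure the service but never start it, while the Alertmanager receiver configured in the next step expects it to listen on port 8060. One way to run it is as a systemd unit. The sketch below writes such a unit to a path you choose. The unit contents, the Restart policy, and the helper write_dingtalk_unit are assumptions for illustration; the paths follow the install step, and the --config.file and --web.listen-address flags should be confirmed against the binary's --help output.

```shell
# Sketch: write a systemd unit for prometheus-webhook-dingtalk to the given path.
write_dingtalk_unit() {
  unit_path="$1"
  cat > "$unit_path" <<'EOF'
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
ExecStart=/usr/local/prometheus/dingtalk/prometheus-webhook-dingtalk \
  --config.file=/usr/local/prometheus/dingtalk/config.yml \
  --web.listen-address=:8060
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example usage (run as root; the unit name "dingtalk" is an assumption):
#   write_dingtalk_unit /etc/systemd/system/dingtalk.service
#   systemctl daemon-reload
#   systemctl enable --now dingtalk
```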

Modify the Alertmanager configuration file

vim /usr/local/prometheus/alertmanager/alertmanager.yml

route:
  group_by: ['dingtalk']
  group_wait: 1s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'dingtalk.webhook1'
  routes:
    # Send every alert to the DingTalk receiver
    - receiver: "dingtalk.webhook1"
      match_re:
        alertname: ".*"
receivers:
  - name: 'dingtalk.webhook1'
    webhook_configs:
      # "webhook1" must match the target name in dingtalk/config.yml
      - url: 'http://localhost:8060/dingtalk/webhook1/send'
        send_resolved: true
inhibit_rules:
  # Suppress warning alerts while a matching critical alert is firing
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
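Because indentation mistakes in this file are easy to make, it can help to validate it before restarting. The sketch below wraps amtool's check-config subcommand. It assumes amtool sits next to the alertmanager binary under /usr/local/prometheus/alertmanager (it ships in the same tarball); the AMTOOL override and the function name are illustrative.

```shell
# Sketch: validate an Alertmanager config file with amtool before restarting.
check_alertmanager_config() {
  cfg="$1"
  amtool_bin="${AMTOOL:-/usr/local/prometheus/alertmanager/amtool}"
  if [ ! -x "$amtool_bin" ]; then
    echo "amtool not found at $amtool_bin" >&2
    return 127
  fi
  "$amtool_bin" check-config "$cfg"
}

# Example usage:
#   check_alertmanager_config /usr/local/prometheus/alertmanager/alertmanager.yml
```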

Restart Alertmanager

systemctl restart alertmanager

Verify

Open the Alertmanager status page at http://ip:9093/#/status and confirm that the new configuration has been loaded.
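Rather than waiting for a real outage, a synthetic alert can be posted straight to Alertmanager to exercise the route to DingTalk. The sketch below assumes Alertmanager's v2 HTTP API at /api/v2/alerts on localhost:9093. The alert name ManualTest, the label values, and both helper functions are made up for illustration.

```shell
# Sketch: build a one-element alert list in the JSON shape the v2 API accepts.
build_test_alert() {
  instance="${1:-test-host:9100}"
  cat <<EOF
[{"labels":{"alertname":"ManualTest","severity":"warning","instance":"${instance}"},
  "annotations":{"summary":"Manual test alert for ${instance}","description":"Sent by hand to check the DingTalk route"}}]
EOF
}

# Post the synthetic alert to Alertmanager ($1 = base URL, $2 = instance label).
send_test_alert() {
  am_url="${1:-http://localhost:9093}"
  build_test_alert "$2" \
    | curl -sS -X POST -H 'Content-Type: application/json' \
        --data-binary @- "${am_url}/api/v2/alerts"
}

# Example usage:
#   send_test_alert http://localhost:9093 demo-node:9100
```

Since no end time is set, Alertmanager should treat the alert as resolved on its own after its configured resolve timeout.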


Test

Pick a node or service to take down. Here, the node_exporter service on the current node is stopped:

systemctl stop node_exporter

This triggers the following alerting rule:

- alert: ServerDown
  expr: up == 0
  for: 1s
  labels:
    severity: critical
  annotations:
    summary: "{{$labels.instance}} is down, please handle it as soon as possible!"
    description: "node_exporter on {{$labels.instance}} has been stopped; current value of up: {{ $value }}."
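The rule above is only a fragment; in a Prometheus rules file it sits under a group's rules list. As a sketch of that placement, the helper below reads a fragment from stdin and wraps it in a minimal group, so the result can be checked with promtool check rules. The group name node-alerts and the helper wrap_rule_group are assumptions, not something the original setup defines.

```shell
# Sketch: wrap a rule fragment (read from stdin) in a minimal rule group.
wrap_rule_group() {
  group_name="${1:-node-alerts}"
  printf 'groups:\n  - name: %s\n    rules:\n' "$group_name"
  # Indent the fragment so it becomes an item of the "rules" list.
  sed 's/^/      /'
}

# Example usage (rule.yml holds the fragment; file names are placeholders):
#   wrap_rule_group node-alerts < rule.yml > /tmp/rules.yml
#   promtool check rules /tmp/rules.yml
```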

After a short wait, the alert arrives in the DingTalk group.
