Chapter 1: Foreword

The situation today: most Kubernetes monitoring setups are built from Prometheus + node-exporter + Grafana + AlertManager, and virtually all of the reference material online deploys this stack inside the k8s cluster.

The new requirement: keep the same Prometheus + node-exporter + Grafana + AlertManager stack, but, for various reasons, deploy it outside the k8s cluster.

1. Monitoring deployed inside the k8s cluster: see https://bbs.huaweicloud.com/blogs/detail/303137; I previously followed that document and reproduced the dashboards shown below.

2. Monitoring deployed outside the k8s cluster: follow the deployment guide below.

Chapter 2: Basic Information

| Service | Version | Download URL |
| --- | --- | --- |
| Kubernetes | 1.20.6 | provided by the cloud vendor |
| Prometheus | 2.27.1 | https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz |
| node-exporter | 1.1.2 | https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz |
| Grafana | 8.3.1 | https://dl.grafana.com/oss/release/grafana-8.3.1.linux-amd64.tar.gz |
| AlertManager | 0.22.2 | https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz |
| prometheus-webhook-dingtalk | 1.4.0 | https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz |

Chapter 3: Grafana Dashboards

1. Grafana dashboard example 1

http://191.10.10.10:3000/d/9CWBz0bik/uk8sjian-kong-da-ping-master-nodezi-yuan-xiang-qing?orgId=1

2. Grafana dashboard example 2

http://191.10.10.10:3000/d/PwMJtdvnz/1-k8s-for-prometheus-dashboard-20211010?orgId=1

3. Grafana dashboard example 3

Chapter 4: Prometheus

1. About Prometheus

Prometheus is an open-source monitoring and alerting toolkit: it scrapes metrics from HTTP endpoints on a fixed interval, stores them as time series, and evaluates alerting rules against them. In this setup it runs on a host outside the cluster and talks to the Kubernetes API server only for service discovery and proxied scrapes.

 

2. Deploying Prometheus

# Download
wget -c https://github.com/prometheus/prometheus/releases/download/v2.27.1/prometheus-2.27.1.linux-amd64.tar.gz
# Extract
tar xvfz prometheus-*.tar.gz
cd prometheus-*

3. The Prometheus configuration file

cat prometheus.yml

global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:
  - job_name: 'prometheus_server'
    static_configs:
      - targets: ['192.168.9.50:9090']

  - job_name: 'web_status'
    metrics_path: /probe
    params:
      module: [http_2xx] # Look for a HTTP 200 response.
    static_configs:
      - targets:
          - http://prometheus.io    # Target to probe with http.
          - https://prometheus.io   # Target to probe with https.
          - http://example.com:8080 # Target to probe with http on port 8080.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.14.150:9115 # The blackbox exporter's real hostname:port.

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    scheme: https
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Keep only the endpoints of the 'kubernetes' service in the 'default' namespace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: default;kubernetes;https
        replacement: $1
        target_label: __address__
        action: keep
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 192.168.14.150:6443
        action: replace

  - job_name: 'kubernetes-scheduler'
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    scheme: http
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Keep only the endpoints of the 'kubernetes' service in the 'default' namespace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: default;kubernetes;https
        replacement: $1
        target_label: __address__
        action: keep
      # Rewrite the apiserver address to the scheduler port
      - source_labels: [__address__]
        separator: ;
        regex: '(.*):6443'
        target_label: __address__
        replacement: '${1}:10251'
        action: replace

  - job_name: 'kubernetes-controller-manager'
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    scheme: http
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Keep only the endpoints of the 'kubernetes' service in the 'default' namespace
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: default;kubernetes;https
        replacement: $1
        target_label: __address__
        action: keep
      # Rewrite the apiserver address to the controller-manager port
      - source_labels: [__address__]
        separator: ;
        regex: '(.*):6443'
        target_label: __address__
        replacement: '${1}:10252'
        action: replace

  - job_name: 'kubernetes-node-all'
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: node
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Rewrite each node's kubelet address to node-exporter's port 9100
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      #- action: labelmap
      #  regex: __meta_kubernetes_node_label_(.+)

  - job_name: kubernetes-node-kubelet
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: node
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      # Rewrite the kubelet address to its read-only port
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:10255'
        target_label: __address__
        action: replace

  - job_name: kubernetes-cadvisor
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: node
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 192.168.14.150:6443
        action: replace
      # Scrape cAdvisor through the apiserver proxy
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        action: replace
    metric_relabel_configs:
      - source_labels: [instance]
        separator: ;
        regex: (.+)
        target_label: node
        replacement: $1
        action: replace

  - job_name: 'kubernetes-pods'
    scheme: https
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: pod
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__address__]
        separator: ;
        regex: '.*:(.*)'
        target_label: __pod_port__
        replacement: $1
        action: replace
      # Scrape pods through the apiserver proxy
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_name, __pod_port__]
        separator: ;
        regex: (.*);(.*);(.*)
        target_label: __metrics_path__
        replacement: /api/v1/namespaces/$1/pods/$2:$3/proxy/metrics
        action: replace
      - source_labels: [__address__]
        separator: ;
        regex: (.*)
        target_label: __address__
        replacement: 192.168.14.150:6443
        action: replace

  - job_name: 'kubernetes-service-endpoints'
    scheme: http
    kubernetes_sd_configs:
      - api_server: https://192.168.14.150:6443/
        role: endpoints
        bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
        tls_config:
          insecure_skip_verify: true
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/monitor/prometheus-2.27.1.linux-amd64/uk8s_token
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_service_name

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.9.50:9093']

rule_files:
  - "/data/monitor/prometheus-2.27.1.linux-amd64/rules/*.yml"
  #- "second_rules.yml"

4. Configuring Prometheus to start on boot

cat /etc/systemd/system/prometheus.service

[Unit]
Description=prometheus
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/monitor/prometheus-2.27.1.linux-amd64/prometheus --config.file=/data/monitor/prometheus-2.27.1.linux-amd64/prometheus.yml --web.enable-lifecycle --storage.tsdb.path=/data/monitor/prometheus-2.27.1.linux-amd64/data
Restart=on-failure

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl restart prometheus.service
systemctl enable prometheus.service
systemctl status prometheus.service
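A quick sanity check, using the paths above (promtool ships in the Prometheus tarball, and the reload endpoint works because the unit passes --web.enable-lifecycle):

# Validate the configuration before (re)starting
/data/monitor/prometheus-2.27.1.linux-amd64/promtool check config /data/monitor/prometheus-2.27.1.linux-amd64/prometheus.yml
# Confirm the server is up, then hot-reload after config edits
curl -s http://192.168.9.50:9090/-/healthy
curl -X POST http://192.168.9.50:9090/-/reload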

Chapter 5: node-exporter

1. About node-exporter

node-exporter exposes hardware and OS metrics (CPU, memory, disk, network) on port 9100; the 'kubernetes-node-all' job in Chapter 4 scrapes it on every cluster node.

 

2. Deploying node-exporter
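A minimal sketch, using the v1.1.2 release from the table in Chapter 2 and the same /data/monitor layout as the other components (run this on every k8s node):

wget -c https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
tar xf node_exporter-1.1.2.linux-amd64.tar.gz
cd node_exporter-1.1.2.linux-amd64/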

 

3. node-exporter configuration
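node_exporter has no configuration file; it is driven entirely by command-line flags, and the defaults are sufficient here. It listens on :9100, which is exactly the port the 'kubernetes-node-all' job rewrites node addresses to:

# Equivalent to the default; shown only to make the port explicit
./node_exporter --web.listen-address=":9100"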

 

4. Configuring node-exporter to start on boot
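A sketch of a unit mirroring the other services; the install path is an assumption:

cat /etc/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/monitor/node_exporter-1.1.2.linux-amd64/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable node_exporter.service
systemctl restart node_exporter.service
systemctl status node_exporter.service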

Chapter 6: Grafana

1. About Grafana

Grafana is an open-source visualization platform; here it renders the Kubernetes dashboards shown in Chapter 3 from the Prometheus data source.

2. Deploying Grafana

wget -c https://dl.grafana.com/oss/release/grafana-8.3.1.linux-amd64.tar.gz
tar -zxvf grafana-8.3.1.linux-amd64.tar.gz
cd grafana-8.3.1/
# The systemd unit below expects the install directory at /data/monitor/grafana,
# so move or symlink the extracted directory there.

 

 

3. Grafana configuration

The defaults work as-is for this setup: Grafana listens on port 3000 (as in the dashboard URLs in Chapter 3) and the initial login is admin/admin; any overrides go into conf/ under the install directory.
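Optionally, the Prometheus data source can be provisioned from a file instead of being added through the UI. A minimal sketch, assuming the default provisioning directory of a tarball install and the Prometheus address from Chapter 4 (the file name is arbitrary):

cat /data/monitor/grafana/conf/provisioning/datasources/prometheus.yaml

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://192.168.9.50:9090
    isDefault: true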

 

4. Configuring Grafana to start on boot

cat /etc/systemd/system/grafana.service

[Unit]
Description=grafana_service
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/monitor/grafana/bin/grafana-server -homepath /data/monitor/grafana
Restart=on-failure

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable grafana.service
systemctl start grafana.service
systemctl status grafana.service
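A quick health check against Grafana's API (the address is the one used by the dashboard links in Chapter 3):

curl -s http://191.10.10.10:3000/api/health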

 

 

Chapter 7: AlertManager

1. About AlertManager

AlertManager receives alerts fired by Prometheus, deduplicates and groups them, and routes them to notification receivers, in this setup the DingTalk webhook from Chapter 8.

 

2. Deploying AlertManager

wget -c https://github.com/prometheus/alertmanager/releases/download/v0.22.2/alertmanager-0.22.2.linux-amd64.tar.gz
tar xf alertmanager-0.22.2.linux-amd64.tar.gz
cd alertmanager-0.22.2.linux-amd64/

3. AlertManager configuration
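A minimal alertmanager.yml that routes every alert to the prometheus-webhook-dingtalk service deployed in Chapter 8. The webhook listens on :8060 by default and exposes each target from its config.yml under /dingtalk/&lt;target&gt;/send; the host below assumes it runs alongside AlertManager on the monitor machine:

cat /data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager.yml

global:
  resolve_timeout: 5m

route:
  receiver: dingtalk
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: dingtalk
    webhook_configs:
      # 'webhook_legacy' is one of the targets defined in Chapter 8's config.yml
      - url: http://192.168.9.50:8060/dingtalk/webhook_legacy/send
        send_resolved: true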

 

4. Configuring AlertManager to start on boot

vim /etc/systemd/system/alertmanager.service

[Unit]
Description=alertmanager
After=network.target

[Service]
Type=simple
User=root
ExecStart=/data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager --config.file=/data/monitor/alertmanager-0.22.2.linux-amd64/alertmanager.yml --storage.path="/data/monitor/alertmanager-0.22.2.linux-amd64/data/" --log.format=logfmt
Restart=on-failure

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable alertmanager.service
systemctl restart alertmanager.service
systemctl status alertmanager.service
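AlertManager exposes the same style of health endpoint as Prometheus, and amtool ships in the same tarball:

curl -s http://192.168.9.50:9093/-/healthy
# Print the configuration the running server actually loaded
/data/monitor/alertmanager-0.22.2.linux-amd64/amtool config show --alertmanager.url=http://192.168.9.50:9093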

Chapter 8: prometheus-webhook-dingtalk

1. About prometheus-webhook-dingtalk

prometheus-webhook-dingtalk bridges AlertManager and DingTalk group robots: it accepts AlertManager webhook notifications, renders them through Go templates, and forwards the result to the DingTalk robot API.

 

2. Deploying prometheus-webhook-dingtalk

wget -c https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
tar xf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
cd prometheus-webhook-dingtalk-1.4.0.linux-amd64/

 

3. Configuring prometheus-webhook-dingtalk

cp config.example.yml config.yml
cat config.yml

## Request timeout
# timeout: 5s

## Customizable templates path
templates:
  - /data/monitor/prometheus-webhook-dingtalk-1.4.0/contrib/templates/legacy/default2.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
# default_message:
#   title: '{{ template "legacy.title" . }}'
#   text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  test:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']
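To verify the robot wiring end to end before connecting AlertManager, you can post a hand-built payload in AlertManager's webhook format straight to the service. A sketch, assuming the default listen address :8060 and the 'test' target defined above (the alert content is made up):

curl -s -H 'Content-Type: application/json' \
  -d '{"version":"4","status":"firing","groupLabels":{"alertname":"TestAlert"},"commonAnnotations":{"summary":"test","description":"hand-built test alert"},"alerts":[{"status":"firing","labels":{"alertname":"TestAlert","instance":"demo-host"},"annotations":{"summary":"test","description":"hand-built test alert"},"startsAt":"2021-12-01T00:00:00Z"}]}' \
  http://localhost:8060/dingtalk/test/send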

cat contrib/templates/legacy/template.tmpl

{{ define "ding.link.content2" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}

{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}

{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }}> - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
{{ end }}{{ end }}

{{/* Firing alerts */}}
{{ define "default.__text_alert_list" }}{{ range . }}
**————————————————**
**Alert time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Alert type:** {{ .Annotations.summary }}
**Host:** {{ .Labels.instance }}
**Details:** {{ .Annotations.description }}
**————————————————**
{{ end }}
{{ end }}

{{/* Resolved alerts */}}
{{ define "default.__text_resolved_list" }}{{ range . }}
**————————————————**
**Alert time:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Recovery time:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
**Host:** {{ .Labels.instance }}
**Details:** {{ .Annotations.description }}
**————————————————**
{{ end }}
{{ end }}

{{/* Default */}}
{{ define "default.title" }}{{ template "__subject" . }}{{ end }}
{{ define "default.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ if gt (len .Alerts.Firing) 0 -}}
![Firing-img](http://m.qpic.cn/psc?/V51kUUGn0MdYtz4DkTPa4Pbrm40LkcRa/TmEUgtj9EK6.7V8ajmQrEFTSeNBSwZrpOeKH*wfJrUH*bq5wvFpRL5ZUVtNN73JYtEhtV4He5iNFDbVZLe.S1dtnf6OeIiVqbCOthMY0Pv0!/b&bo=OgJcAAAAAAADF1Y!&rf=viewer_4)
**Firing alerts**
{{ template "default.__text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
![Resolved-img](http://m.qpic.cn/psc?/V51kUUGn0MdYtz4DkTPa4Pbrm40LkcRa/TmEUgtj9EK6.7V8ajmQrEEdthRxYCYVef54h2YlrRZXxd9Y8aCW30HAv53MXawIp2uL7ClzTjC76hjfa5R6buAPPGk9X35.sPY4Z0GWE0Z4!/b&bo=OgJcAAAAAAADF1Y!&rf=viewer_4)
**Resolved alerts**
{{ template "default.__text_resolved_list" .Alerts.Resolved }}
{{- end }}
{{- end }}

{{/* Legacy */}}
{{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
{{ define "legacy.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}

{{/* Following names for compatibility */}}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}

4. Configuring prometheus-webhook-dingtalk to start on boot

cat /etc/systemd/system/prometheus-webhook-dingtalk.service

[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/data/monitor/prometheus-webhook-dingtalk-1.4.0/prometheus-webhook-dingtalk --config.file=/data/monitor/prometheus-webhook-dingtalk-1.4.0/config.yml

[Install]
WantedBy=multi-user.target

systemctl daemon-reload
systemctl enable prometheus-webhook-dingtalk.service
systemctl restart prometheus-webhook-dingtalk.service
systemctl status prometheus-webhook-dingtalk.service

tail -200f /var/log/messages

 
