kube-prometheus

Many sources describe the Prometheus Operator as the definitive monitoring solution for Kubernetes clusters, but the Operator alone no longer ships the complete stack; the full solution is now kube-prometheus. Project repository:
https://github.com/coreos/kube-prometheus

Installation

Download the source

#git clone https://github.com/coreos/kube-prometheus.git

List the manifest files

#cd manifests
#ls
00namespace-namespace.yaml                                         node-exporter-clusterRole.yaml
0prometheus-operator-0alertmanagerCustomResourceDefinition.yaml    node-exporter-daemonset.yaml
0prometheus-operator-0prometheusCustomResourceDefinition.yaml      node-exporter-serviceAccount.yaml
0prometheus-operator-0prometheusruleCustomResourceDefinition.yaml  node-exporter-serviceMonitor.yaml
0prometheus-operator-0servicemonitorCustomResourceDefinition.yaml  node-exporter-service.yaml
0prometheus-operator-clusterRoleBinding.yaml                       prometheus-adapter-apiService.yaml
0prometheus-operator-clusterRole.yaml                              prometheus-adapter-clusterRoleAggregatedMetricsReader.yaml
0prometheus-operator-deployment.yaml                               prometheus-adapter-clusterRoleBindingDelegator.yaml
0prometheus-operator-serviceAccount.yaml                           prometheus-adapter-clusterRoleBinding.yaml
0prometheus-operator-serviceMonitor.yaml                           prometheus-adapter-clusterRoleServerResources.yaml
0prometheus-operator-service.yaml                                  prometheus-adapter-clusterRole.yaml
alertmanager-alertmanager.yaml                                     prometheus-adapter-configMap.yaml
alertmanager-secret.yaml                                           prometheus-adapter-deployment.yaml
alertmanager-serviceAccount.yaml                                   prometheus-adapter-roleBindingAuthReader.yaml
alertmanager-serviceMonitor.yaml                                   prometheus-adapter-serviceAccount.yaml
alertmanager-service.yaml                                          prometheus-adapter-service.yaml
grafana-dashboardDatasources.yaml                                  prometheus-clusterRoleBinding.yaml
grafana-dashboardDefinitions.yaml                                  prometheus-clusterRole.yaml
grafana-dashboardSources.yaml                                      prometheus-prometheus.yaml
grafana-deployment.yaml                                            prometheus-roleBindingConfig.yaml
grafana-serviceAccount.yaml                                        prometheus-roleBindingSpecificNamespaces.yaml
grafana-serviceMonitor.yaml                                        prometheus-roleConfig.yaml
grafana-service.yaml                                               prometheus-roleSpecificNamespaces.yaml
kube-state-metrics-clusterRoleBinding.yaml                         prometheus-rules.yaml
kube-state-metrics-clusterRole.yaml                                prometheus-serviceAccount.yaml
kube-state-metrics-deployment.yaml                                 prometheus-serviceMonitorApiserver.yaml
kube-state-metrics-roleBinding.yaml                                prometheus-serviceMonitorCoreDNS.yaml
kube-state-metrics-role.yaml                                       prometheus-serviceMonitorKubeControllerManager.yaml
kube-state-metrics-serviceAccount.yaml                             prometheus-serviceMonitorKubelet.yaml
kube-state-metrics-serviceMonitor.yaml                             prometheus-serviceMonitorKubeScheduler.yaml
kube-state-metrics-service.yaml                                    prometheus-serviceMonitor.yaml
node-exporter-clusterRoleBinding.yaml                              prometheus-service.yaml

In prometheus-serviceMonitorKubelet.yaml, change the port from https-metrics to http-metrics, and change the scheme to http:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
spec:
  endpoints:
  - port: http-metrics
    scheme: http  # many guides omit the scheme field
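The edit can also be scripted with sed. A minimal sketch, assuming the stock manifest carries `port: https-metrics` and `scheme: https` under `spec.endpoints` (demonstrated here on a stand-in file; in practice point sed at manifests/prometheus-serviceMonitorKubelet.yaml):

```shell
# Stand-in for manifests/prometheus-serviceMonitorKubelet.yaml (assumed layout)
cat > kubelet-sm-demo.yaml <<'EOF'
spec:
  endpoints:
  - port: https-metrics
    scheme: https
EOF
# Switch the scraped endpoint from HTTPS to plain HTTP
sed -i 's/port: https-metrics/port: http-metrics/; s/scheme: https/scheme: http/' kubelet-sm-demo.yaml
cat kubelet-sm-demo.yaml
```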

In alertmanager-service.yaml, add a nodePort of 30093:

apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30093
  type: NodePort
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP

In grafana-service.yaml, add a nodePort of 32000:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 32000
  type: NodePort
  selector:
    app: grafana

In prometheus-service.yaml, add a nodePort of 30090:

apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30090
  type: NodePort
  selector:
    app: prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
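The three ports chosen above (30093, 32000, 30090) must all fall inside kube-apiserver's default NodePort range of 30000-32767 (the `--service-node-port-range` flag), or the apply will be rejected. A quick sanity check:

```shell
# Default --service-node-port-range is 30000-32767; verify the chosen ports fit
for p in 30093 32000 30090; do
  if [ "$p" -ge 30000 ] && [ "$p" -le 32767 ]; then
    echo "nodePort $p: ok"
  else
    echo "nodePort $p: outside 30000-32767" >&2
    exit 1
  fi
done
```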

Create the resources. The first pass will complain that some resource kinds do not exist, because the custom resources are applied before their CRDs have finished registering, so run the command twice:

#kubectl apply -f .

Check the custom resource definitions (CRDs):

#kubectl get crd | grep coreos
alertmanagers.monitoring.coreos.com           2019-06-03T09:17:48Z
prometheuses.monitoring.coreos.com            2019-06-03T09:17:48Z
prometheusrules.monitoring.coreos.com         2019-06-03T09:17:48Z
servicemonitors.monitoring.coreos.com         2019-06-03T09:17:48Z

Check the newly created pods:

#kubectl -n monitoring  get pods  -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP               NODE            NOMINATED NODE   READINESS GATES
alertmanager-main-0                    2/2     Running   0          16h   10.244.196.134   node01          <none>           <none>
alertmanager-main-1                    2/2     Running   0          15h   10.244.241.204   ingressnode02   <none>           <none>
alertmanager-main-2                    2/2     Running   0          15h   10.244.114.4     node05          <none>           <none>
grafana-69c7b8468d-l8p2b               1/1     Running   0          16h   10.244.17.198    prometheus01    <none>           <none>
kube-state-metrics-65b5ccc84-kwfgh     4/4     Running   0          15h   10.244.17.199    prometheus01    <none>           <none>
node-exporter-62mkc                    2/2     Running   0          16h   22.22.3.235      master02        <none>           <none>
node-exporter-6bsrb                    2/2     Running   0          16h   22.22.3.239      node04          <none>           <none>
node-exporter-8b5h8                    2/2     Running   0          16h   22.22.3.241      prometheus01    <none>           <none>
node-exporter-chssb                    2/2     Running   0          16h   22.22.3.243      ingressnode02   <none>           <none>
node-exporter-dwqkc                    2/2     Running   0          16h   22.22.3.240      node05          <none>           <none>
node-exporter-kf2cr                    2/2     Running   0          16h   22.22.3.242      ingressnode01   <none>           <none>
node-exporter-krsm4                    2/2     Running   0          16h   22.22.3.238      node03          <none>           <none>
node-exporter-lv4gx                    2/2     Running   0          16h   22.22.3.236      node01          <none>           <none>
node-exporter-v5f9v                    2/2     Running   0          16h   22.22.3.234      master01        <none>           <none>
node-exporter-zgsr2                    2/2     Running   0          16h   22.22.3.237      node02          <none>           <none>
prometheus-adapter-6c75d8686d-gq8bn    1/1     Running   0          16h   10.244.17.197    prometheus01    <none>           <none>
prometheus-k8s-0                       3/3     Running   1          16h   10.244.140.68    node02          <none>           <none>
prometheus-k8s-1                       3/3     Running   1          16h   10.244.248.198   node04          <none>           <none>
prometheus-operator-74d449f6b4-q6bjn   1/1     Running   0          16h   10.244.17.196    prometheus01    <none>           <none>

Confirm that the web UIs all open correctly.


Configuring Prometheus

Expand the Status menu and open Targets. Only two scrape jobs, kube-scheduler and kube-controller-manager, have no targets; this comes down to their ServiceMonitor objects.



Looking at prometheus-serviceMonitorKubeScheduler.yaml: the selector matches Service labels, but there is no Service in the kube-system namespace labeled k8s-app=kube-scheduler:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: http-metrics
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler

Create a new prometheus-kubeSchedulerService.yaml:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler # matches the selector in the ServiceMonitor
spec:
  selector:
    component: kube-scheduler # matches the label on the kube-scheduler pods
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP

Create the kube-scheduler Service:

#kubectl apply -f prometheus-kubeSchedulerService.yaml 

Likewise, create a new prometheus-kubeControllerManagerService.yaml:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

Create the kube-controller-manager Service:

#kubectl apply -f prometheus-kubeControllerManagerService.yaml

Confirm that all targets are now up.


Configuring Grafana

Log in with admin/admin and change the password.
The data source is already wired to Prometheus.


Custom monitoring targets

Take monitoring etcd as an example.



Save the required etcd certificates into a Secret named etcd-certs:

# kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt  --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key  --from-file=/etc/kubernetes/pki/etcd/ca.crt 
secret/etcd-certs created
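With `--from-file=<path>` and no explicit key, kubectl uses each file's basename as the key inside the Secret, which is why the certificates show up under exactly these names inside the pod later on. The naming rule in isolation (the paths are the same ones passed to `kubectl create secret` above):

```shell
# kubectl create secret generic --from-file=<path> keys each entry
# by the file's basename, demonstrated here with basename alone
for f in /etc/kubernetes/pki/etcd/healthcheck-client.crt \
         /etc/kubernetes/pki/etcd/healthcheck-client.key \
         /etc/kubernetes/pki/etcd/ca.crt; do
  basename "$f"
done
```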

Modify the Prometheus resource k8s by adding a secrets field to prometheus-prometheus.yaml:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.7.2

Apply prometheus-prometheus.yaml:

#kubectl apply -f prometheus-prometheus.yaml 

Check inside the pod that the certificates were mounted:

# kubectl -n monitoring  exec -it prometheus-k8s-0 /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
# ls -l /etc/prometheus/secrets/etcd-certs/
total 0
lrwxrwxrwx    1 root     root            13 Jun  4 09:12 ca.crt -> ..data/ca.crt
lrwxrwxrwx    1 root     root            29 Jun  4 09:12 healthcheck-client.crt -> ..data/healthcheck-client.crt
lrwxrwxrwx    1 root     root            29 Jun  4 09:12 healthcheck-client.key -> ..data/healthcheck-client.key
/prometheus $ cat /etc/prometheus/secrets/etcd-certs/ca.crt 
-----BEGIN CERTIFICATE-----
MIIC9zCCAd+gAwIBAgIJAMiN3pOWJVGOMA0GCSqGSIb3DQEBCwUAMBIxEDAOBgNV
BAMMB2V0Y2QtY2EwHhcNMTkwNTI3MDgzNDExWhcNMzkwNTIyMDgzNDExWjASMRAw
DgYDVQQDDAdldGNkLWNhMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA
rG1xQcAwZ67XXG84PzqIIqoqnq/zM3Ru+02PELbzgiZ4MrNPte32vZuj6HK/JDDQ
nEirgnQQxQJ6OxvnDrFVwyxveNI8jrd+FRfuh2ae0NIiqkWk88O42OioACBW6cJA
hILpIcn066+E+t2vh/3TmqMduV8eY5p8VAwRT1B04fJAQVcr0sJh3JXExppbtdWL
Z0T25QTbbbZ/I6oxLMu/NkS171R5l397rSpD2ox0NV0GASoqiitffPznOHBPa1Zs
UwOlQnZlWaBM5XQHFhRQTG/Bxxhe45azmmPT3DGCpATk+/GnYDPnt4TSZiX9gZ6O
beRsGUzPDrX/LOEV/Uv+VQIDAQABo1AwTjAdBgNVHQ4EFgQUxQl8C8RdG+tU2U+T
gy901tOxUNUwHwYDVR0jBBgwFoAUxQl8C8RdG+tU2U+Tgy901tOxUNUwDAYDVR0T
BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOCAQEAica5i0wN9ZuCICQOGwMcuVgadBqV
w4dOyP4EPyD2SKx3YpYREMGXOafYkrX2rWKqsCBqS9xUT34x2DQ4/KuoPY/Ee37h
pJ+/i47sq8pmiHxqQRUACyGA6SqWtcApfW62+O97qHnRtyUcCftKKLYEu3djzTJd
FOn6xPehbFzhL9H4tsiZ+kFaXqWDUbhSCAd/LeJ+dxzmOE+Rd0hsPHIyzdmWUKwe
CTkSaf9X4KPWjBUCqPzB/Td6Mz3HHg8zZo2FgkyI98a7c83rHl3aTfBJEi4LND8x
PTFwgOGNlZXa6OnUmkn/sHvoNc88EqDm/GjPI6xfLr7BSWE4jJCIwWROvg==
-----END CERTIFICATE-----

Create a ServiceMonitor named etcd-k8s in prometheus-serviceMonitorEtcd.yaml (note that the tlsConfig paths must match the Secret keys mounted above, i.e. .crt, not .cert):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd-k8s
  name: etcd-k8s
  namespace: monitoring
spec:
  endpoints:
  - port: port  # must match the port name in the etcd Service below
    interval: 30s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd

Apply prometheus-serviceMonitorEtcd.yaml:

#kubectl apply -f prometheus-serviceMonitorEtcd.yaml

Create the associated Service. Because etcd runs outside the cluster, the Endpoints object has to be created by hand. prometheus-service-etcd.yaml:

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd
  name: etcd-k8s
  namespace: kube-system
spec:
  ports:
  - name: port
    port: 2379
    protocol: TCP
  type: ClusterIP
  clusterIP: None
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
- addresses:
  - ip: 22.22.3.231
    nodeName: etcd01
  - ip: 22.22.3.232
    nodeName: etcd02
  - ip: 22.22.3.233
    nodeName: etcd03
  ports:
  - name: port
    port: 2379
    protocol: TCP

Apply prometheus-service-etcd.yaml:

#kubectl apply -f prometheus-service-etcd.yaml

Find an etcd dashboard on https://grafana.com/dashboards, for example:
https://grafana.com/dashboards/3070


Download the JSON file and import it into Grafana, selecting prometheus as the data source during the import.

View the dashboard.

  • A misconfigured Prometheus or ServiceMonitor object can leave the prometheus-k8s-0 and prometheus-k8s-1 pods unhealthy, making the Prometheus UI unreachable; correcting the configuration brings them back.