<article class="article fmt article-content"><p>Prometheus</p><p>Running on Kubernetes (k8s)</p><p>Collecting data</p><p>node-exporter</p><p>vi node-exporter-ds.yml</p><p><code></code></p><p>apiVersion: extensions/v1beta1</p><p>kind: DaemonSet</p><p>metadata:</p><p>  name: node-exporter</p><p>  labels:</p><p>    app: node-exporter</p><p>spec:</p><p>  template:</p><p>    metadata:</p><p>      labels:</p><p>        app: node-exporter</p><p>    spec:</p><p>      hostNetwork: true</p><p>      containers:</p><p>      - image: prom/node-exporter</p><p>        name: node-exporter</p><p>        ports:</p><p>        - containerPort: 9100</p><p>        volumeMounts:</p><p>        - mountPath: "/etc/localtime"</p><p>          name: timezone</p><p>      volumes:</p><p>      - name: timezone</p><p>        hostPath:</p><p>          path: /etc/localtime</p><p></p><p>Storage: create a 10 Gi NFS-backed PersistentVolume (PV)</p><p>vi prometheus-pv.yaml</p><p><code></code></p><p>apiVersion: v1</p><p>kind: PersistentVolume</p><p>metadata:</p><p>  name: gwj-pv-prometheus</p><p>  labels:</p><p>    app: gwj-pv</p><p>spec:</p><p>  capacity:</p><p>    storage: 10Gi</p><p>  volumeMode: Filesystem</p><p>  accessModes:</p><p>  - ReadWriteMany</p><p>  persistentVolumeReclaimPolicy: Recycle</p><p>  storageClassName: slow</p><p>  mountOptions:</p><p>  - hard</p><p>  - nfsvers=4.1</p><p>  nfs:</p><p>    path: /storage/gwj-prometheus</p><p>    server: 10.1.99.1</p><p></p><p>PersistentVolumeClaim (PVC): claim 5 Gi from the PV just created</p><p>vi prometheus-pvc.yaml</p><p><code></code></p><p>apiVersion: v1</p><p>kind: PersistentVolumeClaim</p><p>metadata:</p><p>  name: gwj-prometheus-pvc</p><p>  namespace: gwj</p><p>spec:</p><p>  accessModes:</p><p>  - ReadWriteMany</p><p>  volumeMode: Filesystem</p><p>  resources:</p><p>    requests:</p><p>      storage: 5Gi</p><p>  selector:</p><p>    matchLabels:</p><p>      app: gwj-pv</p><p>  storageClassName: slow</p><p></p><p>Set up RBAC permissions for Prometheus</p><p>clusterrole.rbac.authorization.k8s.io/gwj-prometheus-clusterrole 
created</p><p>serviceaccount/gwj-prometheus created</p><p>clusterrolebinding.rbac.authorization.k8s.io/gwj-prometheus-rolebinding created</p><p>vi prometheus-rbac.yml</p><p><code></code></p><p>apiVersion: rbac.authorization.k8s.io/v1beta1</p><p>kind: ClusterRole</p><p>metadata:</p><p>  name: gwj-prometheus-clusterrole</p><p>rules:</p><p>- apiGroups: [""]</p><p>  resources:</p><p>  - nodes</p><p>  - nodes/proxy</p><p>  - services</p><p>  - endpoints</p><p>  - pods</p><p>  verbs: ["get", "list", "watch"]</p><p>- apiGroups:</p><p>  - extensions</p><p>  resources:</p><p>  - ingresses</p><p>  verbs: ["get", "list", "watch"]</p><p>- nonResourceURLs: ["/metrics"]</p><p>  verbs: ["get"]</p><p>---</p><p>apiVersion: v1</p><p>kind: ServiceAccount</p><p>metadata:</p><p>  namespace: gwj</p><p>  name: gwj-prometheus</p><p>---</p><p>apiVersion: rbac.authorization.k8s.io/v1beta1</p><p>kind: ClusterRoleBinding</p><p>metadata:</p><p>  name: gwj-prometheus-rolebinding</p><p>roleRef:</p><p>  apiGroup: rbac.authorization.k8s.io</p><p>  kind: ClusterRole</p><p>  name: gwj-prometheus-clusterrole</p><p>subjects:</p><p>- kind: ServiceAccount</p><p>  name: gwj-prometheus</p><p>  namespace: gwj</p><p></p><p>Create the Prometheus configuration file as a ConfigMap</p><p>vi prometheus-cm.yml</p><p><code></code></p><p>apiVersion: v1</p><p>kind: ConfigMap</p><p>metadata:</p><p>  name: gwj-prometheus-cm</p><p>  namespace: gwj</p><p>data:</p><p>  prometheus.yml: |</p><p>    rule_files:</p><p>    - /etc/prometheus/rules.yml</p><p>    alerting:</p><p>      alertmanagers:</p><p>      - static_configs:</p><p>        - targets: ["gwj-alertmanger-svc:80"]</p><p>    global:</p><p>      scrape_interval: 10s</p><p>      scrape_timeout: 10s</p><p>      evaluation_interval: 10s</p><p>    scrape_configs:</p><p>    - job_name: 'kubernetes-nodes'</p><p>      scheme: https</p><p>      tls_config:</p><p>        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt</p><p>      bearer_token_file: 
/var/run/secrets/kubernetes.io/serviceaccount/token</p><p>      kubernetes_sd_configs:</p><p>      - role: node</p><p>      relabel_configs:</p><p>      - action: labelmap</p><p>        regex: __meta_kubernetes_node_label_(.+)</p><p>      - source_labels: [__meta_kubernetes_node_name]</p><p>        regex: (.+)</p><p>        target_label: __metrics_path__</p><p>        replacement: /api/v1/nodes/${1}/proxy/metrics</p><p>      - target_label: __address__</p><p>        replacement: kubernetes.default.svc:443</p><p>    - job_name: 'kubernetes-node-exporter'</p><p>      tls_config:</p><p>        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt</p><p>      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token</p><p>      kubernetes_sd_configs:</p><p>      - role: node</p><p>      relabel_configs:</p><p>      - action: labelmap</p><p>        regex: __meta_kubernetes_node_label_(.+)</p><p>      - source_labels: [__meta_kubernetes_role]</p><p>        action: replace</p><p>        target_label: kubernetes_role</p><p>      - source_labels: [__address__]</p><p>        regex: '(.*):10250'</p><p>        replacement: '${1}:9100'</p><p>        target_label: __address__</p><p>    - job_name: 'kubernetes-pods'</p><p>      kubernetes_sd_configs:</p><p>      - role: pod</p><p>      relabel_configs:</p><p>      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]</p><p>        action: keep</p><p>        regex: true</p><p>      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]</p><p>        action: replace</p><p>        target_label: __address__</p><p>        regex: ([^:]+)(?::\d+)?;(\d+)</p><p>        replacement: $1:$2</p><p>      - action: labelmap</p><p>        regex: __meta_kubernetes_pod_label_(.+)</p><p>      - source_labels: [__meta_kubernetes_namespace]</p><p>        action: replace</p><p>        target_label: kubernetes_namespace</p><p>   
   - source_labels: [__meta_kubernetes_pod_name]</p><p>        action: replace</p><p>        target_label: kubernetes_pod_name</p><p>    - job_name: 'kubernetes-cadvisor'</p><p>      scheme: https</p><p>      tls_config:</p><p>        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt</p><p>      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token</p><p>      kubernetes_sd_configs:</p><p>      - role: node</p><p>      relabel_configs:</p><p>      - action: labelmap</p><p>        regex: __meta_kubernetes_node_label_(.+)</p><p>      - target_label: __address__</p><p>        replacement: kubernetes.default.svc:443</p><p>      - source_labels: [__meta_kubernetes_node_name]</p><p>        regex: (.+)</p><p>        target_label: __metrics_path__</p><p>        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor</p><p>    - job_name: 'kubernetes-service-endpoints'</p><p>      kubernetes_sd_configs:</p><p>      - role: endpoints</p><p>      relabel_configs:</p><p>      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]</p><p>        action: keep</p><p>        regex: true</p><p>      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]</p><p>        action: replace</p><p>        target_label: __scheme__</p><p>        regex: (https?)</p><p>      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]</p><p>        action: replace</p><p>        target_label: __metrics_path__</p><p>        regex: (.+)</p><p>      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]</p><p>        action: replace</p><p>        target_label: __address__</p><p>        regex: ([^:]+)(?::\d+)?;(\d+)</p><p>        replacement: $1:$2</p><p>      - action: labelmap</p><p>        regex: __meta_kubernetes_service_label_(.+)</p><p>      - source_labels: [__meta_kubernetes_namespace]</p><p>        
action: replace</p><p>        target_label: kubernetes_namespace</p><p>      - source_labels: [__meta_kubernetes_service_name]</p><p>        action: replace</p><p>        target_label: kubernetes_name</p><p>  rules.yml: |</p><p>    groups:</p><p>    - name: kubernetes_rules</p><p>      rules:</p><p>      - alert: InstanceDown</p><p>        expr: up{job="kubernetes-node-exporter"} == 0</p><p>        for: 5m</p><p>        labels:</p><p>          severity: page</p><p>        annotations:</p><p>          summary: "Instance {{ $labels.instance }} down"</p><p>          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."</p><p>      - alert: APIHighRequestLatency</p><p>        expr: api_http_request_latencies_second{quantile="0.5"} > 1</p><p>        for: 10m</p><p>        annotations:</p><p>          summary: "High request latency on {{ $labels.instance }}"</p><p>          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"</p><p>      - alert: StatefulSetReplicasMismatch</p><p>        annotations:</p><p>          summary: "Replicas mismatch"</p><p>          description: StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} has not matched the expected number of replicas for longer than 3 minutes.</p><p>        expr: label_join(kube_statefulset_status_replicas_ready != kube_statefulset_replicas, "instance", "/", "namespace", "statefulset")</p><p>        for: 3m</p><p>        labels:</p><p>          severity: critical</p><p>      - alert: PodFrequentlyRestarting</p><p>        expr: increase(kube_pod_container_status_restarts_total[1h]) > 5</p><p>        for: 5m</p><p>        labels:</p><p>          severity: warning</p><p>        annotations:</p><p>          description: Pod {{ $labels.namespace }}/{{ $labels.pod }} was restarted {{ $value }} times within the last hour</p><p>          summary: Pod is restarting frequently</p><p>      - alert: 
DeploymentReplicasNotUpdated</p><p>        expr: ((kube_deployment_status_replicas_updated != kube_deployment_spec_replicas)</p><p>          or (kube_deployment_status_replicas_available != kube_deployment_spec_replicas))</p><p>          unless (kube_deployment_spec_paused == 1)</p><p>        for: 5m</p><p>        labels:</p><p>          severity: critical</p><p>        annotations:</p><p>          description: Replicas are not updated and available for deployment {{ $labels.namespace }}/{{ $labels.deployment }}</p><p>          summary: Deployment replicas are outdated</p><p>      - alert: DaemonSetRolloutStuck</p><p>        expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100</p><p>        for: 5m</p><p>        labels:</p><p>          severity: critical</p><p>        annotations:</p><p>          description: Only {{ $value }}% of desired pods scheduled and ready for daemonset {{ $labels.namespace }}/{{ $labels.daemonset }}</p><p>          summary: DaemonSet is missing pods</p><p>      - alert: DaemonSetsNotScheduled</p><p>        expr: kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0</p><p>        for: 10m</p><p>        labels:</p><p>          severity: warning</p><p>        annotations:</p><p>          description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are not scheduled.'</p><p>          summary: Daemonsets are not scheduled correctly</p><p>      - alert: DaemonSetsMissScheduled</p><p>        expr: kube_daemonset_status_number_misscheduled > 0</p><p>        for: 10m</p><p>        labels:</p><p>          severity: warning</p><p>        annotations:</p><p>          description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} are running where they are not supposed to run.'</p><p>          
summary: Daemonsets are not scheduled correctly</p><p>      - alert: Node_Boot_Time</p><p>        expr: (node_time_seconds - node_boot_time_seconds) <= 150</p><p>        for: 15s</p><p>        annotations:</p><p>          summary: "Host {{ $labels.instance }} rebooted less than 150s ago"</p><p>      - alert: Available_Percent</p><p>        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes <= 0.2</p><p>        for: 15s</p><p>        annotations:</p><p>          summary: "Host {{ $labels.instance }} has less than 20% memory available"</p><p>      - alert: FD_Used_Percent</p><p>        expr: (node_filefd_allocated / node_filefd_maximum) >= 0.8</p><p>        for: 15s</p><p>        annotations:</p><p>          summary: "Host {{ $labels.instance }} FD usage above 80%"</p><p></p><p>As required by the ConfigMap just created, deploy Alertmanager for alerting</p><p>vi alertmanger.yml</p><p><code></code></p><p>---</p><p>kind: Service</p><p>apiVersion: v1</p><p>metadata:</p><p>  name: gwj-alertmanger-svc</p><p>  namespace: gwj</p><p>spec:</p><p>  selector:</p><p>    app: gwj-alert-pod</p><p>  ports:</p><p>    - protocol: TCP</p><p>      port: 80</p><p>      targetPort: 9093</p><p>---</p><p>apiVersion: apps/v1</p><p>kind: StatefulSet</p><p>metadata:</p><p>  name: gwj-alert-sts</p><p>  namespace: gwj</p><p>  labels:</p><p>    app: gwj-alert-sts</p><p>spec:</p><p>  replicas: 1</p><p>  serviceName: gwj-alertmanger-svc</p><p>  selector:</p><p>    matchLabels:</p><p>      app: gwj-alert-pod</p><p>  template:</p><p>    metadata:</p><p>      labels:</p><p>        app: gwj-alert-pod</p><p>    spec:</p><p>      containers:</p><p>      - image: prom/alertmanager:v0.14.0</p><p>        name: gwj-alert-pod</p><p>        ports:</p><p>        - containerPort: 9093</p><p>          protocol: TCP</p><p>        volumeMounts:</p><p>        - mountPath: "/etc/localtime"</p><p>          name: timezone</p><p>      volumes:</p><p>      - name: timezone</p><p>        hostPath:</p><p>          path: /etc/localtime</p><p></p><p>kubectl apply -f alertmanger.yml</p><p>  
service/gwj-alertmanger-svc created</p><p>  statefulset.apps/gwj-alert-sts created</p><p>Create a StatefulSet to run Prometheus</p><p>Mount /prometheus from the PVC gwj-prometheus-pvc</p><p>Mount /etc/prometheus/ from the ConfigMap gwj-prometheus-cm</p><p>vi prometheus-sts.yml</p><p><code></code></p><p>---</p><p>kind: Service</p><p>apiVersion: v1</p><p>metadata:</p><p>  name: gwj-prometheus-svc</p><p>  namespace: gwj</p><p>  labels:</p><p>    app: gwj-prometheus-svc</p><p>spec:</p><p>  ports:</p><p>  - port: 80</p><p>    targetPort: 9090</p><p>  selector:</p><p>    app: gwj-prometheus-pod</p><p>---</p><p>apiVersion: apps/v1</p><p>kind: StatefulSet</p><p>metadata:</p><p>  name: gwj-prometheus-sts</p><p>  namespace: gwj</p><p>  labels:</p><p>    app: gwj-prometheus-sts</p><p>spec:</p><p>  replicas: 1</p><p>  serviceName: gwj-prometheus-svc</p><p>  selector:</p><p>    matchLabels:</p><p>      app: gwj-prometheus-pod</p><p>  template:</p><p>    metadata:</p><p>      labels:</p><p>        app: gwj-prometheus-pod</p><p>    spec:</p><p>      containers:</p><p>      - image: prom/prometheus:v2.9.2</p><p>        name: gwj-prometheus-pod</p><p>        ports:</p><p>        - containerPort: 9090</p><p>          protocol: TCP</p><p>        volumeMounts:</p><p>        - mountPath: "/prometheus"</p><p>          name: data</p><p>        - mountPath: "/etc/prometheus/"</p><p>          name: config-volume</p><p>        - mountPath: "/etc/localtime"</p><p>          name: timezone</p><p>        resources:</p><p>          requests:</p><p>            cpu: 100m</p><p>            memory: 100Mi</p><p>          limits:</p><p>            cpu: 500m</p><p>            memory: 2000Mi</p><p>      serviceAccountName: gwj-prometheus</p><p>      volumes:</p><p>      - name: data</p><p>        persistentVolumeClaim:</p><p>          claimName: gwj-prometheus-pvc</p><p>      - name: config-volume</p><p>        
configMap:</p><p>          name: gwj-prometheus-cm</p><p>      - name: gwj-prometheus-rule-cm</p><p>        configMap:</p><p>          name: gwj-prometheus-rule-cm</p><p>      - name: timezone</p><p>        hostPath:</p><p>          path: /etc/localtime</p><p></p><p>kubectl apply -f prometheus-sts.yml</p><p>  service/gwj-prometheus-svc created</p><p>  statefulset.apps/gwj-prometheus-sts created</p><p>Create an Ingress that routes requests to different Services by host name</p><p>vi prometheus-ingress.yml</p><p><code></code></p><p>---</p><p>apiVersion: extensions/v1beta1</p><p>kind: Ingress</p><p>metadata:</p><p>  namespace: gwj</p><p>  annotations:</p><p>  name: gwj-ingress-prometheus</p><p>spec:</p><p>  rules:</p><p>  - host: gwj.syncbug.com</p><p>    http:</p><p>      paths:</p><p>        - path: /</p><p>          backend:</p><p>            serviceName: gwj-prometheus-svc</p><p>            servicePort: 80</p><p>  - host: gwj-alert.syncbug.com</p><p>    http:</p><p>      paths:</p><p>        - path: /</p><p>          backend:</p><p>            serviceName: gwj-alertmanger-svc</p><p>            servicePort: 80</p><p></p><p>kubectl apply -f prometheus-ingress.yml</p><p>  ingress.extensions/gwj-ingress-prometheus created</p><p>Visit the corresponding domains</p><p>gwj.syncbug.com</p><p>Check that the scrape targets are correct</p><p>http://gwj.syncbug.com/targets</p><p>Check that the configuration is correct</p><p>http://gwj.syncbug.com/config</p><p>gwj-alert.syncbug.com</p><p>=== Grafana</p><p>vi grafana-pv.yaml</p><p><code></code></p><p>apiVersion: v1</p><p>kind: PersistentVolume</p><p>metadata:</p><p>  name: gwj-pv-grafana</p><p>  labels:</p><p>    app: gwj-pv-gra</p><p>spec:</p><p>  capacity:</p><p>    storage: 2Gi</p><p>  volumeMode: Filesystem</p><p>  accessModes:</p><p>  - ReadWriteMany</p><p>  persistentVolumeReclaimPolicy: Recycle</p><p>  storageClassName: slow</p><p>  mountOptions:</p><p>  - hard</p><p>  - nfsvers=4.1</p><p>  nfs:</p><p>    path: /storage/gwj-grafana</p><p>    server: 10.1.99.1</p><p></p><p>vi grafana-pvc.yaml</p><p><code></code></p><p>apiVersion: v1</p><p>kind: 
PersistentVolumeClaim</p><p>metadata:</p><p>  name: gwj-grafana-pvc</p><p>  namespace: gwj</p><p>spec:</p><p>  accessModes:</p><p>  - ReadWriteMany</p><p>  volumeMode: Filesystem</p><p>  resources:</p><p>    requests:</p><p>      storage: 1Gi</p><p>  selector:</p><p>    matchLabels:</p><p>      app: gwj-pv-gra</p><p>  storageClassName: slow</p><p></p><p>vi grafana-deployment.yaml</p><p><code></code></p><p>apiVersion: extensions/v1beta1</p><p>kind: Deployment</p><p>metadata:</p><p>  labels:</p><p>    name: grafana</p><p>  name: grafana</p><p>  namespace: gwj</p><p>spec:</p><p>  replicas: 1</p><p>  revisionHistoryLimit: 10</p><p>  selector:</p><p>    matchLabels:</p><p>      app: grafana</p><p>  template:</p><p>    metadata:</p><p>      labels:</p><p>        app: grafana</p><p>      name: grafana</p><p>    spec:</p><p>      containers:</p><p>      - env:</p><p>        - name: GF_PATHS_DATA</p><p>          value: /var/lib/grafana/</p><p>        - name: GF_PATHS_PLUGINS</p><p>          value: /var/lib/grafana/plugins</p><p>        image: grafana/grafana:6.2.4</p><p>        imagePullPolicy: IfNotPresent</p><p>        name: grafana</p><p>        ports:</p><p>        - containerPort: 3000</p><p>          name: grafana</p><p>          protocol: TCP</p><p>        volumeMounts:</p><p>        - mountPath: /var/lib/grafana/</p><p>          name: data</p><p>        - mountPath: /etc/localtime</p><p>          name: localtime</p><p>      dnsPolicy: ClusterFirst</p><p>      restartPolicy: Always</p><p>      volumes:</p><p>      - name: data</p><p>        persistentVolumeClaim:</p><p>          claimName: gwj-grafana-pvc</p><p>      - name: localtime</p><p>        hostPath:</p><p>          path: /etc/localtime</p><p></p><p>vi grafana-ingress.yaml</p><p><code></code></p><p>---</p><p>apiVersion: extensions/v1beta1</p><p>kind: Ingress</p><p>metadata:</p><p>  namespace: gwj</p><p>  annotations:</p><p>  name: gwj-ingress-grafana</p><p>spec:</p><p>  rules:</p><p>  - host: 
gwj-grafana.syncbug.com</p><p>    http:</p><p>      paths:</p><p>        - path: /</p><p>          backend:</p><p>            serviceName: gwj-grafana-svc</p><p>            servicePort: 80</p><p>---</p><p>kind: Service</p><p>apiVersion: v1</p><p>metadata:</p><p>  name: gwj-grafana-svc</p><p>  namespace: gwj</p><p>spec:</p><p>  selector:</p><p>    app: grafana</p><p>  ports:</p><p>    - protocol: TCP</p><p>      port: 80</p><p>      targetPort: 3000</p><p></p><p>Open Grafana at gwj-grafana.syncbug.com</p><p>Default credentials: admin / admin</p><p>Enter the data source URL: http://gwj-prometheus-svc:80</p><p>Import a dashboard template</p></article>