Preface

With part of our production services already switched over to Kubernetes, my recent work has revolved around the surrounding infrastructure. The first gap that came to mind is observability, which covers both logging and monitoring; of the two, I feel the logging system is the more urgent.

Logging can be a big or a small matter; it all depends on the scale of the application, and both the importance and the complexity of logging rise sharply as that scale grows. A logging system actually covers a lot of ground: on one hand the logging infrastructure itself, and more importantly the logging conventions and the overall logging architecture. So besides building a logging platform for Kubernetes, I want to use this opportunity to sort out everything related to logging and build a reasonably complete logging system.

Among open-source logging solutions, ELK is the first choice. Before containerization we also ran ELK + Filebeat, entirely based on file collection, so the idea this time is to move that stack properly into the Kubernetes environment. The EFK stack (Elasticsearch, Fluentd, Kibana) is also one of the solutions recommended by the official Kubernetes documentation.

The plan includes:

  • Use a collection agent such as Filebeat or Fluentd to collect container data in a unified way. (Fluentd replaces Logstash here because of its performance, its seamless integration with Kubernetes, and the fact that it is a CNCF project.)
  • The collected data is fed into Elasticsearch for real-time querying and retrieval.
  • Visualization can be done with common components such as Grafana or Kibana.

Given our current logging requirements and log volume, the above is sufficient; customized data cleansing and real-time or offline analysis are kept in reserve as later extensions.

Characteristics of logging in Kubernetes

In Kubernetes, log collection is much more complicated than on traditional virtual or physical machines. The fundamental reason is that Kubernetes hides the underlying failures, provides finer-grained resource scheduling, and exposes a stable yet dynamic environment upward. Log collection therefore faces a richer and more dynamic environment, and there are many more points to consider:

  1. The forms of logs become more complex: besides logs on physical/virtual machines, there are also container standard output, files inside containers, container events, Kubernetes events, and so on to collect (see the quick tour below).
  2. The environment is far more dynamic: in Kubernetes, node crashes, nodes going offline or coming online, Pods being destroyed, and scaling out/in are the norm. In such an environment logs are ephemeral (for example, once a Pod is destroyed its logs are no longer visible), so log data must be shipped to the server side in real time, and collection must be able to keep up with this highly dynamic setting.
  3. There are more kinds of logs: a request passes through CDN, Ingress, Service Mesh, Pod, and other components, touching multiple pieces of infrastructure, which adds many log types, such as Kubernetes system component logs, audit logs, Service Mesh logs, Ingress logs, and so on.
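For a concrete feel, here is where these log types typically surface on a Docker-based node; this is only a quick tour, and the paths vary with the container runtime and its configuration.

    # container stdout/stderr, exposed as symlinks maintained by the kubelet
    ls /var/log/containers/*.log
    # Kubernetes events (kept only for a limited time, one hour by default)
    kubectl get events --all-namespaces
    # logs of node-level components on systemd-based nodes
    journalctl -u kubelet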

Collection methods

Log collection falls into active and passive approaches. Active push includes direct writes from the application and pushing through the Docker engine: the former embeds a logging SDK in the application and is tightly coupled to it, while the latter depends too heavily on the container runtime and is not flexible enough, so neither is considered here.

The passive approach relies on a log agent running on each node that collects node logs by polling (Alibaba's logtail apparently uses event notification instead). In a Kubernetes environment there are two ways to run such an agent:

  • DaemonSet: only one log agent runs on each node and collects all logs on that node. This uses far fewer resources, but extensibility and tenant isolation are limited; it suits clusters with a single purpose or not many workloads.
  • Sidecar: a dedicated log agent is deployed for each Pod and collects logs only for that one application. It uses more resources, but flexibility and multi-tenant isolation are much better; it is recommended for large Kubernetes clusters, or for clusters that serve multiple teams as a PaaS platform (a minimal sketch follows this list).
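For reference, here is a minimal sketch of the sidecar pattern (not the setup used later in this post): the application writes its log files to a shared emptyDir volume and a Filebeat sidecar reads and ships them. The image names and paths are assumptions.

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-with-log-sidecar                # hypothetical Pod name
    spec:
      containers:
      - name: app
        image: registry.example.com/my-app:1.0  # hypothetical app image that writes to /var/log/app
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
      - name: log-agent                         # the sidecar: tails the same files and ships them
        image: docker.elastic.co/beats/filebeat:7.12.1
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
      volumes:
      - name: app-logs
        emptyDir: {}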

Here is a comparison table found online:

|  | DockerEngine | Application direct write | DaemonSet | Sidecar |
| --- | --- | --- | --- | --- |
| Log types collected | stdout | application logs | stdout + some files | files |
| Deployment & operations | Low, natively supported | Low, only the configuration needs maintaining | Moderate, the DaemonSet must be maintained | Higher, every Pod that needs collection must carry a sidecar container |
| Log classification & storage | Not possible | Configured independently per application | Moderate, can be mapped by container/path | Each Pod configured separately, very flexible |
| Multi-tenant isolation | Weak | Weak, direct writes compete with business logic for resources | Moderate, isolation only via configuration | Strong, isolated by container, resources can be allocated separately |
| Supported cluster scale | Local storage: unlimited; syslog/fluentd introduces a single point | Unlimited | Depends on the number of configurations | Unlimited |
| Resource usage | Low, provided by the Docker engine | Lowest overall, no collection overhead | Fairly low, one container per node | Fairly high, one container per Pod |
| Query convenience | Low, can only grep raw logs | High, can be tailored to the business | Fairly high, custom queries and statistics | High, can be tailored to the business |
| Customizability | Low | High, freely extensible | Low | High, configured per Pod |
| Coupling | High, tightly bound to the Docker engine; changes require restarting it | High, changing/upgrading the collection module requires redeploying the application | Low, the agent can be upgraded independently | Moderate, by default upgrading the agent restarts the sidecar's Pod (some extensions support hot upgrade) |
| Applicable scenarios | Testing, PoC, and other non-production scenarios | Scenarios with extreme performance requirements | Clusters with clear log classification and a single purpose | Large, mixed, or PaaS-style clusters |

After comparing the options, the DaemonSet approach fits our current situation best.

Log output methods

Unlike on virtual or physical machines, containers in Kubernetes offer two output methods: standard output and files. With standard output, the application writes directly to stdout or stderr; the Docker engine takes over the stdout/stderr file descriptors and processes the logs according to the LogDriver configured for the engine. Writing logs to files works much as it does on a VM or physical machine, except the logs can live on different kinds of storage, such as the default storage, EmptyDir, HostVolume, or NFS.

Note that the stdout method does not mean nothing is written to a file. For example, with Docker's JSON LogDriver the path is: application stdout -> DockerEngine -> LogDriver -> serialize to JSON -> save to file, and finally the log agent collects that file.
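Concretely, with the json-file LogDriver every stdout line becomes one JSON record in a file under the container's data directory, roughly like this (illustrative content, container ID abbreviated):

    # /var/lib/docker/containers/<container-id>/<container-id>-json.log
    {"log":"2021-05-01 12:00:00 INFO order created id=42\n","stream":"stdout","time":"2021-05-01T12:00:00.123456789Z"}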

By comparison:

  1. The file method performs a bit better, because the stdout path goes through several extra steps.
  2. With files, different logs can go to different files, which effectively classifies them during collection and analysis; with stdout everything ends up in a single stream.
  3. File handling strategies are more varied, for example synchronous/asynchronous writes, buffer sizes, rotation, compression, and cleanup policies, so it is more flexible overall.

So while building the base platform I will start with stdout; the file approach depends more on the concrete rules of the logging system.

How the EFK solution works

EFK uses a Fluentd instance deployed on every node to collect the logs under the node's /var/log and /var/lib/docker/containers directories and ship them to Elasticsearch. Users then query the logs through Kibana.

The detailed flow is as follows:

  1. Create the Fluentd DaemonSet and mount the node's log directories into the container.
  2. Fluentd tails the log files under the containers directory inside the node's log directory.
  3. Fluentd converts the collected logs into JSON.
  4. Fluentd uses the Exception Plugin to detect whether a log line is an exception thrown by a container, and if so merges the multi-line stack trace into one entry.
  5. Fluentd concatenates multi-line logs that were split across newlines into a single JSON record.
  6. Fluentd uses the Kubernetes Metadata Plugin to attach and filter by Kubernetes metadata such as Namespace and Pod name.
  7. Fluentd uses the Elasticsearch plugin to output the assembled JSON logs to Elasticsearch.
  8. Elasticsearch creates the corresponding indices and persists the log data.
  9. Kibana searches the Kubernetes log data in Elasticsearch and displays it (a sample of the resulting document is sketched below).
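After steps 4–7, a single application log line stored in Elasticsearch looks roughly like the document below. The exact fields depend on how the kubernetes_metadata filter is configured, so treat this as an illustrative sketch with made-up values:

    {
      "log": "2021-05-01 12:00:00 INFO order created id=42\n",
      "stream": "stdout",
      "docker": { "container_id": "4cf17f3a9d1e" },
      "kubernetes": {
        "namespace_name": "default",
        "pod_name": "my-app-5f7d8c9b4-x2k8l",
        "container_name": "my-app",
        "host": "node05",
        "labels": { "app": "my-app" }
      },
      "@timestamp": "2021-05-01T12:00:00.123456789+00:00"
    }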

Installing and configuring EFK

ECK

The official description of ECK (Elastic Cloud on Kubernetes) reads:

Elastic Cloud on Kubernetes simplifies the work of running Elasticsearch and Kibana on Kubernetes, including setup, upgrades, snapshots, scaling, high availability, security, and more.

In short, it is a new, officially provided, Kubernetes-native way to deploy Elasticsearch with little effort. The architecture of an ECK installation looks like this:

Using Local PV and local-path-provisioner

Before installing Elasticsearch we have to digress into storage. Elasticsearch stores log data, so it is not a stateless application and needs persistent storage behind it. We have neither cloud storage nor NFS, but fortunately Kubernetes offers the Local PV concept, which provides persistent storage backed by local disks. It still has some limitations at the moment:

1. Local PV currently does not manage capacity requests; disk space has to be configured and managed manually.
2. The default StorageClass provisioner for Local PV is kubernetes.io/no-provisioner, because Local PV does not support Dynamic Provisioning and therefore cannot automatically create a PV when a PVC is created.

I will not repeat the concepts of PV, PVC, and StorageClass here. What do the points above mean? Dynamic storage allocation in Kubernetes is implemented through a StorageClass, and the Local PV StorageClass has no corresponding provisioner. So PVs cannot be provisioned dynamically: they must be created in advance and bound to a PVC before they can be used, which is undeniably cumbersome. (For learning purposes, creating PVs and PVCs by hand is perfectly fine; a static sketch follows.) The community and some vendors therefore provide provisioners for Local PV; here I use Rancher's open-source local-path-provisioner as an example.
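For the manual route mentioned above, a static setup looks roughly like the sketch below: a no-provisioner StorageClass plus a hand-made Local PV pinned to one node, which a PVC can then bind to. The path, node name, and object names are assumptions.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-static
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: es-data-node05                # hypothetical PV name
    spec:
      capacity:
        storage: 100Gi
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local-static
      local:
        path: /app/k8s/es-data            # the directory must already exist on that node
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - node05                    # hypothetical node name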

  • Install local-path-provisioner

    kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

If you cannot download it because of network issues, you can use the manifest below.

apiVersion: v1kind: Namespacemetadata:  name: local-path-storage---apiVersion: v1kind: ServiceAccountmetadata:  name: local-path-provisioner-service-account  namespace: local-path-storage---apiVersion: rbac.authorization.k8s.io/v1kind: ClusterRolemetadata:  name: local-path-provisioner-rolerules:  - apiGroups: [ "" ]    resources: [ "nodes", "persistentvolumeclaims", "configmaps" ]    verbs: [ "get", "list", "watch" ]  - apiGroups: [ "" ]    resources: [ "endpoints", "persistentvolumes", "pods" ]    verbs: [ "*" ]  - apiGroups: [ "" ]    resources: [ "events" ]    verbs: [ "create", "patch" ]  - apiGroups: [ "storage.k8s.io" ]    resources: [ "storageclasses" ]    verbs: [ "get", "list", "watch" ]---apiVersion: rbac.authorization.k8s.io/v1kind: ClusterRoleBindingmetadata:  name: local-path-provisioner-bindroleRef:  apiGroup: rbac.authorization.k8s.io  kind: ClusterRole  name: local-path-provisioner-rolesubjects:  - kind: ServiceAccount    name: local-path-provisioner-service-account    namespace: local-path-storage---apiVersion: apps/v1kind: Deploymentmetadata:  name: local-path-provisioner  namespace: local-path-storagespec:  replicas: 1  selector:    matchLabels:      app: local-path-provisioner  template:    metadata:      labels:        app: local-path-provisioner    spec:      serviceAccountName: local-path-provisioner-service-account      containers:        - name: local-path-provisioner          image: rancher/local-path-provisioner:v0.0.19          imagePullPolicy: IfNotPresent          command:            - local-path-provisioner            - --debug            - start            - --config            - /etc/config/config.json          volumeMounts:            - name: config-volume              mountPath: /etc/config/          env:            - name: POD_NAMESPACE              valueFrom:                fieldRef:                  fieldPath: metadata.namespace      volumes:        - name: config-volume          configMap:            name: local-path-config---apiVersion: storage.k8s.io/v1kind: StorageClassmetadata:  name: local-pathprovisioner: rancher.io/local-pathvolumeBindingMode: WaitForFirstConsumerreclaimPolicy: Delete---kind: ConfigMapapiVersion: v1metadata:  name: local-path-config  namespace: local-path-storagedata:  config.json: |-    {            "nodePathMap":[            {                    "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",                    "paths":["/home/k8s"]            },             {                     "node":"master02",                     "paths":["/opt/local-path-provisioner", "/app/k8s"]             },             {                      "node":"node05",                      "paths":["/opt/local-path-provisioner", "/app/k8s"]             }            ]    }  setup: |-    #!/bin/sh    while getopts "m:s:p:" opt    do        case $opt in            p)            absolutePath=$OPTARG            ;;            s)            sizeInBytes=$OPTARG            ;;            m)            volMode=$OPTARG            ;;        esac    done    mkdir -m 0777 -p ${absolutePath}  teardown: |-    #!/bin/sh    while getopts "m:s:p:" opt    do        case $opt in            p)            absolutePath=$OPTARG            ;;            s)            sizeInBytes=$OPTARG            ;;            m)            volMode=$OPTARG            ;;        esac    done    rm -rf ${absolutePath}  helperPod.yaml: |-    apiVersion: v1    kind: Pod    metadata:      name: helper-pod    spec:      containers:      - name: helper-pod        image: busybox        imagePullPolicy: IfNotPresent
  • Configure Local Path Provisioner

    Local Path Provisioner supports a number of options; see the official documentation for details. I will only mention one:

    My servers mount their disks at different paths, so when Elasticsearch runs on different nodes I want the data to live in different places. By default Local Path Provisioner stores data under /opt/local-path-provisioner; this can be changed with the configuration below, where the DEFAULT_PATH_FOR_NON_LISTED_NODES entry sets the default data directory (a test PVC sketch follows the snippet).

    config.json: |-
      {
        "nodePathMap":[
          {
            "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
            "paths":["/home/k8s"]
          },
          {
            "node":"master02",
            "paths":["/opt/local-path-provisioner", "/app/k8s"]
          },
          {
            "node":"node05",
            "paths":["/opt/local-path-provisioner", "/app/k8s"]
          }
        ]
      }
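    To confirm that dynamic provisioning works, a throwaway PVC using the local-path class can be created (a minimal sketch with an assumed claim name). Because the StorageClass uses WaitForFirstConsumer, the PV only appears once a Pod actually mounts the claim.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: local-path-test               # hypothetical test claim
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: local-path
      resources:
        requests:
          storage: 1Gi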

Installing ECK

  • Install the operator

    ## install
    kubectl apply -f https://download.elastic.co/downloads/eck/1.5.0/all-in-one.yaml
    ## uninstall
    kubectl delete -f https://download.elastic.co/downloads/eck/1.5.0/all-in-one.yaml

After a successful installation, an elastic-system namespace and an operator Pod are created automatically:

❯ kubectl get all -n elastic-system
NAME                     READY   STATUS    RESTARTS   AGE
pod/elastic-operator-0   1/1     Running   0          53s

NAME                             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/elastic-webhook-server   ClusterIP   10.0.73.219   <none>        443/TCP   55s

NAME                                READY   AGE
statefulset.apps/elastic-operator   1/1     57s
  • Deploy ECK

    Here is an ECK resource file; refer to the documentation for the full set of ECK options.

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: quickstart
      namespace: elastic-system
    spec:
      version: 7.12.1
      nodeSets:
      - name: default
        count: 3
        volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: local-path
        config:
          node.master: true
          node.data: true
          node.ingest: true
          node.store.allow_mmap: false
    ---
    apiVersion: kibana.k8s.elastic.co/v1
    kind: Kibana
    metadata:
      name: quickstart
      namespace: elastic-system
    spec:
      version: 7.12.1
      count: 1
      elasticsearchRef:
        name: quickstart
      config:
        i18n.locale: "zh-CN"

Here storageClassName: local-path is the StorageClass provided by Local Path Provisioner. Kibana defaults to English; i18n.locale: "zh-CN" switches it to Chinese.

kubectl apply -f eck.yaml

Submit the resource file. Once the deployment finishes, you can see that Elasticsearch and Kibana have been deployed in the elastic-system namespace:

❯ kubectl get all -n elastic-system
NAME                             READY   STATUS    RESTARTS   AGE
pod/elastic-es-default-0         1/1     Running   0          10d
pod/elastic-es-default-1         1/1     Running   0          10d
pod/elastic-es-default-2         1/1     Running   0          10d
pod/elastic-operator-0           1/1     Running   1          10d
pod/kibana-kb-5bcd9f45dc-hzc9s   1/1     Running   0          10d

NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/elastic-es-default       ClusterIP   None           <none>        9200/TCP   10d
service/elastic-es-http          ClusterIP   172.23.4.246   <none>        9200/TCP   10d
service/elastic-es-transport     ClusterIP   None           <none>        9300/TCP   10d
service/elastic-webhook-server   ClusterIP   172.23.8.16    <none>        443/TCP    10d
service/kibana-kb-http           ClusterIP   172.23.7.101   <none>        5601/TCP   10d

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kibana-kb   1/1     1            1           10d

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/kibana-kb-5bcd9f45dc   1         1         1       10d

NAME                                  READY   AGE
statefulset.apps/elastic-es-default   3/3     10d
statefulset.apps/elastic-operator     1/1     10d

Deploying Fluentd

Fluent maintains the fluentd-kubernetes-daemonset project on GitHub, which we can use as a reference.

# fluentd-es-ds.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: elastic-system
  labels:
    app: fluentd-es
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    app: fluentd-es
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    app: fluentd-es
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: elastic-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: elastic-system
  labels:
    app: fluentd-es
spec:
  selector:
    matchLabels:
      app: fluentd-es
  template:
    metadata:
      labels:
        app: fluentd-es
    spec:
      serviceAccount: fluentd-es
      serviceAccountName: fluentd-es
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-es
        image: fluent/fluentd-kubernetes-daemonset:v1.11.5-debian-elasticsearch7-1.1
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: quickstart-es-http
        # default user
        - name: FLUENT_ELASTICSEARCH_USER
          value: elastic
        # is already present from the elasticsearch deployment
        - name: FLUENT_ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: quickstart-es-elastic-user
              key: elastic
        # elasticsearch standard port
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        # the Elastic operator serves https by default
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "https"
        # don't need systemd logs for now
        - name: FLUENTD_SYSTEMD_CONF
          value: disable
        # the certs are self-signed, so verification must be disabled
        - name: FLUENT_ELASTICSEARCH_SSL_VERIFY
          value: "false"
        # to avoid issue https://github.com/uken/fluent-plugin-elasticsearch/issues/525
        - name: FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS
          value: "false"
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /fluentd/etc
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-es-config

The Fluentd configuration resources are as follows:

# fluentd-es-configmapkind: ConfigMapapiVersion: v1metadata:  name: fluentd-es-config  namespace: elastic-systemdata:  fluent.conf: |-    # https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/docker-image/v1.11/debian-elasticsearch7/conf/fluent.conf    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"    @include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"    @include kubernetes.conf    @include conf.d/*.conf    <match kubernetes.**>      # https://github.com/kubernetes/kubernetes/issues/23001      @type elasticsearch_dynamic      @id  kubernetes_elasticsearch      @log_level info      include_tag_key true      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"      path "#{ENV['FLUENT_ELASTICSEARCH_PATH']}"      scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"      ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"      ssl_version "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERSION'] || 'TLSv1_2'}"      user "#{ENV['FLUENT_ELASTICSEARCH_USER'] || use_default}"      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD'] || use_default}"      reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'false'}"      reconnect_on_error "#{ENV['FLUENT_ELASTICSEARCH_RECONNECT_ON_ERROR'] || 'true'}"      reload_on_failure "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_ON_FAILURE'] || 'true'}"      log_es_400_reason "#{ENV['FLUENT_ELASTICSEARCH_LOG_ES_400_REASON'] || 'false'}"      logstash_prefix logstash-${record['kubernetes']['namespace_name']}      logstash_dateformat "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"      logstash_format "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_FORMAT'] || 'true'}"      index_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_INDEX_NAME'] || 'logstash'}"      target_index_key "#{ENV['FLUENT_ELASTICSEARCH_TARGET_INDEX_KEY'] || use_nil}"      type_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_TYPE_NAME'] || 'fluentd'}"      include_timestamp "#{ENV['FLUENT_ELASTICSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"      template_name "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_NAME'] || use_nil}"      template_file "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_FILE'] || use_nil}"      template_overwrite "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_OVERWRITE'] || use_default}"      sniffer_class_name "#{ENV['FLUENT_SNIFFER_CLASS_NAME'] || 'Fluent::Plugin::ElasticsearchSimpleSniffer'}"      request_timeout "#{ENV['FLUENT_ELASTICSEARCH_REQUEST_TIMEOUT'] || '5s'}"      suppress_type_name "#{ENV['FLUENT_ELASTICSEARCH_SUPPRESS_TYPE_NAME'] || 'true'}"      enable_ilm "#{ENV['FLUENT_ELASTICSEARCH_ENABLE_ILM'] || 'false'}"      ilm_policy_id "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_ID'] || use_default}"      ilm_policy "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY'] || use_default}"      ilm_policy_overwrite "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_OVERWRITE'] || 'false'}"      <buffer>        flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"        flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '5s'}"        chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '2M'}"        queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"        retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"        retry_forever true      </buffer>    </match>    <match **>      @type elasticsearch      @id out_es      @log_level info      include_tag_key true      host 
"#{ENV['FLUENT_ELASTICSEARCH_HOST']}"      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"      path "#{ENV['FLUENT_ELASTICSEARCH_PATH']}"      scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"      ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"      ssl_version "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERSION'] || 'TLSv1_2'}"      user "#{ENV['FLUENT_ELASTICSEARCH_USER'] || use_default}"      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD'] || use_default}"      reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'false'}"      reconnect_on_error "#{ENV['FLUENT_ELASTICSEARCH_RECONNECT_ON_ERROR'] || 'true'}"      reload_on_failure "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_ON_FAILURE'] || 'true'}"      log_es_400_reason "#{ENV['FLUENT_ELASTICSEARCH_LOG_ES_400_REASON'] || 'false'}"      logstash_prefix "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_PREFIX'] || 'logstash'}"      logstash_dateformat "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"      logstash_format "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_FORMAT'] || 'true'}"      index_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_INDEX_NAME'] || 'logstash'}"      target_index_key "#{ENV['FLUENT_ELASTICSEARCH_TARGET_INDEX_KEY'] || use_nil}"      type_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_TYPE_NAME'] || 'fluentd'}"      include_timestamp "#{ENV['FLUENT_ELASTICSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"      template_name "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_NAME'] || use_nil}"      template_file "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_FILE'] || use_nil}"      template_overwrite "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_OVERWRITE'] || use_default}"      sniffer_class_name "#{ENV['FLUENT_SNIFFER_CLASS_NAME'] || 'Fluent::Plugin::ElasticsearchSimpleSniffer'}"      request_timeout "#{ENV['FLUENT_ELASTICSEARCH_REQUEST_TIMEOUT'] || '5s'}"      suppress_type_name "#{ENV['FLUENT_ELASTICSEARCH_SUPPRESS_TYPE_NAME'] || 'true'}"      enable_ilm "#{ENV['FLUENT_ELASTICSEARCH_ENABLE_ILM'] || 'false'}"      ilm_policy_id "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_ID'] || use_default}"      ilm_policy "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY'] || use_default}"      ilm_policy_overwrite "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_OVERWRITE'] || 'false'}"      <buffer>        flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"        flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '5s'}"        chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '2M'}"        queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"        retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"        retry_forever true      </buffer>    </match>  kubernetes.conf: |-    # https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/docker-image/v1.11/debian-elasticsearch7/conf/kubernetes.conf    <label @FLUENT_LOG>      <match fluent.**>        @type null        @id ignore_fluent_logs      </match>    </label>    <source>      @id fluentd-containers.log      @type tail      path /var/log/containers/*.log      pos_file /var/log/es-containers.log.pos      tag raw.kubernetes.*      read_from_head true      <parse>        @type multi_format        <pattern>          format json          time_key time          time_format %Y-%m-%dT%H:%M:%S.%NZ        </pattern>        <pattern>          format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/          time_format %Y-%m-%dT%H:%M:%S.%N%:z        </pattern>      
</parse>    </source>    # Detect exceptions in the log output and forward them as one log entry.    <match raw.kubernetes.**>      @id raw.kubernetes      @type detect_exceptions      remove_tag_prefix raw      message log      stream stream      multiline_flush_interval 5      chunk_limit_size 512m      max_bytes 50000000      max_lines 1000    </match>    # Concatenate multi-line logs    <filter **>      @id filter_concat      @type concat      key message      multiline_end_regexp /\n$/      separator ""    </filter>    # Enriches records with Kubernetes metadata    <filter kubernetes.**>      @id filter_kubernetes_metadata      @type kubernetes_metadata    </filter>    # Fixes json fields in Elasticsearch    <filter kubernetes.**>      @id filter_parser      @type parser      key_name log      reserve_data true      remove_key_name_field true      <parse>        @type multi_format        <pattern>          format json        </pattern>        <pattern>          format none        </pattern>      </parse>    </filter>    <source>      @type tail      @id in_tail_minion      path /var/log/salt/minion      pos_file /var/log/fluentd-salt.pos      tag salt      <parse>        @type regexp        expression /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/        time_format %Y-%m-%d %H:%M:%S      </parse>    </source>    <source>      @type tail      @id in_tail_startupscript      path /var/log/startupscript.log      pos_file /var/log/fluentd-startupscript.log.pos      tag startupscript      <parse>        @type syslog      </parse>    </source>    <source>      @type tail      @id in_tail_docker      path /var/log/docker.log      pos_file /var/log/fluentd-docker.log.pos      tag docker      <parse>        @type regexp        expression /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/      </parse>    </source>    <source>      @type tail      @id in_tail_etcd      path /var/log/etcd.log      pos_file /var/log/fluentd-etcd.log.pos      tag etcd      <parse>        @type none      </parse>    </source>    <source>      @type tail      @id in_tail_kubelet      multiline_flush_interval 5s      path /var/log/kubelet.log      pos_file /var/log/fluentd-kubelet.log.pos      tag kubelet      <parse>        @type kubernetes      </parse>    </source>    <source>      @type tail      @id in_tail_kube_proxy      multiline_flush_interval 5s      path /var/log/kube-proxy.log      pos_file /var/log/fluentd-kube-proxy.log.pos      tag kube-proxy      <parse>        @type kubernetes      </parse>    </source>    <source>      @type tail      @id in_tail_kube_apiserver      multiline_flush_interval 5s      path /var/log/kube-apiserver.log      pos_file /var/log/fluentd-kube-apiserver.log.pos      tag kube-apiserver      <parse>        @type kubernetes      </parse>    </source>    <source>      @type tail      @id in_tail_kube_controller_manager      multiline_flush_interval 5s      path /var/log/kube-controller-manager.log      pos_file /var/log/fluentd-kube-controller-manager.log.pos      tag kube-controller-manager      <parse>        @type kubernetes      </parse>    </source>    <source>      @type tail      @id in_tail_kube_scheduler      multiline_flush_interval 5s      path /var/log/kube-scheduler.log      pos_file /var/log/fluentd-kube-scheduler.log.pos      tag kube-scheduler      <parse>        @type kubernetes      </parse>    </source>    <source>      @type tail      @id 
in_tail_rescheduler      multiline_flush_interval 5s      path /var/log/rescheduler.log      pos_file /var/log/fluentd-rescheduler.log.pos      tag rescheduler      <parse>        @type kubernetes      </parse>    </source>    <source>      @type tail      @id in_tail_glbc      multiline_flush_interval 5s      path /var/log/glbc.log      pos_file /var/log/fluentd-glbc.log.pos      tag glbc      <parse>        @type kubernetes      </parse>    </source>    <source>      @type tail      @id in_tail_cluster_autoscaler      multiline_flush_interval 5s      path /var/log/cluster-autoscaler.log      pos_file /var/log/fluentd-cluster-autoscaler.log.pos      tag cluster-autoscaler      <parse>        @type kubernetes      </parse>    </source>    # Example:    # 2017-02-09T00:15:57.992775796Z AUDIT: id="90c73c7c-97d6-4b65-9461-f94606ff825f" ip="104.132.1.72" method="GET" user="kubecfg" as="<self>" asgroups="<lookup>" namespace="default" uri="/api/v1/namespaces/default/pods"    # 2017-02-09T00:15:57.993528822Z AUDIT: id="90c73c7c-97d6-4b65-9461-f94606ff825f" response="200"    <source>      @type tail      @id in_tail_kube_apiserver_audit      multiline_flush_interval 5s      path /var/log/kubernetes/kube-apiserver-audit.log      pos_file /var/log/kube-apiserver-audit.log.pos      tag kube-apiserver-audit      <parse>        @type multiline        format_firstline /^\S+\s+AUDIT:/        # Fields must be explicitly captured by name to be parsed into the record.        # Fields may not always be present, and order may change, so this just looks        # for a list of key="\"quoted\" value" pairs separated by spaces.        # Unknown fields are ignored.        # Note: We can't separate query/response lines as format1/format2 because        #       they don't always come one after the other for a given query.        format1 /^(?<time>\S+) AUDIT:(?: (?:id="(?<id>(?:[^"\\]|\\.)*)"|ip="(?<ip>(?:[^"\\]|\\.)*)"|method="(?<method>(?:[^"\\]|\\.)*)"|user="(?<user>(?:[^"\\]|\\.)*)"|groups="(?<groups>(?:[^"\\]|\\.)*)"|as="(?<as>(?:[^"\\]|\\.)*)"|asgroups="(?<asgroups>(?:[^"\\]|\\.)*)"|namespace="(?<namespace>(?:[^"\\]|\\.)*)"|uri="(?<uri>(?:[^"\\]|\\.)*)"|response="(?<response>(?:[^"\\]|\\.)*)"|\w+="(?:[^"\\]|\\.)*"))*/        time_format %Y-%m-%dT%T.%L%Z      </parse>    </source>

One issue worth mentioning: if the Docker data directory is the same on every node, you can skip this part; if it is not, read on.

In my case the disks are mounted at different paths, so the Docker directories also differ between nodes, and that causes a problem:

Fluentd collects all logs under /var/log/containers. Taking one container log as an example:

You will find that the log file is ultimately a symlink back to the log file under the Docker directory, so if the Docker data directory is not /var/lib/docker you must adjust the configuration above; otherwise no logs will be collected.
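Before adjusting the mounts below, you can check where Docker actually keeps its data on a given node, for example (assuming the Docker runtime):

    # the Docker data directory on this node
    docker info --format '{{ .DockerRootDir }}'
    # where the kubelet-managed symlinks in /var/log/containers point to
    ls -l /var/log/containers | head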

- name: varlibdockercontainers
  mountPath: /var/lib/docker/containers      ## set this to the Docker data directory
  readOnly: true

If the Docker directories differ across nodes, mount all of them (with matching hostPath volumes defined in the volumes section):

- name: varlibdockercontainers
  mountPath: /var/lib/docker/containers
  readOnly: true
- name: varlibdockercontainers2
  mountPath: /app/docker/containers
  readOnly: true
- name: varlibdockercontainers3
  mountPath: /home/docker/containers
  readOnly: true

Other options can be adjusted as needed. Submit the resource files:

kubectl apply -f fluentd-es-configmap.yaml
kubectl apply -f fluentd-es-ds.yaml

Once deployed, you can see that Fluentd is running in the elastic-system namespace:

❯ kubectl get all -n elastic-system
NAME                             READY   STATUS    RESTARTS   AGE
pod/elastic-es-default-0         1/1     Running   0          10d
pod/elastic-es-default-1         1/1     Running   0          10d
pod/elastic-es-default-2         1/1     Running   0          10d
pod/elastic-operator-0           1/1     Running   1          10d
pod/fluentd-es-lrmqt             1/1     Running   0          4d6h
pod/fluentd-es-rd6xz             1/1     Running   0          4d6h
pod/fluentd-es-spq54             1/1     Running   0          4d6h
pod/fluentd-es-xc6pv             1/1     Running   0          4d6h
pod/kibana-kb-5bcd9f45dc-hzc9s   1/1     Running   0          10d

NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/elastic-es-default       ClusterIP   None           <none>        9200/TCP   10d
service/elastic-es-http          ClusterIP   172.23.4.246   <none>        9200/TCP   10d
service/elastic-es-transport     ClusterIP   None           <none>        9300/TCP   10d
service/elastic-webhook-server   ClusterIP   172.23.8.16    <none>        443/TCP    10d
service/kibana-kb-http           ClusterIP   172.23.7.101   <none>        5601/TCP   10d

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/fluentd-es   4         4         4       4            4           <none>          4d6h

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kibana-kb   1/1     1            1           10d

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/kibana-kb-5bcd9f45dc   1         1         1       10d

NAME                                  READY   AGE
statefulset.apps/elastic-es-default   3/3     10d
statefulset.apps/elastic-operator     1/1     10d

Accessing Kibana

After deployment, Kibana is exposed as a ClusterIP service by default and cannot be reached from outside. You can enable hostPort or NodePort access; using Kuboard, for example, I enabled a hostPort and can now reach Kibana via a node IP plus that port (a port-forward alternative is sketched below).
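If you only need quick access without exposing a port on the nodes, kubectl port-forward also works; the service name comes from the ECK deployment above, and ECK serves Kibana over HTTPS with a self-signed certificate by default:

    kubectl port-forward -n elastic-system service/kibana-kb-http 5601
    # then open https://localhost:5601 and accept the self-signed certificate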

The default Kibana username is elastic; the password is obtained with the following command:

❯ kubectl get secret quickstart-es-elastic-user -n elastic-system -o=jsonpath='{.data.elastic}' | base64 --decode; echo
02fY4QjAC0C9361i0ftBA4Zo
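Before heading into Kibana, it can be worth confirming that Fluentd has actually created indices; with the ConfigMap above they are prefixed with logstash-<namespace>. A quick check via port-forward, for example (run the two commands in separate terminals and substitute the password obtained above):

    kubectl port-forward -n elastic-system service/elastic-es-http 9200
    curl -k -u 'elastic:<password-from-the-secret-above>' 'https://localhost:9200/_cat/indices?v'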

That wraps up the deployment. As for day-to-day usage, I think it deserves a separate write-up together with the logging conventions, so this post ends here.