Preface
With part of our production services now running on Kubernetes, my recent work has revolved around the surrounding infrastructure. The most obvious gap is architectural observability, which covers both logging and monitoring; of the two, I feel the logging system is the more urgent.
Logging can be a small matter or a big one, depending entirely on the scale of the application: as that scale grows, the importance and complexity of logging rise steeply. A logging system covers a lot of ground. One part is building the logging infrastructure itself, but the more important part is defining logging conventions and building out the overall system. So besides standing up a logging platform on Kubernetes, I want to use this opportunity to sort through everything log-related and build a reasonably complete logging system.
Among open-source logging solutions, ELK is the obvious first choice. Before containerization we used ELK + Filebeat, collecting everything from files, so the plan this time is simply to move that stack into the Kubernetes environment properly. The EFK stack (Elasticsearch, Fluentd, Kibana) also happens to be one of the approaches recommended by the Kubernetes documentation.
The plan includes:
- Using a collection agent such as Filebeat or Fluentd to collect container logs in a unified way. (Fluentd replaces Logstash here because of its performance, its seamless integration with Kubernetes, and its status as a CNCF project.)
- Feeding the collected data into Elasticsearch for real-time search and retrieval.
- Visualizing the data with common components such as Grafana or Kibana.
For our current logging requirements and data volume, this is sufficient; customized data cleansing and real-time or offline analysis are kept in reserve as future extensions.
Characteristics of Logging in Kubernetes
In Kubernetes, log collection is considerably more complex than on traditional virtual or physical machines. The root cause is that Kubernetes hides the underlying failures, provides fine-grained resource scheduling, and presents a stable yet dynamic environment upward. Log collection therefore faces a richer, more dynamic environment, with many more factors to consider.
- Log sources become more complex: besides logs on physical/virtual machines, there are the containers' standard output, files inside containers, container events, Kubernetes events, and so on.
- The environment is far more dynamic: in Kubernetes, machines going down, nodes joining and leaving, Pods being destroyed, and scaling out/in are all routine. Logs are therefore ephemeral (for example, once a Pod is destroyed its logs are no longer visible), so log data must be shipped to the server side in real time, and collection must cope with this highly dynamic environment.
- There are more kinds of logs: a single request may pass through CDN, Ingress, Service Mesh, Pod, and other components, touching multiple layers of infrastructure, which adds many log types, such as the logs of the various Kubernetes system components, audit logs, Service Mesh logs, Ingress logs, and so on.
Collection Methods
Log collection methods fall broadly into active and passive. Active push covers two approaches, direct writes from the application and DockerEngine forwarding: the former embeds a logging SDK in the application and couples tightly with it, while the latter depends too heavily on the container runtime and lacks flexibility, so neither is considered here.
The passive approach relies on a log agent running on each node that collects the node's logs, usually by polling (Alibaba's Logtail reportedly uses event notification instead). In a Kubernetes environment there are again two patterns:
- DaemonSet mode runs a single log agent on each node, which collects all logs on that node. A DaemonSet uses far fewer resources, but extensibility and tenant isolation are limited, so it suits clusters with a single purpose or few workloads.
- Sidecar mode deploys a dedicated log agent per Pod, responsible only for that one application's logs. A sidecar uses more resources but offers stronger flexibility and multi-tenant isolation; it is recommended for large Kubernetes clusters, or for clusters that serve multiple teams as a PaaS platform (see the sketch right after this list).
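For reference, here is a minimal sketch of the sidecar pattern under stated assumptions (the image names and paths are illustrative, not from our setup): the application writes its log files to a shared emptyDir volume, and a per-Pod agent container tails that same volume.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app                 # business container writes logs to the shared volume
      image: my-app:latest
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
    - name: log-agent           # sidecar collects only this Pod's logs
      image: fluent/fluentd:v1.11-1
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}              # shared scratch volume living with the Pod
```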
Here is a comparison table I found online:
| | DockerEngine | Direct write (app) | DaemonSet | Sidecar |
|---|---|---|---|---|
| Log types collected | stdout | application logs | stdout + some files | files |
| Deployment & operations | Low: supported natively | Low: only a config file to maintain | Moderate: a DaemonSet to maintain | Higher: every Pod whose logs are collected needs a sidecar container |
| Per-category log storage | Not possible | Configured per application | Moderate: can map by container/path | Configurable per Pod; very flexible |
| Multi-tenant isolation | Weak | Weak: direct writes compete with business logic for resources | Moderate: isolation only through configuration | Strong: isolated at the container level, resources can be allocated separately |
| Cluster scale supported | No limit with local storage; syslog/fluentd targets become single points | Unlimited | Depends on the number of configurations | Unlimited |
| Resource usage | Low: provided by the Docker engine | Lowest overall: no collection overhead | Lower: one container per node | Higher: one container per Pod |
| Query convenience | Low: only grep over raw logs | High: can be tailored to the business | Fairly high: custom queries and statistics | High: can be tailored to the business |
| Customizability | Low | High: freely extensible | Low | High: configured per Pod |
| Coupling | High: tightly bound to the Docker engine; changes require restarting it | High: changing or upgrading the collection module means redeploying the application | Low: the agent can be upgraded independently | Moderate: by default upgrading the agent also restarts the sidecar's workload (some extensions support hot upgrades) |
| Suitable scenarios | Testing, PoC, and other non-production use | Scenarios with extreme performance requirements | Clusters with clear log classification and a single purpose | Large, mixed, or PaaS-style clusters |
After this comparison, the DaemonSet approach fits our current situation best.
日志输入形式
Unlike virtual/physical machines, containers in Kubernetes offer two output methods: standard output and files. With standard output, the application writes logs directly to stdout or stderr; the Docker engine takes over both file descriptors and processes the logs according to the configured LogDriver. Writing logs to files works much as on a VM or physical machine, except the files can sit on different kinds of storage: the default storage, EmptyDir, HostVolume, NFS, and so on.
However, stdout does not mean no file is written. With Docker's json-file LogDriver, for example, the path is: application stdout -> DockerEngine -> LogDriver -> serialize to JSON -> write to file, from which the log agent finally collects it.
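The behavior and rotation of those serialized files are governed by the daemon's LogDriver settings; a typical /etc/docker/daemon.json using the json-file driver might look like the following sketch (the values are illustrative, not from the original setup):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
```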
By comparison:
- The file approach performs a bit better, since the stdout path passes through several extra stages.
- With files, different logs can go into different files, which gives you classification for free at collection and analysis time; with stdout everything ends up in a single stream.
- File-handling policies are more varied: synchronous/asynchronous writes, buffer sizes, rotation, compression, cleanup, and so on, which makes files relatively more flexible.
So while building the base platform I will start with the stdout approach; the file approach depends more on the specific rules of the logging system.
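To see where the json-file driver actually puts a container's stdout on disk, you can ask Docker directly (the container name here is illustrative):

```shell
# Print the JSON log file backing a container's stdout/stderr
docker inspect --format '{{.LogPath}}' my-app
# typically /var/lib/docker/containers/<container-id>/<container-id>-json.log
```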
How the EFK Logging Stack Works
EFK deploys Fluentd on every node to collect the logs under the node's /var/log and /var/lib/docker/containers directories and forward them to Elasticsearch. Finally, users query the logs through Kibana.
The process in detail (a condensed configuration sketch follows the list):
- Create Fluentd and mount the Kubernetes node's log directories into its container.
- Fluentd collects the log files under the containers directory of the node's log directory.
- Fluentd converts the collected logs into JSON.
- Fluentd uses the Exception Plugin to detect whether a log is an exception thrown by a container and, if so, merges the multi-line stack trace into a single entry.
- Fluentd merges multi-line JSON logs that were split across line breaks.
- Fluentd uses the Kubernetes Metadata Plugin to extract Kubernetes metadata, such as Namespace and Pod name, for filtering.
- Fluentd uses the Elasticsearch Plugin to output the assembled JSON logs to Elasticsearch.
- Elasticsearch builds the corresponding indexes and persists the log data.
- Kibana searches the Kubernetes log data in Elasticsearch and presents it.
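Boiled down, the collection steps correspond to a Fluentd configuration shaped roughly like this sketch; the plugin types are the ones the full config uses later, while the paths and the Elasticsearch host are assumptions (TLS and credentials are omitted here):

```
<source>
  @type tail                      # tail container log files on the node
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json                    # docker json-file lines are already JSON
  </parse>
</source>

<filter kubernetes.**>
  @type kubernetes_metadata       # attach namespace/pod/container metadata
</filter>

<match kubernetes.**>
  @type elasticsearch             # ship records to Elasticsearch
  host quickstart-es-http
  port 9200
  scheme https
  logstash_format true            # daily logstash-YYYY.MM.DD indexes
</match>
```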
Installing and Configuring EFK
ECK
The official description of ECK (Elastic Cloud on Kubernetes) reads:
Elastic Cloud on Kubernetes simplifies the work of running Elasticsearch and Kibana on Kubernetes, including setup, upgrades, snapshots, scaling, high availability, security, and more.
In short, it is the official, simplified, Kubernetes-native way of deploying Elasticsearch. [Figure: architecture of an ECK installation]
Using Local PV and local-path-provisioner
Before installing Elasticsearch, we need a detour into storage. Elasticsearch stores the log data, so it is not a stateless application and needs persistent storage behind it. We have neither cloud storage nor NFS; fortunately Kubernetes offers the Local PV concept for providing persistent storage from local disks. It still has some limitations, though:
1. Local PV does not yet support capacity management; disk space has to be configured and managed by hand.
2. The default StorageClass provisioner for Local PV is kubernetes.io/no-provisioner, because Local PV does not support Dynamic Provisioning and therefore cannot automatically create a matching PV when a PVC is created.
I won't rehash the concepts of PV, PVC, and StorageClass here. What the above boils down to is: Kubernetes provisions storage dynamically through a StorageClass, but Local PV has no corresponding provisioner, so PVs cannot be provided on demand; they must be created in advance and bound to a PVC before use, which is undeniably tedious (though for learning purposes, creating the PV and PVC by hand is perfectly workable, as the static example below shows). So the community and several vendors ship provisioner packages for Local PV; here I use Rancher's open-source local-path-provisioner.
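For a sense of what the manual route involves, here is a minimal static example: a Local PV pinned to one node plus a PVC that binds to it. The node name, host path, class name, and sizes are assumptions for illustration only.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-data-pv0
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage      # manually managed class, no provisioner
  local:
    path: /home/k8s/es-data            # must already exist on the node
  nodeAffinity:                        # a Local PV must say which node owns the disk
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node05"]
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-data-pvc0
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-storage
  resources:
    requests:
      storage: 100Gi
```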
Installing local-path-provisioner
```shell
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
```
If network problems keep that from downloading, you can use the copy below:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: local-path-storage
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: local-path-provisioner-service-account
  namespace: local-path-storage
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-path-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["nodes", "persistentvolumeclaims", "configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["endpoints", "persistentvolumes", "pods"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-path-provisioner-bind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-path-provisioner-role
subjects:
  - kind: ServiceAccount
    name: local-path-provisioner-service-account
    namespace: local-path-storage
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-path-provisioner
  namespace: local-path-storage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-path-provisioner
  template:
    metadata:
      labels:
        app: local-path-provisioner
    spec:
      serviceAccountName: local-path-provisioner-service-account
      containers:
        - name: local-path-provisioner
          image: rancher/local-path-provisioner:v0.0.19
          imagePullPolicy: IfNotPresent
          command:
            - local-path-provisioner
            - --debug
            - start
            - --config
            - /etc/config/config.json
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config/
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
      volumes:
        - name: config-volume
          configMap:
            name: local-path-config
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: local-path-config
  namespace: local-path-storage
data:
  config.json: |-
    {
      "nodePathMap":[
        {
          "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
          "paths":["/home/k8s"]
        },
        {
          "node":"master02",
          "paths":["/opt/local-path-provisioner", "/app/k8s"]
        },
        {
          "node":"node05",
          "paths":["/opt/local-path-provisioner", "/app/k8s"]
        }
      ]
    }
  setup: |-
    #!/bin/sh
    while getopts "m:s:p:" opt
    do
        case $opt in
            p)
            absolutePath=$OPTARG
            ;;
            s)
            sizeInBytes=$OPTARG
            ;;
            m)
            volMode=$OPTARG
            ;;
        esac
    done
    mkdir -m 0777 -p ${absolutePath}
  teardown: |-
    #!/bin/sh
    while getopts "m:s:p:" opt
    do
        case $opt in
            p)
            absolutePath=$OPTARG
            ;;
            s)
            sizeInBytes=$OPTARG
            ;;
            m)
            volMode=$OPTARG
            ;;
        esac
    done
    rm -rf ${absolutePath}
  helperPod.yaml: |-
    apiVersion: v1
    kind: Pod
    metadata:
      name: helper-pod
    spec:
      containers:
        - name: helper-pod
          image: busybox
          imagePullPolicy: IfNotPresent
```
Configuring Local Path Provisioner
Local Path Provisioner supports a number of options; see the official documentation for the full list. I will mention just one:
My servers mount their disks at different paths, so when Elasticsearch runs on different nodes I want the data to land in different places. By default, Local Path Provisioner puts data under /opt/local-path-provisioner; this can be changed with the configuration below, where the DEFAULT_PATH_FOR_NON_LISTED_NODES entry sets the default data directory for any node not listed explicitly.
```yaml
config.json: |-
  {
    "nodePathMap":[
      {
        "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
        "paths":["/home/k8s"]
      },
      {
        "node":"master02",
        "paths":["/opt/local-path-provisioner", "/app/k8s"]
      },
      {
        "node":"node05",
        "paths":["/opt/local-path-provisioner", "/app/k8s"]
      }
    ]
  }
```
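Before wiring this into Elasticsearch, you can sanity-check the provisioner with a throwaway PVC and a pod that mounts it (the names here are hypothetical). Because the StorageClass uses WaitForFirstConsumer, the PV only materializes once the pod is scheduled:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path       # the class created by local-path-provisioner
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: local-path-test-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo ok > /data/probe && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: local-path-test
```

Once the pod is Running, a file should appear under the node path configured in nodePathMap; delete the PVC and the provisioner's teardown script cleans the directory up.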
Installing ECK
Installing the Operator
```shell
## install
kubectl apply -f https://download.elastic.co/downloads/eck/1.5.0/all-in-one.yaml
## remove
kubectl delete -f https://download.elastic.co/downloads/eck/1.5.0/all-in-one.yaml
```
After a successful install, an `elastic-system` namespace and an operator Pod are created automatically:
```
❯ kubectl get all -n elastic-system
NAME                     READY   STATUS    RESTARTS   AGE
pod/elastic-operator-0   1/1     Running   0          53s

NAME                             TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/elastic-webhook-server   ClusterIP   10.0.73.219   <none>        443/TCP   55s

NAME                                READY   AGE
statefulset.apps/elastic-operator   1/1     57s
```
Deploying ECK
Here is a resource file for ECK; see the documentation for the full set of configuration options.
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.12.1
  nodeSets:
    - name: default
      count: 3
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: local-path
      config:
        node.master: true
        node.data: true
        node.ingest: true
        node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: quickstart
  namespace: elastic-system
spec:
  version: 7.12.1
  count: 1
  elasticsearchRef:
    name: quickstart
  config:
    i18n.locale: "zh-CN"
```
Here `storageClassName: local-path` is the StorageClass provided by Local Path Provisioner, and since Kibana defaults to English, `i18n.locale: "zh-CN"` switches it to Chinese.
Submit the resource file:
```shell
kubectl apply -f eck.yaml
```
After deployment completes, you can see that `Elasticsearch` and `Kibana` have been deployed in the `elastic-system` namespace:
```
❯ kubectl get all -n elastic-system
NAME                             READY   STATUS    RESTARTS   AGE
pod/elastic-es-default-0         1/1     Running   0          10d
pod/elastic-es-default-1         1/1     Running   0          10d
pod/elastic-es-default-2         1/1     Running   0          10d
pod/elastic-operator-0           1/1     Running   1          10d
pod/kibana-kb-5bcd9f45dc-hzc9s   1/1     Running   0          10d

NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/elastic-es-default       ClusterIP   None           <none>        9200/TCP   10d
service/elastic-es-http          ClusterIP   172.23.4.246   <none>        9200/TCP   10d
service/elastic-es-transport     ClusterIP   None           <none>        9300/TCP   10d
service/elastic-webhook-server   ClusterIP   172.23.8.16    <none>        443/TCP    10d
service/kibana-kb-http           ClusterIP   172.23.7.101   <none>        5601/TCP   10d

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kibana-kb   1/1     1            1           10d

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/kibana-kb-5bcd9f45dc   1         1         1       10d

NAME                                  READY   AGE
statefulset.apps/elastic-es-default   3/3     10d
statefulset.apps/elastic-operator     1/1     10d
```
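Before pointing a collector at the cluster, it is worth a quick health check. A sketch using the quickstart names from above (the port-forward is temporary, and `-k` accepts ECK's self-signed certificate):

```shell
# Grab the auto-generated password for the elastic user
PASSWORD=$(kubectl get secret quickstart-es-elastic-user -n elastic-system \
  -o go-template='{{.data.elastic | base64decode}}')

# Forward the HTTPS endpoint locally and query cluster health
kubectl port-forward service/quickstart-es-http 9200 -n elastic-system &
curl -k -u "elastic:$PASSWORD" "https://localhost:9200/_cluster/health?pretty"
```

A `status` of `green` (or `yellow` on a single node) means the cluster is ready to receive logs.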
Deploying Fluentd
Fluent maintains the fluentd-kubernetes-daemonset project on GitHub, which we can use as a reference.
```yaml
# fluentd-es-ds.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: elastic-system
  labels:
    app: fluentd-es
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    app: fluentd-es
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    app: fluentd-es
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: elastic-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: elastic-system
  labels:
    app: fluentd-es
spec:
  selector:
    matchLabels:
      app: fluentd-es
  template:
    metadata:
      labels:
        app: fluentd-es
    spec:
      serviceAccount: fluentd-es
      serviceAccountName: fluentd-es
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd-es
        image: fluent/fluentd-kubernetes-daemonset:v1.11.5-debian-elasticsearch7-1.1
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: quickstart-es-http
        # default user
        - name: FLUENT_ELASTICSEARCH_USER
          value: elastic
        # is already present from the elasticsearch deployment
        - name: FLUENT_ELASTICSEARCH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: quickstart-es-elastic-user
              key: elastic
        # elasticsearch standard port
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        # the elastic operator serves https by default
        - name: FLUENT_ELASTICSEARCH_SCHEME
          value: "https"
        # don't need systemd logs for now
        - name: FLUENTD_SYSTEMD_CONF
          value: disable
        # certs are self-signed, so verification must be disabled
        - name: FLUENT_ELASTICSEARCH_SSL_VERIFY
          value: "false"
        # to avoid issue https://github.com/uken/fluent-plugin-elasticsearch/issues/525
        - name: FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS
          value: "false"
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /fluentd/etc
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-es-config
```
The Fluentd configuration resource file is as follows:
```yaml
# fluentd-es-configmap.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-es-config
  namespace: elastic-system
data:
  fluent.conf: |-
    # https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/docker-image/v1.11/debian-elasticsearch7/conf/fluent.conf
    @include "#{ENV['FLUENTD_SYSTEMD_CONF'] || 'systemd'}.conf"
    @include "#{ENV['FLUENTD_PROMETHEUS_CONF'] || 'prometheus'}.conf"
    @include kubernetes.conf
    @include conf.d/*.conf

    <match kubernetes.**>
      # https://github.com/kubernetes/kubernetes/issues/23001
      @type elasticsearch_dynamic
      @id kubernetes_elasticsearch
      @log_level info
      include_tag_key true
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      path "#{ENV['FLUENT_ELASTICSEARCH_PATH']}"
      scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
      ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"
      ssl_version "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERSION'] || 'TLSv1_2'}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER'] || use_default}"
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD'] || use_default}"
      reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'false'}"
      reconnect_on_error "#{ENV['FLUENT_ELASTICSEARCH_RECONNECT_ON_ERROR'] || 'true'}"
      reload_on_failure "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_ON_FAILURE'] || 'true'}"
      log_es_400_reason "#{ENV['FLUENT_ELASTICSEARCH_LOG_ES_400_REASON'] || 'false'}"
      logstash_prefix logstash-${record['kubernetes']['namespace_name']}
      logstash_dateformat "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"
      logstash_format "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_FORMAT'] || 'true'}"
      index_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_INDEX_NAME'] || 'logstash'}"
      target_index_key "#{ENV['FLUENT_ELASTICSEARCH_TARGET_INDEX_KEY'] || use_nil}"
      type_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_TYPE_NAME'] || 'fluentd'}"
      include_timestamp "#{ENV['FLUENT_ELASTICSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"
      template_name "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_NAME'] || use_nil}"
      template_file "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_FILE'] || use_nil}"
      template_overwrite "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_OVERWRITE'] || use_default}"
      sniffer_class_name "#{ENV['FLUENT_SNIFFER_CLASS_NAME'] || 'Fluent::Plugin::ElasticsearchSimpleSniffer'}"
      request_timeout "#{ENV['FLUENT_ELASTICSEARCH_REQUEST_TIMEOUT'] || '5s'}"
      suppress_type_name "#{ENV['FLUENT_ELASTICSEARCH_SUPPRESS_TYPE_NAME'] || 'true'}"
      enable_ilm "#{ENV['FLUENT_ELASTICSEARCH_ENABLE_ILM'] || 'false'}"
      ilm_policy_id "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_ID'] || use_default}"
      ilm_policy "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY'] || use_default}"
      ilm_policy_overwrite "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_OVERWRITE'] || 'false'}"
      <buffer>
        flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"
        flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '5s'}"
        chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '2M'}"
        queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"
        retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"
        retry_forever true
      </buffer>
    </match>

    <match **>
      @type elasticsearch
      @id out_es
      @log_level info
      include_tag_key true
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
      path "#{ENV['FLUENT_ELASTICSEARCH_PATH']}"
      scheme "#{ENV['FLUENT_ELASTICSEARCH_SCHEME'] || 'http'}"
      ssl_verify "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERIFY'] || 'true'}"
      ssl_version "#{ENV['FLUENT_ELASTICSEARCH_SSL_VERSION'] || 'TLSv1_2'}"
      user "#{ENV['FLUENT_ELASTICSEARCH_USER'] || use_default}"
      password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD'] || use_default}"
      reload_connections "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_CONNECTIONS'] || 'false'}"
      reconnect_on_error "#{ENV['FLUENT_ELASTICSEARCH_RECONNECT_ON_ERROR'] || 'true'}"
      reload_on_failure "#{ENV['FLUENT_ELASTICSEARCH_RELOAD_ON_FAILURE'] || 'true'}"
      log_es_400_reason "#{ENV['FLUENT_ELASTICSEARCH_LOG_ES_400_REASON'] || 'false'}"
      logstash_prefix "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_PREFIX'] || 'logstash'}"
      logstash_dateformat "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_DATEFORMAT'] || '%Y.%m.%d'}"
      logstash_format "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_FORMAT'] || 'true'}"
      index_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_INDEX_NAME'] || 'logstash'}"
      target_index_key "#{ENV['FLUENT_ELASTICSEARCH_TARGET_INDEX_KEY'] || use_nil}"
      type_name "#{ENV['FLUENT_ELASTICSEARCH_LOGSTASH_TYPE_NAME'] || 'fluentd'}"
      include_timestamp "#{ENV['FLUENT_ELASTICSEARCH_INCLUDE_TIMESTAMP'] || 'false'}"
      template_name "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_NAME'] || use_nil}"
      template_file "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_FILE'] || use_nil}"
      template_overwrite "#{ENV['FLUENT_ELASTICSEARCH_TEMPLATE_OVERWRITE'] || use_default}"
      sniffer_class_name "#{ENV['FLUENT_SNIFFER_CLASS_NAME'] || 'Fluent::Plugin::ElasticsearchSimpleSniffer'}"
      request_timeout "#{ENV['FLUENT_ELASTICSEARCH_REQUEST_TIMEOUT'] || '5s'}"
      suppress_type_name "#{ENV['FLUENT_ELASTICSEARCH_SUPPRESS_TYPE_NAME'] || 'true'}"
      enable_ilm "#{ENV['FLUENT_ELASTICSEARCH_ENABLE_ILM'] || 'false'}"
      ilm_policy_id "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_ID'] || use_default}"
      ilm_policy "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY'] || use_default}"
      ilm_policy_overwrite "#{ENV['FLUENT_ELASTICSEARCH_ILM_POLICY_OVERWRITE'] || 'false'}"
      <buffer>
        flush_thread_count "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_THREAD_COUNT'] || '8'}"
        flush_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_FLUSH_INTERVAL'] || '5s'}"
        chunk_limit_size "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_CHUNK_LIMIT_SIZE'] || '2M'}"
        queue_limit_length "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_QUEUE_LIMIT_LENGTH'] || '32'}"
        retry_max_interval "#{ENV['FLUENT_ELASTICSEARCH_BUFFER_RETRY_MAX_INTERVAL'] || '30'}"
        retry_forever true
      </buffer>
    </match>
  kubernetes.conf: |-
    # https://github.com/fluent/fluentd-kubernetes-daemonset/blob/master/docker-image/v1.11/debian-elasticsearch7/conf/kubernetes.conf
    <label @FLUENT_LOG>
      <match fluent.**>
        @type null
        @id ignore_fluent_logs
      </match>
    </label>

    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      tag raw.kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
        <pattern>
          format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </pattern>
      </parse>
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      chunk_limit_size 512m
      max_bytes 50000000
      max_lines 1000
    </match>

    # Concatenate multi-line logs
    <filter **>
      @id filter_concat
      @type concat
      key message
      multiline_end_regexp /\n$/
      separator ""
    </filter>

    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
    </filter>

    # Fixes json fields in Elasticsearch
    <filter kubernetes.**>
      @id filter_parser
      @type parser
      key_name log
      reserve_data true
      remove_key_name_field true
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse>
    </filter>

    <source>
      @type tail
      @id in_tail_minion
      path /var/log/salt/minion
      pos_file /var/log/fluentd-salt.pos
      tag salt
      <parse>
        @type regexp
        expression /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
        time_format %Y-%m-%d %H:%M:%S
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_startupscript
      path /var/log/startupscript.log
      pos_file /var/log/fluentd-startupscript.log.pos
      tag startupscript
      <parse>
        @type syslog
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_docker
      path /var/log/docker.log
      pos_file /var/log/fluentd-docker.log.pos
      tag docker
      <parse>
        @type regexp
        expression /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_etcd
      path /var/log/etcd.log
      pos_file /var/log/fluentd-etcd.log.pos
      tag etcd
      <parse>
        @type none
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_kubelet
      multiline_flush_interval 5s
      path /var/log/kubelet.log
      pos_file /var/log/fluentd-kubelet.log.pos
      tag kubelet
      <parse>
        @type kubernetes
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_kube_proxy
      multiline_flush_interval 5s
      path /var/log/kube-proxy.log
      pos_file /var/log/fluentd-kube-proxy.log.pos
      tag kube-proxy
      <parse>
        @type kubernetes
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_kube_apiserver
      multiline_flush_interval 5s
      path /var/log/kube-apiserver.log
      pos_file /var/log/fluentd-kube-apiserver.log.pos
      tag kube-apiserver
      <parse>
        @type kubernetes
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_kube_controller_manager
      multiline_flush_interval 5s
      path /var/log/kube-controller-manager.log
      pos_file /var/log/fluentd-kube-controller-manager.log.pos
      tag kube-controller-manager
      <parse>
        @type kubernetes
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_kube_scheduler
      multiline_flush_interval 5s
      path /var/log/kube-scheduler.log
      pos_file /var/log/fluentd-kube-scheduler.log.pos
      tag kube-scheduler
      <parse>
        @type kubernetes
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_rescheduler
      multiline_flush_interval 5s
      path /var/log/rescheduler.log
      pos_file /var/log/fluentd-rescheduler.log.pos
      tag rescheduler
      <parse>
        @type kubernetes
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_glbc
      multiline_flush_interval 5s
      path /var/log/glbc.log
      pos_file /var/log/fluentd-glbc.log.pos
      tag glbc
      <parse>
        @type kubernetes
      </parse>
    </source>

    <source>
      @type tail
      @id in_tail_cluster_autoscaler
      multiline_flush_interval 5s
      path /var/log/cluster-autoscaler.log
      pos_file /var/log/fluentd-cluster-autoscaler.log.pos
      tag cluster-autoscaler
      <parse>
        @type kubernetes
      </parse>
    </source>

    # Example:
    # 2017-02-09T00:15:57.992775796Z AUDIT: id="90c73c7c-97d6-4b65-9461-f94606ff825f" ip="104.132.1.72" method="GET" user="kubecfg" as="<self>" asgroups="<lookup>" namespace="default" uri="/api/v1/namespaces/default/pods"
    # 2017-02-09T00:15:57.993528822Z AUDIT: id="90c73c7c-97d6-4b65-9461-f94606ff825f" response="200"
    <source>
      @type tail
      @id in_tail_kube_apiserver_audit
      multiline_flush_interval 5s
      path /var/log/kubernetes/kube-apiserver-audit.log
      pos_file /var/log/kube-apiserver-audit.log.pos
      tag kube-apiserver-audit
      <parse>
        @type multiline
        format_firstline /^\S+\s+AUDIT:/
        # Fields must be explicitly captured by name to be parsed into the record.
        # Fields may not always be present, and order may change, so this just looks
        # for a list of key="\"quoted\" value" pairs separated by spaces.
        # Unknown fields are ignored.
        # Note: We can't separate query/response lines as format1/format2 because
        # they don't always come one after the other for a given query.
        format1 /^(?<time>\S+) AUDIT:(?: (?:id="(?<id>(?:[^"\\]|\\.)*)"|ip="(?<ip>(?:[^"\\]|\\.)*)"|method="(?<method>(?:[^"\\]|\\.)*)"|user="(?<user>(?:[^"\\]|\\.)*)"|groups="(?<groups>(?:[^"\\]|\\.)*)"|as="(?<as>(?:[^"\\]|\\.)*)"|asgroups="(?<asgroups>(?:[^"\\]|\\.)*)"|namespace="(?<namespace>(?:[^"\\]|\\.)*)"|uri="(?<uri>(?:[^"\\]|\\.)*)"|response="(?<response>(?:[^"\\]|\\.)*)"|\w+="(?:[^"\\]|\\.)*"))*/
        time_format %Y-%m-%dT%T.%L%Z
      </parse>
    </source>
```
One caveat here: if the Docker data directory is identical on every node, you can skip this part; if not, read on.
In my case the disks are mounted at different paths, so the Docker directory also differs from node to node, and that creates a problem:
Fluentd collects every log under the /var/log/containers directory, but those entries are only symlinks. Take one container's log as an example:
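For instance, following the symlink chain on a node looks roughly like this (the names, hashes, and IDs below are illustrative):

```shell
# Entries under /var/log/containers are symlinks maintained by the kubelet
$ ls -l /var/log/containers/my-app-7d4b9c-xxxxx_default_my-app-<hash>.log
... -> /var/log/pods/default_my-app-7d4b9c-xxxxx_<uid>/my-app/0.log

# Resolving fully lands in the Docker data directory
$ readlink -f /var/log/containers/my-app-7d4b9c-xxxxx_default_my-app-<hash>.log
/var/lib/docker/containers/<container-id>/<container-id>-json.log
```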
You will find that the log file ultimately still links to a file under the Docker directory, so if your Docker directory is not /var/lib/docker, you must adjust the configuration above accordingly, otherwise no logs will be collected:
```yaml
- name: varlibdockercontainers
  mountPath: /var/lib/docker/containers   ## set this to the Docker data directory
  readOnly: true
```
If the Docker directory is not the same on every node, mount all of the candidate paths (each mount also needs a matching hostPath entry under volumes):
```yaml
- name: varlibdockercontainers
  mountPath: /var/lib/docker/containers
  readOnly: true
- name: varlibdockercontainers2
  mountPath: /app/docker/containers
  readOnly: true
- name: varlibdockercontainers3
  mountPath: /home/docker/containers
  readOnly: true
```
Adjust the remaining options to your own needs, then submit the resource files:
```shell
kubectl apply -f fluentd-es-configmap.yaml
kubectl apply -f fluentd-es-ds.yaml
```
Once deployment finishes, you can see that `fluentd` is running in the `elastic-system` namespace:
```
❯ kubectl get all -n elastic-system
NAME                             READY   STATUS    RESTARTS   AGE
pod/elastic-es-default-0         1/1     Running   0          10d
pod/elastic-es-default-1         1/1     Running   0          10d
pod/elastic-es-default-2         1/1     Running   0          10d
pod/elastic-operator-0           1/1     Running   1          10d
pod/fluentd-es-lrmqt             1/1     Running   0          4d6h
pod/fluentd-es-rd6xz             1/1     Running   0          4d6h
pod/fluentd-es-spq54             1/1     Running   0          4d6h
pod/fluentd-es-xc6pv             1/1     Running   0          4d6h
pod/kibana-kb-5bcd9f45dc-hzc9s   1/1     Running   0          10d

NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/elastic-es-default       ClusterIP   None           <none>        9200/TCP   10d
service/elastic-es-http          ClusterIP   172.23.4.246   <none>        9200/TCP   10d
service/elastic-es-transport     ClusterIP   None           <none>        9300/TCP   10d
service/elastic-webhook-server   ClusterIP   172.23.8.16    <none>        443/TCP    10d
service/kibana-kb-http           ClusterIP   172.23.7.101   <none>        5601/TCP   10d

NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/fluentd-es   4         4         4       4            4           <none>          4d6h

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kibana-kb   1/1     1            1           10d

NAME                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/kibana-kb-5bcd9f45dc   1         1         1       10d

NAME                                  READY   AGE
statefulset.apps/elastic-es-default   3/3     10d
statefulset.apps/elastic-operator     1/1     10d
```
Accessing Kibana
After deployment, Kibana's Service is ClusterIP by default and cannot be reached from outside the cluster; you can open a hostPort or NodePort to access it. Using Kuboard, for example, I enabled a hostPort and could then reach Kibana at the node IP plus that port.
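If you only need temporary access and don't want to expose a port, a port-forward against the Kibana service created above also works (note that ECK serves Kibana over HTTPS with a self-signed certificate):

```shell
kubectl port-forward service/kibana-kb-http 5601 -n elastic-system
# then browse to https://localhost:5601
```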
Kibana's default username is elastic; the password is obtained with the following command:
```
❯ kubectl get secret quickstart-es-elastic-user -n elastic-system -o=jsonpath='{.data.elastic}' | base64 --decode; echo
02fY4QjAC0C9361i0ftBA4Zo
```
That wraps up the deployment. As for day-to-day usage, I think it deserves a more detailed discussion together with our logging conventions, so this post ends here.