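Preface
There are countless log collection components: Logstash from the well-known ELK stack, Filebeat from the EFK stack, Fluentd, a rising star in the CNCF, and Flume, which is widely used in the big data field. This article covers another one, Fluent Bit, which shares its lineage with Fluentd.
Fluent Bit is an open source, multi-platform Log Processor and Forwarder. It lets you collect data and logs from different sources, unify them, and send them to multiple destinations. It is fully compatible with Docker and Kubernetes environments. Fluent Bit is written in C and has a pluggable architecture that supports around 30 extensions. It is fast and lightweight, and it provides the security that network operations require through TLS.
We chose Fluent Bit for its high performance. Below is the comparison with Fluentd published by the official project: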
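|              | Fluentd                                                     | Fluent Bit                                                    |
|--------------|-------------------------------------------------------------|---------------------------------------------------------------|
| Scope        | Containers / Servers                                        | Containers / Servers                                          |
| Language     | C & Ruby                                                    | C                                                             |
| Memory       | ~40MB                                                       | ~450KB                                                        |
| Performance  | High Performance                                            | High Performance                                              |
| Dependencies | Built as a Ruby Gem, it requires a certain number of gems.  | Zero dependencies, unless some special plugin requires them.  |
| Plugins      | More than 650 plugins available                             | Around 35 plugins available                                   |
| License      | Apache License v2.0                                         | Apache License v2.0                                           |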
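As long as the existing plugins cover your needs and scenarios, Fluent Bit is without doubt a good choice.
Introduction to Fluent Bit
After using it for a while, I would summarize its advantages as follows:
It supports routing, which suits scenarios with multiple outputs. For example, some business logs may be written to Elasticsearch for querying, or to HDFS for big data analysis.
Filters support Lua. Teams that are not comfortable with C can write their own filters in Lua.
Besides the dozen or so outputs that are officially supported, outputs can also be written in Go. For example: fluent-bit-kafka-output-plugin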
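The routing idea can be illustrated with a small sketch. Every record carries a tag, and every output declares a Match pattern; a record is delivered to each output whose pattern accepts its tag. The Python below is only a model of that idea, not Fluent Bit's implementation: the output names and patterns are hypothetical, and shell-style wildcards from `fnmatch` are assumed to be a close enough stand-in for Fluent Bit's `*` matching.

```python
from fnmatch import fnmatchcase

# Hypothetical outputs: (output name, Match pattern).
# These names and patterns are illustrative, not taken from a real config.
OUTPUTS = [
    ("es", "kube.*"),
    ("hdfs", "kube.var.log.containers.billing*"),
    ("stdout", "host.*"),
]


def route(tag, outputs=OUTPUTS):
    """Return the names of all outputs whose Match pattern accepts the tag."""
    return [name for name, pattern in outputs if fnmatchcase(tag, pattern)]


if __name__ == "__main__":
    # A record can fan out to several outputs at once.
    print(route("kube.var.log.containers.billing-api.log"))
    print(route("host.kubelet.service"))
```

Because matching is done per output, adding a second destination for the same logs only means adding another output with an overlapping pattern; the input side does not change.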
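Kubernetes log collection
Kubernetes log analysis
This mainly covers Kubernetes clusters deployed with kubeadm. The main logs are:
Logs from kubelet and etcd. These are usually run under systemd, so the collector naturally needs to support logs in the systemd format. Filebeat does not support this type.
stderr and stdout logs of components such as kube-apiserver. The output format usually depends on Docker's logging driver, which is typically json-file.
Logs that applications write to disk. Any collector that can tail files handles these. They are outside the scope of this article.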
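To make the json-file case concrete: with that driver, each line in the container log file is a JSON object with `log`, `stream`, and `time` keys. The sketch below parses one such line in Python. The sample line is made up for illustration, and truncating the nanosecond timestamp to microseconds is a choice made here because Python's `%f` accepts at most six digits.

```python
import json
from datetime import datetime, timezone


def parse_docker_line(line):
    """Parse one line written by Docker's json-file logging driver."""
    record = json.loads(line)
    # Timestamps look like 2019-03-01T08:15:30.123456789Z (nanoseconds, UTC).
    stamp = record["time"].rstrip("Z")
    base, _, frac = stamp.partition(".")
    micros = (frac + "000000")[:6]  # pad or truncate to microseconds
    when = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return {
        "time": when.replace(tzinfo=timezone.utc),
        "stream": record["stream"],
        "log": record["log"].rstrip("\n"),  # drop the trailing newline
    }


if __name__ == "__main__":
    # Made-up sample line; the raw string keeps the JSON escape \n intact.
    sample = r'{"log":"listening on :6443\n","stream":"stderr","time":"2019-03-01T08:15:30.123456789Z"}'
    print(parse_docker_line(sample))
```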
Deployment plan
Fluent Bit is deployed as a DaemonSet, as shown in the figure below:
Deployment YAML
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Elasticsearch"
spec:
  ports:
  - port: 9200
    protocol: TCP
    targetPort: db
  selector:
    k8s-app: elasticsearch-logging
---
# RBAC authn and authz
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "services"
  - "namespaces"
  - "endpoints"
  verbs:
  - "get"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: kube-system
  name: elasticsearch-logging
  labels:
    k8s-app: elasticsearch-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: elasticsearch-logging
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: elasticsearch-logging
  apiGroup: ""
---
# Elasticsearch deployment itself
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch-logging
  namespace: kube-system
  labels:
    k8s-app: elasticsearch-logging
    version: v6.3.0
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  serviceName: elasticsearch-logging
  replicas: 2
  selector:
    matchLabels:
      k8s-app: elasticsearch-logging
      version: v6.3.0
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-logging
        version: v6.3.0
        kubernetes.io/cluster-service: "true"
    spec:
      serviceAccountName: elasticsearch-logging
      containers:
      - image: k8s.gcr.io/elasticsearch:v6.3.0
        name: elasticsearch-logging
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        ports:
        - containerPort: 9200
          name: db
          protocol: TCP
        - containerPort: 9300
          name: transport
          protocol: TCP
        volumeMounts:
        - name: elasticsearch-logging
          mountPath: /data
        env:
        - name: "NAMESPACE"
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      # Elasticsearch requires vm.max_map_count to be at least 262144.
      # If your OS already sets up this number to a higher value, feel free
      # to remove this init container.
      initContainers:
      - image: alpine:3.6
        command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"]
        name: elasticsearch-logging-init
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: elasticsearch-logging
      annotations:
        volume.beta.kubernetes.io/storage-class: gp2
    spec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 10Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

    [INPUT]
        Name            systemd
        Tag             host.*
        Systemd_Filter  _SYSTEMD_UNIT=kubelet.service
        Path            /var/log/journal
        DB              /var/log/flb_host.db

  filter-kubernetes.conf: |
    [FILTER]
        Name                 kubernetes
        Match                kube.*
        Kube_URL             https://kubernetes.default.svc.cluster.local:443
        Merge_Log            On
        K8S-Logging.Parser   On
        K8S-Logging.Exclude  On

    [FILTER]
        Name         kubernetes
        Match        host.*
        Kube_URL     https://kubernetes.default.svc.cluster.local:443
        Merge_Log    On
        Use_Journal  On

  output-elasticsearch.conf: |
    [OUTPUT]
        Name             es
        Match            *
        Host             ${FLUENT_ELASTICSEARCH_HOST}
        Port             ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format  On
        Retry_Limit      False

  parsers.conf: |
    [PARSER]
        Name         apache
        Format       regex
        Regex        ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key     time
        Time_Format  %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name         apache2
        Format       regex
        Regex        ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key     time
        Time_Format  %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name    apache_error
        Format  regex
        Regex   ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?(\[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name         nginx
        Format       regex
        Regex        ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key     time
        Time_Format  %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name         json
        Format       json
        Time_Key     time
        Time_Format  %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name         docker
        Format       json
        Time_Key     time
        Time_Format  %Y-%m-%dT%H:%M:%S.%L
        Time_Keep    On
        # Command      |  Decoder | Field | Optional Action
        # =============|==================|=================
        Decode_Field_As   escaped    log

    [PARSER]
        Name         syslog
        Format       regex
        Regex        ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key     time
        Time_Format  %b %d %H:%M:%S
---
# apps/v1 is used here because extensions/v1beta1 DaemonSets were removed in
# Kubernetes 1.16; apps/v1 additionally requires spec.selector.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.0.0
        imagePullPolicy: Always
        ports:
        - containerPort: 2020
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch-logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /data/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /data/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: kibana-logging
  template:
    metadata:
      labels:
        k8s-app: kibana-logging
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      containers:
      - name: kibana-logging
        image: docker.elastic.co/kibana/kibana-oss:6.3.2
        resources:
          # need more cpu upon initialization, therefore burstable class
          limits:
            cpu: 1000m
          requests:
            cpu: 100m
        env:
        - name: ELASTICSEARCH_URL
          value: http://elasticsearch-logging:9200
        ports:
        - containerPort: 5601
          name: ui
          protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: kibana-logging
  namespace: kube-system
  labels:
    k8s-app: kibana-logging
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "Kibana"
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  ports:
  - port: 5601
    protocol: TCP
    targetPort: ui
  selector:
    k8s-app: kibana-logging
  type: LoadBalancer
---
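Summary
Log collection in real-world scenarios is more complex. When the log volume is large, Kafka is usually introduced. You also need to pay attention to log rotation. Docker generally supports this, and it can be handled with the following configuration: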
cat > /etc/docker/daemon.json <<EOF
{
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF
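To get a feel for what these two options mean for disk usage, here is a rough back-of-the-envelope calculation in Python. It assumes the `k`/`m`/`g` suffixes are binary multiples (1024-based) and that each container keeps at most `max-file` files of roughly `max-size` each; the helper names are made up for this sketch.

```python
# Assumed binary multipliers for the size suffixes used in log-opts.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size such as '100m' into bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def max_log_bytes(log_opts, containers=1):
    """Approximate upper bound on log disk usage for a number of containers."""
    per_container = parse_size(log_opts["max-size"]) * int(log_opts["max-file"])
    return per_container * containers


if __name__ == "__main__":
    opts = {"max-size": "100m", "max-file": "3"}
    print(max_log_bytes(opts))                 # one container
    print(max_log_bytes(opts, containers=50))  # a node running 50 containers
```

With the values above, a single container is capped at roughly 300 MiB of logs, so a node's log budget scales linearly with the number of containers it runs.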
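For business logs running in Kubernetes, you need to consider not only cleaning up stale logs but also collecting logs from newly created Pods. In that case you often need to wrap another layer of logic around Fluent Bit to obtain the log paths that should be collected, for example log-pilot.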
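As a small illustration of what such a layer has to work out, the sketch below extracts Pod metadata from the file names under /var/log/containers, which follow the pattern `<pod>_<namespace>_<container>-<container id>.log`. The regular expression is a simplified assumption written for this example, not the one used by Fluent Bit or log-pilot, and the sample path is made up.

```python
import re

# Simplified pattern for names under /var/log/containers (an assumption of
# this sketch): <pod>_<namespace>_<container>-<64 hex chars>.log
CONTAINER_LOG = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)"
    r"-(?P<container_id>[0-9a-f]{64})\.log$"
)


def describe(path):
    """Return pod, namespace, container and container_id, or None if no match."""
    name = path.rsplit("/", 1)[-1]
    match = CONTAINER_LOG.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up example; a real container ID is 64 hexadecimal characters.
    sample = "/var/log/containers/fluent-bit-x7k2p_kube-system_fluent-bit-" + "a" * 64 + ".log"
    print(describe(sample))
```

A wrapper could run this over newly appearing files and decide, per namespace or Pod name, which paths to hand to the collector.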