About Linux: Understanding Filebeat, the log collection tool, is not hard

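An earlier post covered collecting nginx logs with filebeat, logstash, and rsyslog. This article uses Filebeat 7.7.0 and covers the following topics:

Filebeat overview: Filebeat and Beats, what Filebeat is, Filebeat and Logstash
How Filebeat works: its components, how it keeps file state, how it guarantees at-least-once delivery
Installing Filebeat: tarball installation, basic commands, inputs and outputs, using the keystore, filebeat.yml configuration (Log input as the example), Example 1 with Logstash as the output, Example 2 with Elasticsearch as the output, Filebeat modules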

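First, Filebeat is a member of the Beats family.

Beats is a family of lightweight data shippers with six members. Early ELK architectures used Logstash to collect and parse logs, but Logstash consumes a relatively large amount of memory, CPU, and I/O. Compared with Logstash, the CPU and memory footprint of Beats is almost negligible.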

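Beats currently includes six tools:

Packetbeat: network data (collects network traffic data)
Metricbeat: metrics (collects system-, process-, and filesystem-level data such as CPU and memory usage)
Filebeat: log files (collects file data)
Winlogbeat: Windows event logs (collects Windows event log data)
Auditbeat: audit data (collects audit logs)
Heartbeat: uptime monitoring (collects data on whether services are up)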

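Filebeat is a lightweight shipper for forwarding and centralizing log data. It monitors the log files or locations you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.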

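Filebeat works as follows: when it starts, it launches one or more inputs that look in the locations you have specified for log data. For each log it finds, Filebeat starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.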
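To make the flow concrete, here is a minimal Python sketch of the path from harvester to output. It is an illustration only, not Filebeat's code (Filebeat and libbeat are written in Go). The function names harvest and publish, the batch size, and the sample path are all assumptions made for this example.

```python
from typing import Callable, Iterable, Iterator, List


def harvest(lines: Iterable[str], source: str) -> Iterator[dict]:
    """Turn each line of one log into an event (one harvester per log)."""
    for line_no, line in enumerate(lines):
        yield {"source": source, "line_no": line_no, "message": line.rstrip("\n")}


def publish(events: Iterable[dict], output: Callable[[List[dict]], None],
            batch_size: int = 2) -> None:
    """Stand-in for libbeat: collect events into batches and hand them to the output."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            output(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        output(batch)


# The "output" here is just a list that records every batch it receives.
received: List[List[dict]] = []
publish(harvest(["a\n", "b\n", "c\n"], "/var/log/app.log"), received.append)
```

Three lines with a batch size of two reach the output as one full batch and one partial batch, which is the aggregation step the paragraph attributes to libbeat.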

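Because Logstash runs on the JVM and consumes a lot of resources, its author later wrote logstash-forwarder in Golang, a lightweight tool with fewer features but a much smaller footprint. The author was working alone, however. By the time he joined Elastic (http://elastic.co), the company had also acquired another open-source project, Packetbeat, which was written in Golang and had a whole team behind it. The company therefore folded logstash-forwarder development into that same Golang team, and the resulting project was named Filebeat.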

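Filebeat structure: Filebeat consists of two components, inputs and harvesters, which work together to tail files and send event data to the output you specify. A harvester is responsible for reading the content of a single file. It reads the file line by line and sends the content to the output. One harvester is started per file. The harvester is responsible for opening and closing the file, which means the file descriptor stays open while the harvester is running. If a file is removed or renamed while it is being harvested, Filebeat continues to read it. The side effect is that the disk space stays reserved until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.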
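The fact that a deleted file stays readable, and keeps its disk space, while a descriptor is open comes from POSIX file semantics rather than from Filebeat itself. The short Python sketch below shows it, assuming a Linux or other POSIX system. The temporary file and its contents are made up for illustration.

```python
import os
import tempfile

# Create a small log file, then open it the way a harvester would.
fd, path = tempfile.mkstemp(suffix=".log")
os.write(fd, b"first line\nsecond line\n")
os.close(fd)

# buffering=0 means every read goes to the operating system, not to a Python buffer.
handle = open(path, "rb", buffering=0)
first = handle.readline()

os.remove(path)                          # the file is deleted while still being read
path_still_exists = os.path.exists(path)

second = handle.readline()               # the remaining data is still readable
handle.close()                           # only now can the OS reclaim the disk space
```

This is why a large rotated log can keep occupying disk space until the harvester that holds it is closed.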

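Closing a harvester has the following consequences:

The file handler is closed, which frees the underlying resources if the file was deleted while the harvester was still reading it.
Harvesting of the file starts again only after scan_frequency has elapsed.
If the file is moved or removed while the harvester is closed, harvesting of the file does not continue.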

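An input is responsible for managing the harvesters and finding all sources to read from. If the input type is log, the input finds all files on the drive that match the defined paths and starts a harvester for each one. Each input runs in its own Go routine (not a separate process). Filebeat currently supports multiple input types, and each input type can be defined multiple times. The log input checks each file to determine whether a harvester needs to be started, whether one is already running, or whether the file can be ignored.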
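The scan an input performs can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Filebeat's implementation: the function name scan, the set of "running" files, and the temporary directory layout are all hypothetical. It also shows that a pattern of the form dir/*/*.log only matches files one directory level down.

```python
import glob
import os
import tempfile
from typing import List, Set


def scan(paths: List[str], running: Set[str]) -> List[str]:
    """Return files matching the glob paths that do not have a harvester yet."""
    to_start: List[str] = []
    for pattern in paths:
        for filename in sorted(glob.glob(pattern)):
            if (os.path.isfile(filename)
                    and filename not in running
                    and filename not in to_start):
                to_start.append(filename)
    return to_start


# Throwaway directory tree: one log at the top level, one in a subdirectory.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "nginx"))
top_log = os.path.join(root, "system.log")
sub_log = os.path.join(root, "nginx", "access.log")
for name in (top_log, sub_log):
    with open(name, "w") as f:
        f.write("hello\n")

pattern = os.path.join(root, "*", "*.log")
running: Set[str] = set()
first_scan = scan([pattern], running)
running.update(first_scan)            # pretend a harvester was started for each file
second_scan = scan([pattern], running)
```

The first scan finds only the log in the subdirectory, and the second scan finds nothing new because that file is already being harvested.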

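Filebeat keeps the state of each file and frequently flushes that state to disk in the registry file. The state is used to remember the last offset a harvester read from and to ensure that all log lines are sent. If the output (such as Elasticsearch or Logstash) is unreachable, Filebeat keeps track of the last lines sent and continues reading the files as soon as the output becomes available again. While Filebeat is running, the state information is also kept in memory for each input. When Filebeat restarts, data from the registry file is used to rebuild the state, and Filebeat resumes each harvester at the last known position. For each input, Filebeat keeps the state of every file it finds. Because files can be renamed or moved, the filename and path are not enough to identify a file. For each file, Filebeat stores a unique identifier to detect whether the file was harvested before.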
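The idea of a registry keyed by a file identifier rather than a filename can be sketched as follows. This is a simplified model, not Filebeat's actual registry format: it assumes a POSIX system where device number plus inode identifies a file, and the helper names and the JSON layout are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict


def file_id(path: str) -> str:
    """Identify a file by device and inode rather than by name (POSIX)."""
    st = os.stat(path)
    return f"{st.st_dev}-{st.st_ino}"


def save_registry(registry_path: str, state: Dict[str, int]) -> None:
    with open(registry_path, "w") as f:
        json.dump(state, f)


def load_registry(registry_path: str) -> Dict[str, int]:
    if not os.path.exists(registry_path):
        return {}
    with open(registry_path) as f:
        return json.load(f)


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "app.log")
registry_path = os.path.join(workdir, "registry.json")

with open(log_path, "w") as f:
    f.write("line 1\nline 2\n")

# First run: read one line and persist the offset reached.
with open(log_path, "rb") as f:
    f.readline()
    save_registry(registry_path, {file_id(log_path): f.tell()})

# The file is renamed between runs; its inode stays the same.
rotated_path = os.path.join(workdir, "app.log.1")
os.rename(log_path, rotated_path)

# "Restart": rebuild state from the registry and resume at the stored offset.
state = load_registry(registry_path)
with open(rotated_path, "rb") as f:
    f.seek(state[file_id(rotated_path)])
    resumed = f.readline()
```

Even though the file now has a different name, the stored offset is found and reading resumes at the second line.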

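Filebeat guarantees that events are delivered to the configured output at least once, with no data loss. It can do so because it stores the delivery state of each event in the registry file. When the defined output is blocked and has not acknowledged all events, Filebeat keeps trying to send them until the output confirms receipt. If Filebeat shuts down while sending events, it does not wait for the output to acknowledge all events before exiting. When Filebeat restarts, every event that was not acknowledged before shutdown is sent to the output again. This ensures that each event is sent at least once, but duplicate events may end up at the output. The shutdown_timeout option configures Filebeat to wait a specific amount of time before shutting down.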
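Why at-least-once delivery can produce duplicates is easiest to see in a toy model. The Python sketch below is an assumption-laden illustration, not Filebeat's code: FlakyOutput and ship are hypothetical names, and the "registry" is reduced to a single integer position.

```python
from typing import List


class FlakyOutput:
    """Hypothetical output that stores events but may fail to acknowledge them."""

    def __init__(self) -> None:
        self.stored: List[str] = []
        self.ack_next = True

    def send(self, event: str) -> bool:
        self.stored.append(event)      # the event arrives...
        return self.ack_next           # ...but the acknowledgement may be lost


def ship(events: List[str], acked_upto: int, output: FlakyOutput) -> int:
    """Send events from the last acknowledged position; return the new position."""
    position = acked_upto
    for event in events[acked_upto:]:
        if not output.send(event):
            break                      # no acknowledgement: do not advance
        position += 1
    return position


events = ["e1", "e2", "e3"]
output = FlakyOutput()

position = ship(events[:1], 0, output)     # e1 is sent and acknowledged
output.ack_next = False
position = ship(events, position, output)  # e2 is delivered but never acknowledged

output.ack_next = True                     # after a restart, resume from the stored position
position = ship(events, position, output)
```

The output ends up holding e2 twice: nothing is lost, but the unacknowledged event is resent after the restart.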

This article installs from the tarball on Linux: filebeat-7.7.0-linux-x86_64.tar.gz.

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz

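Sample configuration file: filebeat.reference.yml (contains all non-deprecated configuration options)

Configuration file: filebeat.yml

Basic commands (see the official documentation for details: https://www.elastic.co/guide/…):

export    # export
run       # run Filebeat (the default command)
test      # test the configuration
keystore  # manage the secrets keystore
modules   # manage module configurations
setup     # set up the initial environment
For example, ./filebeat test config checks whether the configuration file is valid.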

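Supported input types:

Azure Event Hub, Cloud Foundry, Container, Docker, Google Pub/Sub, HTTP JSON, Kafka, Log, MQTT, NetFlow, Office 365 Management Activity API, Redis, S3, Stdin, Syslog, TCP, UDP (Log is the most commonly used). The same part of the official documentation also covers multiline messages.

Supported outputs:

Elasticsearch, Logstash, Kafka, Redis, File, Console, Elastic Cloud (Elasticsearch and Logstash are the most commonly used). The same part of the official documentation also covers changing the output codec.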

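The keystore mainly prevents sensitive information, such as passwords, from being exposed. For the ES password, for example, you can create an entry whose key is ES_PWD and whose value is the ES password, and then reference the password as ${ES_PWD} wherever it is needed.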

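Create a keystore for storing secrets: filebeat keystore create
Then add a key-value pair to it, for example: filebeat keystore add ES_PWD
Overwrite the value of an existing key: filebeat keystore add ES_PWD --force
Remove a key-value pair: filebeat keystore remove ES_PWD
List the existing keys: filebeat keystore list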

The value can then be referenced as ${ES_PWD}, for example:

output.elasticsearch.password: "${ES_PWD}"
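Conceptually, a ${ES_PWD} reference is a placeholder that gets replaced by the stored secret when the configuration is loaded. The Python sketch below illustrates only that substitution idea. It is not how Filebeat implements its keystore, and the resolve function, the in-memory dictionary, and the sample secret are all hypothetical.

```python
import re
from typing import Dict

# Matches ${NAME} where NAME looks like an environment-variable style key.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(value: str, keystore: Dict[str, str]) -> str:
    """Replace each ${KEY} in a config value with the secret stored under KEY."""
    def lookup(match):
        key = match.group(1)
        if key not in keystore:
            raise KeyError(f"missing keystore entry: {key}")
        return keystore[key]
    return _VAR.sub(lookup, value)


keystore = {"ES_PWD": "s3cret"}          # hypothetical secret, for illustration only
password = resolve("${ES_PWD}", keystore)
```

The configuration file itself only ever contains the placeholder, so it can be shared or committed without exposing the password.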

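filebeat.yml configuration, using the Log input type as an example. See the official documentation for details: https://www.elastic.co/guide/…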

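type: log  # the input type is log
enabled: true  # whether this log input configuration takes effect
paths:  # log files to monitor, expanded with Go's glob function; directories are not searched recursively. For example, with
  - /var/log/*/*.log  # only files ending in ".log" in the subdirectories of /var/log are picked up, not ".log" files directly in /var/log
recursive_glob.enabled:  # enables recursive globbing, so that /foo/** expands to /foo, /foo/*, /foo/*/*, and so on
encoding:  # encoding of the monitored files; both plain and utf-8 can handle Chinese logs
exclude_lines: ['^DBG']  # drop lines that match any of these regular expressions
include_lines: ['^ERR', '^WARN']  # keep only lines that match any of these regular expressions
harvester_buffer_size: 16384  # size in bytes of the buffer each harvester uses when fetching a file
max_bytes: 10485760  # maximum number of bytes a single log message can have; bytes after max_bytes are discarded and not sent. The default is 10 MB (10485760)
exclude_files: ['.gz$']  # list of regular expressions matching files that Filebeat should ignore
ignore_older: 0  # default 0 (disabled); accepts values such as 2h or 2m. ignore_older must be greater than close_inactive
  # files that have not been updated within the given time span, or that have never been picked up by a harvester, are ignored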
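close_*  # the close_* options close the harvester after a certain criterion or time. Closing the harvester means closing the file handler
  # if the file is updated after the harvester is closed, it is picked up again once scan_frequency has elapsed
  # however, if the file is moved or deleted while the harvester is closed, Filebeat cannot pick it up again and any data the harvester has not read is lost
close_inactive  # when enabled, the file handle is closed if the file has not been harvested for the specified duration
  # the countdown starts from the last log line read, not from the file's modification time
  # if a closed file changes, a new harvester is started after scan_frequency has elapsed
  # set it to a value larger than the interval at which the log is updated; configure several inputs (called prospectors in older versions) for logs that update at different rates
  # an internal timestamp records the last read, and the countdown restarts each time the last line is read; values such as 2h or 5m are accepted
close_renamed  # when enabled, Filebeat closes the file handler when the file is renamed or moved
close_removed  # when enabled, Filebeat closes the file handler when the file is removed; if this option is disabled, clean_removed must be disabled as well
close_eof  # suited to files that are written only once; Filebeat closes the file as soon as the end of the file is reached
close_timeout  # when enabled, Filebeat gives each harvester a predefined lifetime; the harvester is closed when the time is up, regardless of whether the file is still being read
  # close_timeout must not equal ignore_older, otherwise a file updated while the harvester is closed is not picked up
  # if the output never emits any events the timeout does not take effect; at least one event must be sent before the harvester is closed
  # 0 disables the option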
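clean_inactive  # removes the state of previously harvested files from the registry file
  # must be greater than ignore_older + scan_frequency, so that no state is removed while a file is still being harvested
  # helps reduce the size of the registry file, especially when a large number of new files are generated every day
  # can also be used to avoid Filebeat problems caused by inode reuse on Linux
clean_removed  # when enabled, Filebeat removes a file's state from the registry if the file can no longer be found on disk
  # if close_removed is disabled, clean_removed must be disabled too
scan_frequency  # how often the input checks the configured paths for new files to harvest; default 10s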
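tail_files:  # if true, Filebeat starts reading new files at the end and sends each newly appended line as an event, instead of resending everything from the beginning of the file
symlinks:  # allows Filebeat to harvest symbolic links in addition to regular files; when harvesting a symlink, Filebeat opens and reads the original file even though it reports the path of the symlink
backoff:  # how aggressively Filebeat checks open files for updates: the time Filebeat waits before checking a file again after reaching EOF; default 1s
max_backoff:  # the maximum time Filebeat waits before checking a file again after reaching EOF
backoff_factor:  # the factor by which the backoff wait is multiplied after each check that finds nothing new, up to max_backoff; default 2
harvester_limit:  # limits the number of harvesters an input starts in parallel, which directly affects the number of open files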
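tags  # list of tags added to each event, useful for filtering, for example: tags: ["json"]
fields  # optional extra fields to add to the output; values can be scalars, arrays, dictionaries, or any nested combination
  # by default they are grouped under a fields sub-dictionary, for example:
filebeat.inputs:
- type: log
  fields:
    app_id: query_engine_12
fields_under_root  # if true, the custom fields are stored at the top level of the output document instead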
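multiline.pattern  # the regexp pattern to match
multiline.negate  # whether the pattern is negated; default false
  # with false, consecutive lines that match the pattern (for example '^b') are merged into the adjacent event
  # with true, consecutive lines that do NOT match the pattern are merged
multiline.match  # how Filebeat combines lines into an event: after or before, interpreted together with negate above
multiline.max_lines  # maximum number of lines combined into one event; additional lines are discarded; default 500
multiline.timeout  # after this timeout Filebeat sends the multiline event even if no new pattern match has started a new event; default 5s
max_procs  # maximum number of CPUs that can execute simultaneously; defaults to the number of logical CPUs available on the system
name  # name for this Filebeat instance; defaults to the host's hostname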
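The interaction between multiline.pattern and multiline.negate is easier to follow with a worked example. The Python sketch below models only the match: after case and is a simplification, not Filebeat's implementation. The function name combine_after and the sample log lines are hypothetical, and options such as max_lines and timeout are left out.

```python
import re
from typing import List


def combine_after(lines: List[str], pattern: str, negate: bool) -> List[str]:
    """Group lines into events in the style of `multiline.match: after`.

    A line is a continuation when it matches the pattern (negate=False) or
    when it does not match (negate=True). Continuations are appended to the
    event in progress; any other line starts a new event.
    """
    regex = re.compile(pattern)
    events: List[List[str]] = []
    for line in lines:
        matches = regex.search(line) is not None
        is_continuation = matches != negate
        if is_continuation and events:
            events[-1].append(line)
        else:
            events.append([line])
    return ["\n".join(event) for event in events]


log = [
    "[2020-05-20] ERROR request failed",
    "  at com.example.Handler.run(Handler.java:42)",
    "  at com.example.Main.main(Main.java:7)",
    "[2020-05-20] INFO recovered",
]

# negate=True: lines that do NOT start with "[" are appended to the preceding "[" line.
events = combine_after(log, r"^\[", negate=True)

# negate=False: lines that DO start with whitespace are appended to the preceding line.
same_events = combine_after(log, r"^\s", negate=False)
```

Both settings fold the two stack-trace lines into the first event. Which one is more convenient depends on whether it is easier to describe the first line of an event or its continuation lines.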

Example 1: Logstash as the output. filebeat.yml configuration:

#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:  # multiple log paths can be configured
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_ddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  ### Multiline options
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after
#================================ Outputs =====================================
#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts; with several hosts listed, load balancing can be used
  hosts: ["192.168.110.130:5044","192.168.110.131:5044","192.168.110.132:5044","192.168.110.133:5044"]
  loadbalance: true  # enable load balancing
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"
  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

./filebeat -e  # start Filebeat

Logstash configuration:

input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.110.130:9200"]  # multiple hosts can be listed here
    index => "query-%{+yyyyMMdd}"
  }
}

Example 2: Elasticsearch as the output. filebeat.yml configuration:

###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_dddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  ### Multiline options
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
  #multiline.match: after
#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s
#==================== Elasticsearch template setting ==========================
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: filebeat222
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging
#cloud.auth:
#================================ Outputs =====================================
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","192.168.110.131:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "${ES_PWD}"   # password supplied through the keystore

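./filebeat -e  # start Filebeat

Check the Elasticsearch cluster: there is an index with the default name filebeat-%{[agent.version]}-%{+yyyy.MM.dd} (the version field is agent.version in Filebeat 7.x; it was beat.version in 6.x).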

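Filebeat modules. Official documentation: https://www.elastic.co/guide/…

Here I use the Elasticsearch module to parse the ES slow query log. The steps are as follows, and other modules work the same way:

Prerequisite: Elasticsearch and Kibana are installed before Filebeat is used.

The detailed procedure is in the official documentation: https://www.elastic.co/guide/…

Step 1: configure filebeat.yml: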

#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify an additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "192.168.110.130:5601"  # the Kibana host
  username: "elastic"   # user
  password: "${ES_PWD}"  # password, read from the keystore to avoid a plaintext password
  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","192.168.110.131:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"  # the ES user
  password: "${ES_PWD}" # the ES password
  # Do not set index here: no template is configured, so an index named filebeat-%{[agent.version]}-%{+yyyy.MM.dd} is created automatically

Step 2: configure the path of the Elasticsearch slow log:

cd filebeat-7.7.0-linux-x86_64/modules.d
vim elasticsearch.yml

Step 3: enable the ES module:

./filebeat modules enable elasticsearch
List the modules and check which are enabled: ./filebeat modules list

Step 4: initialize the environment:

./filebeat setup -e

Step 5: start Filebeat:

./filebeat -e

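Checking the Elasticsearch cluster shows that the slow query log entries have all been parsed automatically.

At this point, the experiment with the Elasticsearch module is a success.

Author: 一寸 HUI
Original: https://www.cnblogs.com/zsql/…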
