About Linux: Understanding Filebeat, a Powerful Log Collection Tool, Is Not Hard

An earlier post covered collecting nginx logs with filebeat, logstash, and rsyslog. This article uses Filebeat 7.7.0 and covers, in order, what Filebeat is, how it works, and how to use it.

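Introduction to Filebeat

How Filebeat relates to Beats

First of all, Filebeat is a member of the Beats family.

Beats is a family of lightweight log collectors with six members. Early ELK architectures used Logstash to collect and parse logs, but Logstash consumes a relatively large amount of memory, CPU, and I/O. Compared with Logstash, the CPU and memory that Beats takes from the system are almost negligible.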

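Beats currently includes six tools:

Packetbeat: network data (collects network traffic data)
Metricbeat: metrics (collects CPU and memory usage data at the system, process, and filesystem level)
Filebeat: log files (collects file data)
Winlogbeat: Windows event logs (collects Windows event log data)
Auditbeat: audit data (collects audit logs)
Heartbeat: uptime monitoring (collects data on whether systems are up and running)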

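What Filebeat is

Filebeat is a lightweight shipper for forwarding and centralizing log data. Filebeat monitors the log files or locations that you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.

Filebeat works as follows: when you start Filebeat, it starts one or more inputs that look in the locations you specified for log data. For each log that Filebeat finds, it starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.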
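The flow from harvester to libbeat to output can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Filebeat's code: the names harvest, publish, and demo_pipeline and the batch size of 2 are made up for illustration, and the "output" is simply a list of batches.

```python
import os
import tempfile
from typing import Iterator, List


def harvest(path: str) -> Iterator[dict]:
    """Toy harvester: read one file line by line and yield one event per line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield {"source": path, "message": line.rstrip("\n")}


def publish(events: Iterator[dict], batch_size: int = 2) -> List[List[dict]]:
    """Toy stand-in for libbeat: group events into batches for the output."""
    batches: List[List[dict]] = []
    current: List[dict] = []
    for event in events:
        current.append(event)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # flush the last, partially filled batch
        batches.append(current)
    return batches


def demo_pipeline() -> List[List[str]]:
    """Write a three-line log, harvest it, and return the batched messages."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")
        with open(log_path, "w", encoding="utf-8") as handle:
            handle.write("line 1\nline 2\nline 3\n")
        batches = publish(harvest(log_path))
    return [[event["message"] for event in batch] for batch in batches]


if __name__ == "__main__":
    print(demo_pipeline())
```

Three lines with a batch size of 2 yield two batches, which mirrors how events are aggregated before they reach the output.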

[Figure: Filebeat workflow diagram]

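How Filebeat relates to Logstash

Because Logstash runs on the JVM and consumes a lot of resources, its author later wrote logstash-forwarder in Golang, a lightweight tool with fewer features but a much smaller footprint. The author was working alone, however. After he joined http://elastic.co, the company also acquired another open-source project, Packetbeat, which was written in Golang and had a whole team behind it. The company therefore simply merged logstash-forwarder development into that Golang team, and the new project was named Filebeat.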

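How Filebeat works

Components of Filebeat

Filebeat consists of two components: inputs and harvesters. These components work together to tail files and send event data to the output you specify. A harvester is responsible for reading the content of a single file: it reads the file line by line and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor stays open while the harvester is running. If a file is removed or renamed while it is being harvested, Filebeat continues to read it. The side effect is that the space on disk stays reserved until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.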
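The point about removed files is easier to see in a small experiment. The sketch below assumes POSIX (Linux) file semantics, where an open handle keeps a deleted file readable; the function name read_after_delete is hypothetical and the code only imitates what a harvester's open file descriptor implies.

```python
import os
import tempfile


def read_after_delete() -> str:
    """Open a file, delete it, then keep reading through the still-open handle."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w", encoding="utf-8") as writer:
        writer.write("first line\nsecond line\n")

    # Unbuffered binary mode, so the second read really goes to the file
    # after deletion instead of being served from a Python-side buffer.
    reader = open(path, "rb", buffering=0)   # the "harvester" holds this handle
    first = reader.readline().decode("utf-8").strip()
    os.remove(path)                          # the name disappears from the directory
    still_listed = os.path.exists(path)
    second = reader.readline().decode("utf-8").strip()
    reader.close()                           # only now can the OS reclaim the space
    return f"{first}|{second}|listed={still_listed}"


if __name__ == "__main__":
    print(read_after_delete())
```

On Linux the second line is still readable after the file is deleted, which is why the disk space is only freed once the handle is closed.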

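Closing a harvester can have the following consequences:

The file handler is closed, which frees the underlying resources if the file was deleted while the harvester was still reading it.
Harvesting of the file starts again only after scan_frequency has elapsed.
If the file is moved or removed while the harvester is closed, harvesting of the file does not continue.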

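An input is responsible for managing the harvesters and finding all sources to read from. If the input type is log, the input finds all files on the drive that match the defined paths and starts a harvester for each file. Each input runs in its own goroutine. Filebeat currently supports multiple input types, and each input type can be defined multiple times. The log input checks each file to see whether a harvester needs to be started, whether one is already running, or whether the file can be ignored.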
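A rough sketch of that scan step follows. It uses Python's glob module as a stand-in for Go's glob matching, which is an assumption made for illustration; scan, demo_scan, and the running set are hypothetical names, and "starting a harvester" is reduced to remembering the path.

```python
import glob
import os
import tempfile
from typing import List, Set


def scan(patterns: List[str], running: Set[str]) -> List[str]:
    """Return the files that match the patterns and have no harvester yet."""
    started = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path) and path not in running:
                running.add(path)        # pretend a harvester was started
                started.append(path)
    return started


def demo_scan() -> List[List[str]]:
    """Scan twice: the second scan finds nothing new to start."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "nginx"))
        for rel in ("top.log", os.path.join("nginx", "access.log")):
            open(os.path.join(root, rel), "w").close()
        running: Set[str] = set()
        pattern = os.path.join(root, "*", "*.log")   # same shape as /var/log/*/*.log
        first = [os.path.relpath(p, root) for p in scan([pattern], running)]
        second = scan([pattern], running)
    return [first, second]


if __name__ == "__main__":
    print(demo_scan())
```

Only the file inside the subdirectory is picked up, because a pattern of the form dir/*/*.log does not match files directly under dir, and the second scan starts nothing because a harvester is already tracked for that file.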

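How Filebeat keeps the state of files

Filebeat keeps the state of each file and frequently flushes that state to the registry file on disk. The state is used to remember the last offset a harvester read from and to ensure that all log lines are sent. If the output (such as Elasticsearch or Logstash) is not reachable, Filebeat keeps track of the last lines sent and continues reading the files as soon as the output becomes available again. While Filebeat is running, the state information for each input is also kept in memory. When Filebeat restarts, data from the registry file is used to rebuild the state, and Filebeat resumes each harvester at the last known position. For each input, Filebeat keeps a state for every file it finds. Because files can be renamed or moved, the file name and path are not enough to identify a file. For each file, Filebeat stores a unique identifier to detect whether the file was harvested before.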
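The registry idea can be illustrated with a minimal sketch. It assumes that the device and inode numbers serve as the unique file identifier and that the registry is a small JSON file; both are simplifications chosen for this example, and file_key, read_new_lines, and demo_registry are hypothetical names rather than Filebeat internals.

```python
import json
import os
import tempfile
from typing import Dict, List


def file_key(path: str) -> str:
    """Identify a file by device and inode rather than by its name."""
    info = os.stat(path)
    return f"{info.st_dev}-{info.st_ino}"


def read_new_lines(path: str, registry_path: str) -> List[str]:
    """Read lines added since the stored offset, then persist the new offset."""
    registry: Dict[str, int] = {}
    if os.path.exists(registry_path):
        with open(registry_path, encoding="utf-8") as handle:
            registry = json.load(handle)
    key = file_key(path)
    with open(path, "rb") as handle:
        handle.seek(registry.get(key, 0))           # resume at the last known offset
        lines = [raw.decode("utf-8").rstrip("\n") for raw in handle.readlines()]
        registry[key] = handle.tell()
    with open(registry_path, "w", encoding="utf-8") as handle:
        json.dump(registry, handle)
    return lines


def demo_registry() -> List[List[str]]:
    """Read a file, rename it, append to it, and read again from the saved offset."""
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        rotated = os.path.join(tmp, "app.log.1")
        registry = os.path.join(tmp, "registry.json")
        with open(log, "w", encoding="utf-8") as handle:
            handle.write("a\nb\n")
        first = read_new_lines(log, registry)
        os.rename(log, rotated)                     # same inode, new name
        with open(rotated, "a", encoding="utf-8") as handle:
            handle.write("c\n")
        second = read_new_lines(rotated, registry)  # resumes after "b"
    return [first, second]


if __name__ == "__main__":
    print(demo_registry())
```

Because the key does not depend on the path, the renamed file is recognized as already harvested and only the appended line is read.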

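How Filebeat ensures at-least-once delivery

Filebeat guarantees that events are delivered to the configured output at least once and with no data loss, because it stores the delivery state of each event in the registry file. When the defined output is blocked and has not confirmed all events, Filebeat keeps trying to send events until the output acknowledges that it has received them. If Filebeat shuts down while it is sending events, it does not wait for the output to acknowledge all events before shutting down. When Filebeat restarts, all events that were not acknowledged before the shutdown are sent to the output again. This ensures that each event is sent at least once, but duplicate events may end up in the output. By setting the shutdown_timeout option, you can configure Filebeat to wait a specific amount of time before shutting down.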
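Why at-least-once delivery can produce duplicates is shown by the following toy model. FlakyOutput, ship, and demo_delivery are hypothetical names; the output here stores the first event but loses its acknowledgement, which is an assumed failure mode picked to make the effect visible.

```python
from typing import List


class FlakyOutput:
    """Hypothetical output that stores every event but loses the first ACK."""

    def __init__(self) -> None:
        self.received: List[str] = []
        self._drop_next_ack = True

    def send(self, event: str) -> bool:
        self.received.append(event)
        if self._drop_next_ack:
            self._drop_next_ack = False
            return False      # event was stored, but the sender never learns it
        return True


def ship(events: List[str], output: FlakyOutput, max_attempts: int = 5) -> List[str]:
    """Resend each event until it is acknowledged; return the acknowledged events."""
    acked = []
    for event in events:
        for _ in range(max_attempts):
            if output.send(event):
                acked.append(event)
                break
    return acked


def demo_delivery() -> List[str]:
    """Ship two events through the flaky output and return what it received."""
    output = FlakyOutput()
    ship(["e1", "e2"], output)
    return output.received


if __name__ == "__main__":
    print(demo_delivery())
```

No event is lost, but e1 arrives twice, so anything downstream should tolerate or deduplicate repeated events.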

Installing Filebeat

Installing from the tarball

This article installs from the tarball, using the Linux build filebeat-7.7.0-linux-x86_64.tar.gz.

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz

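Reference configuration file: filebeat.reference.yml (contains all non-deprecated options)

Configuration file: filebeat.yml

Basic commands

See the official documentation for details: https://www.elastic.co/guide/...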

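export    # export
run       # run (the default command)
test      # test the configuration
keystore  # secrets keystore
modules   # manage module configurations
setup     # set up the initial environment

For example: ./filebeat test config  # checks whether the configuration file is correct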

Inputs and outputs

Supported inputs:

Multiline messages, Azure eventhub, Cloud Foundry, Container, Docker, Google Pub/Sub, HTTP JSON, Kafka, Log, MQTT, NetFlow, Office 365 Management Activity API, Redis, s3, Stdin, Syslog, TCP, UDP (Log is the most commonly used)

Supported outputs:

Elasticsearch, Logstash, Kafka, Redis, File, Console, Elastic Cloud, Change the output codec (Elasticsearch and Logstash are the most commonly used)

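Using the keystore

The keystore mainly keeps sensitive information, such as passwords, from being exposed. For the ES password, for example, you can store a key named ES_PWD whose value is the ES password, and then reference it as ${ES_PWD} wherever the password is needed.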

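Create a keystore for storing passwords: filebeat keystore create
Then add a key-value pair to it, for example: filebeat keystore add ES_PWD
Overwrite the value of an existing key: filebeat keystore add ES_PWD --force
Remove a key-value pair: filebeat keystore remove ES_PWD
List the keys already stored: filebeat keystore list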

Afterwards the value can be referenced as ${ES_PWD}, for example:

output.elasticsearch.password: "${ES_PWD}"
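Conceptually, a ${ES_PWD} reference is a placeholder that is replaced by the stored secret when the configuration is loaded. The sketch below imitates that substitution with Python's string.Template and a plain dictionary; it is an analogy only, not how Filebeat implements its keystore, and the names resolve and demo_keystore and the secret value are made up.

```python
from string import Template
from typing import Dict


def resolve(config_value: str, keystore: Dict[str, str]) -> str:
    """Replace ${KEY} references with the values held in the keystore."""
    # substitute() raises KeyError when a referenced key is missing,
    # so a typo in the key name fails loudly instead of sending "${...}".
    return Template(config_value).substitute(keystore)


def demo_keystore() -> str:
    keystore = {"ES_PWD": "s3cret"}   # hypothetical secret, for illustration only
    return resolve("${ES_PWD}", keystore)


if __name__ == "__main__":
    print(demo_keystore())
```

The configuration file itself only ever contains the placeholder, which is the reason it can be shared or committed without exposing the password.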

filebeat.yml configuration (using the Log input type as an example)

See the official documentation for details: https://www.elastic.co/guide/...

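type: log        # the input type is log
enabled: true    # this log input configuration takes effect
paths:           # log files to monitor; patterns are handled by Go's glob function
                 # and configured directories are not searched recursively. For example:
  - /var/log/*/*.log   # only looks for files ending in ".log" in the subdirectories of /var/log,
                       # not for files ending in ".log" directly under /var/log
recursive_glob.enabled:   # enables recursive globbing, e.g. /foo/** expands to /foo, /foo/*, /foo/*/*
encoding:        # encoding of the monitored files; both plain and utf-8 can handle Chinese logs
exclude_lines: ['^DBG']            # drop lines that match any of these regular expressions
include_lines: ['^ERR', '^WARN']   # keep lines that match any of these regular expressions
harvester_buffer_size: 16384       # size in bytes of the buffer each harvester uses when fetching a file
max_bytes: 10485760    # maximum number of bytes a single log message can have; all bytes after
                       # max_bytes are discarded and not sent. The default is 10MB (10485760)
exclude_files: ['.gz$']   # list of regular expressions matching the files Filebeat should ignore
ignore_older: 0  # default 0, which disables it; accepts values such as 2h or 2m.
                 # ignore_older must be greater than close_inactive.
                 # Files that have not been updated within the given time span, or that
                 # have never been picked up by a harvester, are ignored
close_*          # the close_* options close the harvester after a certain criterion or time.
                 # Closing the harvester means closing the file handler. If the file is updated
                 # after the harvester is closed, it is picked up again after scan_frequency
                 # has elapsed. However, if the file is moved or deleted while the harvester is
                 # closed, Filebeat cannot pick it up again and any data the harvester has not
                 # read is lost
close_inactive   # when enabled, the file handle is closed if the file has not been read
                 # for the specified duration.
                 # The period is counted from when the last log line was read, not from the
                 # file's modification time.
                 # If a closed file changes, a new harvester is started after scan_frequency
                 # has elapsed.
                 # Set it to a value larger than the interval at which the log is written, and
                 # configure several prospectors (inputs) for log files with different update rates.
                 # An internal timestamp reflects when the log was last read; the countdown
                 # restarts every time the last line is read.
                 # Values are written as, for example, 2h or 5m
close_renamed    # when enabled, Filebeat closes the file handler when the file is renamed or moved
close_removed    # when enabled, Filebeat closes the file handler when the file is removed.
                 # clean_removed depends on this option: if close_removed is disabled,
                 # clean_removed must be disabled too
close_eof        # suited to files that are written only once; Filebeat closes the file
                 # as soon as the end of the file is reached
close_timeout    # when enabled, Filebeat gives every harvester a predefined lifetime; the
                 # harvester is closed when that time is up, whether or not the file is still
                 # being read.
                 # close_timeout must not be equal to ignore_older, otherwise the file is not
                 # read when it is updated.
                 # If the output never emits any event, the timeout does not take effect: at
                 # least one event must be sent before the harvester is closed.
                 # 0 disables it
clean_inactive   # removes the state of previously harvested files from the registry file.
                 # The value must be greater than ignore_older + scan_frequency, so that no
                 # state is removed while the file is still being harvested.
                 # This option helps reduce the size of the registry file, especially when
                 # a large number of new files are generated every day.
                 # It can also be used to avoid the Filebeat problem with inode reuse on Linux
clean_removed    # when enabled, Filebeat removes a file from the registry if the file can no
                 # longer be found on disk.
                 # If close_removed is disabled, clean_removed must be disabled too
scan_frequency   # how often the prospector (input) checks the configured paths for new files
                 # to harvest; default 10s
tail_files:      # if true, Filebeat starts reading new files at the end and sends each newly
                 # added line as an event, instead of resending everything from the beginning
symlinks:        # allows Filebeat to harvest symbolic links in addition to regular files. When
                 # harvesting a symlink, Filebeat opens and reads the original file even though
                 # it reports the path of the symlink
backoff:         # how aggressively Filebeat crawls open files for updates: the time Filebeat
                 # waits before checking a file again after reaching EOF; default 1s
max_backoff:     # the maximum time Filebeat waits before checking a file again after reaching EOF
backoff_factor:  # the factor by which the backoff wait time is multiplied each time; default 2
harvester_limit: # limits the number of harvesters one prospector (input) starts in parallel,
                 # which directly affects the number of open files
tags             # adds tags to the tags list, used for filtering, e.g. tags: ["json"]
fields           # optional fields that add extra information to the output; values can be
                 # scalars, arrays, dictionaries, or nested combinations. By default they are
                 # placed under a sub-dictionary, for example:
                 #   filebeat.inputs:
                 #   - type: log
                 #     fields:
                 #       app_id: query_engine_12
fields_under_root     # if true, the fields are stored at the top level of the output document
multiline.pattern     # the regexp pattern that has to be matched
multiline.negate      # whether the pattern is negated; default false.
                      # With the pattern '^b' and false (the default), lines that start with b
                      # are merged into the event; with true, lines that do not start with b
                      # are merged
multiline.match       # how Filebeat combines matching lines into an event: after or before,
                      # depending on the negate setting above
multiline.max_lines   # maximum number of lines combined into one event; lines beyond the
                      # limit are discarded; default 500
multiline.timeout     # if no match that starts a new event is found within this time, the
                      # log collected so far is sent anyway; default 5s
max_procs        # maximum number of CPUs that can execute simultaneously; defaults to the
                 # number of logical CPUs available in the system
name             # name of this Filebeat instance; defaults to the hostname of the machine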
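The interplay of multiline.pattern and multiline.negate is the least intuitive part of this list, so here is a minimal Python sketch of the grouping rule for match: after. It is a simplified model: multiline.max_lines and multiline.timeout are ignored, and group_multiline and demo_multiline are hypothetical names rather than Filebeat functions.

```python
import re
from typing import List


def group_multiline(lines: List[str], pattern: str, negate: bool = False) -> List[str]:
    """Merge continuation lines into the preceding event (match: after only)."""
    regex = re.compile(pattern)
    events: List[str] = []
    for line in lines:
        matched = bool(regex.search(line))
        # negate=False: matching lines are continuations
        # negate=True:  non-matching lines are continuations
        is_continuation = matched != negate
        if is_continuation and events:
            events[-1] += "\n" + line
        else:
            events.append(line)
    return events


def demo_multiline() -> List[str]:
    """Group a stack trace: lines not starting with "[" belong to the previous event."""
    lines = [
        "[2020-05-01] ERROR boom",
        "  at Foo.bar(Foo.java:1)",
        "  at Main.main(Main.java:2)",
        "[2020-05-01] INFO ok",
    ]
    return group_multiline(lines, r"^\[", negate=True)


if __name__ == "__main__":
    for event in demo_multiline():
        print(repr(event))
```

With pattern ^\[ and negate set to true, the two stack-trace lines are appended to the ERROR line and the INFO line starts a new event. With pattern ^b and the default negate of false, the lines that start with b are the ones appended to the line before them.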

Example 1: Logstash as the output

filebeat.yml configuration:

#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:  # several log paths are configured
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_ddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  ### Multiline options
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after
#================================ Outputs =====================================
#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts; with several hosts configured, load balancing can be used
  hosts: ["192.168.110.130:5044","192.168.110.131:5044","192.168.110.132:5044","192.168.110.133:5044"]
  loadbalance: true  # load balancing is enabled
  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"
  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

./filebeat -e  # start Filebeat

Logstash configuration

input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://192.168.110.130:9200"]  # several hosts can be configured here
    index => "query-%{+yyyyMMdd}"
  }
}
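A date pattern such as query-%{+yyyyMMdd} produces one index per day, based on the event timestamp in UTC. The following Python sketch shows the resulting name for a given event time; index_name and demo_index are hypothetical helpers written for this illustration, not part of Logstash.

```python
from datetime import datetime, timedelta, timezone


def index_name(event_time: datetime, prefix: str = "query-") -> str:
    """Build the daily index name that a "query-%{+yyyyMMdd}" pattern would yield."""
    return prefix + event_time.astimezone(timezone.utc).strftime("%Y%m%d")


def demo_index() -> str:
    # 07:30 on 21 May in UTC+8 is still 20 May in UTC.
    local = datetime(2020, 5, 21, 7, 30, tzinfo=timezone(timedelta(hours=8)))
    return index_name(local)


if __name__ == "__main__":
    print(demo_index())
```

The example time falls on 21 May locally but on 20 May in UTC, so the event lands in the index for 20 May. This is worth remembering when searching daily indices from a non-UTC time zone.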

Example 2: Elasticsearch as the output

filebeat.yml configuration:

###################### Filebeat Configuration Example #########################
# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html
# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.
#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/logs/es_aaa_index_search_slowlog.log
    - /var/logs/es_bbb_index_search_slowlog.log
    - /var/logs/es_ccc_index_search_slowlog.log
    - /var/logs/es_dddd_index_search_slowlog.log
    #- c:\programdata\elasticsearch\logs\*
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  ### Multiline options
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after
#============================= Filebeat modules ===============================
filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false
  # Period on which files under path should be checked for changes
  #reload.period: 10s
#==================== Elasticsearch template setting ==========================
#================================ General =====================================
# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
name: filebeat222
# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]
# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging
#cloud.auth:
#================================ Outputs =====================================
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","192.168.110.131:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"
  password: "${ES_PWD}"   # the password is set through the keystore

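./filebeat -e  # start Filebeat

Check the Elasticsearch cluster: there is an index with the default name filebeat-%{[agent.version]}-%{+yyyy.MM.dd} (the version field was called beat.version in Filebeat 6.x).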

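Filebeat modules

Official documentation: https://www.elastic.co/guide/...

Here I use the Elasticsearch module to parse the ES slow query logs. The steps are as follows, and other modules work the same way:

Prerequisite: Elasticsearch and Kibana are both installed before Filebeat is used.

The official site has the detailed steps: https://www.elastic.co/guide/...

Step 1, configure the filebeat.yml file: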

#============================== Kibana =====================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:
  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "192.168.110.130:5601"  # the Kibana instance
  username: "elastic"   # user
  password: "${ES_PWD}"  # password; the keystore is used here to avoid a plain-text password
  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:
#================================ Outputs =====================================
# Configure what output to use when sending the data collected by the beat.
#-------------------------- Elasticsearch output ------------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.110.130:9200","192.168.110.131:9200"]
  # Protocol - either `http` (default) or `https`.
  #protocol: "https"
  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  username: "elastic"  # the ES user
  password: "${ES_PWD}" # the ES password
  # index cannot be set here because I have not configured a template; an index named
  # filebeat-%{[agent.version]}-%{+yyyy.MM.dd} is created automatically

Step 2, configure the path of the Elasticsearch slow log:

cd filebeat-7.7.0-linux-x86_64/modules.d
vim elasticsearch.yml

Step 3, enable the ES module:

./filebeat modules enable elasticsearch

List the enabled modules:

./filebeat modules list

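Step 4, initialize the environment:

./filebeat setup -e

Step 5, start Filebeat:

./filebeat -e

Checking the Elasticsearch cluster shows that the slow query logs have all been parsed automatically.

At this point, the experiment with the Elasticsearch module is a success.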

Author: 一寸HUI
Original: https://www.cnblogs.com/zsql/...