
Logstash: In-Depth Study

Since my goal is an ELK stack architecture, the content below centers on elasticsearch + redis + logstash + kibana.

Studied by working through the guide 《logstash 最佳实践》 (Logstash Best Practice).

Pipeline configuration file structure

The pipeline configuration file is what drives a task. It consists of three sections: input, filter, and output.

input

This section specifies which files or sources to read from. In my ELK stack architecture there are only two input types: file and redis.

input {
    # redis
    redis {
        host => "127.0.0.1"
        port => 6379
        password => "123456"
        key => "logstash-queue"
        data_type => "list"
        db => 0
    }

    # 文件
    file {
        type => "nginx-access"
        path => "/usr/local/nginx/logs/access.log"
        start_position => "beginning"
        sincedb_path => "/var/log/logstash/sincedb/nginx"
        codec => multiline {
            pattern => "^\d+"
            negate => true
            what => "previous"
        }
    }
}

Note: the multiline codec on input.file — when a single log entry can span multiple lines, entries can be split apart on lines matching `^\d+`; the line breaks inside a merged entry are converted to `\n` in the resulting event.

filter

The most common matching mechanism is grok (regex-based matching).

The directory logstash-7.4.0/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns contains the predefined patterns. They are used like %{IPORHOST:client}.

If none of them fit, you can write your own regex. To verify that a pattern is correct, use the Grok Debugger under Kibana's Dev Tools,
or http://grokdebug.herokuapp.com/.
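As an illustration of using the predefined patterns (a sketch only — the log line shape and field names here are hypothetical, not from the configs below), a grok match for a line like "192.168.33.1 GET /index.html" could be:

filter {
    grok {
        # hypothetical: extract client IP, HTTP method, and path from the raw line
        match => { "message" => "%{IPORHOST:client} %{WORD:method} %{URIPATHPARAM:request}" }
    }
}

After a successful match the event gains client, method, and request fields that can be queried individually in Kibana.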

filter {
    if [type] == "nginx-access" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" }
        }
    } else if [type] == "nginx-error" {
        grok {
            match => ["message", "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: (?<clientip>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"]
        }
    }
}

Optimizations

  1. Feed in logs that are already in JSON format

Feeding in JSON directly saves the resources otherwise spent matching log content. However, not every piece of software can be configured to log JSON, which limits how useful this is.
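A minimal sketch of such an input, assuming nginx has been set up (e.g. via a custom log_format) to write one JSON object per line; the path and type name are hypothetical:

input {
    file {
        type => "nginx-access-json"
        path => "/usr/local/nginx/logs/access_json.log"
        # each JSON key becomes an event field directly, so no grok filter is needed
        codec => "json"
    }
}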

output

output {
    redis {
        host => "127.0.0.1"
        port => 6379
        password => "123456"
        key => "logstash-queue"
        data_type => "list"
        db => 4
    }
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
    }
}

Setting field types when importing into ES

ES supports full-text indexing, but its default analyzers target English, which does not meet our needs; we need the ik analyzer plugin for Chinese text.
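One way to apply such a mapping (a sketch — the template file path and template name are assumptions; the referenced JSON file would declare the ik analyzer for the relevant string fields) is to let the elasticsearch output install an index template:

output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "logstash-%{+YYYY.MM.dd}"
        # hypothetical template file declaring ik-analyzed field mappings
        template => "/etc/logstash/templates/logstash-ik.json"
        template_name => "logstash-ik"
        template_overwrite => true
    }
}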


FAQ

1. How to handle an entry that spans many lines

Use a multiline codec on the input to merge the lines. Take nginx's default-format error log as an example:

2019/09/23 10:39:01 [error] 4130#0: *1 FastCGI sent in stderr: "PHP message: PHP Warning:  require(/var/www/study/tp5-study/public/../thinkphp/base.php): failed to open stream: No such file or directory in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP   1. {main}() /var/www/study/tp5-study/public/index.php:0
PHP message: PHP Fatal error:  require(): Failed opening required '/var/www/study/tp5-study/public/../thinkphp/base.php' (include_path='.:') in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP   1. {main}() /var/www/study/tp5-study/public/index.php:0" while reading response header from upstream, client: 192.168.33.1, server: tp5.study.me, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "tp5.study.me", referrer: "http://tp5.study.me/"
2019/09/23 10:40:14 [error] 4130#0: *7 FastCGI sent in stderr: "PHP message: PHP Warning:  require(/var/www/study/tp5-study/public/../thinkphp/base.php): failed to open stream: No such file or directory in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP   1. {main}() /var/www/study/tp5-study/public/index.php:0
PHP message: PHP Fatal error:  require(): Failed opening required '/var/www/study/tp5-study/public/../thinkphp/base.php' (include_path='.:') in /var/www/study/tp5-study/public/index.php on line 16
PHP message: PHP Stack trace:
PHP message: PHP   1. {main}() /var/www/study/tp5-study/public/index.php:0" while reading response header from upstream, client: 192.168.33.1, server: tp5.study.me, request: "GET /favicon.ico HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "tp5.study.me"

As the sample shows, every log entry begins with a date, so we split entries on lines that start with digits:

input {
    stdin {
        codec => multiline {
            pattern => "^\d+"
            negate => true
            what => "previous"
        }
    }
}

2. Each event carries a message field by default, holding the raw, unparsed log line. Once the content has been extracted into separate fields, there is no need to keep the raw data.

filter {
    grok {
        match => ["message", "(?<timestamp>%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}[- ]%{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:message}(?:, client: (?<clientip>%{IP}|%{HOSTNAME}))(?:, server: %{IPORHOST:server}?)(?:, request: %{QS:request})?(?:, upstream: (?<upstream>\"%{URI}\"|%{QS}))?(?:, host: %{QS:request_host})?(?:, referrer: \"%{URI:referrer}\")?"]
        overwrite => ["message"]
    }
}

The overwrite option replaces the field with the captured value. Note that overwrite must sit inside filter.grok.
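If the raw line should be dropped entirely rather than overwritten, a mutate filter placed after the grok is an alternative (a sketch, assuming the grok match above has already extracted everything needed):

filter {
    grok {
        # ... the grok match from above ...
    }
    mutate {
        # drop the raw line once its content has been extracted into fields
        remove_field => ["message"]
    }
}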

3. Every captured event has an @timestamp field; I want an old record's own time written into it

Note: @timestamp is managed by Logstash itself, and modifying it directly is not recommended, so I use a separate timestamp field instead; the difference is that this one holds the time matched out of the log line.
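For reference, if you do want the parsed time written into @timestamp, Logstash's built-in date filter is the standard mechanism for this (a sketch, assuming the timestamp field captured above holds values like 2019/09/23 10:39:01):

filter {
    date {
        # parse the captured field; on success the result is written to @timestamp
        match => ["timestamp", "yyyy/MM/dd HH:mm:ss"]
    }
}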
