Collecting nginx logs with Logstash and filtering the text with grok

Introduction

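grok is a Logstash filter plugin that parses text log lines with regular expressions. It splits the message field into structured fields before the event is stored, which makes searching and aggregating in Kibana easier.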
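As a minimal sketch of the idea, a grok expression is built from %{SYNTAX:SEMANTIC} pairs, where SYNTAX names a predefined pattern and SEMANTIC names the field that receives the matched text. The field names client, method and request below are chosen only for illustration and are not part of the nginx setup described later.

```
filter {
    grok {
        # IP, WORD and URIPATHPARAM are patterns bundled with Logstash;
        # client, method and request are illustrative field names.
        match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}" }
    }
}
```

Given a line such as "192.168.10.1 GET /index.html", this filter would add the fields client, method and request to the event.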

nginx log format

.....

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    keepalive_timeout  65;

    #gzip  on;

    include /etc/nginx/conf.d/*.conf;
}

View the log output with cat:

[root@centos6 nginx]# cat /var/log/nginx/access.log

192.168.10.132 - - [08/Jul/2019:12:53:45 +0800] "GET /saudgsg/bujguj HTTP/1.0" 200 1201 "-" "ApacheBench/2.3" "-"
192.168.10.132 - - [08/Jul/2019:12:53:45 +0800] "GET /saudgsg/bujguj HTTP/1.0" 200 1201 "-" "ApacheBench/2.3" "-"
192.168.10.132 - - [08/Jul/2019:12:53:45 +0800] "GET /saudgsg/bujguj HTTP/1.0" 200 1201 "-" "ApacheBench/2.3" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "GET /indexfsd HTTP/1.1" 200 1201 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "GET /favicon.ico HTTP/1.1" 200 1320 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "GET /favicon.ico HTTP/1.1" 200 1320 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "POST /bs/base/searchIndexImage.htm?v=1&device=10 HTTP/1.1" 502 575 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "POST /bs/base/getArticleList.htm?v=1&device=10 HTTP/1.1" 502 575 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"
192.168.10.1 - - [08/Jul/2019:12:54:36 +0800] "POST /bs/thirdparthy/getShareUrl.htm?t=1562561715347 HTTP/1.1" 502 575 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0" "-"

Writing the text filter

Logstash ships with a set of predefined regular expression patterns that we can reuse. They live in the following directory:

/usr/local/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns

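The most basic definitions are in the grok-patterns file, but some of them do not fit the fields in our nginx log. In that case we define our own patterns and let grok load them through patterns_dir. The entries in grok-patterns also show the syntax to follow: each line is a pattern name followed by its regular expression.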

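Here I wrote a log filter pattern that fits this particular nginx server. Readers who are less familiar with regular expressions may want to review regular expression syntax first.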

[root@centos6 patterns]# vim nginx-access 

NGINXACCESS %{IP:clientip} - (%{USERNAME:user}|-) \[%{HTTPDATE:timestamp}\] \"%{WORD:request_verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:status:int} %{NUMBER:body_sent:int} \"-\" \"%{GREEDYDATA:agent}\" \"-\"
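The NGINXACCESS pattern above matches only lines whose referer and X-Forwarded-For values are the literal "-", which holds for the sample log shown earlier. A more tolerant variant could capture those two values as fields instead. The sketch below is one possible way to do that, not the author's configuration: the name NGINXACCESS_RELAXED and the field names referrer and x_forwarded_for are assumptions, and it relies on the grok filter's pattern_definitions option, which is available only in reasonably recent versions of the plugin.

```
filter {
    grok {
        # Inline definition, as an alternative to a file under patterns_dir.
        # NGINXACCESS_RELAXED, referrer and x_forwarded_for are illustrative names.
        pattern_definitions => {
            "NGINXACCESS_RELAXED" => '%{IP:clientip} - (%{USERNAME:user}|-) \[%{HTTPDATE:timestamp}\] "%{WORD:request_verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:status:int} %{NUMBER:body_sent:int} "%{DATA:referrer}" "%{DATA:agent}" "%{DATA:x_forwarded_for}"'
        }
        match => { "message" => "%{NGINXACCESS_RELAXED}" }
    }
}
```

DATA is used in place of GREEDYDATA for the quoted fields so that each capture stops at the next closing quote rather than running to the end of the line.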

Writing the Logstash configuration file

A Logstash pipeline has the basic shape input >> codec >> filter >> codec >> output, where the codecs convert the data encoding on the way in and on the way out.

[root@centos6 bin]# vim nginx_access.conf 
input {
    file {
        path => "/var/log/nginx/access.log"  # path of the log file to read
    }
}

filter {
    grok {
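        patterns_dir => "/usr/local/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns"  # directory containing the pattern files
        match => { "message" => "%{NGINXACCESS}" }    # parse each line with the custom NGINXACCESS pattern
        remove_field => "message"  # drop the raw line once it has been parsed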
    }
}

output {
    stdout {
        codec=>rubydebug   # print events to the screen for debugging
    }
}

Start log collection with this configuration, then open a browser and request pages from nginx to generate log entries:

[root@centos6 bin]# ./logstash -f nginx_access.conf

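In the output, the left side shows the field names defined in the custom pattern together with some fields that Logstash adds itself, and the right side shows the values extracted from each log line.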

 

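Once debugging confirms that the output is correct, the configuration file can be modified further to send the events to Elasticsearch.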
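The modified configuration for this step is not reproduced here, so the following is only a minimal sketch of what the output section might look like. The Elasticsearch address 127.0.0.1:9200 and the index name nginx-access-%{+YYYY.MM.dd} are assumptions and should be replaced with values that match the actual environment. The input and filter sections stay unchanged.

```
output {
    elasticsearch {
        hosts => ["127.0.0.1:9200"]              # assumed Elasticsearch address
        index => "nginx-access-%{+YYYY.MM.dd}"   # assumed index name, one index per day
    }
}
```

Keeping the stdout output alongside the elasticsearch block during the first runs makes it easier to confirm that events are still parsed as expected.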
