On a Logstash/ELK log-reading problem

I recently ran into a case like this.

A game-server process produces a large volume of logs, written as JSON lines and rotated hourly. When Logstash collects them it keeps running into parse failures, with errors like the following:

[2020-06-17T12:13:19,489][ERROR][logstash.codecs.json     ][main] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: incompatible json object type=java.lang.String , only hash map or arrays are supported>, :data=>"\"#distinct_id\":\"xxxxxxxxxx\",\"#type\":\"track\",\"#ip\":\"113.7.22.200\",\"#time\":\"2020-06-17 12:12:19\",\"#event_name\":\"xxxxxxxxx\",\"#account_no\":\"xxxxxxxx\",\"properties\":{\"role_uid\":xxxxxxxxx,\"current_server\":xxx,\"create_server\":xxxx,\"channel\":\"xxxxx\",\"role_name\":\"xx\",\"role_create_time\":\"2020-03-10 18:57:31\"}"}

It was only after reading the official documentation that I learned Logstash's file input has two ways of reading logs: one reads a file through once as a complete whole, the other keeps watching the end of the file for newly appended lines. The docs describe the two modes like this:

Tail mode

In this mode the plugin aims to track changing files and emit new content as it’s appended to each file. In this mode, files are seen as a never ending stream of content and EOF has no special significance. The plugin always assumes that there will be more content. When files are rotated, the smaller or zero size is detected, the current position is reset to zero and streaming continues. A delimiter must be seen before the accumulated characters can be emitted as a line.

Read mode

In this mode the plugin treats each file as if it is content complete, that is, a finite stream of lines and now EOF is significant. A last delimiter is not needed because EOF means that the accumulated characters can be emitted as a line. Further, EOF here means that the file can be closed and put in the "unwatched" state - this automatically frees up space in the active window. This mode also makes it possible to process compressed files as they are content complete. Read mode also allows for an action to take place after processing the file completely.

In the past attempts to simulate a Read mode while still assuming infinite streams was not ideal and a dedicated Read mode is an improvement.

What I was using was Tail mode, i.e. continuously reading whatever gets appended to the files.
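For reference, which of the two behaviours you get is controlled by the file input's mode setting; if you leave it out, as I did, you get Tail mode. A minimal sketch (the path is just a placeholder):

input {
  file {
    path => "/data/xxxx/*"
    mode => "tail"   # the default: follow files as they grow, EOF has no special meaning
    # mode => "read" # treat each file as content-complete and stop at EOF
  }
}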

Could the following be happening? The process is still writing a log entry, the JSON line is not yet complete, and Logstash already starts reading it. Logstash would then pick up a partial JSON line and parsing would fail. (The :data in the error above does in fact start at "#distinct_id" with no opening {, which is exactly what a mid-line read would look like.) What to do about it? The answer was to add one more setting: stat_interval, the interval at which the file input checks watched files for changes.

input {
  file {
    path => "/data/xxxx/*"
    stat_interval => 3800 # logs rotate hourly here, so wait 3800 s (just over an hour) before reading; adjust to your own rotation interval
    codec => json
    start_position => "beginning"
    sincedb_write_interval => 10
  }
}

So far this looks like a big improvement.

That said, since we are now waiting 3800 s, we are effectively only reading files that have already stopped being updated, so this could just as well be switched to Read mode.
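A rough sketch of what a Read-mode input could look like, adapted from the config above. This is an untested outline and the file_completed_* and sincedb paths are hypothetical; note that file_completed_action defaults to delete in Read mode, so set it explicitly if you want to keep the original files, and you still need to make sure the file for the current hour, which is still being written, stays out of the match (via the path glob or by keeping some delay in place):

input {
  file {
    path => "/data/xxxx/*"
    mode => "read"                      # each rotated file is treated as content-complete; EOF closes it
    codec => json
    file_completed_action => "log"      # keep the file, just record that it was processed
    file_completed_log_path => "/data/logstash/completed.log" # hypothetical path
    sincedb_path => "/data/logstash/sincedb"                  # hypothetical path; avoids re-reading after restarts
  }
}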

 
