Learning Fluentd: tail (Input Plugin)

http://docs.fluentd.org/articles/in_tail

tail Input Plugin

The in_tail input plugin allows Fluentd to read events from the tail of text files. Its behavior is similar to the tail -F command.

Example Configuration

in_tail is included in Fluentd's core. No additional installation process is required.

<source>
  type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  format apache2
</source>
Please see the Config File article for the basic structure and syntax of the configuration file.

How it Works

  • When Fluentd is first configured with in_tail, it starts reading from the tail of that log, not the beginning.
  • Once the log is rotated, Fluentd starts reading the new file from the beginning. It keeps track of the current inode number.
  • If td-agent restarts, it resumes reading from the last position it read before the restart. This position is recorded in the position file specified by the pos_file parameter. (This shows why pos_file is important; you need to have it.)
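The three rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not Fluentd's implementation: the function name read_new_lines and the "inode offset" layout of the position file are assumptions made for this example, and partial lines are not handled.

```python
import os
import tempfile


def read_new_lines(log_path, pos_path):
    """Return the lines appended since the last call (hypothetical helper)."""
    stat = os.stat(log_path)
    inode, offset = None, None
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            saved_inode, saved_offset = f.read().split()
        inode, offset = int(saved_inode), int(saved_offset)

    if inode is None:
        # First run: start from the tail of the log, not the beginning.
        offset = stat.st_size
    elif inode != stat.st_ino or offset > stat.st_size:
        # A different inode (or a shorter file) means the log was rotated:
        # read the new file from the beginning.
        offset = 0

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        offset = f.tell()

    # Record the inode and position so that a restart can resume from here.
    with open(pos_path, "w") as f:
        f.write(f"{stat.st_ino} {offset}")
    return data.decode().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "access.log")
        pos = os.path.join(tmp, "access.log.pos")
        with open(log, "w") as f:
            f.write("old line\n")
        print(read_new_lines(log, pos))   # existing content is skipped
        with open(log, "a") as f:
            f.write("new line\n")
        print(read_new_lines(log, pos))   # resumes from the recorded position
        os.rename(log, log + ".1")        # rotate the log
        with open(log, "w") as f:
            f.write("after rotation\n")
        print(read_new_lines(log, pos))   # new inode, so read from the start
```

Without the position file, the sketch would fall back to the "first run" branch after every restart and skip whatever was written while it was down.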

Parameters

type (required)

The value must be tail.

path (required)

The paths to read. Multiple paths can be specified, separated by ‘,’. (This means you can collect several log files at once without defining another source.)

tag (required)

The tag of the event.

format (required)

The format of the log. It is the name of a template or a regexp surrounded by ‘/’.

The regexp must have at least one named capture (?<NAME>PATTERN). If the regexp has a capture named ‘time’, it is used as the time of the event. You can specify the time format using the time_format parameter. If the regexp has a capture named ‘tag’, the tag parameter + the captured tag is used as the tag of the event.

The following templates are supported:

  • regexp

The format parameter can be set to a regexp directly. Fluentular is a great website for testing a regexp intended for a Fluentd configuration.

  • apache2

Reads Apache's log file for the following fields: host, user, time, method, path, code, size, referer and agent. This template is analogous to the following configuration:

format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
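To see which fields this regexp yields, here is a small Python sketch. It is an approximation for illustration only: Fluentd evaluates the expression as a Ruby regexp, Python spells named groups (?P<name>...) instead of (?<name>...), and the sample access-log line is made up.

```python
import re

# The apache2 expression from above, with Python's (?P<name>...) group syntax.
APACHE2 = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)

# Made-up sample line, not taken from a real server.
SAMPLE = (
    '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] '
    '"GET / HTTP/1.1" 200 777 "-" "Opera/12.0"'
)


def parse_apache2(line):
    """Return the named captures as a dict, or None if the line does not match."""
    match = APACHE2.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for key, value in parse_apache2(SAMPLE).items():
        print(f"{key}: {value}")
```

Note that every captured value is a string, including code and size.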
  • syslog

Reads syslog's output file (e.g. /var/log/syslog) for the following fields: time, host, ident, and message. This template is analogous to the following configuration:

format /^(?<time>[^ ]* [^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?[^\:]*\: *(?<message>.*)$/
time_format %b %d %H:%M:%S
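The same kind of check works for the syslog expression. Again this is a Python approximation of a Ruby regexp with a made-up sample line. It also shows that, besides the four fields listed above, the expression captures an optional pid.

```python
import re

# The syslog expression from above, with Python's (?P<name>...) group syntax.
SYSLOG = re.compile(
    r'^(?P<time>[^ ]* [^ ]* [^ ]*) (?P<host>[^ ]*) '
    r'(?P<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?P<pid>[0-9]+)\])?'
    r'[^\:]*\: *(?P<message>.*)$'
)

# Made-up sample line, not taken from a real system.
SAMPLE = "Feb 28 12:00:00 myhost sshd[1234]: Accepted password for user1"


def parse_syslog(line):
    """Return the named captures as a dict, or None if the line does not match."""
    match = SYSLOG.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_syslog(SAMPLE))
```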
  • tsv or csv

If you use the tsv or csv format, please also specify the keys parameter.

format tsv
keys key1, key2, key3
time_key key2

If you specify the time_key parameter, the value of that key is used as the timestamp of the record. By default, the time at which Fluentd reads the record is used.

format csv
keys key1, key2, key3
time_key key3
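The roles of keys and time_key can be illustrated with a short Python sketch. The helper parse_values is hypothetical and only mirrors the idea: the values of a line are paired with the configured keys in order, and the value under time_key is picked out as the raw timestamp. Whether Fluentd keeps that field in the record, and how it converts the value, is not covered here.

```python
import csv


def parse_values(line, keys, delimiter, time_key=None):
    """Pair the values of one line with the configured keys (hypothetical helper)."""
    names = [key.strip() for key in keys.split(",")]
    values = next(csv.reader([line], delimiter=delimiter))
    record = dict(zip(names, values))
    raw_time = record.get(time_key) if time_key else None
    return raw_time, record


if __name__ == "__main__":
    # Corresponds to: format tsv / keys key1, key2, key3 / time_key key2
    print(parse_values("a\t2013-03-29 07:21:55\tc", "key1, key2, key3", "\t", "key2"))
    # Corresponds to: format csv / keys key1, key2, key3 / time_key key3
    print(parse_values("a,b,2013-03-29 07:21:55", "key1, key2, key3", ",", "key3"))
```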
  • json

One JSON map per line. This is the most straightforward format :).

format json

The time_key parameter can also be specified.

format json
time_key key3

pos_file (highly recommended)

This parameter is highly recommended. Fluentd records the position it last read in this file.

pos_file /var/log/td-agent/tmp/access.log.pos

time_format

The format of the time field. This parameter is required only if the format includes a ‘time’ capture and it cannot be parsed automatically. Please see Time#strftime for additional information.
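As a rough illustration of what a time_format string does, the sketch below parses a captured time value with the format used by the apache2 template. It uses Python's datetime.strptime, whose directives coincide with Ruby's for the ones used here; that equivalence is an assumption of this example, and the sample value is made up.

```python
from datetime import datetime, timezone

# The time_format of the apache2 template.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Made-up value of a 'time' capture.
CAPTURED = "28/Feb/2013:12:00:00 +0900"


def parse_event_time(value, time_format=TIME_FORMAT):
    """Turn the captured string into a timezone-aware datetime."""
    return datetime.strptime(value, time_format)


if __name__ == "__main__":
    event_time = parse_event_time(CAPTURED)
    print(event_time.isoformat())
    print(event_time.astimezone(timezone.utc).isoformat())
```

If the string does not fit the format, strptime raises ValueError, which is the situation in which the time "cannot be parsed automatically" and an explicit format is needed.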

rotate_wait

in_tail actually does a bit more than tail -F itself. When a file is rotated, some data may still need to be written to the old file rather than the new one.

in_tail takes care of this by keeping a reference to the old file (even after it has been rotated) for some time before transitioning completely to the new file. This helps prevent data destined for the old file from getting lost. By default, this time interval is 5 seconds.

The rotate_wait parameter accepts a single integer: the length of this time interval in seconds.
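Why keeping a reference to the old file helps can be shown with a small simulation. This is a sketch under stated assumptions, not how in_tail is written: it assumes a POSIX filesystem (where an open file can be renamed), the function name tail_across_rotation is made up, and the wait window is represented simply by draining the old handle once more before switching, rather than by a real timer.

```python
import os
import tempfile


def tail_across_rotation(directory):
    """Simulate one rotation and return every line the reader saw, in order."""
    log = os.path.join(directory, "app.log")
    seen = []

    # The application writes through its own handle.
    writer = open(log, "a")
    writer.write("before rotation\n")
    writer.flush()

    # The reader opens the same file and catches up.
    old_reader = open(log)
    seen += old_reader.readlines()

    # Rotation: the file is renamed, but both open handles still refer to it.
    os.rename(log, log + ".1")
    writer.write("late line for the old file\n")
    writer.flush()
    writer.close()

    # The application reopens the path, which creates a fresh file.
    with open(log, "a") as new_writer:
        new_writer.write("after rotation\n")

    # During the wait window the reader keeps draining the old handle...
    seen += old_reader.readlines()
    old_reader.close()

    # ...and only then moves completely to the new file.
    with open(log) as new_reader:
        seen += new_reader.readlines()

    return [line.rstrip("\n") for line in seen]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for line in tail_across_rotation(tmp):
            print(line)
```

A reader that switched to the new file immediately after the rename would never see the late line.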
 
 
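About regular expressions: below is the regexp I use to match records from a .log file collected on a machine I set up myself.
It goes in the client-side Fluentd configuration file, fluent.conf.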
 
.log data (source data):
[2013-03-29 07:21:55.483292] router - pid=14615 tid=7a93 fid=5354  DEBUG -- Request body: {"host":"api.vcap.me","stats":[{"response_latency":0,"request_tags":"BAh7BjoOY29tcG9uZW50SSIUQ2xvdWRDb250cm9sbGVyBjoGRVQ=","response_codes":{"responses_2xx":2},"response_samples":2}]}
 
Matching regexp:
format /\[(?<time>.*)\] (?<name>[^ ]*) - (?<pid>[^ ]*) (?<tid>[^ ]*) (?<fid>[^ ]*)  (?<level>[^ ]*) -- (?<info>[^ ].*)$/
 time_format %Y-%m-%d %H:%M:%S
 
Query result from the MongoDB collection:
{ "_id" : ObjectId("516d31c415bb53374d000004"), "name" : "router", "pid" : "pid=14615", "tid" : "tid=7a93", "fid" : "fid=5354", "level" : "DEBUG", "info" : "Request body: {\"host\":\"api.vcap.me\",\"stats\":[{\"response_latency\":0,\"request_tags\":\"BAh7BjoOY29tcG9uZW50SSIUQ2xvdWRDb250cm9sbGVyBjoGRVQ=\",\"response_codes\":{\"responses_2xx\":2},\"response_samples\":2}]}", "time" : ISODate("2013-04-23T11:21:55Z") }
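The captures in the stored document can be reproduced outside Fluentd. The sketch below applies the same expression to the source line using Python's re module, with (?P<name>...) in place of Ruby's (?<name>...). It only checks the field extraction; time parsing and the MongoDB output are not modelled.

```python
import re

# The format expression from fluent.conf, with Python's named-group syntax.
PATTERN = re.compile(
    r'\[(?P<time>.*)\] (?P<name>[^ ]*) - (?P<pid>[^ ]*) (?P<tid>[^ ]*) '
    r'(?P<fid>[^ ]*)  (?P<level>[^ ]*) -- (?P<info>[^ ].*)$'
)

# The source line shown above.
LINE = (
    '[2013-03-29 07:21:55.483292] router - pid=14615 tid=7a93 fid=5354  DEBUG -- '
    'Request body: {"host":"api.vcap.me","stats":[{"response_latency":0,'
    '"request_tags":"BAh7BjoOY29tcG9uZW50SSIUQ2xvdWRDb250cm9sbGVyBjoGRVQ=",'
    '"response_codes":{"responses_2xx":2},"response_samples":2}]}'
)


def parse_line(line):
    """Return the named captures as a dict, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for key, value in parse_line(LINE).items():
        print(f"{key}: {value}")
```

As in the stored document, pid, tid and fid keep their "pid=", "tid=" and "fid=" prefixes because the captures take everything up to the next space.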
 


