Logstash 7.4 in Practice: Collecting, Parsing, and Transforming Data from Kafka, Beats, and MySQL into Elasticsearch

Elasticsearch is a distributed, scalable, real-time search and analytics engine. How to write massive volumes of data from many sources into Elasticsearch efficiently and reliably is an unavoidable problem.

Logstash Concepts and Principles

Logstash is an open-source, server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and sends it on to an Elasticsearch index, where the data can then be tokenized, searched, and analyzed, regardless of its format or complexity. It provides a rich library of filters: Grok can derive structure from unstructured data, geoip can decode geographic coordinates from IP addresses, and other filters can anonymize or exclude sensitive fields, simplifying overall processing.
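Every Logstash pipeline, including the examples later in this article, has the same three-stage shape: inputs, optional filters, and outputs. A minimal, purely illustrative sketch (stdin to stdout):

```conf
# Minimal Logstash pipeline: input -> (optional) filter -> output.
input {
  stdin { }                                     # read events from the console
}

filter {
  mutate { add_field => { "env" => "demo" } }   # illustrative transformation
}

output {
  stdout { codec => rubydebug }                 # print the structured event
}
```

Save it as a .conf file and run it with `bin/logstash -f <config file>`.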

Logstash Application Scenarios

1. Logstash runs directly on the client as the data collector, parsing, transforming, and storing the data (Logstash is fairly heavyweight and consumes considerable resources).

2. Beats collects data on the client, and Logstash then further collects, analyzes, and transforms the Beats output.

3. Logstash subscribes to Kafka messages, then parses and transforms the data.

Solutions:

1. Data source (e.g. MySQL data) -> Logstash -> output (Elasticsearch, file, Kafka, Redis, ...)

2. Data source -> Beats (e.g. Filebeat) -> Logstash -> output

3. Data source -> Beats -> Kafka (or Redis) -> Logstash -> output

4. Kafka (or Redis) -> Logstash -> output

Logstash: Kafka Message Subscription, Parsing, and Elasticsearch Storage

 
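A minimal sketch of this scenario, assuming the Kafka messages are JSON-encoded; the broker address, topic, consumer group, and index name are placeholders:

```conf
# Sketch: subscribe to a Kafka topic, decode JSON messages, write to ES.
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics            => ["app-logs"]
    group_id          => "logstash-consumer"
    codec             => "json"              # messages are JSON-encoded
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "kafka-logs-%{+YYYY.MM.dd}"     # one index per day
  }
}
```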

Logstash: Filebeat Data Collection, Cleansing, and Elasticsearch Storage

 
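A minimal sketch, assuming Filebeat ships Apache/nginx-style access logs to port 5044 (the Beats default); the grok pattern and index name are assumptions for illustration:

```conf
# Sketch: receive events from Filebeat, grok-parse them, index into ES.
input {
  beats {
    port => 5044                                        # Filebeat's default Logstash port
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }    # e.g. apache/nginx access logs
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]    # use the log's own time as @timestamp
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "filebeat-logs-%{+YYYY.MM.dd}"
  }
}
```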

Logstash: MySQL Data Collection, Parsing, and Elasticsearch Storage

 
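A minimal sketch using the jdbc input plugin; the connection string, credentials, driver path, table, and index name are placeholders, and the MySQL JDBC driver jar must be present on the Logstash host:

```conf
# Sketch: poll MySQL on a schedule and index new/updated rows into ES.
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user              => "user"
    jdbc_password          => "secret"
    jdbc_driver_library    => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class      => "com.mysql.cj.jdbc.Driver"
    schedule               => "* * * * *"   # run the query every minute
    statement              => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
    use_column_value       => true
    tracking_column        => "updated_at"  # remember where the last run stopped
    tracking_column_type   => "timestamp"
  }
}

output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "mysql-orders"
    document_id => "%{id}"                   # keep ES in sync with the primary key
  }
}
```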

The Logstash Filter Plugin Library

| Plugin | Description | Github repository |
| --- | --- | --- |
| aggregate | Aggregates information from several events originating with a single task | logstash-filter-aggregate |
| alter | Performs general alterations to fields that the mutate filter does not handle | logstash-filter-alter |
| bytes | Parses string representations of computer storage sizes, such as "123 MB" or "5.6gb", into their numeric value in bytes | logstash-filter-bytes |
| cidr | Checks IP addresses against a list of network blocks | logstash-filter-cidr |
| cipher | Applies or removes a cipher to an event | logstash-filter-cipher |
| clone | Duplicates events | logstash-filter-clone |
| csv | Parses comma-separated value data into individual fields | logstash-filter-csv |
| date | Parses dates from fields to use as the Logstash timestamp for an event | logstash-filter-date |
| de_dot | Computationally expensive filter that removes dots from a field name | logstash-filter-de_dot |
| dissect | Extracts unstructured event data into fields using delimiters | logstash-filter-dissect |
| dns | Performs a standard or reverse DNS lookup | logstash-filter-dns |
| drop | Drops all events | logstash-filter-drop |
| elapsed | Calculates the elapsed time between a pair of events | logstash-filter-elapsed |
| elasticsearch | Copies fields from previous log events in Elasticsearch to current events | logstash-filter-elasticsearch |
| environment | Stores environment variables as metadata sub-fields | logstash-filter-environment |
| extractnumbers | Extracts numbers from a string | logstash-filter-extractnumbers |
| fingerprint | Fingerprints fields by replacing values with a consistent hash | logstash-filter-fingerprint |
| geoip | Adds geographical information about an IP address | logstash-filter-geoip |
| grok | Parses unstructured event data into fields | logstash-filter-grok |
| http | Provides integration with external web services/REST APIs | logstash-filter-http |
| i18n | Removes special characters from a field | logstash-filter-i18n |
| java_uuid | Generates a UUID and adds it to each processed event | core plugin |
| jdbc_static | Enriches events with data pre-loaded from a remote database | logstash-filter-jdbc_static |
| jdbc_streaming | Enrich events with your database data | logstash-filter-jdbc_streaming |
| json | Parses JSON events | logstash-filter-json |
| json_encode | Serializes a field to JSON | logstash-filter-json_encode |
| kv | Parses key-value pairs | logstash-filter-kv |
| memcached | Provides integration with external data in Memcached | logstash-filter-memcached |
| metricize | Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric | logstash-filter-metricize |
| metrics | Aggregates metrics | logstash-filter-metrics |
| mutate | Performs mutations on fields | logstash-filter-mutate |
| prune | Prunes event data based on a list of fields to blacklist or whitelist | logstash-filter-prune |
| range | Checks that specified fields stay within given size or length limits | logstash-filter-range |
| ruby | Executes arbitrary Ruby code | logstash-filter-ruby |
| sleep | Sleeps for a specified time span | logstash-filter-sleep |
| split | Splits multi-line messages into distinct events | logstash-filter-split |
| syslog_pri | Parses the PRI (priority) field of a syslog message | logstash-filter-syslog_pri |
| threats_classifier | Enriches security logs with information about the attacker’s intent | logstash-filter-threats_classifier |
| throttle | Throttles the number of events | logstash-filter-throttle |
| tld | Replaces the contents of the default message field with whatever you specify in the configuration | logstash-filter-tld |
| translate | Replaces field contents based on a hash or YAML file | logstash-filter-translate |
| truncate | Truncates fields longer than a given length | logstash-filter-truncate |
| urldecode | Decodes URL-encoded fields | logstash-filter-urldecode |
| useragent | Parses user agent strings into fields | logstash-filter-useragent |
| uuid | Adds a UUID to events | logstash-filter-uuid |
| xml | Parses XML into fields | logstash-filter-xml |

grok can parse and structure arbitrary text using regular expressions, and it is currently the best way in Logstash to parse unstructured log data into a structured, queryable form. Beyond that, Logstash can rename, remove, replace, and modify event fields, and can also drop events entirely (debug events, for example). Many more advanced capabilities are available as well.
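For illustration, a filter block that structures a classic log line of the form "55.3.244.1 GET /index.html 15824 0.043"; the field names are choices made for this example:

```conf
filter {
  grok {
    # derive structure: client IP, HTTP method, request path, bytes, duration
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
  mutate {
    convert      => { "duration" => "float" }   # modify a field's type
    remove_field => ["bytes"]                   # drop a field entirely
  }
}
```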

Flume focuses on data transport, and its users must understand the entire data routing path. It is comparatively more reliable: its channels exist for persistence, and data is deleted only once its delivery to the next destination has been confirmed.

Logstash focuses on data pre-processing: log fields are pre-processed before being parsed.

 
