ElasticSearch is a distributed, scalable, real-time search and data-analytics engine, and writing massive amounts of data from many sources into ElasticSearch efficiently and reliably is an unavoidable problem.
Logstash Concepts and Principles
Logstash is an open-source server-side data processing pipeline that simultaneously ingests data from multiple sources, transforms it on the fly, and ships it to an ElasticSearch index, where the data can then be tokenized, searched, and analyzed, regardless of its format or complexity. It provides a rich library of filters: Grok, for example, can derive structure from unstructured data, geographic coordinates can be decoded from IP addresses, sensitive fields can be anonymized or excluded, and the overall processing flow is simplified.
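Every pipeline is declared in a single configuration file with an input stage, an optional filter stage, and an output stage. A minimal runnable sketch (the stdin/stdout plugins and the added field here are purely for demonstration):

```conf
# Minimal Logstash pipeline skeleton: input -> filter -> output.
input {
  stdin { }                                          # read events from the console
}

filter {
  mutate { add_field => { "source" => "stdin" } }    # example transformation
}

output {
  stdout { codec => rubydebug }                      # pretty-print events for inspection
}
```

Saved as a .conf file, it can be started with bin/logstash -f <file>.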
Logstash Application Scenarios
1. Logstash runs directly on the client as the data collector, parsing, transforming, and storing the data itself (Logstash is relatively heavyweight and consumes more resources).
2. Beats collect the client-side data, and Logstash further gathers, analyzes, and transforms the data coming from Beats.
3. Logstash subscribes to Kafka messages and parses and transforms the data.
Solutions:
1. Data source (e.g. MySQL data) → Logstash → output (to ElasticSearch, files, Kafka, Redis, ...)
2. Data source → Beats (e.g. Filebeat) → Logstash → output
3. Data source → Beats → Kafka (or Redis) → Logstash → output
4. Kafka (or Redis) → Logstash → output
Logstash: subscribing to Kafka messages, parsing them, and storing them in ElasticSearch
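A minimal pipeline sketch for this scenario; the broker address, topic name, consumer group, and index name below are illustrative assumptions, and the messages are assumed to be JSON-encoded:

```conf
input {
  kafka {
    bootstrap_servers => "localhost:9092"    # Kafka broker (assumed address)
    topics            => ["app-logs"]        # hypothetical topic name
    group_id          => "logstash-consumer" # hypothetical consumer group
    codec             => "json"              # decode JSON messages into event fields
  }
}

filter {
  date {
    match => ["timestamp", "ISO8601"]        # use the message's own timestamp field
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"       # write into daily indices
  }
}
```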
Logstash: collecting Filebeat data, cleansing it, and storing it in ElasticSearch
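A sketch of this pipeline, assuming Filebeat ships Apache/Nginx access logs to the conventional Beats port 5044 (the log format and index name are assumptions):

```conf
input {
  beats {
    port => 5044                              # Filebeat sends to this port
  }
}

filter {
  grok {
    # Parse standard combined access-log lines into named fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Make the log line's own time the event timestamp
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "weblogs-%{+YYYY.MM.dd}"
  }
}
```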
Logstash: collecting MySQL data, parsing it, and storing it in ElasticSearch
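A sketch using the jdbc input plugin for incremental sync; the database, table, credentials, and driver path are all illustrative assumptions:

```conf
input {
  jdbc {
    jdbc_driver_library    => "/path/to/mysql-connector-java.jar"  # adjust to your driver jar
    jdbc_driver_class      => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"   # hypothetical database
    jdbc_user              => "root"
    jdbc_password          => "secret"
    schedule               => "* * * * *"                          # poll once a minute (cron syntax)
    statement              => "SELECT * FROM orders WHERE id > :sql_last_value"
    use_column_value       => true
    tracking_column        => "id"                                 # remember the last id fetched
  }
}

output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "orders"
    document_id => "%{id}"   # reuse the primary key so re-runs stay idempotent
  }
}
```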
Logstash's Filter Plugin Library
| Plugin | Description |
| --- | --- |
| aggregate | Aggregates information from several events originating with a single task |
| alter | Performs general alterations to fields that the mutate filter does not handle |
| bytes | Parses string representations of computer storage sizes, such as "123 MB" or "5.6gb", into their numeric value in bytes |
| cidr | Checks IP addresses against a list of network blocks |
| cipher | Applies or removes a cipher to an event |
| clone | Duplicates events |
| csv | Parses comma-separated value data into individual fields |
| date | Parses dates from fields to use as the Logstash timestamp for an event |
| de_dot | Computationally expensive filter that removes dots from a field name |
| dissect | Extracts unstructured event data into fields using delimiters |
| dns | Performs a standard or reverse DNS lookup |
| drop | Drops all events |
| elapsed | Calculates the elapsed time between a pair of events |
| elasticsearch | Copies fields from previous log events in Elasticsearch to current events |
| environment | Stores environment variables as metadata sub-fields |
| extractnumbers | Extracts numbers from a string |
| fingerprint | Fingerprints fields by replacing values with a consistent hash |
| geoip | Adds geographical information about an IP address |
| grok | Parses unstructured event data into fields |
| http | Provides integration with external web services/REST APIs |
| i18n | Removes special characters from a field |
| java_uuid | Generates a UUID and adds it to each processed event |
| jdbc_static | Enriches events with data pre-loaded from a remote database |
| jdbc_streaming | Enriches events with your database data |
| json | Parses JSON events |
| json_encode | Serializes a field to JSON |
| kv | Parses key-value pairs |
| memcached | Provides integration with external data in Memcached |
| metricize | Takes complex events containing a number of metrics and splits these up into multiple events, each holding a single metric |
| metrics | Aggregates metrics |
| mutate | Performs mutations on fields |
| prune | Prunes event data based on a list of fields to blacklist or whitelist |
| range | Checks that specified fields stay within given size or length limits |
| ruby | Executes arbitrary Ruby code |
| sleep | Sleeps for a specified time span |
| split | Splits multi-line messages into distinct events |
| syslog_pri | Parses the PRI (priority) field of a syslog message |
| threats_classifier | Enriches security logs with information about the attacker's intent |
| throttle | Throttles the number of events |
| tld | Replaces the contents of the default message field with whatever you specify in the configuration |
| translate | Replaces field contents based on a hash or YAML file |
| truncate | Truncates fields longer than a given length |
| urldecode | Decodes URL-encoded fields |
| useragent | Parses user agent strings into fields |
| uuid | Adds a UUID to events |
| xml | Parses XML into fields |
grok can parse and structure arbitrary text with regular expressions; it is currently the best way in Logstash to turn unstructured log data into a structured, queryable form. Beyond that, Logstash can rename, remove, replace, and modify event fields, and it can of course drop events entirely, such as debug events. Many more advanced features are available as well.
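A hedged filter sketch combining these pieces (the log format, field names, and patterns are assumptions, not a prescribed layout):

```conf
filter {
  grok {
    # Derive structured fields from a free-form line such as
    # "2024-01-15 12:00:00 DEBUG user=alice msg=login ok"
    match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:detail}" }
  }
  mutate {
    rename       => { "detail" => "payload" }  # rename an event field
    remove_field => ["ts"]                     # remove a field after use
  }
  if [level] == "DEBUG" {
    drop { }   # discard debug events entirely, as described above
  }
}
```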
Flume focuses on data transport, and its users must understand the entire data route very clearly; it is comparatively more reliable, since its channels exist for persistence and data is deleted only once its delivery to the next destination has been confirmed.
Logstash focuses on data preprocessing: log fields are preprocessed before they are parsed.