1. Ways to sync data to Elasticsearch
There are currently several common ways to move data from Kafka into Elasticsearch:
1) Logstash
2) Flume
3) Spark Streaming
4) Kafka Connect
5) A custom program that consumes from Kafka and writes to Elasticsearch
This article describes how to use Logstash to write data from Kafka into Elasticsearch. Installing Kafka, Logstash, and Elasticsearch is not covered here.
A Logstash pipeline consists of three stages:
input: the source, i.e. where data is collected from
filter: where Logstash performs ETL on the data
output: the sink, i.e. where data is written to
Note: the input stage requires the logstash-input-kafka plugin, which ships with Logstash by default.
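The three-stage flow can be sketched in Python (a minimal illustration of the input → filter → output model only, not Logstash itself; all function and field names here are hypothetical):

```python
import json

def input_stage(raw_messages):
    # "input": parse each raw Kafka message (JSON) into an event dict
    for raw in raw_messages:
        yield json.loads(raw)

def filter_stage(events):
    # "filter": enrich each event, e.g. tag it with a target index name
    for event in events:
        event["index"] = "kafka-%s" % event["topic"]
        yield event

def output_stage(events):
    # "output": deliver each event to its sink (here, just collect them)
    return list(events)

raw = ['{"topic": "test01", "PlateNo": "ABC123"}']
result = output_stage(filter_stage(input_stage(raw)))
print(result[0]["index"])  # kafka-test01
```

Each stage consumes the previous stage's events, which is why the filter below can rely on metadata the input attached to every event.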
2. Logstash configuration
1) input
input {
  kafka {
    bootstrap_servers => ["172.20.34.22:9092"]  # broker address
    client_id => "test"                         # client id
    group_id => "logstash-es"                   # consumer group id
    auto_offset_reset => "latest"               # where to start when there is no committed offset
    consumer_threads => 1                       # consumer threads, no more than the partition count
    decorate_events => "true"                   # adds topic, offset, group, partition metadata to each event, so one Logstash subscribed to several topics can route each to its own ES index
    topics => ["test01","test02"]               # topics to consume
    type => "kafka-to-elastic"                  # type, used to select the right output/index
    codec => "json"                             # parse messages as JSON; without this, the whole message is stored as one string in the message field
  }
}
Notes:
decorate_events: adds the current topic, offset, group, and partition to the event, which makes it possible to subscribe to multiple topics and route each one to its own ES index.
codec => "json": parses the message body as JSON; without this parameter, the whole message is stored as a single string in the message field.
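The effect of codec => "json" can be illustrated in Python (a sketch; the sample payload values are made up, and the field names are borrowed from the example data later in this article):

```python
import json

raw = '{"PlateNo": "ABC123", "PassTime": "20200508085539"}'

# With codec => "json": the payload is parsed into top-level event fields,
# so each key becomes a queryable field in Elasticsearch
with_codec = json.loads(raw)

# Without the codec: the whole payload lands in a single "message" string
# field, and the JSON structure is not searchable field-by-field
without_codec = {"message": raw}

print(with_codec["PlateNo"])     # ABC123
print(without_codec["message"])  # the raw JSON string, unparsed
```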
2) filter
filter {
  # Build a [@metadata][index] value for each topic
  if [@metadata][kafka][topic] == "test01" {
    mutate {
      add_field => {"[@metadata][index]" => "kafka-test01-%{+YYYY.MM.dd}"}
    }
  }
  if [@metadata][kafka][topic] == "test02" {
    mutate {
      add_field => {"[@metadata][index]" => "kafka-test02-%{+YYYY.MM.dd}"}
    }
  }
  # Remove redundant fields
  mutate {
    remove_field => ["kafka"]
  }
}
Notes: apply whatever ETL processing your business requires here. In this example, I build a [@metadata][index] value for each topic and reference it in the output section below.
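The topic-to-index routing performed by the filter can be sketched in Python (a simplified stand-in for the mutate logic above; the date formatting mirrors Logstash's %{+YYYY.MM.dd} pattern):

```python
from datetime import date

def target_index(topic, day):
    # Mirror the filter: map each known topic to a dated index name
    routes = {"test01": "kafka-test01", "test02": "kafka-test02"}
    if topic not in routes:
        return None  # events from unrouted topics get no [@metadata][index]
    return "%s-%s" % (routes[topic], day.strftime("%Y.%m.%d"))

print(target_index("test01", date(2020, 5, 8)))  # kafka-test01-2020.05.08
print(target_index("other", date(2020, 5, 8)))   # None
```

Because the index name is kept under [@metadata], it is available to the output stage for routing but is not itself written into the stored document.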
3) output
output {
  # stdout {
  #   codec => rubydebug
  # }
  if [type] == "kafka-to-elastic" {
    elasticsearch {
      hosts => ["172.20.32.241:9200"]
      index => "%{[@metadata][index]}"
      timeout => 300
    }
  }
}
3. Example
Scenario:
Consume JSON-formatted data from multiple Kafka topics, collect it with Logstash, and write each topic to a different Elasticsearch index.
Run the test (here I enabled the stdout output for easier on-screen debugging):
The index for topic test01 is kafka-test01-2020.05.08.
Sample data:
{
"_index": "kafka-test01-2020.05.08",
"_type": "doc",
"_id": "knUr9nEBZ5SvKbknPKgD",
"_version": 1,
"_score": 1,
"_source": {
"SrcDataId": "8E3B0F63-D5AE-4C2C-AA50-DDF8F39951BE",
"@timestamp": "2020-05-08T21:22:39.903Z",
"SrcDataTime": "20200508085539",
"VendorID": "hikvision",
"type": "kafka-to-elastic",
"DeviceModelID": "d1eddcbb86d84164b28f244efe155751",
"DataSource": "ICESDK",
"Data": {
"PassTime": "20200508085539",
"Direction": "0",
"MotorVehicleID": "8E3B0F63-D5AE-4C2C-AA50-DDF8F39951BE",
"PlateFileFormat": "jpeg",
"VehicleColor": "-1",
"VehicleClass": "1",
"MarkTime": "20200508085539",
"AppearTime": "20200508085539",
"PlateDeviceID": "11011835011321002022",
"MotorVehicleStoragePath": "http://192.168.1.171:17999/ICESDKPic/20200508/8/Plate_1588899348077133.jpeg",
"PlateEventSort": "16",
"DeviceID": "",
"TollgateID": "",
"PlateStoragePath": "http://192.168.1.171:17999/ICESDKPic/20200508/8/Plate_1588899348077133.jpeg",
"SourceID": "12",
"PlateNo": "遼LJY888",
"PlateShotTime": "20200508085539"
},
"DeviceId": "",
"ServerID": "0C68F9AA79A14D95A644598D1D7D1623",
"ProcessID": "3447a347ebbe433a86cfbfd0a5c5be68",
"DataType": "MotorVehicle",
"@version": "1"
}
}
The index for topic test02 is kafka-test02-2020.05.08.
Sample data:
{
"_index": "kafka-test02-2020.05.08",
"_type": "doc",
"_id": "lnVD9nEBZ5SvKbknzah0",
"_version": 1,
"_score": 1,
"_source": {
"PlateDeviceID": "11011835011321002022",
"DeviceID": "",
"MotorVehicleID": "8E3B0F63-D5AE-4C2C-AA50-DDF8F39951BE",
"VehicleClass": "1",
"type": "kafka-to-elastic",
"PlateNo": "遼LJY888",
"PassTime": "20200508085539",
"AppearTime": "20200508085539",
"PlateEventSort": "16",
"PlateShotTime": "20200508085539",
"Direction": "0",
"MotorVehicleStoragePath": "http://192.168.1.171:17999/ICESDKPic/20200508/8/Plate_1588899348077133.jpeg",
"PlateStoragePath": "http://192.168.1.171:17999/ICESDKPic/20200508/8/Plate_1588899348077133.jpeg",
"@timestamp": "2020-05-08T21:49:30.588Z",
"PlateFileFormat": "jpeg",
"TollgateID": "",
"VehicleColor": "-1",
"SourceID": "12",
"MarkTime": "20200508085539",
"@version": "1"
}
}
Finally, we can view the indexed data in the head plugin.