Syncing Kafka Data to Elasticsearch with Logstash

1. Ways to Sync Data to Elasticsearch

There are currently several common ways to move data from Kafka into Elasticsearch:

1) Logstash

2) Flume

3) Spark Streaming

4) Kafka Connect

5) A custom program that consumes from Kafka and writes to Elasticsearch

This article shows how to use Logstash to write data from Kafka into Elasticsearch; the installation of Kafka, Logstash, and Elasticsearch is not covered here.

A Logstash pipeline consists of three stages:

input: the source, i.e. where data is collected from

filter: filtering; this is where Logstash does its ETL on the data.

output: the sink, i.e. where the data is written.

Note: the input stage requires the logstash-input-kafka plugin, which ships with Logstash by default.
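These three stages map one-to-one onto the sections of a Logstash pipeline configuration file. A minimal runnable sketch of that structure, using the stdin and stdout plugins as stand-ins for the Kafka and Elasticsearch plugins configured in the next section:

input {
    stdin { }                        #read events typed on the console
}

filter {
    #ETL on events goes here (parse, enrich, rename or drop fields)
}

output {
    stdout { codec => rubydebug }    #print each event in a readable form
}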

2. Logstash Configuration

1) Input

input {
    kafka{
        bootstrap_servers => ["172.20.34.22:9092"] #Kafka broker address
        client_id => "test"              #client id
        group_id => "logstash-es"        #consumer group id
        auto_offset_reset => "latest"    #where to start consuming when there is no committed offset
        consumer_threads => 1            #number of consumer threads; should not exceed the number of partitions
        decorate_events => "true"        #adds the topic, offset, group, partition etc. of each message to the event metadata, so a single Logstash instance can subscribe to multiple topics and create a separate index per topic in ES
        topics => ["test01","test02"]    #topics to consume
        type => "kafka-to-elastic"       #type, used to route events to different indices in the output
        codec => "json"                  #parse messages as JSON; otherwise the whole payload is stored as a string in the message field
     }
}

Notes:

decorate_events: adds the topic, offset, group, partition and other metadata of the current message to the event, which makes it possible to subscribe to multiple topics and create a separate index per topic in ES.

codec => "json": parses each message as JSON; without this setting the whole payload is stored as a single string in the message field.
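With decorate_events enabled, the metadata is available under [@metadata][kafka] in later pipeline stages. As an illustration only (the kafka_* field names below are examples, not part of the configuration used in this article), it can be copied into visible fields for debugging:

filter {
    mutate {
        add_field => {
            "kafka_topic"     => "%{[@metadata][kafka][topic]}"      #topic the message was read from
            "kafka_partition" => "%{[@metadata][kafka][partition]}"  #partition within that topic
            "kafka_offset"    => "%{[@metadata][kafka][offset]}"     #offset of the message
        }
    }
}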

2) Filter

filter{
   #build a per-topic [@metadata][index] field
   if [@metadata][kafka][topic] == "test01" {
      mutate {
         add_field => {"[@metadata][index]" => "kafka-test01-%{+YYYY.MM.dd}"}	
      }
   } 
   
   if [@metadata][kafka][topic] == "test02" {
      mutate {
         add_field => {"[@metadata][index]" => "kafka-test02-%{+YYYY.MM.dd}"}
      }
   }
   
   #remove fields we do not need
   mutate {
      remove_field => ["kafka"]
   }
}

Notes: do whatever ETL processing the business requires here.

Here I build a per-topic [@metadata][index] field and reference it in the output section below.
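One thing worth noting (an addition of mine, not part of the original configuration): events from a topic that matches neither condition keep the literal %{[@metadata][index]} placeholder as their index name, because the sprintf reference cannot be resolved. A catch-all such as the following, with the kafka-other prefix chosen only as an example, can route them to a fallback index:

filter {
    #catch-all for topics that matched none of the conditions above
    if ![@metadata][index] {
        mutate {
            add_field => { "[@metadata][index]" => "kafka-other-%{+YYYY.MM.dd}" }
        }
    }
}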

3) Output

output {
	#stdout {  
	#	codec => rubydebug
	#}

	if [type] == "kafka-to-elastic" {
		elasticsearch {
			hosts => ["172.20.32.241:9200"]
			index => "%{[@metadata][index]}"
			timeout => 300
		}
	}
}
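Because decorate_events already exposes the topic name, an alternative (not the configuration used in this article) is to drop the per-topic filter blocks and build the index name directly in the output:

output {
    if [type] == "kafka-to-elastic" {
        elasticsearch {
            hosts => ["172.20.32.241:9200"]
            index => "kafka-%{[@metadata][kafka][topic]}-%{+YYYY.MM.dd}"   #topic name taken from the decorated Kafka metadata
        }
    }
}

The result is the same index-per-topic layout with less configuration; the filter approach is more flexible when index names do not simply mirror topic names.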

3. Example

Scenario:

Consume JSON data from multiple Kafka topics with Logstash and write each topic to a different index in Elasticsearch.

Run the test. Here I also enabled the stdout output to print events to the screen, to make testing easier.
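By default rubydebug does not print @metadata, so the decorated Kafka fields and the computed index name stay invisible on the console. If you want to see them while testing, the codec can be told to include metadata; a small sketch:

output {
    stdout {
        codec => rubydebug { metadata => true }   #also print the @metadata fields, hidden by default
    }
}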

Topic test01 maps to the index kafka-test01-2020.05.08

Sample document:

{
	"_index": "kafka-test01-2020.05.08",
	"_type": "doc",
	"_id": "knUr9nEBZ5SvKbknPKgD",
	"_version": 1,
	"_score": 1,
	"_source": {
		"SrcDataId": "8E3B0F63-D5AE-4C2C-AA50-DDF8F39951BE",
		"@timestamp": "2020-05-08T21:22:39.903Z",
		"SrcDataTime": "20200508085539",
		"VendorID": "hikvision",
		"type": "kafka-to-elastic",
		"DeviceModelID": "d1eddcbb86d84164b28f244efe155751",
		"DataSource": "ICESDK",
		"Data": {
			"PassTime": "20200508085539",
			"Direction": "0",
			"MotorVehicleID": "8E3B0F63-D5AE-4C2C-AA50-DDF8F39951BE",
			"PlateFileFormat": "jpeg",
			"VehicleColor": "-1",
			"VehicleClass": "1",
			"MarkTime": "20200508085539",
			"AppearTime": "20200508085539",
			"PlateDeviceID": "11011835011321002022",
			"MotorVehicleStoragePath": "http://192.168.1.171:17999/ICESDKPic/20200508/8/Plate_1588899348077133.jpeg",
			"PlateEventSort": "16",
			"DeviceID": "",
			"TollgateID": "",
			"PlateStoragePath": "http://192.168.1.171:17999/ICESDKPic/20200508/8/Plate_1588899348077133.jpeg",
			"SourceID": "12",
			"PlateNo": "遼LJY888",
			"PlateShotTime": "20200508085539"
		},
		"DeviceId": "",
		"ServerID": "0C68F9AA79A14D95A644598D1D7D1623",
		"ProcessID": "3447a347ebbe433a86cfbfd0a5c5be68",
		"DataType": "MotorVehicle",
		"@version": "1"
	}
}

Topic test02 maps to the index kafka-test02-2020.05.08

Sample document:

{
	"_index": "kafka-test02-2020.05.08",
	"_type": "doc",
	"_id": "lnVD9nEBZ5SvKbknzah0",
	"_version": 1,
	"_score": 1,
	"_source": {
		"PlateDeviceID": "11011835011321002022",
		"DeviceID": "",
		"MotorVehicleID": "8E3B0F63-D5AE-4C2C-AA50-DDF8F39951BE",
		"VehicleClass": "1",
		"type": "kafka-to-elastic",
		"PlateNo": "遼LJY888",
		"PassTime": "20200508085539",
		"AppearTime": "20200508085539",
		"PlateEventSort": "16",
		"PlateShotTime": "20200508085539",
		"Direction": "0",
		"MotorVehicleStoragePath": "http:\192.168.1.171:17999\ICESDKPic\20200508//8/Plate_1588899348077133.jpeg",
		"PlateStoragePath": "http:\192.168.1.171:17999\ICESDKPic\20200508//8/Plate_1588899348077133.jpeg",
		"@timestamp": "2020-05-08T21:49:30.588Z",
		"PlateFileFormat": "jpeg",
		"TollgateID": "",
		"VehicleColor": "-1",
		"SourceID": "12",
		"MarkTime": "20200508085539",
		"@version": "1"
	}
}

Finally, we can view the indexed data in the head plugin (elasticsearch-head).

 
