Flume: Chaining Multiple Agents, Single Source with Multiple Channels/Sinks, and Single Source to HDFS and Kafka

1. Chaining Two Agents

When agents are chained, data is passed between them through an Avro Sink on the upstream agent and an Avro Source on the downstream agent.
Example:
Agent structure: source -> channel -> sink -> source -> channel -> sink
Component choices: exec -> memory -> avro -> avro -> memory -> logger
This test runs on a single virtual machine; if the agents run on two or more separate machines, pay attention to the bind address (a sketch for a two-machine setup is shown below).
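For a two-machine layout, the upstream Avro Sink would point at the downstream host and the downstream Avro Source would bind to an externally reachable address. A minimal sketch, assuming a hypothetical downstream hostname hadoop002:

# upstream agent (exec-avro-agent.conf), hypothetical two-machine setup
exec-avro-agent.sinks.avro-sink.hostname = hadoop002
exec-avro-agent.sinks.avro-sink.port = 44444

# downstream agent (avro-logger-agent.conf)
avro-logger-agent.sources.avro-source.bind = 0.0.0.0
avro-logger-agent.sources.avro-source.port = 44444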

exec-avro-agent.conf

### exec-avro-agent.conf ###
exec-avro-agent.sources = exec-source
exec-avro-agent.channels = memory-channel
exec-avro-agent.sinks = avro-sink

exec-avro-agent.sources.exec-source.type = exec
exec-avro-agent.sources.exec-source.command = tail -F /home/hadoop/data/flume/multiple/chuanlian/input/avro_access.data

exec-avro-agent.channels.memory-channel.type = memory

exec-avro-agent.sinks.avro-sink.type = avro
exec-avro-agent.sinks.avro-sink.hostname = localhost
exec-avro-agent.sinks.avro-sink.port = 44444

exec-avro-agent.sources.exec-source.channels = memory-channel
exec-avro-agent.sinks.avro-sink.channel = memory-channel

avro-logger-agent.conf

### avro-logger-agent.conf ###
avro-logger-agent.sources = avro-source
avro-logger-agent.channels = memory-channel
avro-logger-agent.sinks = logger-sink

avro-logger-agent.sources.avro-source.type = avro
avro-logger-agent.sources.avro-source.bind = localhost
avro-logger-agent.sources.avro-source.port = 44444

avro-logger-agent.channels.memory-channel.type = memory

avro-logger-agent.sinks.logger-sink.type = logger

avro-logger-agent.sources.avro-source.channels = memory-channel
avro-logger-agent.sinks.logger-sink.channel = memory-channel

Start the avro-logger agent first

### start the avro-logger agent first ###
flume-ng agent \
--name avro-logger-agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro-logger-agent.conf \
-Dflume.root.logger=INFO,console

Clone the session (open another terminal) and start the exec-avro agent

### then start the exec-avro agent ###
flume-ng agent \
--name exec-avro-agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec-avro-agent.conf \
-Dflume.root.logger=INFO,console

Test

[hadoop@vm01 input]$ mkdir -p multiple/chuanlian/input/
[hadoop@vm01 input]$ vi avro_access.data
[hadoop@vm01 input]$ echo "hello hadoop" >>avro_access.data 


2. Single Source with Multiple Channels/Sinks

Multiplexing the flow:

  • Multiplexing Channel Selector: routes events to specific channels according to selector rules you define; for example, events from the same log can be sunk to different HDFS directories depending on the business line (see the sketch after this list).
  • Replicating Channel Selector: replicates every event to all channels, so each channel carries the same data; for example, the same data can be sent to HDFS for batch processing and to Kafka for stream processing at the same time.

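The case below uses the replicating selector. As a minimal sketch of the multiplexing variant, assuming the agent a1 and source r1 from the configuration below, and a hypothetical event header named "type" (set by an interceptor or by the producer):

# hypothetical multiplexing selector: route by the value of the "type" header
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.order = c1
a1.sources.r1.selector.mapping.click = c2
a1.sources.r1.selector.default = c1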
Replicating Channel Selector
One source's data is sent in two copies: one to HDFS, the other to the console.

NetCat Source --> memory --> HDFS
              --> memory --> logger

Configuration file
replicating-channel-agent.conf

a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sources.r1.selector.type = replicating

a1.channels.c1.type = memory
a1.channels.c2.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://vm01:9000/flume/multipleFlow/%Y%m%d%H%M
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.filePrefix = wsktest-
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 100000000
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text

a1.sinks.k2.type = logger

a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
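
The memory channels above rely on Flume's default sizing. Optionally, their capacity can be set explicitly; the values below are illustrative, not taken from the original setup:

# optional memory channel sizing (illustrative values)
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000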

Start the agent

flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/replicating-channel-agent.conf \
-Dflume.root.logger=INFO,console

Test
Open another terminal and telnet in:

[hadoop@vm01 conf]$ telnet localhost 44444
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello hadoop
OK

Check the result on HDFS

[hadoop@vm01 input]$ hdfs dfs -text /flume/multipleFlow/201908081917/*
hello hadoop
[hadoop@vm01 input]$ 

3. Single Source to HDFS and Kafka

Configuration

Taildir-HdfsAndKafka-Agnet.sources = taildir-source
Taildir-HdfsAndKafka-Agnet.channels = c1 c2
Taildir-HdfsAndKafka-Agnet.sinks = hdfs-sink kafka-sink

Taildir-HdfsAndKafka-Agnet.sources.taildir-source.type = TAILDIR
Taildir-HdfsAndKafka-Agnet.sources.taildir-source.filegroups = f1
Taildir-HdfsAndKafka-Agnet.sources.taildir-source.filegroups.f1 = /home/hadoop/data/flume/HdfsAndKafka/input/.*
Taildir-HdfsAndKafka-Agnet.sources.taildir-source.positionFile = /home/hadoop/data/flume/HdfsAndKafka/taildir_position/taildir_position.json
Taildir-HdfsAndKafka-Agnet.sources.taildir-source.selector.type = replicating

Taildir-HdfsAndKafka-Agnet.channels.c1.type = memory
Taildir-HdfsAndKafka-Agnet.channels.c2.type = memory

Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.type = hdfs
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.path = hdfs://hadoop001:9000/flume/HdfsAndKafka/%Y%m%d%H%M
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.useLocalTimeStamp=true
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.filePrefix = wsktest-
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollInterval = 10
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollSize = 100000000
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.rollCount = 0
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.fileType=DataStream
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.hdfs.writeFormat=Text

Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.brokerList = localhost:9092
Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.topic = wsk_test


Taildir-HdfsAndKafka-Agnet.sources.taildir-source.channels = c1 c2
Taildir-HdfsAndKafka-Agnet.sinks.hdfs-sink.channel = c1
Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.channel = c2
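
Note that brokerList and topic are the Kafka sink property names from older Flume releases; on Flume 1.7+ the preferred names are kafka.bootstrap.servers and kafka.topic (check the documentation for the version you run). An equivalent sketch under that assumption:

# equivalent Kafka sink settings on newer Flume versions (assumption: Flume 1.7+)
Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.kafka.bootstrap.servers = localhost:9092
Taildir-HdfsAndKafka-Agnet.sinks.kafka-sink.kafka.topic = wsk_test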

Start the agent

flume-ng agent \
--name Taildir-HdfsAndKafka-Agnet \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/Taildir-HdfsAndKafka-Agnet.conf \
-Dflume.root.logger=INFO,console
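
To check the Kafka side, attach a console consumer to the topic; a sketch assuming a local broker and Kafka's standard CLI scripts on the PATH:

# consume the test topic from the beginning to confirm events are arriving
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic wsk_test --from-beginning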
