Table of Contents

I. Flume Overview

II. Enterprise Development Cases

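I. Flume Overview

1.1 What Is Flume

Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transporting massive volumes of log data. It is built on a streaming architecture and is flexible and simple.

Flume's primary job is to read data from a server's local disk in real time and write it to HDFS.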

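1.2 Advantages of Flume

It can be integrated with many kinds of storage systems.
When the input data rate exceeds the rate at which data can be written to the destination storage, Flume buffers the data, reducing the pressure on HDFS.
Transactions in Flume are based on the Channel and use two transactions (sender side and receiver side) to ensure that messages are delivered reliably.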

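Flume uses two independent transactions: one delivers events from the source to the channel, and the other delivers them from the channel to the sink. The source considers data read only after all of the data in its transaction has been successfully committed to the channel. Likewise, data is removed from the channel only after the sink has successfully written it out.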

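1.3 Flume Architecture

Put transaction flow:

doPut: first write the batch of data to the temporary buffer putList.
doCommit: check whether the channel's in-memory queue has enough free space for the batch to be merged in.
doRollback: if the channel's in-memory queue lacks space, roll the data back.

Take transaction flow:

doTake: first move data into the temporary buffer takeList.
doCommit: if all of the data was sent successfully, clear the temporary buffer takeList.
doRollback: if an exception occurs while sending, return the data in takeList to the channel's in-memory queue.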
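The put/take flow above can be illustrated with a toy model. The Python sketch below is not Flume's Java API: `ToyMemoryChannel` and its method names are hypothetical and only mirror the doPut/doTake/doCommit/doRollback steps. A plain deque stands in for the channel's in-memory queue.

```python
# Toy model for illustration only; class and method names are hypothetical.
from collections import deque


class ChannelFullException(Exception):
    """Raised when a committed put batch does not fit into the channel."""


class ToyMemoryChannel:
    """Bounded in-memory channel with put/take transactions (not Flume's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()  # events visible to sinks

    # ---- Put transaction (source -> channel) ----
    def do_put(self, put_list, event):
        # doPut: stage the event in the temporary putList only
        put_list.append(event)

    def do_commit_put(self, put_list):
        # doCommit: merge the whole batch only if it fits
        if len(self.queue) + len(put_list) > self.capacity:
            put_list.clear()  # doRollback: drop the staged batch
            raise ChannelFullException("not enough space; batch rolled back")
        self.queue.extend(put_list)
        put_list.clear()

    # ---- Take transaction (channel -> sink) ----
    def do_take(self, take_list):
        # doTake: move one event from the queue into takeList
        if not self.queue:
            return None
        event = self.queue.popleft()
        take_list.append(event)
        return event

    def do_commit_take(self, take_list):
        # doCommit: every event was delivered, so forget them
        take_list.clear()

    def do_rollback_take(self, take_list):
        # doRollback: put events back at the head of the queue, in order
        while take_list:
            self.queue.appendleft(take_list.pop())


if __name__ == "__main__":
    ch = ToyMemoryChannel(capacity=3)
    puts = []
    for e in ["e1", "e2"]:
        ch.do_put(puts, e)
    ch.do_commit_put(puts)       # queue: e1, e2

    takes = []
    ch.do_take(takes)
    ch.do_take(takes)
    ch.do_rollback_take(takes)   # simulate a failed sink write
    print(list(ch.queue))        # ['e1', 'e2']
```

The take-side rollback pushes events back onto the front of the queue. The next take transaction therefore sees them in their original order, and nothing is lost when a sink fails.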

Next, let's look at each component of the Flume architecture in detail.

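① Agent

An Agent is a JVM process that moves data from a source to a destination in the form of events.
An Agent consists of three main parts: Source, Channel, and Sink.

② Source

A Source is the component that receives data into a Flume Agent. Sources can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy.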

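③ Channel

A Channel is a buffer that sits between a Source and a Sink, which allows the Source and the Sink to run at different rates. Channels are thread-safe and can handle writes from several Sources and reads from several Sinks at the same time.

Flume ships with several channel types. The two most commonly used are Memory Channel and File Channel.

Memory Channel: an in-memory queue. Memory Channel is suitable when data loss is not a concern. If data loss matters, do not use Memory Channel, because a process crash, machine failure, or restart will lose the events it holds.

File Channel: writes all events to disk, so no data is lost if the process shuts down or the machine goes down.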

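④ Sink

A Sink continuously polls the Channel for events, removes them in batches, and writes them in batches to a storage or indexing system, or sends them to another Flume Agent.

Sinks are fully transactional. Before removing a batch of data from the Channel, each Sink starts a transaction with the Channel. Once the batch of events has been successfully written to the storage system or to the next Flume Agent, the Sink commits the transaction through the Channel. After the transaction is committed, the Channel deletes the events from its internal buffer.

Sink destinations include hdfs, logger, avro, thrift, ipc, file, null, HBase, solr, and custom sinks.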

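⑤ Event

The Event is the basic unit of data transfer in Flume: data travels from source to destination as events. An Event consists of an optional header and a body, which is a byte array that carries the data. The header is a HashMap holding key-value string pairs.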
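Below is a minimal sketch of the Event structure described above. It uses a Python dataclass as a stand-in for Flume's Java event type. `ToyEvent` and `event_from_line` are hypothetical names used only for illustration.

```python
# Toy stand-in for a Flume event; names are hypothetical, not Flume's API.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyEvent:
    """String key-value headers plus a byte-array body."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def event_from_line(line: str, **headers: str) -> ToyEvent:
    # A line-oriented source such as netcat turns each text line into one event body
    return ToyEvent(body=line.encode("utf-8"), headers=dict(headers))


evt = event_from_line("Hello Flume", host="hadoop100")
print(evt.headers)  # {'host': 'hadoop100'}
print(evt.body)     # b'Hello Flume'
```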

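1.4 Flume Topologies

① Chaining Flume agents

This pattern connects multiple Flume agents in sequence, from the first Source to the storage system that the final Sink delivers to. Avoid chaining too many agents. A long chain slows down transfer, and if any agent in the chain goes down, the whole pipeline is affected.

② Single source, multiple channels and sinks

Flume can send an event flow to one or more destinations. In this pattern the source's data is replicated to multiple Channels, so each Channel holds the same data, and each Sink can deliver it to a different destination.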
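The replicating behavior can be sketched as follows. `replicate` is a hypothetical helper, not part of Flume, and plain lists stand in for channels. In a real agent this corresponds to `selector.type = replicating`, which the selector case study in 2.4 uses.

```python
# Toy replicating selector; the helper name is hypothetical.
from typing import Dict, List


def replicate(events: List[str], channel_names: List[str]) -> Dict[str, List[str]]:
    """Copy every event into every channel."""
    channels: Dict[str, List[str]] = {name: [] for name in channel_names}
    for event in events:
        for name in channel_names:
            channels[name].append(event)
    return channels


channels = replicate(["log line 1", "log line 2"], ["c1", "c2"])
# Each sink then drains its own channel, e.g. k1 -> HDFS and k2 -> local files
print(channels["c1"] == channels["c2"])  # True
```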

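③ Flume load balancing

Flume can group multiple Sinks logically into a sink group and distribute data across the Sinks in that group. The main purposes are load balancing and failover.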
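For the failover side, here is a minimal sketch that assumes a priority-ordered set of sinks. The highest-priority sink that accepts the event wins, and a failing sink is skipped. `failover_send` is hypothetical. Flume's own failover sink processor is configured in the agent file (for example `processor.type = failover` with `processor.priority.<sinkName>` values), and it also gives a failed sink a cool-down period before retrying it, which this sketch omits.

```python
# Toy failover processor; names are hypothetical, and no cool-down is modeled.
from typing import Callable, Dict


def failover_send(event: str,
                  priorities: Dict[str, int],
                  sinks: Dict[str, Callable[[str], None]]) -> str:
    """Try sinks from highest to lowest priority.

    Returns the name of the sink that accepted the event and raises
    RuntimeError if every sink fails.
    """
    for name in sorted(priorities, key=priorities.get, reverse=True):
        try:
            sinks[name](event)
            return name
        except Exception:
            continue  # this sink is down, fall through to the next one
    raise RuntimeError("all sinks failed")


delivered = []


def healthy(event):
    delivered.append(event)


def broken(event):
    raise IOError("downstream agent unreachable")


print(failover_send("e1", {"k1": 10, "k2": 5}, {"k1": broken, "k2": healthy}))  # k2
```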

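④ Flume agent aggregation

This is the most common pattern and a very practical one. Web applications are typically spread across hundreds of servers, and large ones across thousands or even tens of thousands, so handling the logs they produce is cumbersome. This arrangement solves the problem well. Each server runs a Flume agent that collects its logs and sends them to a central collecting Flume agent. That agent then uploads them to HDFS, Hive, HBase, JMS, and so on for log analysis.

1.5 Flume Agent Internals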

1.6 Installing Flume

1. Extract apache-flume-1.7.0-bin.tar.gz into /opt/module/

[root@hadoop100 software]$ tar -zxf apache-flume-1.7.0-bin.tar.gz -C /opt/module/

2. Copy flume-env.sh.template in the conf directory to flume-env.sh, and set JAVA_HOME

[root@hadoop100 conf]$ cp flume-env.sh.template flume-env.sh
[root@hadoop100 conf]$ vi flume-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144

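II. Enterprise Development Cases

2.1 Monitoring Port Data

Requirement: first start a Flume job that listens on local port 44444 (the server side). Then use the netcat tool to send messages to local port 44444 (the client side). Finally, Flume displays the received data on the console in real time.

Steps:

① Create the Flume Agent configuration file flume-netcat-logger.conf

Create a job folder in the Flume directory and enter it

[root@hadoop100 flume]# mkdir job
[root@hadoop100 flume]# cd job/

Create the Flume Agent configuration file flume-netcat-logger.conf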

[root@hadoop100 job]# vim flume-netcat-logger.conf
# Note: keep comments on their own lines; in a properties file a # after a value becomes part of the value
# Name the components on this agent
# a1: the name of the agent
# r1: the source (input) of a1
a1.sources = r1
# k1: the sink (output destination) of a1
a1.sinks = k1
# c1: the channel (buffer) of a1
a1.channels = c1

# Describe/configure the source
# The source of a1 is the netcat type, which listens on a port
a1.sources.r1.type = netcat
# Host that a1 listens on
a1.sources.r1.bind = localhost
# Port that a1 listens on
a1.sources.r1.port = 44444

# Describe the sink
# The sink of a1 is the logger type, which prints to the console
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
# The channel of a1 is the in-memory (memory) type
a1.channels.c1.type = memory
# Total capacity of the channel: 1000 events
a1.channels.c1.capacity = 1000
# Maximum number of events the channel takes from a source or gives to a sink per transaction
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
# Connect r1 to c1
a1.sources.r1.channels = c1
# Connect k1 to c1
a1.sinks.k1.channel = c1

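② Start Flume listening on the port

Form 1:

[root@hadoop100 flume]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

Form 2:

[root@hadoop100 flume]$ bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

Parameter notes:

--conf conf/: the configuration files are stored in the conf/ directory.
--name a1: the agent is named a1. This must match the agent name used in the configuration file.
--conf-file job/flume-netcat-logger.conf: the configuration file Flume reads for this run is flume-netcat-logger.conf in the job folder.
-Dflume.root.logger=INFO,console: -D overrides the flume.root.logger property at runtime. Here it sets the log level to INFO and prints logs to the console. Common log levels include DEBUG, INFO, WARN, and ERROR.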

③ Use the netcat tool to send content to port 44444

[root@hadoop100 flume]$ nc localhost 44444
Hello Flume
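If `nc` is not installed, the same test input can be sent from a short script. This sketch assumes that the agent from step ② is running on localhost:44444. It also assumes that the netcat source answers every received line with one reply line; by default the source acknowledges each event with `OK`. `send_lines` is a hypothetical helper.

```python
# Hypothetical helper that mimics typing lines into `nc host port`.
import socket
from typing import List


def send_lines(host: str, port: int, lines: List[str], timeout: float = 5.0) -> List[str]:
    """Send newline-terminated lines over TCP and collect one reply line per line sent.

    Raises socket.timeout if the peer does not answer within `timeout` seconds.
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in lines:
            sock.sendall((line + "\n").encode("utf-8"))
            replies.append(reader.readline().rstrip("\n"))
    return replies
```

For example, `send_lines("localhost", 44444, ["Hello Flume"])` should deliver one event, which the logger sink prints on the agent's console.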

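④ Observe the received data in the Flume console

2.2 Reading a Local File into HDFS in Real Time

Requirement: monitor the Hive log in real time and upload it to HDFS.

Steps:

① To write data to HDFS, Flume must have the Hadoop-related jars

Copy commons-configuration-1.6.jar, hadoop-auth-2.7.2.jar, hadoop-common-2.7.2.jar, hadoop-hdfs-2.7.2.jar, commons-io-2.4.jar, and htrace-core-3.1.0-incubating.jar into the /opt/module/flume/lib folder.

② Create the flume-file-hdfs.conf file

Its contents are as follows: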

# Name the components on this agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

# Describe/configure the source
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/module/hive-1.2.1/logs/hive.log
a2.sources.r2.shell = /bin/bash -c

# Describe the sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop100:9000/flume/%Y%m%d/%H
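# Prefix of uploaded files
a2.sinks.k2.hdfs.filePrefix = logs-
# Whether to round down the timestamp, i.e. roll directories by time
a2.sinks.k2.hdfs.round = true
# Number of time units before a new directory is created
a2.sinks.k2.hdfs.roundValue = 1
# Unit of the rounding value
a2.sinks.k2.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# Number of events written before flushing to HDFS; keep it <= the channel's transactionCapacity
a2.sinks.k2.hdfs.batchSize = 100
# File type: DataStream writes uncompressed output (CompressedStream supports compression)
a2.sinks.k2.hdfs.fileType = DataStream
# How often (in seconds) to roll to a new file
a2.sinks.k2.hdfs.rollInterval = 60
# Roll size of each file in bytes, just under 128 MB
a2.sinks.k2.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a2.sinks.k2.hdfs.rollCount = 0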

# Use a channel which buffers events in memory
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
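The three roll settings act as independent triggers, and a value of 0 disables a trigger. Below is a simplified Python sketch of that decision. `should_roll` is hypothetical; Flume checks the interval with a timer and checks size and count as events are written. The sketch also shows the arithmetic behind 134217700: 128 × 1024 × 1024 = 134217728 bytes, so the roll size sits 28 bytes below 128 MiB, just under the common 128 MB HDFS block size.

```python
# Simplified model of the HDFS sink roll triggers; the function name is hypothetical.
ROLL_INTERVAL_S = 60          # a2.sinks.k2.hdfs.rollInterval
ROLL_SIZE_BYTES = 134217700   # a2.sinks.k2.hdfs.rollSize
ROLL_COUNT = 0                # a2.sinks.k2.hdfs.rollCount (0 disables it)


def should_roll(seconds_open: float, bytes_written: int, events_written: int,
                roll_interval: int = ROLL_INTERVAL_S,
                roll_size: int = ROLL_SIZE_BYTES,
                roll_count: int = ROLL_COUNT) -> bool:
    """Return True if any enabled trigger has fired; a setting of 0 disables its trigger."""
    if roll_interval > 0 and seconds_open >= roll_interval:
        return True
    if roll_size > 0 and bytes_written >= roll_size:
        return True
    if roll_count > 0 and events_written >= roll_count:
        return True
    return False


print(128 * 1024 * 1024 - ROLL_SIZE_BYTES)  # 28
```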

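Note: all time-related escape sequences (such as %Y%m%d/%H in hdfs.path) require a header with the key "timestamp" in the Event Header, unless hdfs.useLocalTimeStamp is set to true, in which case the sink uses the local time instead. That is why this configuration sets a2.sinks.k2.hdfs.useLocalTimeStamp = true. Another way to add the header automatically is the TimestampInterceptor.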
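To see what the time escapes and rounding settings produce, here is a minimal sketch that resolves `%Y%m%d/%H`-style escapes after rounding the timestamp down. `bucket_path` is hypothetical. The timestamp is in milliseconds since the epoch, the format of the `timestamp` header. The sketch formats in UTC so the results are reproducible, whereas Flume uses the agent's local time zone unless `hdfs.timeZone` is set.

```python
# Simplified model of hdfs.round / roundValue / roundUnit; the name is hypothetical.
from datetime import datetime, timezone


def bucket_path(template: str, ts_millis: int, round_value: int = 1,
                round_unit: str = "hour", tz=timezone.utc) -> str:
    """Round the timestamp down, then resolve strftime-style escapes in the template."""
    t = datetime.fromtimestamp(ts_millis / 1000, tz=tz)
    if round_unit == "hour":
        t = t.replace(hour=t.hour - t.hour % round_value, minute=0, second=0, microsecond=0)
    elif round_unit == "minute":
        t = t.replace(minute=t.minute - t.minute % round_value, second=0, microsecond=0)
    elif round_unit == "second":
        t = t.replace(second=t.second - t.second % round_value, microsecond=0)
    else:
        raise ValueError("round_unit must be hour, minute or second")
    return t.strftime(template)


ts = int(datetime(2021, 3, 5, 13, 47, 12, tzinfo=timezone.utc).timestamp() * 1000)
print(bucket_path("hdfs://hadoop100:9000/flume/%Y%m%d/%H", ts))
# hdfs://hadoop100:9000/flume/20210305/13
```

With roundValue = 1 and roundUnit = hour, every event within the same hour lands in one directory. A larger roundValue, such as 2, would group 12:00 to 13:59 into the `12` directory.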

③ Run the monitoring configuration

[root@hadoop100 flume]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf

④ Start Hadoop and Hive, then use Hive to generate logs

[root@hadoop100 hadoop-2.7.2]$ sbin/start-dfs.sh
[root@hadoop101 hadoop-2.7.2]$ sbin/start-yarn.sh

[root@hadoop100 hive]$ bin/hive
hive (default)>

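⑤ View the files on HDFS

2.3 Reading Directory Files into HDFS in Real Time

Requirement: use Flume to monitor all files in a directory.

Steps:

① Create the configuration file flume-dir-hdfs.conf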

a3.sources = r3
a3.sinks = k3
a3.channels = c3

# Describe/configure the source
a3.sources.r3.type = spooldir
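# Directory to monitor
a3.sources.r3.spoolDir = /opt/module/flume-1.7.0/upload
a3.sources.r3.fileSuffix = .COMPLETED
a3.sources.r3.fileHeader = true
# Ignore (do not upload) all files ending in .tmp
a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

# Describe the sink
a3.sinks.k3.type = hdfs
# HDFS path to upload files to
a3.sinks.k3.hdfs.path = hdfs://hadoop100:9000/flume/upload/%Y%m%d/%H
# Prefix of uploaded files
a3.sinks.k3.hdfs.filePrefix = upload-
# Whether to round down the timestamp, i.e. roll directories by time
a3.sinks.k3.hdfs.round = true
# Number of time units before a new directory is created
a3.sinks.k3.hdfs.roundValue = 1
# Unit of the rounding value
a3.sinks.k3.hdfs.roundUnit = hour
# Whether to use the local timestamp
a3.sinks.k3.hdfs.useLocalTimeStamp = true
# Number of events written before flushing to HDFS
a3.sinks.k3.hdfs.batchSize = 100
# File type: DataStream writes uncompressed output (CompressedStream supports compression)
a3.sinks.k3.hdfs.fileType = DataStream
# How often (in seconds) to roll to a new file
a3.sinks.k3.hdfs.rollInterval = 60
# Roll size of each file in bytes, about 128 MB
a3.sinks.k3.hdfs.rollSize = 134217700
# File rolling is independent of the number of events
a3.sinks.k3.hdfs.rollCount = 0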

# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3

② Start the directory-monitoring command

[root@hadoop100 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/flume-dir-hdfs.conf

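Notes:

Do not create a file in the monitored directory and then keep modifying it. Place only finished files there.
Files that have finished uploading are renamed with the .COMPLETED suffix.
The monitored folder is scanned for file changes every 500 milliseconds.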
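The selection rules above can be sketched as a directory scan. Files that already carry the completed suffix or that match `ignorePattern` are skipped, and a fully ingested file is renamed. This is a simplified model with hypothetical function names. It matches the pattern against the whole file name and does not descend into sub-directories.

```python
# Toy model of a spooling-directory scan; function names are hypothetical.
import os
import re
from typing import List

FILE_SUFFIX = ".COMPLETED"                    # a3.sources.r3.fileSuffix
IGNORE_PATTERN = re.compile(r"([^ ]*\.tmp)")  # a3.sources.r3.ignorePattern


def files_to_ingest(spool_dir: str) -> List[str]:
    """Return the file names this toy scan would pick up, in sorted order."""
    picked = []
    for name in sorted(os.listdir(spool_dir)):
        if not os.path.isfile(os.path.join(spool_dir, name)):
            continue  # this sketch does not descend into sub-directories
        if name.endswith(FILE_SUFFIX) or IGNORE_PATTERN.fullmatch(name):
            continue  # already ingested, or explicitly ignored
        picked.append(name)
    return picked


def mark_completed(spool_dir: str, name: str) -> str:
    """Rename a fully ingested file with the completed suffix and return the new name."""
    done = name + FILE_SUFFIX
    os.rename(os.path.join(spool_dir, name), os.path.join(spool_dir, done))
    return done
```

This also shows why a file should be complete before it is placed in the directory. Once it has been picked up and renamed, later edits to it are never read again.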

③ Add a file to the upload folder

[root@hadoop100 flume]$ mkdir upload
[root@hadoop100 upload]$ vim  test.txt
123
456

④ View the data on HDFS

⑤ View the upload folder

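2.4 Single Source, Multiple Outputs (Selector)

Requirement: Flume-1 monitors a file for changes and passes the new content to Flume-2, which stores it in HDFS. At the same time, Flume-1 passes the content to Flume-3, which writes it to the local file system.

Steps:

① Preparation

Create a group1 folder under /opt/module/flume/job:
[root@hadoop100 job]# mkdir group1/

Create a flume3 folder under /opt/module/data/:
[root@hadoop100 data]# mkdir flume3

② Create flume-file-flume.conf

Configure one source that reads the log file, plus two channels and two sinks that feed flume-flume-hdfs and flume-flume-dir respectively.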

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Replicate the data flow to all channels
a1.sources.r1.selector.type = replicating

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive-1.2.1/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
# An avro sink acts as a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop100
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop100
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

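Note: Avro is a language-independent data serialization and RPC framework created by Doug Cutting, the creator of Hadoop.

③ Create flume-flume-hdfs.conf

Configure a Source that receives the upstream Flume's output, and a Sink that writes to HDFS.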

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
# An avro source acts as a data-receiving service
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop100
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hadoop100:9000/flume2/%Y%m%d/%H
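# Prefix of uploaded files
a2.sinks.k1.hdfs.filePrefix = flume2-
# Whether to round down the timestamp, i.e. roll directories by time
a2.sinks.k1.hdfs.round = true
# Number of time units before a new directory is created
a2.sinks.k1.hdfs.roundValue = 1
# Unit of the rounding value
a2.sinks.k1.hdfs.roundUnit = hour
# Whether to use the local timestamp
a2.sinks.k1.hdfs.useLocalTimeStamp = true
# Number of events written before flushing to HDFS
a2.sinks.k1.hdfs.batchSize = 100
# File type: DataStream writes uncompressed output (CompressedStream supports compression)
a2.sinks.k1.hdfs.fileType = DataStream
# How often (in seconds) to roll to a new file
a2.sinks.k1.hdfs.rollInterval = 600
# Roll size of each file in bytes, about 128 MB
a2.sinks.k1.hdfs.rollSize = 134217700
# File rolling is independent of the number of events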
a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

④ Create flume-flume-dir.conf

Configure a Source that receives the upstream Flume's output, and a Sink that writes to a local directory.

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop100
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = file_roll
a3.sinks.k1.sink.directory = /opt/module/data/flume3

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

⑤ Run the configuration files

Start the downstream agents (a3 and a2) first, so that the avro sinks of a1 have receivers to connect to.

[root@hadoop100 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group1/flume-flume-dir.conf

[root@hadoop100 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group1/flume-flume-hdfs.conf

[root@hadoop100 flume-1.7.0]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group1/flume-file-flume.conf

⑥ Run a Hive command

[root@hadoop100 hive]$ bin/hive
hive (default)> select * from stu;

⑦ Check the data

HDFS:

Local:

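2.5 Single Source, Multiple Outputs (Sink Group)

Requirement: Flume-1 listens for data sent to port 44444 in real time and passes it round-robin to Flume-2 and Flume-3, which print it on their consoles.

Steps:

① Preparation

Create a group2 folder under /opt/module/flume/job

[root@hadoop100 job]# mkdir group2

② Create flume-netcat-flume.conf

Configure one source that receives the port data, plus one channel and two sinks that feed flume-flume-console1 and flume-flume-console2 respectively.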

# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.selector.maxTimeOut=10000

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop100
a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop100
a1.sinks.k2.port = 4142

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
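To make `processor.type = load_balance`, `backoff = true`, and `selector = round_robin` concrete, here is a toy Python model. It is not Flume's implementation, and `ToyRoundRobinBackoff` is a hypothetical name. Sinks are picked in turn. A sink that fails is skipped until its backoff expires, and the backoff doubles with each consecutive failure. The 1000 ms base backoff is an assumption for illustration; only the 10000 ms cap comes from the configuration above. Time is passed in explicitly so the behavior is deterministic.

```python
# Toy round_robin + backoff sink processor; names and base delay are hypothetical.
from typing import Callable, Dict, List


class ToyRoundRobinBackoff:
    """Round-robin over sinks, skipping any sink that is still backing off."""

    def __init__(self, sinks: Dict[str, Callable[[str], None]],
                 base_ms: int = 1000, max_timeout_ms: int = 10000):
        self.names: List[str] = list(sinks)
        self.sinks = sinks
        self.base_ms = base_ms                  # hypothetical first backoff
        self.max_timeout_ms = max_timeout_ms    # cf. selector.maxTimeOut=10000
        self.next_index = 0
        self.failures = {n: 0 for n in self.names}
        self.blocked_until = {n: 0 for n in self.names}

    def process(self, event: str, now_ms: int) -> str:
        """Deliver one event and return the name of the sink that took it."""
        for _ in range(len(self.names)):
            name = self.names[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.names)
            if now_ms < self.blocked_until[name]:
                continue  # still backing off
            try:
                self.sinks[name](event)
            except Exception:
                self.failures[name] += 1
                delay = self.base_ms * 2 ** (self.failures[name] - 1)
                self.blocked_until[name] = now_ms + min(delay, self.max_timeout_ms)
                continue
            self.failures[name] = 0
            return name
        raise RuntimeError("no sink available")


if __name__ == "__main__":
    seen = {"k1": [], "k2": []}
    lb = ToyRoundRobinBackoff({"k1": seen["k1"].append, "k2": seen["k2"].append})
    print([lb.process(e, now_ms=0) for e in ["a", "b", "c"]])  # ['k1', 'k2', 'k1']
```

When both downstream agents (a2 and a3) are healthy, events alternate between k1 and k2. If one agent is down, its sink is skipped for a growing interval, so the other sink absorbs the load.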

③ Create flume-flume-console1.conf and flume-flume-console2.conf

Configure a Source that receives the upstream Flume's output, with output going to the local console.

flume-flume-console1.conf:

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop100
a2.sources.r1.port = 4141

# Describe the sink
a2.sinks.k1.type = logger

# Describe the channel
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

flume-flume-console2.conf:

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c2

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop100
a3.sources.r1.port = 4142

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c2.type = memory
a3.channels.c2.capacity = 1000
a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c2
a3.sinks.k1.channel = c2

④ Run the configuration files

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group2/flume-flume-console2.conf -Dflume.root.logger=INFO,console

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group2/flume-flume-console1.conf -Dflume.root.logger=INFO,console

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group2/flume-netcat-flume.conf

⑤ View the console logs of Flume-2 and Flume-3

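2.6 Aggregating Multiple Sources

Requirement:

Flume-1 on hadoop100 monitors the file /opt/module/hive-1.2.1/logs/hive.log;
Flume-2 on hadoop101 monitors the data stream on a port;
Flume-1 and Flume-2 send their data to Flume-3 on hadoop102, which prints the final data to the console.

Steps:

① Preparation
Create a group3 folder under /opt/module/flume/job

[root@hadoop100 job]# mkdir group3

② Create flume1-logger-flume.conf

hadoop100: configure a Source to monitor the hive.log file, and a Sink to send the data to the next-tier Flume.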

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive-1.2.1/logs/hive.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

③ Create flume2-netcat-flume.conf

hadoop101: configure a Source to monitor the data stream on port 44444, and a Sink to send the data to the next-tier Flume.

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop101
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop102
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

④ Create flume3-flume-logger.conf

hadoop102: configure a Source to receive the data streams sent by Flume-1 and Flume-2, and a Sink that outputs the merged data to the console.

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop102
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

⑤ Run the configuration files

[root@hadoop102 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console

[root@hadoop101 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf

[root@hadoop100 flume-1.7.0]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf
