Section I: File List
1. apache-flume-1.8.0-bin.tar.gz
Section II: Download Link
[Flume download link]: http://flume.apache.org/releases/index.html
Section III: Deploying the Telnet Communication Tool and Flume
Overview of the cluster:
Node Role | Master | Slave1 | Slave2 |
---|---|---|---|
IP | 192.168.137.128 | 192.168.137.129 | 192.168.137.130 |
HostName | BlogMaster | BlogSlave1 | BlogSlave2 |
Hadoop | BlogMaster-YES | BlogSlave1-YES | BlogSlave2-YES |
Telnet | BlogMaster-YES | BlogSlave1-YES | BlogSlave2-YES |
Flume | BlogMaster-YES | BlogSlave1-NO | BlogSlave2-NO |
Step 1: Install the Telnet communication tool on every cluster node
Telnet must be installed on the BlogMaster, BlogSlave1, and BlogSlave2 nodes. The installation commands are as follows:
On the BlogMaster node:
[root@BlogMaster conf]# yum install telnet
On the BlogSlave1 node:
[root@BlogSlave1 ~]# yum install telnet
On the BlogSlave2 node:
[root@BlogSlave2 ~]# yum install telnet
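If the machine you are working from can reach the other nodes over SSH, the three installs can be driven from a single loop. The sketch below only prints each command for review (pipe its output to `sh` to actually run them); the hostnames are those from the table above, and passwordless root SSH is an assumption:

```shell
# Build the install command for every node; printed rather than executed.
nodes="BlogMaster BlogSlave1 BlogSlave2"
for host in $nodes; do
    echo "ssh root@$host 'yum -y install telnet'"
done
```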
Step 2: Deploy Flume
The following operations are performed only on the master node, BlogMaster.
- Step 2.1: Extract the Flume archive to the target directory
Specifically, extract it to /opt/cluster, the root directory of the Hadoop cluster:
[root@BlogMaster ~]# tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /opt/cluster/
- Step 2.2: Configure environment variables in flume-env.sh (located in /opt/cluster/apache-flume-1.8.0-bin/conf)
Note that this directory normally contains only a flume-env.sh.template file. Copy it to flume-env.sh with cp, then open the copy and set JAVA_HOME as follows:
# Enviroment variables can be set here.
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
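The copy-and-edit can also be scripted with cp and sed. The sketch below demonstrates it on a stand-in copy in a temporary directory, since the template's exact contents may differ from the two lines assumed here; on the real node you would run the same cp/sed commands inside /opt/cluster/apache-flume-1.8.0-bin/conf:

```shell
# Stand-in demo: the template content below is assumed, not the real file.
tmp=$(mktemp -d)
cat > "$tmp/flume-env.sh.template" <<'EOF'
# Enviroment variables can be set here.
# export JAVA_HOME=/usr/lib/jvm/java-8-oracle
EOF
cp "$tmp/flume-env.sh.template" "$tmp/flume-env.sh"
# Uncomment JAVA_HOME and point it at the cluster's JDK.
sed -i 's|^# export JAVA_HOME=.*|export JAVA_HOME=/opt/cluster/jdk1.8.0_181|' "$tmp/flume-env.sh"
grep '^export JAVA_HOME' "$tmp/flume-env.sh"
```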
- Step 2.3: Set the Flume log directory in log4j.properties (located in /opt/cluster/apache-flume-1.8.0-bin/conf)
Open the file and point the log-directory option at where Flume's runtime logs should be stored:
flume.log.dir=/opt/cluster/apache-flume-1.8.0-bin/logs
Then be sure to create a folder named "logs" under the Flume installation directory:
[root@BlogMaster apache-flume-1.8.0-bin]# mkdir logs
- Step 2.4: Configure the Hadoop jars needed for HDFS interaction
To let Flume hand the data it monitors over to the cluster's HDFS, copy the Hadoop jars that Flume needs for HDFS interaction into the lib subdirectory of the Flume installation. The jars are:
- commons-configuration-1.6.jar (in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
- hadoop-auth-2.8.4.jar (in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
- hadoop-common-2.8.4.jar (in /opt/cluster/hadoop-2.8.4/share/hadoop/common)
- hadoop-hdfs-2.8.4.jar (in /opt/cluster/hadoop-2.8.4/share/hadoop/hdfs)
- commons-io-2.4.jar (in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
- htrace-core4-4.0.1-incubating.jar (in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
Proceed as follows:
First: in the /opt/cluster/hadoop-2.8.4/share/hadoop/common directory, run:
[root@BlogMaster common]# cp hadoop-common-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib
Second: in the /opt/cluster/hadoop-2.8.4/share/hadoop/hdfs directory, run:
[root@BlogMaster hdfs]# cp hadoop-hdfs-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib
Third: in the /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib directory, run:
[root@BlogMaster lib]# cp commons-configuration-1.6.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp hadoop-auth-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp commons-io-2.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp htrace-core4-4.0.1-incubating.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
Monitoring Tests
In the Flume installation directory, create a directory named "job" dedicated to Flume monitoring tasks, to keep them in one place.
Step 1: Monitor data on the console
Inside the job directory, create a file named "netcat-flume-logger.conf" with touch, then add the following content to it:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = BlogMaster
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Then perform the following three steps:
First: on the master node BlogMaster, run the command that starts the Flume agent
[root@BlogMaster apache-flume-1.8.0-bin]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/netcat-flume-logger.conf -Dflume.root.logger=INFO,console
If output like the following appears, Flume is in monitoring mode:
Info: Sourcing environment configuration script /opt/cluster/apache-flume-1.8.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/cluster/hadoop-2.8.4/bin/hadoop) for HDFS access
Info: Including Hive libraries found via (/opt/cluster/apache-hive-1.2.2-bin) for Hive access
+ exec /opt/cluster/jdk1.8.0_181/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/opt/cluster/apache-flume-1.8.0-bin/conf:/opt/cluster/apache-flume-1.8.0-bin/lib/*:/opt/cluster/hadoop-2.8.4/etc/hadoop:/opt/cluster/hadoop-2.8.4/share/hadoop/common/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/common/*:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs/*:/opt/cluster/hadoop-2.8.4/share/hadoop/yarn/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/yarn/*:/opt/cluster/hadoop-2.8.4/share/hadoop/mapreduce/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/mapreduce/*:/opt/cluster/hadoop-2.8.4/contrib/capacity-scheduler/*.jar:/opt/cluster/apache-hive-1.2.2-bin/lib/*' -Djava.library.path=:/opt/cluster/hadoop-2.8.4/lib/native org.apache.flume.node.Application --name a1 --conf-file job/netcat-flume-logger.conf
SLF4J: Class path contains multiple SLF4J bindings.
Second: on any node, run the "producer" command that generates data
[root@BlogSlave2 ~]# telnet BlogMaster 44444
If the following appears, the producer session is established:
Trying 192.168.137.128...
Connected to BlogMaster.
Escape character is '^]'.
Third: compare the data entered on the producer side with what BlogMaster's shell echoes back
Type data on the producer side and check whether BlogMaster's shell reports the same data:
Producer side:
[root@BlogSlave2 ~]# telnet BlogMaster 44444
Trying 192.168.137.128...
Connected to BlogMaster.
Escape character is '^]'.
sda
OK
I love Xiaoxiong
OK
BlogMaster shell side:
2019-11-15 09:45:35,605 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 73 64 61 0D sda. }
2019-11-15 09:46:21,682 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 20 49 20 6C 6F 76 65 20 58 69 61 6F 78 69 6F 6E I love Xiaoxion }
The outputs agree (the logger sink previews at most the first 16 bytes of each event body, which is why the second line is cut off at "Xiaoxion"), so local monitoring and communication are configured correctly.
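The hex dump in the sink log is simply the raw event body; the trailing 0D byte is the carriage return that telnet appends to each line. You can reproduce the first event's preview locally (od prints the hex in lowercase):

```shell
# 'sda' followed by telnet's carriage return, shown as hex bytes
# the way the logger sink renders an event body.
printf 'sda\r' | od -An -tx1
# prints: 73 64 61 0d
```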
Step 2: Monitor data into HDFS
Before the following operations, start the Hadoop cluster and the YARN service.
Next, inside the job directory, create a file named "flume-file-hdfs.conf" with touch and add the following content to it:
## define agent
a2.sources = r2
a2.channels = c2
a2.sinks = k2
## define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/cluster/apache-hive-1.2.2-bin/logs/hive.log
a2.sources.r2.shell = /bin/bash -c
## a source batch must fit in one channel transaction (transactionCapacity below)
a2.sources.r2.batchSize = 100000
## define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000000
a2.channels.c2.transactionCapacity = 100000
## define sinks
a2.sinks.k2.type = hdfs
## write the collected log records under a dated directory
a2.sinks.k2.hdfs.path = hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/%Y%m%d/%S/
## file type
a2.sinks.k2.hdfs.fileType = DataStream
## write format
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.batchSize = 100000
a2.sinks.k2.hdfs.rollInterval = 0
## roll files by size (bytes)
a2.sinks.k2.hdfs.rollSize = 102400000
a2.sinks.k2.hdfs.rollCount = 10000
## use the local timestamp for the escape sequences in hdfs.path
a2.sinks.k2.hdfs.useLocalTimeStamp = true
## prefix for the collected files
a2.sinks.k2.hdfs.filePrefix = events-
a2.sinks.k2.hdfs.round = true
a2.sinks.k2.hdfs.roundValue = 10
a2.sinks.k2.hdfs.roundUnit = second
## bind the sources and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
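The %Y%m%d/%S escapes combined with round = true, roundValue = 10, roundUnit = second mean the sink buckets files into a per-day directory with one subdirectory per 10-second window, which is why paths like .../20191115/50/ and .../20191115/00/ appear when the agent runs. A sketch that computes the bucket the sink would use right now:

```shell
# Current bucket path implied by hdfs.path with 10-second rounding:
# the window is the seconds value rounded down to a multiple of 10.
day=$(date +%Y%m%d)
window="$(date +%S | cut -c1)0"
echo "hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/$day/$window/"
```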
Then perform the following three steps:
First: start the Flume agent to begin monitoring
[root@BlogMaster apache-flume-1.8.0-bin]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf -Dflume.root.logger=INFO,console
Second: run a Hive statement to generate log entries
hive (flume_test)> select * from student;
OK
student.id student.name
1 stu1
Third: check whether the monitored data was stored in the target HDFS directory
The shell in which the Flume job was started prints log lines such as:
2019-11-15 10:29:57,724 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/20191115/50//events-.1573784997429.tmp
2019-11-15 10:30:00,276 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-11-15 10:30:00,330 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/20191115/00//events-.1573785000277.tmp
Open http://192.168.137.128:50070/explorer.html#/ in a browser (or run hdfs dfs -ls -R /flume/hive_hdfs_via_flume from a node with the Hadoop client) to inspect the data collected under the flume folder on HDFS.