Distributed Big Data Platform Setup - Flume Deployment

Section I: File List

1. apache-flume-1.8.0-bin.tar.gz

Section II: Download Link

[Flume download link]: http://flume.apache.org/releases/index.html

Section III: The Telnet Communication Tool and Flume Deployment

Overview of the cluster:

Node role   Master             Slave1             Slave2
IP          192.168.137.128    192.168.137.129    192.168.137.130
HostName    BlogMaster         BlogSlave1         BlogSlave2
Hadoop      BlogMaster-YES     BlogSlave1-YES     BlogSlave2-YES
Telnet      BlogMaster-YES     BlogSlave1-YES     BlogSlave2-YES
Flume       BlogMaster-YES     BlogSlave1-NO      BlogSlave2-NO

Step 1: Install the telnet communication tool on every cluster node

The BlogMaster, BlogSlave1, and BlogSlave2 nodes all need the Telnet communication tool installed; the installation commands are as follows:
On the BlogMaster node:

[root@BlogMaster conf]# yum install telnet

On the BlogSlave1 node:

[root@BlogSlave1 ~]# yum install telnet

On the BlogSlave2 node:

[root@BlogSlave2 ~]# yum install telnet
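
To verify the installation on each node, a quick check such as the following should suffice (a sketch; the reported path may vary by system):

[root@BlogMaster ~]# which telnet
/usr/bin/telnet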

Step 2: Flume deployment

The following operations are performed only on the master node BlogMaster.

  • Step 2.1: Extract the Flume installation package to the target directory

Specifically, the target directory is /opt/cluster, the root directory where the Hadoop cluster resides. The extraction command is as follows:

[root@BlogMaster ~]# tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /opt/cluster/
  • Step 2.2: Configure the environment variables in flume-env.sh (located in /opt/cluster/apache-flume-1.8.0-bin/conf)

Note that when you enter this directory, you will normally find only the file flume-env.sh.template. Copy it with the cp command and rename the copy flume-env.sh (see the sketch below), then open the file and update the JAVA_HOME it references, as follows:
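
A minimal sketch of the copy-and-rename step, run inside the conf directory:

[root@BlogMaster conf]# cp flume-env.sh.template flume-env.sh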

# Environment variables can be set here.
export JAVA_HOME=/opt/cluster/jdk1.8.0_181
  • Step 2.3: Configure the Flume log directory option in log4j.properties (located in /opt/cluster/apache-flume-1.8.0-bin/conf)

Open the file and change the option that sets the directory where Flume's runtime logs are stored, as follows:

flume.log.dir=/opt/cluster/apache-flume-1.8.0-bin/logs

Afterwards, be sure to create a folder named logs in the Flume installation directory:

[root@BlogMaster apache-flume-1.8.0-bin]# mkdir logs
  • Step 2.4: Configure the Hadoop JARs needed for HDFS interaction

To enable Flume to write the data it monitors into the Hadoop cluster's HDFS, the Hadoop JARs that Flume needs for HDFS interaction must be copied into the lib subdirectory of the Flume installation directory. The JARs in question are the following:

  1. commons-configuration-1.6.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
  2. hadoop-auth-2.8.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
  3. hadoop-common-2.8.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/common)
  4. hadoop-hdfs-2.8.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/hdfs)
  5. commons-io-2.4.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)
  6. htrace-core4-4.0.1-incubating.jar (located in /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib)

To do this, proceed as follows:
First step: enter the /opt/cluster/hadoop-2.8.4/share/hadoop/common directory and run the following command:

[root@BlogMaster common]# cp hadoop-common-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib

Second step: enter the /opt/cluster/hadoop-2.8.4/share/hadoop/hdfs directory and run the following command:

[root@BlogMaster hdfs]# cp hadoop-hdfs-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib

Third step: enter the /opt/cluster/hadoop-2.8.4/share/hadoop/tools/lib directory and run the following commands:

[root@BlogMaster lib]# cp commons-configuration-1.6.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp hadoop-auth-2.8.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp commons-io-2.4.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
[root@BlogMaster lib]# cp htrace-core4-4.0.1-incubating.jar /opt/cluster/apache-flume-1.8.0-bin/lib/
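
To confirm that all six JARs landed in Flume's lib directory, a listing such as the following can help (a sketch; adjust the pattern if your versions differ):

[root@BlogMaster lib]# ls /opt/cluster/apache-flume-1.8.0-bin/lib | grep -E 'hadoop-|commons-(configuration|io)|htrace'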

Section IV: Monitoring Tests

Inside the Flume installation directory, create a directory named job, dedicated to Flume monitoring jobs, for easier management (the commands below refer to it as job/).
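
For example, from the Flume installation directory:

[root@BlogMaster apache-flume-1.8.0-bin]# mkdir job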

Step 1: Monitoring data via the console

Inside the job directory, use the touch command to create a file named netcat-flume-logger.conf (as shown below), then add the following content to it:
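
For example, from the Flume installation directory:

[root@BlogMaster apache-flume-1.8.0-bin]# touch job/netcat-flume-logger.conf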

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = BlogMaster
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Next, carry out the following three steps:
First step: on the master node BlogMaster, run the command that starts the Flume agent:

[root@BlogMaster apache-flume-1.8.0-bin]# bin/flume-ng agent --conf conf/ --name a1 --conf-file job/netcat-flume-logger.conf -Dflume.root.logger=INFO,console

If output like the following appears, Flume has entered the monitoring state:

Info: Sourcing environment configuration script /opt/cluster/apache-flume-1.8.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/opt/cluster/hadoop-2.8.4/bin/hadoop) for HDFS access
Info: Including Hive libraries found via (/opt/cluster/apache-hive-1.2.2-bin) for Hive access
+ exec /opt/cluster/jdk1.8.0_181/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/opt/cluster/apache-flume-1.8.0-bin/conf:/opt/cluster/apache-flume-1.8.0-bin/lib/*:/opt/cluster/hadoop-2.8.4/etc/hadoop:/opt/cluster/hadoop-2.8.4/share/hadoop/common/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/common/*:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/hdfs/*:/opt/cluster/hadoop-2.8.4/share/hadoop/yarn/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/yarn/*:/opt/cluster/hadoop-2.8.4/share/hadoop/mapreduce/lib/*:/opt/cluster/hadoop-2.8.4/share/hadoop/mapreduce/*:/opt/cluster/hadoop-2.8.4/contrib/capacity-scheduler/*.jar:/opt/cluster/apache-hive-1.2.2-bin/lib/*' -Djava.library.path=:/opt/cluster/hadoop-2.8.4/lib/native org.apache.flume.node.Application --name a1 --conf-file job/netcat-flume-logger.conf
SLF4J: Class path contains multiple SLF4J bindings.

Second step: on any node, run the "producer" command that generates data:

[root@BlogSlave2 ~]# telnet BlogMaster 44444

If the following output appears, the producer session has been established:

[root@BlogSlave2 ~]# telnet BlogMaster 44444
Trying 192.168.137.128...
Connected to BlogMaster.
Escape character is '^]'.

Third step: compare the data produced on the producer side with the data echoed on the BlogMaster shell
Enter data on the producer side and check whether the BlogMaster shell reports the same data.
Producer side:

[root@BlogSlave2 ~]# telnet BlogMaster 44444
Trying 192.168.137.128...
Connected to BlogMaster.
Escape character is '^]'.
sda
OK
 I love Xiaoxiong
OK

BlogMaster shell side:

2019-11-15 09:45:35,605 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 73 64 61 0D                                     sda. }
2019-11-15 09:46:21,682 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 20 49 20 6C 6F 76 65 20 58 69 61 6F 78 69 6F 6E  I love Xiaoxion }

The two match, which shows that local monitoring and communication are configured correctly. (Note: the logger sink displays at most the first 16 bytes of an event body, which is why " I love Xiaoxiong" appears truncated to " I love Xiaoxion" in the log.)

Step 2: Monitoring data into HDFS

Before performing the operations below, start the Hadoop cluster and the YARN service, for example as shown next.
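
A sketch using the standard Hadoop sbin scripts, assuming the Hadoop 2.8.4 installation path from earlier in this guide:

[root@BlogMaster ~]# /opt/cluster/hadoop-2.8.4/sbin/start-dfs.sh
[root@BlogMaster ~]# /opt/cluster/hadoop-2.8.4/sbin/start-yarn.sh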

Next, inside the job directory, use the touch command to create a file named flume-file-hdfs.conf (as shown below), then add the following content to it:
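
For example, from the Flume installation directory:

[root@BlogMaster apache-flume-1.8.0-bin]# touch job/flume-file-hdfs.conf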

## define agent
a2.sources = r2
a2.channels = c2
a2.sinks = k2

## define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -F /opt/cluster/apache-hive-1.2.2-bin/logs/hive.log
a2.sources.r2.shell = /bin/bash -c
a2.sources.r2.batchSize=800000

## define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000000
a2.channels.c2.transactionCapacity = 100000

## define sinks
a2.sinks.k2.type = hdfs
## put the collected log files under the corresponding directory
a2.sinks.k2.hdfs.path = hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/%Y%m%d/%S/
## file type
a2.sinks.k2.hdfs.fileType = DataStream
## file write format
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.batchSize = 100000
a2.sinks.k2.hdfs.rollInterval=0
## hdfs roll size
a2.sinks.k2.hdfs.rollSize=102400000
a2.sinks.k2.hdfs.rollCount=10000
## use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp=true 

## log file name prefix
a2.sinks.k2.hdfs.filePrefix = events-
a2.sinks.k2.hdfs.round = true
a2.sinks.k2.hdfs.roundValue = 10
a2.sinks.k2.hdfs.roundUnit = second

## bind the sources and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

After that, carry out the following three steps:
First step: start the Flume agent and begin monitoring

[root@BlogMaster apache-flume-1.8.0-bin]# bin/flume-ng agent --conf conf/ --name a2 --conf-file job/flume-file-hdfs.conf -Dflume.root.logger=INFO,console

Second step: operate Hive to generate log entries

hive (flume_test)> select * from student;
OK
student.id	student.name
1	stu1

Third step: check whether the data Flume collected is stored in the specified HDFS directory
The shell running the Flume job prints log messages like the following:

2019-11-15 10:29:57,724 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/20191115/50//events-.1573784997429.tmp
2019-11-15 10:30:00,276 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-11-15 10:30:00,330 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:251)] Creating hdfs://BlogMaster:9000/flume/hive_hdfs_via_flume/20191115/00//events-.1573785000277.tmp

Open http://192.168.137.128:50070/explorer.html#/ in a browser to view the data stored under the flume folder on HDFS.
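
Alternatively, the same check can be made from the command line with the HDFS client (a sketch; assumes the hdfs command is on BlogMaster's PATH):

[root@BlogMaster ~]# hdfs dfs -ls -R /flume/hive_hdfs_via_flume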
