Using Flume's Spooling Directory Source to Collect Directory Data into HDFS

I. Requirements

Flume monitors a directory on Linux (/home/flume_data) for incoming files and writes them to the corresponding HDFS directory (hdfs://master:9000/flume/spool/%Y%m%d%H%M).

II. Create the Configuration File

1. Create a configuration file named hdfs-logger.conf under conf

# Name the components on this agent
spool-hdfs-agent.sources = spool-source
spool-hdfs-agent.sinks = hdfs-sink
spool-hdfs-agent.channels = memory-channel

# Describe/configure the source
spool-hdfs-agent.sources.spool-source.type = spooldir
spool-hdfs-agent.sources.spool-source.spoolDir = /home/flume_data

# Describe the sink
spool-hdfs-agent.sinks.hdfs-sink.type = hdfs
spool-hdfs-agent.sinks.hdfs-sink.hdfs.path = hdfs://master:9000/flume/spool/%Y%m%d%H%M
spool-hdfs-agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
spool-hdfs-agent.sinks.hdfs-sink.hdfs.fileType = CompressedStream
spool-hdfs-agent.sinks.hdfs-sink.hdfs.writeFormat = Text
spool-hdfs-agent.sinks.hdfs-sink.hdfs.codeC = gzip
spool-hdfs-agent.sinks.hdfs-sink.hdfs.filePrefix = wsk
spool-hdfs-agent.sinks.hdfs-sink.hdfs.rollInterval = 30
spool-hdfs-agent.sinks.hdfs-sink.hdfs.rollSize = 1024
spool-hdfs-agent.sinks.hdfs-sink.hdfs.rollCount = 0

# Use a channel which buffers events in memory
spool-hdfs-agent.channels.memory-channel.type = memory
spool-hdfs-agent.channels.memory-channel.capacity = 1000
spool-hdfs-agent.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
spool-hdfs-agent.sources.spool-source.channels = memory-channel
spool-hdfs-agent.sinks.hdfs-sink.channel = memory-channel

2. Notes

(1) spool-hdfs-agent is the name of the agent; it must match the --name argument in the Flume start command;

(2) /home/flume_data is the directory Flume monitors and collects from;

(3) hdfs://master:9000/flume/spool/%Y%m%d%H%M is the HDFS output path; %Y%m%d%H%M is the time format used for the output folders;
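The %Y%m%d%H%M tokens follow strftime-style escape sequences, so events are bucketed into per-minute folders. A quick illustration of the resulting bucket path (Python is used here only to mimic the formatting; Flume derives the path from each event's timestamp header, or from local time when useLocalTimeStamp = true):

```python
from datetime import datetime

def bucket_path(base, ts):
    """Render the HDFS bucket folder for an event timestamp,
    mirroring Flume's %Y%m%d%H%M escape sequences."""
    return base + "/" + ts.strftime("%Y%m%d%H%M")

# An event at 2020-04-21 10:09:07 lands in the 202004211009 folder,
# which matches the path seen later in the console log.
print(bucket_path("hdfs://master:9000/flume/spool",
                  datetime(2020, 4, 21, 10, 9, 7)))
```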

(4) Flume has three roll strategies:
By time: spool-hdfs-agent.sinks.hdfs-sink.hdfs.rollInterval = 30
By size: spool-hdfs-agent.sinks.hdfs-sink.hdfs.rollSize = 1024
By event count: spool-hdfs-agent.sinks.hdfs-sink.hdfs.rollCount = 0

Rolling means that when any one of the configured roll conditions is met, Flume commits the current file to HDFS (i.e., a finished file appears in HDFS).

rollInterval

Default: 30
How long (in seconds) the HDFS sink waits before rolling the temporary file into the final target file;
Set to 0 to disable time-based rolling;
Note: rolling (roll) means the HDFS sink renames the temporary file to its final target name and opens a new temporary file for writing;

rollSize

Default: 1024
When the temporary file reaches this size (in bytes), it is rolled into the target file;
Set to 0 to disable size-based rolling;

rollCount

Default: 10
When the number of events reaches this count, the temporary file is rolled into the target file;
Set to 0 to disable count-based rolling;

(5) rollSize measures the size before compression, so if the HDFS files are compressed, increase rollSize accordingly;
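To make the interaction of the three thresholds concrete, here is a small sketch (not Flume source code) of the decision the HDFS sink effectively makes; as noted above, a threshold of 0 disables that trigger:

```python
def should_roll(open_seconds, bytes_written, event_count,
                roll_interval=30, roll_size=1024, roll_count=10):
    """Return True if any enabled roll condition is met.
    A threshold of 0 disables that condition, as in Flume."""
    if roll_interval > 0 and open_seconds >= roll_interval:
        return True
    if roll_size > 0 and bytes_written >= roll_size:
        return True
    if roll_count > 0 and event_count >= roll_count:
        return True
    return False

# With rollCount = 0 (as in the config above), only time and size matter:
print(should_roll(10, 500, 99999, roll_count=0))  # no threshold reached yet
print(should_roll(31, 500, 0, roll_count=0))      # rollInterval exceeded
```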

(6) When a file in the directory has been collected into HDFS, it is renamed with a .COMPLETED suffix as a marker;

(7) When using the Spooling Directory Source, modifying a file after it has already been collected causes an error and stops the source; likewise, placing a new file with the same name as an already-collected file into the directory also causes an error and stops the source;
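To reduce the chance of tripping over half-written or leftover files, the source exposes a few knobs; the options below are standard Spooling Directory Source settings, but verify them against your Flume version's user guide:

```
# Suffix appended to fully-ingested files (default .COMPLETED)
spool-hdfs-agent.sources.spool-source.fileSuffix = .COMPLETED
# Skip files matching this regex, e.g. temp files still being written
spool-hdfs-agent.sources.spool-source.ignorePattern = ^.*\.tmp$
# Alternatively, delete files after ingestion instead of renaming them
# spool-hdfs-agent.sources.spool-source.deletePolicy = immediate
```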

(8) HDFS output can be partitioned by time; note that if no data arrives within a given time bucket, that time folder is not created;

(9) The generated file name defaults to prefix + timestamp; this can be changed.
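For example, the prefix and suffixes of the file name can be set on the sink; the prefix wsk is already configured above, and hdfs.inUseSuffix is a standard HDFS sink option (with a compressed stream, the codec extension such as .gz is appended automatically):

```
# Final name: <filePrefix>.<epoch-millis>, plus the codec extension (.gz here)
spool-hdfs-agent.sinks.hdfs-sink.hdfs.filePrefix = wsk
# Files still being written carry this suffix until rolled (default .tmp)
spool-hdfs-agent.sinks.hdfs-sink.hdfs.inUseSuffix = .tmp
```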

III. Start Flume

1. Command

[root@master bin]# ./flume-ng agent --conf conf --conf-file ../conf/hdfs-logger.conf --name spool-hdfs-agent -Dflume.root.logger=INFO,console

2. Send files to the monitored directory

[root@master flumeData]# cp teacher /home/flume_data/
[root@master flumeData]# cp student /home/flume_data/
[root@master flumeData]# cat teacher 
chenlaoshi
malaoshi
haolaoshi
weilaoshi
hualaoshi
[root@master flumeData]# cat student 
zhangsan
lisi
wangwu
xiedajiao
xieguangkun

3. Console log output

20/04/21 10:08:56 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: spool-source started
20/04/21 10:09:07 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
20/04/21 10:09:07 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/flume_data/teacher to /home/flume_data/teacher.COMPLETED
20/04/21 10:09:07 INFO hdfs.HDFSCompressedDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
20/04/21 10:09:07 INFO hdfs.BucketWriter: Creating hdfs://master:9000/flume/spool/202004211009/wsk.1587434947074.gz.tmp
20/04/21 10:09:08 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
20/04/21 10:09:08 INFO compress.CodecPool: Got brand-new compressor [.gz]
20/04/21 10:09:17 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
20/04/21 10:09:17 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/flume_data/student to /home/flume_data/student.COMPLETED

4. The monitored directory

[root@master flume_data]# ls
student.COMPLETED  teacher.COMPLETED

5. Result stored in HDFS

Download, decompress, and open the files to check the contents.

IV. Drawbacks of This Approach

1. Although it can monitor a directory, it cannot monitor data in nested subdirectories recursively;

2. If Flume crashes during collection, there is no guarantee that on restart it will resume from the position in the file where it left off;
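Both limitations are the usual motivation for switching to the Taildir Source (available since Flume 1.7), which tracks per-file read offsets in a JSON position file and resumes after a restart. A minimal sketch of the source side, assuming the same agent and channel as above (path names here are illustrative):

```
# Taildir records offsets in positionFile, so it can resume after a crash
spool-hdfs-agent.sources.taildir-source.type = TAILDIR
spool-hdfs-agent.sources.taildir-source.positionFile = /home/flume/taildir_position.json
# Each filegroup is a pattern of files to tail
spool-hdfs-agent.sources.taildir-source.filegroups = fg1
spool-hdfs-agent.sources.taildir-source.filegroups.fg1 = /home/flume_data/.*
spool-hdfs-agent.sources.taildir-source.channels = memory-channel
```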
