Log Collection Tool: Flume

1. Introduction to Flume

1.1 Overview

  • An open-source log collection tool provided by Cloudera;
  • it can collect data from many kinds of sources:
    • a network socket,
    • a directory,
    • a specific file,
    • a Kafka message queue.
  • It can deliver the collected data to other destinations, such as log files, HDFS storage, or a Kafka message queue.

Key concepts:

Event: a unit of data, made up of a header and a body. (Events can be log records, Avro objects, etc.)
Flow:  an abstraction of an Event's journey from its source to its destination.
Agent: an independent Flume process; one agent is one JVM, containing the Source, Channel, and Sink components.

1.2 The Three Core Components

Source:  connects to the data source and pulls data from it.
Channel: acts as a pipe between the Source and the Sink, buffering data and linking the two.
Sink:    the destination for the data; the Sink decides where the collected data is sent.

A running combination of these three components is one Flume instance, called an agent.

1.2.1 Source

The Source is the data collection end. It captures incoming data, formats it as needed, wraps it into events, and then pushes those events into the Channel. Flume provides many built-in Sources, with support for Avro, log4j, syslog, and HTTP.

Common Sources:

Netcat Source  monitors a network socket (IP and port) and reads whatever data arrives
Spool Source   watches a specified directory; whenever an application adds a new file there, it is ingested
Exec Source    captures the output of a shell command, e.g. tail -F hello.txt
Avro Source    receives the data stream from an upstream Agent

1.2.2 Channel

A buffer for the data the Source collects; the Channel stores data in units of Events.

Common Channels:

Memory Channel: data is held in an in-memory buffer queue

1.2.3 Sink

Transfers event data from the Channel to its destination.

Common Sinks:

Logger Sink : writes data to the console
HDFS Sink   : writes data to HDFS
Avro Sink   : links data between multi-level Agents
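
To make the wiring concrete, here is a minimal agent definition (a sketch added for illustration, not from the original article) that reads lines from a netcat socket and prints them to the console; the agent name a1 and port 44444 are arbitrary choices:

# minimal agent: Netcat Source -> Memory Channel -> Logger Sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# source: listen on a local TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# channel: buffer events in memory
a1.channels.c1.type = memory

# sink: print events to the console/log
a1.sinks.k1.type = logger

# bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1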

Typical parameter settings for each component are shown in the collection examples in Section 3.


2. Flume Installation

Upload the tarball, extract it, and configure JAVA_HOME:

tar -zxvf flume-ng-1.6.0-cdh5.14.0.tar.gz -C /export/servers/
cd  /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
cp  flume-env.sh.template flume-env.sh
vim flume-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
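
To confirm the installation (an optional check, assuming the paths above), print Flume's version:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng version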

3. Flume Collection Examples

3.1 Collecting Data from a Directory into an HDFS Cluster

Spool Directory Source --> Memory Channel --> HDFS Sink

cd  /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
mkdir -p /export/servers/dirfile
vim spooldir.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
## Note: never drop two files with the same name into the monitored directory
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /export/servers/dirfile
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/spooldir/files/%y-%m-%d/%H%M/
#filename prefix for files written to HDFS
a1.sinks.k1.hdfs.filePrefix = events-
#whether to round down the event timestamp (here to the nearest 10 minutes)
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
#interval (seconds) after which the sink rolls the temporary file into the final target file; default 30
a1.sinks.k1.hdfs.rollInterval = 3
#roll into the target file once the temporary file reaches this size (bytes)
a1.sinks.k1.hdfs.rollSize = 20
#default 10; roll the temporary file into the target file once this many events are written
a1.sinks.k1.hdfs.rollCount = 5
a1.sinks.k1.hdfs.batchSize = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true

#type of file to generate; defaults to SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Channel parameter notes:

  • capacity: the maximum number of events the channel can hold

  • transactionCapacity: the maximum number of events taken from the source, or delivered to the sink, in a single transaction (it must not exceed capacity)

  • keep-alive: how long an event add or remove operation may wait on the channel

Start the agent:

bin/flume-ng agent -c ./conf -f ./conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console
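
As a quick smoke test (an illustrative step, not part of the original walkthrough), drop a file into the monitored directory and check that it lands in HDFS; the file name test.log is arbitrary:

echo "hello flume" > /tmp/test.log
mv /tmp/test.log /export/servers/dirfile/
# once ingested, Flume renames the file to test.log.COMPLETED
hdfs dfs -ls /spooldir/files/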

3.2 High-Availability Flume-NG Failover Configuration

Name        HOST    Role
Agent1      node01  Web Server
Collector1  node02  AgentMstr1
Collector2  node03  AgentMstr2

Copy the Flume installation directory, along with the two file-generation directories (shells and taillogs), from node03 to node01.

On node03:

cd /export/servers
scp -r apache-flume-1.6.0-cdh5.14.0-bin/ node01:$PWD
scp -r shells/ taillogs/ node01:$PWD

On node01:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf

vim agent.conf

 

#agent1 name
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
#
##set group
agent1.sinkgroups = g1
#
##set channel
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
#
agent1.sources.r1.channels = c1
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /export/servers/taillogs/access_log
#
agent1.sources.r1.interceptors = i1 i2
agent1.sources.r1.interceptors.i1.type = static
agent1.sources.r1.interceptors.i1.key = Type
agent1.sources.r1.interceptors.i1.value = LOGIN
agent1.sources.r1.interceptors.i2.type = timestamp
#
## set sink1
agent1.sinks.k1.channel = c1
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = node02
agent1.sinks.k1.port = 52020
#
## set sink2
agent1.sinks.k2.channel = c1
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = node03
agent1.sinks.k2.port = 52020
#
##set sink group
agent1.sinkgroups.g1.sinks = k1 k2
#
##set failover
agent1.sinkgroups.g1.processor.type = failover

# the sink with the higher priority value is preferred
agent1.sinkgroups.g1.processor.priority.k1 = 10
agent1.sinkgroups.g1.processor.priority.k2 = 1

# penalty time: if k1 fails, it receives no data for 10 seconds
agent1.sinkgroups.g1.processor.maxpenalty = 10000  

Configure the Flume collector on node02 and node03:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
vim collector.conf

On node02:

#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#
##set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#
## avro source: receive events from the upstream agent
a1.sources.r1.type = avro
a1.sources.r1.bind = node02
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = node02
a1.sources.r1.channels = c1
#
##set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path= hdfs://node01:8020/flume/failover/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d
#

On node03:

#set Agent name
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#
##set channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#
## avro source: receive events from the upstream agent
a1.sources.r1.type = avro
a1.sources.r1.bind = node03
a1.sources.r1.port = 52020
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = Collector
a1.sources.r1.interceptors.i1.value = node03
a1.sources.r1.channels = c1
#
##set sink to hdfs
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path= hdfs://node01:8020/flume/failover/
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.channel=c1
a1.sinks.k1.hdfs.filePrefix=%Y-%m-%d

Startup order

The receiving side starts first: the collectors must already be listening before the upstream agent's Avro sinks try to connect.

node03 --> node02 --> node01

On node03 and node02:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n a1 -c conf -f conf/collector.conf -Dflume.root.logger=DEBUG,console

On node01:

cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin
bin/flume-ng agent -n agent1 -c conf -f conf/agent.conf -Dflume.root.logger=DEBUG,console

Start the file-generation script on node01:

cd  /export/servers/shells
sh tail-file.sh
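
The contents of tail-file.sh are not shown here; a minimal sketch of such a script, which just keeps appending timestamped lines to the tailed log, might look like this:

#!/bin/bash
# hypothetical tail-file.sh: append a line every half second so the
# exec source (tail -F) on node01 always has fresh data to pick up
while true; do
  date >> /export/servers/taillogs/access_log
  sleep 0.5
done

To exercise the failover, stop the node02 collector (Ctrl-C) while the script runs. Because k1 (node02) has the higher priority, events normally flow through node02; after it fails they should keep arriving in HDFS via node03, and traffic should shift back once the node02 collector is restarted.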

4. Extension

Collecting MySQL data with Flume

  • Two jar packages are needed:

    • flume-ng-sql-source-1.3.7.jar

    • the MySQL JDBC driver; both go into Flume's lib directory;

  • create the related directories and files;
  • configure Flume (see the sketch below).
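
A hedged configuration sketch follows. The property names are taken from the keedio flume-ng-sql-source plugin's documentation for the 1.3.x line, and every concrete value (host node01, database testdb, table test_table, credentials, status-file paths) is a placeholder to adapt; the logger sink simply prints the polled rows for inspection:

# hypothetical mysql.conf: poll a MySQL table and log the rows
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.keedio.flume.source.SQLSource
a1.sources.r1.connection.url = jdbc:mysql://node01:3306/testdb
a1.sources.r1.user = root
a1.sources.r1.password = 123456
a1.sources.r1.table = test_table
a1.sources.r1.columns.to.select = *
# incremental column and starting value, so restarts resume where they left off
a1.sources.r1.incremental.column.name = id
a1.sources.r1.incremental.value = 0
a1.sources.r1.run.query.delay = 5000
# file in which the source records its read position
a1.sources.r1.status.file.path = /export/servers/flume/status
a1.sources.r1.status.file.name = sql-source.status

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1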

 

 
