Flume log collection with offset-based resume and deduplication (macOS/Linux installation)

On macOS, install directly with: brew install flume

Official documentation: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html

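A Flume agent is made up of three components: source, channel, and sink. They are well encapsulated, so no programming is needed; you only choose a concrete type for each of the three and set its parameters.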
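To make the source, channel, and sink roles concrete, here is a toy model of the flow. It is only a conceptual sketch and not Flume's API: `MemoryChannel`, `run_pipeline`, and their parameters are hypothetical names, and `capacity` and `batch_size` only loosely mirror the `capacity` and `transactionCapacity` settings of the memory channel.

```python
from collections import deque


class MemoryChannel:
    """Toy bounded buffer standing in for a Flume channel (not Flume's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        # A full channel rejects the event, so the caller has to retry later.
        if len(self.events) >= self.capacity:
            return False
        self.events.append(event)
        return True

    def take_batch(self, max_events):
        # Hand the sink up to max_events events in arrival order.
        batch = []
        while self.events and len(batch) < max_events:
            batch.append(self.events.popleft())
        return batch


def run_pipeline(lines, capacity=1000, batch_size=100):
    """The source puts lines on the channel; the sink drains them in batches."""
    channel = MemoryChannel(capacity)
    delivered = []
    for line in lines:
        if not channel.put(line):
            # Channel full: let the sink drain one batch, then retry the put.
            delivered.extend(channel.take_batch(batch_size))
            channel.put(line)
    # Drain whatever is left once the source has finished.
    while channel.events:
        delivered.extend(channel.take_batch(batch_size))
    return delivered


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"], capacity=2, batch_size=1))
```

The point of the model is that the channel decouples the two ends: the source only needs room in the buffer, and the sink only needs events to take.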
Homebrew usually installs the files under /usr/local/Cellar/flume/1.9.0_1.
Create your own configuration file under libexec/conf/ in that directory,
for example test3.conf.

Run the agent from the bin directory:

flume-ng agent --conf conf --conf-file ../libexec/conf/test3.conf --name a1 -Dflume.root.logger=INFO,console
# Name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the Flume source (the path to watch)

a1.sources.r1.type = TAILDIR
# Location of the position (metadata) file
a1.sources.r1.positionFile = /opt/logs/taildir_position.json
# File group to monitor
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /opt/logs/test3/.*log
a1.sources.r1.skipToEnd = true
a1.sources.r1.headers.f1.headerKey1 = aaa
a1.sources.r1.fileHeader = true


# Configure the Flume sink
a1.sinks.k1.type = http
a1.sinks.k1.endpoint = http://127.0.0.1:5007/test
a1.sinks.k1.connectTimeout = 2000
a1.sinks.k1.requestTimeout = 2000
a1.sinks.k1.acceptHeader = application/json
a1.sinks.k1.contentTypeHeader = application/json
a1.sinks.k1.defaultBackoff = false
a1.sinks.k1.defaultRollback = false
a1.sinks.k1.defaultIncrementMetrics = false
a1.sinks.k1.backoff.4XX = false
a1.sinks.k1.rollback.4XX = false
a1.sinks.k1.incrementMetrics.4XX = true
a1.sinks.k1.backoff.200 = false
a1.sinks.k1.rollback.200 = false
a1.sinks.k1.incrementMetrics.200 = true

# Configure the Flume channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100


# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
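To try the HTTP sink locally, something has to listen on the configured endpoint. The following is a minimal sketch of such a receiver using only the Python standard library. It assumes the sink POSTs each event body to http://127.0.0.1:5007/test with a Content-Length header; `EventHandler`, `make_server`, `RECEIVED`, and the `--serve` flag are hypothetical names chosen for this example.

```python
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

# Event bodies received so far (kept in memory for the demo only).
RECEIVED = []


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body; assumes the client sends Content-Length.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        RECEIVED.append(body)
        # Answer 200 so the sink treats the event as delivered.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"status": "ok"}')

    def log_message(self, fmt, *args):
        # Silence the default per-request logging.
        pass


def make_server(host="127.0.0.1", port=5007):
    """Build a server bound to the endpoint used in the sink configuration."""
    return HTTPServer((host, port), EventHandler)


# Start listening only when asked to, e.g. `python receiver.py --serve`.
if __name__ == "__main__" and "--serve" in sys.argv:
    make_server().serve_forever()
```

This handler accepts a POST on any path, so /test works without extra routing. Returning 200 matters here because the sink configuration above keys its backoff, rollback, and metrics behaviour on the HTTP status code.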

Flume regex filtering: add an interceptor directly to the source

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = .*A.*
# With excludeEvents = false, only events that match the regex (here, bodies containing "A") are kept. With excludeEvents = true, the matching events are dropped instead.

a1.sources.r1.interceptors.i1.excludeEvents = false
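The keep-or-drop logic of the regex filter can be checked offline before touching the agent. The sketch below imitates it in Python under the assumption that the interceptor does a "find" style match anywhere in the event body, which is why `.*A.*` selects bodies that contain "A" rather than bodies that start with it. `regex_filter` and `exclude_events` are hypothetical names, not Flume identifiers.

```python
import re


def regex_filter(bodies, pattern, exclude_events=False):
    """Return the event bodies that would pass the filter."""
    compiled = re.compile(pattern)
    kept = []
    for body in bodies:
        matched = compiled.search(body) is not None
        # exclude_events=False keeps matches; exclude_events=True keeps non-matches.
        if matched != exclude_events:
            kept.append(body)
    return kept


if __name__ == "__main__":
    events = ["Apple", "banana", "xAy"]
    print(regex_filter(events, r".*A.*"))        # keeps "Apple" and "xAy"
    print(regex_filter(events, r".*A.*", True))  # keeps "banana"
```

To keep only events that begin with "A", a pattern anchored at the start such as `^A` would be the natural choice.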

To deliver to Kafka, change the sink to:

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = ***
a1.sinks.k1.kafka.bootstrap.servers = 127.0.0.1:9092

Log collection uses the TAILDIR source because it records the read offset of every file it tails.
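The recorded offsets live in the file named by positionFile, so they can be inspected when debugging duplicate or missing reads. The sketch below assumes the file is a JSON array of objects with `inode`, `pos`, and `file` keys; `read_positions` is a hypothetical helper, and the sample values are made up for illustration.

```python
import json
import os
import tempfile


def read_positions(path):
    """Return {log file path: byte offset} from a TAILDIR position file."""
    with open(path, encoding="utf-8") as fh:
        entries = json.load(fh)
    return {entry["file"]: entry["pos"] for entry in entries}


if __name__ == "__main__":
    # Made-up sample standing in for /opt/logs/taildir_position.json.
    sample = [{"inode": 12345, "pos": 42, "file": "/opt/logs/test3/app.log"}]
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump(sample, tmp)
    print(read_positions(tmp.name))
    os.remove(tmp.name)
```

Against a real agent, point `read_positions` at the configured position file instead of the temporary sample.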

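  • Note that if the log file is opened and saved in an external editor, the whole file is read again from the start each time, producing duplicates. This is probably because TAILDIR tracks files by inode and an editor that saves by replacing the file gives it a new inode.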

  • Data has to be appended as it arrives, for example as shown here (do not open the file in vim to edit it):
    sh-3.2# echo '"a"' >> test2.log
    sh-3.2# echo "{'a':1}" >> test2.log

  • The HTTP sink also needs backoff and rollback (retry) switched off:
    a1.sinks.k1.defaultBackoff = false
    a1.sinks.k1.defaultRollback = false

  • Set read and write permissions on the files beforehand (use sudo if necessary).
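One likely explanation for the full re-read after editing a log in an editor is that TAILDIR identifies files by inode, and many editors save by writing a new file and renaming it over the old one. The sketch below contrasts the two write styles; `inode_after_append` and `inode_after_rewrite` are hypothetical helpers, and the rename-on-save behaviour is an assumption about the editor, not something taken from Flume.

```python
import os
import tempfile


def inode_after_append(path, text):
    """Append text in place (like `echo ... >> file`) and return the inode."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(text)
    return os.stat(path).st_ino


def inode_after_rewrite(path, text):
    """Write a new file and rename it over the old one, then return the inode."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp_path, path)
    return os.stat(path).st_ino


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        log = os.path.join(workdir, "test2.log")
        with open(log, "w", encoding="utf-8") as fh:
            fh.write("first line\n")
        original = os.stat(log).st_ino
        print("append keeps inode:", inode_after_append(log, "second line\n") == original)
        print("rewrite keeps inode:", inode_after_rewrite(log, "edited\n") == original)
```

On a typical Linux or macOS filesystem the append should keep the inode and the rewrite should change it, which is consistent with appended lines being picked up incrementally while an edited file is treated as new.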

Linux installation

Reference: https://www.jianshu.com/p/3e4f7db8080f

1. Download the binary release directly (here from an Apache mirror)
wget https://mirror.bit.edu.cn/apache/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
2. Create a directory and extract the archive into it
mkdir /opt/flume
tar -zxvf apache-flume-1.9.0-bin.tar.gz -C /opt/flume
3. Go into the conf directory and edit the environment file
cp flume-env.sh.template flume-env.sh
vim flume-env.sh
export JAVA_HOME=/apps/jdk1.8.0_60
export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
4. Create a configuration file for testing
cp flume-conf.properties.template flume-conf.properties
# Define an agent named a1 whose source, channel, and sink are r1, c1, and k1
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# r1 listens for network input on port 8888
a1.sources.r1.type = netcat
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 8888
# k1 writes events to the log (logger sink)
a1.sinks.k1.type = logger
# c1 is an in-memory channel
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

5. To start the agent, go to the bin directory and run
./flume-ng agent -c ../conf -f ../conf/flume-conf.properties -n a1 -Dflume.root.logger=INFO,console

6. Then open another terminal window, or connect from another machine, to test
nc localhost 8888
(If nc is not installed, install it with yum install nc.x86_64)
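The same test can be scripted instead of typed into nc. This is a minimal sketch of a client for the netcat source configured in step 4. It assumes the source answers each received line with a one-line acknowledgement; `send_line` is a hypothetical helper name.

```python
import socket


def send_line(host, port, text, timeout=5.0):
    """Send one newline-terminated line and return the first reply line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        # Read the acknowledgement line sent back by the server.
        with conn.makefile("r", encoding="utf-8") as reader:
            reply = reader.readline()
    return reply.strip()
```

With the agent from step 5 running, `send_line("localhost", 8888, "hello")` should return the acknowledgement text, and the logger sink should print the event on the agent's console.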