Flume 1.9 installation and testing

1. Download

2. Deployment

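Create the deployment directory and upload the installation package
- mkdir sys && rz -be
Extract the installation package
- tar -zxvf apache-flume-1.9.0-bin.tar.gz
Set the environment variables
- vi ~/.bash_profile (define FLUME_HOME, which the next step relies on)
- source ~/.bash_profile
Edit the configuration files
- cd $FLUME_HOME/conf
- mv flume-env.sh.template flume-env.sh
- vi flume-env.sh and set JAVA_HOME; Java 8 or later must be installed. Adjust JAVA_OPTS as well if needed.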

3. Configuration

1. Monitor a single log file in real time and write to HDFS

mv flume-conf.properties.template flume-conf.properties
Edit flume-conf.properties and configure the source

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
## configure the source
# exec runs a command and turns each line of its output into an event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/bigdata/logs/test.log
a1.sources.r1.channels = c1

Edit flume-conf.properties and configure the sink

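# Configure the sink
# Sink type
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Target directory; Flume replaces the escape sequences (%y-%m-%d) with the date
a1.sinks.k1.hdfs.path = hdfs://bi-name1/data/logs/trace_logs/111/%y-%m-%d/
# File name prefix
a1.sinks.k1.hdfs.filePrefix = warn

# Round the timestamp down to a multiple of 10 minutes before building the path.
# A new directory every 10 minutes only results if the path contains hour/minute
# escapes; with the %y-%m-%d path above, a new directory is created once per day.
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

# Time: number of seconds after which a new file is rolled; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval = 0

# Size: file size in bytes that triggers a roll; here a new file is rolled at 1 KB
a1.sinks.k1.hdfs.rollSize = 1024

# Events: number of events written to a file before a new file is rolled
a1.sinks.k1.hdfs.rollCount = 20

# Number of events written to the file before it is flushed to HDFS
a1.sinks.k1.hdfs.batchSize = 5

# Use the local time, not the timestamp in the event header, when expanding the path
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Output file type; the default is SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream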
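
To make the interaction between hdfs.path and the round settings easier to see, here is a minimal Python sketch. It assumes that the escapes used here (%y, %m, %d, %H, %M) behave like their strftime counterparts, which is not true of every Flume escape. The helper names, the sample timestamp and the %H%M path variant are made up for illustration and are not part of the configuration above.

```python
from datetime import datetime


def round_down(ts, round_value, round_unit):
    """Round a datetime down to a multiple of round_value in the given unit."""
    if round_unit == "second":
        return ts.replace(second=ts.second - ts.second % round_value, microsecond=0)
    if round_unit == "minute":
        return ts.replace(minute=ts.minute - ts.minute % round_value,
                          second=0, microsecond=0)
    if round_unit == "hour":
        return ts.replace(hour=ts.hour - ts.hour % round_value,
                          minute=0, second=0, microsecond=0)
    raise ValueError("unsupported unit: " + round_unit)


def bucket_dir(path_template, ts, round_value=10, round_unit="minute"):
    """Expand the date escapes in the path after rounding the timestamp down."""
    return round_down(ts, round_value, round_unit).strftime(path_template)


event_time = datetime(2020, 5, 17, 14, 37, 42)  # hypothetical event time
daily = "hdfs://bi-name1/data/logs/trace_logs/111/%y-%m-%d/"
by_minute = daily + "%H%M/"  # hypothetical variant, not in the original config

print(bucket_dir(daily, event_time))      # directory changes once per day
print(bucket_dir(by_minute, event_time))  # directory changes every 10 minutes
```

With the daily path the rounding is invisible, because the rounded and unrounded timestamps fall on the same date; the variant with %H%M is where roundValue = 10 becomes visible.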

Edit flume-conf.properties and configure the channel

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
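
A common mistake in this kind of file is a source or sink bound to a channel that was never declared. Note that a source uses the plural key channels, while a sink uses the singular key channel. The following is a small Python sketch of a sanity check for the bindings. It is not part of Flume; parse_properties and check_bindings are hypothetical helpers, and the parser only handles simple key = value lines.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_bindings(props, agent="a1"):
    """Return a list of problems found in the source/sink channel bindings."""
    channels = set(props.get(agent + ".channels", "").split())
    problems = []
    for source in props.get(agent + ".sources", "").split():
        bound = props.get("%s.sources.%s.channels" % (agent, source), "").split()
        if not bound:
            problems.append("source %s has no channel" % source)
        problems += ["source %s uses undeclared channel %s" % (source, c)
                     for c in bound if c not in channels]
    for sink in props.get(agent + ".sinks", "").split():
        bound = props.get("%s.sinks.%s.channel" % (agent, sink))
        if bound is None:
            problems.append("sink %s has no channel" % sink)
        elif bound not in channels:
            problems.append("sink %s uses undeclared channel %s" % (sink, bound))
    return problems


config = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.channels.c1.type = memory
"""

print(check_bindings(parse_properties(config)))
```

An empty list means every declared source and sink points at a declared channel.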

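2. Monitor log files in a directory in real time and write to HDFS

Configure the source

 # The source is named s1 here, so declare it and bind it to the channel
 a1.sources = s1
 a1.sources.s1.channels = c1
 # Source type
 a1.sources.s1.type = TAILDIR
 # Position file that records how far each monitored file has been read
 a1.sources.s1.positionFile = /home/bigdata/file/flume/taildir_position.json
 # File groups to monitor: an absolute path whose file name part is a regular expression
 a1.sources.s1.filegroups = f1
 a1.sources.s1.filegroups.f1=/home/bigdata/logs/.*log
 # Add a header holding the absolute path of the source file to each event
 a1.sources.s1.fileHeader = true

The sink and channel are configured as above, with the channel bound to s1 instead of r1.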
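
The filegroup value is an absolute path in which only the file name part is treated as a regular expression, not as a shell glob. The Python sketch below illustrates which names in /home/bigdata/logs the pattern .*log would select, assuming the whole file name has to match. The helper names and the candidate file names are hypothetical.

```python
import posixpath
import re


def split_filegroup(pattern):
    """Split a filegroup value into (directory, compiled file name regex)."""
    return posixpath.dirname(pattern), re.compile(posixpath.basename(pattern))


def matching_files(pattern, names):
    """Return the names whose entire file name matches the regex part."""
    _, regex = split_filegroup(pattern)
    return [name for name in names if regex.fullmatch(name)]


filegroup = "/home/bigdata/logs/.*log"
directory, _ = split_filegroup(filegroup)

# Hypothetical directory listing
candidates = ["test.log", "app.log", "catalog", "test.log.1", "notes.txt"]

print(directory)
print(matching_files(filegroup, candidates))
```

Because .*log is a regex, it also matches a name such as catalog, and it does not match rotated files such as test.log.1. A stricter pattern such as .*\.log avoids the first surprise.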

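3. Monitor log file changes in real time and write to Kafka (this example tails a single file with the exec source); the configuration is as follows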

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

## configure the source
# exec runs a command and turns each line of its output into an event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/bigdata/logs/test.log
a1.sources.r1.channels = c1

## configure the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.channel=c1
a1.sinks.k1.kafka.bootstrap.servers = broker1:9092,broker2:9092,broker3:9092
a1.sinks.k1.kafka.topic = test1
# Number of partition replicas that must acknowledge a message before the producer treats it as sent
## 0: never wait, 1: wait for leader only, -1: wait for all replicas; default: 1
a1.sinks.k1.kafka.producer.acks=0

## configure the channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data/flume/checkpoint
a1.channels.c1.dataDirs = /data/flume/data

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4. Run and test

Start Flume:
$ bin/flume-ng agent -n a1 -c conf -f conf/flume-conf.properties -Dflume.root.logger=INFO,console

Append to the log file with Python 2.7

# -*- coding: utf-8 -*-
import io
import time

FILE_PATH = "/home/bigdata/logs/test.log"


def write_log():
    # Binary append mode; the file is created if it does not exist
    with io.open(FILE_PATH, "ab") as ff:
        while True:
            time.sleep(1)
            record = '{"name":"zhangsan","age":20}'
            ff.write((record + "\n").encode("utf-8"))
            # Flush so that tail -F sees each line right away
            ff.flush()


if __name__ == "__main__":
    print(str(time.time()))
    write_log()

Check whether Kafka has received the messages

$ kafka-console-consumer --bootstrap-server slave199:9092 --topic test1

5. Bug fixes

6. Additional notes

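The mode parameter of io.open() in Python 2.7

About the mode parameter of open():
'r': read; 'w': write; 'a': append
'r+' == r+w (read and write; raises IOError if the file does not exist)
'w+' == w+r (read and write; creates the file if it does not exist, truncates it otherwise)
'a+' == a+r (append and read; creates the file if it does not exist)
For binary files, add a b to any of them:
'rb'  'wb'  'ab'  'rb+'  'wb+'  'ab+'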
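
The difference between these modes is easiest to see by trying them on a scratch file. The following sketch uses a temporary directory (the file name demo.log is arbitrary) and should behave the same on Python 2.7 and Python 3.

```python
import io
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.log")

# 'rb+' requires the file to exist already
try:
    io.open(path, "rb+").close()
    opened_missing_file = True
except IOError:
    opened_missing_file = False

# 'ab+' creates the file if needed and always writes at the end
with io.open(path, "ab+") as f:
    f.write(b"first\n")
with io.open(path, "ab+") as f:
    f.write(b"second\n")
    f.seek(0)
    after_append = f.read()

# 'wb+' truncates the existing content before writing
with io.open(path, "wb+") as f:
    f.write(b"third\n")
    f.seek(0)
    after_truncate = f.read()

print(opened_missing_file)
print(after_append)
print(after_truncate)
```

This is why the log writer uses an append mode: restarting the script adds to test.log instead of wiping it.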

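The Flume start command explained

flume-ng agent --conf conf --conf-file conf/file.log --name agent1 -Dflume.root.logger=DEBUG,console
    -c (--conf): directory that holds the Flume configuration files
    -f (--conf-file): the agent configuration file you wrote
    -n (--name): name of the agent as defined in that configuration file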
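
When Flume is started from a script, building the argument list in code avoids quoting slips such as a stray space inside the -D option. Below is a minimal Python sketch; flume_command is a hypothetical helper, and it only builds the argument list without starting anything.

```python
def flume_command(agent_name, conf_dir, conf_file, log_level="INFO"):
    """Build the flume-ng argument list for one agent."""
    return [
        "flume-ng", "agent",
        "--name", agent_name,       # must match the agent prefix in the config (a1)
        "--conf", conf_dir,         # directory with flume-env.sh and logging config
        "--conf-file", conf_file,   # the agent configuration file
        # One single argument: no space is allowed after the comma
        "-Dflume.root.logger=%s,console" % log_level,
    ]


cmd = flume_command("a1", "conf", "conf/flume-conf.properties")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.call(cmd) from the Flume installation directory, provided flume-ng is on the PATH.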

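Flume HDFS sink properties:

  Property          Default     Description
  type              (none)      Component type name, must be hdfs
  hdfs.path         (none)      Required. HDFS directory path (e.g. hdfs://namenode/flume/webdata/)
  hdfs.filePrefix   FlumeData   Prefix for the names of the files Flume creates in the directory
  hdfs.fileSuffix   (none)      Suffix appended to the file name (e.g. .avro; note: no date or time is added automatically)
  hdfs.inUsePrefix  (none)      Prefix added to a file while Flume is still writing to it
  hdfs.inUseSuffix  .tmp        Suffix added to a file while Flume is still writing to it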
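
To see how these properties combine, the sketch below composes the name of a file while it is being written and its name after it is closed. It reflects my understanding of the naming scheme (prefix, a dot, a numeric counter generated by Flume, then the suffix) and is not taken from the Flume source; the function name, the counter value and the dated directory are made up for illustration.

```python
def hdfs_file_names(path, counter, file_prefix="FlumeData", file_suffix="",
                    in_use_prefix="", in_use_suffix=".tmp"):
    """Return (name while in use, name after close) for one HDFS sink file."""
    base = path.rstrip("/")
    full_name = "%s.%d%s" % (file_prefix, counter, file_suffix)
    in_use = base + "/" + in_use_prefix + full_name + in_use_suffix
    closed = base + "/" + full_name
    return in_use, closed


# Hypothetical values: the directory is an expanded hdfs.path, and the
# counter stands in for the number Flume generates itself.
in_use, closed = hdfs_file_names(
    "hdfs://bi-name1/data/logs/trace_logs/111/20-05-17/",
    counter=1589700000000,
    file_prefix="warn",
)

print(in_use)
print(closed)
```

A file that still carries the .tmp suffix is therefore one that Flume has not yet rolled, which is worth remembering when downstream jobs read the directory.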

Flume User Guide: http://flume.apache.org/FlumeUserGuide.html
