Installing and Configuring Flume on CentOS 7

Original post

2020-06-24 16:26

Introduction

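In the big data era, Internet applications of every kind produce large volumes of data and logs. These logs need to be collected for unified analysis, which calls for a bridge between application systems and data analysis systems that handles general-purpose log collection. Apache Flume is a distributed, highly reliable, highly available system for collecting and aggregating logs, moving massive amounts of log data from many different sources into a centralized data store. Flume was originally developed by Cloudera and later became a top-level Apache Software Foundation project. It can handle log data, network traffic data, social network data, email data, and other sources. Like Facebook's Scribe, Yahoo's Chukwa, and LinkedIn's Kafka, it is a capable and easy-to-use logging system.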

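A Flume agent moves an external event stream (data stream) to a specified next hop. An agent consists of a source (where data enters), a channel (the buffer that carries events), and a sink (the receiving end that delivers them). Agents can be chained over multiple hops to build complex data flows. Flume supports many source types, including Avro, Thrift, Kafka, NetCat, Syslog, file, and custom sources, so it integrates flexibly with application systems at little development cost. Flume also works with common big data tools through sinks such as HDFS, Hive, HBase, and Kafka, delivering data to these systems for further analysis.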
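To make multi-hop chaining concrete, the sketch below writes configuration files for two chained agents, using the same Flume 1.6.0 property style as the configuration later in this article. A first agent receives lines on a NetCat source and forwards them through an Avro sink to a second agent, whose Avro source hands them to a file_roll sink. The agent names `edge` and `collector`, the function `write_two_hop_confs`, and port 4545 are hypothetical choices for illustration only.

```shell
# Hypothetical helper: write two chained agent configs into DIR.
# Usage: write_two_hop_confs DIR [AVRO_PORT]
write_two_hop_confs() {
  local dir=$1 port=${2:-4545}
  mkdir -p "$dir"
  # Hop 1 ("edge"): netcat source -> memory channel -> Avro sink to the next hop
  cat > "$dir/edge.properties" <<EOF
edge.sources = r1
edge.channels = c1
edge.sinks = k1
edge.sources.r1.type = netcat
edge.sources.r1.bind = localhost
edge.sources.r1.port = 8888
edge.sources.r1.channels = c1
edge.channels.c1.type = memory
edge.sinks.k1.type = avro
edge.sinks.k1.hostname = localhost
edge.sinks.k1.port = $port
edge.sinks.k1.channel = c1
EOF
  # Hop 2 ("collector"): Avro source on the same port -> memory channel -> file_roll sink
  cat > "$dir/collector.properties" <<EOF
collector.sources = r1
collector.channels = c1
collector.sinks = s1
collector.sources.r1.type = avro
collector.sources.r1.bind = localhost
collector.sources.r1.port = $port
collector.sources.r1.channels = c1
collector.channels.c1.type = memory
collector.sinks.s1.type = file_roll
collector.sinks.s1.sink.directory = /tmp/log/flume
collector.sinks.s1.channel = c1
EOF
}
```

Start the collector first so that its Avro source is already listening, for example `bin/flume-ng agent --conf conf -f /tmp/two-hop/collector.properties -n collector`. Then start the edge agent the same way with `-n edge`.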

This tutorial covers installing and configuring Flume 1.6.0 on a CentOS 7 host on Meituan Cloud, then verifies that it works.

Installation

Install the JDK

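Flume requires a Java runtime environment of version 1.6 or later. Download the JDK package from the Oracle website and extract it under /opt, since JAVA_HOME below points to /opt/java:

tar zxvf jdk-8u65-linux-x64.tar.gz -C /opt
mv /opt/jdk1.8.0_65 /opt/java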

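Set the Java environment variables, for example in ~/.bashrc or /etc/profile so that they persist across logins: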

JAVA_HOME=/opt/java
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME PATH
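Before installing Flume, you may want to confirm that the `java` on the PATH meets the 1.6 minimum. The sketch below is a hypothetical helper and is not part of Flume. `java_major_version` and `check_java` are illustrative names. The parsing assumes the usual first line of `java -version` output, such as `java version "1.8.0_65"`. It handles both the old `1.x` numbering and the newer `11.0.2` style.

```shell
# Extract the major version from a Java version string:
# "1.8.0_65" -> 8 (old "1.x" scheme), "11.0.2" -> 11 (JDK 9+ scheme).
java_major_version() {
  local v=$1
  case $v in
    1.*) v=${v#1.} ;;   # drop the leading "1." of the old scheme
  esac
  # Keep everything before the first ".", "_" or "-"
  echo "${v%%[._-]*}"
}

# Succeed if the java on PATH has at least major version MIN (default 6, i.e. Java 1.6).
check_java() {
  local min=${1:-6} line ver major
  command -v java >/dev/null || { echo "java not found on PATH" >&2; return 1; }
  # `java -version` prints to stderr, e.g.: java version "1.8.0_65"
  line=$(java -version 2>&1 | head -n 1)
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  major=$(java_major_version "$ver")
  [ -n "$major" ] && [ "$major" -ge "$min" ]
}
```

After exporting the variables above, `check_java && echo ok` should print `ok` for the JDK 8 installed here.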

Install Flume

Download the Flume binary package from the official website and extract it:

tar zxvf apache-flume-1.6.0-bin.tar.gz
mv apache-flume-1.6.0-bin flume
cd flume
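The two steps above, extracting and then renaming, can also be done in one step with GNU tar's `--strip-components=1`, which drops the archive's top-level `apache-flume-1.6.0-bin/` directory. Below is a minimal sketch. `extract_into` is a hypothetical helper that also refuses to overwrite an existing directory.

```shell
# Hypothetical helper: extract a release tarball directly into DEST,
# instead of extracting and then renaming with mv.
extract_into() {
  local tarball=$1 dest=$2
  # Refuse to overwrite an existing installation
  if [ -e "$dest" ]; then
    echo "$dest already exists" >&2
    return 1
  fi
  mkdir -p "$dest"
  # --strip-components=1 (GNU tar) removes the first path component,
  # i.e. the archive's top-level directory, from every extracted file
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

For example, run `extract_into apache-flume-1.6.0-bin.tar.gz flume && cd flume`. The same helper should work for the JDK tarball, since it also has a single top-level directory.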

Configuration

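Here the source uses the netcat type and the sink uses the file_roll type: the agent reads data from a listening port and saves it to local files. Copy the configuration template: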

cp conf/flume-conf.properties.template conf/flume-conf.properties

Edit the configuration as follows:

# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'

agent.sources = r1
agent.channels = c1
agent.sinks = s1

# For each one of the sources, the type is defined
agent.sources.r1.type = netcat
agent.sources.r1.bind = localhost
agent.sources.r1.port = 8888

# Bind the source to the channel it writes to.
agent.sources.r1.channels = c1

# Each sink's type must be defined
agent.sinks.s1.type = file_roll
agent.sinks.s1.sink.directory = /tmp/log/flume

#Specify the channel the sink should use
agent.sinks.s1.channel = c1

# Each channel's type is defined.
agent.channels.c1.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.c1.capacity = 100
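In the configuration above, the source names its channel(s) with `channels` (plural), while the sink names exactly one channel with `channel` (singular). A typo in either is easy to miss. The sketch below is a hypothetical lint helper, `check_flume_bindings`, that is not part of Flume. It uses awk to read a properties file like the one above and reports references to channels that are not listed in `<agent>.channels`. It assumes simple `key = value` lines and a plain agent name without regex metacharacters.

```shell
# Hypothetical helper: check source/sink channel bindings in a Flume config.
# Usage: check_flume_bindings CONF_FILE AGENT_NAME  (exit status 1 on problems)
check_flume_bindings() {
  local conf=$1 agent=$2
  awk -v a="$agent" '
    # Skip comment lines and lines without an equals sign
    /^[[:space:]]*#/ || !/=/ { next }
    {
      # Split the line into a trimmed key and a trimmed value
      key = $0; sub(/[[:space:]]*=.*$/, "", key); sub(/^[[:space:]]+/, "", key)
      val = $0; sub(/^[^=]*=[[:space:]]*/, "", val); sub(/[[:space:]]+$/, "", val)
      if (key == a ".channels") {
        # Remember every declared channel name
        n = split(val, cs, /[[:space:]]+/)
        for (i = 1; i <= n; i++) declared[cs[i]] = 1
      } else if (key ~ ("^" a "\\.sources\\.[^.]+\\.channels$") ||
                 key ~ ("^" a "\\.sinks\\.[^.]+\\.channel$")) {
        refs[key] = val
      } else if (key ~ ("^" a "\\.sinks\\.[^.]+\\.channels$")) {
        # Sinks name their single channel with the singular key
        print key ": sinks use \".channel\" (singular)"
        bad = 1
      }
    }
    END {
      # Every referenced channel must have been declared
      for (k in refs) {
        n = split(refs[k], cs, /[[:space:]]+/)
        for (i = 1; i <= n; i++)
          if (!(cs[i] in declared)) { print k ": undeclared channel " cs[i]; bad = 1 }
      }
      exit bad
    }' "$conf"
}
```

Run it as `check_flume_bindings conf/flume-conf.properties agent`. For the configuration above, it prints nothing and exits with status 0.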

Verification

1. Create the output directory

mkdir -p /tmp/log/flume

2. Start the agent

bin/flume-ng agent --conf conf -f conf/flume-conf.properties -n agent &

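The runtime log is written to the logs directory. Alternatively, add the -Dflume.root.logger=INFO,console option and start the agent in the foreground to print the log to the console. This shows what the agent is doing in detail and helps track down the cause when it fails.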
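Because the trailing `&` returns immediately, a script that sends data right after starting the agent may try to connect before the NetCat source is listening. Below is a minimal sketch that assumes bash, for its `/dev/tcp` pseudo-device. `wait_for_port` is a hypothetical helper that retries a TCP connection once per second until it succeeds or the timeout runs out.

```shell
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} i
  for ((i = 0; i < timeout; i++)); do
    # Opening /dev/tcp/HOST/PORT in a subshell makes bash attempt a TCP
    # connect; success means something is listening. The subshell exit closes it.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example: `bin/flume-ng agent --conf conf -f conf/flume-conf.properties -n agent & wait_for_port localhost 8888 60 && echo ready`.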

3. Send data

telnet localhost 8888
Then type:
hello world!
hello Flume!
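telnet may not be installed on a minimal CentOS 7 host. As an alternative, the sketch below sends lines using only bash's `/dev/tcp`. By default the NetCat source answers each received line with `OK`, because its `ack-every-event` setting defaults to true. The hypothetical helper `send_lines` prints how many lines were acknowledged and succeeds only if all of them were.

```shell
# Hypothetical helper: send each argument as one line to a Flume netcat
# source and count the "OK" acknowledgements it sends back.
# The body runs in a subshell so the fd and variables do not leak.
send_lines() (
  host=$1 port=$2
  shift 2
  # Open a bidirectional TCP connection on file descriptor 3
  exec 3<>"/dev/tcp/$host/$port" || exit 1
  acks=0
  for line in "$@"; do
    printf '%s\n' "$line" >&3
    # Wait up to 5 seconds for a reply; tolerate a trailing CR
    if IFS= read -r -t 5 reply <&3 && [ "${reply%$'\r'}" = "OK" ]; then
      acks=$((acks + 1))
    fi
  done
  exec 3>&-
  echo "$acks"
  # Succeed only if every line was acknowledged
  [ "$acks" -eq $# ]
)
```

For example, `send_lines localhost 8888 'hello world!' 'hello Flume!'` should print `2` while the agent configured above is running.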

4. Check the data files in the /tmp/log/flume directory:

cat /tmp/log/flume/1447671188760-2
hello world!
hello Flume!
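The file_roll sink starts a new file periodically: every 30 seconds by default, controlled by `sink.rollInterval`. The directory therefore fills with files named like the one above, and the newest may still be empty. Below is a hypothetical helper, `show_latest_roll`, that prints the most recently modified non-empty file. It assumes GNU find, which CentOS 7 ships, for `-printf`.

```shell
# Hypothetical helper: print the newest non-empty file in a file_roll directory.
# Usage: show_latest_roll [DIR]   (defaults to /tmp/log/flume)
show_latest_roll() {
  local dir=${1:-/tmp/log/flume} newest
  # List non-empty regular files as "<mtime in epoch seconds><TAB><path>",
  # sort numerically by mtime and keep the path of the last (newest) one
  newest=$(find "$dir" -maxdepth 1 -type f -size +0 -printf '%T@\t%p\n' \
    | sort -n | tail -n 1 | cut -f 2-)
  if [ -z "$newest" ]; then
    echo "no non-empty files in $dir" >&2
    return 1
  fi
  cat -- "$newest"
}
```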

Integration with Kafka

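Flume integrates easily with Kafka: Flume focuses on data collection, while Kafka focuses on data distribution. Flume can use Kafka as a source or as a sink. Here is an example of configuring a Kafka sink: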

agent.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.s1.topic = mytopic
agent.sinks.s1.brokerList = localhost:9092
agent.sinks.s1.requiredAcks = 1
agent.sinks.s1.batchSize = 20
agent.sinks.s1.channel = c1

Data collected by Flume is then distributed through Kafka to other big data platforms for further processing.
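Switching the test agent from file_roll to Kafka only touches the `agent.sinks.s1.*` lines; the NetCat source and the memory channel stay as they are. The hypothetical helper `to_kafka_sink` below prints a copy of a configuration with the old s1 sink replaced by the Kafka sink shown above. It uses the Flume 1.6 property names. Later Flume releases (1.7 onward) renamed several of them, for example to `kafka.topic` and `kafka.bootstrap.servers`, so check the user guide for the version you run.

```shell
# Hypothetical helper: print CONF with sink s1 replaced by a Kafka sink.
# Usage: to_kafka_sink CONF [TOPIC] [BROKERS] > new.properties
to_kafka_sink() {
  local conf=$1 topic=${2:-mytopic} brokers=${3:-localhost:9092}
  # Keep every line except the old s1 sink properties (type, sink.directory, channel, ...)
  grep -v '^agent\.sinks\.s1\.' "$conf"
  # Append the Kafka sink definition (Flume 1.6 names), still bound to channel c1
  cat <<EOF
agent.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.s1.topic = $topic
agent.sinks.s1.brokerList = $brokers
agent.sinks.s1.requiredAcks = 1
agent.sinks.s1.batchSize = 20
agent.sinks.s1.channel = c1
EOF
}
```

For example, run `to_kafka_sink conf/flume-conf.properties > conf/flume-kafka.properties`, then start the agent with `-f conf/flume-kafka.properties`.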

Summary

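This article covered installing Flume and a simple use case, and verified Flume's functionality with a test configuration. In practice, configure Flume for your own scenario by choosing the appropriate sources and sinks, and it can conveniently collect data from all kinds of applications.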

