1. Overview
Official documentation:
http://flume.apache.org/FlumeUserGuide.html#flume-sources
2. Flume Sources
2.1 Avro Source
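2.1.1 Introduction
The Avro Source listens on an Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink on another Flume agent, it can create tiered collection topologies. The official wording is a little roundabout; put simply, Flume supports multi-tier agents, and the agents talk to each other over Avro. ==Properties shown in bold in the official guide must be set== (for this source: channels, type, bind, and port).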
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be avro |
bind | – | hostname or IP address to listen on |
port | – | Port # to bind to |
threads | – | Maximum number of worker threads to spawn |
selector.type | ||
selector.* | ||
interceptors | – | Space-separated list of interceptors |
interceptors.* | – | |
compression-type | none | This can be “none” or “deflate”. The compression-type must match the compression-type of matching AvroSource |
ssl | false | Set this to true to enable SSL encryption. You must also specify a “keystore” and a “keystore-password”. |
keystore | – | This is the path to a Java keystore file. Required for SSL. |
keystore-password | – | The password for the Java keystore. Required for SSL. |
keystore-type | JKS | The type of the Java keystore. This can be “JKS” or “PKCS12”. |
exclude-protocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified. |
ipFilter | false | Set this to true to enable ipFiltering for netty |
ipFilterRules | – | Define N netty ipFilter pattern rules with this config. |
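To make the tiered topology concrete, here is a minimal sketch of two agents linked over Avro. The agent names `web` and `collector`, the host name `collector-host`, and port 4141 are assumptions chosen for illustration; only the `avro` sink and source types and their `hostname`, `bind`, and `port` properties come from Flume itself.

```properties
# --- Upstream agent (hypothetical name: web) ---
web.sources = r1
web.channels = c1
web.sinks = k1
# Any source works here; netcat keeps the sketch small
web.sources.r1.type = netcat
web.sources.r1.bind = localhost
web.sources.r1.port = 44444
web.sources.r1.channels = c1
web.channels.c1.type = memory
# The Avro sink forwards events to the next hop
web.sinks.k1.type = avro
web.sinks.k1.hostname = collector-host
web.sinks.k1.port = 4141
web.sinks.k1.channel = c1

# --- Downstream agent (hypothetical name: collector) ---
collector.sources = r1
collector.channels = c1
collector.sinks = k1
# The Avro source receives what the upstream Avro sink sends
collector.sources.r1.type = avro
collector.sources.r1.bind = 0.0.0.0
collector.sources.r1.port = 4141
collector.sources.r1.channels = c1
collector.channels.c1.type = memory
collector.sinks.k1.type = logger
collector.sinks.k1.channel = c1
```

Each agent would be started with its own `--name` value (`web` or `collector`), normally on different machines. Starting the collector first gives the upstream Avro sink something to connect to.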
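2.1.2 Example
For an Avro-specific example, see the official documentation. The walkthrough below uses a NetCat source to keep the setup simple.
Go to the conf directory of the Flume installation and create a file named a1.conf that defines the sources, channels, and sinks.
# a1.conf: a single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Configure the sink
a1.sinks.k1.type = logger
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume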
[root@flume flume]# bin/flume-ng agent --conf conf --conf-file conf/a1.conf --name a1 -Dflume.root.logger=INFO,console
or, with the short option names:
[root@flume flume]# bin/flume-ng agent -c conf -f conf/a1.conf -n a1 -Dflume.root.logger=INFO,console
Test Flume
Open another terminal, telnet to port 44444, and send Flume an event:
[root@flume ~]# telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello World! <ENTER> # the text you type
OK
The original Flume terminal prints the event in a log message:
2018-11-02 15:29:47,203 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting
2018-11-02 15:29:47,214 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:166)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
2018-11-02 15:29:58,507 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 48 65 6C 6C 6F 20 57 6F 72 6C 64 21 0D Hello World!. }
2.2 Thrift Source
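The Thrift Source is essentially the same as the Avro Source. Just change the source type to thrift, for example a1.sources.r1.type = thrift. It is simple enough that no further detail is given here.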
Property Name | Default | Description |
---|---|---|
channels | ||
type | – | The component type name, needs to be thrift |
bind | – | hostname or IP address to listen on |
port | – | Port # to bind to |
threads | – | Maximum number of worker threads to spawn |
selector.type | ||
selector.* | – | |
interceptors | – | Space separated list of interceptors |
interceptors.* | ||
ssl | false | Set this to true to enable SSL encryption. You must also specify a “keystore” and a “keystore-password”. |
keystore | – | This is the path to a Java keystore file. Required for SSL. |
keystore-password | – | The password for the Java keystore. Required for SSL. |
keystore-type | JKS | The type of the Java keystore. This can be “JKS” or “PKCS12”. |
exclude-protocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified. |
kerberos | false | Set to true to enable kerberos authentication. In kerberos mode, agent-principal and agent-keytab are required for successful authentication. The Thrift source in secure mode, will accept connections only from Thrift clients that have kerberos enabled and are successfully authenticated to the kerberos KDC. |
agent-principal | – | The kerberos principal used by the Thrift Source to authenticate to the kerberos KDC. |
agent-keytab | – | The keytab location used by the Thrift Source in combination with the agent-principal to authenticate to the kerberos KDC. |
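As a small sketch of the point above, a Thrift source definition differs from an Avro one only in its type. The bind address and port below are placeholder values, not taken from the original text, and the fragment leaves out the channel and sink definitions.

```properties
a1.sources = r1
a1.channels = c1
# Same shape as an Avro source; only the type changes
a1.sources.r1.type = thrift
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1
```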
2.3 Exec Source
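2.3.1 Introduction
An Exec Source is configured with a Unix (Linux) command that is expected to produce data continuously on its standard output. If the process exits, the Exec Source exits with it and produces no further data.
The table below is the source configuration from the official guide. Properties shown in bold there are required (channels, type, and command); the descriptions are left as they are.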
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be exec |
command | – | The command to execute |
shell | – | A shell invocation used to run the command. e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc. |
restartThrottle | 10000 | Amount of time (in millis) to wait before attempting a restart |
restart | false | Whether the executed cmd should be restarted if it dies |
logStdErr | false | Whether the command’s stderr should be logged |
batchSize | 20 | The max number of lines to read and send to the channel at a time |
batchTimeout | 3000 | Amount of time (in milliseconds) to wait, if the buffer size was not reached, before data is pushed downstream |
selector.type | replicating | replicating or multiplexing |
selector.* | Depends on the selector.type value | |
interceptors | – | Space-separated list of interceptors |
interceptors.* | | |
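The optional properties in the table can be combined to make an Exec source more resilient. The fragment below is a sketch only: the grep filter is a hypothetical addition (and `--line-buffered` assumes GNU grep), and the values are illustrative rather than recommended.

```properties
a1.sources.s1.type = exec
# The pipe is a shell feature, so the command is run through /bin/sh -c
a1.sources.s1.shell = /bin/sh -c
a1.sources.s1.command = tail -F /opt/flume/test.log | grep --line-buffered ERROR
# Restart the command if it exits, waiting 10 seconds between attempts
a1.sources.s1.restart = true
a1.sources.s1.restartThrottle = 10000
# Also log whatever the command writes to stderr
a1.sources.s1.logStdErr = true
a1.sources.s1.channels = c1
```

Using `tail -F` rather than `tail -f` keeps following the file by name, which matters if the log is rotated.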
2.3.2 Example
Create a file named a2.conf:
# Configuration file
# Name the components on this agent
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -f /opt/flume/test.log
a1.sources.s1.channels = c1
# Configure the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Configure the channel
a1.channels.c1.type = memory
Start Flume
[root@flume flume]# ./bin/flume-ng agent --conf conf --conf-file ./conf/a2.conf --name a1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
Test Flume
Open another terminal and append data to the log file being tailed:
[root@flume ~]# echo "hello world" >> /opt/flume/test.log
The original Flume terminal prints the event in a log message:
2018-11-03 03:47:32,508 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 hello world }
2.4 JMS Source
2.4.1 Introduction
The JMS Source reads messages from a JMS destination such as a queue or a topic. It has been tested with ActiveMQ.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be jms |
initialContextFactory | – | Initial Context Factory, e.g: org.apache.activemq.jndi.ActiveMQInitialContextFactory |
connectionFactory | – | The JNDI name the connection factory should appear as |
providerURL | – | The JMS provider URL |
destinationName | – | Destination name |
destinationType | – | Destination type (queue or topic) |
messageSelector | – | Message selector to use when creating the consumer |
userName | – | Username for the destination/provider |
passwordFile | – | File containing the password for the destination/provider |
batchSize | 100 | Number of messages to consume in one batch |
converter.type | DEFAULT | Class to use to convert messages to flume events. See below. |
converter.* | – | Converter properties. |
converter.charset | UTF-8 | Default converter only. Charset to use when converting JMS TextMessages to byte arrays. |
createDurableSubscription | false | Whether to create durable subscription. Durable subscription can only be used with destinationType topic. If true, “clientId” and “durableSubscriptionName” have to be specified. |
clientId | – | JMS client identifier set on Connection right after it is created. Required for durable subscriptions. |
durableSubscriptionName | – | Name used to identify the durable subscription. Required for durable subscriptions. |
2.4.2 Official example
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = jms
a1.sources.r1.channels = c1
a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
a1.sources.r1.connectionFactory = GenericConnectionFactory
a1.sources.r1.providerURL = tcp://mqserver:61616
a1.sources.r1.destinationName = BUSINESS_DATA
a1.sources.r1.destinationType = QUEUE
2.5 Spooling Directory Source
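2.5.1 Introduction
The Spooling Directory Source watches a configured directory for new files and reads the data out of them. Two caveats apply: a file copied into the spool directory must not be opened and edited afterwards, and the spool directory should not contain subdirectories. The main use of this source is near-real-time monitoring of logs.
The table below is the source configuration from the official guide. Properties shown in bold there are required (channels, type, and spoolDir). There are too many optional properties to list, so only fileSuffix is covered here: it is the suffix appended to a file once it has been fully read, and it can be changed.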
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be spooldir. |
spoolDir | – | The directory from which to read files from. |
fileSuffix | .COMPLETED | Suffix to append to completely ingested files |
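A short sketch of changing the suffix: with the fragment below, a fully ingested file would be renamed with `.DONE` instead of `.COMPLETED`. The suffix value is an arbitrary example; the directory matches the one used in this section's walkthrough.

```properties
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/flume/logs
# Rename ingested files to <name>.DONE rather than <name>.COMPLETED
a1.sources.s1.fileSuffix = .DONE
a1.sources.s1.channels = c1
```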
2.5.2 Example
Create a file named a3.conf:
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /opt/flume/logs
a1.sources.s1.fileHeader = true
a1.sources.s1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
Start Flume
[root@flume flume]# ./bin/flume-ng agent --conf conf --conf-file ./conf/a3.conf --name a1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
Open another terminal and copy test.log into the logs directory:
[root@flume flume]# cp test.log logs/
The original Flume terminal logs that the file has been read and renamed:
2018-11-03 03:54:54,207 (pool-3-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:324)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
2018-11-03 03:54:54,207 (pool-3-thread-1) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:433)] Preparing to move file /opt/flume/logs/test.log to /opt/flume/logs/test.log.COMPLETED
2.6 NetCat Source
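2.6.1 Introduction
The NetCat source listens on a given port and turns each line of text into an event; that is, the data is delimited by newlines. It behaves like the command nc -k -l [host] [port]. In other words, it opens the specified port, listens for data, turns each line of text into a Flume event, and sends it through the connected channel.
The table below is the source configuration from the official guide. Properties shown in bold there are required (channels, type, bind, and port).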
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be netcat |
bind | – | Host name or IP address to bind to |
port | – | Port # to bind to |
max-line-length | 512 | Max line length per event body (in bytes) |
ack-every-event | true | Respond with an “OK” for every event received |
selector.type | replicating | replicating or multiplexing |
selector.* | Depends on the selector.type value | |
interceptors | – | Space-separated list of interceptors |
interceptors.* | | |
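The two optional NetCat properties in the table can be set as follows. This is an illustrative fragment; the values are assumptions, not recommendations.

```properties
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Allow event bodies of up to 1024 bytes per line (the default is 512)
a1.sources.r1.max-line-length = 1024
# Do not reply "OK" after each received event
a1.sources.r1.ack-every-event = false
a1.sources.r1.channels = c1
```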
2.6.2 Example
For a working example, see section 2.1.2, which already uses a NetCat source; it is not repeated here.
2.7 Sequence Generator Source
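A simple sequence generator that continuously produces events from a counter that starts at 0 and increments by 1. According to the official guide it is mainly useful for testing, so it is not covered further here.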
2.8 Syslog Sources
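Reads syslog data and generates Flume events. There are three variants: Syslog TCP Source, Multiport Syslog TCP Source, and Syslog UDP Source. The TCP sources create a new event for each string delimited by a newline (\n), whereas the UDP source treats an entire message as a single event.
The table below is the source configuration from the official guide. Properties shown in bold there are required (channels, type, host, and port).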
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be syslogtcp |
host | – | Host name or IP address to bind to |
port | – | Port # to bind to |
eventSize | 2500 | Maximum size of a single event line, in bytes |
keepFields | none | Setting this to ‘all’ will preserve the Priority, Timestamp and Hostname in the body of the event. A space-separated list of fields to include is allowed as well. Currently, the following fields can be included: priority, version, timestamp, hostname. The values ‘true’ and ‘false’ have been deprecated in favor of ‘all’ and ‘none’. |
selector.type | replicating | replicating or multiplexing |
selector.* | | Depends on the selector.type value |
interceptors | – | Space-separated list of interceptors |
interceptors.* | – |
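2.8.1 Syslog TCP Source
2.8.1.1 Introduction
This is the original Syslog source.
The table below is the source configuration from the official guide, reduced to the required properties; the optional ones are omitted here.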
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be syslogtcp |
host | – | Host name or IP address to bind to |
port | – | Port # to bind to |
2.8.1.2 Example
Official configuration
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
Create a file named a4.conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 50000
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Here the source listens on port 50000 on localhost.
Start Flume
[root@flume flume]# ./bin/flume-ng agent --conf conf --conf-file ./conf/a4.conf --name a1 -Dflume.root.logger=INFO,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
Test Flume
Open another terminal and send data to the listening port:
[root@flume ~]# echo "hello world" | nc localhost 50000
The original Flume terminal prints the event in a log message:
2018-11-03 04:47:34,518 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 hello world }
2.8.2 Multiport Syslog TCP Source
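2.8.2.1 Introduction
This is a newer, faster version of the Syslog TCP Source with multi-port support. Rather than watching a single port, it can listen on several. Its configuration in the official guide is largely the same, with more optional properties.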
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be multiport_syslogtcp |
host | – | Host name or IP address to bind to. |
ports | – | Space-separated list (one or more) of ports to bind to. |
portHeader | – | If specified, the port number will be stored in the header of each event using the header name specified here. This allows for interceptors and channel selectors to customize routing logic based on the incoming port. |
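Note that the ports property here replaces the port property of the Syslog TCP Source; take care not to mix them up. The portHeader property can be combined with interceptors and channel selectors to build custom routing logic.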
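As a sketch of routing on the port header, the fragment below sends events from each port to a different channel using a multiplexing channel selector. The header name `port`, the second channel `c2`, and the port-to-channel mapping are assumptions for illustration.

```properties
a1.sources = r1
a1.channels = c1 c2
a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.host = localhost
a1.sources.r1.ports = 50000 60000
# Store the receiving port in an event header named "port"
a1.sources.r1.portHeader = port
a1.sources.r1.channels = c1 c2
# Route each event on the value of that header
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = port
a1.sources.r1.selector.mapping.50000 = c1
a1.sources.r1.selector.mapping.60000 = c2
# Events with any other header value fall back to c1
a1.sources.r1.selector.default = c1
a1.channels.c1.type = memory
a1.channels.c2.type = memory
```

In a complete agent, each channel would also need its own sink to drain it.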
2.8.2.2 Example
Official configuration
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.channels = c1
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.ports = 10001 10002 10003
a1.sources.r1.portHeader = port
Create a file named a5.conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = multiport_syslogtcp
a1.sources.r1.ports = 50000 60000
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Here the source listens on two ports on localhost, 50000 and 60000.
Start Flume
[root@flume flume]# ./bin/flume-ng agent --conf conf --conf-file ./conf/a5.conf --name a1 -Dflume.root.logger=INFO,console
Test Flume
Open another terminal and send data to the listening ports:
[root@flume ~]# echo "hello world 01" | nc localhost 50000
[root@flume ~]# echo "hello world 02" | nc localhost 60000
The original Flume terminal prints the events in log messages:
2018-11-03 05:56:34,588 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 20 30 31 hello world 01 }
2018-11-03 05:56:34,588 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{flume.syslog.status=Invalid} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 20 30 32 hello world 02 }
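Data from both ports has arrived. The status=Invalid entry in the headers appears because the plain text sent here is not a well-formed syslog message.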
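2.8.3 Syslog UDP Source
2.8.3.1 Introduction
This is the same source over a different protocol: UDP instead of TCP.
Its configuration in the official guide matches the TCP source, so it is not repeated here.
2.8.3.2 Example
Official configuration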
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = syslogudp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
Create a file named a6.conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogudp
a1.sources.r1.port = 50000
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Here the source listens on UDP port 50000 on localhost.
Start Flume
[root@flume flume]# ./bin/flume-ng agent --conf conf --conf-file ./conf/a6.conf --name a1 -Dflume.root.logger=INFO,console
Test Flume
Open another terminal and send data to the listening port:
[root@flume ~]# echo "hello world" | nc -u localhost 50000
The original Flume terminal prints the event in a log message:
2018-11-03 06:10:34,768 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 hello world }
The data has arrived.
2.9 HTTP Source
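2.9.1 Introduction
The HTTP Source accepts Flume events via HTTP POST and GET; the official guide says GET should be used for experimentation only. HTTP requests are converted into Flume events by a pluggable "handler", which must implement the HTTPSourceHandler interface. The handler takes an HttpServletRequest and returns a list of Flume events.
All events sent in one POST request are treated as a single batch and inserted into the Flume channel in one transaction.
The table below is the source configuration from the official guide. Properties shown in bold there are required (type and port).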
Property Name | Default | Description |
---|---|---|
type | – | The component type name, needs to be http |
port | – | The port the source should bind to. |
bind | 0.0.0.0 | The hostname or IP address to listen on |
handler | org.apache.flume.source.http.JSONHandler | The FQCN of the handler class. |
2.9.2 Example
Official configuration
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
a1.sources.r1.handler = org.example.rest.RestHandler
a1.sources.r1.handler.nickname = random props
Create a file named a7.conf:
#Name the components on this agent
a1.sources= r1
a1.sinks= k1
a1.channels= c1
#Describe/configure the source
a1.sources.r1.type= http
a1.sources.r1.port= 50000
a1.sources.r1.channels= c1
#Describe the sink
a1.sinks.k1.type= logger
a1.sinks.k1.channel = c1
#Use a channel which buffers events in memory
a1.channels.c1.type= memory
a1.channels.c1.capacity= 1000
a1.channels.c1.transactionCapacity= 100
Start Flume
[root@flume flume]# ./bin/flume-ng agent --conf conf --conf-file ./conf/a7.conf --name a1 -Dflume.root.logger=INFO,console
Test Flume
Open another terminal and send data with a POST request whose body is JSON:
[root@flume ~]# curl -X POST -d '[{"headers" :{"test1" : "test1 is header","test2" : "test2 is header"},"body" : "hello test3"}]' http://localhost:50000
The original Flume terminal prints the event in a log message:
2018-11-03 06:20:56,678 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{test1=test1 is header, test2=test2 is header} body: 68 65 6C 6C 6F 20 74 65 73 74 33 hello test3 }
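Both the headers and the body are output as expected.
2.10 Custom Source
A custom source is your own implementation of the Source interface. When starting the Flume agent, the custom source class and its dependencies must be on the agent's classpath. The type of a custom source is the fully qualified class name of the class that implements Source.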
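In configuration terms this looks like the fragment below. The class name `org.example.MySource` is hypothetical and stands for whatever class implements the Source interface; the rest follows the same pattern as the built-in sources.

```properties
a1.sources = r1
a1.channels = c1
# The type is the fully qualified class name instead of a built-in alias
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
```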