NiFi Study Notes

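I've been learning NiFi recently, so I've organized some study notes for future reference.
The concept sections can be skipped at first. To see how to use NiFi, jump straight to Part 4, which has an illustrated walkthrough.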

Reference: http://nifi.apache.org/docs.html

1.Definition

Put simply, NiFi was built to automate the flow of data between systems.

2.The core concepts

NiFi’s fundamental design concepts closely relate to the main ideas of Flow-Based Programming [FBP].
[Figure: NiFi core concepts and their FBP counterparts]
2.1 FlowFile
2.1.1 The information packet that moves through the system.
2.1.2 It is made up of two parts: key/value attributes and content.
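A minimal Java sketch of this two-part structure follows. It is a simplified model for illustration only, not NiFi's actual org.apache.nifi.flowfile.FlowFile interface: the class SimpleFlowFile and its methods are hypothetical. It does mirror one real property of NiFi FlowFiles, immutability, by returning a new object whenever an attribute changes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified FlowFile model: key/value attributes plus content bytes.
final class SimpleFlowFile {
    private final Map<String, String> attributes;
    private final byte[] content;

    SimpleFlowFile(Map<String, String> attributes, byte[] content) {
        // Defensive copies keep the object immutable
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        this.content = content.clone();
    }

    String getAttribute(String key) {
        return attributes.get(key);
    }

    // Returns a NEW FlowFile with one attribute added or replaced; this one is unchanged
    SimpleFlowFile putAttribute(String key, String value) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put(key, value);
        return new SimpleFlowFile(copy, content);
    }

    String contentAsString() {
        return new String(content, StandardCharsets.UTF_8);
    }
}
```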
2.2 FlowFile Processor
2.2.1 It is the NiFi component that is responsible for creating, sending, receiving, transforming, routing, splitting, merging, and processing FlowFiles.
2.2.2 It has access to the attributes of a given FlowFile and to its content stream.
2.2.3 It can operate on zero or more FlowFiles in a given unit of work and then either commit that work or roll it back.
2.3 Connection
2.3.1 It corresponds to FBP's bounded buffer and provides the actual linkage between processors.
2.3.2 These queues can be prioritized dynamically and can have upper bounds on load, which enable back pressure.
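To illustrate back pressure, the sketch below models a Connection as a FIFO queue with an object-count threshold: once the threshold is reached, the upstream producer is simply not triggered until the downstream side drains the queue. BoundedConnection and ThrottledProducer are hypothetical names, and the model ignores NiFi's data-size threshold and queue prioritizers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical Connection model: FIFO queue with an object-count back-pressure threshold.
final class BoundedConnection<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final int backPressureObjectThreshold;

    BoundedConnection(int backPressureObjectThreshold) {
        this.backPressureObjectThreshold = backPressureObjectThreshold;
    }

    // Checked before the upstream processor is triggered
    boolean isBackPressureEngaged() {
        return queue.size() >= backPressureObjectThreshold;
    }

    void enqueue(T item) {
        queue.addLast(item);
    }

    // Downstream side: returns null when the queue is empty
    T poll() {
        return queue.pollFirst();
    }

    int size() {
        return queue.size();
    }
}

// Hypothetical upstream "processor": produces one item per trigger,
// but does nothing while the outgoing connection is full.
final class ThrottledProducer {
    private int produced = 0;

    boolean trigger(BoundedConnection<String> out) {
        if (out.isBackPressureEngaged()) {
            return false; // back pressure applied: not scheduled
        }
        out.enqueue("item-" + produced++);
        return true;
    }
}
```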
2.4 Flow controller
2.4.1 It maintains the knowledge of how processes connect, and it manages the threads (and their allocation) that all processes use.
2.4.2 It acts as the broker facilitating the exchange of FlowFiles between processors.
2.5 Process Group
2.5.1 A Process Group is a specific set of processes and their connections, which can receive data via input ports and send data out via output ports.
2.5.2 This allows entirely new components to be created simply by composing other components.

3.NiFi Architecture

[Figure: NiFi architecture]
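NiFi executes within a JVM on a host operating system. The primary components of NiFi on the JVM are the web server, the Flow Controller, extensions, and the FlowFile, Content, and Provenance repositories.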

4.How to build a flow with processors

4.1 Opening the NiFi web UI (by default anyone can access it; NiFi authentication can be configured to keep other users out. I'll write a separate post about that later.)

Open a web browser and navigate to http://localhost:8080/nifi. (The port can be changed by editing the nifi.properties file in the NiFi conf directory; the default port is 8080.)
Note: if NiFi is set up on a Linux (virtual) machine, the host/IP address must be changed in the nifi.properties file.

[Screenshot: the NiFi web UI]
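Assuming a plain HTTP setup as in these notes, the relevant keys in nifi.properties are nifi.web.http.host and nifi.web.http.port (for example, nifi.web.http.host=192.168.1.10). The hypothetical helper below only shows how those two values make up the UI address; it does not read NiFi's real configuration. Note that NiFi 1.14.0 and later enable HTTPS on port 8443 by default, in which case the nifi.web.https.* properties apply instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: build the UI URL from nifi.properties-style text.
final class NifiUiUrl {
    static String fromProperties(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot really happen with a StringReader
        }
        String host = props.getProperty("nifi.web.http.host", "").trim();
        if (host.isEmpty()) {
            host = "localhost"; // this sketch's choice: blank host -> browse from the same machine
        }
        String port = props.getProperty("nifi.web.http.port", "8080").trim();
        return "http://" + host + ":" + port + "/nifi";
    }
}
```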

4.2 Adding a Processor

Drag the Processor icon from the toolbar onto the canvas. This opens a dialog that lets us choose which Processor to add. (We can filter by tags, or by Processor name by typing into the Filter box.)
[Screenshot: the Add Processor dialog]

4.3 Configuring a Processor

4.3.1 We can configure it by right-clicking on the Processor and choosing the Configure menu item.
[Screenshot: the Configure Processor dialog]
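4.3.2 The Processor cannot be started until all required properties have been configured. Required properties are shown in bold.
(Most required properties already have default values, so there is little to do here apart from the input directory below.)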

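4.3.3 If we set the directory name to ./data-in, this will cause the Processor to start picking up any data in the data-in subdirectory of the NiFi home directory.
Note: this confused me. In practice NiFi seemed to use the directory as an absolute path, while the English description talks about a path relative to the NiFi installation. It cost me a huge amount of time!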
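One possible explanation, assuming a relative directory is resolved against the working directory of the NiFi process: ./data-in only lands under the NiFi home directory if NiFi was started from there. The hypothetical helper below shows that resolution rule with java.nio.file.Path (Unix-style paths assumed); using an absolute path avoids the ambiguity entirely.

```java
import java.nio.file.Path;

// Hypothetical helper: how a configured directory becomes a concrete path.
// Assumption: relative paths are resolved against the process's working directory.
final class InputDirResolver {
    static Path resolve(Path workingDirectory, String configuredDirectory) {
        Path configured = Path.of(configuredDirectory);
        if (configured.isAbsolute()) {
            return configured.normalize(); // absolute paths are used as-is
        }
        // "./data-in" -> <workingDirectory>/data-in
        return workingDirectory.resolve(configured).normalize();
    }
}
```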

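(1) Fill in Input Directory and set Keep Source File to true. (I wasn't sure what this meant. Does it keep a copy of the source file in the input directory? I checked the usage docs, and that's exactly it.)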
[Screenshot: GetFile properties]
(2) Check success, so that the success relationship is terminated automatically.
[Screenshot: the success relationship checked]
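Per the GetFile usage documentation, with Keep Source File set to true the file is not deleted after it is copied into NiFi, so it is picked up again on every run (useful for testing). The hypothetical simulation below models one polling cycle on an in-memory "directory" to show the difference; it ignores GetFile's file filters, minimum file age, and other properties.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of GetFile polling an input directory (file names only).
final class GetFileSimulation {
    private final Set<String> inputDirectory = new LinkedHashSet<>();
    private final boolean keepSourceFile;

    GetFileSimulation(boolean keepSourceFile) {
        this.keepSourceFile = keepSourceFile;
    }

    void addFile(String name) {
        inputDirectory.add(name);
    }

    // One polling cycle: every file currently present becomes a new FlowFile
    List<String> poll() {
        List<String> picked = new ArrayList<>(inputDirectory);
        if (!keepSourceFile) {
            inputDirectory.clear(); // source files are removed once picked up
        }
        return picked;
    }
}
```

With keepSourceFile = true the same name comes back on every poll, which is one way to end up with many output files and repeated name conflicts downstream.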
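4.3.4 In order for this property to be valid, create a directory named data-in in the NiFi home directory and then click the OK button to close the dialog.
I created an input directory under the home directory and put a .txt file with some content in it.
I also created an output directory at the same time, since we'll need an output destination later. (Note: this turned out to be a big trap. There is no need to create it, and later it caused a conflict with a directory of the same name!)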
[Screenshot: the input and output directories]

4.3.5 Create a PutFile Processor using the same steps as for GetFile, and configure it as shown below.
[Screenshots: PutFile configuration]
4.4 Connecting Processors
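(I don't fully understand how "define" is used here yet; I'll come back to it later. For now, just making the connection is enough.)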
4.4.1 We can now send the output of the GetFile Processor to the PutFile Processor. Hover over the GetFile Processor with the mouse, and a Connection icon will appear over the middle of the Processor. Drag this icon from the GetFile Processor to the PutFile Processor.
[Screenshot: dragging a connection from GetFile to PutFile]
4.4.2 Connection configuration
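(1) This gives us a dialog to choose which Relationships we want to include for this connection. Because GetFile has only a single Relationship, success, it is automatically selected for us.
(My understanding: a Relationship is like a channel for the data. It names an outcome of processing, such as success, and FlowFiles with that outcome flow through the connection that includes it.)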
[Screenshot: selecting relationships for the connection]
(2) The FlowFile Expiration setting defaults to “0 sec”, which indicates that the data should not expire. However, we can change the value so that when data in this Connection reaches a certain age, it will automatically be deleted (and a corresponding EXPIRE Provenance event will be created).
Set this according to your needs.
[Screenshot: connection settings]
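The expiration rule can be sketched as a small check. Assumptions: the age is measured from when the FlowFile entered NiFi (my reading of the NiFi User Guide), and "reaches a certain age" is treated as strictly older than the setting. ExpirationCheck is a hypothetical name, not a NiFi class.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a connection's FlowFile Expiration check.
final class ExpirationCheck {
    static boolean isExpired(Instant enteredNiFi, Instant now, Duration expiration) {
        if (expiration.isZero()) {
            return false; // "0 sec" (the default): data never expires
        }
        Duration age = Duration.between(enteredNiFi, now);
        return age.compareTo(expiration) > 0; // older than the setting -> dropped
    }
}
```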
4.5 Starting and Stopping Processors
4.5.1 We can select the Processors and then click the Start icon in the Operate palette.
4.5.2 We can then stop the Processors by using the Stop icon.
4.5.3 In order to configure a Processor, we must first stop the Processor and wait for any tasks that may be executing to finish. (In practice I also had to empty the queue.)

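Note: after I started both Processors, a warning appeared indicating that something with the same name had already been created earlier. My fix was to delete the file, or to set PutFile's Conflict Resolution Strategy so that duplicate files are ignored.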
[Screenshot: the duplicate-name warning]
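The warning most likely comes from PutFile's Conflict Resolution Strategy property, whose values are replace, ignore, and fail (fail is the default). Combined with Keep Source File = true, the same file name keeps arriving, so every repeat is a conflict. The in-memory sketch below is hypothetical and only models what each strategy does with a duplicate name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of PutFile's conflict handling; the output "directory" is a map.
final class PutFileSimulation {
    enum ConflictResolution { REPLACE, IGNORE, FAIL }

    enum Outcome { SUCCESS, FAILURE }

    private final Map<String, String> outputDirectory = new HashMap<>();
    private final ConflictResolution strategy;

    PutFileSimulation(ConflictResolution strategy) {
        this.strategy = strategy;
    }

    Outcome put(String fileName, String content) {
        if (outputDirectory.containsKey(fileName)) {
            switch (strategy) {
                case REPLACE:
                    outputDirectory.put(fileName, content); // overwrite the existing file
                    return Outcome.SUCCESS;
                case IGNORE:
                    return Outcome.SUCCESS; // keep the existing file, do not write
                default:
                    return Outcome.FAILURE; // FAIL: route to the failure relationship
            }
        }
        outputDirectory.put(fileName, content);
        return Outcome.SUCCESS;
    }

    String read(String fileName) {
        return outputDirectory.get(fileName);
    }
}
```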
4.6 The resulting files can be found in the output directory
[Screenshot: the output directory]
I ran this many times, so there are lots of files.
[Screenshot: files in the output directory]

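Some problems I ran into in practice
1. If I modify the content of the input file, restarting the Processors still doesn't transfer any data. Strange!
Workaround: recreating the Processor makes the data transfer again.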

5.Error handling

Processors may not handle every error themselves, but the framework offers several ways to recover, described below.

5.1 Exceptions within the Processor

During the execution of the onTrigger method of a Processor, many things can potentially go awry. Common failure conditions include:

5.1.1 Incoming data is not in the expected format.
5.1.2 Network connections to external services fail.
5.1.3 Reading or writing data to a disk fails.
5.1.4 There is a bug in the Processor or a dependent library.

There are two types of Exceptions that can escape a Processor:
(1) ProcessException:
If a ProcessException is thrown from the Processor, the framework will assume that this is a failure that is a known outcome. Moreover, it is a condition where attempting to process the data again later may be successful. As a result, the framework will roll back the session that was being processed and penalize the FlowFiles that were being processed.

(2) Any other Exception:
If any other Exception escapes the Processor, though, the framework will assume that it is a failure that was not taken into account by the developer. In this case, the framework will also roll back the session and penalize the FlowFiles. However, in this case, we can get into some very problematic cases:

For example, the Processor may be in a bad state and may continually run, depleting system resources, without providing any useful work.
This is fairly common, for instance, when a NullPointerException is thrown continually.
In order to avoid this case, if an Exception other than ProcessException is able to escape the Processor’s onTrigger method, the framework will also “Administratively Yield” the Processor. This means that the Processor will not be triggered to run again for some amount of time. The amount of time is configured in the nifi.properties file (nifi.administrative.yield.duration) but is 10 seconds by default.

5.2 Exceptions within a callback: IOException, RuntimeException

More often than not, when an Exception occurs in a Processor, it occurs from within a callback (i.e., InputStreamCallback, OutputStreamCallback, or StreamCallback), that is, during the processing of a FlowFile’s content.
Callbacks are allowed to throw either RuntimeException or IOException.
In the case of RuntimeException, this Exception will propagate back to the onTrigger method.
In the case of an IOException, the Exception will be wrapped within a ProcessException, and this ProcessException will then be thrown from the framework.
For this reason, it is recommended that Processors that use callbacks do so within a try/catch block and catch ProcessException as well as any other RuntimeException that they expect their callback to throw. It is not recommended that Processors catch the general Exception or Throwable cases.
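The recommended pattern can be sketched in plain Java. SketchProcessException and ReadCallback are hypothetical stand-ins for NiFi's ProcessException and InputStreamCallback, and CallbackDemo.read mimics the framework rule described above (an IOException from the callback is wrapped, a RuntimeException propagates as-is). In a real Processor the read would be session.read(flowFile, callback), and the catch block would transfer the FlowFile to a failure relationship.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.nifi.processor.exception.ProcessException
class SketchProcessException extends RuntimeException {
    SketchProcessException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for InputStreamCallback: may throw IOException or RuntimeException
interface ReadCallback {
    void process(InputStream in) throws IOException;
}

final class CallbackDemo {
    // Mimics the framework: IOException is wrapped, RuntimeException passes through unchanged
    static void read(byte[] content, ReadCallback callback) {
        try (InputStream in = new ByteArrayInputStream(content)) {
            callback.process(in);
        } catch (IOException e) {
            throw new SketchProcessException("Failed to read content", e);
        }
    }

    // Processor-side pattern: catch the wrapped exception plus the RuntimeException we expect;
    // deliberately no catch of Exception or Throwable
    static String onTrigger(byte[] content) {
        final StringBuilder parsed = new StringBuilder();
        try {
            read(content, in -> {
                String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                parsed.append(Integer.parseInt(text.trim())); // NumberFormatException if not a number
            });
            return "success:" + parsed;
        } catch (SketchProcessException | NumberFormatException e) {
            return "failure"; // real code: session.transfer(flowFile, REL_FAILURE)
        }
    }
}
```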

5.3 Penalization vs. Yielding

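5.3.1 Penalization: when a particular FlowFile cannot be processed, the Processor penalizes that FlowFile and goes on processing other FlowFiles; the penalized FlowFile is not processed again until its Penalty Duration has passed.
5.3.2 Yielding: when the Processor cannot do any useful work at all (for example, because an external service it depends on is unavailable), it yields and is not triggered again until the Yield Duration has passed.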
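The difference is easy to see in a toy scheduler: penalization marks a single FlowFile so it is skipped while others keep flowing, whereas yielding idles the whole Processor. PenaltyVsYield is hypothetical and uses plain millisecond timestamps; in NiFi both durations are set in the Processor's Settings tab (the 30 000 ms and 1 000 ms in the usage below match what I understand to be the defaults of 30 sec and 1 sec).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scheduler model contrasting per-FlowFile penalization with per-Processor yielding.
final class PenaltyVsYield {
    static final class Item {
        final String name;
        long penalizedUntil = 0; // FlowFile level: skip this item until this time

        Item(String name) {
            this.name = name;
        }
    }

    private final List<Item> queue = new ArrayList<>();
    private long yieldedUntil = 0; // Processor level: do not run at all until this time

    void enqueue(Item item) {
        queue.add(item);
    }

    void penalize(Item item, long now, long penaltyDurationMs) {
        item.penalizedUntil = now + penaltyDurationMs;
    }

    void yieldProcessor(long now, long yieldDurationMs) {
        yieldedUntil = now + yieldDurationMs;
    }

    // The item the processor would work on at time `now`, or null if none
    Item next(long now) {
        if (now < yieldedUntil) {
            return null; // yielded: the whole processor is idle
        }
        for (Item item : queue) {
            if (item.penalizedUntil <= now) {
                return item; // penalized items are skipped; others are still processed
            }
        }
        return null;
    }
}
```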

5.4 Session Rollback

All methods that are called on a ProcessSession happen as a transaction. When we decide to end the transaction, we can do so by calling either commit() or rollback().
Typically, this is handled by the AbstractProcessor class: if the onTrigger method throws an Exception, the AbstractProcessor will catch the Exception, call session.rollback(), and then re-throw the Exception. Otherwise, the AbstractProcessor will call commit() on the ProcessSession.
Note: when the commit method is called, the FlowFiles are transferred to the outbound queues so that the next Processors can operate on the data.
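The transaction semantics can be modeled in a few lines. SketchSession and SketchAbstractProcessor are hypothetical: transfer() only stages a FlowFile, commit() publishes staged FlowFiles to the outbound queue, rollback() discards them, and run() mirrors the commit, or rollback and re-throw, behavior described for AbstractProcessor. The real ProcessSession also tracks attribute and content changes and can penalize on rollback, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical session model: transfers are staged until commit().
final class SketchSession {
    private final List<String> outboundQueue;
    private final List<String> staged = new ArrayList<>();

    SketchSession(List<String> outboundQueue) {
        this.outboundQueue = outboundQueue;
    }

    void transfer(String flowFile) {
        staged.add(flowFile);
    }

    void commit() {
        outboundQueue.addAll(staged); // the next Processor can now see the data
        staged.clear();
    }

    void rollback() {
        staged.clear(); // nothing reaches the outbound queue
    }
}

// Mirrors the AbstractProcessor rule: commit on success, roll back and re-throw on exception.
final class SketchAbstractProcessor {
    interface Work {
        void onTrigger(SketchSession session);
    }

    static void run(SketchSession session, Work work) {
        try {
            work.onTrigger(session);
            session.commit();
        } catch (RuntimeException e) {
            session.rollback();
            throw e;
        }
    }
}
```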

The org.apache.nifi.annotations.behavior.SupportsBatching annotation:
If a Processor uses this annotation, calls to ProcessSession.commit may not take effect immediately. Rather, these commits may be batched together in order to provide higher throughput. However, if at any point the Processor rolls back the ProcessSession, all changes since the last call to commit will be discarded and all “batched” commits will take effect. These “batched” commits are not rolled back.

6.Controller Services

A Controller Service must consist of an interface that extends ControllerService.

6.1 Developing a ControllerService

Implementations can then be interacted with only through their interface. A Processor, for instance, will never be given a concrete implementation of a ControllerService and therefore must reference the service only via interfaces that extend ControllerService.
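The interface-only rule can be illustrated without NiFi at all. In the sketch below, ControllerServiceMarker stands in for NiFi's ControllerService interface and every other name is hypothetical; the Processor compiles only against the GreetingService interface, so the concrete class can be swapped without touching it.

```java
// Hypothetical stand-in for NiFi's ControllerService interface
interface ControllerServiceMarker {
    String getIdentifier();
}

// The service API that Processors compile against (hypothetical)
interface GreetingService extends ControllerServiceMarker {
    String greet(String name);
}

// A concrete implementation; the Processor below never references this class
final class EnglishGreetingService implements GreetingService {
    @Override
    public String getIdentifier() {
        return "greeting-1";
    }

    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// The Processor only sees the interface type
final class GreetingProcessor {
    private final GreetingService service;

    GreetingProcessor(GreetingService service) {
        this.service = service;
    }

    String process(String input) {
        return service.greet(input);
    }
}
```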

6.2 Interacting with a ControllerService

For most use cases, using the identifiesControllerService method of the PropertyDescriptor Builder is preferred. To use this method, we create a PropertyDescriptor that references a Controller Service:

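import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.ssl.SSLContextService;

// Property whose value identifies an SSLContextService instance
public static final PropertyDescriptor SSL_CONTEXT_SERVICE = new PropertyDescriptor.Builder()
    .name("SSL Context Service")
    .description("Specifies the SSL Context Service that can be used to create secure connections")
    .required(true)
    .identifiesControllerService(SSLContextService.class)
    .build();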

In order to make use of this service, the Processor can use code such as:

// Inside onTrigger (or an @OnScheduled method); context is the ProcessContext
final SSLContextService sslContextService = context.getProperty(SSL_CONTEXT_SERVICE)
    .asControllerService(SSLContextService.class);

SSLContextService is an interface.

7.Reporting Tasks

NiFi provides a capability for reporting status, statistics, metrics, and monitoring information to external services by means of the ReportingTask interface. ReportingTasks are given access to a host of information to determine how the system is performing.

7.1 Developing a Reporting Task

Just like with the Processor and ControllerService interfaces, the ReportingTask interface exposes methods for configuration, validation, and initialization. These methods are all identical to those of the Processor and ControllerService interfaces.

8.Testing

NiFi provides a nifi-mock module that can be used in conjunction with JUnit to provide extensive testing of components.
