Chapter 2: Data Processing Using the DataStream API

Event time and watermarks

The Flink streaming API takes inspiration from the Google Dataflow model and supports several notions of time. In general, there are three places where time can be captured in a streaming environment:

Event time

Event time is the time at which an event occurred on the device that produced it. In an IoT project, for example, it is the time at which a sensor captures a reading. These timestamps generally need to be embedded in the records before they enter Flink. During processing, the timestamps are extracted and used for windowing. Event time processing can handle out-of-order events.
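To make the idea of windowing on embedded timestamps concrete, here is a minimal sketch in plain Java. It is not Flink code: the `Reading` class, the `windowStart` and `assign` helpers, and the 10-second window size are all assumptions made up for illustration. It only shows that when the window is derived from the timestamp carried inside the record, a reading that arrives out of order still ends up in the window it belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names are Flink APIs.
class EventTimeWindows {

    // A record that carries the time at which the device produced it.
    static final class Reading {
        final String sensorId;
        final long eventTimeMillis;
        final double value;

        Reading(String sensorId, long eventTimeMillis, double value) {
            this.sensorId = sensorId;
            this.eventTimeMillis = eventTimeMillis;
            this.value = value;
        }
    }

    // Start of the tumbling window that contains the given timestamp.
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    // Groups readings by the window their embedded event time falls into,
    // regardless of the order in which they arrive.
    static Map<Long, List<Double>> assign(List<Reading> arrivals, long windowSizeMillis) {
        Map<Long, List<Double>> windows = new TreeMap<>();
        for (Reading r : arrivals) {
            long start = windowStart(r.eventTimeMillis, windowSizeMillis);
            windows.computeIfAbsent(start, k -> new ArrayList<>()).add(r.value);
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Reading> arrivals = new ArrayList<>();
        arrivals.add(new Reading("s1", 1_000L, 20.5));
        arrivals.add(new Reading("s1", 12_000L, 21.0));
        arrivals.add(new Reading("s1", 4_000L, 20.7)); // arrives out of order
        // The late arrival still lands in the [0, 10000) window.
        System.out.println(assign(arrivals, 10_000L));
    }
}
```

In a real Flink job the grouping is done by window operators, and watermarks decide when a window may be closed; the sketch leaves both out on purpose.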

Processing time

Processing time is the clock time of the machine that executes the stream processing operation. Processing time windowing considers only the timestamp at which each event is processed. It is the simplest form of stream processing because it requires no synchronization between the processing machines and the producing machines. In a distributed, asynchronous environment, processing time does not provide determinism, because the results depend on the speed at which records flow through the system.
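The lack of determinism can be illustrated with a small plain-Java sketch. Again, this is not Flink code: `assignWindow` and the injected `LongSupplier` clock are hypothetical helpers, and the clock values are invented. The point is only that the window depends on when the operator happens to see the record, so the same record can fall into different windows in different runs.

```java
import java.util.function.LongSupplier;

// Plain-Java illustration only; none of these names are Flink APIs.
class ProcessingTimeWindows {

    // Start of the tumbling window that contains the given timestamp.
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    // The window is chosen from the operator's clock at the moment of
    // processing; the record's own timestamp is never consulted.
    static long assignWindow(LongSupplier operatorClock, long windowSizeMillis) {
        return windowStart(operatorClock.getAsLong(), windowSizeMillis);
    }

    public static void main(String[] args) {
        // The same record processed in two runs with different delays.
        long fastRun = assignWindow(() -> 9_500L, 10_000L);
        long slowRun = assignWindow(() -> 10_200L, 10_000L);
        System.out.println(fastRun + " vs " + slowRun);
        // A real operator would read a wall clock instead, for example
        // assignWindow(System::currentTimeMillis, 10_000L).
    }
}
```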

Ingestion time

Ingestion time is the time at which a particular event enters Flink. All time-based operations refer to this timestamp. Ingestion time is more expensive than processing time, but it gives more predictable results. Ingestion time programs cannot handle out-of-order events, because the timestamp is assigned only after the event has entered the Flink system.

The following examples show how to configure the time characteristic. For ingestion time and processing time, we only need to set the time characteristic, and watermark generation is taken care of automatically:

Translator's note: for an article about watermarks, see http://vishnuviswanath.com/flink_eventtime.html

In Java:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
OR
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

In Scala:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
OR
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
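As a rough picture of what ingestion time means for ordering, the plain-Java sketch below stamps each record with a source clock at the moment it enters the system. The `Stamped` class, the `ingest` helper and the counter used as a clock are assumptions for illustration and are not part of Flink. The ingestion stamps come out in arrival order even though the device timestamps do not, which is why a program running on ingestion time cannot correct out-of-order events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Plain-Java illustration only; none of these names are Flink APIs.
class IngestionTimeStamper {

    static final class Stamped {
        final long eventTimeMillis;     // when the device produced the record
        final long ingestionTimeMillis; // when the record entered the system

        Stamped(long eventTimeMillis, long ingestionTimeMillis) {
            this.eventTimeMillis = eventTimeMillis;
            this.ingestionTimeMillis = ingestionTimeMillis;
        }
    }

    // Stamps every record once, at entry, using the source's clock.
    static List<Stamped> ingest(long[] eventTimesInArrivalOrder, LongSupplier sourceClock) {
        List<Stamped> out = new ArrayList<>();
        for (long eventTime : eventTimesInArrivalOrder) {
            out.add(new Stamped(eventTime, sourceClock.getAsLong()));
        }
        return out;
    }

    public static void main(String[] args) {
        long[] arrivals = {1_000L, 12_000L, 4_000L}; // device times, out of order
        long[] tick = {50_000L};                     // fake source clock
        for (Stamped s : ingest(arrivals, () -> tick[0]++)) {
            System.out.println(s.eventTimeMillis + " -> " + s.ingestionTimeMillis);
        }
    }
}
```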

For event time stream programs, we need to specify how watermarks and timestamps are assigned. There are two ways of assigning them:

  • Directly from a data source attribute
  • Using a timestamp assigner
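The second option, a timestamp assigner, can be sketched in plain Java as follows. This is a simplified model and not the Flink interface: the class name, its methods and the 3-second bound are assumptions chosen for illustration. It tracks the largest event timestamp seen so far and reports a watermark that trails it by a fixed bound, which is similar in spirit to the `BoundedOutOfOrdernessTimestampExtractor` that ships with Flink.

```java
// Plain-Java illustration only; none of these names are Flink APIs.
class BoundedLatenessWatermarks {

    private final long maxOutOfOrdernessMillis;
    private long maxSeenTimestamp = Long.MIN_VALUE;

    BoundedLatenessWatermarks(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    // Called once per record: returns the record's event timestamp and
    // remembers the largest timestamp seen so far.
    long extractTimestamp(long embeddedEventTimeMillis) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, embeddedEventTimeMillis);
        return embeddedEventTimeMillis;
    }

    // The watermark claims that no record with a timestamp at or below
    // this value is expected any more.
    long currentWatermark() {
        if (maxSeenTimestamp == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet
        }
        return maxSeenTimestamp - maxOutOfOrdernessMillis;
    }

    // A record is late if its timestamp is not above the current watermark.
    boolean isLate(long eventTimeMillis) {
        return eventTimeMillis <= currentWatermark();
    }

    public static void main(String[] args) {
        BoundedLatenessWatermarks wm = new BoundedLatenessWatermarks(3_000L);
        long[] arrivals = {1_000L, 12_000L, 10_000L, 4_000L};
        for (long ts : arrivals) {
            boolean late = wm.isLate(ts); // checked before the record advances the watermark
            wm.extractTimestamp(ts);
            System.out.println(ts + " late=" + late + " watermark=" + wm.currentWatermark());
        }
    }
}
```

With a bound of 3 seconds, the record stamped 10000 is still accepted after 12000 has been seen, while the record stamped 4000 falls behind the watermark and is treated as late. A larger bound tolerates more disorder at the cost of holding windows open longer.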

To work with event time streams, we need to set the time characteristic as follows:

In Java:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

In Scala:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

It is always best to store the event time together with the record at the source. Flink also provides some pre-defined timestamp extractors and watermark generators; see https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_timestamp_extractors.html

