Flink Time Attributes | Three Ways to Define Processing Time | Three Ways to Define Event Time

Time Attributes

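Time-based operations (such as window operations in the Table API and SQL) need information about the time semantics in use and about where the time data comes from.
A Table can provide a logical time field that table programs use to indicate time and to access the corresponding timestamp.
A time attribute can be part of every table schema. Once defined, it can be referenced like any other field and used in time-based operations.
A time attribute behaves like a regular timestamp: it can be read and used in calculations.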

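Defining Processing Time

Under processing-time semantics, a table program produces results based on the local machine time. It is the simplest notion of time: it requires neither timestamp extraction nor watermark generation. A processing-time attribute can be defined in three ways:

When converting a DataStream into a table
When defining the Table Schema
In the DDL that creates the table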

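When converting a DataStream into a table

While defining the schema, use .proctime on a field name to define the processing-time field. The proctime attribute can only extend the physical schema by appending a logical field, so it can only be defined at the end of the schema definition.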

import java.sql.Timestamp

import com.atguigu.bean.SensorReading
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.table.api.Table
import org.apache.flink.table.api.scala._

object ProcessingTimeTest {
  def main(args: Array[String]): Unit = {

    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Enable event-time semantics (not needed for the processing-time attribute itself)
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // Create the table environment
    val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)

    val inputDStream: DataStream[String] = env.readTextFile("D:\\MyWork\\WorkSpaceIDEA\\flink-tutorial\\src\\main\\resources\\SensorReading.txt")

    val dataDStream: DataStream[SensorReading] = inputDStream.map(
      data => {
        val dataArray: Array[String] = data.split(",")
        SensorReading(dataArray(0), dataArray(1).toLong, dataArray(2).toDouble)
      })
      .assignTimestampsAndWatermarks( new BoundedOutOfOrdernessTimestampExtractor[SensorReading]
      ( Time.seconds(1) ) {
        override def extractTimestamp(element: SensorReading): Long = element.timestamp * 1000L
      } )

    // Define the processing-time field with proctime
    val dataTable: Table = tableEnv
      .fromDataStream(dataDStream, 'id, 'temperature, 'timestamp, 'pt.proctime)

    // Query
    val resultTable: Table = dataTable
      .select('id, 'temperature,'pt) // select id, temperature and the processing-time field
      .filter('id === "sensor_1") // keep only the rows for sensor_1

    // Print the result for testing
    resultTable.toAppendStream[ (String, Double, Timestamp) ].print( "process" )
    // Print the table schema
    dataTable.printSchema()

    env.execute(" table ProcessingTime test job")
  }
}
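
The listing above imports com.atguigu.bean.SensorReading without showing it. Judging from how it is used (three comma-separated fields, a timestamp multiplied by 1000L), it is presumably a simple case class. The following is a minimal sketch under that assumption; the helper object SensorReadingParser, the trimming of whitespace, and the sample input in the usage note are hypothetical and not part of the original project.

```scala
// Assumed shape of the bean used in the examples: id, epoch seconds, temperature.
final case class SensorReading(id: String, timestamp: Long, temperature: Double)

// Hypothetical helper that mirrors the map() and extractTimestamp() logic above.
object SensorReadingParser {
  // Split a CSV line into its three fields and build a SensorReading.
  def parse(line: String): SensorReading = {
    val fields: Array[String] = line.split(",").map(_.trim)
    SensorReading(fields(0), fields(1).toLong, fields(2).toDouble)
  }

  // Convert the reading's timestamp from seconds to milliseconds,
  // the unit the timestamp extractor in the listing returns.
  def timestampMillis(reading: SensorReading): Long = reading.timestamp * 1000L
}
```

For instance, SensorReadingParser.parse("sensor_1, 1547718199, 35.8") would yield SensorReading("sensor_1", 1547718199L, 35.8), assuming the input file uses that three-column layout.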

When defining the Table Schema

When defining the schema, simply add a new field and mark it as proctime.

import java.sql.Timestamp

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{DataTypes, Table}
import org.apache.flink.table.api.scala._
import org.apache.flink.table.descriptors.{Csv, FileSystem, Kafka, OldCsv, Schema}

object ProcessingTimeSchemaTest {
  def main(args: Array[String]): Unit = {

    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Create the table environment
    val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)

    tableEnv.connect( new Kafka()
      .version( "0.11" ) // Kafka version
      .topic( "sensor" ) // topic
      .property("zookeeper.connect", "hadoop102:2181")
      .property("bootstrap.servers", "hadoop102:9092")
    )
      .withFormat( new Csv() ) // the new Csv format descriptor
      .withSchema( new Schema()
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temperature", DataTypes.DOUBLE())
        .field("pt", DataTypes.TIMESTAMP(3))
        .proctime() // mark the preceding field as the processing-time attribute
      )
      .createTemporaryTable( "proctimeInputTable" )

    val dataTable: Table = tableEnv.from("proctimeInputTable")

    // Query
    val resultTable: Table = dataTable
      .select('id, 'temperature, 'pt) // select id, temperature and the processing-time field
      .filter('id === "sensor_1") // keep only the rows for sensor_1

    // Print the result for testing
    resultTable.toAppendStream[ (String, Double, Timestamp) ].print( "process" )
    // Print the table schema
    dataTable.printSchema()

    env.execute(" table ProcessingTime test job")
  }
}

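Not every connector supports this. Connecting to FileSystem, for example, fails immediately, whereas the Kafka table source implements the two interfaces required for time attributes (DefinedProctimeAttribute and DefinedRowtimeAttributes in Flink's legacy table source API), so it works.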

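In the DDL that creates the table

In the DDL that creates the table, add a field and declare it as proctime; this also designates the current time field. (Not supported at present.)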

import java.sql.Timestamp

import com.atguigu.bean.SensorReading
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.{DataTypes, Table}
import org.apache.flink.table.descriptors.{Csv, Kafka, Schema}

object ProcessingTimeSqlTest {
  def main(args: Array[String]): Unit = {

    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Create the table environment
    val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)

    val sinkDDL: String =
      """
        |create table dataTable (
        |  id varchar(20) not null,
        |  ts bigint,
        |  temperature double,
        |  pt AS PROCTIME()
        |) with (
        |  'connector.type' = 'filesystem',
        |  'connector.path' = 'file:D:\MyWork\WorkSpaceIDEA\flink-tutorial\src\main\resources\SensorReading.txt',
        |  'format.type' = 'csv'
        |)
  """.stripMargin

    tableEnv.sqlUpdate(sinkDDL)
    
    val dataTable: Table = tableEnv.from("dataTable")
    // Query
    val resultTable: Table = dataTable
      .select('id, 'temperature, 'pt) // select id, temperature and the processing-time field
      .filter('id === "sensor_1") // keep only the rows for sensor_1

    // Print the result for testing
    resultTable.toAppendStream[ (String, Double, Timestamp) ].print( "process" )

    // Print the table schema (the DDL was already executed above, so it is not run again here)
    dataTable.printSchema()

    env.execute(" table ProcessingTimeSqlTest test job")
  }
}

Defining Event Time

When converting a DataStream into a table

When converting a DataStream into a Table, use .rowtime to define the event-time attribute.

// Convert the DataStream into a Table and mark an existing field as the event-time attribute
val sensorTable = tableEnv.fromDataStream(dataStream,
                       'id, 'timestamp.rowtime, 'temperature)
// Or append a new event-time field
val sensorTable = tableEnv.fromDataStream(dataStream,
                       'id, 'temperature, 'timestamp, 'rt.rowtime)

// Or bind directly to the time field already used for the watermark (the alias is optional)
val sensorTable = tableEnv.fromDataStream(dataStream,
                       'id, 'temperature, 'timestamp.rowtime as 'ts)

When defining the Table Schema

import java.sql.Timestamp

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._
import org.apache.flink.table.api.{DataTypes, Table}
import org.apache.flink.table.descriptors.{Csv, Kafka, Rowtime, Schema}

object EventTimeSchemaTest {
  def main(args: Array[String]): Unit = {

    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)

    // Create the table environment
    val tableEnv: StreamTableEnvironment = StreamTableEnvironment.create(env)

    tableEnv.connect( new Kafka()
      .version( "0.11" ) // Kafka version
      .topic( "sensor" ) // topic
      .property("zookeeper.connect", "hadoop102:2181")
      .property("bootstrap.servers", "hadoop102:9092")
    )
      .withFormat( new Csv() ) // the new Csv format descriptor
      .withSchema( new Schema()
        .field("id", DataTypes.STRING())
        .field("timestamp", DataTypes.BIGINT())
        .field("temperature", DataTypes.DOUBLE())
        .field("rt", DataTypes.TIMESTAMP(3)) // logical event-time field; rowtime() applies to the field declared just before it
        .rowtime(
          new Rowtime()
            .timestampsFromField("timestamp") // extract the timestamp from this field
            .watermarksPeriodicBounded(1000) // watermark lags by 1 second
        )
      )
      .createTemporaryTable( "eventTimeInputTable" )

    val dataTable: Table = tableEnv.from("eventTimeInputTable")

    // Query
    val resultTable: Table = dataTable
      .select('id, 'temperature, 'rt) // select id, temperature and the event-time field
      .filter('id === "sensor_1") // keep only the rows for sensor_1

    // Print the result for testing
    resultTable.toAppendStream[ (String, Double, Timestamp) ].print( "process" )
    // Print the table schema
    dataTable.printSchema()

    env.execute(" table EventTimeSchemaTest test job")
  }
}
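
The call watermarksPeriodicBounded(1000) in the listing above asks for watermarks that trail the largest timestamp seen so far by a fixed bound. The following is a minimal plain-Scala sketch of that idea, not Flink's actual implementation; the class name BoundedWatermarkSketch and its methods are hypothetical and exist only for illustration.

```scala
// Illustrative model of a bounded out-of-orderness watermark:
// watermark = (largest event timestamp seen so far) - (allowed delay).
final class BoundedWatermarkSketch(maxDelayMs: Long) {
  // Start low enough that subtracting the delay cannot underflow.
  private var maxTimestampMs: Long = Long.MinValue + maxDelayMs

  // Record an event; out-of-order (older) events never move the maximum back.
  def onEvent(timestampMs: Long): Unit = {
    maxTimestampMs = math.max(maxTimestampMs, timestampMs)
  }

  // The watermark asserts that no event older than this value is still expected.
  def currentWatermark: Long = maxTimestampMs - maxDelayMs
}
```

With a delay of 1000 ms, an event at 5000 puts the watermark at 4000; a late event at 3000 leaves it unchanged, and an event at 7000 advances it to 6000.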

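In the DDL that creates the table

Here FROM_UNIXTIME is a built-in time function that converts an integer (a number of seconds) into a date-time string in the format "YYYY-MM-DD hh:mm:ss" (the default; a different format can be passed as a second String argument). TO_TIMESTAMP then converts that string into a Timestamp.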

val sinkDDL: String =
"""
|create table dataTable (
|  id varchar(20) not null,
|  ts bigint,
|  temperature double,
|  rt AS TO_TIMESTAMP( FROM_UNIXTIME(ts) ),
|  watermark for rt as rt - interval '1' second
|) with (
|  'connector.type' = 'filesystem',
|  'connector.path' = 'file:///D:\\..\\sensor.txt',
|  'format.type' = 'csv'
|)
""".stripMargin
tableEnv.sqlUpdate(sinkDDL) // execute the DDL
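
To make the computed column and the watermark clause in this DDL more concrete, the following plain-Scala sketch imitates the same three steps with java.time: seconds to string, string to timestamp, timestamp minus one second. It is only an approximation for illustration. The object UnixTimeSketch and its method names are hypothetical, and it assumes UTC, whereas the SQL functions may apply the session time zone.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Plain-Scala imitation of: rt AS TO_TIMESTAMP(FROM_UNIXTIME(ts)),
// followed by: watermark for rt as rt - interval '1' second.
object UnixTimeSketch {
  private val formatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Like FROM_UNIXTIME(ts): epoch seconds to a date-time string (UTC assumed here).
  def fromUnixTime(epochSeconds: Long): String =
    LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC).format(formatter)

  // Like TO_TIMESTAMP(string): parse the date-time string back into a timestamp value.
  def toTimestamp(text: String): LocalDateTime =
    LocalDateTime.parse(text, formatter)

  // Like the watermark clause: the watermark trails the row time by one second.
  def watermarkFor(rowTime: LocalDateTime): LocalDateTime =
    rowTime.minusSeconds(1)
}
```

Under these assumptions, fromUnixTime(90061L) gives "1970-01-02 01:01:01", and the watermark derived from that row time is one second earlier.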
