Possibly the most feature-complete Flink SQL demo ever: part 2

This article continues from the previous one, "Possibly the most feature-complete Flink SQL demo ever: part 1".

Five ways to join tables in Flink SQL

Regular join on static tables

A regular join on static tables means joining one static table with another static table.

Example: show the customers and their orders for a specific date, grouped by region and priority.

-- Orders table dev_orders (a static table backed by S3) joined with MySQL tables
SET execution.type=batch;
USE CATALOG hive;
SELECT
  r_name AS `region`,
  o_orderpriority AS `priority`,
  COUNT(DISTINCT c_custkey) AS `number_of_customers`,
  COUNT(o_orderkey) AS `number_of_orders`
FROM dev_orders
JOIN prod_customer ON o_custkey = c_custkey
JOIN prod_nation ON c_nationkey = n_nationkey
JOIN prod_region ON n_regionkey = r_regionkey
WHERE
  FLOOR(o_ordertime TO DAY) = TIMESTAMP '2020-04-01 0:00:00.000'
  AND NOT o_orderpriority = '4-NOT SPECIFIED'
GROUP BY r_name, o_orderpriority
ORDER BY r_name, o_orderpriority;


Regular join on dynamic tables

A regular join on dynamic tables means joining a dynamic table with static tables.

Example: replace the static orders table from the previous example with a dynamic table and run the same business logic.

-- Replace the static orders table dev_orders with the dynamic orders table prod_orders, and remove the ORDER BY clause (not supported by the streaming engine)
SET execution.type=streaming;
USE CATALOG hive;
SELECT
  r_name AS `region`,
  o_orderpriority AS `priority`,
  COUNT(DISTINCT c_custkey) AS `number_of_customers`,
  COUNT(o_orderkey) AS `number_of_orders`
FROM default_catalog.default_database.prod_orders
JOIN prod_customer ON o_custkey = c_custkey
JOIN prod_nation ON c_nationkey = n_nationkey
JOIN prod_region ON n_regionkey = r_regionkey
WHERE
  FLOOR(o_ordertime TO DAY) = TIMESTAMP '2020-04-01 0:00:00.000'
  AND NOT o_orderpriority = '4-NOT SPECIFIED'
GROUP BY r_name, o_orderpriority;

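Notes:

  1. Static tables are loaded only once, when the job starts; later updates to them are not reflected in a job that is already running.
  2. Flink writes the data of all input tables into state, so state grows as the inputs grow.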

Time-interval join (Interval Join)

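An interval join typically serves needs like this: join the events of two (or more) dynamic tables that are related within a time context, for example events that happened at about the same time. Flink SQL applies a special optimization to this kind of join.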

Example: join the line-item table with the orders table to find open (unpaid) line items that belong to urgent orders.

USE CATALOG default_catalog;
SELECT
  o_ordertime AS `ordertime`,
  o_orderkey AS `order`,
  l_linenumber AS `linenumber`,
  l_partkey AS `part`,
  l_suppkey AS `supplier`,
  l_quantity AS `quantity`
FROM prod_lineitem
JOIN prod_orders ON o_orderkey = l_orderkey
WHERE
  l_ordertime BETWEEN o_ordertime - INTERVAL '5' MINUTE AND o_ordertime AND
  l_linestatus = 'O' AND
  o_orderpriority = '1-URGENT';

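Notes:

  1. The WHERE clause must contain a time condition that relates the left and right tables, based on either event-time or processing-time semantics. In this example it is:
l_ordertime BETWEEN o_ordertime - INTERVAL '5' MINUTE AND o_ordertime
  2. Because this example requires l_ordertime BETWEEN o_ordertime - INTERVAL '5' MINUTE AND o_ordertime, Flink only needs to keep about the last 5 minutes of rows in state instead of the full history, which lowers Flink's memory requirements.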
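The bounded-state point above can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not how Flink implements interval joins: the function name `interval_join`, the `(ts, side, key)` event tuples, and the in-order arrival of events are all hypothetical simplifications. It applies the same condition as the query (line-item time within the 5 minutes up to the order time) and evicts rows that can no longer match.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def interval_join(events, lower=timedelta(minutes=5)):
    """Join line items to orders where o_ts - lower <= l_ts <= o_ts.

    `events` is an iterable of (ts, side, key) tuples sorted by ts, where
    side is "order" or "lineitem". Returns (matches, peak_buffered_rows).
    """
    orders = defaultdict(list)     # order key -> buffered order timestamps
    lineitems = defaultdict(list)  # order key -> buffered line-item timestamps
    matches, peak = [], 0

    for ts, side, key in events:
        watermark = ts  # simplification: input is assumed to arrive in order

        # Drop rows that cannot match anything arriving at or after the watermark.
        for buf, cutoff in ((orders, watermark), (lineitems, watermark - lower)):
            for k in list(buf):
                buf[k] = [t for t in buf[k] if t >= cutoff]
                if not buf[k]:
                    del buf[k]

        if side == "order":
            matches += [(key, ts, lt) for lt in lineitems.get(key, [])
                        if ts - lower <= lt <= ts]
            orders[key].append(ts)
        else:
            matches += [(key, ot, ts) for ot in orders.get(key, [])
                        if ot - lower <= ts <= ot]
            lineitems[key].append(ts)

        buffered = (sum(len(v) for v in orders.values())
                    + sum(len(v) for v in lineitems.values()))
        peak = max(peak, buffered)

    return matches, peak


base = datetime(2020, 4, 1)
events = [
    (base, "lineitem", 1),
    (base + timedelta(minutes=1), "order", 1),      # within 5 minutes: joins
    (base + timedelta(minutes=10), "lineitem", 2),
    (base + timedelta(minutes=20), "order", 2),     # 10 minutes apart: no join
]
matches, peak = interval_join(events)
print(len(matches), "match(es); peak buffered rows:", peak)
```

In this toy run the buffer never holds all four rows at once, because rows older than the interval are discarded as time advances.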

Temporal table join (Enrichment Join with Lookup Table in MySQL)

This is the Temporal Table Join. It suits joining an insert-only dynamic table with a static table (one that is never updated or is updated only rarely).

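Example: join the line-item table prod_lineitem (a dynamic table) with the live exchange-rate table prod_rates to compute the amount of orders priced in CNY (the query converts it into the open_in_euro column).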

USE CATALOG default_catalog;

SELECT
  l_proctime AS `querytime`,
  l_orderkey AS `order`,
  l_linenumber AS `linenumber`,
  l_currency AS `currency`,
  rs_rate AS `cur_rate`, 
  (l_extendedprice * (1 - l_discount) * (1 + l_tax)) / rs_rate AS `open_in_euro`
FROM prod_lineitem
JOIN hive.`default`.prod_rates FOR SYSTEM_TIME AS OF l_proctime ON rs_symbol = l_currency
WHERE
  l_linestatus = 'O'
  AND l_currency = 'CNY';
  

Query result:

[Screenshot of the query result]

As the screenshot shows, the CNY exchange rate is 8.0166.

Next, change the CNY exchange rate in the MySQL dimension table to 9.999:

# Update the CNY exchange rate
docker-compose exec mysql mysql -Dsql-demo -usql-demo -pdemo-sql

SELECT * FROM PROD_RATES;

UPDATE PROD_RATES SET RS_TIMESTAMP = '2020-04-01 01:00:00.000', RS_RATE = 9.999 WHERE RS_SYMBOL='CNY';


In the results of the live join, the exchange rate also changes to 9.999.

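Notes:

  1. Processing-time semantics: rows in the static (exchange-rate) table in MySQL are matched according to processing time.
  2. Updates to the MySQL dimension table are reflected in the running job in real time.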

Key syntax:

JOIN hive.`default`.prod_rates FOR SYSTEM_TIME AS OF l_proctime ON rs_symbol = l_currency

The join names the dynamic table's processing-time attribute (l_proctime): FOR SYSTEM_TIME AS OF l_proctime
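To make the processing-time behaviour concrete, here is a minimal Python sketch. It is an illustration only, not Flink's lookup-join implementation: the `lookup_join` function, the `amount` field, and the in-memory `rates` dict standing in for the MySQL table are assumptions. Each row reads whatever rate the table holds at the moment that row is processed, so an update affects later rows but not rows already emitted.

```python
def lookup_join(lineitems, rates):
    """Enrich each line item with the rate stored in `rates` right now.

    `rates` plays the role of the MySQL lookup table (symbol -> rate). It is
    consulted lazily, once per row, at the moment that row is processed.
    """
    for item in lineitems:
        rate = rates.get(item["currency"])
        if rate is not None:  # inner join: skip rows without a matching rate
            yield {**item, "cur_rate": rate,
                   "open_in_euro": item["amount"] / rate}


rates = {"CNY": 8.0166}                      # hypothetical in-memory table
stream = lookup_join(
    [{"order": 1, "currency": "CNY", "amount": 100.0},
     {"order": 2, "currency": "CNY", "amount": 100.0}],
    rates,
)

first = next(stream)      # processed before the update
rates["CNY"] = 9.999      # corresponds to the UPDATE on PROD_RATES
second = next(stream)     # processed after the update

print(first["cur_rate"], second["cur_rate"])
```

The first row keeps the old rate and the second row picks up the new one, which mirrors the change seen in the running job after the MySQL update.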

Temporal table function join (Enrichment Join against Temporal Table)

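A Temporal Table Function Join joins against a changelog, so that each row is matched precisely with the version that was valid at a given point in event time.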

Example: compute the order amount for each currency using the exchange rate in effect at the moment the order was created.

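Taking the requirement from the Temporal Table Join example, replace the MySQL dimension table with a Kafka-backed one (whenever a rate changes, the latest rate is written to Kafka).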

Use the TemporalTableFunction prod_rates_temporal to look up the exchange rate valid at each order's time:

USE CATALOG default_catalog;

SELECT
  l_ordertime AS `ordertime`,
  l_orderkey AS `order`,
  l_linenumber AS `linenumber`,
  l_currency AS `currency`,
  rs_rate AS `cur_rate`, 
  (l_extendedprice * (1 - l_discount) * (1 + l_tax)) / rs_rate AS `open_in_euro`
FROM
  prod_lineitem,
  LATERAL TABLE(prod_rates_temporal(l_ordertime))
WHERE rs_symbol = l_currency AND
  l_linestatus = 'O';

Result:
[Screenshot of the query result]

Notes:

  1. Event-time semantics: rows (exchange rates) in the temporal table (the Kafka topic) are matched according to event time.
  2. An exchange rate is changed by producing a new record to the Kafka topic.

Key syntax:
LATERAL TABLE(prod_rates_temporal(l_ordertime))

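  1. LATERAL TABLE: the keyword used to join against a Temporal Table Function
  2. prod_rates_temporal(l_ordertime): the function that points at the exchange-rate changelog, taking the event time as its argument
  3. As of Flink 1.10, only inner joins are supported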
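The event-time lookup that prod_rates_temporal(l_ordertime) performs can be pictured with the Python sketch below. It is a conceptual model under assumptions rather than Flink's implementation: the `RatesTemporal` class and its methods are hypothetical, the changelog is assumed to arrive in timestamp order per symbol, and the timestamp attached to the 8.0166 rate is made up for illustration (only the 01:00 change to 9.999 comes from the earlier example).

```python
from bisect import bisect_right
from datetime import datetime


class RatesTemporal:
    """Versioned view over a rate changelog, queried by event time."""

    def __init__(self):
        self._times = {}  # symbol -> timestamps of changes, ascending
        self._rates = {}  # symbol -> rates, parallel to self._times

    def append(self, symbol, ts, rate):
        # One changelog record; records are assumed to arrive in ts order.
        self._times.setdefault(symbol, []).append(ts)
        self._rates.setdefault(symbol, []).append(rate)

    def as_of(self, symbol, ts):
        # Return the latest rate whose change time is <= ts, or None.
        i = bisect_right(self._times.get(symbol, []), ts)
        return self._rates[symbol][i - 1] if i else None


rates = RatesTemporal()
rates.append("CNY", datetime(2020, 4, 1, 0, 0), 8.0166)  # illustrative timestamp
rates.append("CNY", datetime(2020, 4, 1, 1, 0), 9.999)

early = rates.as_of("CNY", datetime(2020, 4, 1, 0, 30))   # order placed at 00:30
late = rates.as_of("CNY", datetime(2020, 4, 1, 1, 30))    # order placed at 01:30
print(early, late)
```

Unlike the processing-time lookup, the answer depends only on the order's own timestamp, so an order placed before a rate change still gets the old rate even if it is processed later.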