Parsing JSON, including JSON arrays, with Hive's built-in JSON functions

Sample data
This is a JSON string that represents a repayment plan; its plan array contains many installments.

{"plan":[{"principal":"1114.09","interest":"489.14","date":"2018-11-02"},{"principal":"1124.30","interest":"423.03","date":"2018-12-02"},{"principal":"1134.61","interest":"412.72","date":"2019-01-02"},{"principal":"1145.01","interest":"402.32","date":"2019-02-02"},{"principal":"1155.50","interest":"391.83","date":"2019-03-02"},{"principal":"1166.10","interest":"381.23","date":"2019-04-02"},{"principal":"1176.78","interest":"370.55","date":"2019-05-02"},{"principal":"1187.57","interest":"359.76","date":"2019-06-02"},{"principal":"1198.46","interest":"348.87","date":"2019-07-02"},{"principal":"1209.44","interest":"337.89","date":"2019-08-02"},{"principal":"1220.53","interest":"326.80","date":"2019-09-02"},{"principal":"1231.72","interest":"315.61","date":"2019-10-02"},{"principal":"1243.01","interest":"304.32","date":"2019-11-02"},{"principal":"1254.40","interest":"292.93","date":"2019-12-02"},{"principal":"1265.90","interest":"281.43","date":"2020-01-02"},{"principal":"1277.51","interest":"269.82","date":"2020-02-02"},{"principal":"1289.22","interest":"258.11","date":"2020-03-02"},{"principal":"1301.03","interest":"246.30","date":"2020-04-02"},{"principal":"1312.96","interest":"234.37","date":"2020-05-02"},{"principal":"1325.00","interest":"222.33","date":"2020-06-02"},{"principal":"1337.14","interest":"210.19","date":"2020-07-02"},{"principal":"1349.40","interest":"197.93","date":"2020-08-02"},{"principal":"1361.77","interest":"185.56","date":"2020-09-02"},{"principal":"1374.25","interest":"173.08","date":"2020-10-02"},{"principal":"1386.85","interest":"160.48","date":"2020-11-02"},{"principal":"1399.56","interest":"147.77","date":"2020-12-02"},{"principal":"1412.39","interest":"134.94","date":"2021-01-02"},{"principal":"1425.34","interest":"121.99","date":"2021-02-02"},{"principal":"1438.40","interest":"108.93","date":"2021-03-02"},{"principal":"1451.59","interest":"95.74","date":"2021-04-02"},{"principal":"1464.90","interest":"82.43","date":"2021-05-02"},{"principal":"1478.32","interest":"69.01","date":"2021-06-02"},{"principal":"1491.87","interest":"55.46","date":"2021-07-02"},{"principal":"1505.55","interest":"41.78","date":"2021-08-02"},{"principal":"1519.35","interest":"27.98","date":"2021-09-02"},{"principal":"1533.28","interest":"14.05","date":"2021-10-02"}]}

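Below is the DDL for the target table. The requirement is to turn one JSON document into multiple rows, one per installment, and to add a repayment period number to each row.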

CREATE TABLE `app.app_cpdji_repayment_plan`
  (
    `platform_no` string COMMENT 'Social credit code'
    , `project_id` string COMMENT 'Project number'
    , `contract_id` string COMMENT 'Contract number'
    , `repayment_periods` INT COMMENT 'Repayment period number'
    , `repayment_date` string COMMENT 'Repayment date'
    , `principal` DECIMAL(20,2) COMMENT 'Principal due'
    , `interest` DECIMAL(20,2) COMMENT 'Interest due'
  )
  COMMENT 'Repayment plan' STORED AS PARQUET;

  1. Extract the plan array from the JSON
 SELECT get_json_object(repay_plan, '$.plan') FROM app.app_cpdji_view_ods_prodc_inv_contract;
  2. Separate the objects in the array: strip the square brackets, turn the '},{' between objects into '};{', then split on ';' to get one JSON object per array element
 SELECT split(regexp_replace( regexp_replace( regexp_replace(get_json_object(repay_plan, '$.plan'),'\\[','') , '\\]','') ,'\\}\\,\\{' ,'\\}\\;\\{') ,'\\;') FROM app.app_cpdji_view_ods_prodc_inv_contract;
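  3. Explode the array into rows (one row per installment) and extract the fields with json_tuple
 SELECT DATE1, principal, interest
 FROM app.app_cpdji_view_ods_prodc_inv_contract LATERAL VIEW explode( split(regexp_replace( regexp_replace( regexp_replace(get_json_object(repay_plan, '$.plan'),'\\[','') , '\\]','') ,'\\}\\,\\{' ,'\\}\\;\\{') ,'\\;')) course_scores AS json LATERAL VIEW json_tuple(json,'date','principal','interest') d AS DATE1
   , principal
   ,interest;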
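
To see what the nested regexp_replace and split calls do to the string, the chain can be mirrored outside Hive. The following is a minimal Python sketch, not Hive code: it assumes Python's re.sub and str.split behave like Hive's regexp_replace and split for these simple patterns, it uses a shortened two-installment sample, and the function name split_json_array is hypothetical.

```python
import json
import re

# Shortened sample: the first two installments of the plan array.
plan = ('[{"principal":"1114.09","interest":"489.14","date":"2018-11-02"},'
        '{"principal":"1124.30","interest":"423.03","date":"2018-12-02"}]')


def split_json_array(array_text):
    """Mirror the regexp_replace/split chain used in the Hive query."""
    no_open = re.sub(r"\[", "", array_text)         # drop every '['
    no_brackets = re.sub(r"\]", "", no_open)        # drop every ']'
    marked = re.sub(r"\},\{", "};{", no_brackets)   # '},{' becomes '};{'
    return marked.split(";")                        # one JSON object per element


elements = split_json_array(plan)

# Rough equivalent of json_tuple(json, 'date', 'principal', 'interest').
parsed = [
    (obj["date"], obj["principal"], obj["interest"])
    for obj in (json.loads(element) for element in elements)
]

for row in parsed:
    print(row)
```

Because every '[' and ']' is removed and the text is split on every ';', this approach relies on the field values never containing those characters. That holds for the amounts and dates in this plan, but it would not be safe for free-text fields.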
  4. Finish the processing: number the installments in date order and load the target table
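-- The insert is positional: project_no and contract_no fill project_id and contract_id.
INSERT INTO app.app_cpdji_repayment_plan
SELECT platform_no
  ,project_no
  ,contract_no
  -- Repayment period number: rank the installments by date within each contract.
  -- DATE1 is a yyyy-MM-dd string, so string order matches date order.
  ,ROW_NUMBER() OVER(
                   PARTITION BY project_no
                     ,contract_no
                     ,platform_no
                   ORDER BY DATE1 ) rt
  ,DATE1
  ,principal
  ,interest
FROM ( SELECT json
    ,project_no
    ,contract_no
    ,platform_no
  -- Inner query: one row per installment object.
  FROM app.app_cpdji_view_ods_prodc_inv_contract LATERAL VIEW explode( split(regexp_replace( regexp_replace( regexp_replace(get_json_object(repay_plan, '$.plan'),'\\[','') , '\\]','') ,'\\}\\,\\{' ,'\\}\\;\\{') ,'\\;')) course_scores AS json) AS table1
-- Outer lateral view: pull the three fields out of each object.
LATERAL VIEW json_tuple(json,'date','principal','interest') d AS DATE1
  , principal
  ,interest;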
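
The period numbering done by ROW_NUMBER() can be illustrated on a few rows. This is a minimal Python sketch of the same idea, not the Hive execution: the sample rows, the identifiers such as "P1" and "C1", and the function name number_periods are hypothetical, and Decimal stands in for the DECIMAL(20,2) columns.

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter

# Hypothetical exploded rows:
# (platform_no, project_no, contract_no, date, principal, interest)
rows = [
    ("P1", "PRJ1", "C1", "2018-12-02", "1124.30", "423.03"),
    ("P1", "PRJ1", "C1", "2018-11-02", "1114.09", "489.14"),
    ("P1", "PRJ1", "C2", "2019-01-02", "1134.61", "412.72"),
]


def number_periods(rows):
    """Number installments by date within each (project, contract, platform)."""
    partition_key = itemgetter(1, 2, 0)  # project_no, contract_no, platform_no
    result = []
    for _, group in groupby(sorted(rows, key=partition_key), key=partition_key):
        # yyyy-MM-dd strings sort chronologically, so a plain text sort is enough.
        ordered = sorted(group, key=itemgetter(3))
        for period, row in enumerate(ordered, start=1):
            platform, project, contract, date, principal, interest = row
            result.append((platform, project, contract, period, date,
                           Decimal(principal), Decimal(interest)))
    return result


numbered = number_periods(rows)
for row in numbered:
    print(row)
```

Each contract starts again at period 1, which is what partitioning by project, contract, and platform achieves in the query.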

That completes this simple parsing job. Hive's built-in functions handle this kind of JSON parsing well.
Run time
The job took a little over 1,100 seconds.

With about 20 billion records in total, that is excellent throughput.
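
As a rough back-of-the-envelope check of that throughput, the two figures quoted in this post can be divided. This Python sketch assumes exactly 20 billion records and exactly 1,100 seconds; the real run took slightly longer, so the actual rate is somewhat lower.

```python
records = 20_000_000_000  # about 20 billion records, as stated in the post
seconds = 1100            # lower bound: the job took a little over 1,100 seconds

throughput = records / seconds
print(f"{throughput:,.0f} records per second")
```

That works out to roughly 18 million records per second across the cluster.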
