Hive|如何避免數據傾斜

原創

2021-05-31 11:44

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. hive中桶的概述","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於每一個表（table）或者分區， Hive可以進一步組織成桶，也就是說桶是更爲細粒度的數據範圍劃分。Hive也是針對某一列進行桶的組織。Hive採用對列值哈希，然後除以桶的個數求餘的方式決定該條記錄存放在哪個桶當中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"把表（或者分區）組織成桶（Bucket）有兩個理由：","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"（1）獲得更高的查詢處理效率。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"桶爲表加上了額外的結構，Hive 在處理有些查詢時能利用這個結構。具體而言，連接兩個在（包含連接列的）相同列上劃分了桶的表，可以使用 Map 端連接（Map-side join）高效的實現。比如JOIN操作。對於JOIN操作兩個表有一個相同的列，如果對這兩個表都進行了桶操作。那麼將保存相同列值的桶進行JOIN操作就可以，可以大大較少JOIN的數據量。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"（2）使取樣（sampling）更高效。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在處理大規模數據集時，在開發和修改查詢的階段，如果能在數據集的一小部分數據上試運行查詢，會帶來很多方便。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"創建帶桶的 table","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"create table bucketed_user(id int,name string) clustered by (id) sorted by(name) into 4 buckets row format delimited fields terminated by '\\t' stored as textfile; ","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先，我們來看如何告訴Hive—個表應該被劃分成桶。我們使用CLUSTERED BY 子句來指定劃分桶所用的列和要劃分的桶的個數：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"CREATE TABLE bucketed_user (id INT) name STRING)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"CLUSTERED BY (id) INTO 4 BUCKETS;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在這裏，我們使用用戶ID來確定如何劃分桶(Hive使用對值進行哈希並將結果除以桶的個數取餘數。這樣，任何一桶裏都會有一個隨機的用戶集合（PS：其實也能說是隨機，不是嗎？）。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於map端連接的情況，兩個表以相同方式劃分桶。處理左邊表內某個桶的 mapper知道右邊表內相匹配的行在對應的桶內。因此，mapper只需要獲取那個桶 (這只是右邊表內存儲數據的一小部分)即可進行連接。這一優化方法並不一定要求兩個表必須桶的個數相同，兩個表的桶個數是倍數關係也可以。用HiveQL對兩個劃分了桶的表進行連接，可參見“map連接”部分（P400）。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"桶中的數據可以根據一個或多個列另外進行排序。由於這樣對每個桶的連接變成了高效的歸併排序(merge-sort), 因此可以進一步提升map端連接的效率。以下語法聲明一個表使其使用排序桶：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"CREATE TABLE bucketed_users (id INT, name STRING)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們如何保證表中的數據都劃分成桶了呢？把在Hive外生成的數據加載到劃分成桶的表中，當然是可以的。其實讓Hive來劃分桶更容易。這一操作通常針對已有的表。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hive並不檢查數據文件中的桶是否和表定義中的桶一致(無論是對於桶的數量或用於劃分桶的列）。如果兩者不匹配，在査詢時可能會碰到錯誤或未定義的結果。因此，建議讓Hive來進行劃分桶的操作。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2.hive中join優化","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"mapside join","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方法一：","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"select /*+ MAPJOIN(time_dim) ","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"/ count(","attrs":{}},{"type":"text","text":") from","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"store_sales join time_dim on (ss_sold_time_sk = t_time_sk)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方法二：這個可以由hive自動進行map端join","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.auto.convert.join=true;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"select count(*) from","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"store_sales join time_dim on (ss_sold_time_sk = t_time_sk)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"執行下面這段代碼將會產生兩個map-only方法：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"select /*+ MAPJOIN(time_dim, date_dim) ","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"/ count(","attrs":{}},{"type":"text","text":") from","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"store_sales","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"join time_dim on (ss_sold_time_sk = t_time_sk)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"join date_dim on (ss_sold_date_sk = d_date_sk)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"where t_hour = 8 and d_year = 2002","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"設置下面兩個屬性hive將會進行自動執行上述過程，第一個屬性默認爲true，第二個屬性是設置map端join適合讀取內存文件的大小。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.auto.convert.join.noconditionaltask = true;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.auto.convert.join.noconditionaltask.size = 10000000;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Sort-Merge-Bucket (SMB) joins可以被轉化爲SMB map joins。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們只需要設置一下幾個參數即可：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.auto.convert.sortmerge.join=true;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.optimize.bucketmapjoin = true;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.optimize.bucketmapjoin.sortedmerge = true;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"已設置大表選擇的策略","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用下面屬性：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.auto.convert.sortmerge.join.bigtable.selection.policy= org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ; ","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"幾種策略設置","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"org.apache.hadoop.hive.ql.optimizer.AvgPartitionSizeBasedBigTableSelectorForAutoSMJ (default)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"org.apache.hadoop.hive.ql.optimizer.LeftmostBigTableSelectorForAutoSMJ","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"詳細請參考一下連接：hive中進行連接方案詳解(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3，數據傾斜","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hive的執行是分階段的，map處理數據量的差異取決於上一個stage的reduce輸出，所以如何將數據均勻的分配到各個reduce中，就是解決數據傾斜的根本所在。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"操作","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/51/51dafb8a0725a2b46941898f5c5ba58f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"原因","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1，數據在節點上分佈不均","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2，key分佈不均(key中存在個別值數據量比較大，比如NULL,那麼join時就會容易發生數據傾斜)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3，count(disctinct key)，在數據兩比較大的時候容易發生數據傾斜，因爲count（distinct）是按照group by字段進行分組的","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4，group by的使用容易造成數據傾斜","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"5，業務數據本身的特性","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"6，建表時考慮不周","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"7，某些SQL語句本身就有數據傾斜","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"表現","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"任務進度長時間維持在99%左右，查看任務監控頁面發現只有少量reduce任務未完成。因爲其處理的數據量和其他reduce差異過大。單一reduce的記錄數與平均記錄數差異過大，通常可能達到3倍甚至更多。最長時長遠大於平均時長。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4，數據傾斜的解決方案","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"參數調節","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.map.aggr=true;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"Map 端部分聚合，相當於Combiner","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.groupby.skewindata=true;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有數據傾斜的時候進行負載均衡，當選項設定爲 true，生成的查詢計劃會有兩個 MR Job。第一個 MR Job 中，Map 的輸出結果集合會隨機分佈到 Reduce 中，每個 Reduce 做部分聚合操作，並輸出結果，這樣處理的結果是相同的 Group By Key 有可能被分發到不同的 Reduce 中，從而達到負載均衡的目的；第二個 MR Job 再根據預處理的數據結果按照 Group By Key 分佈到 Reduce 中（這個過程可以保證相同的 Group By Key 被分佈到同一個 Reduce 中），最後完成最終的聚合操作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/cf/cfff8b2606c024fb98faf853b2a02740.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"SQL語句調節：","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何Join 關於驅動表的選取，選用join key分佈最均勻的表作爲驅動表做好列裁剪和filter操作，以達到兩表做join的時候，數據量相對變小的效果。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大小表Join 使用map join讓小的維度表（1000條以下的記錄條數）先進內存。在map端完成reduce.","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大表Join大表把空值的key變成一個字符串加上隨機數，把傾斜的數據分到不同的reduce上，由於null值關聯不上，處理後並不影響最終結果。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"count distinct大量相同特殊值容易傾斜，當xx字段存在大量的某個值時，NULL或者空的記錄","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決思路將特定的值，進行特定的處理。比如是null，過濾掉，where case 特定方式轉換特定的值，使得這些值不一樣，同時這些值不影響分析。","attrs":{}}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"group by維度過小：Group by在Map端進行部分數據合併","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.map.aggr ; --> 是否在Map端進行數據聚合，默認設置爲true； set hive.groupby.mapaggr.checkinterval ; --> 在Map端進行聚合操作的條目數。 ","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"進行負載均衡負載均衡","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"set hive.groupby.skewindata ; 默認值是false，需要設置成true ; 當設置爲true時，會變成兩個MapReduce ; ","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第一個MR JOb中，map的輸出結果會隨機分佈到Reduce中，每個Reduce做部分聚合操作，並輸出結果，這樣出來的結果相同的Group By Key有可能被分發到不同的Reduce中，從而達到輔助均衡目的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"第二個MR JOb，會根據預處理數據結果按照key分佈到Reduce中，最終完成聚合操作。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5，典型的業務場景","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"空值產生的數據傾斜","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"場景：如日誌中，常會有信息丟失的問題，比如日誌中的 user_id，如果取其中的 user_id 和用戶表中的user_id 關聯，會碰到數據傾斜的問題。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決方法1：user_id爲空的不參與關聯","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"s","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"elect * from log a","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"join users b","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"on a.user_id is not null","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"and a.user_id = b.user_id","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"union all","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"select * from log a","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"where a.user_id is null;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決方法2 ：賦與空值分新的key值","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"`select *","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"from log a","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"left outer join users b","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"on case when a.user_id is null then concat(‘hive’,rand() ) else a.user_id end = b.user_id;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"`","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"結論：方法2比方法1效率更好，不但io少了，而且作業數也少了。解決方法1中 log讀取兩次，jobs是2。解決方法2 job數是1 。這個優化適合無效 id (比如 -99 , ’’, null 等) 產生的傾斜問題。把空值的 key 變成一個字符串加上隨機數，就能把傾斜的數據分到不同的reduce上 ,解決數據傾斜問題。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"不同數據類型關聯產生數據傾斜","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"場景：用戶表中user_id字段爲int，log表中user_id字段既有string類型也有int類型。當按照user_id進行兩個表的Join操作時，默認的Hash操作會按int型的id來進行分配，這樣會導致所有string類型id的記錄都分配到一個Reducer中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決方法：把數字類型轉換成字符串類型","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"`select * from users a","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"left outer join logs b","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"on a.usr_id = cast(b.user_id as string)","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"`","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"小表不小不大，怎麼用 map join 解決傾斜問題","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使用 map join解決小表(記錄數少)關聯大表的數據傾斜問題，這個方法使用的頻率非常高，但如果小表很大，大到map join會出現bug或異常，這時就需要特別的處理。以下例子:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"`select * from log a","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"left outer join users b","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"on a.user_id = b.user_id;","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"`","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"users 表有 600w+ 的記錄，把 users 分發到所有的 map 上也是個不小的開銷，而且 map join 不支持這麼大的小表。如果用普通的 join，又會碰到數據傾斜的問題。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"解決方法：","attrs":{}}]}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"`select /","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"+mapjoin(x)","attrs":{}},{"type":"text","text":"/* from log a","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"left outer join (","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"text"},"content":[{"type":"text","text":"select /*+mapjoin(c)*/d.* \n \n from ( select distinct user_id from log ) c \n \n join users d \n \n on c.user_id = d.user_id \n \n) x \n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"on a.user_id = b.user_id;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"`","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"假如，log裏user_id有上百萬個，這就又回到原來map join問題。所幸，每日的會員uv不會太多，有交易的會員不會太多，有點擊的會員不會太多，有佣金的會員不會太多等等。所以這個方法能解決很多場景下的數據傾斜問題。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6，數據傾斜總結","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"使map的輸出數據更均勻的分佈到reduce中去，是我們的最終目標。由於Hash算法的侷限性，按key Hash會或多或少的造成數據傾斜。大量經驗表明數據傾斜的原因是人爲的建表疏忽或業務邏輯可以規避的。在此給出較爲通用的步驟：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1、採樣log表，哪些user_id比較傾斜，得到一個結果表tmp1。由於對計算框架來說，所有的數據過來，他都是不知道數據分佈情況的，所以採樣是並不可少的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2、數據的分佈符合社會學統計規則，貧富不均。傾斜的key不會太多，就像一個社會的富人不多，奇特的人不多一樣。所以tmp1記錄數會很少。把tmp1和users做map join生成tmp2,把tmp2讀到distribute file cache。這是一個map過程。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"3、map讀入users和log，假如記錄來自log,則檢查user_id是否在tmp2裏，如果是，輸出到本地文件a,否則生成的key,value對，假如記錄來自member,生成的key,value對，進入reduce階段。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"4、最終把a文件，把Stage3 reduce階段輸出的文件合併起寫到hdfs。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果確認業務需要這樣傾斜的邏輯，考慮以下的優化方案：","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"1、對於join，在判斷小表不大於1G的情況下，使用map join","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"2、對於group by或distinct，設定","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"hive.groupby.skewindata=true ","attrs":{}}],"attrs":{}}]}]}

發表評論

所有評論

還沒有人評論，想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.

相關文章

618網購節，電商能擋住惡意網絡爬蟲的攻擊嗎？

2023年，杭州中院審結了兩起涉及“搬店軟件”的不正當競爭案件。本案的原告是國內某大型知名電子商務平臺的運營主體，而被告則是開發了一款名爲“某搬家快速商品上貨批量發佈”的複製軟件，被控非法獲取平臺商品信息並在其他服務市場銷售。根據原告的訴

2024-06-07 00:14:57

提高數據抓取效率：Swift中Crawler的併發管理

前言數據的獲取和處理能力成爲衡量一個應用性能的重要標準。網絡爬蟲作爲數據抓取的重要工具，其效率直接影響到數據獲取的質量和速度。Swift語言以其出色的性能和簡潔的語法，成爲了許多開發者編寫網絡爬蟲的首選語言。本文將詳細介紹如何在Swi

2024-06-07 00:06:36

超越預期：Containerd 如何成爲 Kubernetes 的首選容器運行時

> 作者：尹珉，KubeSphere Ambassado，rKubeSphere Contributor，KubeSphere 社區用戶委員會杭州站站長。踏上 Containerd 技術之旅容器技術已經成爲現代軟件開發和部署的核心工

2024-06-06 23:16:14

原來Stable Diffusion是這樣工作的

stable diffusion是一種潛在擴散模型，可以從文本生成人工智能圖像。爲什麼叫做潛在擴散模型呢？這是因爲與在高維圖像空間中操作不同，它首先將圖像壓縮到潛在空間中，然後再進行操作。在這篇文章中，我們將深入瞭解它到底是如何工作的,還

2024-06-06 21:38:48

首批！Zilliz 獲得亞馬遜雲科技生成式 AI 合作伙伴能力認證

Zilliz 正式宣佈通過亞馬遜雲科技生成式 AI 能力認證！這一認證不僅肯定了 Zilliz 在人工智能和非結構化數據領域的卓越能力，也標誌着 Zilliz 在推動 AI 技術創新和應用的道路上邁出了重要一步。亞馬遜雲科技生

2024-06-06 14:16:04

一文搞懂 Spring 循環依賴

這個其實是一個特別高頻的面試題，松哥也一直很想和大家仔細來聊一聊這個話題，網上關於這塊的文章很多，但是我一直覺得要把這個問題講清楚還有點難度，今天我來試一試，看能不能和小夥伴們把這個問題梳理清楚，當然，如果小夥伴們覺得看文章不過癮，松哥也有

2024-06-06 13:11:47

原來 pt-osc 改表是這樣實現的！原理詳解【附場景案例】

pt-osc原理探索及其觸發器的深入分析 > 作者：莫善，某互聯網公司高級 DBA。 > > 愛可生開源社區出品，原創內容未經授權不得隨意使用，轉載請聯繫小編並註明來源。 > > 本文約 6000 字，預計閱讀需要 20 分鐘。背景自工

2024-06-06 11:58:38

營銷系統黑名單優化：位圖的應用解析

背景營銷系統中，客戶投訴是業務發展的一大阻礙，一般會過濾掉黑名單高風險賬號，並配合頻控策略，來減少客訴，進而增加營銷效率，減少營銷成本，提升營銷質量。營銷系統一般是通過大數據分析建模，在CDP（客戶數據平臺，以客戶爲核心，圍繞數據融

京東雲開發者

2024-06-06 11:54:12

跨越雲端，華爲雲技術專家分享高效跨雲遷移實踐

本文分享自華爲雲社區《【華爲雲Stack】【大架光臨】第18期：跨越雲端，華爲雲Stack的高效跨雲遷移實踐》，作者：大架光臨。 1 背景在企業雲化的浪潮中，混合多雲已經是企業IT部署的新常態，虛擬機承載的業務佔據很大的比重。在上雲

2024-06-06 10:56:54

【AI應用開發全流程】使用AscendCL開發板完成模型推理

本文分享自華爲雲社區《【昇騰開發全流程】AscendCL開發板模型推理》，作者：沉迷sk。前言學會如何安裝配置華爲雲ModelArts、開發板Atlas 200I DK A2。並打通一個Ascend910訓練到Ascend310推理

2024-06-05 22:57:15

計算機英文教材太難啃？Higress 和通義千問幫你！

作者：張添翼（澄潭）計算機相關英文教材的中譯本質量堪憂，對於計算機專業的學生來說，應該深有體會。因爲大部分教材的譯者本人可能未必完全喫透書中技術內容，又或者是領域技術大拿，但並不擅長英文翻譯。本文將介紹基於 AI 大語言模型進行英文技術

2024-06-05 21:13:50

ApsaraMQ Copilot for RocketMQ：消息數據集成鏈路的健康管家

作者：文婷引言如何正確使用消息隊列保證業務集成鏈路的穩定性，是消息隊列用戶首要關心的問題。ApsaraMQ Copilot for RocketMQ 從集成業務穩定性、成本、性能等方面幫助用戶更高效地使用產品。背景消息隊列產品通過異

2024-06-05 21:13:47

Apache DolphinScheduler 社區5月月報更新！

各位熱愛 DolphinScheduler 的小夥伴們，社區5月份月報更新啦！這裏將記錄 DolphinScheduler 社區每月的重要更新，歡迎關注，期待下個月你也登上Merge Star月度榜單哦~ 月度Merge Star 感謝以下

2024-06-04 21:22:03

修復 MySQL 8.4 的 "mysql_native_password is not loaded" 插件未加載錯誤

修復 MySQL 8.4 的 "mysql_native_password is not loaded" 插件未加載錯誤將 mysql_native_password 用戶更新到 caching_sha2_password 在具有足夠權限

2024-06-04 14:30:04

界面控件DevExpress WinForms的流程圖組件 - 可完美複製Visio功能（二）

DevExpress WinForms的Diagram（流程圖）組件允許您複製Microsoft Visio中的許多功能，並能在下一個Windows Forms項目中引入信息豐富的圖表、流程圖和組織圖。 P.S：DevExpress Win

2024-06-04 12:32:12

24小時熱門文章

最新文章

最新評論文章