spark編程模型（十五）之RDD鍵值轉換操作（Transformation Operation）——cogroup、join

cogroup

參數爲1個RDD
- def cogroup[W](other: RDD[(K, W)]): RDD[(K, (Iterable[V], Iterable[W]))]
- def cogroup[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W]))]
- def cogroup[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W]))]
參數爲2個RDD
- def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)]): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
- def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
- def cogroup[W1, W2](other1: RDD[(K, W1)], other2: RDD[(K, W2)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
參數爲3個RDD
- def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)]): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
- def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)], numPartitions: Int): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
- def cogroup[W1, W2, W3](other1: RDD[(K, W1)], other2: RDD[(K, W2)], other3: RDD[(K, W3)], partitioner: Partitioner): RDD[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
cogroup相當於SQL中的全外關聯full outer join，返回左右RDD中的記錄，關聯不上的爲空
參數numPartitions用於指定結果的分區數

參數partitioner用於指定分區函數

var rdd1 = sc.makeRDD(Array(("A","1"),("B","2"),("C","3")),2)
var rdd2 = sc.makeRDD(Array(("A","a"),("C","c"),("D","d")),2)

scala> var rdd3 = rdd1.cogroup(rdd2)
rdd3: org.apache.spark.rdd.RDD[(String, (Iterable[String], Iterable[String]))] = MapPartitionsRDD[12] at cogroup at :25

scala> rdd3.partitions.size
res3: Int = 2

scala> rdd3.collect
res1: Array[(String, (Iterable[String], Iterable[String]))] = Array(
(B,(CompactBuffer(2),CompactBuffer())), 
(D,(CompactBuffer(),CompactBuffer(d))), 
(A,(CompactBuffer(1),CompactBuffer(a))), 
(C,(CompactBuffer(3),CompactBuffer(c)))
)


scala> var rdd4 = rdd1.cogroup(rdd2,3)
rdd4: org.apache.spark.rdd.RDD[(String, (Iterable[String], Iterable[String]))] = MapPartitionsRDD[14] at cogroup at :25

scala> rdd4.partitions.size
res5: Int = 3

scala> rdd4.collect
res6: Array[(String, (Iterable[String], Iterable[String]))] = Array(
(B,(CompactBuffer(2),CompactBuffer())), 
(C,(CompactBuffer(3),CompactBuffer(c))), 
(A,(CompactBuffer(1),CompactBuffer(a))), 
(D,(CompactBuffer(),CompactBuffer(d))))

var rdd1 = sc.makeRDD(Array(("A","1"),("B","2"),("C","3")),2)
var rdd2 = sc.makeRDD(Array(("A","a"),("C","c"),("D","d")),2)
var rdd3 = sc.makeRDD(Array(("A","A"),("E","E")),2)

scala> var rdd4 = rdd1.cogroup(rdd2,rdd3)
rdd4: org.apache.spark.rdd.RDD[(String, (Iterable[String], Iterable[String], Iterable[String]))] = 
MapPartitionsRDD[17] at cogroup at :27

scala> rdd4.partitions.size
res7: Int = 2

scala> rdd4.collect
res9: Array[(String, (Iterable[String], Iterable[String], Iterable[String]))] = Array(
(B,(CompactBuffer(2),CompactBuffer(),CompactBuffer())), 
(D,(CompactBuffer(),CompactBuffer(d),CompactBuffer())), 
(A,(CompactBuffer(1),CompactBuffer(a),CompactBuffer(A))), 
(C,(CompactBuffer(3),CompactBuffer(c),CompactBuffer())), 
(E,(CompactBuffer(),CompactBuffer(),CompactBuffer(E))))

join

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
join相當於SQL中的內關聯join，只返回兩個RDD根據K可以關聯上的結果，join只能用於兩個RDD之間的關聯，如果要多個RDD關聯，多關聯幾次即可
參數numPartitions用於指定結果的分區數

參數partitioner用於指定分區函數

var rdd1 = sc.makeRDD(Array(("A","1"),("B","2"),("C","3")),2)
var rdd2 = sc.makeRDD(Array(("A","a"),("C","c"),("D","d")),2)

scala> rdd1.join(rdd2).collect
res10: Array[(String, (String, String))] = Array((A,(1,a)), (C,(3,c)))

spark編程模型（十五）之RDD鍵值轉換操作（Transformation Operation）——cogroup、join

cogroup

join

再談23種設計模式（3）：行爲型模式（學習筆記）

Power Automate Desktop 安裝完，登錄後老是提示one driver 錯誤

微前端學習筆記(4):從微前端到微模塊之EMP與hel-micro方案探索

微前端學習筆記（1）：微前端總體架構概述，從微服務發微

985 碩士程序員，空窗 4 個月沒有 Offer！

一文搞懂 Spring 循環依賴

賽博鬥地主——使用大語言模型扮演Agent智能體玩牌類遊戲。

VScode右鍵打開(添加到右鍵)

記一次 .NET某工控視覺自動化系統卡死分析

WindowsServer--SQL Server搭建主從同步實現讀寫分離 - 事務性分發

那積滿灰塵的機械鍵盤，該重新拿起了

spark編程模型（二十二）之RDD存儲行爲操作（Action Operation）——saveAsTextFile、saveAsSequenceFile、saveAsObjectFile

spark自定義分區實例

Hive 與 SparkSQL 整合

spark編程模型（十八）之RDD集合標量行爲操作（Action Operation）——first、count、reduce、collect

Mac下配置sublime實現LaTeX

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結