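Preface
RDD operators are rich, but they run less efficiently than the DS (Dataset) and DF (DataFrame) APIs. Most business logic can be handled easily with DF or DS alone, but sometimes a task can only be done with RDD operators. This post briefly covers how to convert between the three.
See also: a speed comparison test of the three.
Note that DS here means Dataset, not the DStream from Spark Streaming.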
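Conversion relationships
RDD predates both DS and DF. The RDD class itself has no toDF or toDS method; these are added through Scala's extension mechanism, so implicit conversions are required.
To convert an RDD into a DF or DS, you therefore have to import the implicits object from your SparkSession instance.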
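import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("Foreach")
val ssc = new StreamingContext(conf, Seconds(3))
ssc.checkpoint("./ck2")
// Get a SparkSession that reuses the streaming context's configuration
val ss = SparkSession.builder()
  .config(ssc.sparkContext.getConf)
  .getOrCreate()
// The conversions become available by importing the implicits object of the SparkSession instance
import ss.implicits._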
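RDD to DS: wrap each row of the RDD in a case class, then call the toDS method.
Conversion between DF and RDD
DF and DS were introduced later, so going from either of them to an RDD only takes a call to .rdd on the object.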
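The RDD-to-DS step described above can be sketched roughly as follows. This is a minimal illustration, not code from the original post: the `Person` case class, the `RddToDsSketch` object, and the "name,age" input format are all assumptions made up for the example.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class for this sketch. It is defined at the top level
// (not inside a method) so that Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object RddToDsSketch {
  // Parse "name,age" lines into Person objects, then convert the RDD to a Dataset.
  def toDs(spark: SparkSession, lines: Seq[String]): Dataset[Person] = {
    import spark.implicits._ // brings toDS into scope for RDDs of case classes
    val rdd = spark.sparkContext.parallelize(lines)
    rdd.map { line =>
      val Array(name, age) = line.split(",")
      Person(name.trim, age.trim.toInt)
    }.toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("RddToDsSketch")
      .getOrCreate()
    toDs(spark, Seq("java,22", "keke,23")).show()
    spark.stop()
  }
}
```

The column names of the resulting Dataset come from the case class fields, so no explicit schema is needed here.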
package spark.sql.std.day01

import org.apache.spark.sql.SparkSession

/**
 * @ClassName:DF2RDD
 * @author: zhengkw
 * @description:
 * @date: 20/05/13 10:52 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object DF2RDD {
  def main(args: Array[String]): Unit = {
    // Create a builder and get the session from it
    val spark = SparkSession.builder()
      .appName("DF2RDD")
      .master("local[2]")
      .getOrCreate()
    // Get a DataFrame
    val df = spark.read.json("E:\\IdeaWorkspace\\sparkdemo\\data\\people.json")
    df.printSchema()
    // DataFrame to RDD: the result is an RDD[Row]
    val rdd = df.rdd
    val result = rdd.map(row => {
      // Positional access returns Any; typed alternatives:
      // row.getAs()
      // row.getLong()
      val age = row.get(0)
      val name = row.get(1)
      (age, name)
    })
    result.collect.foreach(println)
  }
}
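The listing above reads fields by position with `row.get`, which returns `Any`. As a hedged sketch of the typed alternative hinted at in its comments, the following uses `Row.getAs[T]` with a field name. The `RowAccessSketch` object and the in-memory stand-in data (used instead of people.json so the example is self-contained) are assumptions for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object RowAccessSketch {
  // Read fields by name and with an explicit type, instead of by position.
  // This relies on the Row carrying a schema, which rows from df.rdd do.
  def nameAndAge(row: Row): (String, Long) =
    (row.getAs[String]("name"), row.getAs[Long]("age"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("RowAccessSketch")
      .getOrCreate()
    import spark.implicits._
    // Stand-in data; age is Long to mirror what read.json produces for numbers.
    val df = Seq(("java", 22L), ("keke", 23L)).toDF("name", "age")
    df.rdd.map(nameAndAge).collect().foreach(println)
    spark.stop()
  }
}
```

Name-based access avoids depending on column order, which for JSON input is determined by Spark's schema inference rather than by the order of keys in the file.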
RDD to DF by wrapping rows in a case class
package spark.sql.std.day01

import org.apache.spark.sql.SparkSession

/**
 * @ClassName:RDD2DF_2
 * @author: zhengkw
 * @description:
 * @date: 20/05/13 11:35 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object RDD2DF_2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDD2DF_2")
      .master("local[2]")
      .getOrCreate()
    val list = List(User(22, "java"), User(23, "keke"))
    // Append one more element; List is immutable, so this creates a new list
    val list1 = list :+ User(15, "ww")
    // list.foreach(println)
    val rdd = spark.sparkContext.parallelize(list1)
    // The implicits import provides toDF on RDDs of case classes
    import spark.implicits._
    val df = rdd.toDF("age", "name")
    df.show()
  }
}

// Defined at the top level so Spark can derive an encoder for it
case class User(age: Int, name: String)
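When no case class is available, another common route from RDD to DF is to build an `RDD[Row]` and pass an explicit schema to `createDataFrame`. The post does not cover this variant; the sketch below is only an illustration, and the `RddToDfWithSchemaSketch` object and its sample data are assumptions.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RddToDfWithSchemaSketch {
  // Explicit schema matching the (age, name) layout used in this post.
  val schema: StructType = StructType(Seq(
    StructField("age", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  // Turn tuples into generic Rows, then attach the schema.
  def toDf(spark: SparkSession, data: Seq[(Int, String)]): DataFrame = {
    val rowRdd = spark.sparkContext
      .parallelize(data)
      .map { case (age, name) => Row(age, name) }
    spark.createDataFrame(rowRdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("RddToDfWithSchemaSketch")
      .getOrCreate()
    toDf(spark, Seq((22, "java"), (23, "keke"))).show()
    spark.stop()
  }
}
```

This route needs no implicits import, at the cost of writing the schema by hand and losing the compile-time link between the data and its column types.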
Conversion between DF and DS
package spark.sql.std.day02

import org.apache.spark.sql.SparkSession

/**
 * @ClassName:DSDF
 * @author: zhengkw
 * @description:
 * @date: 20/05/14 10:06 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object DSDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("DSDF")
      .getOrCreate()
    import spark.implicits._
    // read.json infers integer numbers as Long, so the case class uses Long for age
    val df = spark.read.json("file:///E:\\IdeaWorkspace\\sparkdemo\\data\\people.json")
    // DF to DS: attach a type with as[T]
    val ds = df.as[People]
    ds.show()
    // DS to DF: drop the type with toDF()
    val df1 = ds.toDF()
    df1.show()
  }
}

case class People(age: Long, name: String)
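Because read.json yields Long for integer numbers, mapping such a DataFrame onto a case class with an Int field is normally rejected by Spark as an unsafe narrowing. One way around it, sketched below under assumptions, is to cast the column before calling `as[T]`. The `IntUser` case class and `CastBeforeAsSketch` object are hypothetical names, and in-memory data stands in for people.json.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical case class whose age field is Int rather than Long.
case class IntUser(age: Int, name: String)

object CastBeforeAsSketch {
  // Cast the Long column to int first, then attach the case class type.
  def toIntUsers(spark: SparkSession, df: DataFrame): Dataset[IntUser] = {
    import spark.implicits._
    df.withColumn("age", col("age").cast("int")).as[IntUser]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("CastBeforeAsSketch")
      .getOrCreate()
    import spark.implicits._
    // Stand-in for a DataFrame read from JSON: age is a Long column.
    val df = Seq((22L, "java"), (23L, "keke")).toDF("age", "name")
    toIntUsers(spark, df).show()
    spark.stop()
  }
}
```

Declaring the field as Long, as the People case class does, avoids the cast entirely and is the simpler choice when the values do not need to be Int.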
DS to RDD, using a case class
package spark.sql.std.day02

import org.apache.spark.sql.SparkSession

/**
 * @ClassName:DS2RDD
 * @author: zhengkw
 * @description:
 * @date: 20/05/14 9:34 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object DS2RDD {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("DS2RDD")
      .getOrCreate()
    //val df = spark.read.json("E:\\IdeaWorkspace\\sparkdemo\\data\\people.json")
    val list = User(21, "nokia") :: User(18, "java") :: User(20, "scala") :: User(20, "nova") :: Nil
    import spark.implicits._
    val rdd = spark.sparkContext.parallelize(list)
    // RDD to DS: the elements are already case class instances
    val ds = rdd.toDS()
    // DS to RDD: .rdd returns an RDD[User], so the element type is preserved
    ds.rdd.collect().foreach(println)
  }
}

case class User(age: Int, name: String)
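Anything else is outside the scope of this post; see the other posts for related information.
See also: differences and relationships among the three.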