Conversions Among RDD, DF, and DS in Spark

Preface

Although RDD operators are rich, their execution efficiency falls short of DS and DF. Most business logic can be handled easily with a DF or DS, but occasionally it can only be done through RDD operators, so this post briefly covers the conversions among the three.
Speed comparison test among the three!
Note that the DS here is distinct from the DStream in Spark Streaming!

Conversion Relationships

RDD predates DS and DF. Because of Scala's extension mechanism, these conversions necessarily rely on implicit conversions!
So to turn an RDD into a DF or DS, you must import the implicits object:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("Foreach")
val ssc = new StreamingContext(conf, Seconds(3))
ssc.checkpoint("./ck2")
// Get the SparkSession, reusing the StreamingContext's conf
val ss = SparkSession.builder()
  .config(ssc.sparkContext.getConf)
  .getOrCreate()
// The conversion methods come from importing the implicits object inside the SparkSession instance
import ss.implicits._

To convert an RDD to a DS, wrap each row of the RDD in a case class and then call the toDS method.
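A minimal sketch of this pattern (the Person case class and sample values here are just for illustration):

import org.apache.spark.sql.SparkSession

object RDD2DS {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("RDD2DS")
      .getOrCreate()
    // The toDS method is brought in by the implicits import
    import spark.implicits._

    // Wrap each element in the case class, then call toDS
    val rdd = spark.sparkContext.parallelize(List(Person(20, "kafka"), Person(25, "flume")))
    val ds = rdd.toDS()
    ds.show()
  }
}

case class Person(age: Int, name: String)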

Conversion Between DF and RDD

DF and DS came later; to go from a DF or DS back to an RDD, just call .rdd on the object:

package spark.sql.std.day01

import org.apache.spark.sql.SparkSession

/**
 * @ClassName:DF2RDD
 * @author: zhengkw
 * @description:
 * @date: 20/05/13 10:52 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object DF2RDD {
  def main(args: Array[String]): Unit = {
    // Create a builder, then get the session from it
    val spark = SparkSession.builder()
      .appName("DF2RDD")
      .master("local[2]")
      .getOrCreate()
    // Get the DF
    val df = spark.read.json("E:\\IdeaWorkspace\\sparkdemo\\data\\people.json")
    df.printSchema()
    // Convert the DF to an RDD[Row]
    val rdd = df.rdd
    val result = rdd.map(row => {
      // Positional access: read.json orders columns alphabetically,
      // so index 0 is age and index 1 is name (row.getAs / row.getLong are typed alternatives)
      val age = row.get(0)
      val name = row.get(1)
      (age, name)
    })
    result.collect.foreach(println)

  }
}
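The typed alternatives mentioned in the comment above can replace positional get calls. A hedged sketch continuing from the df above (assumes the JSON columns are age and name, and recall that read.json parses numbers as Long):

    // Typed access by column name instead of position
    val typed = df.rdd.map(row => {
      // A null age would unbox to 0 here; guard with row.isNullAt(row.fieldIndex("age")) if needed
      val age = row.getAs[Long]("age")
      val name = row.getAs[String]("name")
      (age, name)
    })
    typed.collect.foreach(println)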

Wrapping in a Case Class

package spark.sql.std.day01

import org.apache.spark.sql.SparkSession


/**
 * @ClassName:RDD2DF_2
 * @author: zhengkw
 * @description:
 * @date: 20/05/13 11:35 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object RDD2DF_2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDD2DF_2")
      .master("local[2]")
      .getOrCreate()
    val list = List(User(22, "java"), User(23, "keke"))
    val list1 = list :+ User(15, "ww")

    val rdd = spark.sparkContext.parallelize(list1)
    import spark.implicits._
    val df = rdd.toDF("age", "name")
    df.show()

  }
}

case class User(age: Int, name: String)
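Besides toDF on an RDD of case-class instances, a DF can also be built from an RDD[Row] plus an explicit schema via createDataFrame. A sketch under the same local setup (the column names and values are just for illustration):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object RDD2DF_Schema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("RDD2DF_Schema")
      .getOrCreate()

    // Rows carry no type information of their own...
    val rowRDD = spark.sparkContext.parallelize(Seq(Row(22, "java"), Row(23, "keke")))
    // ...so the schema supplies column names and types explicitly
    val schema = StructType(Seq(
      StructField("age", IntegerType, nullable = false),
      StructField("name", StringType, nullable = true)
    ))
    val df = spark.createDataFrame(rowRDD, schema)
    df.show()
  }
}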

Conversion Between DF and DS

package spark.sql.std.day02

import org.apache.spark.sql.SparkSession

/**
 * @ClassName:DSDF
 * @author: zhengkw
 * @description:
 * @date: 20/05/14 10:06 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object DSDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("DSDF")
      .getOrCreate()
    import spark.implicits._
    // Numbers read by read.json are parsed as Long, hence age: Long in People
    val df = spark.read.json("file:///E:\\IdeaWorkspace\\sparkdemo\\data\\people.json")
    // DF -> DS: as[People] binds each Row to the case class (needs the implicits above)
    val ds = df.as[People]
    ds.show()
    // DS -> DF: toDF() drops the typing and returns to Row-based access
    val df1 = ds.toDF()
    df1.show()

  }
}

case class People(age: Long, name: String)

Conversion Between RDD and DS (Case Class Wrapping)

package spark.sql.std.day02

import org.apache.spark.sql.SparkSession

/**
 * @ClassName:DS2RDD
 * @author: zhengkw
 * @description:
 * @date: 20/05/14 9:34 AM
 * @version:1.0
 * @since: jdk 1.8 scala 2.11.8
 */
object DS2RDD {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("DS2RDD")
      .getOrCreate()
    val list = User(21, "nokia") :: User(18, "java") :: User(20, "scala") :: User(20, "nova") :: Nil

    import spark.implicits._
    val rdd = spark.sparkContext.parallelize(list)
    // RDD -> DS via toDS, then DS -> RDD via .rdd
    val ds = rdd.toDS()
    ds.rdd.collect().foreach(println)
  }
}

case class User(age: Int, name: String)

Other conversions are beyond the scope of this post; see related posts for more information!
Differences and Connections Among the Three
