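[Spark] Spark SQL Data Type Conversion

Preface

Data type conversion comes up in every language and framework. It looks simple, but getting comfortable with all of the data types takes some hands-on practice.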

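Spark SQL Data Types

Numeric Types

ByteType: a 1-byte signed integer. Range: -128 to 127
ShortType: a 2-byte signed integer. Range: -32768 to 32767
IntegerType: a 4-byte signed integer. Range: -2147483648 to 2147483647
LongType: an 8-byte signed integer. Range: -9223372036854775808 to 9223372036854775807
FloatType: a 4-byte single-precision floating-point number
DoubleType: an 8-byte double-precision floating-point number
DecimalType: an arbitrary-precision decimal number, backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary-precision integer unscaled value and a 32-bit integer scale

String, Binary, and Boolean Types

StringType: a character string value
BinaryType: a byte sequence value
BooleanType: a boolean value

Datetime Types

TimestampType: a value comprising the fields year, month, day, hour, minute, and second
DateType: a value comprising the fields year, month, and day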
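
The DecimalType entry above describes a BigDecimal as an unscaled value plus a scale. Here is a minimal sketch of that decomposition using plain java.math.BigDecimal (no Spark needed). The sample value 123.45 and the variable names are only illustrative.

```scala
import java.math.BigDecimal

// 123.45 is held as unscaledValue = 12345 and scale = 2,
// that is, value = unscaledValue * 10^(-scale).
val d = new BigDecimal("123.45")
val unscaled = d.unscaledValue() // arbitrary-precision java.math.BigInteger
val scale = d.scale()            // 32-bit Int
val precision = d.precision()    // number of digits in the unscaled value

println(s"unscaled=$unscaled scale=$scale precision=$precision")

// The integer ranges listed above are the bounds of the matching Scala types.
println(s"Byte:  ${Byte.MinValue} to ${Byte.MaxValue}")
println(s"Short: ${Short.MinValue} to ${Short.MaxValue}")
println(s"Int:   ${Int.MinValue} to ${Int.MaxValue}")
println(s"Long:  ${Long.MinValue} to ${Long.MaxValue}")
```

In Spark terms, this value fits a DecimalType(5, 2): precision 5 and scale 2.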

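Complex Types

ArrayType(elementType, containsNull): a sequence of elements of type elementType. containsNull indicates whether the elements of the array can be null
MapType(keyType, valueType, valueContainsNull): a set of key-value pairs. keyType is the data type of the keys and valueType is the data type of the values. valueContainsNull indicates whether the values of the map can be null
StructType(fields): a value whose structure is described by a sequence of StructFields (fields)
StructField(name, dataType, nullable): a field in a StructType. name is the field's name, dataType is the field's data type, and nullable indicates whether the field's values can be null.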
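
To make the complex types above concrete, here is a small sketch that assembles a schema from StructType, StructField, ArrayType, and MapType. The field names (id, name, tags, scores) are invented for illustration and do not come from the test data used later.

```scala
import org.apache.spark.sql.types._

// A hypothetical schema that combines simple and complex types.
val userSchema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true),
  // An array of strings whose elements are declared non-null.
  StructField("tags", ArrayType(StringType, containsNull = false), nullable = true),
  // A map from string keys to double values; the values may be null.
  StructField("scores", MapType(StringType, DoubleType, valueContainsNull = true), nullable = true)
))

// Print the schema as an indented tree.
println(userSchema.treeString)
```

Building the schema only needs the spark-sql types package; no SparkSession is required until the schema is applied to data.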

Spark SQL Data Types vs. Scala Data Types

Spark SQL data type	Scala data type
ByteType	Byte
ShortType	Short
IntegerType	Int
LongType	Long
FloatType	Float
DoubleType	Double
DecimalType	scala.math.BigDecimal
StringType	String
BinaryType	Array[Byte]
BooleanType	Boolean
TimestampType	java.sql.Timestamp
DateType	java.sql.Date
ArrayType	scala.collection.Seq
MapType	scala.collection.Map
StructType	org.apache.spark.sql.Row
StructField	The value type in Scala of the data type of this field (For example, Int for a StructField with the data type IntegerType)
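
The mapping above can be exercised by building Rows from plain Scala values and attaching an explicit schema. This is only a sketch: the column names and sample values are assumptions made for the example, and it uses a local SparkSession like the one set up later in this post.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("type-mapping").master("local[*]").getOrCreate()

// Hypothetical schema; each field is paired with its Scala value type below.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),        // Int
  StructField("balance", DecimalType(10, 2), nullable = true), // scala.math.BigDecimal
  StructField("joined", DateType, nullable = true),        // java.sql.Date
  StructField("tags", ArrayType(StringType), nullable = true)  // scala.collection.Seq
))

val rows = Seq(
  Row(1, BigDecimal("12.50"), java.sql.Date.valueOf("2020-01-15"), Seq("a", "b"))
)

val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.printSchema()
```

If a value's Scala type does not match the declared Spark SQL type, Spark reports the mismatch when the rows are evaluated rather than when the DataFrame is defined.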

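Spark SQL Data Type Conversion Examples

In one sentence: call the cast method of the Column class.

How to Obtain a Column

I have covered this before. The common ways are: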

df("columnName")            // On a specific `df` DataFrame.
col("columnName")           // A generic column not yet associated with a DataFrame.
col("columnName.field")     // Extracting a struct field
col("`a.column.with.dots`") // Escape `.` in column names.
$"columnName"               // Scala short hand for a named column.

Preparing the Test Data

1,tom,23
2,jack,24
3,lily,18
4,lucy,19

Spark Entry Point

import org.apache.spark.sql.SparkSession

// Local session that uses all available cores
val spark = SparkSession
  .builder()
  .appName("test")
  .master("local[*]")
  .getOrCreate()

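Checking the Default Data Types

// Needed for toDF and for the encoders used by map
import spark.implicits._

spark.read
  .textFile("./data/user")
  .map(_.split(","))
  .map(x => (x(0), x(1), x(2)))
  .toDF("id", "name", "age")
  .dtypes
  .foreach(println)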

Result:

(id,StringType)
(name,StringType)
(age,StringType)

This shows that every column defaults to StringType: textFile reads each line as a string, and split returns strings.

Casting the Numeric Columns to IntegerType

import spark.implicits._

spark.read
  .textFile("./data/user")
  .map(_.split(","))
  .map(x => (x(0), x(1), x(2)))
  .toDF("id", "name", "age")
  // Cast id and age from StringType to IntegerType; name stays a string
  .select($"id".cast("int"), $"name", $"age".cast("int"))
  .dtypes
  .foreach(println)

Result:

(id,IntegerType)
(name,StringType)
(age,IntegerType)
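
The test data above casts cleanly, so it is worth seeing what happens when a string is not a valid integer. The sketch below assumes ANSI mode is off (it sets spark.sql.ansi.enabled to false explicitly); with ANSI mode on, an invalid cast raises an error instead. The sample values are made up for the example.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cast-null").master("local[*]").getOrCreate()
import spark.implicits._

// Assumption: non-ANSI behaviour, where an invalid cast yields null.
spark.conf.set("spark.sql.ansi.enabled", "false")

val ages = Seq("23", "abc").toDF("age")

// Keep the original string next to the cast result for comparison.
val casted = ages.select($"age", $"age".cast("int").as("age_int"))

// "23" becomes 23; "abc" becomes null under the assumed non-ANSI mode.
casted.show()
```

Because bad values turn into nulls silently in this mode, it can help to count nulls in the cast column before relying on it.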

The Two Overloads of Column.cast

First overload
def cast(to: String): Column
Casts the column to a different data type, using the canonical string representation of the type. The supported types are:
string, boolean, byte, short, int, long, float, double, decimal, date, timestamp.

// Casts colA to integer.
df.select(df("colA").cast("int"))
Since 1.3.0

Second overload
def cast(to: DataType): Column
Casts the column to a different data type.

// Casts colA to IntegerType.
import org.apache.spark.sql.types.IntegerType
df.select(df("colA").cast(IntegerType))
// equivalent to
df.select(df("colA").cast("int"))
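
As a further sketch of the two overloads, the example below casts to a parameterized type, DecimalType(10, 2), once with a DataType object and once with the type string "decimal(10,2)". The column names and sample values are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DecimalType, LongType}

val spark = SparkSession.builder().appName("cast-overloads").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical input: both columns start out as strings.
val df = Seq(("1", "19.99"), ("2", "5.5")).toDF("id", "price")

// Overload taking a DataType; withColumn replaces each column in place.
val typed = df
  .withColumn("id", col("id").cast(LongType))
  .withColumn("price", col("price").cast(DecimalType(10, 2)))

// Overload taking a type string; same target types as above.
val typedByName = df.select(
  col("id").cast("long"),
  col("price").cast("decimal(10,2)")
)

typed.printSchema()
typedByName.printSchema()
```

The DataType overload is checked by the compiler, while the string overload is parsed at runtime, so a typo in the type string only surfaces when the code runs.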

雲祁°
