Getting Started with GeoSpark, Part 2: Spatial RDD

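Series contents

Getting Started with GeoSpark, Part 1: Environment Setup
Getting Started with GeoSpark, Part 2: Spatial RDD
Getting Started with GeoSpark, Part 3: SQL, Spatial Queries and Indexing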

1. Basic geographic data concepts

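GeoSpark ultimately operates on geographic features, so it supports the common geometry types used in GIS.
A geometry is built from three basic elements: points, lines and polygons.
An x/y coordinate pair forms a point, a sequence of points forms a line, a closed ring forms a polygon, and a mix of points, lines and polygons forms a geometry collection.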

The corresponding classes are:
Coordinate: Coordinate
Point: Point, MultiPoint
Line: LineString, MultiLineString (multiple lines), LinearRing (closed ring)
Polygon: Polygon, MultiPolygon
Collection: GeometryCollection

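The type parameter T of the RDD[T] we create later is one of these geometry classes. Coordinate is only a building block and is not a geometry itself.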
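The following minimal sketch in plain Scala makes the point/line/polygon hierarchy concrete. The names Coord, Geom, Pt, Line, Poly and Collection are hypothetical stand-ins, not the JTS classes listed above. Holes (interior rings) are left out. The sketch only shows how each level is built from the one before it, and that a polygon's shell has to be a closed ring.

```scala
// Hypothetical, simplified model of the geometry hierarchy (NOT the JTS classes).
object GeoModel {
  // A coordinate is just an x/y pair; it is not a geometry by itself
  final case class Coord(x: Double, y: Double)

  sealed trait Geom
  // A point wraps one coordinate
  final case class Pt(c: Coord) extends Geom
  // A line is a sequence of coordinates
  final case class Line(coords: Vector[Coord]) extends Geom {
    // Closed (i.e. a ring) when the last coordinate repeats the first
    def isClosed: Boolean = coords.size > 1 && coords.head == coords.last
  }
  // A polygon is bounded by a closed ring (holes omitted in this sketch)
  final case class Poly(shell: Line) extends Geom {
    require(shell.isClosed, "a polygon shell must be a closed ring")
  }
  // A collection can mix any of the above
  final case class Collection(members: Vector[Geom]) extends Geom
}

import GeoModel._

val square = Line(Vector(Coord(0, 0), Coord(0, 4), Coord(4, 4), Coord(4, 0), Coord(0, 0)))
val poly = Poly(square)
val mixed = Collection(Vector(Pt(Coord(1, 1)), Line(Vector(Coord(0, 0), Coord(4, 0))), poly))
```

Constructing a Poly from an open line fails with an IllegalArgumentException. JTS applies the same idea to Polygon shells.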

2. Creating geometries with GeometryFactory

All geometry objects are created through the GeometryFactory class in the com.vividsolutions.jts.geom package.

package com.suddev.bigdata.core
import com.vividsolutions.jts.geom.{Coordinate, GeometryFactory}

object GeoDemoApp {
  def main(args: Array[String]): Unit = {
    // Create a coordinate
    val coord = new Coordinate(-84.01, 34.01)
    // Instantiate the geometry factory
    val factory = new GeometryFactory()
    // Create a Point
    val pointObject = factory.createPoint(coord)
    // Create a Polygon
    val coordinates = new Array[Coordinate](5)
    coordinates(0) = new Coordinate(0,0)
    coordinates(1) = new Coordinate(0,4)
    coordinates(2) = new Coordinate(4,4)
    coordinates(3) = new Coordinate(4,0)
    // A polygon ring must be closed, so the last point is the first point
    coordinates(4) = coordinates(0)
    val polygonObject = factory.createPolygon(coordinates)
    // Create a LineString
    val coordinates2 = new Array[Coordinate](4)
    coordinates2(0) = new Coordinate(0,0)
    coordinates2(1) = new Coordinate(0,4)
    coordinates2(2) = new Coordinate(4,4)
    coordinates2(3) = new Coordinate(4,0)
    val linestringObject = factory.createLineString(coordinates2)
  }
}
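The manual `coordinates(4) = coordinates(0)` step is easy to forget. Below is a hypothetical helper that closes a ring automatically. It is not part of GeoSpark or JTS. After closing the ring, it checks the rule this sketch assumes JTS rings follow: the first and last coordinates are equal, and the ring has at least 4 coordinates in total. It uses plain (x, y) tuples so it runs without JTS on the classpath.

```scala
// Hypothetical helper (not part of GeoSpark or JTS). Assumption: a valid ring
// has first == last and at least 4 coordinates, as JTS LinearRing expects.
def closeRing(coords: Seq[(Double, Double)]): Vector[(Double, Double)] = {
  require(coords.nonEmpty, "a ring needs at least one coordinate")
  // Append the first coordinate only if the caller did not already repeat it
  val closed =
    if (coords.head == coords.last) coords.toVector
    else coords.toVector :+ coords.head
  require(closed.size >= 4, s"a ring needs at least 4 coordinates, got ${closed.size}")
  closed
}

// The square from the example above, written without the repeated first point
val ring = closeRing(Seq((0.0, 0.0), (0.0, 4.0), (4.0, 4.0), (4.0, 0.0)))
println(ring) // Vector((0.0,0.0), (0.0,4.0), (4.0,4.0), (4.0,0.0), (0.0,0.0))
```

To pass the result to JTS, map each pair to `new Coordinate(x, y)`, convert to an array, and call `factory.createPolygon` as in the example above.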

3. Creating a SpatialRDD (SRDD)

GeoSpark-Core provides three specialized SpatialRDDs: PointRDD, PolygonRDD and LineStringRDD.
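⚠️ Note: a GeoSpark SpatialRDD is a wrapper around a Spark RDD, not an implementation of RDD. The original Spark RDD is stored inside the SpatialRDD, in its rawSpatialRDD field.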
They can be loaded from a Spark RDD, CSV, TSV, WKT, WKB, Shapefile, GeoJSON and NetCDF/HDF formats.
The examples below cover several common scenarios.

3.1 Initializing the SparkContext

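import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoSerializer
import org.datasyslab.geospark.serde.GeoSparkKryoRegistrator

// Use Kryo serialization together with GeoSpark's Kryo registrator
val conf = new SparkConf().
  setAppName("GeoSparkDemo2").
  setMaster("local[*]").
  set("spark.serializer", classOf[KryoSerializer].getName).
  set("spark.kryo.registrator", classOf[GeoSparkKryoRegistrator].getName)
val sc = new SparkContext(conf)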

3.2 Creating a typed Spatial RDD

3.2.1 Creating a PointRDD from an existing Spark RDD

import com.vividsolutions.jts.geom.{Coordinate, GeometryFactory}
import org.datasyslab.geospark.spatialRDD.PointRDD

// Prepare the data: (longitude, latitude, user data)
val data = Array(
  (-88.331492, 32.324142, "hotel"),
  (-88.175933, 32.360763, "gas"),
  (-88.388954, 32.357073, "bar"),
  (-88.221102, 32.35078, "restaurant")
)
val geometryFactory = new GeometryFactory()
// Build a Spark RDD[Point]
val pointsRowSpatialRDD = sc.parallelize(data)
  .map(x => {
    // Create the coordinate
    val coord = new Coordinate(x._1, x._2)
    // User-defined data
    val userData = x._3
    // Create the Point
    val point = geometryFactory.createPoint(coord)
    // A Point can carry user data
    point.setUserData(userData)
    point
  })
// Wrap the Spark RDD in a PointRDD
val pointRDD = new PointRDD(pointsRowSpatialRDD)

3.2.2 Creating a PointRDD from CSV/TSV

Create checkin.csv at data/checkin.csv:

-88.331492,32.324142,hotel
-88.175933,32.360763,gas
-88.388954,32.357073,bar
-88.221102,32.35078,restaurant

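checkin.csv has three columns, with column IDs 0, 1 and 2.
Columns 0 and 1 hold the coordinates.
Column 2 holds user-defined data.
pointRDDOffset is the column where the coordinates start, so offset = 0.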

import org.datasyslab.geospark.enums.FileDataSplitter
import org.datasyslab.geospark.spatialRDD.PointRDD

val pointRDDInputLocation = "data/checkin.csv"
val pointRDDOffset = 0  // The coordinates start from Column 0
val pointRDDSplitter = FileDataSplitter.CSV // or use FileDataSplitter.TSV
val carryOtherAttributes = true // Carry the user-defined data (hotel, gas, bar...)
val objectRDD = new PointRDD(sc, pointRDDInputLocation, pointRDDOffset, pointRDDSplitter, carryOtherAttributes)
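The sketch below shows how the offset relates to the column layout by splitting one row of checkin.csv as described above. parsePointLine and ParsedPoint are hypothetical names. This is not GeoSpark's own parser, and GeoSpark may format the carried attributes differently internally.

```scala
// Hypothetical sketch of the checkin.csv column layout (not GeoSpark's parser).
final case class ParsedPoint(x: Double, y: Double, userData: Option[String])

def parsePointLine(line: String, offset: Int, delimiter: String = ",",
                   carryOtherAttributes: Boolean = true): ParsedPoint = {
  // limit -1 keeps trailing empty columns
  val cols = line.split(delimiter, -1)
  // Columns offset and offset + 1 hold x (longitude) and y (latitude)
  val x = cols(offset).trim.toDouble
  val y = cols(offset + 1).trim.toDouble
  // In this sketch, every other column is joined back together as user data
  val rest = cols.indices.filterNot(i => i == offset || i == offset + 1).map(i => cols(i))
  val userData =
    if (carryOtherAttributes && rest.nonEmpty) Some(rest.mkString(delimiter)) else None
  ParsedPoint(x, y, userData)
}

val p = parsePointLine("-88.331492,32.324142,hotel", offset = 0)
println(p) // ParsedPoint(-88.331492,32.324142,Some(hotel))
```

If the label came first (`hotel,-88.331492,32.324142`), the same row would need offset = 1.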

3.2.3 Creating a PolygonRDD/LineStringRDD from CSV/TSV

Create checkinshape.csv at data/checkinshape.csv:

-88.331492,32.324142,-88.331492,32.324142,-88.331492,32.324142,-88.331492,32.324142,-88.331492,32.324142,hotel
-88.175933,32.360763,-88.175933,32.360763,-88.175933,32.360763,-88.175933,32.360763,-88.175933,32.360763,gas
-88.388954,32.357073,-88.388954,32.357073,-88.388954,32.357073,-88.388954,32.357073,-88.388954,32.357073,bar
-88.221102,32.35078,-88.221102,32.35078,-88.221102,32.35078,-88.221102,32.35078,-88.221102,32.35078,restaurant

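checkinshape.csv has 11 columns, with column IDs 0 to 10.
Columns 0-9 hold 5 coordinates (x/y pairs).
Column 10 holds user-defined data.
polygonRDDStartOffset is the column where the coordinates start, so StartOffset = 0.
polygonRDDEndOffset marks where the coordinates end; this example uses EndOffset = 8.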

import org.datasyslab.geospark.enums.FileDataSplitter
import org.datasyslab.geospark.spatialRDD.PolygonRDD

val polygonRDDInputLocation = "data/checkinshape.csv"
val polygonRDDStartOffset = 0 // The coordinates start from Column 0
val polygonRDDEndOffset = 8 // The coordinates end at Column 8
val polygonRDDSplitter = FileDataSplitter.CSV // or use FileDataSplitter.TSV
val carryOtherAttributes = true // Carry the user-defined data (hotel, gas, bar...)
val objectRDD = new PolygonRDD(sc, polygonRDDInputLocation, polygonRDDStartOffset, polygonRDDEndOffset, polygonRDDSplitter, carryOtherAttributes)
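A similar sketch for checkinshape.csv groups the coordinate columns into x/y pairs and keeps the remaining columns as user data. This hypothetical parseShapeLine deliberately does not follow GeoSpark's interpretation of polygonRDDEndOffset. Instead, it takes an exclusive end column: 10 here, the first non-coordinate column. This is a simplification for illustration only.

```scala
// Hypothetical sketch (not GeoSpark's parser): columns [start, endExclusive)
// are read as x/y pairs; all other columns are kept as user data.
def parseShapeLine(line: String, start: Int, endExclusive: Int): (Vector[(Double, Double)], Vector[String]) = {
  require((endExclusive - start) % 2 == 0, "coordinate columns must come in x/y pairs")
  val cols = line.split(",", -1).toVector
  val coords = cols.slice(start, endExclusive)
    .map(_.trim.toDouble)
    .grouped(2)
    .map(pair => (pair(0), pair(1)))
    .toVector
  val userData = cols.take(start) ++ cols.drop(endExclusive)
  (coords, userData)
}

val row = "-88.331492,32.324142,-88.331492,32.324142,-88.331492,32.324142," +
  "-88.331492,32.324142,-88.331492,32.324142,hotel"
val (coords, extra) = parseShapeLine(row, start = 0, endExclusive = 10)
println(coords.size) // 5
println(extra) // Vector(hotel)
```

All five coordinates in each sample row are identical, so every row describes a degenerate polygon with zero area. That is fine for trying out the loading API, but real polygon data needs distinct vertices.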

3.3 Creating a generic Spatial RDD

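Unlike PointRDD, PolygonRDD and LineStringRDD, a generic SpatialRDD allows the input file to contain mixed geometry types, so it suits more scenarios.
Formats such as WKT, WKB, GeoJSON and Shapefile can store several kinds of geometries, for example LineString, Polygon and MultiPolygon.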

3.3.1 Creating from WKT/WKB

checkin.tsv

POINT(-88.331492 32.324142)	hotel
POINT(-88.175933 32.360763)	gas
POINT(-88.388954 32.357073)	bar
POINT(-88.221102 32.35078)	restaurant

Code:

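import org.datasyslab.geospark.formatMapper.WktReader

val inputLocation = "data/checkin.tsv"
val wktColumn = 0 // The WKT string starts from Column 0
val allowTopologyInvalidGeometries = true // Keep geometries that parse but are topologically invalid
val skipSyntaxInvalidGeometries = false // Do not silently skip lines whose WKT cannot be parsed
val spatialRDD = WktReader.readToGeometryRDD(sc, inputLocation, wktColumn, allowTopologyInvalidGeometries, skipSyntaxInvalidGeometries)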
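This small sketch shows what happens to each line. It recognizes only the 2-D `POINT(x y)` text used in checkin.tsv, followed by tab-separated attributes. parseWktPointLine is a hypothetical helper, not a WKT parser: it ignores every other geometry type and number formats such as exponents. Use WktReader for real data.

```scala
// Hypothetical sketch: only 2-D "POINT(x y)" text is recognized; not a WKT parser.
// Scala regex pattern matching must match the whole string.
val PointWkt = """\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*""".r

def parseWktPointLine(line: String, wktColumn: Int = 0): Option[((Double, Double), Seq[String])] = {
  val cols = line.split("\t", -1).toVector
  cols.lift(wktColumn).flatMap {
    case PointWkt(x, y) =>
      // The remaining columns are the attributes (hotel, gas, ...)
      Some(((x.toDouble, y.toDouble), cols.patch(wktColumn, Nil, 1)))
    case _ =>
      // A real reader would skip or reject this line, depending on skipSyntaxInvalidGeometries
      None
  }
}

println(parseWktPointLine("POINT(-88.331492 32.324142)\thotel"))
// Some(((-88.331492,32.324142),Vector(hotel)))
```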

3.3.2 Creating from GeoJSON

polygon.json

{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "077", "TRACTCE": "011501", "BLKGRPCE": "5", "AFFGEOID": "1500000US010770115015", "GEOID": "010770115015", "NAME": "5", "LSAD": "BG", "ALAND": 6844991, "AWATER": 32636 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -87.621765, 34.873444 ], [ -87.617535, 34.873369 ], [ -87.6123, 34.873337 ], [ -87.604049, 34.873303 ], [ -87.604033, 34.872316 ], [ -87.60415, 34.867502 ], [ -87.604218, 34.865687 ], [ -87.604409, 34.858537 ], [ -87.604018, 34.851336 ], [ -87.603716, 34.844829 ], [ -87.603696, 34.844307 ], [ -87.603673, 34.841884 ], [ -87.60372, 34.841003 ], [ -87.603879, 34.838423 ], [ -87.603888, 34.837682 ], [ -87.603889, 34.83763 ], [ -87.613127, 34.833938 ], [ -87.616451, 34.832699 ], [ -87.621041, 34.831431 ], [ -87.621056, 34.831526 ], [ -87.62112, 34.831925 ], [ -87.621603, 34.8352 ], [ -87.62158, 34.836087 ], [ -87.621383, 34.84329 ], [ -87.621359, 34.844438 ], [ -87.62129, 34.846387 ], [ -87.62119, 34.85053 ], [ -87.62144, 34.865379 ], [ -87.621765, 34.873444 ] ] ] } },
{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "045", "TRACTCE": "021102", "BLKGRPCE": "4", "AFFGEOID": "1500000US010450211024", "GEOID": "010450211024", "NAME": "4", "LSAD": "BG", "ALAND": 11360854, "AWATER": 0 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -85.719017, 31.297901 ], [ -85.715626, 31.305203 ], [ -85.714271, 31.307096 ], [ -85.69999, 31.307552 ], [ -85.697419, 31.307951 ], [ -85.675603, 31.31218 ], [ -85.672733, 31.312876 ], [ -85.672275, 31.311977 ], [ -85.67145, 31.310988 ], [ -85.670622, 31.309524 ], [ -85.670729, 31.307622 ], [ -85.669876, 31.30666 ], [ -85.669796, 31.306224 ], [ -85.670356, 31.306178 ], [ -85.671664, 31.305583 ], [ -85.67177, 31.305299 ], [ -85.671878, 31.302764 ], [ -85.671344, 31.302123 ], [ -85.668276, 31.302076 ], [ -85.66566, 31.30093 ], [ -85.665687, 31.30022 ], [ -85.669183, 31.297677 ], [ -85.668703, 31.295638 ], [ -85.671985, 31.29314 ], [ -85.677177, 31.288211 ], [ -85.678452, 31.286376 ], [ -85.679236, 31.28285 ], [ -85.679195, 31.281426 ], [ -85.676865, 31.281049 ], [ -85.674661, 31.28008 ], [ -85.674377, 31.27935 ], [ -85.675714, 31.276882 ], [ -85.677938, 31.275168 ], [ -85.680348, 31.276814 ], [ -85.684032, 31.278848 ], [ -85.684387, 31.279082 ], [ -85.692398, 31.283499 ], [ -85.705032, 31.289718 ], [ -85.706755, 31.290476 ], [ -85.718102, 31.295204 ], [ -85.719132, 31.29689 ], [ -85.719017, 31.297901 ] ] ] } },
{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "055", "TRACTCE": "001300", "BLKGRPCE": "3", "AFFGEOID": "1500000US010550013003", "GEOID": "010550013003", "NAME": "3", "LSAD": "BG", "ALAND": 1378742, "AWATER": 247387 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -86.000685, 34.00537 ], [ -85.998837, 34.009768 ], [ -85.998012, 34.010398 ], [ -85.987865, 34.005426 ], [ -85.986656, 34.004552 ], [ -85.985, 34.002659 ], [ -85.98851, 34.001502 ], [ -85.987567, 33.999488 ], [ -85.988666, 33.99913 ], [ -85.992568, 33.999131 ], [ -85.993144, 33.999714 ], [ -85.994876, 33.995153 ], [ -85.998823, 33.989548 ], [ -85.999925, 33.994237 ], [ -86.000616, 34.000028 ], [ -86.000685, 34.00537 ] ] ] } },
{ "type": "Feature", "properties": { "STATEFP": "01", "COUNTYFP": "089", "TRACTCE": "001700", "BLKGRPCE": "2", "AFFGEOID": "1500000US010890017002", "GEOID": "010890017002", "NAME": "2", "LSAD": "BG", "ALAND": 1040641, "AWATER": 0 }, "geometry": { "type": "Polygon", "coordinates": [ [ [ -86.574172, 34.727375 ], [ -86.562684, 34.727131 ], [ -86.562797, 34.723865 ], [ -86.562957, 34.723168 ], [ -86.562336, 34.719766 ], [ -86.557381, 34.719143 ], [ -86.557352, 34.718322 ], [ -86.559921, 34.717363 ], [ -86.564827, 34.718513 ], [ -86.567582, 34.718565 ], [ -86.570572, 34.718577 ], [ -86.573618, 34.719377 ], [ -86.574172, 34.727375 ] ] ] } },

Code:

import org.datasyslab.geospark.formatMapper.GeoJsonReader

val inputLocation = "data/polygon.json"
val allowTopologyInvalidGeometries = true
val skipSyntaxInvalidGeometries = false
val spatialRDD = GeoJsonReader.readToGeometryRDD(sc, inputLocation, allowTopologyInvalidGeometries, skipSyntaxInvalidGeometries)

3.3.3 Creating from a Shapefile

import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader

val shapefileInputLocation = "data/myshapefile"
// System.setProperty("geospark.global.charset", "utf8")
val spatialRDD = ShapefileReader.readToGeometryRDD(sc, shapefileInputLocation)

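⚠️ Note:
The .shp, .shx and .dbf file extensions must be lowercase. For a shapefile named myshapefile, put its component files in a folder of their own and pass that folder's path (data/myshapefile above) to readToGeometryRDD. The folder structure looks like this: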

- shapefile1
- shapefile2
- myshapefile
    - myshapefile.shp
    - myshapefile.shx
    - myshapefile.dbf
    - myshapefile...
    - ...
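A wrongly named or upper-case extension is an easy mistake, so a quick check before calling ShapefileReader.readToGeometryRDD can save a confusing failure. missingShapefileParts below is a hypothetical helper, not part of GeoSpark. It only checks that the folder contains files ending in the lowercase .shp, .shx and .dbf extensions.

```scala
import java.io.File

// Hypothetical pre-flight check (not part of GeoSpark). Extensions are matched
// case-sensitively, so "x.SHX" does not count as ".shx".
val RequiredExtensions = Seq(".shp", ".shx", ".dbf")

def missingShapefileParts(folder: File): Seq[String] = {
  // listFiles() returns null when the path does not exist or is not a directory
  val names = Option(folder.listFiles()).map(_.toSeq.map(_.getName)).getOrElse(Seq.empty)
  RequiredExtensions.filterNot(ext => names.exists(_.endsWith(ext)))
}

val missing = missingShapefileParts(new File("data/myshapefile"))
if (missing.nonEmpty) println(s"data/myshapefile is missing: ${missing.mkString(", ")}")
```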

If text attributes come out garbled, set the charset before calling ShapefileReader.readToGeometryRDD:

System.setProperty("geospark.global.charset", "utf8")

4. Coordinate reference system transformation

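GeoSpark identifies coordinate reference systems by their EPSG codes (see the EPSG registry at https://epsg.io/).
To transform a SpatialRDD into another coordinate reference system, call CRSTransform: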

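// Source CRS: WGS 84 longitude/latitude
val sourceCrsCode = "epsg:4326"
// Target CRS: Web Mercator
val targetCrsCode = "epsg:3857"
// objectRDD can be any of the SpatialRDDs created above; the call updates it in place
objectRDD.CRSTransform(sourceCrsCode, targetCrsCode)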
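The EPSG:4326 to EPSG:3857 case can be written out by hand, which is useful as a sanity check on CRSTransform output. Spherical Web Mercator maps longitude λ and latitude φ (in radians) to x = R·λ and y = R·ln(tan(π/4 + φ/2)), with R = 6378137 m. toWebMercator is a hypothetical helper. It assumes (longitude, latitude) argument order and does not guard against latitudes near ±90°, where y grows without bound.

```scala
// Hypothetical hand-written spherical Web Mercator forward projection.
// Input: longitude/latitude in degrees (EPSG:4326); output: metres (EPSG:3857).
val EarthRadius = 6378137.0 // sphere radius used by EPSG:3857 (WGS 84 semi-major axis)

def toWebMercator(lon: Double, lat: Double): (Double, Double) = {
  val x = EarthRadius * math.toRadians(lon)
  // y diverges as lat approaches ±90°, so keep |lat| well below that
  val y = EarthRadius * math.log(math.tan(math.Pi / 4 + math.toRadians(lat) / 2))
  (x, y)
}

val (x, y) = toWebMercator(-88.331492, 32.324142)
println(f"x = $x%.2f m, y = $y%.2f m")
```

Comparing a few hand-computed points against the transformed SpatialRDD is a cheap way to catch a swapped longitude/latitude order.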

