[Hands-on practice] Two ways to read Hive tables with PySpark

1. When Windows cannot connect to Hive but Linux can (Method 2 is recommended)
Method 1
(1) Copy /opt/soft/hive110/conf/hive-site.xml to /opt/soft/spark234/conf/hive-site.xml.
Nothing in hive-site.xml needs to be changed.
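For reference, the setting in hive-site.xml that Spark relies on to find Hive's tables is the metastore address. A typical entry looks like the following (the host and port below are illustrative defaults, not values taken from this setup):

```xml
<property>
  <name>hive.metastore.uris</name>
  <!-- illustrative: address of the Hive metastore thrift service -->
  <value>thrift://localhost:9083</value>
</property>
```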

(2) Copy the MySQL driver JAR into /opt/soft/spark234/jars.
(3) Start pyspark:
[root@joy sbin]# ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /opt/soft/spark234/logs/spark-root-org.apache.spark.deploy.master.Master-1-joy.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /opt/soft/spark234/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-joy.out
[root@joy sbin]# cd ../bin/
[root@joy bin]# ./pyspark
Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
2020-12-24 11:20:23 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.user does not exist
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.metastore.local does not exist
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client does not exist
2020-12-24 11:20:24 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.password does not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.4
      /_/

Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
SparkSession available as 'spark'.
>>> spark.sql("select * from dwd_events.dwd_events limit 3").show
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.user does not exist
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.metastore.local does not exist
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client does not exist
2020-12-24 11:21:29 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.password does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.user does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.metastore.local does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client does not exist
2020-12-24 11:21:30 WARN  HiveConf:2753 - HiveConf of name hive.server2.thrift.client.password does not exist
2020-12-24 11:21:31 ERROR ObjectStore:6684 - Version information found in metastore differs 1.1.0-cdh5.14.2 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
2020-12-24 11:21:32 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
<bound method DataFrame.show of DataFrame[eventid: string, starttime: bigint, city: string, province: string, country: string, lat: string, lng: string, userid: string, features: string]>
>>> spark.sql("select * from dwd_events.dwd_events limit 3").show()
+----------+----------+------+--------+---------+-------+-------+----------+--------------------+
|   eventid| starttime|  city|province|  country|    lat|    lng|    userid|            features|
+----------+----------+------+--------+---------+-------+-------+----------+--------------------+
|1000000778|1349348400|  Iasi|        |  Romania| 47.162| 27.587| 781622845|0, 0, 0, 0, 0, 0,...|
|1000001188|1350738000|Kassel|        |  Germany| 51.315|   9.48|4191368038|0, 0, 0, 0, 0, 0,...|
|1000003504|1353632400|Sydney|     NSW|Australia|-33.883|151.217|1445909915|2, 0, 2, 1, 1, 0,...|
+----------+----------+------+--------+---------+-------+-------+----------+--------------------+
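Note that the first query in the transcript used `.show` without parentheses, so Python returned the bound method object instead of executing it; the second call with `.show()` actually printed the rows. A minimal plain-Python illustration (the `DataFrame` class below is a stand-in for this demo, not real pyspark):

```python
class DataFrame:
    """Tiny stand-in class to illustrate the transcript above (not pyspark)."""
    def show(self):
        return "rows printed"

df = DataFrame()
print(df.show)    # without (): prints the bound method, as in the first attempt
print(df.show())  # with (): actually runs the method
```

The same rule applies to any Python method: forgetting the parentheses gives you the callable itself rather than its result.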

(4) Copy everything under /opt/soft/hive110/conf to the local D:\soft\spark-2.3.4-bin-hadoop2.6/conf directory,
then change the paths inside to local ones.

(5) Copy the MySQL driver JAR into D:\soft\spark-2.3.4-bin-hadoop2.6/jars.
(6) Close Python and configure the Spark environment variables.
Method 2
Run PySpark locally (for example from an IDE on Windows) and point it directly at the cluster's Hive metastore. The first snippet below, without the metastore address, only sees a local embedded metastore and cannot find the cluster's tables; the second adds `hive.metastore.uris` so the session talks to the cluster.
from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Without hive.metastore.uris, a local session only sees its own metastore
    spark = SparkSession.builder.appName("test") \
        .master("local[*]") \
        .enableHiveSupport().getOrCreate()
    df = spark.sql("select * from dws_events.dws_temp_train limit 3")
    df.show()


from pyspark.sql import SparkSession

if __name__ == '__main__':
    # Add hive.metastore.uris pointing at the cluster's metastore (thrift://host:9083)
    spark = SparkSession.builder.appName("test") \
        .master("local[*]") \
        .config("hive.metastore.uris", "thrift://192.168.72.170:9083") \
        .enableHiveSupport().getOrCreate()
    df = spark.sql("select * from dws_events.dws_temp_train limit 3")
    df.show()
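One detail worth flagging: in the original snippet the inline comment sat on the same line as a line-continuation backslash, which is a syntax error in Python, because a continuation backslash must be the very last character on its line. A quick demonstration:

```python
# A comment after the continued expression ends is fine:
total = 1 + \
    2  # this comment is legal; the continuation already ended

# But any text after the backslash itself, including a comment, is rejected:
bad = "x = 1 + \\  # comment after backslash\n    2\n"
try:
    compile(bad, "<demo>", "exec")
    print("compiled")
except SyntaxError:
    print("SyntaxError: comment after a continuation backslash")
```

This is why the comment in the snippet above was moved onto its own line before the `.config(...)` call.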

