Working with Hive from Python 3

2020-02-22 19:02

1. Introduction

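At present the main way to connect to Hive from Python 3 is the impyla package developed by Cloudera. Installing impyla is not trivial, though: it relies on low-level system modules, so those have to be installed first, and installing impyla alone is not enough. It would be nice if Hive were like the HDFS server, where a single HTTP request does the job.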

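Thanks to the selfless contributions of many people online, and with help from Google and Baidu, I finally got Python 3 connected to Hive.
Keep in mind that CentOS and Ubuntu install and update these system modules differently.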

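Cloudera later developed a new Python package called ibis, which can also be used to work with Hive data. Its blog describes it in glowing terms, and it appears to be an improved successor to impyla that is meant to replace it eventually. I tried it out and it works reasonably well.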

The sections below cover installing and using impyla first, then using ibis.

2. Installing on CentOS

First install or update cyrus-sasl and gcc. The same steps also work inside a CentOS container.

sudo yum install cyrus-sasl-devel
sudo yum install gcc-c++

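Then install the Python modules. Note that the thrift-sasl version has to match the Hive version; if you don't know which one fits, try a few versions until one works.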

pip install thrift-sasl==0.2.1
pip install sasl
pip install impyla
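
Since the thrift-sasl version has to match the Hive version, it can help to record which versions actually ended up installed before trying a different pin. The following is a minimal sketch using only the standard library (Python 3.8+). The helper name `installed_versions` is made up for illustration and is not part of impyla.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["thrift-sasl", "sasl", "impyla"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running it after each pip attempt gives a quick record of which combination was in place when the connection finally worked.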

3. Installing on Ubuntu

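My Ubuntu runs on the Windows 10 Subsystem for Linux, which is convenient: there is no virtual machine to install, and it starts quickly, much like a container.
If downloads are slow, switch to a mirror inside China.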

sudo apt-get install libsasl2-dev
sudo apt-get install python3-dev
sudo apt-get -y install build-essential

Install the Python modules, same as on CentOS.

pip install thrift-sasl==0.2.1
pip install sasl
pip install impyla

4. Testing impyla

Test the connection to Hive:

from impala.dbapi import connect

# Note: auth_mechanism is required here, but database is optional
conn = connect(host='xxx.xxx.xxx.xxx', port=10000, database='default', auth_mechanism='PLAIN')
cur = conn.cursor()

cur.execute('SHOW DATABASES')
print(cur.fetchall())

cur.execute('SHOW TABLES')
print(cur.fetchall())

cur.close()
conn.close()
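
The test above closes the cursor by hand, so an exception in `execute` would skip the cleanup. As a rough sketch, a small helper can wrap the cursor in `contextlib.closing`. It relies only on the standard DB-API cursor methods, which impyla connections are expected to provide. The helper name `fetch_all` is hypothetical, and sqlite3 stands in for Hive so the example runs without a cluster.

```python
import sqlite3
from contextlib import closing


def fetch_all(conn, sql):
    """Run one statement on a DB-API connection and return all rows."""
    # closing() calls cursor.close() even if execute() raises.
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()


if __name__ == "__main__":
    # sqlite3 is only a stand-in here; it is not how Hive is reached.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE t (id INTEGER)")
        conn.execute("INSERT INTO t VALUES (1), (2)")
        print(fetch_all(conn, "SELECT id FROM t ORDER BY id"))
```

With impyla, the same helper would presumably be called as `fetch_all(conn, 'SHOW DATABASES')` on the connection returned by `connect(...)`.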

5. Installing ibis

Once impyla is installed, ibis is easy to install: a plain pip command is enough, and ibis uses impyla under the hood.

pip install hdfs
pip install ibis-framework
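
The HTTP access to HDFS mentioned in the introduction presumably goes through the WebHDFS REST API, which, as far as I know, is also what the `hdfs` package talks to. To illustrate what such a request looks like, here is a small sketch that only builds the URL. The helper name `webhdfs_url` and the example host are assumptions for illustration, and 50070 is the NameNode HTTP port commonly used by Hadoop 2.x.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL of the form .../webhdfs/v1/<path>?op=<OP>."""
    # The operation name goes first, followed by any extra query parameters.
    query = urlencode({"op": op, **params})
    # quote() escapes unsafe characters but leaves the slashes in the path.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical host name; LISTSTATUS lists a directory.
    print(webhdfs_url("namenode.example.com", 50070, "/", "LISTSTATUS"))
```

Opening such a URL with any HTTP client should return the directory listing as JSON, provided WebHDFS is enabled on the cluster.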

6. Using ibis

Below is a simple usage example that covers most of the commonly used methods.

# Written against the ibis API available around 2020; newer releases may differ.
import ibis

# 1. Browse and download data on HDFS
hdfs = ibis.hdfs_connect(host='xxx.xxx.xxx.xxx', port=50070)
hdfs.ls('/')
hdfs.ls('/apps/hive/warehouse/ai.db/tmp_ys_sku_season_tag')
hdfs.get('/apps/hive/warehouse/ai.db/tmp_ys_sku_season_tag/000000_0', 'parquet_dir')

# 2. Query data into a pandas DataFrame
from ibis.impala.api import connect
conn = connect('xxx.xxx.xxx.xxx', 10000, auth_mechanism='PLAIN', database='ai')
conn.exists_table('helloworld')

# Run a raw SQL statement
sql = 'set mapreduce.job.queuename=ai'
conn.raw_sql(sql)

# Export the result of a SQL query to a pandas DataFrame
requete = conn.sql('select * from ai.da_aipurchase_dailysale_for_ema_predict')
df = requete.execute(limit=None)
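
If ibis is not available, a similar tabular result can be assembled from a plain DB-API cursor by reading the column names from `cursor.description`. This is only a sketch of that idea, not how ibis does it internally. The helper name `rows_as_dicts` is hypothetical, and sqlite3 again stands in for Hive so the example runs anywhere.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Turn the pending result set of a DB-API cursor into a list of dicts."""
    # cursor.description holds one sequence per column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a Hive connection
    cur = conn.cursor()
    cur.execute("SELECT 1 AS id, 'spring' AS season")
    print(rows_as_dicts(cur))
    cur.close()
    conn.close()
```

Passing the returned list to `pandas.DataFrame(...)` should give a DataFrame with one column per key. impyla also ships a helper, `impala.util.as_pandas`, that is meant for the same purpose.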

振裕
