Summary of problems connecting Python 2 / Python 3 to Hive and Impala

'TSocket' object has no attribute 'isOpen' bug: https://github.com/cloudera/impyla/issues/268

'TSaslClientTransport' object has no attribute 'readAll': https://github.com/dropbox/PyHive/issues/151

Solution:

https://github.com/dropbox/PyHive/commit/5322d8f1420b033ba7446449b5cca2cbf9f6fbc4

pip3 install git+https://github.com/cloudera/thrift_sasl

When using impyla and PyHive together, mind the import order:

from pyhive import hive
from impala.dbapi import connect

Python library versions:

thrift                    0.11.0
thrift-sasl            0.3.0 (not a release version; installed from the URL above)
thriftpy                0.3.9
PyHive                0.6.1
impyla                 0.14.2.2
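To compare an environment against the versions listed above, a small check along these lines may help. This is only a sketch: it assumes Python 3.8+ (for importlib.metadata), and the helper name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the version list above.
PACKAGES = ["thrift", "thrift-sasl", "thriftpy", "PyHive", "impyla"]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<12} {ver or 'not installed'}")
```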

 

Connecting to Hive

Kerberos + LDAP authentication

from pyhive import hive
mcon=hive.connect(host='bd-master01-pe2.f.cn',port=10000,username='someone',password='password',auth='LDAP')
cs = mcon.cursor()
cs.execute('show databases')
print(cs.fetchall())
cs.close()
mcon.close()
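If the query raises, the close() calls in the snippet above are never reached. One possible way to guarantee cleanup is a small wrapper built on contextlib.closing. The helper fetch_all below is hypothetical, not part of PyHive; it should work with any DB-API 2.0 connection, and sqlite3 is used only as a stand-in so the sketch runs without a Hive server.

```python
import sqlite3
from contextlib import closing


def fetch_all(connect, query):
    """Open a connection, run one query, and always close cursor and connection.

    `connect` is any zero-argument callable returning a DB-API 2.0 connection.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(query)
            return cursor.fetchall()


if __name__ == "__main__":
    # Stand-in connection so the sketch runs locally (sqlite3 is also DB-API 2.0).
    print(fetch_all(lambda: sqlite3.connect(":memory:"), "select 1"))

    # With PyHive the call would look like this (same host and credentials as above):
    # from pyhive import hive
    # fetch_all(lambda: hive.connect(host='bd-master01-pe2.f.cn', port=10000,
    #                                username='someone', password='password',
    #                                auth='LDAP'),
    #           'show databases')
```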

Kerberos authentication

from pyhive import hive
import pandas as pd
hcon=hive.connect(host='bd-master01-pe2.f.cn',port=10000,auth='KERBEROS',kerberos_service_name='hive')
hdata = pd.read_sql('show databases',hcon)
print(hdata)

Connecting to Impala

Source code of the connect function: https://github.com/cloudera/impyla/blob/master/impala/dbapi.py

Example:

from impala.dbapi import connect
mcon=connect(host='bd-slave01-pe2.f-pro.cn',port=21050,user='username',password='password',auth_mechanism='PLAIN')

mcon=connect(host='bd-slave01-pe2.f-pro.cn',port=21050,user='username',password='password',auth_mechanism='GSSAPI')

If the VM has Kerberos credentials, you can use either auth_mechanism='GSSAPI' or auth_mechanism='PLAIN'.

If it has no Kerberos credentials, use auth_mechanism='PLAIN'.
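The choice between the two mechanisms could be automated roughly as follows. This is a sketch under assumptions: it relies on the klist command's -s flag (silent mode, exit status 0 when a valid ticket cache exists), and both function names are invented for illustration. Preferring GSSAPI whenever a ticket is present is a choice made here, not something impyla requires.

```python
import subprocess


def has_kerberos_ticket():
    """Return True if `klist -s` reports a readable, unexpired ticket cache."""
    try:
        return subprocess.run(["klist", "-s"]).returncode == 0
    except OSError:
        # klist is not installed or cannot be executed on this machine.
        return False


def choose_auth_mechanism(has_ticket):
    """Pick GSSAPI when a ticket is available, otherwise fall back to PLAIN."""
    return "GSSAPI" if has_ticket else "PLAIN"


if __name__ == "__main__":
    mechanism = choose_auth_mechanism(has_kerberos_ticket())
    print(mechanism)
    # The result would then be passed on, for example:
    # from impala.dbapi import connect
    # mcon = connect(host='bd-slave01-pe2.f-pro.cn', port=21050, user='username',
    #                password='password', auth_mechanism=mechanism)
```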

Also, a one-line way to connect to impalad from the command line:

# run kinit first to obtain a Kerberos ticket
impala-shell  -u username  -k

https://github.com/cloudera/thrift_sasl/releases

 

The working way to connect to Impala from Python 3

pip3 install pure-sasl==0.5.1
pip3 install thrift-sasl==0.2.1 --no-deps
pip3 install thrift==0.9.3
pip3 install impyla==0.14.1
pip3 install bitarray==0.8.3
pip3 install thriftpy==0.3.9

# TypeError: can't concat str to bytes

vi /opt/python3.5/lib/python3.5/site-packages/thrift_sasl/__init__.py

# Go to the last entry of the traceback, line 94 of __init__.py (mind the code indentation)
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)

Change it to:
header = struct.pack(">BI", status, len(body))
if isinstance(body, str):
    body = body.encode()
self._trans.write(header + body)
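The effect of the patch can be reproduced outside thrift_sasl with a standalone sketch. The function build_frame is hypothetical and only mirrors the two patched lines. Unlike the patch, it encodes the body before computing the length, so the header counts bytes rather than characters; for ASCII payloads both orders give the same header.

```python
import struct


def build_frame(status, body):
    """Build a 1-byte status + 4-byte big-endian length header, then the body."""
    if isinstance(body, str):
        # Encode first so the length in the header counts bytes, not characters.
        body = body.encode()
    header = struct.pack(">BI", status, len(body))
    # header and body are both bytes here, so the concatenation cannot raise
    # "TypeError: can't concat str to bytes".
    return header + body


if __name__ == "__main__":
    print(build_frame(1, "abc"))
    print(build_frame(1, b"abc"))
```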

 

Before installing impyla on Python 2:

yum install -y gcc libffi-devel python-devel openssl-devel gcc-c++

Run the following command in a terminal:

pip install pyhive[hive]

Note the [hive] suffix: without it some dependent packages are not installed, which causes errors. I ran into the following error:

ImportError: cannot import name TFrozenDict

 

impyla requires thrift<=0.9.3, while PyHive 0.6.1 is not compatible with thrift 0.9.3; PyHive uses thrift 0.13.0.

impyla 0.14.2.2 has requirement thrift<=0.9.3, but you'll have thrift 0.13.0 which is incompatible.

So impyla 0.14.2.2 and PyHive 0.6.1 have conflicting thrift requirements and are incompatible.
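The conflict reported by pip comes down to an upper-bound check on the thrift version. A minimal sketch of that check follows; the function names are invented, and it handles only purely numeric dotted versions (real tooling such as the packaging library covers far more cases). It also shows why comparing version strings directly is misleading.

```python
def parse_version(text):
    """Turn '0.13.0' into (0, 13, 0) so components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies_upper_bound(installed, maximum):
    """True if `installed` <= `maximum`, comparing component by component."""
    return parse_version(installed) <= parse_version(maximum)


if __name__ == "__main__":
    # impyla 0.14.2.2 wants thrift<=0.9.3, but thrift 0.13.0 is installed.
    print(satisfies_upper_bound("0.13.0", "0.9.3"))
    print(satisfies_upper_bound("0.9.3", "0.9.3"))
    # Plain string comparison gets this wrong because '1' sorts before '9'.
    print("0.13.0" <= "0.9.3")
```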
