'TSocket' object has no attribute 'isOpen' bug: https://github.com/cloudera/impyla/issues/268
'TSaslClientTransport' object has no attribute 'readAll': https://github.com/dropbox/PyHive/issues/151
Solution:
https://github.com/dropbox/PyHive/commit/5322d8f1420b033ba7446449b5cca2cbf9f6fbc4
pip3 install git+https://github.com/cloudera/thrift_sasl
When using impyla and PyHive together, mind the import order:
from pyhive import hive
from impala.dbapi import connect
Python library versions:
thrift 0.11.0
thrift-sasl 0.3.0 (not the released version; installed from the Git URL above)
thriftpy 0.3.9
PyHive 0.6.1
impyla 0.14.2.2
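To check whether an environment matches the versions listed above, something like the following sketch could be used. It assumes Python 3.8 or later (for importlib.metadata); the names PINNED and compare_versions are made up for this illustration, and a thrift-sasl build installed from Git may report a different version string than the one pinned here.

```python
from importlib import metadata

# Versions copied from the list above.
PINNED = {
    "thrift": "0.11.0",
    "thrift-sasl": "0.3.0",
    "thriftpy": "0.3.9",
    "PyHive": "0.6.1",
    "impyla": "0.14.2.2",
}


def compare_versions(pinned, lookup=metadata.version):
    """Return {package: (wanted, found, matches)}; found is None if not installed."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report
```

Calling compare_versions(PINNED) and printing the entries whose last field is False shows which packages are missing or differ from the pinned versions.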
Connecting to Hive
Kerberos + LDAP authentication:
from pyhive import hive
mcon=hive.connect(host='bd-master01-pe2.f.cn',port=10000,username='someone',password='password',auth='LDAP')
cs = mcon.cursor()
cs.execute('show databases')
print(cs.fetchall())
cs.close()
mcon.close()
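The snippet above closes the cursor and connection by hand, so they stay open if execute raises. A minimal sketch of one way to guarantee cleanup, assuming only that the connection and cursor expose close() as DB-API objects do; the helper name fetch_all is hypothetical and is not part of PyHive:

```python
from contextlib import closing


def fetch_all(connect, query, **connect_kwargs):
    """Open a DB-API connection, run one query, and always close cursor and connection."""
    with closing(connect(**connect_kwargs)) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

With PyHive this would be called as fetch_all(hive.connect, 'show databases', host='bd-master01-pe2.f.cn', port=10000, username='someone', password='password', auth='LDAP'). Passing the connect function in as an argument keeps the helper usable with impala.dbapi.connect as well.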
Kerberos authentication:
from pyhive import hive
import pandas as pd
hcon=hive.connect(host='bd-master01-pe2.f.cn',port=10000,auth='KERBEROS',kerberos_service_name='hive')
hdata = pd.read_sql('show databases',hcon)
print(hdata)
Connecting to Impala
Source code of the connect function: https://github.com/cloudera/impyla/blob/master/impala/dbapi.py
Example:
from impala.dbapi import connect
mcon=connect(host='bd-slave01-pe2.f-pro.cn',port=21050,user='username',password='password',auth_mechanism='PLAIN')
mcon=connect(host='bd-slave01-pe2.f-pro.cn',port=21050,user='username',password='password',auth_mechanism='GSSAPI')
If the VM has Kerberos credentials, you can use either auth_mechanism='GSSAPI' or auth_mechanism='PLAIN'.
If it has no Kerberos credentials, use auth_mechanism='PLAIN'.
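The rule above can be sketched as a small helper that picks the mechanism at run time. This is only an illustration: it assumes an MIT Kerberos style klist whose -s flag exits with status 0 when a valid ticket cache exists, and the function names has_kerberos_ticket and pick_auth_mechanism are hypothetical. Preferring GSSAPI whenever a ticket is present is a choice made here, since the notes say PLAIN works in that case too.

```python
import shutil
import subprocess


def has_kerberos_ticket():
    """Return True if `klist -s` reports a readable, unexpired ticket cache."""
    if shutil.which("klist") is None:  # no Kerberos client tools installed
        return False
    result = subprocess.run(
        ["klist", "-s"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pick_auth_mechanism(has_ticket):
    """GSSAPI when a Kerberos ticket is available, otherwise PLAIN."""
    return "GSSAPI" if has_ticket else "PLAIN"
```

The result can then be passed on, for example connect(host=..., port=21050, user='username', password='password', auth_mechanism=pick_auth_mechanism(has_kerberos_ticket())).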
As an aside, here is a command-line way to connect to impalad:
#kinit first
impala-shell -u username -k
https://github.com/cloudera/thrift_sasl/releases
The setup that works for connecting to Impala from Python 3
pip3 install pure-sasl==0.5.1
pip3 install thrift-sasl==0.2.1 --no-deps
pip3 install thrift==0.9.3
pip3 install impyla==0.14.1
pip3 install bitarray==0.8.3
pip3 install thriftpy==0.3.9
# TypeError: can't concat str to bytes
vi /opt/python3.5/lib/python3.5/site-packages/thrift_sasl/__init__.py
# Go to the last frame of the traceback, line 94 of __init__.py (mind the code indentation)
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)
Change it to:
# Encode first so that len(body) counts bytes, not characters
if isinstance(body, str):
    body = body.encode()
header = struct.pack(">BI", status, len(body))
self._trans.write(header + body)
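The cause of the TypeError is that struct.pack returns bytes, and Python 3 refuses to concatenate bytes with str. A standalone sketch of the same framing logic, outside thrift_sasl, shows the idea; frame_message is a hypothetical name used only here. In this sketch the body is encoded before the length is computed, so the length field counts bytes even when the body contains non-ASCII characters.

```python
import struct


def frame_message(status, body):
    """Build a frame: 1-byte status, 4-byte big-endian length, then the body."""
    if isinstance(body, str):
        body = body.encode()  # str -> bytes (UTF-8) so it can be concatenated
    header = struct.pack(">BI", status, len(body))
    return header + body


# Reproduce the original failure: bytes + str raises TypeError on Python 3.
try:
    struct.pack(">BI", 1, 3) + "abc"
except TypeError as exc:
    print("TypeError:", exc)
```

Both frame_message(1, "abc") and frame_message(1, b"abc") produce the same 8-byte frame.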
Before installing impyla on Python 2:
yum install -y gcc libffi-devel python-devel openssl-devel gcc-c++
Run the following command in a terminal:
pip install pyhive[hive]
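Note the [hive] suffix: without it some dependent packages are not installed, which leads to errors. I ran into the following error:
ImportError: cannot import name TFrozenDict
impyla requires thrift<=0.9.3, but PyHive 0.6.1 is not compatible with thrift 0.9.3; PyHive uses thrift 0.13.0.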
impyla 0.14.2.2 has requirement thrift<=0.9.3, but you'll have thrift 0.13.0 which is incompatible.
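Why 0.13.0 fails a <=0.9.3 requirement is easy to misread, because as plain strings "0.13.0" sorts before "0.9.3". A rough sketch of the component-wise comparison follows; it handles only plain dotted numeric versions and is not what pip itself runs, and the helper names are hypothetical.

```python
def version_tuple(version):
    """Turn a dotted numeric version such as '0.13.0' into (0, 13, 0)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_max(installed, maximum):
    """True if installed <= maximum, comparing component by component."""
    return version_tuple(installed) <= version_tuple(maximum)


# thrift 0.13.0 against impyla's thrift<=0.9.3 requirement
conflict = not satisfies_max("0.13.0", "0.9.3")
```

Here conflict is True, which matches the pip message above: the second component 13 is greater than 9, so the two libraries cannot share one thrift installation at these versions.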