Flink usage guide: related documentation index
Background
Flink 1.16.0 integrates the SQL Gateway, which lets multiple clients execute SQL remotely and concurrently. Flink finally has a capability comparable to Spark Thrift Server.
This article covers deploying, configuring, and using the Flink SQL Gateway.
The author's environment:
- Flink 1.16.0
- Hadoop 3.1.1
- Hive 3.1.2
For the official documentation on the SQL Gateway, see https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/sql-gateway/overview/.
Deploying the Service
The SQL Gateway can submit jobs to either a Flink standalone cluster or a YARN cluster.
Standalone Cluster
For deploying a standalone cluster, see the official documentation: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/resource-providers/standalone/overview/.
In short, the steps are:
- Set up passwordless SSH from the cluster's master node to each worker node.
- Extract the Flink 1.16.0 distribution on the master node.
- Edit the `$FLINK_HOME/conf/masters` and `$FLINK_HOME/conf/workers` files, listing the IPs or hostnames of the job managers and task managers respectively, one per line. This manually assigns the Flink roles to nodes in the cluster.
- Switch to the user that will run the Flink cluster and execute `$FLINK_HOME/bin/start-cluster.sh` on the master node to start the cluster.

To shut down the standalone cluster, run `$FLINK_HOME/bin/stop-cluster.sh`.
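As a concrete sketch of these steps (the hostnames below are placeholders, not from the original setup; adjust them for your cluster):

```shell
# Assign the Flink roles on the master node. The masters file takes
# host:webui-port entries; the workers file takes one hostname per line.
cat > $FLINK_HOME/conf/masters <<'EOF'
master-host:8081
EOF
cat > $FLINK_HOME/conf/workers <<'EOF'
worker-host-1
worker-host-2
EOF

# Start the cluster as the Flink user on the master node.
$FLINK_HOME/bin/start-cluster.sh
```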
Once the cluster is up, you can start the SQL Gateway. Run:

$FLINK_HOME/bin/sql-gateway.sh start -Dsql-gateway.endpoint.rest.address=xxx.xxx.xxx.xxx

The `-Dsql-gateway.endpoint.rest.address` option specifies the address the SQL Gateway binds to. Note that if it is set to localhost, the SQL Gateway is only reachable from the local machine and cannot serve external clients. The SQL Gateway log files are written to the `$FLINK_HOME/log` directory.
Run `$FLINK_HOME/bin/sql-gateway.sh -h` for more ways to use the `sql-gateway.sh` command:
Usage: sql-gateway.sh [start|start-foreground|stop|stop-all] [args]
commands:
start - Run a SQL Gateway as a daemon
start-foreground - Run a SQL Gateway as a console application
stop - Stop the SQL Gateway daemon
stop-all - Stop all the SQL Gateway daemons
-h | --help - Show this help message
When debugging, we recommend running in the foreground with `start-foreground`, which makes it easier to inspect the logs and to restart the service after a failure.
YARN Cluster
Extract the Flink 1.16.0 distribution on any node of the YARN cluster, then switch to the Flink user and run:

export HADOOP_CLASSPATH=`hadoop classpath`
$FLINK_HOME/bin/yarn-session.sh -d -s 2 -jm 2048 -tm 2048

to start the Flink YARN cluster. Adjust the `yarn-session.sh` arguments to fit your environment. Finally, check the RUNNING Applications page of the YARN web UI to verify that the Flink YARN cluster started correctly.
The Flink user must have permission to submit YARN applications. If it does not, switch to another user or grant the permission via Ranger.
After YARN starts successfully, start the SQL Gateway. Be sure to start the SQL Gateway as the same user that started the yarn-session; otherwise the SQL Gateway cannot find the YARN application id. It will still start, but submitting SQL jobs will fail.
Once the SQL Gateway starts successfully, you should see a log line similar to:
INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Found Yarn properties file under /tmp/.yarn-properties-flink
The YARN properties file is named `.yarn-properties-{username}`. The author uses the flink user, so the file is `.yarn-properties-flink`. If this log line appears, the SQL Gateway has found the Flink YARN cluster.
Later on, after a job has been submitted successfully, the log will contain entries like:
INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface xxx.xxx.xxx.xxx:40494 of application 'application_1670204805747_0006'.
INFO org.apache.flink.client.program.rest.RestClusterClient [] - Submitting job 'collect' (8bbea014547408c4716a483a701af8ab).
INFO org.apache.flink.client.program.rest.RestClusterClient [] - Successfully submitted job 'collect' (8bbea014547408c4716a483a701af8ab) to 'http://ip:40494'.
The SQL Gateway finds the application id of the Flink YARN cluster and submits jobs to that cluster.
Configuration Options
SQL Gateway configuration options can be set dynamically as follows:

$FLINK_HOME/bin/sql-gateway.sh -Dkey=value

The configuration options from the official documentation are:
Key | Default | Type | Description |
---|---|---|---|
sql-gateway.session.check-interval | 1 min | Duration | The check interval for idle session timeout, which can be disabled by setting to zero or negative value. |
sql-gateway.session.idle-timeout | 10 min | Duration | Timeout interval for closing the session when the session hasn't been accessed during the interval. If setting to zero or negative value, the session will not be closed. |
sql-gateway.session.max-num | 1000000 | Integer | The maximum number of the active session for sql gateway service. |
sql-gateway.worker.keepalive-time | 5 min | Duration | Keepalive time for an idle worker thread. When the number of workers exceeds min workers, excessive threads are killed after this time interval. |
sql-gateway.worker.threads.max | 500 | Integer | The maximum number of worker threads for sql gateway service. |
sql-gateway.worker.threads.min | 5 | Integer | The minimum number of worker threads for sql gateway service. |
- sql-gateway.session.check-interval: how often to check whether sessions have timed out. Setting it to zero or a negative value disables the check.
- sql-gateway.session.idle-timeout: the session idle timeout; sessions idle longer than this are closed automatically. Likewise, zero or a negative value disables this behavior.
- sql-gateway.session.max-num: the maximum number of active sessions.
- sql-gateway.worker.keepalive-time: the keepalive time for idle worker threads. When the number of workers exceeds the configured minimum, the excess threads are terminated after this interval.
- sql-gateway.worker.threads.max: the maximum number of worker threads.
- sql-gateway.worker.threads.min: the minimum number of worker threads.
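For example, several of these options can be combined on the start command line. This is only an illustrative sketch; the address and values below are placeholders, not recommendations:

```shell
$FLINK_HOME/bin/sql-gateway.sh start \
  -Dsql-gateway.endpoint.rest.address=xxx.xxx.xxx.xxx \
  -Dsql-gateway.session.idle-timeout=30min \
  -Dsql-gateway.session.max-num=500 \
  -Dsql-gateway.worker.threads.max=200
```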
Usage
The Flink SQL Gateway supports a REST API mode and a hiveserver2 mode. Their usage is described below.
REST API
In the deployment above, the SQL Gateway serves the REST API by default, so we can go straight to usage. Assume that in our test environment the SQL Gateway runs at sql-gateway-ip:8083.
First, run:

curl --request POST http://sql-gateway-ip:8083/v1/sessions

to create a session and obtain a `sessionHandle`. A sample response:

{"sessionHandle":"2f35eb7e-97f0-40a4-b22d-f49c3a8fe7ef"}

Next, take executing the SQL statement `SELECT 1` as an example. The request format is:

curl --request POST http://sql-gateway-ip:8083/v1/sessions/${sessionHandle}/statements/ --data '{"statement": "SELECT 1"}'

Substituting the `sessionHandle` returned above, the actual command is:

curl --request POST http://sql-gateway-ip:8083/v1/sessions/2f35eb7e-97f0-40a4-b22d-f49c3a8fe7ef/statements/ --data '{"statement": "SELECT 1"}'

The response contains an `operationHandle`, as shown below:

{"operationHandle":"7dcb0266-ed64-423d-a984-310dc6398e5e"}

Finally, we use the `sessionHandle` and `operationHandle` to fetch the results. The format is:

curl --request GET http://sql-gateway-ip:8083/v1/sessions/${sessionHandle}/operations/${operationHandle}/result/0

The trailing `0` is the token. Think of the results as being returned in pages (batches); the token is the page number.
Substituting the real `sessionHandle` and `operationHandle` obtained earlier, the actual command is:

curl --request GET http://localhost:8083/v1/sessions/2f35eb7e-97f0-40a4-b22d-f49c3a8fe7ef/operations/7dcb0266-ed64-423d-a984-310dc6398e5e/result/0

The result is:

{"results":{"columns":[{"name":"EXPR$0","logicalType":{"type":"INTEGER","nullable":false},"comment":null}],"data":[{"kind":"INSERT","fields":[1]}]},"resultType":"PAYLOAD","nextResultUri":"/v1/sessions/2f35eb7e-97f0-40a4-b22d-f49c3a8fe7ef/operations/7dcb0266-ed64-423d-a984-310dc6398e5e/result/1"}

From results -> data -> fields we can read that `SELECT 1` returned 1.
As mentioned above, the token works like a page number. The `nextResultUri` field in the JSON above gives the URL for fetching the next batch of results; note that the token has advanced from 0 to 1. Fetching this `nextResultUri`:

curl --request GET http://localhost:8083/v1/sessions/2f35eb7e-97f0-40a4-b22d-f49c3a8fe7ef/operations/7dcb0266-ed64-423d-a984-310dc6398e5e/result/1

returns:

{"results":{"columns":[{"name":"EXPR$0","logicalType":{"type":"INTEGER","nullable":false},"comment":null}],"data":[]},"resultType":"EOS","nextResultUri":null}

Here `resultType` is `EOS`, indicating that all results have been fetched, and `nextResultUri` is null: there is no next page.
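The session/statement/result flow above is easy to wrap in a small client. The sketch below is illustrative only: it assumes a gateway reachable at the placeholder address `sql-gateway-ip:8083`, and its `collect_rows` helper follows `nextResultUri` from token 0 onward until `resultType` is `EOS`:

```python
import json
from urllib import request

GATEWAY = "http://sql-gateway-ip:8083"  # assumed address; adjust to your deployment


def post_json(url, payload=None):
    """POST to the gateway and decode the JSON response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = request.Request(url, data=data, method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)


def get_json(url):
    """GET from the gateway and decode the JSON response."""
    with request.urlopen(url) as resp:
        return json.load(resp)


def collect_rows(fetch_page, first_uri):
    """Follow nextResultUri pages until resultType is EOS, accumulating rows."""
    rows, uri = [], first_uri
    while uri is not None:
        page = fetch_page(uri)
        rows.extend(row["fields"] for row in page["results"]["data"])
        if page["resultType"] == "EOS":
            break
        uri = page.get("nextResultUri")
    return rows


def run_statement(statement):
    """Create a session, submit a statement, and page through all results."""
    session = post_json(GATEWAY + "/v1/sessions")["sessionHandle"]
    operation = post_json(
        GATEWAY + "/v1/sessions/%s/statements/" % session,
        {"statement": statement},
    )["operationHandle"]
    first = "/v1/sessions/%s/operations/%s/result/0" % (session, operation)
    return collect_rows(lambda uri: get_json(GATEWAY + uri), first)
```

Against a running gateway, `run_statement("SELECT 1")` would return the rows from the example above. The pagination logic in `collect_rows` is independent of the transport, so it can also be exercised against canned responses.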
hiveserver2
Besides the REST API described above, the SQL Gateway also supports a hiveserver2 mode.
For the official documentation on the SQL Gateway hiveserver2 mode, see https://nightlies.apache.org/flink/flink-docs-release-1.16/zh/docs/dev/table/hive-compatibility/hiveserver2/.
hiveserver2 mode requires additional dependencies. First, add `flink-connector-hive_2.12-1.16.0.jar` to Flink's `lib` directory. The jar can be downloaded from: https://repo1.maven.org/maven2/org/apache/flink/flink-connector-hive_2.12/1.16.0/flink-connector-hive_2.12-1.16.0.jar
In addition, the following Hive dependencies are needed:
- hive-common.jar
- hive-service-rpc.jar
- hive-exec.jar
- libthrift.jar
- libfb303.jar
- antlr-runtime.jar
These jars must match the Hive version used in your cluster; the recommended approach is to copy them directly from the `lib` directory of the cluster's Hive installation.
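Assuming the cluster's Hive is installed under a `$HIVE_HOME` directory (a placeholder path, not from the original setup), the copy can be sketched as:

```shell
# Copy the Hive dependencies from the cluster's Hive installation into
# Flink's lib directory, so the jar versions match the cluster's Hive.
for jar in hive-common hive-service-rpc hive-exec libthrift libfb303 antlr-runtime; do
  cp "$HIVE_HOME"/lib/${jar}-*.jar "$FLINK_HOME"/lib/
done
```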
The command to start the SQL Gateway in hiveserver2 mode is:

$FLINK_HOME/bin/sql-gateway.sh start -Dsql-gateway.endpoint.rest.address=xxx.xxx.xxx.xxx -Dsql-gateway.endpoint.type=hiveserver2 -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=/path/to/hive/conf -Dsql-gateway.endpoint.hiveserver2.thrift.port=10000

The parameters mean:
- -Dsql-gateway.endpoint.rest.address: the address the SQL Gateway binds to.
- -Dsql-gateway.endpoint.type: the endpoint type. The default is `rest`, i.e. the REST API; the `hiveserver2` type must be configured explicitly.
- -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir: the directory containing the `hive-site.xml` configuration file. It is used to connect to the Hive metastore and retrieve table metadata.
- -Dsql-gateway.endpoint.hiveserver2.thrift.port: the port the SQL Gateway uses in hiveserver2 mode, equivalent to the Hive thriftserver port.

Beyond those listed above, hiveserver2 mode has many more configuration options; see https://nightlies.apache.org/flink/flink-docs-release-1.16/zh/docs/dev/table/hive-compatibility/hiveserver2/#endpoint-options. They are not repeated here.
At this point, starting the SQL Gateway may fail with the following error:
org.apache.flink.table.api.ValidationException: Could not find any factory for identifier 'hive' that implements 'org.apache.flink.table.planner.delegation.DialectFactory' in the classpath.
Available factory identifiers are:
Note: if you want to use Hive dialect, please first move the jar `flink-table-planner_2.12` located in `FLINK_HOME/opt` to `FLINK_HOME/lib` and then move out the jar `flink-table-planner-loader` from `FLINK_HOME/lib`.
at org.apache.flink.table.factories.FactoryUtil.discoverFactory(FactoryUtil.java:545) ~[flink-table-api-java-uber-1.16.0.jar:1.16.0]
at org.apache.flink.table.planner.delegation.PlannerBase.getDialectFactory(PlannerBase.scala:161) ~[?:?]
at org.apache.flink.table.planner.delegation.PlannerBase.getParser(PlannerBase.scala:171) ~[?:?]
at org.apache.flink.table.api.internal.TableEnvironmentImpl.getParser(TableEnvironmentImpl.java:1694) ~[flink-table-api-java-uber-1.16.0.jar:1.16.0]
at org.apache.flink.table.api.internal.TableEnvironmentImpl.<init>(TableEnvironmentImpl.java:240) ~[flink-table-api-java-uber-1.16.0.jar:1.16.0]
at org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.<init>(AbstractStreamTableEnvironmentImpl.java:89) ~[flink-table-api-java-uber-1.16.0.jar:1.16.0]
at org.apache.flink.table.api.bridge.java.internal.StreamTableEnvironmentImpl.<init>(StreamTableEnvironmentImpl.java:84) ~[flink-table-api-java-uber-1.16.0.jar:1.16.0]
at org.apache.flink.table.gateway.service.context.SessionContext.createStreamTableEnvironment(SessionContext.java:309) ~[flink-sql-gateway-1.16.0.jar:1.16.0]
at org.apache.flink.table.gateway.service.context.SessionContext.createTableEnvironment(SessionContext.java:269) ~[flink-sql-gateway-1.16.0.jar:1.16.0]
at org.apache.flink.table.gateway.service.operation.OperationExecutor.getTableEnvironment(OperationExecutor.java:218) ~[flink-sql-gateway-1.16.0.jar:1.16.0]
at org.apache.flink.table.gateway.service.operation.OperationExecutor.executeStatement(OperationExecutor.java:89) ~[flink-sql-gateway-1.16.0.jar:1.16.0]
at org.apache.flink.table.gateway.service.SqlGatewayServiceImpl.lambda$executeStatement$0(SqlGatewayServiceImpl.java:182) ~[flink-sql-gateway-1.16.0.jar:1.16.0]
at org.apache.flink.table.gateway.service.operation.OperationManager.lambda$submitOperation$1(OperationManager.java:111) ~[flink-sql-gateway-1.16.0.jar:1.16.0]
at org.apache.flink.table.gateway.service.operation.OperationManager$Operation.lambda$run$0(OperationManager.java:239) ~[flink-sql-gateway-1.16.0.jar:1.16.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
2022-12-08 17:42:03,007 INFO org.apache.flink.table.catalog.hive.HiveCatalog [] - Created HiveCatalog 'hive'
2022-12-08 17:42:03,008 INFO org.apache.hadoop.hive.metastore.HiveMetaStoreClient [] - Trying to connect to metastore with URI thrift://xxx.xxx.xxx.xxx:9083
2022-12-08 17:42:03,008 INFO org.apache.hadoop.hive.metastore.HiveMetaStoreClient [] - Opened a connection to metastore, current connections: 3
2022-12-08 17:42:03,009 INFO org.apache.hadoop.hive.metastore.HiveMetaStoreClient [] - Connected to metastore.
2022-12-08 17:42:03,010 INFO org.apache.hadoop.hive.metastore.RetryingMetaStoreClient [] - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=yarn (auth:SIMPLE) retries=24 delay=5 lifetime=0
2022-12-08 17:42:03,010 INFO org.apache.flink.table.catalog.hive.HiveCatalog [] - Connected to Hive metastore
2022-12-08 17:42:03,026 INFO org.apache.flink.table.module.ModuleManager [] - Loaded module 'hive' from class org.apache.flink.table.module.hive.HiveModule
2022-12-08 17:42:03,030 INFO org.apache.flink.table.gateway.service.session.SessionManager [] - Session f3f6f339-f5b0-425f-94ad-3e9ad11981c1 is opened, and the number of current sessions is 3.
2022-12-08 17:42:03,043 ERROR org.apache.flink.table.gateway.service.operation.OperationManager [] - Failed to execute the operation 7922e186-8110-4bb8-b93d-db17d88eac48.
org.apache.flink.table.api.ValidationException: Could not find any factory for identifier 'hive' that implements 'org.apache.flink.table.planner.delegation.DialectFactory' in the classpath.
If you hit this error, Flink cannot find the Hive dialect. Fix it by moving `flink-table-planner_2.12-1.16.0.jar` from Flink's `opt` directory into the `lib` directory, and then removing `flink-table-planner-loader-1.16.0.jar` from the `lib` directory.
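Following the note in the error message, the jar swap can be done roughly as follows (file names shown for Flink 1.16.0; moving the loader jar back into `opt` is one way to take it out of `lib`):

```shell
# Enable the Hive dialect: use the real table planner instead of the loader.
mv "$FLINK_HOME"/opt/flink-table-planner_2.12-1.16.0.jar "$FLINK_HOME"/lib/
mv "$FLINK_HOME"/lib/flink-table-planner-loader-1.16.0.jar "$FLINK_HOME"/opt/
```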
At this point, Flink's `lib` directory contains:
antlr-runtime-3.5.2.jar
flink-cep-1.16.0.jar
flink-connector-files-1.16.0.jar
flink-connector-hive_2.12-1.16.0.jar
flink-csv-1.16.0.jar
flink-dist-1.16.0.jar
flink-json-1.16.0.jar
flink-scala_2.12-1.16.0.jar
flink-shaded-zookeeper-3.5.9.jar
flink-table-api-java-uber-1.16.0.jar
flink-table-planner_2.12-1.16.0.jar
flink-table-runtime-1.16.0.jar
hive-common-3.1.0.3.0.1.0-187.jar
hive-exec-3.1.0.3.0.1.0-187.jar
hive-service-rpc-3.1.0.3.0.1.0-187.jar
libfb303-0.9.3.jar
libthrift-0.9.3.jar
log4j-1.2-api-2.17.1.jar
log4j-api-2.17.1.jar
log4j-core-2.17.1.jar
log4j-slf4j-impl-2.17.1.jar
The SQL Gateway now works. However, querying Hive tables through Flink still fails due to missing dependencies; the following Hadoop dependencies must also be added:
- hadoop-common.jar
- hadoop-mapreduce-client-common.jar
- hadoop-mapreduce-client-core.jar
- hadoop-mapreduce-client-jobclient.jar
The final contents of the `lib` directory are:
antlr-runtime-3.5.2.jar
flink-cep-1.16.0.jar
flink-connector-files-1.16.0.jar
flink-connector-hive_2.12-1.16.0.jar
flink-csv-1.16.0.jar
flink-dist-1.16.0.jar
flink-json-1.16.0.jar
flink-scala_2.12-1.16.0.jar
flink-shaded-zookeeper-3.5.9.jar
flink-table-api-java-uber-1.16.0.jar
flink-table-planner_2.12-1.16.0.jar
flink-table-runtime-1.16.0.jar
hadoop-common-3.1.1.3.0.1.0-187.jar
hadoop-mapreduce-client-common-3.1.1.3.0.1.0-187.jar
hadoop-mapreduce-client-core-3.1.1.3.0.1.0-187.jar
hadoop-mapreduce-client-jobclient-3.1.1.3.0.1.0-187.jar
hive-common-3.1.0.3.0.1.0-187.jar
hive-exec-3.1.0.3.0.1.0-187.jar
hive-service-rpc-3.1.0.3.0.1.0-187.jar
libfb303-0.9.3.jar
libthrift-0.9.3.jar
log4j-1.2-api-2.17.1.jar
log4j-api-2.17.1.jar
log4j-core-2.17.1.jar
log4j-slf4j-impl-2.17.1.jar
Finally, try starting again; in the author's test, the SQL Gateway now starts successfully.
The next step is to connect to the SQL Gateway over JDBC. Note that the connection URL must include the `auth=noSasl` property, for example:

jdbc:hive2://sql-gateway-ip:10000/default;auth=noSasl

Otherwise the SQL Gateway reports the following error:
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
The following sections describe how to connect to the Flink SQL Gateway using DBeaver, Java code, and Beeline.
DBeaver
Click New Connection -> Apache Hive (it can be found via search). On the Main -> General pane, fill in the host, port, and database (the database may be left empty). Then, on the Driver properties tab, add a user property named `auth` with the value `noSasl`. Click Finish to create the connection; you can then click the SQL button on the toolbar to open a SQL editor and write SQL.
Note: in the final step of creating the connection, DBeaver downloads the Hive JDBC driver from GitHub. The download may time out due to network issues, and clicking Retry in DBeaver may not help. In that case, download the driver manually: in the Connect to a database wizard, click Edit Driver and open the Libraries tab, where the driver's download URL is shown. Copy it into a browser to download. Then go into the
C:\Users\xxx\AppData\Roaming\DBeaverData\drivers\remote\
directory and drill down to the directory where the driver is stored, for example C:\Users\xxx\AppData\Roaming\DBeaverData\drivers\remote\timveil\hive-jdbc-uber-jar\releases\download\v1.9-2.6.5. Place the manually downloaded driver there (if the directory contains a partially downloaded driver file left behind by DBeaver, delete it first). Finally, click Finish in the Connect to a database wizard to close it.
Using Java Code
Add the following Maven dependency:
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>3.1.2</version>
</dependency>
Then write the Java code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (
            // Please replace the JDBC URI with your actual host, port and database.
            Connection connection = DriverManager.getConnection("jdbc:hive2://sql-gateway-ip:10000/default;auth=noSasl");
            Statement statement = connection.createStatement()) {
        statement.execute("select * from some_table");
        ResultSet resultSet = statement.getResultSet();
        // Print the first column of each row.
        while (resultSet.next()) {
            System.out.println(resultSet.getString(1));
        }
    }
}
This works exactly like standard JDBC. Note that the Hive driver class name is `org.apache.hive.jdbc.HiveDriver`.
Using Beeline
Start beeline and connect to the SQL Gateway with:

./beeline
!connect jdbc:hive2://sql-gateway-ip:10000/default;auth=noSasl

You will then be prompted for a username and password. Since the current version does not support authentication, simply press Enter to skip them. Once connected, you can run SQL statements just as you would against Hive.
The above is the beeline usage given in the official documentation. However, in the author's testing, the following error occurred:
2022-12-09 10:24:28,600 ERROR org.apache.flink.table.endpoint.hive.HiveServer2Endpoint [] - Failed to GetInfo.
java.lang.UnsupportedOperationException: Unrecognized TGetInfoType value: CLI_ODBC_KEYWORDS.
at org.apache.flink.table.endpoint.hive.HiveServer2Endpoint.GetInfo(HiveServer2Endpoint.java:371) [flink-connector-hive_2.12-1.16.0.jar:1.16.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetInfo.getResult(TCLIService.java:1537) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetInfo.getResult(TCLIService.java:1522) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
2022-12-09 10:24:28,600 ERROR org.apache.thrift.server.TThreadPoolServer [] - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Required field 'infoValue' is unset! Struct:TGetInfoResp(status:TStatus(statusCode:ERROR_STATUS, infoMessages:[*java.lang.UnsupportedOperationException:Unrecognized TGetInfoType value: CLI_ODBC_KEYWORDS.:9:8, org.apache.flink.table.endpoint.hive.HiveServer2Endpoint:GetInfo:HiveServer2Endpoint.java:371, org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetInfo:getResult:TCLIService.java:1537, org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetInfo:getResult:TCLIService.java:1522, org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39, org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39, org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286, java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142, java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617, java.lang.Thread:run:Thread.java:745], errorMessage:Unrecognized TGetInfoType value: CLI_ODBC_KEYWORDS.), infoValue:null)
at org.apache.hive.service.rpc.thrift.TGetInfoResp.validate(TGetInfoResp.java:379) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.hive.service.rpc.thrift.TCLIService$GetInfo_result.validate(TCLIService.java:5228) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.hive.service.rpc.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:5285) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.hive.service.rpc.thrift.TCLIService$GetInfo_result$GetInfo_resultStandardScheme.write(TCLIService.java:5254) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.hive.service.rpc.thrift.TCLIService$GetInfo_result.write(TCLIService.java:5205) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
2022-12-09 10:24:28,600 WARN org.apache.thrift.transport.TIOStreamTransport [] - Error closing output stream.
java.net.SocketException: Socket closed
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118) ~[?:1.8.0_121]
at java.net.SocketOutputStream.write(SocketOutputStream.java:155) ~[?:1.8.0_121]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[?:1.8.0_121]
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[?:1.8.0_121]
at java.io.FilterOutputStream.close(FilterOutputStream.java:158) ~[?:1.8.0_121]
at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.transport.TSocket.close(TSocket.java:235) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:303) [hive-exec-3.1.0.3.0.1.0-187.jar:3.1.0.3.0.1.0-187]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Investigating this error shows it is a bug in Flink 1.16.0, tracked as FLINK-29839. The community has fixed it in the 1.16.1 release.
This blog post is the author's original work. Comments and corrections are welcome. Please credit the source when reposting.