Environment:
Spark version: 2.4.3
Kubernetes version: v1.16.2
Problem:
Submitting example.jar via spark-submit in cluster mode to the k8s cluster, the driver pod fails with the following error:
19/11/06 07:06:54 INFO ExecutorPodsAllocator: Going to request 5 executors from Kubernetes.
19/11/06 07:06:54 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/11/06 07:06:54 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/11/06 07:06:54 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Cause:
Some digging shows this is the known issue "EKS security patches cause Apache Spark jobs to fail with permissions error": API servers carrying the security patch (including Kubernetes 1.16) reject the watch requests sent by the old fabric8 kubernetes-client bundled with Spark, so the driver's executor-pod watch receives HTTP 403 instead of the expected HTTP 101 protocol upgrade.
Spark community patches:
https://github.com/apache/spark/pull/25641
https://github.com/apache/spark/pull/25640
Solution:
Option 1: This is fixed in the spark-2.4.4 release and later. For a test environment, simply switch to a fixed Spark version, or cherry-pick the relevant commits.
Option 2: Since the root cause is in the jars Spark depends on, you can instead replace the following three jars under spark/jars with version 4.4.2 (matching the patches above) or later:
kubernetes-client-4.4.2.jar
kubernetes-model-4.4.2.jar
kubernetes-model-common-4.4.2.jar
The jars can be obtained from the Maven repository, e.g.:
wget https://repo1.maven.org/maven2/io/fabric8/kubernetes-model/4.4.2/kubernetes-model-4.4.2.jar
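All three jars follow the same Maven Central layout, so a small loop can build the download URLs; the version 4.4.2 and the io/fabric8 group path below match the jars listed above. The actual wget call is left commented out so the sketch is safe to run offline:

```shell
# Build Maven Central URLs for the three fabric8 jars to replace.
VERSION=4.4.2
BASE=https://repo1.maven.org/maven2/io/fabric8
for artifact in kubernetes-client kubernetes-model kubernetes-model-common; do
  URL="${BASE}/${artifact}/${VERSION}/${artifact}-${VERSION}.jar"
  echo "fetching ${URL}"
  # wget -q "${URL}"   # uncomment to actually download
done
```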
Addendum:
1. After replacing the jars, rebuilding and pushing the image, and resubmitting with spark-submit, the same error still appeared;
2. The likely cause: the node was still running the locally cached old image rather than the updated one;
3. Adding --conf spark.kubernetes.container.image.pullPolicy=Always to the spark-submit command forces the new image to be pulled, which resolved the problem.
With that, the complete submit command for the official Spark-on-Kubernetes demo is:
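For reference, the image rebuild in step 1 can be done with the docker-image-tool.sh script that ships with the Spark distribution. The repository name and tag below are placeholders, and the build/push commands are shown as comments since they require Docker and registry access:

```shell
REPO=merrily01/repo          # placeholder: your registry/repository
TAG=spark-2.4.3-fixed        # placeholder: any new tag for the patched image
# Run from the root of the Spark distribution after swapping the jars:
#   ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" build
#   ./bin/docker-image-tool.sh -r "$REPO" -t "$TAG" push
IMAGE="${REPO}:${TAG}"
echo "submit with --conf spark.kubernetes.container.image=${IMAGE}"
```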
spark-submit \
--master k8s://https://172.16.192.128:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=merrily01/repo:spark-2.4.3-image-merrily01 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image.pullPolicy=Always \
local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar
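To confirm the fix took effect after resubmitting, the driver log can be scanned for the 403 watch failure. The log file name is an assumption (dump it with kubectl logs <driver-pod> > driver.log); a harmless sample line is written only when the file is missing, so the sketch runs standalone:

```shell
LOG=driver.log
# Standalone fallback: write one sample INFO line if no real log exists.
[ -f "$LOG" ] || echo "INFO ExecutorPodsAllocator: Going to request 5 executors from Kubernetes." > "$LOG"
if grep -q "403 Forbidden" "$LOG"; then
  RESULT="watch still rejected (HTTP 403) - image likely stale, check pullPolicy"
else
  RESULT="no 403 watch failures in $LOG"
fi
echo "$RESULT"
```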