Spark on Kubernetes job submission fails: Expected HTTP 101 response but was '403 Forbidden'

Environment:

Spark version: 2.4.3

Kubernetes version: v1.16.2

Problem:

Submitting example.jar with spark-submit in cluster mode to the Kubernetes cluster, the driver pod fails with the following error:

19/11/06 07:06:54 INFO ExecutorPodsAllocator: Going to request 5 executors from Kubernetes.
19/11/06 07:06:54 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
	at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
19/11/06 07:06:54 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/11/06 07:06:54 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException:
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
	at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Cause:

Some digging showed this to be a known issue: Kubernetes security patches cause Apache Spark jobs to fail with a permissions error. See the Stack Overflow question "EKS security patches cause Apache Spark jobs to fail with permissions error".

Spark community patches:

https://github.com/apache/spark/pull/25641

https://github.com/apache/spark/pull/25640
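
To confirm that a given Spark distribution ships an affected fabric8 client, you can list the bundled jars; a minimal check, assuming a standard layout with SPARK_HOME set:

ls $SPARK_HOME/jars | grep kubernetes
# Spark 2.4.3 ships the pre-4.4.x fabric8 jars (4.1.x), whose watch requests
# are rejected with 403 by API servers carrying the security patch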

Solution:

Method 1: The issue is fixed in the spark-2.4.4 release and later. For a test environment, simply switch to a fixed Spark version, or cherry-pick the relevant commits and rebuild; a sketch of the cherry-pick route follows below.
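
A minimal sketch of the cherry-pick route, assuming a build from the v2.4.3 tag; the commit hashes are placeholders and must be looked up from the two PRs above:

git clone https://github.com/apache/spark.git && cd spark
git checkout v2.4.3
# apply the fixes from PR #25640 and PR #25641 (look up the actual commit hashes on GitHub)
git cherry-pick <commit-from-PR-25640> <commit-from-PR-25641>
# rebuild with the Kubernetes profile enabled
./build/mvn -Pkubernetes -DskipTests clean package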

Method 2: Since the root cause is the jar dependencies that Spark bundles, you can instead replace the following three jars under spark/jars with version 4.4.0 or later:

kubernetes-client-4.4.2.jar
kubernetes-model-4.4.2.jar
kubernetes-model-common-4.4.2.jar

The jars can be fetched from the Maven Central repository, e.g.:

wget  https://repo1.maven.org/maven2/io/fabric8/kubernetes-model/4.4.2/kubernetes-model-4.4.2.jar
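
Putting Method 2 together, a minimal sketch of the full swap, assuming a standard Spark 2.4.3 distribution under SPARK_HOME and the bundled docker-image-tool.sh for rebuilding the image (repository and tag names here are illustrative):

cd $SPARK_HOME
# drop the old fabric8 jars (4.1.x in Spark 2.4.3); the second wildcard also catches kubernetes-model-common
rm jars/kubernetes-client-*.jar jars/kubernetes-model-*.jar
# fetch the fixed 4.4.2 versions from Maven Central
wget -P jars https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.4.2/kubernetes-client-4.4.2.jar
wget -P jars https://repo1.maven.org/maven2/io/fabric8/kubernetes-model/4.4.2/kubernetes-model-4.4.2.jar
wget -P jars https://repo1.maven.org/maven2/io/fabric8/kubernetes-model-common/4.4.2/kubernetes-model-common-4.4.2.jar
# rebuild and push the Spark image so the driver and executors pick up the new jars
./bin/docker-image-tool.sh -r merrily01 -t spark-2.4.3-fixed build
./bin/docker-image-tool.sh -r merrily01 -t spark-2.4.3-fixed push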

Follow-up:

1. After replacing the jars, rebuilding, and pushing the image, resubmitting the job with spark-submit still produced the same error;

2. The likely cause: the node-local image had not been refreshed, so the pods were still running the old image;

3. Adding --conf spark.kubernetes.container.image.pullPolicy=Always to the spark-submit command forces the updated image to be pulled, and the problem was solved.

With that, the complete submit command for the official Spark-on-Kubernetes demo looks like this:

spark-submit \
    --master k8s://https://172.16.192.128:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=merrily01/repo:spark-2.4.3-image-merrily01 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image.pullPolicy=Always \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.3.jar
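
To verify the run, you can watch the pods and tail the driver log; the driver pod name below is illustrative, as the actual name carries a generated timestamp suffix:

kubectl get pods -w
kubectl logs -f spark-pi-1573023441476-driver
# a successful SparkPi run ends with a line like "Pi is roughly 3.14..."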
