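Using NVIDIA GPUs through Device Plugins
1. Introduction:
In Kubernetes 1.10, the DevicePlugins feature gate is enabled by default and is the recommended way to discover and use NVIDIA GPU resources. It replaces the built-in approach behind the Accelerators feature gate, which was the recommended method before Kubernetes 1.8.
This follows Kubernetes' plugin philosophy of leaving specialized work to the specialized vendor. This article covers how Device Plugins work, Extended Resources, failure handling and improvements, and how to use and schedule GPUs.
2. Deployment steps:
Pull the image:
Run this on a Linux machine that has Internet access and Docker installed: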
# docker pull nvidia/k8s-device-plugin:beta
Save the image:
# docker save -o k8s-device-plugin-beta.tar docker.io/nvidia/k8s-device-plugin:beta
Upload the archive to the environment where the plugin will be deployed, then load it:
# docker load -i k8s-device-plugin-beta.tar
3. Deploy:
Deployment manifest: nvidia-device-plugin.yaml
#########################
# extensions/v1beta1 DaemonSets were removed in Kubernetes 1.16;
# newer clusters need apps/v1 plus a spec.selector.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      containers:
      - image: nvidia/k8s-device-plugin-amd64:1.9  # change this to the image you pulled: docker.io/nvidia/k8s-device-plugin:beta
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
###############
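The image swap noted in the manifest can also be scripted. The following is a minimal sketch, assuming the manifest is saved as nvidia-device-plugin.yaml and still contains the original image name; the helper name set_plugin_image is made up for illustration.

```shell
# Replace the manifest's image reference with the image that was loaded locally.
# set_plugin_image is a hypothetical helper, not part of any tool.
set_plugin_image() {
  manifest="$1"
  old_image='nvidia/k8s-device-plugin-amd64:1.9'
  new_image='docker.io/nvidia/k8s-device-plugin:beta'
  # Write to a temporary file first so the manifest is never left half-written.
  sed "s#${old_image}#${new_image}#" "$manifest" > "${manifest}.tmp" &&
    mv "${manifest}.tmp" "$manifest"
}

# Usage:
#   set_plugin_image nvidia-device-plugin.yaml
```

Writing to a temporary file and renaming it avoids relying on `sed -i`, whose syntax differs between GNU and BSD sed.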
Create the DaemonSet:
# kubectl create -f nvidia-device-plugin.yaml
Check the deployment with kubectl:
# kubectl get pod -n kube-system
A successfully deployed pod looks like this:
nvidia-device-plugin-daemonset-rm822 1/1 Running 2 2d4h
Verify that the deployment succeeded:
# kubectl describe node
The deployment succeeded if the GPU node's Capacity and Allocatable sections list the nvidia.com/gpu resource.
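The output of kubectl describe node is long, so it can help to filter it down to the GPU resource. This is a small sketch, assuming the plugin advertises the resource under the name nvidia.com/gpu (the same key used in the DaemonSet's toleration); gpu_lines is a made-up helper name.

```shell
# Keep only the lines of `kubectl describe node` output that mention the GPU resource.
# gpu_lines is a hypothetical helper; it reads the describe output on stdin.
gpu_lines() {
  grep -F 'nvidia.com/gpu'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl describe node | gpu_lines
```

If the filter prints nothing, no node is currently advertising GPUs to the scheduler.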
Notes:
Troubleshooting:
If the pod fails to deploy, inspect it with:
# kubectl describe pod nvidia-device-plugin-daemonset-rm822 -n kube-system
If the pod deploys successfully but the GPU still cannot be used, check the plugin logs:
# kubectl logs nvidia-device-plugin-daemonset-rm822 -n kube-system
Likely causes are a problem with the image, a node that has no GPU, or a mistake in the earlier deployment steps.
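To tell a plugin problem from a workload problem, it can help to run a pod that requests one GPU. The following is a sketch only, assuming the resource name nvidia.com/gpu; the pod name gpu-test, the file name gpu-test-pod.yaml, and the image nvidia/cuda:9.0-base are placeholders, so substitute a CUDA image that is actually available in your environment.

```shell
# Write a minimal pod manifest that requests one GPU.
# gpu-test and the image tag are placeholders chosen for illustration.
cat > gpu-test-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:9.0-base   # assumed image; replace with one you have loaded
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1   # extended resources are requested under limits
EOF

# Then, on a machine with cluster access:
#   kubectl create -f gpu-test-pod.yaml
#   kubectl logs gpu-test
```

A pod that stays Pending means the scheduler found no node with a free nvidia.com/gpu, which points back at the plugin or the node rather than at the workload.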