1. Install the NVIDIA dependencies
CentOS 7
yum install -y nvidia-container-toolkit nvidia-container-runtime
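Before moving on, it can help to confirm that the binaries used in the following steps are actually on the PATH. The helper below is a minimal sketch; missing_commands is a hypothetical name introduced here, not part of any NVIDIA or Docker tooling.

```shell
# missing_commands NAME...
# Prints the name of every command that cannot be found on PATH.
# (Hypothetical helper for illustration only.)
missing_commands() {
    for name in "$@"; do
        command -v "$name" >/dev/null 2>&1 || echo "$name"
    done
}

# Example: list whichever of these two is not installed (prints nothing if both exist).
missing_commands dockerd nvidia-container-runtime
```

An empty result suggests both dockerd and nvidia-container-runtime are installed; any printed name should be installed before continuing.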
2. Configure the dockerd startup command
Modify the systemd service configuration file
mkdir -p /etc/systemd/system/docker.service.d
tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
systemctl daemon-reload
systemctl restart docker
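Once the daemon has restarted, one way to confirm that the runtime was registered is to look at the "Runtimes:" line of docker info. The sketch below assumes that output format; has_nvidia_runtime is a hypothetical helper name.

```shell
# has_nvidia_runtime
# Reads `docker info` output on stdin and succeeds only if the
# "Runtimes:" line lists nvidia. (Hypothetical helper for illustration.)
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

# Usage (needs a running Docker daemon):
#   docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"
```

The check deliberately ignores the "Default Runtime:" line, so it only reports whether nvidia is available, not whether it is the default.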
Modify the Docker daemon configuration file
tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
pkill -SIGHUP dockerd
You can also make nvidia the default runtime by adding the following setting to /etc/docker/daemon.json:
"default-runtime": "nvidia"
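Alternatively, register the runtime by starting dockerd directly from the command line (adding --default-runtime=nvidia also makes it the default):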
dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
Restart dockerd:
systemctl restart docker
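For reference, a daemon.json that combines the runtime entry with the default-runtime setting might look like the sketch below. It writes to a scratch file rather than /etc/docker/daemon.json; DAEMON_JSON is a hypothetical variable introduced here so the result can be reviewed before it is copied into place as root.

```shell
# Sketch: write a combined daemon.json to a scratch location.
# DAEMON_JSON is a hypothetical variable; it defaults to a file under /tmp.
DAEMON_JSON="${DAEMON_JSON:-${TMPDIR:-/tmp}/daemon.json.example}"

cat > "$DAEMON_JSON" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

echo "wrote $DAEMON_JSON"
```

With nvidia as the default runtime, the --runtime=nvidia flag on docker run should no longer be needed.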
3. Test
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
4. Troubleshooting notes
- Issue 1: after starting Docker and running the script, open3d-python==0.3.0.0 fails with a missing-library error:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Fix it by installing the libgl1-mesa-glx library:
apt-get update && apt-get install libgl1-mesa-glx
If the update prints the following "is not signed" error, first move NVIDIA's apt-get source lists out of the way and then update again:
Ign:10 https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg
Reading package lists... Done
W: GPG error: https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu1804/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <cudatools@nvidia.com>
E: The repository 'https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Run:
mv /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list /tmp
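If one of the two list files is absent, the mv above reports an error for it, and restoring the files later means remembering where they went. A slightly more forgiving variant is sketched below; move_apt_lists and the backup directory are hypothetical names chosen for illustration.

```shell
# move_apt_lists SRC_DIR BACKUP_DIR NAME...
# Moves each named file that exists in SRC_DIR into BACKUP_DIR and prints
# what it moved; files that are not present are silently skipped.
# (Hypothetical helper for illustration only.)
move_apt_lists() {
    src_dir=$1
    backup_dir=$2
    shift 2
    mkdir -p "$backup_dir" || return 1
    for name in "$@"; do
        if [ -f "$src_dir/$name" ]; then
            mv "$src_dir/$name" "$backup_dir/" && echo "moved $name"
        fi
    done
}

# Usage inside the container (as root):
#   move_apt_lists /etc/apt/sources.list.d /tmp/apt-sources-backup cuda.list nvidia-ml.list
#   apt-get update && apt-get install libgl1-mesa-glx
```

Moving the files back from the backup directory into /etc/apt/sources.list.d restores the NVIDIA sources afterwards.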