Building a Docker environment for deep learning (TensorFlow/PyTorch) on top of the python:3.8.3 image

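This post builds a TensorFlow Docker environment on top of the python:3.8.3-buster image.

My host runs Ubuntu 18.04 with NVIDIA driver 440.64.00, the latest version at the time of writing.

To keep testing simple, I configured everything directly inside a running container instead of packaging the image with a Dockerfile.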

1 Basic configuration

Pull and run the image (if possible, point Docker at a registry mirror inside China; it is much faster)

# pull the image
docker pull python:3.8.3-buster

# start a container and open a shell in it
docker run -it --name=py-3.8.3 python:3.8.3-buster bash

Configure the apt sources, the time zone, and so on

# apt sources: use the Tsinghua mirror
echo -e "deb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster main contrib non-free\ndeb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-updates main contrib non-free\ndeb https://mirrors.tuna.tsinghua.edu.cn/debian/ buster-backports main contrib non-free\ndeb https://mirrors.tuna.tsinghua.edu.cn/debian-security buster/updates main contrib non-free" > /etc/apt/sources.list
apt update 
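
The single long echo -e line above is hard to read and depends on bash's echo -e. A minimal sketch of the same four entries written with a here-document follows. The function name write_buster_sources and its optional target-path argument are illustrative and not part of the original setup.

```shell
# Hypothetical helper: write the four Tsinghua mirror entries for buster
# to the file given as $1 (defaults to /etc/apt/sources.list).
write_buster_sources() {
    target="${1:-/etc/apt/sources.list}"
    mirror="https://mirrors.tuna.tsinghua.edu.cn"
    cat > "$target" <<EOF
deb $mirror/debian/ buster main contrib non-free
deb $mirror/debian/ buster-updates main contrib non-free
deb $mirror/debian/ buster-backports main contrib non-free
deb $mirror/debian-security buster/updates main contrib non-free
EOF
}
```

Inside the container this would be used as: write_buster_sources && apt update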

# set the time zone
ln -fs /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
dpkg-reconfigure --frontend noninteractive tzdata

# openssl setting: some scenarios fail with the default TLSv1.2 setting, e.g. pyodbc connecting to MSSQL
sed -i 's/TLSv1.[0-9]/TLSv1.0/g' /etc/ssl/openssl.cnf
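
Because this substitution edits a system-wide file in place, it can help to keep a backup and look at what changed. The sketch below applies the same substitution with a backup copy. The function name relax_tls_min_version is made up for illustration, and the dot in the pattern is escaped so it only matches a literal ".".

```shell
# Hypothetical helper: apply the TLS substitution from this post to the
# file given as $1, keeping the original next to it as <file>.bak.
relax_tls_min_version() {
    conf="${1:-/etc/ssl/openssl.cnf}"
    # Same replacement as above, with the dot escaped.
    sed -i.bak 's/TLSv1\.[0-9]/TLSv1.0/g' "$conf"
    # Print the lines that now mention TLSv1.0 so they can be reviewed.
    grep -n 'TLSv1\.0' "$conf"
}
```

To undo the change, move the .bak file back over the edited file.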

# install vim and curl
apt install -y vim curl ca-certificates

Configure the default zsh environment and theme

# install zsh
apt install -y zsh
# install oh-my-zsh
cd /usr/share/zsh
curl -fsSL https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh > oh-my-zsh_install.sh    # if the download fails, fetch the script some other way and put it in this directory
vim oh-my-zsh_install.sh    # change the install directory to the following line
ZSH=/usr/share/zsh/oh-my-zsh

# run the oh-my-zsh install script
bash oh-my-zsh_install.sh
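
As an alternative to editing the installer in vim, the install directory can be preset from the environment. This is only a sketch: it assumes the downloaded installer honours a pre-set ZSH variable and accepts --unattended, which the upstream script documents but which is worth checking in the copy you downloaded. The wrapper name install_omz_to is hypothetical.

```shell
# Hypothetical wrapper: run a downloaded oh-my-zsh installer ($1) with the
# install directory ($2) preset, so the script itself needs no manual edit.
# Assumes the installer honours a pre-set ZSH variable and --unattended.
install_omz_to() {
    installer="$1"
    dest="${2:-/usr/share/zsh/oh-my-zsh}"
    ZSH="$dest" sh "$installer" --unattended
}
```

For the layout used in this post that would be: install_omz_to oh-my-zsh_install.sh /usr/share/zsh/oh-my-zsh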

# download the oh-my-zsh theme
git clone --depth=1 https://github.com/romkatv/powerlevel10k.git /usr/share/zsh/oh-my-zsh/themes/powerlevel10k
# download two plugins
git clone https://github.com/zsh-users/zsh-syntax-highlighting.git /usr/share/zsh/oh-my-zsh/plugins/zsh-syntax-highlighting
git clone https://github.com/zsh-users/zsh-autosuggestions.git /usr/share/zsh/oh-my-zsh/plugins/zsh-autosuggestions

# Add a zsh.zshrc file to /etc/zsh. This is the default zshrc I put together by copying one from another host and editing it; its full content is at the end of this post.

# edit /etc/profile
vim /etc/profile    # add the following so that the zsh configuration is loaded whenever this file is sourced
if [[ "$SHELL" =~ "zsh" ]] && [ -f /etc/zsh/zshrc ]; then 
    . /etc/zsh/zshrc
fi

# edit /etc/zsh/zshrc
vim /etc/zsh/zshrc    # add the following to load the shared zsh configuration automatically
if [ -f /etc/zsh/zsh.zshrc ]; then 
    . /etc/zsh/zsh.zshrc
fi

# edit /etc/zsh/zshenv
vim /etc/zsh/zshenv    # add the following so that the first zsh session does not start the zsh setup wizard
if [[ ! -f ~/.zshrc ]]; then 
    touch ~/.zshrc
fi
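
The three vim edits above are easy to apply twice by accident when the steps are repeated. A minimal sketch of a non-interactive, repeat-safe append follows. append_once is a hypothetical helper, not something the original setup uses.

```shell
# Hypothetical helper: append the text read from stdin to file $1, unless a
# line containing the marker string $2 is already present in that file.
append_once() {
    file="$1"
    marker="$2"
    if grep -qF -- "$marker" "$file" 2>/dev/null; then
        return 0    # already there, leave the file untouched
    fi
    cat >> "$file"
}
```

For example, with the zshenv snippet saved in a file called snippet.txt (a made-up name): append_once /etc/zsh/zshenv 'touch ~/.zshrc' < snippet.txt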

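Configure ssh so that the container can be reached over ssh later

# install ssh, supervisor and a few other tools
apt install -y net-tools sudo openssh-client openssh-server supervisor

# edit the supervisor configuration to add an sshd service
vim /etc/supervisor/supervisord.conf    # change it as follows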
[supervisord]
nodaemon=true
[program:sshd]
command=/usr/sbin/sshd -D
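
The sshd program section could also live in its own drop-in file instead of the main configuration. The sketch below assumes the stock Debian supervisord.conf, which includes /etc/supervisor/conf.d/*.conf; check the [include] section of your copy. The helper name write_sshd_program is hypothetical, and nodaemon=true still has to be set in the [supervisord] section of the main file.

```shell
# Hypothetical helper: write the sshd program section as a drop-in file in
# the directory given as $1 (defaults to /etc/supervisor/conf.d).
write_sshd_program() {
    dir="${1:-/etc/supervisor/conf.d}"
    mkdir -p "$dir"
    cat > "$dir/sshd.conf" <<'EOF'
[program:sshd]
command=/usr/sbin/sshd -D
EOF
}
```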

# create /run/sshd
mkdir -p /run/sshd

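# Edit sudoers to give the users group passwordless sudo. I do this because I hand this image to colleagues, who create their own containers from it.
vim /etc/sudoers    # add the following line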
%users         ALL=(ALL)       NOPASSWD: ALL
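
Instead of editing /etc/sudoers directly, the same rule could go into a separate file under /etc/sudoers.d. This sketch assumes the Debian default sudoers, which includes that directory. The helper name and the file name users-nopasswd are made up for illustration.

```shell
# Hypothetical helper: write the passwordless-sudo rule for the users group
# to the file given as $1 (defaults to /etc/sudoers.d/users-nopasswd).
write_users_sudoers() {
    target="${1:-/etc/sudoers.d/users-nopasswd}"
    # %% prints a literal % in printf.
    printf '%%users ALL=(ALL) NOPASSWD: ALL\n' > "$target"
    # sudoers files are conventionally read-only for owner and group.
    chmod 0440 "$target"
}
```

After writing the file, running visudo -cf on it checks the syntax before anyone relies on it.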

Save the image

# exit the container, then commit it as an image
docker commit py-3.8.3 python:3.8.3-ssh

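2 Configuring CUDA support

According to the NVIDIA Docker documentation, once the host driver is set up and nvidia-container-toolkit is installed, adding --gpus all when starting a container is enough to give it access to the GPUs. With the image above, however, TensorFlow kept failing with failed call to cuInit: CUDA_ERROR_UNKNOWN. An UNKNOWN error is hard to track down, but after some digging (not covered here) I found a fix.

The fix is to add a few environment variables on top of the image above with a Dockerfile. I copied them from the Dockerfile of NVIDIA's official cuda-base image and am not yet sure exactly what each ENV line changes. According to the NVIDIA container runtime documentation, the runtime reads them when the container starts: NVIDIA_VISIBLE_DEVICES selects which GPUs are exposed, NVIDIA_DRIVER_CAPABILITIES selects which driver components are mounted into the container (compute is the one CUDA needs), and NVIDIA_REQUIRE_CUDA states the CUDA and driver versions the image expects.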

Dockerfile

FROM python:3.8.3-ssh
# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=10.2 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 brand=tesla,driver>=418,driver<419"
CMD [ "/usr/bin/supervisord" ]

Build the image from the Dockerfile

# run this in the directory that contains the Dockerfile
docker build -t python:3.8.3-cuda .
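
For a quick experiment without rebuilding, the NVIDIA_* variables from the Dockerfile can also be supplied when the container is started. The sketch below only prints the flags, so it can be tried without Docker. gpu_run_flags is a hypothetical helper, and the commented docker command assumes a GPU host with nvidia-container-toolkit installed.

```shell
# Hypothetical helper: print the GPU-related docker run flags, one per line.
gpu_run_flags() {
    printf '%s\n' \
        --gpus all \
        -e NVIDIA_VISIBLE_DEVICES=all \
        -e NVIDIA_DRIVER_CAPABILITIES=compute,utility
}

# Example (relies on sh/bash word splitting; needs Docker and a GPU host):
#   docker run --rm $(gpu_run_flags) python:3.8.3-ssh nvidia-smi
```

If the runtime is set up correctly, the nvidia-smi call in the comment should list the GPUs.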

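3 Installing CUDA

# Start a container. The image's default entry point is supervisor, which starts sshd, so from here on we log in over ssh (and test ssh at the same time).
# Three files from /etc are mounted so that the container has the same users and passwords as the host, which lets host accounts ssh into the container. Remember to add the user to the users group so that sudo works inside the container (the users group was given passwordless sudo above; this has no effect outside the container).
# The host's 下載 (Downloads) directory is mounted as software, because that is where my CUDA installers are.
# Assume the host account is user and that it will be used to ssh into the container; a home directory is mounted for it.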
mkdir -p /home/user/user
docker run -d \
    -p 20022:22 \
    -h cuda10 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -v /etc/shadow:/etc/shadow:ro \
    -v /home/user/下載:/home/user/software \
    -v /home/user/user:/home/user \
    --name=cuda10 \
    --gpus all \
    python:3.8.3-cuda 

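# log in to the container as user
ssh -p 20022 localhost    # the login name defaults to the current account, i.e. user

Install CUDA. All of these files can be downloaded from NVIDIA's website.

# follow the CUDA installer prompts, but do not install the driver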
sudo sh ~/software/cuda_10.1.243_418.87.00_linux.run

# cudnn
cd ~/software
tar -zxf cudnn-10.1-linux-x64-v7.6.5.32.tgz
sudo cp -r ~/software/cuda/include/* /usr/local/cuda-10.1/include/
sudo cp -r ~/software/cuda/lib64/* /usr/local/cuda-10.1/lib64/
rm -rf ~/software/cuda
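
To confirm that the cuDNN headers landed where expected, the version macros can be read back from cudnn.h. This is a sketch with a hypothetical helper name. It assumes the cuDNN 7.x layout, where the macros sit in cudnn.h; cuDNN 8 and later keep them in cudnn_version.h.

```shell
# Hypothetical helper: print major.minor.patch from the cuDNN header given
# as $1 (defaults to the CUDA 10.1 include directory used in this post).
cudnn_version() {
    header="${1:-/usr/local/cuda-10.1/include/cudnn.h}"
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$header"
}
```

With the archive used above, cudnn_version should report the 7.6.5 release.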

# TensorRT
tar -zxf TensorRT-6.0.1.5.Ubuntu-18.04.x86_64-gnu.cuda-10.1.cudnn7.6.tar.gz
sudo cp -r ~/software/TensorRT-6.0.1.5 /usr/local/
sudo ln -sf /usr/local/TensorRT-6.0.1.5 /usr/local/TensorRT
# install tensorrt
cd /usr/local/TensorRT/python
sudo pip install ./tensorrt-6.0.1.5-cp37-none-linux_x86_64.whl    # this fails: the package ships no wheel for Python 3.8
# install uff
cd ../uff/
sudo pip install uff-0.6.5-py2.py3-none-any.whl
which convert-to-uff
# install graphsurgeon
cd ../graphsurgeon
sudo pip install graphsurgeon-0.4.1-py2.py3-none-any.whl
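
The tensorrt wheel fails to install because its file name carries the tag cp37, while this image runs Python 3.8. The uff and graphsurgeon wheels are tagged py2.py3 and install on any Python 3. A small sketch for reading the tag out of a wheel file name before trying to install it (wheel_python_tag is a hypothetical helper):

```shell
# Hypothetical helper: print the Python tag of a wheel file name, i.e. the
# third dash-separated field from the end (name-version-PYTAG-abi-platform).
wheel_python_tag() {
    basename "$1" .whl | awk -F- '{ print $(NF - 2) }'
}
```

The result can be compared with the interpreter's own tag, which python -c 'import sys; print("cp%d%d" % sys.version_info[:2])' prints.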

# add environment variables
sudo vim /etc/profile       # add the three lines below
sudo vim /etc/zsh/zshrc     # add the same three lines here
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:/usr/local/TensorRT/lib:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:$PATH
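
One detail in the LD_LIBRARY_PATH line: when the variable is empty beforehand, the result ends with a colon, and the dynamic loader treats an empty entry as the current directory. A sketch of a helper that only adds the separator when there is something to separate follows. prepend_path is a hypothetical name.

```shell
# Hypothetical helper: print $1 followed by ":$2", omitting the colon when
# $2 is empty, so the result never ends in a stray separator.
prepend_path() {
    new="$1"
    current="$2"
    printf '%s%s\n' "$new" "${current:+:$current}"
}

# Example for the profile files:
#   export LD_LIBRARY_PATH="$(prepend_path "${CUDA_HOME}/lib64:/usr/local/TensorRT/lib" "$LD_LIBRARY_PATH")"
```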

# clear the pip cache
rm -rf ~/.cache/pip/http/*

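Exit the container and save an image that includes CUDA 10.1

# exit and stop the container
docker stop cuda10    # the container that was started with the name cuda10
# commit the container as an image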
docker commit cuda10 python:3.8.3-cuda10.1

4 Installing TensorFlow or PyTorch

docker run -d \
    -p 20022:22 \
    -h tf2 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -v /etc/shadow:/etc/shadow:ro \
    -v /home/user/user:/home/user \
    --name=tf2 \
    --gpus all \
    python:3.8.3-cuda10.1

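# log in to the container as user
ssh -p 20022 localhost    # the login name defaults to the current account, i.e. user

# switch pip to the Tsinghua index
mkdir -p ~/.pip
printf '[global]\nindex-url = https://pypi.tuna.tsinghua.edu.cn/simple\n' > ~/.pip/pip.conf    # printf expands \n in every shell; plain echo does not in bash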

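# install TensorFlow
# note: the TensorFlow release has to match the installed CUDA 10.1 and cuDNN 7.6 (TensorFlow 2.2 and 2.3 are built against them); a newer release from pip may expect a different CUDA version
pip install tensorflow    # this installs into the user's home directory only, because we are logged in with a regular account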

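# check that TensorFlow can see the GPU
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"    # should print True (deprecated in TensorFlow 2.x in favour of the call below)
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"    # should print a list of the GPUs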

# install PyTorch
pip install torch

# check that PyTorch can see the GPU
python -c "import torch; print(torch.cuda.is_available())"     # should print True

# both checks passed

With that, the environment is ready.

5 Appendix: content of zsh.zshrc

# Enable Powerlevel10k instant prompt. Should stay close to the top of ~/.zshrc.
# Initialization code that may require console input (password prompts, [y/n]
# confirmations, etc.) must go above this block; everything else may go below.
if [[ -r "${XDG_CACHE_HOME:-$HOME/.cache}/p10k-instant-prompt-${(%):-%n}.zsh" ]]; then
  source "${XDG_CACHE_HOME:-$HOME/.cache}/p10k-instant-prompt-${(%):-%n}.zsh"
fi

# If you come from bash you might have to change your $PATH.
# export PATH=$HOME/bin:/usr/local/bin:$PATH

# Path to your oh-my-zsh installation.
export ZSH="/usr/share/zsh/oh-my-zsh"

# Set name of the theme to load --- if set to "random", it will
# load a random theme each time oh-my-zsh is loaded, in which case,
# to know which specific one was loaded, run: echo $RANDOM_THEME
# See https://github.com/ohmyzsh/ohmyzsh/wiki/Themes
#ZSH_THEME="robbyrussell"
ZSH_THEME="powerlevel10k/powerlevel10k"

# Set list of themes to pick from when loading at random
# Setting this variable when ZSH_THEME=random will cause zsh to load
# a theme from this variable instead of looking in $ZSH/themes/
# If set to an empty array, this variable will have no effect.
# ZSH_THEME_RANDOM_CANDIDATES=( "robbyrussell" "agnoster" )

# Uncomment the following line to use case-sensitive completion.
# CASE_SENSITIVE="true"

# Uncomment the following line to use hyphen-insensitive completion.
# Case-sensitive completion must be off. _ and - will be interchangeable.
# HYPHEN_INSENSITIVE="true"

# Uncomment the following line to disable bi-weekly auto-update checks.
# DISABLE_AUTO_UPDATE="true"

# Uncomment the following line to automatically update without prompting.
# DISABLE_UPDATE_PROMPT="true"

# Uncomment the following line to change how often to auto-update (in days).
# export UPDATE_ZSH_DAYS=13

# Uncomment the following line if pasting URLs and other text is messed up.
# DISABLE_MAGIC_FUNCTIONS=true

# Uncomment the following line to disable colors in ls.
# DISABLE_LS_COLORS="true"

# Uncomment the following line to disable auto-setting terminal title.
# DISABLE_AUTO_TITLE="true"

# Uncomment the following line to enable command auto-correction.
# ENABLE_CORRECTION="true"

# Uncomment the following line to display red dots whilst waiting for completion.
# COMPLETION_WAITING_DOTS="true"

# Uncomment the following line if you want to disable marking untracked files
# under VCS as dirty. This makes repository status check for large repositories
# much, much faster.
# DISABLE_UNTRACKED_FILES_DIRTY="true"

# Uncomment the following line if you want to change the command execution time
# stamp shown in the history command output.
# You can set one of the optional three formats:
# "mm/dd/yyyy"|"dd.mm.yyyy"|"yyyy-mm-dd"
# or set a custom format using the strftime function format specifications,
# see 'man strftime' for details.
# HIST_STAMPS="mm/dd/yyyy"

# Would you like to use another custom folder than $ZSH/custom?
# ZSH_CUSTOM=/path/to/new-custom-folder

# Which plugins would you like to load?
# Standard plugins can be found in $ZSH/plugins/
# Custom plugins may be added to $ZSH_CUSTOM/plugins/
# Example format: plugins=(rails git textmate ruby lighthouse)
# Add wisely, as too many plugins slow down shell startup.
plugins=(git)

source $ZSH/oh-my-zsh.sh

# User configuration

# export MANPATH="/usr/local/man:$MANPATH"

# You may need to manually set your language environment
# export LANG=en_US.UTF-8

# Preferred editor for local and remote sessions
# if [[ -n $SSH_CONNECTION ]]; then
#   export EDITOR='vim'
# else
#   export EDITOR='mvim'
# fi

# Compilation flags
# export ARCHFLAGS="-arch x86_64"

# Set personal aliases, overriding those provided by oh-my-zsh libs,
# plugins, and themes. Aliases can be placed here, though oh-my-zsh
# users are encouraged to define aliases within the ZSH_CUSTOM folder.
# For a full list of active aliases, run `alias`.
#
# Example aliases
# alias zshconfig="mate ~/.zshrc"
# alias ohmyzsh="mate ~/.oh-my-zsh"

# key bindings
bindkey "\e[1~" beginning-of-line
bindkey "\e[4~" end-of-line
bindkey "\e[5~" beginning-of-history
bindkey "\e[6~" end-of-history
# for rxvt
bindkey "\e[8~" end-of-line
bindkey "\e[7~" beginning-of-line
# for non RH/Debian xterm, can't hurt for RH/DEbian xterm
bindkey "\eOH" beginning-of-line
bindkey "\eOF" end-of-line
# for freebsd console
bindkey "\e[H" beginning-of-line
bindkey "\e[F" end-of-line
# completion in the middle of a line
# bindkey '^i' expand-or-complete-prefix
# Fix numeric keypad  
# # 0 . Enter  
bindkey -s "^[Op" "0"
bindkey -s "^[On" "."
bindkey -s "^[OM" "^M"
# # 1 2 3  
bindkey -s "^[Oq" "1"
bindkey -s "^[Or" "2"
bindkey -s "^[Os" "3"
# # 4 5 6  
bindkey -s "^[Ot" "4"
bindkey -s "^[Ou" "5"
bindkey -s "^[Ov" "6"
# # 7 8 9  
bindkey -s "^[Ow" "7"
bindkey -s "^[Ox" "8"
bindkey -s "^[Oy" "9"
# # + - * /  
bindkey -s "^[Ol" "+"
bindkey -s "^[Om" "-"
bindkey -s "^[Oj" "*"
bindkey -s "^[Oo" "/"

# To customize prompt, run `p10k configure` or edit ~/.p10k.zsh.
[[ ! -f ~/.p10k.zsh ]] || source ~/.p10k.zsh

. /usr/share/zsh/oh-my-zsh/plugins/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh
. /usr/share/zsh/oh-my-zsh/plugins/zsh-autosuggestions/zsh-autosuggestions.zsh

POWERLEVEL9K_DISABLE_GITSTATUS=true
typeset -g POWERLEVEL9K_INSTANT_PROMPT=quiet

echo -e "\nNote: the current shell is zsh with the powerlevel10k theme. To change how the theme looks, run: p10k configure\n"