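I. Install the NVIDIA GeForce 930M graphics driver (many reference articles are available online)
1. Download the driver
2. Disable the bundled nouveau driver
The graphics driver that ships with Ubuntu is nouveau, an open-source driver for NVIDIA hardware developed by a third party. It has to be blocked before NVIDIA's official driver can be installed.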
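Add the driver to the blacklist in blacklist.conf. The file is owned by root and is not writable by ordinary users, so I changed its permissions first. (Opening it with sudo, as below, is already enough to edit it; chmod 666 makes the file world-writable, so consider restoring it with chmod 644 afterwards.)
Check the permissions:
$ sudo ls -lh /etc/modprobe.d/blacklist.conf
Change the permissions:
$ sudo chmod 666 /etc/modprobe.d/blacklist.conf
Open the file in gedit:
$ sudo gedit /etc/modprobe.d/blacklist.conf
Append the following lines to the end of the file: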
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist rivatv
blacklist nvidiafb
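As a quick sanity check, the blacklist entries above can be verified from a script. The following is a minimal Python sketch and not part of the original steps; the helper names blacklisted_modules and missing_entries are hypothetical, and the parser only understands simple "blacklist <module>" lines.

```python
# Modules that this guide adds to /etc/modprobe.d/blacklist.conf.
REQUIRED_MODULES = ("vga16fb", "nouveau", "rivafb", "rivatv", "nvidiafb")


def blacklisted_modules(conf_text):
    """Collect module names from lines of the form 'blacklist <module>'."""
    modules = set()
    for raw_line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            modules.add(parts[1])
    return modules


def missing_entries(conf_text, required=REQUIRED_MODULES):
    """Return the required modules that are not blacklisted yet, in order."""
    found = blacklisted_modules(conf_text)
    return [name for name in required if name not in found]


if __name__ == "__main__":
    sample = "blacklist vga16fb\nblacklist nouveau\n# blacklist rivafb\n"
    print(missing_entries(sample))
```

To check the real file, read /etc/modprobe.d/blacklist.conf into a string and pass it to missing_entries; an empty list means all five entries are present.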
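3. Start the installation
First press Ctrl + Alt + F1 to switch to a text console, then shut down the running graphical environment:
$ sudo init 3
$ sudo rm -r /tmp/.X*
$ sudo service lightdm stop
Then install the driver:
$ sudo sh NVIDIA-Linux-x86_64-xxx.run
Finally, restart the graphical environment:
$ sudo service lightdm start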
4. Check the driver version
The following command confirms that the driver was installed correctly:
$ cat /proc/driver/nvidia/version
You can also run nvidia-smi:
wu@wu-X555LF:/$ nvidia-smi
Sun Apr 21 15:26:48 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 930M        Off  | 00000000:04:00.0 Off |                  N/A |
| N/A   44C    P8    N/A /  N/A |    131MiB /  2002MiB |     20%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1022      G   /usr/lib/xorg/Xorg                            93MiB |
|    0      2047      G   compiz                                        36MiB |
+-----------------------------------------------------------------------------+
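The driver version in the nvidia-smi header is the number that matters for the CUDA compatibility checks later on, so it can be handy to extract it in a script. The sketch below is an illustration only: the function names are hypothetical, and it assumes the header keeps the "Driver Version: X.Y" wording shown above.

```python
import re
import subprocess


def parse_driver_version(smi_text):
    """Return the driver version found in nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_text)
    return match.group(1) if match else None


def installed_driver_version():
    """Run nvidia-smi and parse its output; None if the tool is unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_driver_version(result.stdout)


if __name__ == "__main__":
    header = "| NVIDIA-SMI 384.130 Driver Version: 384.130 |"
    print(parse_driver_version(header))  # prints 384.130
```

installed_driver_version() returns None on a machine without a working driver, which is itself a useful signal that the installation did not succeed.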
On Ubuntu you can also run sudo nvidia-settings in a terminal to open the graphical settings panel, or launch it from the Dash search box.
II. Install CUDA
wu@wu-X555LF:/$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
wu@wu-X555LF:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
wu@wu-X555LF:~/NVIDIA_CUDA-10.0_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
At first I installed the newest CUDA release without checking its requirements, and the ./deviceQuery test failed:
CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
In other words, the installed NVIDIA driver is too old for the CUDA runtime being used.
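1. CUDA driver version: the version of the NVIDIA GPU driver.
Check with: nvidia-smi
My GPU driver version is 384.130.
2. CUDA runtime version: the version of the CUDA toolkit the program was built against; in a Python environment this is the version of the installed cudatoolkit and cudnn packages.
Check with: pip list (cudatoolkit and cudnn installed through conda show up in conda list instead)
The cudatoolkit and cudnn packages installed for Python were version 10.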
3. Correspondence between NVIDIA driver versions and CUDA runtime versions (each runtime needs at least the driver version listed)
Runtime version    Driver version
CUDA 9.1           387.xx
CUDA 9.0           384.xx
CUDA 8.0           375.xx (GA2)
CUDA 8.0           367.4x
CUDA 7.5           352.xx
CUDA 7.0           346.xx
CUDA 6.5           340.xx
CUDA 6.0           331.xx
CUDA 5.5           319.xx
CUDA 5.0           304.xx
CUDA 4.2           295.41
CUDA 4.1           285.05.33
CUDA 4.0           270.41.19
CUDA 3.2           260.19.26
CUDA 3.1           256.40
CUDA 3.0           195.36.15
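The table above can be turned into a small lookup to check a driver against a CUDA runtime before installing anything. This is a rough sketch under stated assumptions: only the driver's major branch number is compared, only part of the table is copied, CUDA 8.0 uses the lower (367) entry, and the names MIN_DRIVER_BRANCH and driver_supports are made up for illustration.

```python
# Minimum driver branch per CUDA runtime, copied from the table above.
# Assumption: comparing only the major branch (e.g. 384) is precise enough.
MIN_DRIVER_BRANCH = {
    "9.1": 387,
    "9.0": 384,
    "8.0": 367,  # the GA2 release of 8.0 is listed with 375 instead
    "7.5": 352,
    "7.0": 346,
    "6.5": 340,
    "6.0": 331,
    "5.5": 319,
    "5.0": 304,
}


def driver_supports(cuda_version, driver_version):
    """True if the driver's major branch meets the minimum for cuda_version."""
    if cuda_version not in MIN_DRIVER_BRANCH:
        raise ValueError("CUDA version not in the table: " + cuda_version)
    driver_branch = int(driver_version.split(".")[0])
    return driver_branch >= MIN_DRIVER_BRANCH[cuda_version]


if __name__ == "__main__":
    # Driver 384.130 is the one installed in this guide.
    for cuda in ("9.1", "9.0", "8.0"):
        print(cuda, driver_supports(cuda, "384.130"))
```

With driver 384.130 the check passes for CUDA 9.0 and older but fails for CUDA 9.1, which matches the table.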
1. After installing CUDA on Linux, running deviceQuery produced the following error; searching online turned up several suggested fixes.
deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
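2. Most of the fixes suggested online were:
Reinstall or update the driver (I tried several driver versions; the problem remained)
On a dual-system setup, set the NVIDIA card as the default (no effect)
3. My solution
I eventually noticed that my CUDA toolkit installer and my NVIDIA driver did not belong to the same driver version.
The CUDA toolkit installer was cuda_10.0.130_410.48_linux.run,
and the graphics driver was NVIDIA-Linux-x86_64-384.130.run.
One is 410 and the other is 384, so they clearly do not match.
I then made the driver and the CUDA toolkit agree by using the 384 series for both, and deviceQuery ran successfully.
The CUDA toolkit installer was changed to cuda_9.0.176_384.81_linux.run
Download: https://developer.nvidia.com/cuda-downloads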
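The comparison described here (410 versus 384) can be automated by reading the driver version out of the installer's file name. The sketch below assumes the runfile naming pattern cuda_<toolkit>_<driver>_linux.run seen in the two file names above, and treats the embedded driver version as the minimum the toolkit expects; both are assumptions drawn from this guide, and the function names are hypothetical.

```python
import re

RUNFILE_PATTERN = re.compile(r"cuda_(\d+(?:\.\d+)*)_(\d+(?:\.\d+)*)_linux\.run")


def parse_runfile(name):
    """Split a CUDA runfile name into (toolkit_version, driver_version)."""
    match = RUNFILE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError("unexpected runfile name: " + name)
    return match.group(1), match.group(2)


def version_tuple(version):
    """Turn '384.130' into (384, 130) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_is_new_enough(runfile_name, installed_driver):
    """True if the installed driver is at least the version in the file name."""
    _, expected_driver = parse_runfile(runfile_name)
    return version_tuple(installed_driver) >= version_tuple(expected_driver)


if __name__ == "__main__":
    for runfile in ("cuda_10.0.130_410.48_linux.run",
                    "cuda_9.0.176_384.81_linux.run"):
        print(runfile, driver_is_new_enough(runfile, "384.130"))
```

Comparing tuples rather than strings matters here: as text, "384.130" would sort before "384.81", while numerically 130 is newer than 81.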
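After a successful installation:
Add the install location to bashrc
The install location should be added to the bashrc file so that future shell sessions can find the CUDA files. Open the bashrc file with the following command (sudo is not needed for a file in your own home directory):
vim ~/.bashrc
Once the file is open, add the following two lines at the end: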
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
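After opening a new shell (or running source ~/.bashrc), it is worth confirming that the two export lines above actually took effect. The following is a small illustrative check, not part of the original steps; missing_lib_dirs and cuda_home_ok are hypothetical helper names, and the expected paths are simply the ones from the export lines.

```python
import os

# Directories the LD_LIBRARY_PATH export above is expected to add.
REQUIRED_LIB_DIRS = (
    "/usr/local/cuda/lib64",
    "/usr/local/cuda/extras/CUPTI/lib64",
)


def missing_lib_dirs(env=None):
    """Return required library directories absent from LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    return [d for d in REQUIRED_LIB_DIRS if d not in entries]


def cuda_home_ok(env=None):
    """True if CUDA_HOME is set to /usr/local/cuda."""
    env = os.environ if env is None else env
    return env.get("CUDA_HOME") == "/usr/local/cuda"


if __name__ == "__main__":
    print("missing library dirs:", missing_lib_dirs())
    print("CUDA_HOME set as expected:", cuda_home_ok())
```

Both functions accept a plain dictionary in place of os.environ, which makes them easy to try out without touching the real environment.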
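Try compiling the samples shipped with CUDA
1) Open a terminal and run: $ cd /home/xxx/NVIDIA_CUDA-9.0_Samples  where xxx is your own user name. This changes into the NVIDIA_CUDA-9.0_Samples directory.
Then run: $ make
The build starts automatically and takes roughly ten to twenty minutes, so please be patient. If an error occurs, the build stops immediately and reports it.
The first run may fail with an error saying that gcc is missing from the system.
The fix is to install gcc by running: $ sudo apt-get install gcc  After gcc is installed, make works normally.
If the build succeeds, the last line printed is Finished building CUDA samples.
2) Run the compiled binaries.
By default the compiled binaries are placed under NVIDIA_CUDA-9.0_Samples/bin.
In the same terminal, run: $ cd /home/xxx/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release  where xxx is your own user name
Then run: $ ./deviceQuery
Result = PASS means CUDA is installed and configured correctly; Result = FAIL means it is not.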
$ cd NVIDIA_CUDA-9.0_Samples/
$ make
$ cd bin/x86_64/linux/release/
$ ./deviceQuery
wu@wu-X555LF:~/NVIDIA_CUDA-9.0_Samples$ cd bin/x86_64/linux/release/
wu@wu-X555LF:~/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 930M"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2003 MBytes (2100232192 bytes)
( 3) Multiprocessors, (128) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 941 MHz (0.94 GHz)
Memory Clock rate: 900 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
3) Finally, check the connection between the system and the CUDA-capable device.
Run: $ ./bandwidthTest
Output similar to the following indicates success:
wu@wu-X555LF:~/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce 930M
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1410.3
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1558.4
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12727.5
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
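III. Install cuDNN
1. cuDNN, in full the NVIDIA CUDA® Deep Neural Network library, is a library dedicated to accelerating deep learning. It provides the accelerated routines used by frameworks such as Caffe2, MATLAB, Microsoft Cognitive Toolkit, TensorFlow, Theano, and PyTorch. This guide installs cuDNN 7.5.0; the installation steps follow.
Download link: https://developer.nvidia.com/rdp/cudnn-download (registration is required before the page opens). Here we choose Download cuDNN v7.5.0 (Feb 21, 2019), for CUDA 9.0
I first installed from the following three files, but ran into a problem afterwards: the libcudnn*.so* files were not copied to /usr/local/cuda/lib64, and cudnn.h was not copied to /usr/local/cuda/include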
libcudnn7-doc_7.5.0.56-1+cuda9.0_amd64.deb
libcudnn7-dev_7.5.0.56-1+cuda9.0_amd64.deb
libcudnn7_7.5.0.56-1+cuda9.0_amd64.deb
I recommend installing from cuDNN Library for Linux instead (cuDNN v7.5.0 Library for Linux, file cudnn-9.0-linux-x64-v7.5.0.56.tgz).
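2. Install the cuDNN library
Downloading cuDNN takes a little effort: Nvidia does not offer a direct download (although the library is free). Obtain it as follows.
Go to Nvidia's registration page and create an account. The first page asks for your personal details, and the second asks a few survey questions. It is fine if you do not know all the answers; just pick any option.
After that, Nvidia sends an activation link to your email address. Once the account is activated, go to the cuDNN download page:
https://developer.nvidia.com/rdp/cudnn-download
After logging in you need to fill in another, similar survey. Tick any of the checkboxes, click "proceed to Download" at the bottom of the survey, and accept the terms of use on the next page.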
Run the install commands (the runtime package first, because the dev package depends on it):
sudo dpkg -i libcudnn7_7.5.0.56-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.5.0.56-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.5.0.56-1+cuda9.0_amd64.deb
wu@wu-X555LF:~/Downloads$ ls
cuda_9.0.176_384.81_linux.run
libcudnn7_7.5.0.56-1+cuda9.0_amd64.deb
libcudnn7-dev_7.5.0.56-1+cuda9.0_amd64.deb
libcudnn7-doc_7.5.0.56-1+cuda9.0_amd64.deb
wu@wu-X555LF:~/Downloads$ sudo dpkg -i libcudnn7_7.5.0.56-1+cuda9.0_amd64.deb
[sudo] password for wu:
Selecting previously unselected package libcudnn7.
(Reading database ... 236438 files and directories currently installed.)
Preparing to unpack libcudnn7_7.5.0.56-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7 (7.5.0.56-1+cuda9.0) ...
Setting up libcudnn7 (7.5.0.56-1+cuda9.0) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
wu@wu-X555LF:~/Downloads$ sudo dpkg -i libcudnn7-dev_7.5.0.56-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-dev.
(Reading database ... 236444 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.5.0.56-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-dev (7.5.0.56-1+cuda9.0) ...
Setting up libcudnn7-dev (7.5.0.56-1+cuda9.0) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
wu@wu-X555LF:~/Downloads$ sudo dpkg -i libcudnn7-doc_7.5.0.56-1+cuda9.0_amd64.deb
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 236450 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.5.0.56-1+cuda9.0_amd64.deb ...
Unpacking libcudnn7-doc (7.5.0.56-1+cuda9.0) ...
Setting up libcudnn7-doc (7.5.0.56-1+cuda9.0) ...
Check whether cuDNN is installed correctly:
cd /usr/src/cudnn_samples_v7/mnistCUDNN
sudo make clean
sudo make
./mnistCUDNN (this failed; the build errors are shown below)
wu@wu-X555LF:~/Downloads$ cd /usr/src/cudnn_samples_v7/mnistCUDNN
wu@wu-X555LF:/usr/src/cudnn_samples_v7/mnistCUDNN$ sudo make clean
rm -rf *o
rm -rf mnistCUDNN
wu@wu-X555LF:/usr/src/cudnn_samples_v7/mnistCUDNN$ sudo make
Linking agains cublasLt = false
CUDA VERSION: 9000
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 30 35 50 53 60 61 62 70
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
In file included from /usr/local/cuda/include/channel_descriptor.h:62:0,
from /usr/local/cuda/include/cuda_runtime.h:90,
from /usr/include/cudnn.h:64,
from mnistCUDNN.cpp:30:
/usr/local/cuda/include/cuda_runtime_api.h:1683:101: error: use of enum ‘cudaDeviceP2PAttr’ without previous declaration
__ cudaError_t CUDARTAPI cudaDeviceGetP2PAttribute(int *value, enum cudaDeviceP
^
/usr/local/cuda/include/cuda_runtime_api.h:2930:102: error: use of enum ‘cudaFuncAttribute’ without previous declaration
_ cudaError_t CUDARTAPI cudaFuncSetAttribute(const void *func, enum cudaFuncAtt
^
In file included from /usr/local/cuda/include/channel_descriptor.h:62:0,
from /usr/local/cuda/include/cuda_runtime.h:90,
from /usr/include/cudnn.h:64,
from mnistCUDNN.cpp:30:
/usr/local/cuda/include/cuda_runtime_api.h:5770:92: error: use of enum ‘cudaMemoryAdvise’ without previous declaration
or_t CUDARTAPI cudaMemAdvise(const void *devPtr, size_t count, enum cudaMemoryA
^
/usr/local/cuda/include/cuda_runtime_api.h:5827:98: error: use of enum ‘cudaMemRangeAttribute’ without previous declaration
UDARTAPI cudaMemRangeGetAttribute(void *data, size_t dataSize, enum cudaMemRang
^
/usr/local/cuda/include/cuda_runtime_api.h:5864:102: error: use of enum ‘cudaMemRangeAttribute’ without previous declaration
TAPI cudaMemRangeGetAttributes(void **data, size_t *dataSizes, enum cudaMemRang
^
Makefile:226: recipe for target 'mnistCUDNN.o' failed
make: *** [mnistCUDNN.o] Error 1
If any error occurs when running cuDNN, check that the libcudnn*.so* files are present in /usr/local/cuda/lib64 and that cudnn.h is present in /usr/local/cuda/include.
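The file check just described can be scripted. The sketch below is only an illustration with a hypothetical function name; it assumes the default /usr/local/cuda prefix used throughout this guide and reports whether cudnn.h and any libcudnn*.so* files are in place.

```python
from pathlib import Path


def find_cudnn_files(cuda_root="/usr/local/cuda"):
    """Report whether cudnn.h and the libcudnn shared libraries are present."""
    root = Path(cuda_root)
    header = root / "include" / "cudnn.h"
    lib_dir = root / "lib64"
    libraries = []
    if lib_dir.is_dir():
        # Same pattern as in the text: libcudnn*.so*
        libraries = sorted(p.name for p in lib_dir.glob("libcudnn*.so*"))
    return {"header_present": header.is_file(), "libraries": libraries}


if __name__ == "__main__":
    report = find_cudnn_files()
    print("cudnn.h present:", report["header_present"])
    print("libcudnn libraries:", report["libraries"])
```

If header_present is False or the library list is empty, the files were not copied into the CUDA directories, which is the situation I ran into after installing from the .deb packages.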
Download the cuDNN 7.5 archive, extract it, and copy the relevant files into the CUDA system paths:
tar -zxvf cudnn-9.0-linux-x64-v7.5.0.56.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp -d cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
Then retest; this time it succeeds:
cd /usr/src/cudnn_samples_v7/mnistCUDNN
sudo make clean
sudo make
./mnistCUDNN
wu@wu-X555LF:~/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release$ cd /usr/src/cudnn_samples_v7/mnistCUDNN
wu@wu-X555LF:/usr/src/cudnn_samples_v7/mnistCUDNN$ sudo make clean
[sudo] password for wu:
Sorry, try again.
[sudo] password for wu:
rm -rf *o
rm -rf mnistCUDNN
wu@wu-X555LF:/usr/src/cudnn_samples_v7/mnistCUDNN$ sudo make
Linking agains cublasLt = false
CUDA VERSION: 9000
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 30 35 50 53 60 61 62 70
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -IFreeImage/include -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
wu@wu-X555LF:/usr/src/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7500 , CUDNN_VERSION from cudnn.h : 7500 (7.5.0)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 3 Capabilities 5.0, SmClock 941.0 Mhz, MemSize (Mb) 2002, MemClock 900.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.047968 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.059104 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.069792 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.254816 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.582880 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.052800 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.065120 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065280 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.236288 time requiring 207360 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.588448 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
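IV. Install TensorFlow-GPU
Choose the version to download according to the official site:
CUDA 8.0 → cuDNN v5.1 / CUDA 8.0 → cuDNN v6.0 / CUDA 9.0 → cuDNN v7.0.5
In addition, TensorFlow 1.6/1.5 correspond to CUDA 9.0, and 1.4/1.3 correspond to CUDA 8.0.
Run the install command: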
pip install tensorflow-gpu==1.6.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
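The option -i https://pypi.tuna.tsinghua.edu.cn/simple downloads the tensorflow-gpu package from the Tsinghua mirror, which is much faster than downloading from servers outside China.
Running pip install tensorflow-gpu==1.13.1 instead produces the following error, because that version does not match the installed CUDA 9.0 (it looks for the CUDA 10.0 libraries):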
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory Failed to load the native TensorFlow runtime.
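This is a version mismatch. A quick search shows that the fixes others used are all much the same (for example these blog posts: https://blog.csdn.net/w5688414/article/details/79187499 and https://blog.csdn.net/twt520ly/article/details/79415787): either move to a newer CUDA + cuDNN, or downgrade TensorFlow. The CUDA and cuDNN correspondences listed above follow this blog post: https://blog.csdn.net/gsch_12/article/details/79368990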
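The version pairings mentioned in this section can be kept in a small lookup so that the right tensorflow-gpu release is chosen before running pip. This is a sketch, not an official compatibility table: the entries for 1.3 to 1.6 come from the text above, the 1.13 entry is inferred from the libcublas.so.10.0 error, and the names TF_TO_CUDA and required_cuda are hypothetical.

```python
# TensorFlow minor release -> CUDA version it expects, as described above.
TF_TO_CUDA = {
    "1.3": "8.0",
    "1.4": "8.0",
    "1.5": "9.0",
    "1.6": "9.0",
    "1.13": "10.0",  # inferred from the libcublas.so.10.0 import error
}


def required_cuda(tf_version):
    """Return the CUDA version for a TensorFlow release, or None if unknown."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TF_TO_CUDA.get(major_minor)


def matches_installed_cuda(tf_version, installed_cuda):
    """True if the TensorFlow release expects exactly the installed CUDA."""
    return required_cuda(tf_version) == installed_cuda


if __name__ == "__main__":
    for tf_version in ("1.6.1", "1.13.1"):
        print(tf_version, matches_installed_cuda(tf_version, "9.0"))
```

For the CUDA 9.0 installation in this guide, 1.6.1 matches and 1.13.1 does not, which is consistent with the import error shown above.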
After installing TensorFlow-GPU, test it with the following code:
tensorflow_gpu_test.py
import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
Test result:
wu@wu-X555LF:~$ python tensorflow_gpu_test.py
2019-04-21 18:18:40.775553: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-21 18:18:40.897440: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-21 18:18:40.897909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce 930M major: 5 minor: 0 memoryClockRate(GHz): 0.941
pciBusID: 0000:04:00.0
totalMemory: 1.96GiB freeMemory: 1.79GiB
2019-04-21 18:18:40.897936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2019-04-21 18:21:44.220470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1546 MB memory) -> physical GPU (device: 0, name: GeForce 930M, pci bus id: 0000:04:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce 930M, pci bus id: 0000:04:00.0, compute capability: 5.0
2019-04-21 18:21:44.231117: I tensorflow/core/common_runtime/direct_session.cc:297] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce 930M, pci bus id: 0000:04:00.0, compute capability: 5.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-04-21 18:21:44.231886: I tensorflow/core/common_runtime/placer.cc:875] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-04-21 18:21:44.231921: I tensorflow/core/common_runtime/placer.cc:875] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-04-21 18:21:44.231935: I tensorflow/core/common_runtime/placer.cc:875] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
[49. 64.]]
wu@wu-X555LF:~$