pycuda CompileError: nvcc compilation failed

Environment: Win10, Visual Studio 2019, pycuda 2019.02

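Goal: write a piece of CUDA C code and pass it to a constructor (SourceModule) that compiles it. First, create an array on the host and copy it to the GPU: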

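import pycuda.driver as cuda
import pycuda.autoinit  # initializes CUDA and creates a context on a usable GPU
from pycuda.compiler import SourceModule

import numpy as np

# 4x4 host array, converted to float32 because the kernel below takes float*
a = np.random.randn(4,4)
a = a.astype(np.float32)

# allocate device memory of the same size and copy the array to the GPU
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)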

Running a kernel function

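Next, we write code that multiplies every value of the array stored in the device memory a_gpu by 2. To do this, we write a piece of CUDA C code and pass it to a constructor, in this case pycuda.compiler.SourceModule: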

mod = SourceModule("""
 __global__ void doublify(float *a)
 {
 int idx = threadIdx.x + threadIdx.y*4;
 a[idx] *= 2;
 }
 """)
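The kernel body runs once per thread. With the launch used later (a single block of 4x4x1 threads), idx = threadIdx.x + threadIdx.y*4 maps each thread to exactly one element of the row-major 4x4 array: thread (x, y) handles row y, column x. The pure-Python sketch below needs no GPU; it emulates that mapping sequentially to make this concrete. doublify_emulated and to_rows are hypothetical helper names for illustration, and the hard-coded 4 assumes the 4x4 shape used in this post.

```python
# Sequential emulation of the doublify kernel's indexing (no GPU needed).
# Assumptions: one block of shape (4, 4, 1), as in the launch in this post,
# and a 4x4 array stored in row-major (C) order, NumPy's default layout.

BLOCK_X, BLOCK_Y = 4, 4  # blockDim.x and blockDim.y for block=(4,4,1)


def doublify_emulated(flat):
    """Run the kernel body once for every (threadIdx.x, threadIdx.y) pair."""
    out = list(flat)
    touched = []
    for ty in range(BLOCK_Y):        # threadIdx.y
        for tx in range(BLOCK_X):    # threadIdx.x
            idx = tx + ty * 4        # same formula as the CUDA C source
            out[idx] *= 2            # thread (tx, ty) updates row ty, column tx
            touched.append(idx)
    return out, touched


def to_rows(flat, ncols=4):
    """View a flat row-major buffer as a list of rows."""
    return [flat[i:i + ncols] for i in range(0, len(flat), ncols)]


if __name__ == "__main__":
    a = [float(i) for i in range(16)]  # stand-in for the 4x4 array
    doubled, touched = doublify_emulated(a)
    # every element is visited exactly once
    print(sorted(touched) == list(range(16)))
    print(to_rows(doubled))
```

On the GPU the 16 threads run concurrently rather than in this loop order, but since no two threads share an idx, the result is the same.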

pycuda CompileError: nvcc compilation failed

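If this step fails with CompileError: nvcc compilation of C:\Users\KANGNI~1\AppData\Local\Temp\tmpbregy28e\kernel.cu failed, the cause is probably the environment configuration: nvcc cannot find cl.exe, the MSVC host compiler it relies on. Try adding the following snippet to your code: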

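import os

# Directory containing cl.exe, the MSVC host compiler invoked by nvcc.
# This path is specific to the author's install (drive D:, VS 2019 Community,
# MSVC toolset 14.25.28610); change it to match your own installation.
_path = r"D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\bin\Hostx64\x64"

# os.system returns a non-zero status when cmd cannot find cl.exe
if os.system("cl.exe"):
    os.environ['PATH'] += ';' + _path
if os.system("cl.exe"):
    raise RuntimeError("cl.exe still not found, path probably incorrect")

mod = SourceModule("""
    __global__ void doublify(float *a)
    {
        int idx = threadIdx.x + threadIdx.y*4;
        a[idx] *= 2;
    }
""")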
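Hard-coding the toolset directory breaks whenever Visual Studio updates MSVC and the 14.25.28610 component changes. A minimal sketch, assuming the standard VC\Tools\MSVC\<version>\bin\Hostx64\x64 layout seen in the path above, searches a given Visual Studio directory for the newest toolset that actually contains cl.exe and appends it to PATH. find_msvc_bin and add_to_path are hypothetical helpers, not part of pycuda.

```python
import os
from pathlib import Path


def find_msvc_bin(vs_root, host="Hostx64", target="x64"):
    """Return the newest VC/Tools/MSVC/<version>/bin/<host>/<target> directory
    under vs_root that contains cl.exe, or None if no such directory exists."""
    msvc = Path(vs_root) / "VC" / "Tools" / "MSVC"
    if not msvc.is_dir():
        return None
    candidates = [d for d in msvc.glob(f"*/bin/{host}/{target}")
                  if (d / "cl.exe").is_file()]
    if not candidates:
        return None

    def version_key(d):
        # d.parts[-4] is the <version> component, e.g. "14.25.28610"
        return tuple(int(p) for p in d.parts[-4].split(".") if p.isdigit())

    return str(max(candidates, key=version_key))


def add_to_path(directory):
    """Append directory to PATH for this process only, if not already present."""
    current = os.environ.get("PATH", "")
    if directory not in current.split(os.pathsep):
        os.environ["PATH"] = current + os.pathsep + directory if current else directory


# Usage on Windows, before building the SourceModule (adjust the root to your install):
# bin_dir = find_msvc_bin(r"D:\Microsoft Visual Studio\2019\Community")
# if bin_dir is None:
#     raise RuntimeError("no MSVC toolset containing cl.exe found")
# add_to_path(bin_dir)
```

Like the original snippet, this only changes PATH for the running Python process, so nothing in the system environment is modified.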

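If this step raises no error, the code has been compiled successfully and loaded onto the GPU. We can then call mod.get_function to obtain a pycuda.driver.Function reference to the kernel and call it, passing the device array a_gpu as the argument and setting the block size to 4x4:
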
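func = mod.get_function("doublify")
# one block of 4x4x1 = 16 threads, one thread per array element
func(a_gpu, block=(4,4,1))

# copy the result back to the host and print it next to the original
a_doubled = np.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)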
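Because each element should simply be multiplied by 2, the output is easy to check on the host. A small stdlib-only sketch (is_doubled is a hypothetical helper) compares the two matrices entry by entry; it expects nested lists such as those returned by ndarray.tolist().

```python
import math


def is_doubled(original, result, rel_tol=1e-6):
    """Return True if every entry of result equals 2 * the matching entry of original.

    Both arguments are nested sequences of rows, e.g. ndarray.tolist() output.
    Multiplying a float32 by 2 is exact (barring overflow), so a tight tolerance works.
    """
    if len(original) != len(result):
        return False
    for row_a, row_b in zip(original, result):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if not math.isclose(2 * x, y, rel_tol=rel_tol, abs_tol=1e-12):
                return False
    return True
```

For example, is_doubled(a.tolist(), a_doubled.tolist()) should return True after a successful launch. With NumPy already imported, np.allclose(a_doubled, 2 * a) performs the same check in one call.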

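Result (first a_doubled, then the original a). Note that the two matrices captured below are identical; after a successful launch every entry of a_doubled should be exactly twice the corresponding entry of a, so this output does not show the doubling taking effect: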

[[ 1.4142118   1.48865     3.0958736  -0.64879215]
 [ 0.5829473   1.684329    0.21461935 -2.3383026 ]
 [ 1.2626396  -0.7566854   1.8427325  -0.52471924]
 [-0.80164     1.3886924  -3.5787368   0.72956717]]
[[ 1.4142118   1.48865     3.0958736  -0.64879215]
 [ 0.5829473   1.684329    0.21461935 -2.3383026 ]
 [ 1.2626396  -0.7566854   1.8427325  -0.52471924]
 [-0.80164     1.3886924  -3.5787368   0.72956717]]

ref: http://www.voidcn.com/article/p-zoelqetb-bvv.html
