Write a piece of CUDA C code and pass it to a constructor

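import pycuda.driver as cuda
import pycuda.autoinit  # initializes CUDA and creates a default context
from pycuda.compiler import SourceModule

import numpy as np

# Host data: a 4x4 matrix, converted to float32 to match `float` in the kernel
a = np.random.randn(4, 4)
a = a.astype(np.float32)

# Allocate device memory of the same size and copy the host array into it
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)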
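The astype(np.float32) conversion matters because the kernel parameter is float *, i.e. 4 bytes per element, while np.random.randn returns float64 (8 bytes per element). The NumPy-only sketch below (no GPU needed; the names a64, a32 and misread are just for illustration) shows how the dtype changes a.nbytes, and therefore the size of the buffer that mem_alloc reserves and memcpy_htod copies:

```python
import numpy as np

a64 = np.random.randn(4, 4)   # NumPy's default dtype: float64, 8 bytes each
a32 = a64.astype(np.float32)  # 4 bytes each, matching C's float

print(a64.dtype, a64.nbytes)  # float64 128
print(a32.dtype, a32.nbytes)  # float32 64

# A float * kernel given the float64 buffer would read the same bytes as
# float32 values built from halves of doubles; view() reinterprets them likewise.
misread = a64.view(np.float32)
print(misread.shape)  # (4, 8)
```

Skipping the astype call would therefore not raise an error; the kernel would simply operate on reinterpreted bytes.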

Running a kernel

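Now let's write code that multiplies every value of the array stored in the GPU memory at a_gpu by 2. To do this, we write a piece of CUDA C code and pass it to a constructor, here pycuda.compiler.SourceModule: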

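mod = SourceModule("""
__global__ void doublify(float *a)
{
    // Flatten the 2D thread index (row threadIdx.y, column threadIdx.x)
    // into a row-major offset in the 4x4 array
    int idx = threadIdx.x + threadIdx.y * 4;
    a[idx] *= 2;
}
""")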
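To see how the 16 threads of one 4x4 block divide the work, here is a CPU emulation in plain Python/NumPy. emulate_doublify is a hypothetical helper, not part of PyCUDA: it runs the kernel body once per (threadIdx.x, threadIdx.y) pair and assumes a single block and a C-ordered (row-major) 4x4 float32 array, as in the example.

```python
import numpy as np

def emulate_doublify(a, block=(4, 4, 1)):
    """Hypothetical CPU stand-in for launching doublify as a single block."""
    out = np.array(a, dtype=np.float32, order="C")  # private row-major copy
    flat = out.reshape(-1)  # view of out: what the kernel sees as float *a
    bx, by, bz = block
    # threadIdx.z is unused by the kernel; bz is 1 in this example
    for tz in range(bz):
        for ty in range(by):
            for tx in range(bx):
                # Same expression as the CUDA C source: row ty, column tx
                idx = tx + ty * 4
                flat[idx] *= 2
    return out


a = np.random.randn(4, 4).astype(np.float32)
assert np.array_equal(emulate_doublify(a), 2 * a)

# Thread (tx, ty) touches element [ty, tx], i.e. threadIdx.x selects the column
layout = np.arange(16).reshape(4, 4)
assert all(layout[ty, tx] == tx + ty * 4 for ty in range(4) for tx in range(4))
```

Because idx hardcodes 4 as the row width, the launch configuration has to match the array: a larger block such as (8, 8, 1) produces indices beyond 15. The emulation raises IndexError there; on the GPU it is an out-of-bounds access with undefined behaviour.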

pycuda CompileError: nvcc compilation failed

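If this step fails with CompileError: nvcc compilation of C:\Users\KANGNI~1\AppData\Local\Temp\tmpbregy28e\kernel.cu failed, the environment is probably not set up correctly; a common cause on Windows is that nvcc cannot find the MSVC host compiler cl.exe. Try adding the following snippet to your code: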

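import os

# Folder containing cl.exe, the MSVC host compiler nvcc needs on Windows;
# adjust it to match your own Visual Studio installation
_path = r"D:\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\bin\Hostx64\x64"

# os.system returns a nonzero status when cl.exe cannot be launched
if os.system("cl.exe"):
    os.environ['PATH'] += ';' + _path
if os.system("cl.exe"):
    raise RuntimeError("cl.exe still not found, path probably incorrect")

mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;
    a[idx] *= 2;
}
""")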
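On Windows, nvcc hands host-side compilation to MSVC's cl.exe, so the fix above amounts to making cl.exe findable on PATH. A variant of the same check that looks the executable up with shutil.which instead of running it is sketched below; ensure_on_path is a hypothetical helper name, and os.pathsep (';' on Windows, ':' elsewhere) replaces the hard-coded separator.

```python
import os
import shutil

def ensure_on_path(tool, extra_dir):
    """Hypothetical helper: make `tool` findable on PATH, appending extra_dir
    if needed. Returns the resolved path of the tool."""
    if shutil.which(tool) is None:
        current = os.environ.get("PATH", "")
        os.environ["PATH"] = current + os.pathsep + extra_dir if current else extra_dir
    found = shutil.which(tool)
    if found is None:
        raise RuntimeError(f"{tool} still not found, {extra_dir!r} is probably incorrect")
    return found
```

With the snippet above, the call would be ensure_on_path("cl.exe", _path), placed before SourceModule is constructed, since the nvcc process that PyCUDA launches inherits PATH from the Python process.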

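If this step completes without errors, the code has been compiled and loaded onto the GPU. Next, get a reference to the kernel as a pycuda.driver.Function and call it, passing the device array a_gpu as the argument and setting the block size to 4x4:
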
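# Look up the compiled kernel and launch it with one 4x4x1 thread block
func = mod.get_function("doublify")
func(a_gpu, block=(4, 4, 1))

# Copy the result back to the host and print it next to the original
a_doubled = np.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)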

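Result (first a_doubled, then the original a; after a successful launch every entry of the first matrix should be exactly twice the corresponding entry of the second):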

[[ 1.4142118   1.48865     3.0958736  -0.64879215]
 [ 0.5829473   1.684329    0.21461935 -2.3383026 ]
 [ 1.2626396  -0.7566854   1.8427325  -0.52471924]
 [-0.80164     1.3886924  -3.5787368   0.72956717]]
[[ 1.4142118   1.48865     3.0958736  -0.64879215]
 [ 0.5829473   1.684329    0.21461935 -2.3383026 ]
 [ 1.2626396  -0.7566854   1.8427325  -0.52471924]
 [-0.80164     1.3886924  -3.5787368   0.72956717]]
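Comparing the printed matrices by eye is error-prone; the run shown above, for instance, printed two identical matrices. A small NumPy check can be run right after memcpy_dtoh, e.g. check_doubled(a, a_doubled); check_doubled and mismatches are hypothetical helper names. Exact equality is safe here because multiplying a float32 by 2 is exact unless a value overflows to inf.

```python
import numpy as np

def check_doubled(original, result):
    """True if every element of result equals exactly 2 * original."""
    return np.array_equal(result, np.float32(2) * original)

def mismatches(original, result):
    """Indices (one row per element) where result differs from 2 * original."""
    return np.argwhere(result != np.float32(2) * original)


a = np.arange(16, dtype=np.float32).reshape(4, 4) - 7.5
print(check_doubled(a, 2 * a))  # True
print(check_doubled(a, a))      # False
print(len(mismatches(a, a)))    # 16, since a has no zero entries
```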

ref: http://www.voidcn.com/article/p-zoelqetb-bvv.html
