CUDA Toolkit 10.2 + VS2015 C++: CUDA GPU Computation Steps


Include the header files

#include <stdio.h>                      // printf, which the kernel in step 3 also calls
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

 

1. Allocate GPU memory

  • cudaError_t cudaStatus = cudaMalloc(void **p, size_t s)
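A minimal sketch of the allocation step, assuming a buffer of `int` values. The helper name `allocDeviceInts` is hypothetical and only serves the illustration. `cudaMalloc` takes the address of the device pointer and a size in bytes, and its return value should be checked against `cudaSuccess`.

```cuda
#include <cstdio>
#include "cuda_runtime.h"

// Hypothetical helper (not part of the CUDA API): allocates `count` ints on the GPU.
// Returns nullptr and prints the CUDA error string if the allocation fails.
int *allocDeviceInts(size_t count)
{
    int *dev = nullptr;
    // cudaMalloc expects the size in bytes, not in elements.
    cudaError_t cudaStatus = cudaMalloc((void **)&dev, count * sizeof(int));
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(cudaStatus));
        return nullptr;
    }
    return dev;
}

int main()
{
    int *dev_a = allocDeviceInts(5);
    if (dev_a == nullptr) return 1;

    printf("allocated 5 ints on the device\n");
    cudaFree(dev_a);  // release the device buffer once it is no longer needed
    return 0;
}
```

Every buffer obtained from `cudaMalloc` should eventually be released with `cudaFree`.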

2. Copy memory: host memory -> GPU buffer

  • cudaStatus = cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind)  // kind = cudaMemcpyHostToDevice; count is in bytes

    enum __device_builtin__ cudaMemcpyKind
    {
        cudaMemcpyHostToHost          =   0,      /**< Host   -> Host */
        cudaMemcpyHostToDevice        =   1,      /**< Host   -> Device */
        cudaMemcpyDeviceToHost        =   2,      /**< Device -> Host */
        cudaMemcpyDeviceToDevice      =   3,      /**< Device -> Device */
        cudaMemcpyDefault             =   4       /**< Direction of the transfer is inferred from the pointer values. Requires unified virtual addressing */
    };
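One way to check a copy without writing a kernel is a round trip: host to device, then straight back. The sketch below assumes `int` data, and the helper name `roundTrip` is hypothetical. The `count` argument of `cudaMemcpy` is a byte count, so the element count is multiplied by `sizeof(int)`.

```cuda
#include <cstdio>
#include "cuda_runtime.h"

// Hypothetical helper: copies `count` ints host -> device -> host.
// Returns true only if the allocation and both copies succeed.
bool roundTrip(const int *src, int *dst, size_t count)
{
    int *dev = nullptr;
    size_t bytes = count * sizeof(int);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;

    // Host -> Device
    cudaError_t cudaStatus = cudaMemcpy(dev, src, bytes, cudaMemcpyHostToDevice);
    // Device -> Host, attempted only if the first copy worked
    if (cudaStatus == cudaSuccess)
        cudaStatus = cudaMemcpy(dst, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return cudaStatus == cudaSuccess;
}

int main()
{
    const int src[4] = { 7, 8, 9, 10 };  // illustrative values
    int dst[4] = { 0 };

    if (!roundTrip(src, dst, 4)) {
        fprintf(stderr, "round trip failed\n");
        return 1;
    }
    for (int i = 0; i < 4; ++i)
        printf("%d ", dst[i]);           // dst should now mirror src
    printf("\n");
    return 0;
}
```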

     

3. Launch the kernel

  • addKernel <<<1, size >>>(dev_c, dev_a, dev_b);  // 1 block of `size` threads

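  • __global__ void addKernel(int *c, const int *a, const int *b)
    {
    	// Global thread index. With a single block (blockIdx.x == 0) this equals threadIdx.x.
    	int i = blockIdx.x * blockDim.x + threadIdx.x;
    	printf("blockIdx.x %d threadIdx.x %d \n", blockIdx.x, threadIdx.x);
    	printf("i == %d \n", i);

    	// Each thread adds one pair of elements.
    	c[i] = a[i] + b[i];
    }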
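When a launch uses more than one block, each thread's global index is usually computed as `blockIdx.x * blockDim.x + threadIdx.x`, and a bounds check guards the threads in the last block that fall past the end of the data. The sketch below illustrates this with a hypothetical kernel `indexKernel` and a hypothetical host helper `fillIndices`; neither name comes from the original post. It also checks `cudaGetLastError()` after the launch and waits for the kernel with `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

// Hypothetical kernel: every in-range thread stores its own global index.
__global__ void indexKernel(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)          // the last block may contain threads beyond n
        out[i] = i;
}

// Hypothetical helper: runs indexKernel over n elements and copies the result back.
bool fillIndices(int *host_out, int n, int threadsPerBlock)
{
    int *dev_out = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc((void **)&dev_out, bytes) != cudaSuccess) return false;

    // Round up so that blocks * threadsPerBlock >= n.
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    indexKernel<<<blocks, threadsPerBlock>>>(dev_out, n);

    cudaError_t cudaStatus = cudaGetLastError();       // launch errors
    if (cudaStatus == cudaSuccess)
        cudaStatus = cudaDeviceSynchronize();          // errors raised while running
    if (cudaStatus == cudaSuccess)
        cudaStatus = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev_out);
    return cudaStatus == cudaSuccess;
}

int main()
{
    const int n = 10;
    int out[n] = { 0 };

    if (!fillIndices(out, n, 4)) {   // 3 blocks of 4 threads cover 10 elements
        fprintf(stderr, "kernel run failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i)
        printf("%d ", out[i]);       // each element should equal its own index
    printf("\n");
    return 0;
}
```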

     

4. Copy memory: GPU buffer -> host memory

  • cudaMemcpy(void *dst, const void *src, size_t count, enum cudaMemcpyKind kind);  // kind = cudaMemcpyDeviceToHost; count is in bytes

5. Reset the device

  • cudaStatus = cudaDeviceReset();
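The five steps can be combined into one small program. This is a sketch under stated assumptions rather than a definitive implementation: the wrapper name `addWithCuda` and the input values are illustrative, the kernel's `printf` calls are left out for brevity, and error handling is reduced to carrying the first failing status forward.

```cuda
#include <cstdio>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i];
}

// Illustrative wrapper: runs steps 1 to 4 for `size` elements.
cudaError_t addWithCuda(int *c, const int *a, const int *b, unsigned int size)
{
    int *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    size_t bytes = size * sizeof(int);

    // 1. Allocate GPU memory
    cudaError_t cudaStatus = cudaMalloc((void **)&dev_a, bytes);
    if (cudaStatus == cudaSuccess) cudaStatus = cudaMalloc((void **)&dev_b, bytes);
    if (cudaStatus == cudaSuccess) cudaStatus = cudaMalloc((void **)&dev_c, bytes);

    // 2. Copy host memory -> GPU buffers
    if (cudaStatus == cudaSuccess)
        cudaStatus = cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    if (cudaStatus == cudaSuccess)
        cudaStatus = cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    // 3. Launch the kernel: one block of `size` threads
    if (cudaStatus == cudaSuccess) {
        addKernel<<<1, size>>>(dev_c, dev_a, dev_b);
        cudaStatus = cudaGetLastError();
    }
    if (cudaStatus == cudaSuccess)
        cudaStatus = cudaDeviceSynchronize();

    // 4. Copy GPU buffer -> host memory
    if (cudaStatus == cudaSuccess)
        cudaStatus = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a null pointer performs no operation, so this is safe after a failure.
    cudaFree(dev_c);
    cudaFree(dev_a);
    cudaFree(dev_b);
    return cudaStatus;
}

int main()
{
    const unsigned int size = 5;
    const int a[size] = { 1, 2, 3, 4, 5 };        // illustrative inputs
    const int b[size] = { 10, 20, 30, 40, 50 };
    int c[size] = { 0 };

    cudaError_t cudaStatus = addWithCuda(c, a, b, size);
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "addWithCuda failed: %s\n", cudaGetErrorString(cudaStatus));
        return 1;
    }
    printf("{%d,%d,%d,%d,%d}\n", c[0], c[1], c[2], c[3], c[4]);

    // 5. Reset the device
    cudaStatus = cudaDeviceReset();
    if (cudaStatus != cudaSuccess) {
        fprintf(stderr, "cudaDeviceReset failed\n");
        return 1;
    }
    return 0;
}
```

A single block is limited in how many threads it can hold, so larger inputs would need several blocks and a bounds check inside the kernel.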

 
