Dear LB community,
I am writing my first LBM code using CUDA on Windows 7 x64 with MS-VS 2008.
Thanks to Dr. Tolke for a well-written article on using CUDA kernels for LBM.
Is there any limit on the number of arguments I can pass to a CUDA kernel? I am trying to pass 43 array arguments, each of size N*sizeof(float), where N is 6561.
cuda_kernel<<<nblocks, block_size>>>(arg1, arg2, arg3, arg4,…argn)…
My code finishes successfully, but at the end the following error message appears in the debug output window at the bottom of MS-VS 2008. The results are not good.
First-chance exception at 0x000007fefd68aa7d in test.exe: Microsoft C++ exception: cudaError_enum at memory location 0x002bdc08…
First-chance exception at 0x000007fefd68aa7d in test.exe: Microsoft C++ exception: cudaError_enum at memory location 0x002bdc08…
This code works fine when I pass 40 arguments of size N*sizeof(float), where N=6561.
I would appreciate any input that helps me understand and potentially solve this problem.
Thanks
Shadab
The following is my CUDA kernel call:
int frame_rate, N=81*81;
size_t size = N*sizeof(float);
int blockSize = 512;
int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
LBM_comp <<< nBlocks, blockSize >>> (rho0_d, ux0_d, uy0_d, rho1_d, ux1_d, uy1_d, N,
f0_d, f1_d, f2_d, f3_d, f4_d, f5_d, f6_d, f7_d, f8_d, ftemp0_d, ftemp1_d, ftemp2_d, ftemp3_d, ftemp4_d, ftemp5_d, ftemp6_d, ftemp7_d, ftemp8_d,
g0_d, g1_d, g2_d, g3_d, g4_d, g5_d, g6_d, g7_d, g8_d, gtemp0_d, gtemp1_d, gtemp2_d, gtemp3_d, gtemp4_d, gtemp5_d, gtemp6_d, gtemp7_d, gtemp8_d,
is_solid_d, frame_rate);
// The f and g arrays are the PDFs for the two components.
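For reference, the CUDA C Programming Guide caps the total size of a kernel's parameter list at 256 bytes on devices of compute capability 1.x and at 4 KB on compute capability 2.x and later. Which device is in use is not stated here, so the following is only a rough sketch for estimating how many bytes the LBM_comp call above needs. It assumes each argument is placed at its natural alignment, which may not match the compiler exactly; alignUp and lbmParamBytes are hypothetical helper names, not CUDA API calls.

```cpp
#include <cstddef>

// Round `offset` up to the next multiple of `align`.
static std::size_t alignUp(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Estimated size in bytes of the LBM_comp parameter list as called above:
// 6 pointers (rho0_d .. uy1_d), int N, 37 pointers (the 36 f/ftemp/g/gtemp
// arrays plus is_solid_d), int frame_rate.
// Assumption: every argument sits at its natural alignment.
static std::size_t lbmParamBytes(std::size_t ptrSize) {
    const std::size_t intSize = sizeof(int);
    std::size_t offset = 0;
    for (int i = 0; i < 6; ++i)
        offset = alignUp(offset, ptrSize) + ptrSize;   // rho0_d .. uy1_d
    offset = alignUp(offset, intSize) + intSize;       // N
    for (int i = 0; i < 37; ++i)
        offset = alignUp(offset, ptrSize) + ptrSize;   // f*, ftemp*, g*, gtemp*, is_solid_d
    offset = alignUp(offset, intSize) + intSize;       // frame_rate
    return offset;
}
```

Calling lbmParamBytes(sizeof(float *)) on the host gives the estimate for the current build. With 8-byte pointers and 4-byte ints it comes to 356 bytes, which is above the 256-byte limit of compute capability 1.x devices; with 4-byte pointers it comes to 180 bytes.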
The following is my CUDA memory allocation for all the arguments:
cudaMalloc((void **) &rho0_d, size);
cudaMalloc((void **) &ux0_d, size);
cudaMalloc((void **) &uy0_d, size);
cudaMalloc((void **) &is_solid_d, size);
cudaMalloc((void **) &f0_d, size); cudaMalloc((void **) &ftemp0_d, size);
cudaMalloc((void **) &f1_d, size); cudaMalloc((void **) &ftemp1_d, size);
cudaMalloc((void **) &f2_d, size); cudaMalloc((void **) &ftemp2_d, size);
cudaMalloc((void **) &f3_d, size); cudaMalloc((void **) &ftemp3_d, size);
cudaMalloc((void **) &f4_d, size); cudaMalloc((void **) &ftemp4_d, size);
cudaMalloc((void **) &f5_d, size); cudaMalloc((void **) &ftemp5_d, size);
cudaMalloc((void **) &f6_d, size); cudaMalloc((void **) &ftemp6_d, size);
cudaMalloc((void **) &f7_d, size); cudaMalloc((void **) &ftemp7_d, size);
cudaMalloc((void **) &f8_d, size); cudaMalloc((void **) &ftemp8_d, size);
cudaMalloc((void **) &rho1_d, size);
cudaMalloc((void **) &ux1_d, size);
cudaMalloc((void **) &uy1_d, size);
cudaMalloc((void **) &g0_d, size); cudaMalloc((void **) &gtemp0_d, size);
cudaMalloc((void **) &g1_d, size); cudaMalloc((void **) &gtemp1_d, size);
cudaMalloc((void **) &g2_d, size); cudaMalloc((void **) &gtemp2_d, size);
cudaMalloc((void **) &g3_d, size); cudaMalloc((void **) &gtemp3_d, size);
cudaMalloc((void **) &g4_d, size); cudaMalloc((void **) &gtemp4_d, size);
cudaMalloc((void **) &g5_d, size); cudaMalloc((void **) &gtemp5_d, size);
cudaMalloc((void **) &g6_d, size); cudaMalloc((void **) &gtemp6_d, size);
cudaMalloc((void **) &g7_d, size); cudaMalloc((void **) &gtemp7_d, size);
cudaMalloc((void **) &g8_d, size); cudaMalloc((void **) &gtemp8_d, size);
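One possible way to shorten the argument list, sketched here under the assumption that all 36 distribution arrays (f, ftemp, g, gtemp, 9 directions each) hold exactly N floats, is to keep them in a single buffer and compute the position of each value from its field, direction, and node. The names LbmField, slabIndex, slabBytes, and the LBM_HD macro are hypothetical and only illustrate the layout; this is not taken from the original code.

```cpp
#include <cstddef>

// Lets the same helper compile as plain C++ and as CUDA host/device code.
#ifdef __CUDACC__
#define LBM_HD __host__ __device__
#else
#define LBM_HD
#endif

// Hypothetical field order inside the single buffer.
enum LbmField { FIELD_F = 0, FIELD_FTEMP = 1, FIELD_G = 2, FIELD_GTEMP = 3 };

const int kDirections = 9;  // f0..f8 in the post
const int kFieldCount = 4;  // f, ftemp, g, gtemp

// Index of the value for `field`, direction `dir` (0..8), lattice node `node`
// (0..n-1) in a buffer laid out as [field][dir][node].
LBM_HD inline std::size_t slabIndex(LbmField field, int dir, int node, int n) {
    return (static_cast<std::size_t>(field) * kDirections + dir) * n + node;
}

// Total bytes needed for all 36 arrays of n floats each.
inline std::size_t slabBytes(int n) {
    return static_cast<std::size_t>(kFieldCount) * kDirections * n * sizeof(float);
}
```

With such a layout, one call like cudaMalloc((void **) &pdf_d, slabBytes(N)) would replace the 36 separate allocations, and the kernel would take a single float *pdf_d (a hypothetical name) in place of 36 pointers, reading for example pdf_d[slabIndex(FIELD_G, 3, i, N)] where it currently reads g3_d[i].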