| Title | Δ |
|:------|------:|
| Do I need to free constant memory assigned using cudaMemcpyToSymbol? | +3.28 |
| different thread blocks definition | 0.00 |
| How CUDA constant memory allocation works? | 0.00 |
| Peak Bandwidth for CUDA Surface Memory? | 0.00 |
| NVIDIA Visual Profiler, Debug and Release modes in Visual Studio 2010 | -4.47 |
| cuda kernels not executing concurrently | 0.00 |
| the Kernel delay increase by increasing the blocksPerGrid and threa... | 0.00 |
| cuda "invalid argument" error on second kernel | 0.00 |
| CUDA Debugging: "No value at target location", I clearly... | 0.00 |
| create OpenCL project in Visual Studio 2010 | 0.00 |
| OpenGL shader debugging. NVIDIA Parallel nSight? | +3.19 |
| Cuda zero-copy performance | 0.00 |
| Develop a Cuda DLL working with different Runtime versions | 0.00 |
| CUDA measure execution time per gpu core | 0.00 |
| CUDA performance improves when running more threads than there are... | +3.46 |
| Is register overflowing a possible cause of a CUDA_EXCEPTION_5, War... | 0.00 |
| Strategies for timing CUDA Kernels: Pros and Cons? | -4.30 |
| Shared memory allocation in CUDA | 0.00 |
| Why only one of the warps is executed by a SM in cuda? | +4.82 |
| Uncoalesced float2 CUDA kernel | 0.00 |
| Nsight 2.2 sometimes works sometimes doesn't | 0.00 |
| not able to use printf in cuda kernel function | +4.73 |
| Difference on creating a CUDA context | 0.00 |
| Disabling TDR for CUDA in Windows 8 | 0.00 |
| Is my GTX680 really performing | +3.26 |
| CUDA: Passing parameters to host compiler during Nsight session | +3.39 |
| Unexpectedly large cmem[2] usage in CUDA code | 0.00 |
| Cuda Shared memory shown as register in Nsight | 0.00 |
| Cuda: Where do the built-in variables reside? (threadIdx, blockIdx,... | 0.00 |
| Understanding counters in CUDA profiler | +3.57 |
| CUDA disable L1 cache only for one variable | -0.17 |
| What does a high branch efficiency and low control flow efficiency... | 0.00 |
| Calculating achieved bandwidth and flops/Gflops, and evaluate CUDA... | 0.00 |
| Scalar variables and registers : CUDA | 0.00 |
| Difference in time reported by NVVP and counters | 0.00 |
| Define struct array in function | -1.31 |
| Nsight profile experiments not running | 0.00 |
| Clear uint3 in CUDA using cudaMemset | 0.00 |
| Do we need two GPUs to debug CUDA code? | 0.00 |
| How to use L2 Cache in CUDA | 0.00 |
| How is cudaMemset implemented? | 0.00 |
| CUDA threads, SMX, SP and blocks, how do they work? | 0.00 |
| driver.Context.synchronize()- what else to take into consideration... | -0.14 |
| Time between Kernel Launch and Kernel Execution | 0.00 |
| 'Flush records'-Warning in Parallel Nsight profiling results | 0.00 |
| Trouble measuring the elapsed time of a CUDA program and CUDA kernels | 0.00 |
| location of cudaEventRecord and overlapping ops, when second kernel... | 0.00 |
| How to measure Streaming Multiprocessor use/idle times in CUDA? | 0.00 |
| In CUDA, how can we call a device function in another translation u... | 0.00 |
| Misaligned Shared or Local Address | 0.00 |