Commit 5a72be08 authored by GabrieleDalmazzone

Race condition bug-fix in


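* The second __syncthreads() is necessary, I am sure of that: both reductions reuse the shared squares buffer, so without a barrier a thread can start the second reduce_smem call while others are still reading the result of the first.
* The code also works without the first __syncthreads(), but I added it anyway for symmetry. It has no measurable effect on run time; I checked this by profiling with nvvp.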
parent a0f86455
@@ -331,11 +331,13 @@ namespace cv { namespace cuda { namespace device
     if (threadIdx.x < block_hist_size)
         elem = hist[0];
+    __syncthreads(); // prevent race condition (redundant?)
     float sum = reduce_smem<nthreads>(squares, elem * elem);
     float scale = 1.0f / (::sqrtf(sum) + 0.1f * block_hist_size);
     elem = ::min(elem * scale, threshold);
+    __syncthreads(); // prevent race condition
     sum = reduce_smem<nthreads>(squares, elem * elem);
     scale = 1.0f / (::sqrtf(sum) + 1e-3f);
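
The hazard this hunk guards against can be reproduced in isolation. The following is only a minimal sketch, not the OpenCV code: block_sum is a hypothetical stand-in for reduce_smem, and kThreads, normalize_twice, and the input values are assumptions made for illustration. It shows two back-to-back block reductions over the same shared buffer, with the barrier between them that corresponds to the second __syncthreads() in the diff.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

constexpr int kThreads = 64;  // hypothetical block size (power of two)

// Hypothetical stand-in for reduce_smem: tree reduction over a shared buffer.
// Every thread returns the total by reading smem[0]; there is no barrier
// after that final read.
__device__ float block_sum(float* smem, float val)
{
    const unsigned int tid = threadIdx.x;
    smem[tid] = val;
    __syncthreads();
    for (unsigned int stride = kThreads / 2; stride > 0; stride >>= 1)
    {
        if (tid < stride)
            smem[tid] += smem[tid + stride];
        __syncthreads();
    }
    return smem[0];
}

__global__ void normalize_twice(float* data, float threshold)
{
    __shared__ float squares[kThreads];
    float elem = data[threadIdx.x];

    float sum = block_sum(squares, elem * elem);
    float scale = 1.0f / (sqrtf(sum) + 0.1f * kThreads);
    elem = fminf(elem * scale, threshold);

    // Without this barrier, thread 0 could overwrite squares[0] at the start
    // of the second call while a slower thread is still reading the first
    // total from squares[0].
    __syncthreads();

    sum = block_sum(squares, elem * elem);
    scale = 1.0f / (sqrtf(sum) + 1e-3f);
    data[threadIdx.x] = elem * scale;
}

int main()
{
    float host[kThreads];
    for (int i = 0; i < kThreads; ++i)
        host[i] = 1.0f;  // arbitrary test input

    float* dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    normalize_twice<<<1, kThreads>>>(dev, 0.2f);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("out[0] = %f\n", host[0]);
    return 0;
}
```

One way to check a kernel like this is to build it with nvcc and run it under compute-sanitizer --tool racecheck, once with and once without the barrier between the two block_sum calls. In this sketch nothing touches the shared buffer before the first reduction, so a barrier placed before the first call would not be needed here.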