Commit 2d8c89c4 authored by Chuanbo Weng's avatar Chuanbo Weng

Remove unnecessary kercn limitation of 4.

When accessing global memory by DWORD4, memory bandwidth
can be fully utilized on Intel platform. This patch will
make more image format(e.g. 8UC4) be processed in DWORD4
by work-item. After applying this patch, 3 subcase of
./opencv_perf_core --gtest_filter=OCL_RepeatFixture_Repeat.Repeat/*
can be speedup on HD4000 graphics card with Beignet:
OCL_RepeatFixture_Repeat.Repeat/2, 64% improvement.
OCL_RepeatFixture_Repeat.Repeat/6, 50% improvement.
OCL_RepeatFixture_Repeat.Repeat/8, 56% improvement.
Signed-off-by: 's avatarChuanbo Weng <chuanbo.weng@intel.com>
parent be5c9103
......@@ -846,7 +846,7 @@ static bool ocl_repeat(InputArray _src, int ny, int nx, OutputArray _dst)
int type = _src.type(), depth = CV_MAT_DEPTH(type), cn = CV_MAT_CN(type),
rowsPerWI = ocl::Device::getDefault().isIntel() ? 4 : 1,
kercn = std::min(ocl::predictOptimalVectorWidth(_src, _dst), 4);
kercn = ocl::predictOptimalVectorWidth(_src, _dst);
ocl::Kernel k("repeat", ocl::core::repeat_oclsrc,
format("-D T=%s -D nx=%d -D ny=%d -D rowsPerWI=%d -D cn=%d",
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment