- 13 Jan, 2017 1 commit
-
-
mshabunin authored
-
- 06 Dec, 2016 1 commit
-
-
Li Peng authored
Add a new 5x5 Gaussian blur kernel for the CV_8UC1 format. It is 50%~70% faster than the current OCL kernel in the perf test. Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 02 Dec, 2016 1 commit
-
-
Li Peng authored
Add a new OCL kernel for image pyramid upsampling. It is 35% faster than the current OCL kernel in the perf test. Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 30 Nov, 2016 1 commit
-
-
Li Peng authored
Add new OpenCL kernels for bicubic interpolation. They are 20% faster than the current warp image kernel with bicubic interpolation. Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 29 Nov, 2016 1 commit
-
-
Li Peng authored
Add new OCL kernels for warpAffine and warpPerspective. The average performance improvement is about 30%. The new OCL kernels require the CV_8UC1 format and support nearest-neighbor and bilinear interpolation. Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 23 Nov, 2016 1 commit
-
-
Rostislav Vasilikhin authored
* fixed wrong equivalence in YUV conversion
* fixed channel order from YVU to YUV
-
- 17 Nov, 2016 1 commit
-
-
Li Peng authored
This OCL kernel is 46%~171% faster than the current Laplacian 3x3 OCL kernel in the perf test with the "CV_8UC1" image format. Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 14 Nov, 2016 1 commit
-
-
Li Peng authored
It improves performance by 108%~230% in the perf test with the "CV_8UC1" image format and kernel size 3. Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 08 Nov, 2016 1 commit
-
-
Li Peng authored
This OCL kernel is for the 3x3 kernel size and the CV_8UC1 format. It is 115%~300% faster than the current OCL path in the perf test:
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*
Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 07 Nov, 2016 1 commit
-
-
mshabunin authored
-
- 04 Nov, 2016 2 commits
-
-
Li Peng authored
This kernel is for the CV_8UC1 format and the 3x3 kernel size. It is about 33%~55% faster than the current OCL kernel with the perf tests below:
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*
Also add accuracy test cases for this kernel. The test command is:
./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*
Signed-off-by:
Li Peng <peng.li@intel.com>
-
Tetragramm authored
Fix a previously undiscovered bug in the C++ code.
-
- 26 Oct, 2016 1 commit
-
-
Li Peng authored
The optimization is for the CV_8UC1 format and the 3x3 box filter. It is 15%~87% faster than the current OCL kernel with the perf test below:
./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*
Also add test cases for this OCL kernel. Signed-off-by:
Li Peng <peng.li@intel.com>
-
- 17 Oct, 2016 1 commit
-
-
LukeZhu authored
-
- 09 Aug, 2016 1 commit
-
-
Alexander Alekhin authored
There is an issue with processing of the abs(short) function for negative arguments. Affected OpenCL devices:
- iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0)
- CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))
-
- 24 Apr, 2016 1 commit
-
-
ohnozzy authored
Add OpenCL support to linearPolar & logPolar. The OpenCL code uses float instead of double, so it does not require the cl_khr_fp64 extension, at the cost of a slight loss of precision. Add explicit conversion from double to float to eliminate warnings during compilation.
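The precision trade-off of using float instead of double can be illustrated with a small sketch. It is not taken from the commit: it only rounds a double to single precision with Python's standard `struct` module. The helper name `to_float32` and the sample point (3, 4) are assumptions chosen for illustration.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical sample: the log of a distance from the centre, the kind of
# value a log-polar style mapping might compute. (3, 4) is arbitrary.
radius = math.hypot(3.0, 4.0)        # 5.0
rho_double = math.log(radius)        # double precision
rho_float = to_float32(rho_double)   # what a float-only kernel could store

error = abs(rho_double - rho_float)
# For values in [1, 2) adjacent floats are 2**-23 apart, so rounding to the
# nearest float is off by at most half of that.
print(error <= 2.0 ** -24)
```

The bound in the last comment holds for values in [1, 2) only; larger magnitudes have proportionally larger spacing.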
-
- 15 Mar, 2016 1 commit
-
-
Zhigang Gong authored
See the code snippet below:

    while (l_counter != 0)
    {
        int mod = l_counter % LOCAL_TOTAL;
        int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
        for (int i = 0; i < pix_per_thr; ++i)
        {
            int index = atomic_dec(&l_counter) - 1;
            ....
        }
        ....
        barrier(CLK_LOCAL_MEM_FENCE);
    }

If we don't put a barrier before the for loop, some work items may enter the loop while others have not yet. l_counter is then reduced in the for loop and may reach zero, so the remaining work items can never enter the while loop. If this happens, it breaks the barrier rule, which requires all work items to reach the same barrier, and it may hang the GPU depending on the implementation of the OpenCL platform.

This issue was raised at: https://github.com/Itseez/opencv/issues/5175
Signed-off-by:
Zhigang Gong <zhigang.gong@linux.intel.com>
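The divergence described in this commit message can be reproduced with a sequential model. This is a sketch under stated assumptions, not the kernel itself: work items are run one after another (one schedule that is possible when nothing synchronises them), the elided loop body is assumed to push nothing back, and the function name `work_items_entering` and its `barrier_first` flag are hypothetical.

```python
def work_items_entering(l_counter: int, local_total: int, barrier_first: bool) -> int:
    """Count how many work items pass the `while (l_counter != 0)` test."""
    if barrier_first:
        # With a barrier ahead of the for loop, every work item has tested
        # the condition before any of them decrements the counter.
        return local_total if l_counter != 0 else 0

    entered = 0
    for lid in range(local_total):
        if l_counter == 0:
            continue  # this work item never reaches the barrier in the body
        entered += 1
        mod = l_counter % local_total
        pix_per_thr = l_counter // local_total + (1 if lid < mod else 0)
        for _ in range(pix_per_thr):
            l_counter -= 1  # stands in for atomic_dec(&l_counter)
    return entered


print(work_items_entering(1, 4, barrier_first=False))  # 1
print(work_items_entering(1, 4, barrier_first=True))   # 4
```

With one stack entry and four work items, only the first work item enters the loop in the unsynchronised model, so the other three never arrive at the barrier inside the body.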
-
- 26 May, 2015 2 commits
-
-
Zhigang Gong authored
Signed-off-by:
Zhigang Gong <zhigang.gong@intel.com>
-
Zhigang Gong authored
    int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);

pix_per_thr * LOCAL_TOTAL may be larger than l_counter, so the index into l_stack may become negative, which may cause serious problems. Skip the loop iteration when the index is negative, and add the value back to l_counter to keep it balanced and avoid a potentially negative counter. Signed-off-by:
Zhigang Gong <zhigang.gong@intel.com>
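The "skip and add back" step described in this commit message can be sketched as follows. This is a sequential model written for illustration, not the kernel code: the `Counter` class stands in for a local atomic counter, and `pop_index` is a hypothetical helper name.

```python
from typing import List, Optional


class Counter:
    """Sequential stand-in for an OpenCL local atomic counter (hypothetical)."""

    def __init__(self, value: int) -> None:
        self.value = value

    def atomic_dec(self) -> int:
        """Decrement and return the value held before the decrement."""
        old = self.value
        self.value -= 1
        return old

    def atomic_inc(self) -> int:
        """Increment and return the value held before the increment."""
        old = self.value
        self.value += 1
        return old


def pop_index(counter: Counter) -> Optional[int]:
    """Pop a stack index, or return None and restore the counter if it is empty."""
    index = counter.atomic_dec() - 1
    if index < 0:
        counter.atomic_inc()  # add the value back so the counter stays balanced
        return None
    return index


# Assume, as the commit message warns, that the work items together attempt
# more pops than there are entries: four attempts against one entry.
counter = Counter(1)
results: List[Optional[int]] = [pop_index(counter) for _ in range(4)]
print(results)        # [0, None, None, None]
print(counter.value)  # 0
```

Without the add-back, the same four attempts would leave the counter at -3.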
-
- 22 Apr, 2015 1 commit
-
-
Pavel Rojtberg authored
-
- 26 Nov, 2014 1 commit
-
-
Yan Wang authored
It can improve performance when the image size is large, e.g. OCL_PyrUpFixture_PyrUp.PyrUp/18
-
- 07 Nov, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 06 Nov, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 05 Nov, 2014 1 commit
-
-
vbystricky authored
-
- 28 Oct, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 27 Oct, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 21 Oct, 2014 5 commits
-
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
- 07 Oct, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 29 Sep, 2014 2 commits
-
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
- 17 Sep, 2014 1 commit
-
-
Chuanbo Weng authored
According to section 6.1.5 of the OpenCL 1.2 spec:

    For arguments to a __kernel function declared to be a pointer to a data type, the OpenCL compiler can assume that the pointee is always appropriately aligned as required by the data type. The behavior of an unaligned load or store is undefined, except for the vloadn, vload_halfn, vstoren, and vstore_halfn functions defined in section 6.12.7.

The original code read data of type T from an address that was not aligned to a multiple of sizeof(T), so the result was incorrect. With this patch, the cases

    ./opencv_perf_imgproc --gtest_filter=OCL_ImgSize_TmplSize_Method_MatType_MatchTemplate.MatchTemplate/*

work well with beignet 0.9.3. Signed-off-by:
Chuanbo Weng <chuanbo.weng@intel.com>
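The alignment condition in this commit message can be checked with a few lines of arithmetic. The sketch below is an illustration only and assumes T is a four-float vector (16 bytes); the commit does not state the actual type. The helper names `is_aligned` and `load_vec4_elementwise` are hypothetical; the second one mimics an element-wise load, which needs only the 4-byte alignment of a single float.

```python
import struct
from typing import Tuple

FLOAT_SIZE = 4             # bytes in one float
VEC4_SIZE = 4 * FLOAT_SIZE # bytes in the assumed vector type T


def is_aligned(byte_offset: int, type_size: int) -> bool:
    """True when byte_offset is a multiple of type_size."""
    return byte_offset % type_size == 0


def load_vec4_elementwise(buf: bytes, byte_offset: int) -> Tuple[float, ...]:
    """Read four consecutive little-endian floats starting at byte_offset."""
    return struct.unpack_from("<4f", buf, byte_offset)


# Five floats in a row; a vector starting at the second float sits at byte 4.
buf = struct.pack("<5f", 0.0, 1.0, 2.0, 3.0, 4.0)
offset = 1 * FLOAT_SIZE

print(is_aligned(offset, VEC4_SIZE))       # False: not valid for a pointer to T
print(is_aligned(offset, FLOAT_SIZE))      # True: fine for element-wise loads
print(load_vec4_elementwise(buf, offset))  # (1.0, 2.0, 3.0, 4.0)
```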
-
- 11 Sep, 2014 2 commits
-
-
Alexander Karsakov authored
-
vbystricky authored
-
- 05 Sep, 2014 2 commits
-
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
- 04 Sep, 2014 1 commit
-
-
Alexander Karsakov authored
-