Commits · ddfe688be68654417c82b7ce0326698ab6c2b53c · submodule / opencv

13 Jan, 2017 1 commit
- imgproc/CLAHE/ocl: Removed unnecessary __local variable · 8c66531c
  mshabunin authored 8 years ago
  
06 Dec, 2016 1 commit

5x5 gaussian blur optimization · 396921dd

Li Peng authored 8 years ago

Add a new 5x5 Gaussian blur kernel for the CV_8UC1 format.
It is 50% ~ 70% faster than the current OCL kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>


02 Dec, 2016 1 commit

Image pyramids upsampling optimization · b69cdb24

Li Peng authored 8 years ago

Add a new OCL kernel for image pyramid upsampling.
It is 35% faster than the current OCL kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>


30 Nov, 2016 1 commit

more optimization for warpAffine and warpPerspective · 2ca5a7e8

Li Peng authored 8 years ago

Add new OpenCL kernels for bicubic interpolation. They are 20% faster
than the current warp image kernel with bicubic interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>


29 Nov, 2016 1 commit

optimization for warpAffine and warpPerspective · b72d1967

Li Peng authored 8 years ago

Add new OCL kernels for warpAffine and warpPerspective.
The average performance improvement is about 30%. The new
OCL kernels require the CV_8UC1 format and support nearest
neighbor and bilinear interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>


23 Nov, 2016 1 commit
- fixed wrong equivalence in YUV conversion (#7481) · 7db43f9f
  Rostislav Vasilikhin authored 8 years ago
```
* fixed wrong equivalence in YUV conversion

* fixed channel order from YVU to YUV
```
17 Nov, 2016 1 commit

laplacian ocl kernel optimization · 6cb73356

Li Peng authored 8 years ago

This OCL kernel is 46%~171% faster than the current Laplacian 3x3
OCL kernel in the perf test with image format "CV_8UC1".
Signed-off-by: Li Peng <peng.li@intel.com>


14 Nov, 2016 1 commit

sobel and scharr ocl kernel optimization · 8d4a7d3d

Li Peng authored 8 years ago

It improves performance by 108%~230% in the perf test
with image format "CV_8UC1" and kernel size 3.
Signed-off-by: Li Peng <peng.li@intel.com>


08 Nov, 2016 1 commit

gaussian blur ocl kernel optimization · 8f63f51e

Li Peng authored 8 years ago

This OCL kernel is for the 3x3 kernel size and the CV_8UC1 format.
It is 115% ~ 300% faster than the current OCL path in the perf test:

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*
Signed-off-by: Li Peng <peng.li@intel.com>


07 Nov, 2016 1 commit
- Fixed several OpenCL compiler warnings · 3e28d517
  mshabunin authored 8 years ago
  
04 Nov, 2016 2 commits

morph ocl kernel for erode and dilate filter · 35198b84

Li Peng authored 8 years ago

This kernel is for the CV_8UC1 format and the 3x3 kernel size.
It is about 33% ~ 55% faster than the current OCL kernel in the perf tests below:

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*

Accuracy test cases are also added for this kernel. The test command is:

./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*
Signed-off-by: Li Peng <peng.li@intel.com>


Fix the OpenCL portion to match the C++ code. · 17df65e6
Tetragramm authored 8 years ago
```
Fix an undiscovered bug in the C++ code.
```

26 Oct, 2016 1 commit

ocl kernel performance optimization for box filter · 3607da9f

Li Peng authored 8 years ago

The optimization is for the CV_8UC1 format and the 3x3 box filter.
It is 15%~87% faster than the current OCL kernel in the perf test below:

./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*

Also add test cases for this ocl kernel.
Signed-off-by: Li Peng <peng.li@intel.com>


17 Oct, 2016 1 commit
- Fix the problem: filterSmall.cl reports an error with double · ef47bcc8
  LukeZhu authored 8 years ago
  
09 Aug, 2016 1 commit

ocl: fix Canny for Intel devices · b8e08d5d

Alexander Alekhin authored 8 years ago

There is an issue with how the abs(short) function handles
negative arguments.

Affected OpenCL devices:
- iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0)
- CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))


24 Apr, 2016 1 commit

Add OpenCL support to linearPolar & logPolar · db9f6117

ohnozzy authored 8 years ago

Add OpenCL support to linearPolar & logPolar.
The OpenCL code uses float instead of double so that it does not require
the cl_khr_fp64 extension, at the cost of a slight loss of precision.

Add explicit conversion

Add explicit conversion from double to float to eliminate warning during
compilation.


15 Mar, 2016 1 commit

fix potential race condition in canny.cl. · 0b08d255

Zhigang Gong authored 9 years ago

See the below code snippet:

while(l_counter != 0)
{
    int mod = l_counter % LOCAL_TOTAL;
    int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);

    for (int i = 0; i < pix_per_thr; ++i)
    {
        int index = atomic_dec(&l_counter) - 1;
        ....
    }
    ....
    barrier(CLK_LOCAL_MEM_FENCE);
}

If we don't put a barrier before the for loop, there is a possibility
that some work items enter the loop while others have not yet done so.
l_counter is then reduced in the for loop and may drop to zero, so the
remaining work items may never enter the while loop. If this happens, it
breaks the barrier rule, which requires all work items to reach the same
barrier, and it may hang the GPU, depending on the OpenCL platform's implementation.

This issue is raised at:
https://github.com/Itseez/opencv/issues/5175
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>

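The hazard described in 0b08d255 can be illustrated with a small sequential model. This is only a sketch in host-side C++, not the OpenCL kernel: the function name `items_reaching_barrier` and its parameters are hypothetical, and the model assumes the worst-case schedule in which each work item finishes its for loop before the next one tests `l_counter != 0`. With `sync_before_loop` set, every work item tests the condition before any decrement happens, which is the effect the commit message attributes to the extra barrier.

```cpp
// Hypothetical sequential model of the while-loop entry in canny.cl.
// Returns how many of `local_total` simulated work items enter the while
// loop and therefore reach the barrier at the end of its body.
inline int items_reaching_barrier(int l_counter, int local_total,
                                  bool sync_before_loop)
{
    if (sync_before_loop) {
        // Every work item tests the condition on the same counter value,
        // so either all of them enter the loop or none of them do.
        return (l_counter != 0) ? local_total : 0;
    }

    int reached = 0;
    for (int lid = 0; lid < local_total; ++lid) {
        if (l_counter == 0)
            continue;  // this work item skips the loop and its barrier

        // Same per-work-item share as in the snippet from the commit.
        int mod = l_counter % local_total;
        int pix_per_thr = l_counter / local_total + ((lid < mod) ? 1 : 0);

        // Worst case assumed: this work item drains its share before the
        // next work item has tested the loop condition.
        l_counter -= pix_per_thr;
        ++reached;
    }
    return reached;
}
```

With `l_counter = 1` and two work items, the unsynchronized model lets only one work item reach the barrier, while the synchronized model lets both reach it.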

26 May, 2015 2 commits

Fixed the race condition between inc and dec on the l_counter. · 0f7de40e
Zhigang Gong authored 9 years ago
```
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
```

Avoid negative index for a local buffer in Canny.cl. · 3c852009

Zhigang Gong authored 9 years ago

int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
pix_per_thr * LOCAL_TOTAL may be larger than l_counter, so the
index into l_stack may become negative, which can cause serious
problems. Skip the loop when the index is negative, and add the
decrement back to l_counter to keep it balanced and avoid a
negative counter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>

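The guard described in 3c852009 can be sketched on the host with `std::atomic<int>` standing in for the local `l_counter`. This is an illustration under assumptions, not the kernel code: `guarded_pop` and `drain` are hypothetical names, and the example runs on a single thread, so it shows only the index arithmetic and the add-back step, not the concurrent behavior.

```cpp
#include <atomic>
#include <vector>

// Hypothetical model of one guarded pop. `counter` plays the role of
// l_counter. Returns the stack index to process, or -1 if the stack was
// already empty when this call decremented it.
inline int guarded_pop(std::atomic<int>& counter)
{
    // fetch_sub returns the previous value, like OpenCL's atomic_dec.
    int index = counter.fetch_sub(1) - 1;
    if (index < 0) {
        // Over-decremented: add the decrement back to keep the balance.
        counter.fetch_add(1);
        return -1;
    }
    return index;
}

// Tries up to `attempts` pops and returns the indices that were handed out.
inline std::vector<int> drain(std::atomic<int>& counter, int attempts)
{
    std::vector<int> taken;
    for (int i = 0; i < attempts; ++i) {
        int index = guarded_pop(counter);
        if (index < 0)
            break;  // skip the rest of the loop on a negative index
        taken.push_back(index);
    }
    return taken;
}
```

Starting from a counter of 3 and attempting 5 pops, `drain` hands out indices 2, 1, 0 and leaves the counter at 0 instead of a negative value.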

22 Apr, 2015 1 commit
- fix gftt opencv kernel when using mask · 1ea41e72
  Pavel Rojtberg authored 9 years ago
  
26 Nov, 2014 1 commit

Optimize pyrUp_unrolled() by mad function. · 6e705055

Yan Wang authored 10 years ago

It can improve performance when the image size is large.
E.g. OCL_PyrUpFixture_PyrUp.PyrUp/18


07 Nov, 2014 1 commit
- Correctly unrolled some cycles · 7c870014
  Alexander Karsakov authored 10 years ago
  
06 Nov, 2014 1 commit
- Minor optimization for ocl_canny · 0ec0aeb7
  Alexander Karsakov authored 10 years ago
  
05 Nov, 2014 1 commit
- Fix OpenCL version of HoughLinesP function · 957e5ef8
  vbystricky authored 10 years ago
  
28 Oct, 2014 1 commit
- Added optimized loading to YUV2RGB_422 kernel · 643c906f
  Alexander Karsakov authored 10 years ago
  
27 Oct, 2014 1 commit
- Added loading 4 pixels in line instead of 2 to RGB[A] -> YUV(420) kernel · 1466621f
  Alexander Karsakov authored 10 years ago
  
21 Oct, 2014 5 commits
- Used direct float calculations · 60367907
  Alexander Karsakov authored 10 years ago
  
- Added OCL code for YUV422 -> RGB[A]|BGR[A] color conversion · 5aa9ac9a
  Alexander Karsakov authored 10 years ago
  
- Added OCL code for RGB[A]|BGR[A] -> YUV_[YV12|IYUV] color conversion · c8707b89
  Alexander Karsakov authored 10 years ago
  
- Added OCL code for YUV2BGR_YV12 and YUV2BGR_IYUV color conversions · 1cc17a71
  Alexander Karsakov authored 10 years ago
  
- Added support for YUV2RGB[A]_NV21 and YUV2BGR[A]_NV21 conversion · 85b60ee3
  Alexander Karsakov authored 10 years ago
  
07 Oct, 2014 1 commit
- Optimization for HoughLinesP · 66a8acfd
  Alexander Karsakov authored 10 years ago
  
29 Sep, 2014 2 commits
- Added HoughLinesP OCL implementation · eaf5a163
  Alexander Karsakov authored 10 years ago
  
- Combined counter and corner buffers into one · 3695a316
  Alexander Karsakov authored 10 years ago
  
17 Sep, 2014 1 commit

Use vload to read unaligned data instead of dereference operator. · c5552788

Chuanbo Weng authored 10 years ago

According to the OpenCL 1.2 spec, section 6.1.5:
    For arguments to a __kernel function declared to be a pointer to a
    data type, the OpenCL compiler can assume that the pointee is always
    appropriately aligned as required by the data type. The behavior of
    an unaligned load or store is undefined, except for the
    vloadn, vload_halfn, vstoren, and vstore_halfn functions defined in
    section 6.12.7.

The original code read data of type T from an address that is not a
multiple of sizeof(T), so the result was incorrect. With this patch, the cases
./opencv_perf_imgproc
--gtest_filter=OCL_ImgSize_TmplSize_Method_MatType_MatchTemplate.MatchTemplate/*
work correctly with beignet 0.9.3.
Signed-off-by: Chuanbo Weng <chuanbo.weng@intel.com>

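The alignment issue in c5552788 has a close analogue in host-side C++, sketched below. This is not the OpenCL change itself (the commit uses the vload functions). Here `std::memcpy` plays the comparable role of an alignment-agnostic read, and `load_u32_unaligned` is a hypothetical helper name chosen for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value that starts at an arbitrary byte offset.
// Casting `base + offset` to `const std::uint32_t*` and dereferencing it
// would be undefined behavior when the address is not suitably aligned.
// std::memcpy has no alignment requirement, so this read is well defined.
inline std::uint32_t load_u32_unaligned(const unsigned char* base,
                                        std::size_t offset)
{
    std::uint32_t value;
    std::memcpy(&value, base + offset, sizeof(value));
    return value;
}
```

Calling `load_u32_unaligned(buf, 1)` on a byte buffer returns the four bytes starting at `buf + 1`, interpreted in the host's byte order.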

11 Sep, 2014 2 commits
- Remove two "set" kernel calls · 8c08714b
  Alexander Karsakov authored 10 years ago
  
- Optimization of the OpenCL version of Filter2D · b0bf8478
  vbystricky authored 10 years ago
  
05 Sep, 2014 2 commits
- Refactoring and optimization · 39b27a19
  Alexander Karsakov authored 10 years ago
  
- Optimization for getLines · d59a6fa5
  Alexander Karsakov authored 10 years ago
  
04 Sep, 2014 1 commit
- Refactoring, minor optimization · fee8f29f
  Alexander Karsakov authored 10 years ago
  