Commits · 57cbde339372e0f8dad0758cb3bdf61008cded6b · submodule / opencv

20 Dec, 2017 1 commit
- imgproc(ocl): fix RGB2RGBA kernel out of range access · 813ff379
  Alexander Alekhin authored 7 years ago
  
21 Sep, 2017 1 commit

Bit-exact version of Luv2RGB_b (#9470) · cc547e82

Rostislav Vasilikhin authored 7 years ago

* lab_tetra squashed

* initial version is almost written

* unfinished work

* compilation fixed, to be debugged

* Lab test removed

* more fixes

* Luv2RGBinteger: channels order fixed

* Lab structs removed

* good trilinear interpolation added

* several fixes

* removed Luv2RGB interpolations, XYZ tables; 8-cell LUT added

* no_interpolate made 8-cell

* interpolations rewritten to 8-cell, minor fixes

* packed interpolation added for RGB2Luv

* tetra implemented

* removing unnecessary code

* LUT building merged

* changes ported to color.cpp

* minor fixes; try to suppress warnings

* fixed v range of Luv

* fixed incorrect src channel number

* minor fixes

* preliminary version of Luv2RGBinteger is done

* Luv2RGB_b is in progress

* XYZ color constants converted to softfloat

* Luv test: precision fixed

* Luv bit-exactness test added

* warnings fixed

* compilation fixed, error message fixed

* Luv check is limited to [0-2,0-2,0-2] by XYZ

* L->Y generation moved to LUT

* LUTs added for up and vp of Luv2RGB_b

* still works

* fixed-point is done, works at maxerr 2

* vectorized code is done, 2x slower than original

* perf improved by 10%

* extra comments removed

* code moved to color.cpp

* test_lab.cpp updated

* minor refactoring

* test added for Luv2RGB

* OCL Luv2RGB_b: XYZ are limited to [0, 2]; docs updated

* Luv2RGB_b rewritten to universal intrinsics

* test_lab.cpp moved to luv_tetra branch


07 Sep, 2017 1 commit
- imgproc(ocl): don't use doubles to process float data · 89bb028b
  Alexander Alekhin authored 7 years ago
  
29 Aug, 2017 1 commit
- imgproc(ocl): eliminate div by zero in Canny · e3b12bdb
  Alexander Alekhin authored 7 years ago
  
16 Jul, 2017 1 commit

initial version of Lab2RGB_f tetrahedral interpolation written · 4b75be80

Rostislav Vasilikhin authored 8 years ago

RGB2Lab_f added, bugs fixed, moved to float

several bugs fixed

LUT fixed, no switch in tetraInterpolate()

temporary code; to be removed and rewritten

before refactoring

extra interpolations removed, some things to do left

added Lab2RGB_b +XYZ version, etc.

basic version is done, to be sped up

tetra refactored

interpolations: LUT for weights, refactor., etc.

address arithm optimized

initial version of vectorized code added (not compiling now)

compilation fixed, now segfaults

a lot of fixes, vectorization temp. disabled

fixed trilinear shift size, max error dropped from 19 to 10

fixed several bugs (255 vs 256, signed vs unsigned, bIdx)

minor changes

packed: address arithmetics fixed

shorter code

experiments with pure integer calculations

Lab2RGB max error decreased to 2; need to clean the code

ready for vectorization; need cleaning

vectorized, to be debugged

precision fixed, max error is 2

Lab->XYZ shortened

minor fixes

Lab2RGB_f version fixed, to be completely rewritten using _b code

RGB2Lab_f vectorized

minors

moved to separate file

refactored Lab2RGB to float and int versions

minor fix

Lab2RGB_f vectorized

minor refactoring

Lab2RGBint refactored: process methods, vectorize by 4 pix

Lab2RGB_f int version is done

cleanup extra code

code copied to color.cpp

fixed blue idx bug

optimizations enabled when testing; mulFracConst introduced

divConst -> mulFracConst

calc min time in perf instead of avg

minors

process() slightly sped up

Lab2RGB_f: disabled int version

reinterpret added, minor fixes in names

some warnings fixed

changes transferred to color.cpp

RGB2Lab_f code (and trilinear interpolation code) moved to rgb2lab_faster

whitespace

shift negative fixed

more warnings fixed

"constant condition" warnings fixed, little speed up

minor changes

test_photo decolor fixed

changes copied to test_lab.cpp

idx bounds checking in LUT init

several fixes

WIP: softfloat almost integrated

test_lab partially rewritten to SoftFloat

color.cpp rewritten to SoftFloat

test_lab.cpp: accuracy code added

several fixes

RGB2Lab_b testing fixed

splineBuild() rewritten to SoftFloat

accuracy control improved

rounding fixed

Luv <=> RGB: rewritten to SoftFloat

OCL cvtColor Lab and Lut rewritten to SoftFloat

minor fixes

refactored to new SoftFloat interface

round() -> cvRound, etc.

fixed OCL tests

softfloat.cpp: internal functions made static, unused ones removed

meaningful constants

extra lines removed

unused function removed

unfinished work

it works, need to fix TODOs

refactoring; more calls rewritten

mulFracConst removed

constants made bit exact; minors

changes moved to color.cpp

fixed 1 bug and 4 warnings

OCL: fixed constants

pow(x, _1_3f) replaced by cubeRoot(x)

fixed compilation on MSVC32

magic constants explained

file with internal accuracy&speed tests moved to lab_tetra branch


05 Jul, 2017 2 commits
- magic constants explained · aa621d6f
  Rostislav Vasilikhin authored 7 years ago
  
- OCL code fixed, fix for NEON added · 704c6882
  Rostislav Vasilikhin authored 7 years ago
  
05 Apr, 2017 1 commit
- Fixed cvtColor OCL compilation issue (BGRA2mBGRA) · ce50df56
  Maksim Shabunin authored 7 years ago
  
03 Mar, 2017 1 commit
- ocl: don't use vload4 for 3 channel images · ba8a6e35
  Alexander Alekhin authored 8 years ago
  
13 Jan, 2017 1 commit
- imgproc/CLAHE/ocl: Removed unnecessary __local variable · 8c66531c
  mshabunin authored 8 years ago
  
06 Dec, 2016 1 commit

5x5 gaussian blur optimization · 396921dd

Li Peng authored 8 years ago

Add a new 5x5 gaussian blur kernel for the CV_8UC1 format;
it is 50% ~ 70% faster than the current ocl kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>


02 Dec, 2016 1 commit

Image pyramids upsampling optimization · b69cdb24

Li Peng authored 8 years ago

Add a new ocl kernel for image pyramid upsampling.
It is 35% faster than the current OCL kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>


30 Nov, 2016 1 commit

more optimization for warpAffine and warpPerspective · 2ca5a7e8

Li Peng authored 8 years ago

Add new OpenCL kernels for bicubic interpolation; they are 20% faster
than the current warp image kernel with bicubic interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>


29 Nov, 2016 1 commit

optimization for warpAffine and warpPerspective · b72d1967

Li Peng authored 8 years ago

Add new ocl kernels for warpAffine and warpPerspective.
The average performance improvement is about 30%. The new
ocl kernels require the CV_8UC1 format and support nearest
neighbor and bilinear interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>


23 Nov, 2016 1 commit
- fixed wrong equivalence in YUV conversion (#7481) · 7db43f9f
  Rostislav Vasilikhin authored 8 years ago
```
* fixed wrong equivalence in YUV conversion

* fixed channel order from YVU to YUV
```
17 Nov, 2016 1 commit

laplacian ocl kernel optimization · 6cb73356

Li Peng authored 8 years ago

This ocl kernel is 46%~171% faster than current laplacian 3x3
ocl kernel in the perf test, with image format "CV_8UC1".
Signed-off-by: Li Peng <peng.li@intel.com>


14 Nov, 2016 1 commit

sobel and scharr ocl kernel optimization · 8d4a7d3d

Li Peng authored 8 years ago

It improves performance by 108%~230% in the perf test
with image format "CV_8UC1" and kernel size 3.
Signed-off-by: Li Peng <peng.li@intel.com>


08 Nov, 2016 1 commit

gaussian blur ocl kernel optimization · 8f63f51e

Li Peng authored 8 years ago

This ocl kernel is for the 3x3 kernel size and the CV_8UC1 format.
It is 115% ~ 300% faster than the current ocl path in the perf test:

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*
Signed-off-by: Li Peng <peng.li@intel.com>


07 Nov, 2016 1 commit
- Fixed several OpenCL compiler warnings · 3e28d517
  mshabunin authored 8 years ago
  
04 Nov, 2016 2 commits

morph ocl kernel for erode and dilate filter · 35198b84

Li Peng authored 8 years ago

This kernel is for the CV_8UC1 format and the 3x3 kernel size.
It is about 33% ~ 55% faster than the current ocl kernel in the perf tests below:

python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*

Also add accuracy test cases for this kernel, the test command is

./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*
Signed-off-by: Li Peng <peng.li@intel.com>


Fix the OpenCL portion to match the c++ code. · 17df65e6
Tetragramm authored 8 years ago
```
Fix an undiscovered bug in the c++ code.
```

26 Oct, 2016 1 commit

ocl kernel performance optimization for box filter · 3607da9f

Li Peng authored 8 years ago

The optimization is for the CV_8UC1 format and the 3x3 box filter;
it is 15%~87% faster than the current ocl kernel in the perf test below:

./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*

Also add test cases for this ocl kernel.
Signed-off-by: Li Peng <peng.li@intel.com>


17 Oct, 2016 1 commit
- Fix the problem: filterSmall.cl report error with double · ef47bcc8
  LukeZhu authored 8 years ago
  
09 Aug, 2016 1 commit

ocl: fix Canny for Intel devices · b8e08d5d

Alexander Alekhin authored 8 years ago

There is an issue with how the abs(short) function is processed for
negative arguments.

Affected OpenCL devices:
- iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0 )
- CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))


24 Apr, 2016 1 commit

Add OpenCL support to linearPolar & logPolar · db9f6117

ohnozzy authored 8 years ago

Add OpenCL support to linearPolar & logPolar.
The OpenCL code uses float instead of double, so it does not require
the cl_khr_fp64 extension, at the cost of a slight loss of precision.

Add explicit conversion

Add explicit conversion from double to float to eliminate a warning during
compilation.
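
As a rough illustration of the float-versus-double trade-off mentioned in this commit, the following Python sketch evaluates a log-polar coordinate once in double precision and once with every intermediate rounded to single precision. The helper names (`f32`, `log_polar_double`, `log_polar_float`), the sample point, and the scale `m` are assumptions made for illustration; this is not the kernel code from the commit.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def log_polar_double(x, y, cx, cy, m):
    """Log-polar coordinates (rho, phi) computed in double precision."""
    dx, dy = x - cx, y - cy
    return m * math.log(math.hypot(dx, dy)), math.atan2(dy, dx)


def log_polar_float(x, y, cx, cy, m):
    """Same formula, rounding every intermediate result to single precision."""
    dx, dy = f32(x - cx), f32(y - cy)
    mag = f32(math.sqrt(f32(f32(dx * dx) + f32(dy * dy))))
    rho = f32(f32(m) * f32(math.log(mag)))
    phi = f32(math.atan2(dy, dx))
    return rho, phi


if __name__ == "__main__":
    # Hypothetical sample point, center, and scale, chosen only for the demo.
    rho_d, phi_d = log_polar_double(300.5, 120.25, 256.0, 256.0, 40.0)
    rho_f, phi_f = log_polar_float(300.5, 120.25, 256.0, 256.0, 40.0)
    print("rho difference:", abs(rho_d - rho_f))
    print("phi difference:", abs(phi_d - phi_f))
```

For coordinates of this size the differences are far below one pixel, which is consistent with the "slight" precision loss the message describes.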


15 Mar, 2016 1 commit

fix potential race condition in canny.cl. · 0b08d255

Zhigang Gong authored 9 years ago

See the below code snippet:

```
while(l_counter != 0)
{
    int mod = l_counter % LOCAL_TOTAL;
    int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);

    for (int i = 0; i < pix_per_thr; ++i)
    {
        int index = atomic_dec(&l_counter) - 1;
        ....
    }
    ....
    barrier(CLK_LOCAL_MEM_FENCE);
}
```

If we don't put a barrier before the for loop, some work items may enter
the loop before the others have evaluated the while condition. Those work
items decrement l_counter in the for loop, possibly down to zero, so the
other work items may never enter the while loop. If this happens, it breaks
the barrier rule, which requires all work items to reach the same barrier,
and it may hang the GPU, depending on the OpenCL platform implementation.

This issue is raised at:
https://github.com/Itseez/opencv/issues/5175
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
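
The failure mode described in this commit message can be replayed with a small sequential model. The Python sketch below is an assumption-laden illustration, not the kernel itself: the function name `work_items_reaching_barrier` and the worst-case schedule (each work item runs to completion before the next one tests the while condition) are hypothetical choices that make the divergence visible.

```python
def work_items_reaching_barrier(l_counter, local_total, barrier_before_loop):
    """Count the work items that reach the barrier in one while-iteration.

    l_counter   -- models the shared counter of pending stack entries
    local_total -- models LOCAL_TOTAL, the number of work items in the group
    """
    if barrier_before_loop:
        # With a barrier before the for loop, every work item tests the same
        # counter value before any of them starts decrementing it.
        return local_total if l_counter != 0 else 0

    reached = 0
    for lid in range(local_total):
        # Assumed worst case: work item `lid` only tests the condition after
        # all earlier work items have finished their for loops.
        if l_counter == 0:
            continue  # never enters the while loop, never reaches the barrier
        mod = l_counter % local_total
        pix_per_thr = l_counter // local_total + (1 if lid < mod else 0)
        l_counter -= min(pix_per_thr, l_counter)
        reached += 1
    return reached


if __name__ == "__main__":
    print(work_items_reaching_barrier(1, 4, barrier_before_loop=False))
    print(work_items_reaching_barrier(1, 4, barrier_before_loop=True))
```

With one pending entry and four work items, the unsynchronized schedule lets a single work item reach the barrier, while the synchronized one lets all four reach it.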


26 May, 2015 2 commits

Fixed the race condition between inc and dec on the l_counter. · 0f7de40e
Zhigang Gong authored 9 years ago
```
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
```

Avoid negative index for a local buffer in Canny.cl. · 3c852009

Zhigang Gong authored 9 years ago

int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
pix_per_thr * LOCAL_TOTAL may be larger than l_counter, so the index
into l_stack may become negative, which may cause serious problems.
Skip the loop iteration when the index is negative, and add the
decrement back to l_counter to keep it balanced and avoid a
negative counter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
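
A minimal Python sketch of the guard described in this commit message, assuming the work items are replayed one after another. The function name `drain_guarded` and the `shares` list (each entry standing in for one work item's pix_per_thr) are hypothetical; the actual kernel uses OpenCL atomics on local memory.

```python
def drain_guarded(counter, shares):
    """Replay work items that each try to pop `share` entries from a stack.

    counter -- models l_counter, the number of entries on the stack
    shares  -- models pix_per_thr for each work item
    Returns (indices actually used, final counter value).
    """
    used = []
    for share in shares:
        for _ in range(share):
            old = counter      # atomic_dec returns the value before the decrement
            counter -= 1
            index = old - 1
            if index < 0:
                counter += 1   # add the decrement back to keep the counter balanced
                continue       # skip: there is no stack entry at a negative index
            used.append(index)
    return used, counter


if __name__ == "__main__":
    # Three work items each assume a share of 2, but only 3 entries exist.
    print(drain_guarded(3, [2, 2, 2]))
```

In this replay the shares add up to more than the counter, yet no negative index is used and the counter ends at zero rather than below it.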


22 Apr, 2015 1 commit
- fix gftt opencv kernel when using mask · 1ea41e72
  Pavel Rojtberg authored 9 years ago
  
26 Nov, 2014 1 commit

Optimize pyrUp_unrolled() by mad function. · 6e705055

Yan Wang authored 10 years ago

It could improve performance when image size is large.
E.g. OCL_PyrUpFixture_PyrUp.PyrUp/18


07 Nov, 2014 1 commit
- Correctly unrolled some cycles · 7c870014
  Alexander Karsakov authored 10 years ago
  
06 Nov, 2014 1 commit
- Minor optimization for ocl_canny · 0ec0aeb7
  Alexander Karsakov authored 10 years ago
  
05 Nov, 2014 1 commit
- Fix OpenCL version of HoughLinesP function · 957e5ef8
  vbystricky authored 10 years ago
  
28 Oct, 2014 1 commit
- Added optimized loading to YUV2RGB_422 kernel · 643c906f
  Alexander Karsakov authored 10 years ago
  
27 Oct, 2014 1 commit
- Added loading 4 pixels in line instead of 2 to RGB[A] -> YUV(420) kernel · 1466621f
  Alexander Karsakov authored 10 years ago
  
21 Oct, 2014 5 commits
- Used direct float calculations · 60367907
  Alexander Karsakov authored 10 years ago
  
- Added OCL code for YUV422 -> RGB[A]|BGR[A] color conversion · 5aa9ac9a
  Alexander Karsakov authored 10 years ago
  
- Added OCL code for RGB[A]|BGR[A] -> YUV_[YV12|IYUV] color conversion · c8707b89
  Alexander Karsakov authored 10 years ago
  
- Added OCL code for YUV2BGR_YV12 and YUV2BGR_IYUV color conversions · 1cc17a71
  Alexander Karsakov authored 10 years ago
  
- Added support for YUV2RGB[A]_NV21 and YUV2BGR[A]_NV21 conversion · 85b60ee3
  Alexander Karsakov authored 10 years ago
  