- 20 Dec, 2017 1 commit
-
-
Alexander Alekhin authored
-
- 21 Sep, 2017 1 commit
-
-
Rostislav Vasilikhin authored
* lab_tetra squashed
* initial version is almost written
* unfinished work
* compilation fixed, to be debugged
* Lab test removed
* more fixes
* Luv2RGBinteger: channels order fixed
* Lab structs removed
* good trilinear interpolation added
* several fixes
* removed Luv2RGB interpolations, XYZ tables; 8-cell LUT added
* no_interpolate made 8-cell
* interpolations rewritten to 8-cell, minor fixes
* packed interpolation added for RGB2Luv
* tetra implemented
* removing unnecessary code
* LUT building merged
* changes ported to color.cpp
* minor fixes; try to suppress warnings
* fixed v range of Luv
* fixed incorrect src channel number
* minor fixes
* preliminary version of Luv2RGBinteger is done
* Luv2RGB_b is in progress
* XYZ color constants converted to softfloat
* Luv test: precision fixed
* Luv bit-exactness test added
* warnings fixed
* compilation fixed, error message fixed
* Luv check is limited to [0-2,0-2,0-2] by XYZ
* L->Y generation moved to LUT
* LUTs added for up and vp of Luv2RGB_b
* still works
* fixed-point is done, works at maxerr 2
* vectorized code is done, 2x slower than original
* perf improved by 10%
* extra comments removed
* code moved to color.cpp
* test_lab.cpp updated
* minor refactoring
* test added for Luv2RGB
* OCL Luv2RGB_b: XYZ are limited to [0, 2]; docs updated
* Luv2RGB_b rewritten to universal intrinsics
* test_lab.cpp moved to luv_tetra branch
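The "trilinear interpolation" and "8-cell LUT" items above refer to blending the eight LUT entries that surround a sample. The following is a minimal floating-point sketch of that idea, not the code from this commit: the function name `trilinear`, the corner ordering, and the use of `float` rather than fixed-point are all assumptions made for illustration.

```cpp
#include <array>

// Hypothetical helper (not from color.cpp): trilinear interpolation inside one LUT cell.
// c[i] holds the value at corner (x, y, z) with i = x + 2*y + 4*z, where x, y, z are 0 or 1.
// fx, fy, fz are the fractional positions in [0, 1] along each axis.
float trilinear(const std::array<float, 8>& c, float fx, float fy, float fz)
{
    // Interpolate along x on the four edges parallel to the x axis.
    float c00 = c[0] + (c[1] - c[0]) * fx;
    float c10 = c[2] + (c[3] - c[2]) * fx;
    float c01 = c[4] + (c[5] - c[4]) * fx;
    float c11 = c[6] + (c[7] - c[6]) * fx;
    // Interpolate the two pairs along y.
    float c0 = c00 + (c10 - c00) * fy;
    float c1 = c01 + (c11 - c01) * fy;
    // Final interpolation along z.
    return c0 + (c1 - c0) * fz;
}
```

At a cell corner the function returns that corner's value unchanged, which is a convenient sanity check when comparing against a LUT-based implementation.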
-
- 07 Sep, 2017 1 commit
-
-
Alexander Alekhin authored
-
- 29 Aug, 2017 1 commit
-
-
Alexander Alekhin authored
-
- 16 Jul, 2017 1 commit
-
-
Rostislav Vasilikhin authored
* RGB2Lab_f added, bugs fixed, moved to float
* several bugs fixed
* LUT fixed, no switch in tetraInterpolate()
* temporary code; to be removed and rewritten before refactoring
* extra interpolations removed, some things to do left
* added Lab2RGB_b +XYZ version, etc.
* basic version is done, to be sped up
* tetra refactored
* interpolations: LUT for weights, refactor., etc.
* address arithm optimized
* initial version of vectorized code added (not compiling now)
* compilation fixed, now segfaults
* a lot of fixes, vectorization temp. disabled
* fixed trilinear shift size, max error dropped from 19 to 10
* fixed several bugs (255 vs 256, signed vs unsigned, bIdx)
* minor changes
* packed: address arithmetics fixed
* shorter code
* experiments with pure integer calculations
* Lab2RGB max error decreased to 2; need to clean the code
* ready for vectorization; need cleaning
* vectorized, to be debugged
* precision fixed, max error is 2
* Lab->XYZ shortened
* minor fixes
* Lab2RGB_f version fixed, to be completely rewritten using _b code
* RGB2Lab_f vectorized
* minors
* moved to separate file
* refactored Lab2RGB to float and int versions
* minor fix
* Lab2RGB_f vectorized
* minor refactoring
* Lab2RGBint refactored: process methods, vectorize by 4 pix
* Lab2RGB_f int version is done
* cleanup extra code
* code copied to color.cpp
* fixed blue idx bug
* optimizations enabled when testing; mulFracConst introduced
* divConst -> mulFracConst
* calc min time in perf instead of avg
* minors
* process() slightly sped up
* Lab2RGB_f: disabled int version
* reinterpret added, minor fixes in names
* some warnings fixed
* changes transferred to color.cpp
* RGB2Lab_f code (and trilinear interpolation code) moved to rgb2lab_faster
* whitespace
* shift negative fixed
* more warnings fixed
* "constant condition" warnings fixed, little speed up
* minor changes
* test_photo decolor fixed
* changes copied to test_lab.cpp
* idx bounds checking in LUT init
* several fixes
* WIP: softfloat almost integrated
* test_lab partially rewritten to SoftFloat
* color.cpp rewritten to SoftFloat
* test_lab.cpp: accuracy code added
* several fixes
* RGB2Lab_b testing fixed
* splineBuild() rewritten to SoftFloat
* accuracy control improved
* rounding fixed
* Luv <=> RGB: rewritten to SoftFloat
* OCL cvtColor Lab and Lut rewritten to SoftFloat
* minor fixes
* refactored to new SoftFloat interface
* round() -> cvRound, etc.
* fixed OCL tests
* softfloat.cpp: internal functions made static, unused ones removed
* meaningful constants
* extra lines removed
* unused function removed
* unfinished work
* it works, need to fix TODOs
* refactoring; more calls rewritten
* mulFracConst removed
* constants made bit exact; minors
* changes moved to color.cpp
* fixed 1 bug and 4 warnings
* OCL: fixed constants
* pow(x, _1_3f) replaced by cubeRoot(x)
* fixed compilation on MSVC32
* magic constants explained
* file with internal accuracy&speed tests moved to lab_tetra branch
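The item "pow(x, _1_3f) replaced by cubeRoot(x)" concerns the cube root used in the XYZ to Lab step. As a rough illustration only, the standard CIE L*a*b* companding function can be written with a dedicated cube-root call as below. The names `labF` and `labLightness` are hypothetical, `std::cbrt` stands in for the library's own cube-root routine, and the constants are the usual CIE values rather than the bit-exact ones this commit introduces.

```cpp
#include <cmath>

// Hypothetical sketch of the CIE L*a*b* function f(t), using a cube-root call
// instead of pow(t, 1/3). std::cbrt is a stand-in, not the routine used in the commit.
float labF(float t)
{
    const float threshold = 216.0f / 24389.0f;  // (6/29)^3, about 0.008856
    const float slope     = 841.0f / 108.0f;    // (29/6)^2 / 3, about 7.787
    const float offset    = 16.0f / 116.0f;
    return t > threshold ? std::cbrt(t) : slope * t + offset;
}

// Lightness L* in [0, 100] from relative luminance Y in [0, 1].
float labLightness(float y)
{
    return 116.0f * labF(y) - 16.0f;
}
```

A dedicated cube-root function is typically cheaper than a general `pow` call and avoids representing 1/3 as an inexact floating-point exponent.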
-
- 05 Jul, 2017 2 commits
-
-
Rostislav Vasilikhin authored
-
Rostislav Vasilikhin authored
-
- 05 Apr, 2017 1 commit
-
-
Maksim Shabunin authored
-
- 03 Mar, 2017 1 commit
-
-
Alexander Alekhin authored
-
- 13 Jan, 2017 1 commit
-
-
mshabunin authored
-
- 06 Dec, 2016 1 commit
-
-
Li Peng authored
Add a new 5x5 Gaussian blur kernel for the CV_8UC1 format. It is 50% ~ 70% faster than the current OCL kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 02 Dec, 2016 1 commit
-
-
Li Peng authored
Add a new OCL kernel for image pyramid upsampling. It is 35% faster than the current OCL kernel in the perf test.
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 30 Nov, 2016 1 commit
-
-
Li Peng authored
Add new OpenCL kernels for bicubic interpolation. They are 20% faster than the current warp image kernel with bicubic interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 29 Nov, 2016 1 commit
-
-
Li Peng authored
Add new OCL kernels for warpAffine and warpPerspective. The average performance improvement is about 30%. The new OCL kernels require the CV_8UC1 format and support nearest-neighbor and bilinear interpolation.
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 23 Nov, 2016 1 commit
-
-
Rostislav Vasilikhin authored
* fixed wrong equivalence in YUV conversion
* fixed channel order from YVU to YUV
-
- 17 Nov, 2016 1 commit
-
-
Li Peng authored
This OCL kernel is 46% ~ 171% faster than the current Laplacian 3x3 OCL kernel in the perf test with the CV_8UC1 image format.
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 14 Nov, 2016 1 commit
-
-
Li Peng authored
It improves performance by 108% ~ 230% in the perf test with the CV_8UC1 image format and kernel size 3.
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 08 Nov, 2016 1 commit
-
-
Li Peng authored
This OCL kernel is for the 3x3 kernel size and the CV_8UC1 format. It is 115% ~ 300% faster than the current OCL path in the perf test:
    python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_GaussianBlurFixture*
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 07 Nov, 2016 1 commit
-
-
mshabunin authored
-
- 04 Nov, 2016 2 commits
-
-
Li Peng authored
This kernel is for the CV_8UC1 format and the 3x3 kernel size. It is about 33% ~ 55% faster than the current OCL kernel in the perf tests below:
    python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_ErodeFixture*
    python ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_DilateFixture*
Also add accuracy test cases for this kernel. The test command is:
    ./bin/opencv_test_imgproc --gtest_filter=OCL_Filter/MorphFilter3x3*
Signed-off-by: Li Peng <peng.li@intel.com>
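An accuracy test for a 3x3 morphology kernel needs a reference result to compare against. The following is a minimal CPU sketch of 3x3 erosion on a single-channel 8-bit image. It is not the test code added by this commit: the function name `erode3x3`, the row-major `std::vector` layout, and the choice to ignore out-of-image neighbors at the border are assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical reference: 3x3 erosion (minimum filter) on a row-major 8-bit image.
// Neighbors that fall outside the image are skipped, so the border does not lower the result.
std::vector<std::uint8_t> erode3x3(const std::vector<std::uint8_t>& src, int rows, int cols)
{
    std::vector<std::uint8_t> dst(src.size());
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
        {
            std::uint8_t m = 255;
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc)
                {
                    int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols)
                        continue;  // outside the image: ignore
                    m = std::min(m, src[rr * cols + cc]);
                }
            dst[r * cols + c] = m;
        }
    return dst;
}
```

Dilation is the same loop with `std::max` and an initial value of 0.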
-
Tetragramm authored
Fix a previously undiscovered bug in the C++ code.
-
- 26 Oct, 2016 1 commit
-
-
Li Peng authored
The optimization is for the CV_8UC1 format and the 3x3 box filter. It is 15% ~ 87% faster than the current OCL kernel in the perf test below:
    ./modules/ts/misc/run.py -t imgproc --gtest_filter=OCL_BlurFixture*
Also add test cases for this OCL kernel.
Signed-off-by: Li Peng <peng.li@intel.com>
-
- 17 Oct, 2016 1 commit
-
-
LukeZhu authored
-
- 09 Aug, 2016 1 commit
-
-
Alexander Alekhin authored
There is an issue with how the abs(short) function processes a negative argument.
Affected OpenCL devices:
  - iGPU: Intel(R) HD Graphics 520 (OpenCL 2.0 )
  - CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (OpenCL 2.0 (Build 10094))
-
- 24 Apr, 2016 1 commit
-
-
ohnozzy authored
Add OpenCL support to linearPolar & logPolar. The OpenCL code uses float instead of double so that it does not require the cl_khr_fp64 extension, at the cost of a slight loss of precision.
Add explicit conversion from double to float to eliminate a warning during compilation.
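The trade-off described above can be sketched on the host side. The snippet below computes a forward log-polar coordinate entirely in single precision and converts a `double` scale parameter with an explicit cast, mirroring the two points in the message. It is an illustration under assumptions, not the OpenCL kernel from this commit: `PolarF`, `toLogPolar`, and the parameter names are hypothetical.

```cpp
#include <cmath>

// Hypothetical sketch: forward log-polar mapping in single precision.
struct PolarF { float rho, phi; };

PolarF toLogPolar(float x, float y, float cx, float cy, double m)
{
    // Explicit double -> float conversion avoids an implicit-narrowing warning.
    const float mf = static_cast<float>(m);
    float dx = x - cx;
    float dy = y - cy;
    float magnitude = std::sqrt(dx * dx + dy * dy);
    PolarF p;
    p.rho = mf * std::log(magnitude);  // -infinity at the exact center
    p.phi = std::atan2(dy, dx);        // angle in radians, in [-pi, pi]
    return p;
}
```

Because every intermediate value is a `float`, nothing in this computation needs double-precision hardware support, which is the property the message relies on to avoid cl_khr_fp64.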
-
- 15 Mar, 2016 1 commit
-
-
Zhigang Gong authored
See the code snippet below:

    while(l_counter != 0)
    {
        int mod = l_counter % LOCAL_TOTAL;
        int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);
        for (int i = 0; i < pix_per_thr; ++i)
        {
            int index = atomic_dec(&l_counter) - 1;
            ....
        }
        ....
        barrier(CLK_LOCAL_MEM_FENCE);
    }

If we don't put a barrier before the for loop, there is a possibility that some work-items enter this loop while the others have not. l_counter is then reduced in the for loop and may reach zero, so the other work-items cannot enter the while loop. If this happens, it breaks the barrier rule, which requires all work-items to reach the same barrier, and it may hang the GPU depending on the implementation of the OpenCL platform.

This issue was raised at: https://github.com/Itseez/opencv/issues/5175
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
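The work split in the snippet only adds up when every work-item computes its share from the same value of l_counter, which is what a barrier before the for loop guarantees. A small sequential sketch of that invariant follows. The function names are hypothetical, and plain C++ is used as a stand-in for the OpenCL kernel.

```cpp
// Hypothetical host-side model of the per-work-item share from the kernel snippet.
int pixelsPerWorkItem(int counter, int localTotal, int lid)
{
    int mod = counter % localTotal;
    return counter / localTotal + ((lid < mod) ? 1 : 0);
}

// Sum of the shares when all work-items read the same counter value.
int totalPlanned(int counter, int localTotal)
{
    int total = 0;
    for (int lid = 0; lid < localTotal; ++lid)
        total += pixelsPerWorkItem(counter, localTotal, lid);
    return total;
}
```

With a consistent snapshot, `totalPlanned(counter, localTotal)` equals `counter`, so the loop drains the stack exactly. If one work-item reads the counter after another has already decremented it, the shares no longer correspond to a single consistent value, and some work-items may see zero and skip the while loop entirely.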
-
- 26 May, 2015 2 commits
-
-
Zhigang Gong authored
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
-
Zhigang Gong authored
    int pix_per_thr = l_counter / LOCAL_TOTAL + ((lid < mod) ? 1 : 0);

pix_per_thr * LOCAL_TOTAL may be larger than l_counter, so the index into l_stack may become negative, which may cause serious problems. Skip the loop when the index is negative, and add the decrement back to l_counter to keep it balanced and avoid a potentially negative counter.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
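The guard described above can be modeled on the host with `std::atomic`. The sketch below is an illustration under assumptions rather than the kernel code: `claimIndex` is a hypothetical name, and `fetch_sub` is used because, like OpenCL's `atomic_dec`, it returns the value held before the decrement.

```cpp
#include <atomic>

// Hypothetical model of a guarded stack pop.
// Returns the claimed index, or -1 if the stack was already empty.
int claimIndex(std::atomic<int>& counter)
{
    // fetch_sub returns the old value, so old - 1 is the index just claimed.
    int index = counter.fetch_sub(1) - 1;
    if (index < 0)
    {
        // Nothing was left to claim: undo the decrement to keep the counter balanced.
        counter.fetch_add(1);
        return -1;
    }
    return index;
}
```

A caller would leave its loop as soon as `claimIndex` returns -1, so the counter never stays below zero and no negative index is used to address the stack.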
-
- 22 Apr, 2015 1 commit
-
-
Pavel Rojtberg authored
-
- 26 Nov, 2014 1 commit
-
-
Yan Wang authored
It can improve performance when the image size is large, e.g. OCL_PyrUpFixture_PyrUp.PyrUp/18
-
- 07 Nov, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 06 Nov, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 05 Nov, 2014 1 commit
-
-
vbystricky authored
-
- 28 Oct, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 27 Oct, 2014 1 commit
-
-
Alexander Karsakov authored
-
- 21 Oct, 2014 5 commits
-
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-
Alexander Karsakov authored
-