Commits · dc3406eed9f04f773c8e85af4f814086a3b2200c · submodule / opencv

17 Sep, 2018 1 commit
- Support GpuMat in copyTo() functions · ecc9bd09
  Hamdi Sahloul authored 6 years ago
  
13 Sep, 2018 1 commit
- Add semicolons after `CV_INSTRUMENT` macros · 5d54def2
  Hamdi Sahloul authored 6 years ago
  
07 Sep, 2018 1 commit
- Utilize CV_UNUSED macro · a39e0daa
  Hamdi Sahloul authored 6 years ago
  
30 Jul, 2018 1 commit
- core(ocl): do not split refcount operations / compare · e90e398e
  Alexander Alekhin authored 6 years ago
```
- check result from CV_XADD() directly
- decrease urefcount after unmap() call only
```
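The first bullet can be illustrated with a standard-library atomic. This is a minimal sketch, not OpenCV's code: `std::atomic<int>` stands in for the `CV_XADD()` macro, and `releaseRef` is a hypothetical helper name. The decrement and the comparison both use the single value returned by the atomic operation, instead of decrementing and then re-reading the counter in a separate step.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: drops one reference and reports whether the caller
// held the last one. fetch_sub returns the value *before* the decrement,
// so testing that value is race-free; re-reading the counter afterwards
// would not be.
bool releaseRef(std::atomic<int>& refcount)
{
    const int previous = refcount.fetch_sub(1);
    assert(previous > 0); // releasing an object with no references is a bug
    return previous == 1;
}
```

Only the thread that sees `true` should go on to free the underlying resource.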
04 Jul, 2018 1 commit
- opencv: Use cv::AutoBuffer<>::data() · b09a4a98
  Alexander Alekhin authored 6 years ago
  
14 May, 2018 1 commit

handle huge matrices correctly (#11505) · e0dbe5cf

* make sure that the matrix with more than INT_MAX elements is marked as non-continuous, and thus all the pixel-wise functions process it correctly (i.e. row-by-row, not as a single row, where integer overflow may occur when computing the total number of elements)
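The overflow concern can be sketched as follows. This is an illustration under assumptions, not the actual patch: `canTreatAsContinuous` is a hypothetical helper, and the element count is computed in 64 bits so the comparison with `INT_MAX` cannot itself overflow.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: a 2-D matrix may be processed as one long row only if
// its rows are tightly packed AND the element count fits in an int.
// `step` is the row stride in bytes, `elemSize` the size of one element.
bool canTreatAsContinuous(int rows, int cols, std::uint64_t step, std::uint64_t elemSize)
{
    assert(rows >= 0 && cols >= 0 && elemSize > 0);
    // Padding between rows rules out the single-row view.
    if (rows > 1 && step != static_cast<std::uint64_t>(cols) * elemSize)
        return false;
    // 64-bit product: rows * cols evaluated in int could overflow.
    const std::uint64_t total =
        static_cast<std::uint64_t>(rows) * static_cast<std::uint64_t>(cols);
    return total <= static_cast<std::uint64_t>(INT_MAX);
}
```

A 50000 x 50000 single-byte matrix has 2.5 billion elements, so this check reports it as non-continuous and pixel-wise code would fall back to row-by-row processing.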


24 Apr, 2018 1 commit
- build: unreachable code after CV_Error() (part 2) · 6b581c4e
  Alexander Alekhin authored 6 years ago
  
20 Apr, 2018 1 commit
- ocl: CL_MEM_USE_HOST_PTR workaround test · d76b41b5
  Alexander Alekhin authored 6 years ago
  
09 Feb, 2018 1 commit
- ocl: allow recursive UMatData lock() calls with the same objects · 42e1fe30
  Alexander Alekhin authored 7 years ago
```
OpenCLAllocator::copy() may call upload()/download() methods
```
16 Jan, 2018 1 commit

core(ocl): fix deadlock in UMatDataAutoLock · cec70052

Alexander Alekhin authored 7 years ago

UMatData locks are not mapped on real locks (they are mapped to some "pre-initialized" pool).

Concurrent execution of these statements may lead to deadlock:
- a.copyTo(b) from thread 1
- c.copyTo(d) from thread 2
where:
- 'a' and 'd' are mapped to single lock "A".
- 'b' and 'c' are mapped to single lock "B".

Workaround is to process locks with strict order.
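A strict lock order can be sketched with a small mutex pool. Everything here is an assumption for illustration (the pool size of 31, the names `lockOrder` and `PairLock`); it is not the `UMatDataAutoLock` implementation. Both locks are always taken lowest index first, so two threads can never each hold the lock the other one needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>

const std::size_t kPoolSize = 31;   // assumed size of the pre-initialized pool
std::mutex g_lockPool[kPoolSize];   // objects are mapped onto these locks

// Returns the two pool indexes in the order they must be acquired.
std::pair<std::size_t, std::size_t> lockOrder(std::size_t i, std::size_t j)
{
    return std::make_pair(std::min(i, j), std::max(i, j));
}

// RAII guard that locks two pool entries in strict order. If both objects
// map to the same entry, it is locked only once.
class PairLock
{
public:
    PairLock(std::size_t i, std::size_t j) : order_(lockOrder(i, j))
    {
        assert(order_.second < kPoolSize);
        g_lockPool[order_.first].lock();
        if (order_.second != order_.first)
            g_lockPool[order_.second].lock();
    }
    ~PairLock()
    {
        if (order_.second != order_.first)
            g_lockPool[order_.second].unlock();
        g_lockPool[order_.first].unlock();
    }
    PairLock(const PairLock&) = delete;
    PairLock& operator=(const PairLock&) = delete;

private:
    std::pair<std::size_t, std::size_t> order_;
};
```

In the scenario above, thread 1 would request locks (A, B) and thread 2 would request (B, A); with this guard both end up acquiring the same lock first.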


11 Dec, 2017 1 commit
- Build for embedded systems · 7349b8f5
  Maksim Shabunin authored 7 years ago
  
28 Nov, 2017 1 commit

ocl: avoid unnecessary loading/initializing OpenCL subsystem · 0ed3209b

Alexander Alekhin authored 7 years ago

Applies when the application makes no OpenCL/UMat method calls.

The OpenCL subsystem is initialized only when:
- haveOpenCL() is called from the application
- useOpenCL() is called from the application
- the OpenCL allocator is accessed: a UMat is created (an empty UMat is ignored) or UMat <-> Mat conversions are called

Don't call OpenCL functions if OPENCV_OPENCL_RUNTIME=disabled
(independent from OpenCL linkage type)
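The on-demand behaviour can be sketched as a lazily initialized state object. This is a simplified, single-threaded illustration: `LazyOpenCLState` and `runtimeDisabled` are hypothetical names, and the environment value is passed in as a parameter (a real caller would read `OPENCV_OPENCL_RUNTIME` with `std::getenv`). Device probing is omitted; the sketch assumes OpenCL is usable whenever it is not disabled.

```cpp
#include <cassert>
#include <cstring>

// True when the runtime was switched off through the environment value.
bool runtimeDisabled(const char* envValue)
{
    return envValue != nullptr && std::strcmp(envValue, "disabled") == 0;
}

// Hypothetical state holder: nothing is initialized until the first query.
struct LazyOpenCLState
{
    int initCalls = 0;        // how many times initialization actually ran
    bool initialized = false;
    bool usable = false;

    bool haveOpenCL(const char* envValue)
    {
        if (!initialized)
        {
            initialized = true;
            ++initCalls;
            // Assumption: a real implementation would probe devices here.
            usable = !runtimeDisabled(envValue);
        }
        return usable;
    }
};
```

An application that never calls `haveOpenCL()` never pays the initialization cost, and repeated calls initialize only once.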


02 Oct, 2017 1 commit

Merge pull request #9114 from pengli:dnn_rebase · e340ff9c

pengli authored 7 years ago

add libdnn acceleration to dnn module  (#9114)

* import libdnn code
Signed-off-by: Li Peng <peng.li@intel.com>

* add convolution layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>

* add pooling layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>

* add softmax layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>

* add lrn layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>

* add innerproduct layer ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>

* add HAVE_OPENCL macro
Signed-off-by: Li Peng <peng.li@intel.com>

* fix for convolution ocl
Signed-off-by: Li Peng <peng.li@intel.com>

* enable getUMat() for multi-dimension Mat
Signed-off-by: Li Peng <peng.li@intel.com>

* use getUMat for ocl acceleration
Signed-off-by: Li Peng <peng.li@intel.com>

* use CV_OCL_RUN macro
Signed-off-by: Li Peng <peng.li@intel.com>

* set OPENCL target when it is available

and disable fuseLayer for OCL target for the time being
Signed-off-by: Li Peng <peng.li@intel.com>

* fix innerproduct accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>

* remove trailing space
Signed-off-by: Li Peng <peng.li@intel.com>

* Fixed tensorflow demo bug.

The root cause is that TensorFlow uses a different algorithm than libdnn
to calculate the convolution output dimension.

libdnn no longer calculates the output dimension itself and just uses the one
passed in by config.

* split gemm ocl file

split it into gemm_buffer.cl and gemm_image.cl
Signed-off-by: Li Peng <peng.li@intel.com>

* Fix compile failure
Signed-off-by: Li Peng <peng.li@intel.com>

* check env flag for auto tuning
Signed-off-by: Li Peng <peng.li@intel.com>

* switch to new ocl kernels for softmax layer
Signed-off-by: Li Peng <peng.li@intel.com>

* update softmax layer

on some platforms the subgroup extension may not work well;
fall back to non-subgroup ocl acceleration.
Signed-off-by: Li Peng <peng.li@intel.com>

* fallback to cpu path for fc layer with multi output
Signed-off-by: Li Peng <peng.li@intel.com>

* update output message
Signed-off-by: Li Peng <peng.li@intel.com>

* update fully connected layer

fallback to gemm API if libdnn return false
Signed-off-by: Li Peng <peng.li@intel.com>

* Add ReLU OCL implementation

* disable layer fusion for now
Signed-off-by: Li Peng <peng.li@intel.com>

* Add OCL implementation for concat layer
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>

* libdnn: update license and copyrights

Also refine libdnn coding style
Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>

* DNN: Don't link OpenCL library explicitly

* DNN: Make default preferableTarget to DNN_TARGET_CPU

Users should set it to DNN_TARGET_OPENCL explicitly if they want to
use OpenCL acceleration.

Also don't fuse layers when using DNN_TARGET_OPENCL

* DNN: refine coding style

* Add getOpenCLErrorString

* DNN: Use int32_t/uint32_t instead of alias

* Use namespace ocl4dnn to include libdnn things

* remove extra copyTo in softmax ocl path
Signed-off-by: Li Peng <peng.li@intel.com>

* update ReLU layer ocl path
Signed-off-by: Li Peng <peng.li@intel.com>

* Add prefer target property for layer class

It is used to indicate the target for layer forwarding,
either the default CPU target or OCL target.
Signed-off-by: Li Peng <peng.li@intel.com>

* Add cl_event based timer for cv::ocl

* Rename libdnn to ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>

* use UMat for ocl4dnn internal buffer

Remove allocateMemory which use clCreateBuffer directly
Signed-off-by: Li Peng <peng.li@intel.com>
Signed-off-by: wzw <zhiwen.wu@intel.com>

* enable buffer gemm in ocl4dnn innerproduct
Signed-off-by: Li Peng <peng.li@intel.com>

* replace int_tp globally for ocl4dnn kernels.
Signed-off-by: wzw <zhiwen.wu@intel.com>
Signed-off-by: Li Peng <peng.li@intel.com>

* create UMat for layer params
Signed-off-by: Li Peng <peng.li@intel.com>

* update sign ocl kernel
Signed-off-by: Li Peng <peng.li@intel.com>

* update image based gemm of inner product layer
Signed-off-by: Li Peng <peng.li@intel.com>

* remove buffer gemm of inner product layer

call cv::gemm API instead
Signed-off-by: Li Peng <peng.li@intel.com>

* change ocl4dnn forward parameter to UMat
Signed-off-by: Li Peng <peng.li@intel.com>

* Refine auto-tuning mechanism.

- Use OPENCV_OCL4DNN_KERNEL_CONFIG_PATH to set cache directory
  for fine-tuned kernel configuration.
  e.g. export OPENCV_OCL4DNN_KERNEL_CONFIG_PATH=/home/tmp,
  the cache directory will be /home/tmp/spatialkernels/ on Linux.

- Define environment OPENCV_OCL4DNN_ENABLE_AUTO_TUNING to enable
  auto-tuning.

- OPENCV_OPENCL_ENABLE_PROFILING is only used to enable profiling
  for the OpenCL command queue. This fixes the basic kernel reporting a
  wrong running time, i.e. 0ms.

- If creating cache directory failed, disable auto-tuning.

* Detect and create cache dir on windows
Signed-off-by: Li Peng <peng.li@intel.com>

* Refine gemm like convolution kernel.
Signed-off-by: Li Peng <peng.li@intel.com>

* Fix redundant swizzleWeights calling when use cached kernel config.

* Fix "out of resource" bug when auto-tuning too many kernels.

* replace cl_mem with UMat in ocl4dnnConvSpatial class

* OCL4DNN: reduce the tuning kernel candidate.

This patch could reduce 75% of the tuning candidates with less
than 2% performance impact for the final result.
Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>

* replace cl_mem with umat in ocl4dnn convolution
Signed-off-by: Li Peng <peng.li@intel.com>

* remove weight_image_ of ocl4dnn inner product

Actually it is unused in the computation
Signed-off-by: Li Peng <peng.li@intel.com>

* Various fixes for ocl4dnn

1. OCL_PERFORMANCE_CHECK(ocl::Device::getDefault().isIntel())
2. Ptr<OCL4DNNInnerProduct<float> > innerProductOp
3. Code comments cleanup
4. ignore check on OCL cpu device
Signed-off-by: Li Peng <peng.li@intel.com>

* add build option for log softmax
Signed-off-by: Li Peng <peng.li@intel.com>

* remove unused ocl kernels in ocl4dnn
Signed-off-by: Li Peng <peng.li@intel.com>

* replace ocl4dnnSet with opencv setTo
Signed-off-by: Li Peng <peng.li@intel.com>

* replace ALIGN with cv::alignSize
Signed-off-by: Li Peng <peng.li@intel.com>

* check kernel build options
Signed-off-by: Li Peng <peng.li@intel.com>

* Handle program compilation fail properly.

* Use std::numeric_limits<float>::infinity() for large float number

* check ocl4dnn kernel compilation result
Signed-off-by: Li Peng <peng.li@intel.com>

* remove unused ctx_id
Signed-off-by: Li Peng <peng.li@intel.com>

* change clEnqueueNDRangeKernel to kernel.run()
Signed-off-by: Li Peng <peng.li@intel.com>

* change cl_mem to UMat in image based gemm
Signed-off-by: Li Peng <peng.li@intel.com>

* check intel subgroup support for lrn and pooling layer
Signed-off-by: Li Peng <peng.li@intel.com>

* Fix convolution bug if group is greater than 1
Signed-off-by: Li Peng <peng.li@intel.com>

* Set default layer preferableTarget to be DNN_TARGET_CPU
Signed-off-by: Li Peng <peng.li@intel.com>

* Add ocl perf test for convolution
Signed-off-by: Li Peng <peng.li@intel.com>

* Add more ocl accuracy test
Signed-off-by: Li Peng <peng.li@intel.com>

* replace cl_image with ocl::Image2D
Signed-off-by: Li Peng <peng.li@intel.com>

* Fix build failure in elementwise layer
Signed-off-by: Li Peng <peng.li@intel.com>

* use getUMat() to get blob data
Signed-off-by: Li Peng <peng.li@intel.com>

* replace cl_mem handle with ocl::KernelArg
Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(build): don't use C++11, OPENCL_LIBRARIES fix

* dnn(ocl4dnn): remove unused OpenCL kernels

* dnn(ocl4dnn): extract OpenCL code into .cl files

* dnn(ocl4dnn): refine auto-tuning

Disable auto-tuning by default; set the OPENCV_OCL4DNN_ENABLE_AUTO_TUNING
environment variable to enable it.

Use a set of pre-tuned configs as default config if auto-tuning is disabled.
These configs are tuned for Intel GPU with 48/72 EUs, and for googlenet,
AlexNet, ResNet-50

If the default config is not suitable, use the first available kernel config
from the candidates. Candidate priority from high to low is gemm-like kernel,
IDLF kernel, basic kernel.

* dnn(ocl4dnn): pooling doesn't use OpenCL subgroups

* dnn(ocl4dnn): fix perf test

OpenCV has default 3sec time limit for each performance test.
Warmup OpenCL backend outside of perf measurement loop.

* use ocl::KernelArg as much as possible
Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): fix bias bug for gemm like kernel

* dnn(ocl4dnn): wrap cl_mem into UMat
Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): Refine signature of kernel config

- Use more readable string as signature of kernel config
- Don't count device name and vendor in signature string
- Default kernel configurations are tuned for Intel GPU with
  24/48/72 EUs, and for googlenet, AlexNet, ResNet-50 net model.

* dnn(ocl4dnn): swap width/height in configuration

* dnn(ocl4dnn): enable configs for Intel OpenCL runtime only

* core: make configuration helper functions accessible from non-core modules

* dnn(ocl4dnn): update kernel auto-tuning behavior

Avoid unwanted creation of directories

* dnn(ocl4dnn): simplify kernel to workaround OpenCL compiler crash

* dnn(ocl4dnn): remove redundant code

* dnn(ocl4dnn): Add a clearer message for simd size mismatch.

* dnn(ocl4dnn): add const to const argument
Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): force compiler use a specific SIMD size for IDLF kernel

* dnn(ocl4dnn): drop unused tuneLocalSize()

* dnn(ocl4dnn): specify OpenCL queue for Timer and convolve() method

* dnn(ocl4dnn): sanitize file names used for cache

* dnn(perf): enable Network tests with OpenCL

* dnn(ocl4dnn/conv): drop computeGlobalSize()

* dnn(ocl4dnn/conv): drop unused fields

* dnn(ocl4dnn/conv): simplify ctor

* dnn(ocl4dnn/conv): refactor kernelConfig localSize=NULL

* dnn(ocl4dnn/conv): drop unsupported double / untested half types

* dnn(ocl4dnn/conv): drop unused variable

* dnn(ocl4dnn/conv): alignSize/divUp

* dnn(ocl4dnn/conv): use enum values

* dnn(ocl4dnn): drop unused innerproduct variable
Signed-off-by: Li Peng <peng.li@intel.com>

* dnn(ocl4dnn): add a generic function to check cl option support

* dnn(ocl4dnn): run softmax subgroup version kernel first
Signed-off-by: Li Peng <peng.li@intel.com>
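The auto-tuning rules described in this merge (environment variables, the `spatialkernels` cache directory, and the fallback priority among kernel candidates) can be sketched as below. This is an illustration only: the type and function names are hypothetical, environment values are passed in as parameters rather than read with `std::getenv`, and treating "variable is defined" as "enabled" is an assumption based on the wording of the message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Lower value = higher priority, following the order given in the message.
enum KernelKind { GEMM_LIKE = 0, IDLF = 1, BASIC = 2 };

struct Candidate { KernelKind kind; bool available; };

// Returns the index of the highest-priority available candidate, or -1.
int pickFallback(const std::vector<Candidate>& candidates)
{
    int best = -1;
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        if (!candidates[i].available)
            continue;
        if (best < 0 || candidates[i].kind < candidates[best].kind)
            best = static_cast<int>(i);
    }
    return best;
}

struct TuningSetup { bool autoTuning; std::string cacheDir; };

// enableFlag: value of OPENCV_OCL4DNN_ENABLE_AUTO_TUNING (nullptr if unset)
// configPath: value of OPENCV_OCL4DNN_KERNEL_CONFIG_PATH (nullptr if unset)
// cacheDirCreated: whether the cache directory could be created
TuningSetup resolveTuning(const char* enableFlag, const char* configPath,
                          bool cacheDirCreated)
{
    TuningSetup setup;
    setup.autoTuning = (enableFlag != nullptr);
    if (configPath != nullptr)
        setup.cacheDir = std::string(configPath) + "/spatialkernels/";
    if (!cacheDirCreated)
        setup.autoTuning = false;  // no cache directory: auto-tuning is disabled
    return setup;
}
```

With `configPath` set to `/home/tmp`, the resolved cache directory is `/home/tmp/spatialkernels/`, matching the example in the commit message.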


05 Sep, 2017 1 commit
- Fix https://github.com/opencv/opencv/issues/8693 · 8b094755
  Dmitry Kurtaev authored 7 years ago
  
25 Jul, 2017 1 commit
- core: fix Mat/UMat cleanup on exceptions in deallocate() · 7f3eea63
  Alexander Alekhin authored 7 years ago
  
27 Jun, 2017 1 commit
- Fixing some static analysis issues · 32d4af36
  Maksim Shabunin authored 7 years ago
  
23 May, 2017 1 commit
- Fixed several issues found by static analysis in core module · b04ed595
  Maksim Shabunin authored 7 years ago
  
22 Feb, 2017 1 commit
- core: fix adjustROI behavior on indexes overflow · 14451f3f
  Vladislav Sovrasov authored 8 years ago
  
14 Feb, 2017 1 commit

ocl: validate arguments in KernelArgs constructor · 4c7aa864

Alexander Alekhin authored 8 years ago

- don't use undefined flag=0. It should be CONSTANT instead.
- don't allow 'UMat* m=NULL' argument (except LOCAL/CONSTANT flags).
  This case is not handled well to provide NULL __global pointers.
  It is better to use '-D' macro defines instead (at least for performance)
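The two validation rules can be sketched as a small predicate. The flag names mirror those mentioned in the message, but the numeric values and the function `isValidKernelArg` are illustrative assumptions, not a copy of the OpenCV definitions.

```cpp
#include <cassert>

// Illustrative flag values (assumed for this sketch).
enum ArgFlags { LOCAL = 1, READ_ONLY = 2, WRITE_ONLY = 4, READ_WRITE = 6, CONSTANT = 8 };

// Hypothetical check applied when a kernel argument is constructed:
//  - flags == 0 is undefined and rejected (CONSTANT should be used instead);
//  - a null matrix pointer is accepted only with LOCAL or CONSTANT.
bool isValidKernelArg(int flags, const void* m)
{
    if (flags == 0)
        return false;
    if (m == nullptr && (flags & (LOCAL | CONSTANT)) == 0)
        return false;
    return true;
}
```

A constructor using this predicate could then assert on it, so misuse is caught where the argument is built rather than when the kernel runs.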


15 Dec, 2016 1 commit

Added N-dim submat selection with vectors · eb04b2bf

Addison Elliott authored 8 years ago

Currently, to select a submatrix of a N-dimensional matrix, it requires
two lines of code while only one line of code is required if using a 2D
array.

I added functionality to be able to select an N-dim submatrix using a
vector list instead of a Range pointer. This allows initializer lists to
be used for a one-line selection.
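The idea of passing per-dimension ranges as a vector can be sketched without OpenCV. `Range` and `subShape` below are hypothetical stand-ins (ranges are assumed half-open, `[start, end)`); the sketch only computes the shape of the selected sub-array, which is enough to show the one-line, initializer-list call style.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Range { int start; int end; };  // half-open interval [start, end)

// Hypothetical helper: shape of the sub-array selected by one range per dimension.
std::vector<int> subShape(const std::vector<int>& shape,
                          const std::vector<Range>& ranges)
{
    assert(shape.size() == ranges.size());
    std::vector<int> out;
    out.reserve(shape.size());
    for (std::size_t i = 0; i < shape.size(); ++i)
    {
        assert(0 <= ranges[i].start && ranges[i].start <= ranges[i].end
               && ranges[i].end <= shape[i]);
        out.push_back(ranges[i].end - ranges[i].start);
    }
    return out;
}
```

Because the ranges arrive as a `std::vector`, a call such as `subShape({4, 5, 6}, {{1, 3}, {0, 5}, {2, 3}})` fits on one line, with no separately declared range array.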


14 Dec, 2016 1 commit

Added new overloaded functions for Mat and UMat that accepts std::vector<int>… · fa6692af

Addison Elliott authored 8 years ago

Added new overloaded functions for Mat and UMat that accept std::vector<int> instead of int * for the sizes of an N-dimensional array.

This allows an N-dimensional array to be set up in one line instead of two when using C++11 initializer lists. cv::Mat(3, {zDim, yDim, xDim}, ...) can be used instead of having to create an int pointer to hold the size array.
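The overload pattern itself can be shown with a stand-in class. `NdShape` is hypothetical and is not `cv::Mat`; it only demonstrates how a `std::vector<int>` constructor sits next to the older pointer-based one and enables initializer-list construction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical N-dimensional shape holder with both constructor styles.
class NdShape
{
public:
    // Newer style: sizes given as a vector, so a braced list works directly.
    explicit NdShape(const std::vector<int>& sizes) : sizes_(sizes) {}

    // Older style: dimension count plus a pointer to a separately declared array.
    NdShape(int ndims, const int* sizes)
        : NdShape(std::vector<int>(sizes, sizes + ndims)) {}

    int dims() const { return static_cast<int>(sizes_.size()); }

    // Total number of elements across all dimensions.
    std::size_t total() const
    {
        std::size_t t = 1;
        for (int s : sizes_)
        {
            assert(s >= 0);
            t *= static_cast<std::size_t>(s);
        }
        return t;
    }

private:
    std::vector<int> sizes_;
};
```

The pointer form needs two lines (`int sz[] = {2, 3, 4}; NdShape a(3, sz);`), while the vector form needs one (`NdShape b({2, 3, 4});`).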


19 Aug, 2016 1 commit
- Instrumentation for OpenCV API regions and IPP functions; · 30a6cee2
  Pavel Vlasov authored 8 years ago
  
26 Jan, 2016 1 commit
- core: preserve sizes values (fixes #5991) · 2978a16c
  Alexander Alekhin authored 9 years ago
```
_sizes can point to internal structure which is destroyed
by release() call
```
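The aliasing hazard described in this commit can be reproduced in miniature. `Shape` below is a hypothetical class, not the OpenCV one: the caller may pass a pointer into the object's own size storage, so the values are copied before `release()` frees that storage.

```cpp
#include <cassert>
#include <vector>

class Shape
{
public:
    // (Re)creates the shape. `sizes` may point into this object's own storage,
    // e.g. s.create(s.ndims(), s.sizes()), so copy it before releasing.
    void create(int ndims, const int* sizes)
    {
        assert(ndims >= 0 && (ndims == 0 || sizes != nullptr));
        std::vector<int> saved(sizes, sizes + ndims);  // preserve the values first
        release();                                     // frees the old storage
        dims_.swap(saved);
    }

    void release()
    {
        std::vector<int>().swap(dims_);  // actually frees the buffer
    }

    int ndims() const { return static_cast<int>(dims_.size()); }
    const int* sizes() const { return dims_.data(); }

private:
    std::vector<int> dims_;
};
```

Without the local copy, `create()` would read from memory that `release()` had just freed whenever the caller passed `sizes()` back in.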
08 Dec, 2015 1 commit
- Clarified default allocator interface. · 4f373a42
  Dan Moodie authored 9 years ago
```
Conflicts:
	modules/core/src/matrix.cpp
```
20 Oct, 2015 1 commit
- Visual Studio 2015 warning and test fixes · 6e9d0d9a
  Maksim Shabunin authored 9 years ago
  
09 Sep, 2015 2 commits
- ocl: workaround for getUMat() · ad70ab40
  Alexander Alekhin authored 9 years ago
  
- map/unmap, preventing getMat/getUMat from temp object, fix thread-unsafe code in `UMat::getMat()` · cea2dafa
  Andrey Pavlenko authored 9 years ago
  
22 Aug, 2015 1 commit
- Fix issue #5234 (UMat::convertTo when noScale) · 0629add3
  Philippe FOUBERT authored 9 years ago
  
21 Aug, 2015 1 commit
- Changed behaviour of Mat/UMat::reshape() to accept n-dim shapes · 85cc11e3
  Vitaliy Lyudvichenko authored 9 years ago
  
28 Jul, 2015 2 commits
- ocl: add map tests · cd5c7069
  Alexander Alekhin authored 9 years ago
  
- fix OpenCV code (bug 4006: #4862) · b36f565d
  Alexander Alekhin authored 9 years ago
  
23 Jul, 2015 1 commit
- It is unnecessary to use fma() if no scaling. · 132416eb
  Yan Wang authored 9 years ago
```
Signed-off-by: Yan Wang <yan.wang@linux.intel.com>
```
09 Jul, 2015 1 commit
- t-api: try to fix allocator fallback code paths · 88e66697
  Alexander Alekhin authored 9 years ago
```
issue: http://code.opencv.org/issues/4461
```
19 Jun, 2015 1 commit

OpenCV-OpenCL interop (PR #4072): · 217dd63e

Vladimir Dudnik authored 9 years ago

Commits:
added new function, cv::ocl::attachContext(String& platformName, void* platformID, void* context, void* deviceID), which allows attaching an externally created OpenCL context to OpenCV.
add definitions of clRetainDevice, clRetainContext funcs
removed definitions for clRetainContext, clRetainDevice
fixed build issue under Linux
fixed uninitialized vars, replace dbgassert in error handling
remove function which is not ready yet
add new function, cv::ocl::convertFromBuffer(int rows, int cols, int type, void* cl_mem_obj, UMat& dst, UMatUsageFlags usageFlags = cv::USAGE_DEFAULT) which attaches user allocated OpenCL clBuffer to UMat
uncommented clGetMemObjectInfo definition (otherwise prevent opencv build)
fixed build issue on linux and android
add step parameter to cv::ocl::convertFromBuffer func
suppress compile-time warning
added sample opencl-opencv interoperability (showcase for cv::ocl::convertFromBuffer func)
CMakeLists.txt modified to not create sample build script if OpenCL SDK not found in system
fixed build issue (apple opencl include dir and spaces in CMake file)
added call to clRetainContext for attachContext func and call to clRetainMemObject for convertFromBuffer func
uncommented clRetainMemObject definition
added comments and cleanup
add local path to cmake modules search dirs (instead of replacing)
remove REQUIRED for find_package call (sample build together with opencv). need to try standalone sample build
opencl-interop sample moved to standalone build
set minimum version requirement for sample's cmake to 3.1
put cmake_minimum_required under condition, so do not check if samples are not built
remove code dups for setSize, updateContinuityFlag, and finalizeHdr
commented out cmake_minimum_required(VERSION 3.1)
add safety check for cmake version
add convertFromImage func and update opencl-interop sample
uncommented clGetImageInfo defs
uncommented clEnqueueCopyImageToBuffer defs
fixed clEnqueueCopyImageToBuffer defs
add doxygen comments
remove doxygen @fn tag
try to restart buildbot
add doxygen comments to directx interop funcs
remove internal header, use fwd declarations in affected compile units instead


23 Jan, 2015 1 commit
- ocl: OpenCL SVM support · 0a07d780
  Alexander Alekhin authored 10 years ago
  
12 Jan, 2015 1 commit
- used popcnt · fc086973
  Ilya Lavrenov authored 10 years ago
  
05 Jan, 2015 1 commit
- Fix shadowed variable warning · 379de570
  Joe Howse authored 10 years ago
  
29 Dec, 2014 1 commit
- optimization of UMat::setTo · 1af7d397
  Ilya Lavrenov authored 10 years ago
  
15 Oct, 2014 1 commit

Implementation detector and selector for IPP and OpenCL; · 45958eaa

Pavel Vlasov authored 10 years ago

IPP can be switched on and off on runtime;

Optional implementation collector was added (switched off by default in CMake). Gathers data of implementation used in functions and report this info through performance TS;

TS modifications for implementations control;


13 Aug, 2014 1 commit

Several types of formal refactoring: · 8a4a1bb0

Adil Ibragimov authored 10 years ago

1. someMatrix.data -> someMatrix.ptr()
2. someMatrix.data + someMatrix.step * lineIndex -> someMatrix.ptr( lineIndex )
3. (SomeType*) someMatrix.data -> someMatrix.ptr<SomeType>()
4. someMatrix.data -> !someMatrix.empty() ( or !someMatrix.data -> someMatrix.empty() ) in logical expressions
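The accessor style these rewrites move towards can be sketched with a stand-in type. `Matrix` below is hypothetical and much simpler than `cv::Mat`; it only shows how `ptr()`, `ptr(row)`, `ptr<T>()`, and `empty()` replace direct arithmetic on a raw `data` pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix: a byte buffer plus a row stride in bytes.
struct Matrix
{
    int rows;
    int cols;
    std::size_t step;                    // bytes per row
    std::vector<unsigned char> storage;  // owns the pixel data

    Matrix(int r, int c, std::size_t elemSize)
        : rows(r), cols(c),
          step(static_cast<std::size_t>(c) * elemSize),
          storage(static_cast<std::size_t>(r) * step) {}

    // Replaces `!m.data` checks in logical expressions.
    bool empty() const { return storage.empty(); }

    // Replaces `m.data + m.step * row`.
    unsigned char* ptr(int row = 0)
    {
        assert(0 <= row && row < rows);
        return storage.data() + step * static_cast<std::size_t>(row);
    }

    // Replaces `(T*)m.data`; relies on the buffer being suitably aligned for T.
    template <typename T>
    T* ptr(int row = 0)
    {
        return reinterpret_cast<T*>(ptr(row));
    }
};
```

Going through the accessors keeps the row-stride arithmetic and bounds checks in one place instead of repeating them at every call site.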
