- 25 Dec, 2014 1 commit
-
-
Vladislav Vinogradov authored
-
- 24 Dec, 2014 5 commits
-
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
- 23 Dec, 2014 10 commits
-
-
Vladislav Vinogradov authored
* improve `CUDA_TARGET_CPU_ARCH` cache initialization and allow the calling script to override the initial value;
* add `CUDA_TARGET_OS_VARIANT` option to select the OS variant;
* add `CUDA_TARGET_TRIPLET` option to select the target triplet from the `${CUDA_TOOLKIT_ROOT_DIR}/targets` folder;
* remove the `CUDA_TOOLKIT_TARGET_DIR` option; it is now calculated from `CUDA_TARGET_TRIPLET`, and the old approach can still be used for compatibility;
* for CUDA 6.5 and newer, try to locate static libraries too, because the 6.5 toolkit for ARM cross compilation includes only static libraries.
-
Vladislav Vinogradov authored
-
Vladislav Vinogradov authored
move main CUDA group to modules/core/cuda.hpp
-
Vladislav Vinogradov authored
add a note to use the new cudev module as a replacement
-
Vladislav Vinogradov authored
-
Vladislav Vinogradov authored
it is an internal class, no need to export it
-
Vladislav Vinogradov authored
The deinitialization of BufferPool internal objects is controlled by a global object, but it depends on other global objects, which leads to errors caused by the undefined deinitialization order of global objects. This change merges the initialization of those global objects into a single class, which performs initialization and deinitialization in the correct order.
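The idea described above can be illustrated with a minimal sketch. It is not the BufferPool code from this commit: `Allocator`, `Pool`, `Initializer`, `g_events` and `runLifecycle` are hypothetical names chosen for illustration. It relies only on the C++ rule that class members are constructed in declaration order and destroyed in reverse order, so a single owning class fixes the teardown order that separate global objects leave undefined.

```cpp
#include <string>
#include <vector>

// Hypothetical event log, used only to make the order observable.
static std::vector<std::string> g_events;

// Stand-in for a low-level resource that other objects depend on.
struct Allocator {
    Allocator()  { g_events.push_back("Allocator init"); }
    ~Allocator() { g_events.push_back("Allocator deinit"); }
};

// Stand-in for a pool that must be torn down before its allocator.
struct Pool {
    explicit Pool(Allocator& a) : alloc(a) { g_events.push_back("Pool init"); }
    ~Pool() { g_events.push_back("Pool deinit"); }
    Allocator& alloc;
};

// Single owner: members are constructed in declaration order and
// destroyed in reverse order, so Pool always goes away before Allocator.
class Initializer {
public:
    Initializer() : pool(allocator) {}
private:
    Allocator allocator;  // declared first: constructed first, destroyed last
    Pool pool;            // declared second: constructed second, destroyed first
};

// Runs one full lifecycle in a local scope and returns the recorded events.
std::vector<std::string> runLifecycle() {
    g_events.clear();
    {
        Initializer init;
    }
    return g_events;
}
```

Calling `runLifecycle()` records the two constructions followed by the two destructions in reverse order. In a real library the `Initializer` would typically be a single global or function-local static instead of a scoped local.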
-
Alexander Alekhin authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
- 22 Dec, 2014 17 commits
-
-
Maksim Shabunin authored
-
Maksim Shabunin authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vadim Pisarevsky authored
-
Vladislav Vinogradov authored
-
Vladislav Vinogradov authored
-
Vladislav Vinogradov authored
it is generated by CUDA headers and we can't fix it
-
- 21 Dec, 2014 1 commit
-
-
orestis authored
to compensate for NEON's IEEE 754 non-compliance. Also changed the comparison between the maximum valid distance and the calculated distance to make the error message more accurate (in the case curMaxDist == maxDist)
-
- 20 Dec, 2014 1 commit
-
-
orestis authored
Set it to 1 instead of 0.001, as is already done in gaussianBlur3x3. This allows integer destination matrices that are not exactly the same as the expected result, but very close to it, to pass the test.
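To see why the threshold matters for integer outputs, here is a small sketch. It is not the project's test code: `maxAbsDiff` and `closeEnough` are hypothetical helpers, and the comparison is assumed to be on the largest absolute element-wise difference. Two integer results can differ only by whole units, so a tolerance of 0.001 behaves like an exact-match requirement, while a tolerance of 1 accepts an off-by-one rounding difference.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: largest absolute element-wise difference.
int maxAbsDiff(const std::vector<int>& a, const std::vector<int>& b) {
    int worst = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}

// Hypothetical check: same size and no element further apart than eps.
bool closeEnough(const std::vector<int>& a, const std::vector<int>& b, double eps) {
    return a.size() == b.size() && maxAbsDiff(a, b) <= eps;
}
```

With `expected = {10, 20, 30}` and `actual = {10, 21, 30}`, `closeEnough(expected, actual, 0.001)` is false and `closeEnough(expected, actual, 1.0)` is true.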
-
- 19 Dec, 2014 5 commits
-
-
orestis authored
NEON speedup: 2.31x; Auto-vect speedup: 2.26x; Test kernel: [-0.9432, -1.1528, 0, 1.1528, 0.9432]
-
orestis authored
NEON speedup: 2.36x; Auto-vect speedup: 2.36x; Test kernel: [0.1, 0.2408, 0.3184, 0.2408, 0.1]
-
orestis authored
NEON speedup: 9.46x; Auto-vect speedup: 1x; Test kernel: [-0.9432, -1.1528, 0, 1.1528, 0.9432]
-
orestis authored
NEON speedup: 8.64x; Auto-vect speedup: 1x; Test kernel: [0.1, 0.2408, 0.3184, 0.2408, 0.1]
-
orestis authored
NEON speedup: 2.12x; Auto-vect speedup: 1.01x; Test kernel: [-2, 0, 2]
-