    Merge pull request #12249 from kopytjuk:feature/region-layer-batch-mode · 38f8fc6c
    Marat K authored
    Feature/region layer batch mode (#12249)
    
    * Add batch mode for Darknet networks.
    
    Swap variables in test_darknet.
    
    Adapt reorg layer to batch mode.
    
    Adapt region layer.
    
    Add OpenCL implementation.
    
    Remove trailing whitespace.
    
    Bugfix reorg OpenCL implementation.
    
    Fix bug in OpenCL reorg.
    
    Fix modulo bug.
    
    Fix bug.
    
    Reorg openCL.
    
    Restore reorg layer opencl code.
    
    OpenCl fix.
    
    Work on openCL reorg.
    
    Remove whitespace.
    
    Fix openCL region layer implementation.
    
    Fix bug.
    
    Fix softmax region opencl bug.
    
    Fix opencl bug.
    
    Fix openCL bug.
    
    Update aff_trans.cpp
    
    When the fullAffine parameter is set to false, the estimateRigidTransform function may return an empty result. The _localAffineEstimate function is then called, and a bug in it produces incorrect results.
    
    core(libva): support YV12 too
    
    Added to CPU path only.
    OpenCL code path still expects NV12 only (according to Intel OpenCL extension)
    
    cmake: allow to specify own libva paths
    
    via CMake:
    - `-DVA_LIBRARIES=/opt/intel/mediasdk/lib64/libva.so.2\;/opt/intel/mediasdk/lib64/libva-drm.so.2`
    
    android: NDK17 support
    
    tested with NDK 17b (17.1.4828580)
    
    Enable more deep learning tests using Intel's Inference Engine backend
    
    ts: don't pass NULL for std::string() constructor
    
    openvino: use 2018R3 defines
    
    experimental version++
    
    OpenCV version++
    
    OpenCV 3.4.3
    
    OpenCV version '-openvino'
    
    openvino: use 2018R3 defines
    
    Fixed windows build with InferenceEngine
    
    dnn: fix variance setting bug for PriorBoxLayer
    
    - The size of second channel should be size[2] of output tensor,
    - The Scalar should be {variance[0], variance[0], variance[0], variance[0]}
      for _variance.size() == 1 case.
    Signed-off-by: Wu Zhiwen <zhiwen.wu@intel.com>
    
    Fix lifetime of networks which are loaded from Model Optimizer IRs
    
    Adds a small note describing BUILD_opencv_world (#12332)
    
    * Added a small note describing the BUILD_opencv_world cmake option to the Installation in Windows tutorial.
    
    * Made slight changes in BUILD_opencv_world documentation.
    
    * Update windows_install.markdown
    
    improved grammar
    
    Update opengl_interop.cpp
    
    resolves #12307
    
    java: fix LIST_GET macro
    
    fix typo
    
    Added option to fail on missing testdata
    
    Fixed that object_detection.py does not work in python3.
    
    cleanup: IPP Async (IPP_A)
    
    except header file with conversion routines (will be removed in OpenCV 4.0)
    
    imgcodecs: add null pointer check
    
    Include preprocessing nodes to object detection TensorFlow networks (#12211)
    
    * Include preprocessing nodes to object detection TensorFlow networks
    
    * Enable more fusion
    
    * faster_rcnn_resnet50_coco_2018_01_28 test
    
    countNonZero function reworked to use wide universal intrinsics instead of SSE2 intrinsics
    
    resolve #5788
    
    imgcodecs(webp): multiple fixes
    
    - don't reallocate passed 'img' (test fixed - must use IMREAD_UNCHANGED / IMREAD_ANYCOLOR)
    - avoid memory DDOS
    - avoid reading of whole file during header processing
    - avoid data access after allocated buffer during header processing (missing checks)
    - use WebPFree() to free allocated buffers (libwebp >= 0.5.0)
    - drop unused & undefined `.close()` method
    - added checks for channels >= 5 in encoder
    
    ml: fix adjusting K in KNearest (#12358)
    
    dnn(perf): fix and merge Convolution tests
    
    - OpenCL tests didn't run any OpenCL kernels
    - use real configurations from existing models (the first 100 cases)
    - batch size = 1
    
    dnn(test): use dnnBackendsAndTargets() param generator
    
    Bit-exact resize reworked to use wide intrinsics (#12038)
    
    * Bit-exact resize reworked to use wide intrinsics
    
    * Reworked bit-exact resize row data loading
    
    * Added bit-exact resize row data loaders for SIMD256 and SIMD512
    
    * Fixed type punned pointer dereferencing warning
    
    * Reworked loading of source data for SIMD256 and SIMD512 bit-exact resize
    
    Bit-exact GaussianBlur reworked to use wide intrinsics (#12073)
    
    * Bit-exact GaussianBlur reworked to use wide intrinsics
    
    * Added v_mul_hi universal intrinsic
    
    * Removed custom SSE2 branch from bit-exact GaussianBlur
    
    * Removed loop unrolling for gaussianBlur horizontal smoothing
    
    doc: fix English grammar in tutorial out-of-focus-deblur filter (#12214)
    
    * doc: fix English grammar in tutorial out-of-focus-deblur filter
    
    * Update out_of_focus_deblur_filter.markdown
    
    slightly modified one sentence
    
    doc: add new tutorial motion deblur filter (#12215)
    
    * doc: add new tutorial motion deblur filter
    
    * Update motion_deblur_filter.markdown
    
    a few minor changes
    
    Replace Slice layer to Crop in Faster-RCNN networks from Caffe
    
    js: use generated list of OpenCV headers
    
    - replaces hand-written list
    
    imgcodecs(webp): use safe cast to size_t on Win32
    
    * Put Version status back to -dev.
    
    follow the common codestyle
    
    Exclude some target engines.
    
    Refactor formulas.
    
    Refactor code.
    
    * Remove unused variable.
    
    * Remove inference engine check for yolov2.
    
    * Alter darknet batch tests to test with two different images.
    
    * Add yolov3 second image GT.
    
    * Fix bug.
    
    * Fix bug.
    
    * Add second test.
    
    * Remove comment.
    
    * Add NMS on network level.
    
    * Add helper files to dev.
    
    * syntax fix.
    
    * Fix OD sample.
    
    Fix sample dnn object detection.
    
    Fix NMS boxes bug.
    
    remove trailing whitespace.
    
    Remove debug function.
    
    Change thresholds for opencl tests.
    
    * Adapt score diff and iou diff.
    
    * Alter iouDiffs.
    
    * Add debug messages.
    
    * Adapt iouDiff.
    
    * Fix tests
reorg.cl 3.11 KB
/*M///////////////////////////////////////////////////////////////////////////////////////
//
//  IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING.
//
//  By downloading, copying, installing or using the software you agree to this license.
//  If you do not agree to this license, do not download, install,
//  copy or use the software.
//
//
//                           License Agreement
//                For Open Source Computer Vision Library
//
// Copyright (c) 2016-2017 Fabian David Tschopp, all rights reserved.
// Third party copyrights are property of their respective owners.
//
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
//
//   * Redistribution's of source code must retain the above copyright notice,
//     this list of conditions and the following disclaimer.
//
//   * Redistribution's in binary form must reproduce the above copyright notice,
//     this list of conditions and the following disclaimer in the documentation
//     and/or other materials provided with the distribution.
//
//   * The name of the copyright holders may not be used to endorse or promote products
//     derived from this software without specific prior written permission.
//
// This software is provided by the copyright holders and contributors "as is" and
// any express or implied warranties, including, but not limited to, the implied
// warranties of merchantability and fitness for a particular purpose are disclaimed.
// In no event shall the Intel Corporation or contributors be liable for any direct,
// indirect, incidental, special, exemplary, or consequential damages
// (including, but not limited to, procurement of substitute goods or services;
// loss of use, data, or profits; or business interruption) however caused
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
//
//M*/

#if defined(cl_khr_fp16)
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
#endif

__kernel void reorg(const int count,
                    __global const Dtype* src,
                    const int channels,
                    const int height,
                    const int width,
                    const int reorgStride,
                    __global Dtype* dst)
{
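    // Grid-stride loop: each work-item handles index, index + global_size, ...
    for (int index = get_global_id(0); index < count; index += get_global_size(0))
    {
        // Split the flat index into the batch number b and the offset inside one sample.
        int sample_size = channels*height*width;
        int b = index/sample_size;
        int new_index = index%sample_size;
        // Decompose the per-sample offset into channel k, row j and column i.
        int k = new_index / (height * width);
        int j = (new_index - (k * height * width)) / width;
        int i = new_index % width;
        // Map (k, j, i) to the source position (c2, h2, w2) on a grid that is
        // reorgStride times larger in both spatial dimensions.
        int out_c = channels / (reorgStride*reorgStride);
        int c2 = k % out_c;
        int offset = k / out_c;
        int w2 = i*reorgStride + offset % reorgStride;
        int h2 = j*reorgStride + offset / reorgStride;
        int in_index = w2 + width*reorgStride*(h2 + height*reorgStride*c2);
        // Offset by the start of batch b so that every sample is read independently.
        dst[index] = src[b*sample_size + in_index];
    }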
}
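
The index arithmetic of the kernel above can be checked on the host with a plain C loop. The following is a minimal sketch, not part of the OpenCV sources: the name `reorg_ref` is hypothetical, `Dtype` is assumed to be `float` (in the kernel it is supplied by the build options), and the assertion on the arguments is an added precondition that the kernel itself does not check.

```c
#include <assert.h>

/* Hypothetical host-side reference for the reorg kernel (assumes Dtype = float).
 * It repeats the kernel's per-element index computation in a sequential loop. */
static void reorg_ref(int count, const float *src,
                      int channels, int height, int width,
                      int reorgStride, float *dst)
{
    /* Added precondition (assumption): avoids division by zero below. */
    assert(height > 0 && width > 0 && reorgStride > 0 &&
           channels >= reorgStride * reorgStride);

    int sample_size = channels * height * width;
    int out_c = channels / (reorgStride * reorgStride);

    for (int index = 0; index < count; ++index)
    {
        int b = index / sample_size;          /* batch number */
        int new_index = index % sample_size;  /* offset inside the sample */
        int k = new_index / (height * width);
        int j = (new_index - k * height * width) / width;
        int i = new_index % width;
        int c2 = k % out_c;
        int offset = k / out_c;
        int w2 = i * reorgStride + offset % reorgStride;
        int h2 = j * reorgStride + offset / reorgStride;
        int in_index = w2 + width * reorgStride * (h2 + height * reorgStride * c2);
        dst[index] = src[b * sample_size + in_index];
    }
}
```

Calling it with `count` equal to `batch * channels * height * width` covers all samples in one pass, because the batch offset `b * sample_size` is added to every source index. This mirrors the batch handling in the kernel.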