    Merge pull request #15257 from pmur:resize · a011035e
    Paul Murphy authored
    * resize: HResizeLinear reduce duplicate work
    
    There appears to be a 2x unroll of HResizeLinear against k,
    but k is only incremented by 1 per unrolled iteration. For N
    input rows, this results in N - 1 duplicate row passes when N > 1.
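The duplicate-work bug can be modeled by counting row computations. This is a minimal Python sketch that assumes a 2x-unrolled pass over row pairs followed by a one-row remainder loop. The function `rows_computed` is hypothetical and models only the loop shape, not OpenCV's kernel. It compares advancing `k` by 1 with advancing it by 2.

```python
def rows_computed(count: int, step: int) -> int:
    """Count row computations for a 2x-unrolled loop plus a remainder loop.

    Hypothetical model of the loop shape only; not OpenCV's implementation.
    """
    computed = 0
    k = 0
    # Unrolled body: each pass computes rows k and k + 1.
    while k <= count - 2:
        computed += 2
        k += step
    # Remainder loop: one row per pass for whatever is left.
    while k < count:
        computed += 1
        k += 1
    return computed


if __name__ == "__main__":
    for count in (1, 2, 5):
        print(count, rows_computed(count, step=1), rows_computed(count, step=2))
```

With `step=1`, the model computes `2 * count - 1` rows, which is `count - 1` more than needed. With `step=2`, it computes exactly `count`.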
    
    Likewise, the final pass may ignore the work already done by the
    vector loop. Start it at the offset returned by the vector op, if
    one is implemented. Note that no vector ops are implemented today.
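The offset handoff between a vector op and the scalar loop can be sketched in Python. The sketch assumes a hypothetical signature in which the vector op fills a prefix of the output row and returns how many outputs it wrote, or 0 when no SIMD path exists. It also assumes each output is a two-tap blend of `src[xofs[dx]]` and `src[xofs[dx] + cn]`. The names `lerp_at`, `no_vec_op`, `blocked_vec_op` and `hresize_linear_row` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical vector-op signature: fills a prefix of dst and returns how
# many outputs it wrote (0 when no SIMD path is implemented).
VecOp = Callable[[Sequence[float], List[float], Sequence[int], Sequence[float], int, int], int]


def lerp_at(src, xofs, alpha, cn, dx):
    """Scalar linear interpolation for output dx (two taps, cn apart)."""
    sx = xofs[dx]
    return src[sx] * alpha[2 * dx] + src[sx + cn] * alpha[2 * dx + 1]


def no_vec_op(src, dst, xofs, alpha, cn, xmax):
    return 0  # nothing vectorized: the scalar loop must start at 0


def blocked_vec_op(src, dst, xofs, alpha, cn, xmax):
    # Stand-in for a SIMD kernel that only handles whole groups of 4 outputs.
    done = (xmax // 4) * 4
    for dx in range(done):
        dst[dx] = lerp_at(src, xofs, alpha, cn, dx)
    return done


def hresize_linear_row(src, xofs, alpha, cn, xmax,
                       vec_op: VecOp = no_vec_op) -> Tuple[List[float], int]:
    """Return the resized row and how many outputs the scalar loop computed."""
    dst = [0.0] * xmax
    dx0 = vec_op(src, dst, xofs, alpha, cn, xmax)
    scalar_work = 0
    # Resume at dx0 rather than 0 so vectorized outputs are not recomputed.
    for dx in range(dx0, xmax):
        dst[dx] = lerp_at(src, xofs, alpha, cn, dx)
        scalar_work += 1
    return dst, scalar_work


if __name__ == "__main__":
    src = [float(10 * i) for i in range(7)]
    xofs = list(range(6))
    alpha = [0.5] * 12
    print(hresize_linear_row(src, xofs, alpha, 1, 6))
    print(hresize_linear_row(src, xofs, alpha, 1, 6, blocked_vec_op))
```

Both calls produce the same row. With the stand-in vector op, the scalar loop handles only the 2 leftover outputs instead of all 6.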
    
    The gain is most noticeable on a linear downscale. A set of
    performance tests is added to characterize this. The improvement
    is 10-50%, depending on the scale factor.
    
    * imgproc: vectorize HResizeLinear
    
    Performance is mostly gated by the gather operations
    for x inputs.
    
    Likewise, provide a 2x unroll against k; this halves the number
    of alpha gathers for larger k.
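The effect of unrolling against k on gather count can be shown with a scalar model. This Python sketch assumes that each lookup of a weight pair `(a0, a1)` stands in for one SIMD alpha gather. Processing two rows per pass reuses each gathered pair for both rows. `hresize_rows` is a hypothetical helper, not the vectorized OpenCV code.

```python
from typing import List, Sequence, Tuple


def hresize_rows(rows: Sequence[Sequence[float]], xofs: Sequence[int],
                 alpha: Sequence[float], cn: int,
                 rows_per_pass: int) -> Tuple[List[List[float]], int]:
    """Linear horizontal resize of several rows, counting alpha gathers.

    Scalar model: each (a0, a1) lookup stands in for one SIMD gather.
    """
    out = [[0.0] * len(xofs) for _ in rows]
    alpha_gathers = 0
    for k in range(0, len(rows), rows_per_pass):
        group = rows[k:k + rows_per_pass]
        for dx, sx in enumerate(xofs):
            # Gather the weight pair once and reuse it for every row in the group.
            a0, a1 = alpha[2 * dx], alpha[2 * dx + 1]
            alpha_gathers += 1
            for j, row in enumerate(group):
                out[k + j][dx] = row[sx] * a0 + row[sx + cn] * a1
    return out, alpha_gathers
```

With 4 rows and 6 outputs per row, one row per pass performs 24 gathers and two rows per pass perform 12. The results are identical.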
    
    While not a 4x improvement, it still performs substantially
    better on P9 (POWER9), with a 1.4x speedup. The P8 (POWER8)
    baseline gains only 1.05-1.10x due to its reduced VSX instruction set.
    
    For float types, this results in a more modest
    1.2x improvement.
    
    * Update U8 processing for non-bitexact linear resize
    
    * core: hal: vsx: improve v_load_expand_q
    
    With a little help, we can do this quickly without going through
    GPRs (general-purpose registers) on all VSX-enabled targets.
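`v_load_expand_q` loads four 8-bit values and widens each one into a 32-bit lane. The commit does not say how the VSX version achieves this. One common technique that stays entirely in vector registers is a single byte permute against an all-zero vector. The Python below is a simplified little-endian model of that idea. `byte_shuffle` and `EXPAND_Q_CTRL` are hypothetical names. This is not the actual VSX code, which would use an instruction such as `vec_perm` with its own byte-numbering rules.

```python
import struct
from typing import List, Tuple


def byte_shuffle(a: bytes, b: bytes, ctrl: List[int]) -> bytes:
    """Two-source byte permute: output byte i is (a || b)[ctrl[i] & 31]."""
    table = a + b
    return bytes(table[i & 31] for i in ctrl)


# Lane j (4 bytes, little-endian) takes input byte j as its low byte and
# three bytes from the all-zero second operand (index 16).
EXPAND_Q_CTRL = [j if i == 0 else 16 for j in range(4) for i in range(4)]


def load_expand_q(src: bytes, offset: int = 0) -> Tuple[int, int, int, int]:
    """Zero-extend 4 bytes to 4 x uint32 lanes using only a byte shuffle."""
    vec = src[offset:offset + 4].ljust(16, b"\0")  # 4-byte load into a 16-byte register
    lanes = byte_shuffle(vec, bytes(16), EXPAND_Q_CTRL)
    return struct.unpack("<4I", lanes)
```

A single shuffle produces all four widened lanes. No per-byte scalar extraction is needed.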
    
    * resize: Fix cn == 3 step per feedback
    
    Per feedback, ensure we don't overrun. This was caught by the
    failure in Test_TensorFlow.inception_accuracy.
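Suppose a vector kernel loads 4 lanes per 3-channel pixel and uses only the first 3. Each load then reads one element past the pixel's own channels, so the last pixels of a row can read past the end of the source buffer. The commit does not describe its exact fix. A generic guard is to stop the vector loop at the last output whose full-width loads stay in bounds and let the scalar loop finish the row. This sketch works under the assumptions stated in its docstring. `safe_vector_limit` is a hypothetical helper.

```python
from typing import Sequence


def safe_vector_limit(xofs: Sequence[int], cn: int, src_len: int, load_width: int) -> int:
    """Number of leading outputs whose two full-width loads stay in bounds.

    Assumes xofs is non-decreasing and each output reads load_width elements
    at xofs[dx] and at xofs[dx] + cn; only the first cn lanes are used.
    """
    limit = 0
    for dx, sx in enumerate(xofs):
        if sx + cn + load_width > src_len:
            break  # this and every later output would read past the row
        limit = dx + 1
    return limit
```

Take a 4-pixel, 3-channel row (12 elements) with 4-lane loads and `xofs = [0, 3, 6]`. The limit is 2. Output 2 then falls back to scalar code, which reads only elements 6-8 and 9-11.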
calib3d
core
cudaarithm
cudabgsegm
cudacodec
cudafeatures2d
cudafilters
cudaimgproc
cudalegacy
cudaobjdetect
cudaoptflow
cudastereo
cudawarping
cudev
dnn
features2d
flann
highgui
imgcodecs
imgproc
java
js
ml
objdetect
photo
python
shape
stitching
superres
ts
video
videoio
videostab
viz
world
CMakeLists.txt