src · a3133482abc55d4ad75ce554fb0d774edd4b950d · submodule / ngraph

nvgpu cuda softmax optimization (#2101) · a3133482

Fenglei authored Dec 11, 2018

* add some helper function

* update with new helper function

* update reduce to nd with new helper function

* update float sum to stable sum

* fix bug

* update all reduce to stable sum for float

* fix bug and pass the sum stable test

* remove debug info

* style

* update with shape

* fix bug

* add host parameters to cuda_emitter

* clang format

* fix bugs

* add element::type support

* format

* add a cached value with datatype name

* add init_reduce_value

* unroll loop

* optimization

* remove the need for init_value

* add memset kernel

* add memcpy

* working version

* remove debug info

* add comments, clean up code.

* change in_idx to input_idx

* fix bug

* change args name for memset in emitter

* pass element::Type instead of string

* the op::reduce come with init value, add support

* resolve codacy-bot comment

* fix bug

* resove codacy-bot comment

* add soft_max_block_reduce kernel

* fix bugs

* add softmax_block_reduce to cuda_emitter

* compiing ok, result wrong

* fix bug in kernel

* working version

* removed unused code

* remove unused comments, resolve comments

* cuda reduce for max, min, mul, reduce op init value, format

* use type::info

* use type info for numeric_limits

* remove code from gpu_host_parameters

* header

* remvoe outdated comments

* add helper to check if stable sum is needed

* add stable sum test for double

* remove extra line

* consolidate helper functions

* no need list now.

* remove extra ;

* clang format

* style

* add skip test for cpu and intelGPU side

* resolve more conflict

* update comment

* fix a warning

* Update src/ngraph/runtime/gpu/gpu_cuda_kernel_builder.cpp

using load.
Co-Authored-By: fengleitian <35274053+fengleitian@users.noreply.github.com>

* using WARPSIZE instead of 32, using lambda

* more WARPSIZE instead of 32

* fix block_size_x bug

* using __expf

a3133482

Name	Last commit	Last update
..
ngraph		Loading commit data...
resource		Loading commit data...
tools		Loading commit data...
CMakeLists.txt		Loading commit data...