    nvgpu cuda softmax optimization (#2101) · a3133482
    Fenglei authored
    * add some helper function
    
    * update with new helper function
    
    * update reduce to nd with new helper function
    
    * update float sum to stable sum (see the compensated-sum sketch after this list)
    
    * fix bug
    
    * update all reduce to stable sum for float
    
    * fix bug and pass the stable sum test
    
    * remove debug info
    
    * style
    
    * update with shape
    
    * fix bug
    
    * add host parameters to cuda_emitter
    
    * clang format
    
    * fix bugs
    
    * add element::type support
    
    * format
    
    * add a cached value with datatype name
    
    * add init_reduce_value
    
    * unroll loop
    
    * optimization
    
    * remove the need for init_value
    
    * add memset kernel (see the fill-kernel sketch after this list)
    
    * add memcpy
    
    * working version
    
    * remove debug info
    
    * add comments, clean up code.
    
    * change in_idx to input_idx
    
    * fix bug
    
    * change args name for memset in emitter
    
    * pass element::Type instead of string
    
    * the op::reduce comes with an init value, add support
    
    * resolve codacy-bot comment
    
    * fix bug
    
    * resolve codacy-bot comment
    
    * add soft_max_block_reduce kernel (see the block-reduce sketch after this list)
    
    * fix bugs
    
    * add softmax_block_reduce to cuda_emitter
    
    * compiling ok, result wrong
    
    * fix bug in kernel
    
    * working version
    
    * removed unused code
    
    * remove unused comments, resolve comments
    
    * cuda reduce for max, min, mul, reduce op init value, format
    
    * use type::info
    
    * use type info for numeric_limits
    
    * remove code from gpu_host_parameters
    
    * header
    
    * remove outdated comments
    
    * add helper to check if stable sum is needed
    
    * add stable sum test for double
    
    * remove extra line
    
    * consolidate helper functions
    
    * no need for the list now
    
    * remove extra ;
    
    * clang format
    
    * style
    
    * add skip test for cpu and intelGPU side
    
    * resolve more conflicts
    
    * update comment
    
    * fix a warning
    
    * Update src/ngraph/runtime/gpu/gpu_cuda_kernel_builder.cpp
    
    using load.
    Co-Authored-By: fengleitian <35274053+fengleitian@users.noreply.github.com>
    
    * using WARPSIZE instead of 32, using lambda
    
    * more WARPSIZE instead of 32
    
    * fix block_size_x bug
    
    * using __expf
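
Several of the commits above deal with a "stable sum" for float reductions. That likely refers to a compensated (Kahan-style) accumulation, where a per-thread correction term recovers the low-order bits lost when many small values are added into a large running total. The following is a minimal standalone sketch of that idea, not the kernel this PR emits; the kernel name, launch shape, and test values are illustrative only.

```cuda
// Minimal sketch of a compensated ("stable") float sum.
// Assumes a single-block launch; names are illustrative, not nGraph's.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void stable_sum_kernel(const float* in, float* out, int n)
{
    float sum = 0.0f;
    float c = 0.0f; // running compensation for lost low-order bits
    for (int i = threadIdx.x; i < n; i += blockDim.x)
    {
        float y = in[i] - c;
        float t = sum + y;
        c = (t - sum) - y; // recover what the addition just rounded away
        sum = t;
    }
    // Combine per-thread partials; a plain atomicAdd is enough for a sketch
    // (the compensation term is kept per thread only).
    atomicAdd(out, sum);
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> host(n, 1e-4f); // many small values stress float rounding
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));
    stable_sum_kernel<<<1, 256>>>(d_in, d_out, n);
    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("stable sum = %f\n", result);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```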
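
The memset and init-value commits revolve around seeding the reduce output with the operation's identity element (0 for sum, the lowest representable value for max, and so on) before the accumulation runs. Below is a minimal sketch of such a fill kernel; the names (`fill_kernel`, `seed_max_reduce`) are hypothetical and not taken from the codebase.

```cuda
#include <limits>
#include <cuda_runtime.h>

// Fill a device buffer with a given value, e.g. to seed a reduce output
// with the reduction's identity before the accumulation kernel runs.
template <typename T>
__global__ void fill_kernel(T* out, T value, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = value;
    }
}

// Example: seed a float max-reduce output with the lowest representable float,
// mirroring the numeric_limits-based init values mentioned above.
void seed_max_reduce(float* d_out, size_t n)
{
    const unsigned block = 256;
    const unsigned grid = static_cast<unsigned>((n + block - 1) / block);
    fill_kernel<float><<<grid, block>>>(d_out, std::numeric_limits<float>::lowest(), n);
}
```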
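
Taken together, the soft_max_block_reduce, WARPSIZE, lambda, and __expf commits point at the usual block-per-row softmax pattern: compute the row maximum and the sum of exponentials with warp-shuffle reductions, then normalize. The sketch below shows that pattern in plain CUDA, assuming the block size is a multiple of WARPSIZE; the kernel and helper names are illustrative, not the code the emitter generates.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

#define WARPSIZE 32

// Reduce a value across the whole block: shuffle within each warp,
// then combine the per-warp partials through shared memory.
// `identity` is the value that leaves `op` unchanged (0 for +, -FLT_MAX for max).
template <typename Op>
__device__ float block_reduce(float val, Op op, float identity)
{
    __shared__ float warp_results[WARPSIZE];
    __shared__ float block_result;
    int lane = threadIdx.x % WARPSIZE;
    int warp = threadIdx.x / WARPSIZE;

    for (int offset = WARPSIZE / 2; offset > 0; offset /= 2)
    {
        val = op(val, __shfl_down_sync(0xffffffff, val, offset));
    }
    if (lane == 0)
    {
        warp_results[warp] = val;
    }
    __syncthreads();

    if (warp == 0)
    {
        int num_warps = blockDim.x / WARPSIZE;
        val = (lane < num_warps) ? warp_results[lane] : identity;
        for (int offset = WARPSIZE / 2; offset > 0; offset /= 2)
        {
            val = op(val, __shfl_down_sync(0xffffffff, val, offset));
        }
        if (lane == 0)
        {
            block_result = val;
        }
    }
    __syncthreads();
    return block_result;
}

// One block per softmax row; blockDim.x must be a multiple of WARPSIZE.
__global__ void softmax_row_kernel(const float* in, float* out, int row_len)
{
    const float* row_in = in + (size_t)blockIdx.x * row_len;
    float* row_out = out + (size_t)blockIdx.x * row_len;

    // 1) row maximum, for numerical stability
    float local_max = -FLT_MAX;
    for (int i = threadIdx.x; i < row_len; i += blockDim.x)
    {
        local_max = fmaxf(local_max, row_in[i]);
    }
    float row_max =
        block_reduce(local_max, [](float a, float b) { return fmaxf(a, b); }, -FLT_MAX);

    // 2) sum of exponentials, using the fast __expf intrinsic
    float local_sum = 0.0f;
    for (int i = threadIdx.x; i < row_len; i += blockDim.x)
    {
        local_sum += __expf(row_in[i] - row_max);
    }
    float row_sum = block_reduce(local_sum, [](float a, float b) { return a + b; }, 0.0f);

    // 3) normalize
    for (int i = threadIdx.x; i < row_len; i += blockDim.x)
    {
        row_out[i] = __expf(row_in[i] - row_max) / row_sum;
    }
}

// Example launch: softmax_row_kernel<<<num_rows, 128>>>(d_in, d_out, row_len);
```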