    nvgpu cuda softmax optimization (#2101) · a3133482
    Fenglei authored
    * add some helper function
    
    * update with new helper function
    
    * update reduce to nd with new helper function
    
    * update float sum to stable sum (see the compensated-sum sketch after this list)
    
    * fix bug
    
    * update all reduce to stable sum for float
    
    * fix bug and pass the stable sum test
    
    * remove debug info
    
    * style
    
    * update with shape
    
    * fix bug
    
    * add host parameters to cuda_emitter
    
    * clang format
    
    * fix bugs
    
    * add element::type support
    
    * format
    
    * add a cached value with datatype name
    
    * add init_reduce_value
    
    * unroll loop
    
    * optimization
    
    * remove the need for init_value
    
    * add memset kernel (see the fill-kernel sketch after this list)
    
    * add memcpy
    
    * working version
    
    * remove debug info
    
    * add comments, clean up code.
    
    * change in_idx to input_idx
    
    * fix bug
    
    * change args name for memset in emitter
    
    * pass element::Type instead of string
    
    * the op::reduce comes with an init value, add support
    
    * resolve codacy-bot comment
    
    * fix bug
    
    * resolve codacy-bot comment
    
    * add soft_max_block_reduce kernel (see the block-reduce sketch after this list)
    
    * fix bugs
    
    * add softmax_block_reduce to cuda_emitter
    
    * compiling ok, result wrong
    
    * fix bug in kernel
    
    * working version
    
    * removed unused code
    
    * remove unused comments, resolve comments
    
    * cuda reduce for max, min, mul, reduce op init value, format
    
    * use type::info
    
    * use type info for numeric_limits
    
    * remove code from gpu_host_parameters
    
    * header
    
    * remove outdated comments
    
    * add helper to check if stable sum is needed
    
    * add stable sum test for double
    
    * remove extra line
    
    * consolidate helper functions
    
    * no need for the list now
    
    * remove extra ;
    
    * clang format
    
    * style
    
    * add skip test for cpu and intelGPU side
    
    * resolve more conflicts
    
    * update comment
    
    * fix a warning
    
    * Update src/ngraph/runtime/gpu/gpu_cuda_kernel_builder.cpp
    
    using load.
    Co-Authored-By: fengleitian <35274053+fengleitian@users.noreply.github.com>
    
    * using WARPSIZE instead of 32, using lambda
    
    * more WARPSIZE instead of 32
    
    * fix block_size_x bug
    
    * using __expf
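
Several of the commits above deal with a "stable sum" for float reductions. That likely refers to a compensated (Kahan-style) accumulation, where a per-thread correction term recovers the low-order bits lost when many small values are added into a large running total. The following is a minimal standalone sketch of that idea, not the kernel this PR emits; the kernel name, launch shape, and test values are illustrative only.

```cuda
// Minimal sketch of a compensated ("stable") float sum.
// Assumes a single-block launch; names are illustrative, not nGraph's.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void stable_sum_kernel(const float* in, float* out, int n)
{
    float sum = 0.0f;
    float c = 0.0f; // running compensation for lost low-order bits
    for (int i = threadIdx.x; i < n; i += blockDim.x)
    {
        float y = in[i] - c;
        float t = sum + y;
        c = (t - sum) - y; // recover what the addition just rounded away
        sum = t;
    }
    // Combine per-thread partials; a plain atomicAdd is enough for a sketch
    // (the compensation term is kept per thread only).
    atomicAdd(out, sum);
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> host(n, 1e-4f); // many small values stress float rounding
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));
    stable_sum_kernel<<<1, 256>>>(d_in, d_out, n);
    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("stable sum = %f\n", result);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```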
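
The memset and init-value commits revolve around seeding the reduce output with the operation's identity element (0 for sum, the lowest representable value for max, and so on) before the accumulation runs. Below is a minimal sketch of such a fill kernel; the names (`fill_kernel`, `seed_max_reduce`) are hypothetical and not taken from the codebase.

```cuda
#include <limits>
#include <cuda_runtime.h>

// Fill a device buffer with a given value, e.g. to seed a reduce output
// with the reduction's identity before the accumulation kernel runs.
template <typename T>
__global__ void fill_kernel(T* out, T value, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = value;
    }
}

// Example: seed a float max-reduce output with the lowest representable float,
// mirroring the numeric_limits-based init values mentioned above.
void seed_max_reduce(float* d_out, size_t n)
{
    const unsigned block = 256;
    const unsigned grid = static_cast<unsigned>((n + block - 1) / block);
    fill_kernel<float><<<grid, block>>>(d_out, std::numeric_limits<float>::lowest(), n);
}
```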
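
Taken together, the soft_max_block_reduce, WARPSIZE, lambda, and __expf commits point at the usual block-per-row softmax pattern: compute the row maximum and the sum of exponentials with warp-shuffle reductions, then normalize. The sketch below shows that pattern in plain CUDA, assuming the block size is a multiple of WARPSIZE; the kernel and helper names are illustrative, not the code the emitter generates.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

#define WARPSIZE 32

// Reduce a value across the whole block: shuffle within each warp,
// then combine the per-warp partials through shared memory.
// `identity` is the value that leaves `op` unchanged (0 for +, -FLT_MAX for max).
template <typename Op>
__device__ float block_reduce(float val, Op op, float identity)
{
    __shared__ float warp_results[WARPSIZE];
    __shared__ float block_result;
    int lane = threadIdx.x % WARPSIZE;
    int warp = threadIdx.x / WARPSIZE;

    for (int offset = WARPSIZE / 2; offset > 0; offset /= 2)
    {
        val = op(val, __shfl_down_sync(0xffffffff, val, offset));
    }
    if (lane == 0)
    {
        warp_results[warp] = val;
    }
    __syncthreads();

    if (warp == 0)
    {
        int num_warps = blockDim.x / WARPSIZE;
        val = (lane < num_warps) ? warp_results[lane] : identity;
        for (int offset = WARPSIZE / 2; offset > 0; offset /= 2)
        {
            val = op(val, __shfl_down_sync(0xffffffff, val, offset));
        }
        if (lane == 0)
        {
            block_result = val;
        }
    }
    __syncthreads();
    return block_result;
}

// One block per softmax row; blockDim.x must be a multiple of WARPSIZE.
__global__ void softmax_row_kernel(const float* in, float* out, int row_len)
{
    const float* row_in = in + (size_t)blockIdx.x * row_len;
    float* row_out = out + (size_t)blockIdx.x * row_len;

    // 1) row maximum, for numerical stability
    float local_max = -FLT_MAX;
    for (int i = threadIdx.x; i < row_len; i += blockDim.x)
    {
        local_max = fmaxf(local_max, row_in[i]);
    }
    float row_max =
        block_reduce(local_max, [](float a, float b) { return fmaxf(a, b); }, -FLT_MAX);

    // 2) sum of exponentials, using the fast __expf intrinsic
    float local_sum = 0.0f;
    for (int i = threadIdx.x; i < row_len; i += blockDim.x)
    {
        local_sum += __expf(row_in[i] - row_max);
    }
    float row_sum = block_reduce(local_sum, [](float a, float b) { return a + b; }, 0.0f);

    // 3) normalize
    for (int i = threadIdx.x; i < row_len; i += blockDim.x)
    {
        row_out[i] = __expf(row_in[i] - row_max) / row_sum;
    }
}

// Example launch: softmax_row_kernel<<<num_rows, 128>>>(d_in, d_out, row_len);
```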