    nvgpu reduce to scalar optimization (#1491) · 5f40d957
    Fenglei authored
    * add cuda reduce
    
    * clang format
    
    * fix bugs
    
    * fix bug
    
    * add 1d reduce
    
    * clang format
    
    * fix bugs
    
    * unroll loop
    
    * remove debug info
    
    * revert tests
    
    * unroll 1D reduce op
    
    * add comments
    
    * using cudnn for nd to scalar reduction
    
    * remove cuda 1d reduction since cudnn version is faster
    
    * remove 1D kernel
    
    * fix bugs
    
    * 1d multi block size
    
    * remove debug
    
    * change kernel name
    
    * add reduce to scalar optimization, add test
    
    * fix bugs and tune parameters
    
    * clang format
    
    * update comments
    
    * update comments
    
    * update comments
    
    * clang format
    
    * update comments
    
    * remove wrong comments, apply clang format
    
    * resolve Bob's comment
    
    * clang format
    
    * pass shared mem size from cuLaunchKernel, set unroll loop size through host code
    
    * remove unused code, clang format
    
    * change reduce to thread with shfl for each warp first
    
    * add seed
    
    * unroll size
Name
.ci
cmake
contrib/docker
doc
licenses
maint
python
src
test
.clang-format
.gitignore
.gitmodules
.travis.yml
CMakeLists.txt
CONTRIB.md
INSTALL.md
LICENSE
README.md
VERSION.in
changes.md