Files · 5f40d9570eefeee69c98e57b62f6439b00acdee8 · submodule / ngraph

nvgpu reduce to scalar optimization (#1491) · 5f40d957

Fenglei authored Sep 04, 2018

* add cuda reduce

* clang format

* fix bugs

* fix bug

* add 1d reduce

* clang format

* fix bugs

* unroll loop

* remove debug info

* revert tests

* unroll 1D reduce op

* add comments

* using cudnn for nd to scalar reduction

* remove cuda 1d reduction since cudnn version is faster

* remove 1D kernel

* fix bugs

* 1d multi block size

* remove debug

* change kernel name

* add reduce to scalar optimization, add test

* fix bugs and tune parameters

* clang format

* update comments

* update comments

* update comments

* clang format

* update comments

* remove wrong comments, apply clang format

* resolve Bob's comment

* clang format

* pass shared mem size from cuLaunchKernel, set unroll loop size through host code

* remove unused code, clang format

* change reduce to thread with shfl for each warp first

* add seed

* unroll size
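
The last few entries (shuffle-based reduction within each warp first, shared memory size supplied at launch) describe a common reduce-to-scalar pattern. Below is a minimal sketch of that pattern, not the code from this commit: `warp_reduce_sum` and `reduce_to_scalar` are hypothetical names, the kernel handles float sums only, and it uses the runtime `<<<...>>>` launch syntax, whose third parameter plays the same role as `sharedMemBytes` in `cuLaunchKernel`. It assumes a block size that is a multiple of 32 and a single block, so that `out[0]` holds the final scalar.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Sum the values held by the 32 lanes of a warp using register shuffles.
// After the loop, lane 0 holds the warp total.
__inline__ __device__ float warp_reduce_sum(float v)
{
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
    {
        v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    return v;
}

// Hypothetical kernel: each block reduces its share of in[0..n) to out[blockIdx.x].
__global__ void reduce_to_scalar(const float* in, float* out, size_t n)
{
    // One slot per warp; the byte count is supplied by the host at launch.
    extern __shared__ float warp_sums[];

    // Grid-stride loop: every thread accumulates a private partial sum.
    float sum = 0.0f;
    size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;
    for (size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x; i < n;
         i += stride)
    {
        sum += in[i];
    }

    // Step 1: reduce within each warp, no shared memory needed.
    sum = warp_reduce_sum(sum);
    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;
    if (lane == 0)
    {
        warp_sums[warp] = sum;
    }
    __syncthreads();

    // Step 2: the first warp reduces the per-warp partial sums.
    int num_warps = blockDim.x / warpSize;
    if (warp == 0)
    {
        sum = (lane < num_warps) ? warp_sums[lane] : 0.0f;
        sum = warp_reduce_sum(sum);
        if (lane == 0)
        {
            out[blockIdx.x] = sum;
        }
    }
}

int main()
{
    const size_t n = 1 << 20;
    std::vector<float> h_in(n, 1.0f);

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256; // must be a multiple of 32 for this sketch
    const size_t shared_bytes = (threads / 32) * sizeof(float);
    reduce_to_scalar<<<1, threads, shared_bytes>>>(d_in, d_out, n);

    float h_out = 0.0f;
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("sum = %f, expected = %zu\n", h_out, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out == static_cast<float>(n) ? 0 : 1;
}
```

Reducing inside each warp first keeps most of the traffic in registers, so shared memory only has to hold one value per warp rather than one per thread.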


Name
.ci
cmake
contrib/docker
doc
licenses
maint
python
src
test
.clang-format
.gitignore
.gitmodules
.travis.yml
CMakeLists.txt
CONTRIB.md
INSTALL.md
LICENSE
README.md
VERSION.in
changes.md
