test/backend_test.in.cpp · 5f40d9570eefeee69c98e57b62f6439b00acdee8 · submodule / ngraph

nvgpu reduce to scalar optimization (#1491) · 5f40d957

Fenglei authored Sep 04, 2018

* add cuda reduce

* clang format

* fix bugs

* fix bug

* add 1d reduce

* clang format

* fix bugs

* unroll loop

* remove debug info

* revert tests

* unroll 1D reduce op

* add comments

* using cudnn for nd to scalar reduction

* remove cuda 1d reduction since cudnn version is faster

* remove 1D kernel

* fix bugs

* 1d multi block size

* remove debug

* change kernel name

* add reduce to scalar optimization, add test

* fix bugs and tune parameters

* clang format

* update comments

* update comments

* update comments

* clang format

* update comments

* remove wrong comments, apply clang format

* resolve Bob's comment

* clang format

* pass shared mem size from cuLaunchKernel, set unroll loop size through host code

* remove unused code.clang format

* change reduce to thread with shfl for each warp first

* add seed

* unroll size

5f40d957

backend_test.in.cpp 365 KB

Replace backend_test.in.cpp