nvgpu reduction optimization (#1455)
* add cuda reduce
* clang format
* fix bugs
* fix bug
* add 1d reduce
* clang format
* fix bugs
* unroll loop
* remove debug info
* revert tests
* unroll 1D reduce op
* add comments
* using cudnn for nd to scalar reduction
* remove cuda 1d reduction since cudnn version is faster
* remove 1D kernel
* fix variable name
* resolve Chris's comments
* non_reduce_in_strides to non_reduce_strides