    Add reduce sum to the GPU transformer (op::Sum) (#671) · bae77590
    Chris Sullivan authored
    * Current cuDNN implementations use only a single dimension (width)
      for the ngraph tensor data. In this case the tensor format should be
      set to CUDNN_TENSOR_NCHW so that adjacent memory accesses are
      coalesced (stride = 1 for width).
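
    A minimal sketch of that descriptor setup, not the emitter code
    itself; the helper name `make_flat_descriptor`, the float data type,
    and the omitted status checks are assumptions:

```cpp
#include <cudnn.h>

// Hypothetical helper: pack all tensor elements into the W dimension of
// a 4D descriptor. With CUDNN_TENSOR_NCHW the innermost (W) stride is 1,
// so adjacent elements sit in adjacent memory and accesses coalesce.
cudnnTensorDescriptor_t make_flat_descriptor(int n_elements)
{
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc); // status checks omitted in sketch
    cudnnSetTensor4dDescriptor(desc,
                               CUDNN_TENSOR_NCHW,
                               CUDNN_DATA_FLOAT, // assumed data type
                               /*n=*/1, /*c=*/1, /*h=*/1,
                               /*w=*/n_elements);
    return desc;
}
```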
    
    * Added some kernel emitter helpers that are reused often.
    * Renamed EmitElementwise -> emit_elementwise to match emit<T>.
    * op::Sum now handles the trivial case of dim(input_tensor) =
      dim(output_tensor) by performing a memcpy, since no axes are
      reduced (see the sketch below).
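
    The trivial case above amounts to a device-to-device copy. A minimal
    sketch, with hypothetical buffer and size names:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// When no axes are reduced, dim(input) == dim(output) and the sum is
// the identity, so a plain device-to-device memcpy suffices.
void sum_no_axes(const void* input, void* output, std::size_t nbytes)
{
    cudaMemcpy(output, input, nbytes, cudaMemcpyDeviceToDevice);
}
```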
    
    * Added the general case for Nd descriptors, which is used when the
      tensor has more than 4 dimensions. Currently a naive reduce is
      performed; in the future, a coordinate transformation could be
      applied to improve the memory layout for the reduction.
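
    A minimal sketch of the Nd-descriptor path, assuming a densely packed,
    row-major shape; the function name is hypothetical and error checking
    is omitted:

```cpp
#include <cudnn.h>
#include <vector>

// Build a cuDNN Nd descriptor for a tensor with more than 4 dimensions.
cudnnTensorDescriptor_t make_nd_descriptor(const std::vector<int>& shape)
{
    // Densely packed row-major strides: stride[i] = product of dims after i.
    std::vector<int> strides(shape.size());
    int stride = 1;
    for (int i = static_cast<int>(shape.size()) - 1; i >= 0; --i)
    {
        strides[i] = stride;
        stride *= shape[i];
    }

    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensorNdDescriptor(desc,
                               CUDNN_DATA_FLOAT, // assumed data type
                               static_cast<int>(shape.size()),
                               shape.data(),
                               strides.data());
    return desc;
}
```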
    
    * Switched to codegen::CodeWriter::block_begin/end. It appears that
      CodeWriter::block_begin/end has not often been used in the emitters
      (in the cpu and gpu transformers) because a block comment is often
      desired. To this end I added prefix/suffix default parameters to
      CodeWriter::block_begin/end so that this functionality is captured.
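
    A toy sketch of what such prefix/suffix parameters could look like,
    not necessarily the signatures actually added to ngraph's
    codegen::CodeWriter:

```cpp
#include <sstream>
#include <string>

// Toy writer: block_begin/end open and close a brace block, optionally
// emitting prefix text (e.g. a comment) before the '{' and suffix text
// after the '}'.
class CodeWriter
{
public:
    void block_begin(const std::string& prefix = "")
    {
        if (!prefix.empty())
        {
            out_ << pad() << prefix << "\n";
        }
        out_ << pad() << "{\n";
        ++indent_;
    }

    void block_end(const std::string& suffix = "")
    {
        --indent_;
        out_ << pad() << "}" << suffix << "\n";
    }

    std::string str() const { return out_.str(); }

private:
    std::string pad() const { return std::string(4 * indent_, ' '); }
    std::ostringstream out_;
    int indent_ = 0;
};

// Usage: writer.block_begin("// emit reduction loop"); ... writer.block_end();
```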