    Add reduce sum to the GPU transformer (op::Sum) (#671) · bae77590
    Chris Sullivan authored
    * Current cuDNN implementations use only a single dimension (width)
      for the ngraph tensor data. In this case the tensor format should be
      set to CUDNN_TENSOR_NCHW so that adjacent memory accesses are
      coalesced (stride = 1 for width).
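
    A minimal sketch of that descriptor setup, not the emitter code
    itself; the helper name `make_flat_descriptor`, the float data type,
    and the omitted status checks are assumptions:

```cpp
#include <cudnn.h>

// Hypothetical helper: pack all tensor elements into the W dimension of
// a 4D descriptor. With CUDNN_TENSOR_NCHW the innermost (W) stride is 1,
// so adjacent elements sit in adjacent memory and accesses coalesce.
cudnnTensorDescriptor_t make_flat_descriptor(int n_elements)
{
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc); // status checks omitted in sketch
    cudnnSetTensor4dDescriptor(desc,
                               CUDNN_TENSOR_NCHW,
                               CUDNN_DATA_FLOAT, // assumed data type
                               /*n=*/1, /*c=*/1, /*h=*/1,
                               /*w=*/n_elements);
    return desc;
}
```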
    
    * Added some kernel emitter helpers that are reused often.
    * Renamed EmitElementwise -> emit_elementwise to match emit<T>.
    * op::Sum now handles the trivial case of dim(input_tensor) =
      dim(output_tensor) by performing a memcpy, since no axes are
      reduced (see the sketch below).
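
    The trivial case above amounts to a device-to-device copy. A minimal
    sketch, with hypothetical buffer and size names:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// When no axes are reduced, dim(input) == dim(output) and the sum is
// the identity, so a plain device-to-device memcpy suffices.
void sum_no_axes(const void* input, void* output, std::size_t nbytes)
{
    cudaMemcpy(output, input, nbytes, cudaMemcpyDeviceToDevice);
}
```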
    
    * Added the general case for Nd descriptors, which is used when the
      tensor has more than 4 dimensions. Currently a naive reduce is
      performed; in the future, a coordinate transformation could be
      applied to improve the memory layout for the reduction.
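
    A minimal sketch of the Nd-descriptor path, assuming a densely packed,
    row-major shape; the function name is hypothetical and error checking
    is omitted:

```cpp
#include <cudnn.h>
#include <vector>

// Build a cuDNN Nd descriptor for a tensor with more than 4 dimensions.
cudnnTensorDescriptor_t make_nd_descriptor(const std::vector<int>& shape)
{
    // Densely packed row-major strides: stride[i] = product of dims after i.
    std::vector<int> strides(shape.size());
    int stride = 1;
    for (int i = static_cast<int>(shape.size()) - 1; i >= 0; --i)
    {
        strides[i] = stride;
        stride *= shape[i];
    }

    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensorNdDescriptor(desc,
                               CUDNN_DATA_FLOAT, // assumed data type
                               static_cast<int>(shape.size()),
                               shape.data(),
                               strides.data());
    return desc;
}
```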
    
    * Switched to codegen::CodeWriter::block_begin/end. It appears that
      CodeWriter::block_begin/end has not often been used in the emitters
      (in the cpu and gpu transformers) because a block comment is often
      desired. To this end I added prefix/suffix default parameters to
      CodeWriter::block_begin/end so that this functionality is captured.
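
    A toy sketch of what such prefix/suffix parameters could look like,
    not necessarily the signatures actually added to ngraph's
    codegen::CodeWriter:

```cpp
#include <sstream>
#include <string>

// Toy writer: block_begin/end open and close a brace block, optionally
// emitting prefix text (e.g. a comment) before the '{' and suffix text
// after the '}'.
class CodeWriter
{
public:
    void block_begin(const std::string& prefix = "")
    {
        if (!prefix.empty())
        {
            out_ << pad() << prefix << "\n";
        }
        out_ << pad() << "{\n";
        ++indent_;
    }

    void block_end(const std::string& suffix = "")
    {
        --indent_;
        out_ << pad() << "}" << suffix << "\n";
    }

    std::string str() const { return out_.str(); }

private:
    std::string pad() const { return std::string(4 * indent_, ' '); }
    std::ostringstream out_;
    int indent_ = 0;
};

// Usage: writer.block_begin("// emit reduction loop"); ... writer.block_end();
```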