Chris Sullivan authored
* Current cuDNN implementations use only a single dimension (width) for the ngraph tensor data. In this case the tensor format should be set to CUDNN_TENSOR_NCHW so that adjacent memory accesses are coalesced (stride = 1 for width); see the descriptor sketch after this list.
* Added some kernel emitter helpers that are reused often.
* Renamed EmitElementwise -> emit_elementwise to match emit<T>.
* op::Sum now handles the trivial case of dim(input_tensor) = dim(output_tensor) by performing a memcpy, as no axes are reduced (sketched below).
* Added a general case for Nd descriptors, used when the tensor has more than 4 dimensions (see the Nd sketch below). Currently a naive reduce is performed; in the future, a coordinate transformation could improve the memory layout for the reduction.
* Switched to codegen::CodeWriter::block_begin/end. CodeWriter::block_begin/end has not been used much in the cpu and gpu transformer emitters because a block comment is often desired alongside the braces, so prefix/suffix default parameters were added to CodeWriter::block_begin/end to capture that use case (a simplified sketch follows).
bae77590
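A minimal sketch of the flat-descriptor idea, assuming cuDNN's C API; `make_flat_descriptor` and `nelems` are hypothetical names, not the commit's actual emitter code, and error checking is elided. Packing the element count into the W dimension of a 4d NCHW descriptor puts it on the fastest-varying axis (stride 1), so adjacent accesses coalesce:

```cpp
#include <cudnn.h>
#include <cstddef>

// Hypothetical helper: describe a flat (width-only) ngraph tensor
// to cuDNN. With CUDNN_TENSOR_NCHW and n = c = h = 1, the element
// count sits in W, the fastest-varying axis (stride = 1), so
// adjacent threads touch adjacent memory.
cudnnTensorDescriptor_t make_flat_descriptor(size_t nelems)
{
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensor4dDescriptor(desc,
                               CUDNN_TENSOR_NCHW,
                               CUDNN_DATA_FLOAT,
                               /*n=*/1,
                               /*c=*/1,
                               /*h=*/1,
                               /*w=*/static_cast<int>(nelems));
    return desc;
}
```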
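The trivial op::Sum path, sketched under the same caveat (hypothetical names, the real code is an emitter that generates kernels rather than a runtime helper): when the reduction-axes set is empty, input and output shapes match, and a device-to-device copy stands in for the reduction:

```cpp
#include <cuda_runtime.h>
#include <cstddef>
#include <set>

// Hypothetical helper illustrating the degenerate Sum case.
void emit_sum(void* out, const void* in, size_t tensor_bytes,
              const std::set<size_t>& reduction_axes)
{
    if (reduction_axes.empty())
    {
        // No axes reduced: dim(input_tensor) == dim(output_tensor),
        // so the "reduction" is just a device-to-device copy.
        cudaMemcpy(out, in, tensor_bytes, cudaMemcpyDeviceToDevice);
        return;
    }
    // ... otherwise emit/launch an actual reduction kernel ...
}
```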
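For the >4-dimension case, cuDNN's Nd descriptor takes explicit dims and strides instead of a 4d format enum. A sketch assuming a packed row-major layout, where each stride is the product of the dims to its right (`make_nd_descriptor` is again an illustrative name):

```cpp
#include <cudnn.h>
#include <vector>

// Hypothetical helper: build an Nd descriptor for a packed
// row-major tensor with more than 4 dimensions.
cudnnTensorDescriptor_t make_nd_descriptor(const std::vector<int>& dims)
{
    std::vector<int> strides(dims.size(), 1);
    for (int i = static_cast<int>(dims.size()) - 2; i >= 0; i--)
    {
        strides[i] = strides[i + 1] * dims[i + 1];
    }
    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensorNdDescriptor(desc,
                               CUDNN_DATA_FLOAT,
                               static_cast<int>(dims.size()),
                               dims.data(),
                               strides.data());
    return desc;
}
```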
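Finally, a simplified stand-in for the prefix/suffix change to block_begin/block_end; the real codegen::CodeWriter streams through operator<< and manages indentation differently, so this only shows the shape of the API. Defaulted parameters keep existing block_begin()/block_end() call sites working while letting an emitter attach a comment to the braces it opens and closes:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Simplified stand-in for codegen::CodeWriter.
class CodeWriter
{
public:
    void block_begin(const std::string& prefix = "")
    {
        emit_line("{" + prefix);
        m_indent++;
    }
    void block_end(const std::string& suffix = "")
    {
        m_indent--;
        emit_line("}" + suffix);
    }
    void emit_line(const std::string& line)
    {
        for (int i = 0; i < m_indent; i++)
        {
            m_out << "    ";
        }
        m_out << line << "\n";
    }
    std::string str() const { return m_out.str(); }

private:
    std::ostringstream m_out;
    int m_indent = 0;
};

int main()
{
    CodeWriter writer;
    writer.block_begin(" // emit Abs");
    writer.emit_line("out[tid] = fabsf(in[tid]);");
    writer.block_end(" // end Abs");
    std::cout << writer.str();
    // Prints:
    // { // emit Abs
    //     out[tid] = fabsf(in[tid]);
    // } // end Abs
    return 0;
}
```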