Nd convolution via blocked GEMM for C{d1,...,dn}N layout (#1131)
* Added blank convolution kernel and refactored coordinate transform kernel helper.
* Added op::Reshape to the CUDAEmitter.
* Added 2-Nd tiled convolution.
* Bug fixes with data_dilation and filter loop. Still need to add test for coverage of register tiling.
* Styling.
* Removed some comments and code added for testing.
* Some tests became enabled in merge, removing them.
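For readers unfamiliar with the technique named in the title, here is a minimal host-side sketch of a convolution expressed as a tiled (blocked) implicit GEMM. It is not the kernel from this commit. It assumes the 1-d case, stride 1, no padding and no dilation, an input laid out as [C][D][N], a filter laid out as [C][R][K], and an output laid out as [K][O][N]. The function name `conv1d_cdn_gemm`, the `tile` parameter, and the filter and output layouts are hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the commit.
// Computes a 1-d cross-correlation (the usual deep-learning "convolution")
// as an implicit GEMM: out[K x (O*N)] = filter^T[K x (C*R)] * patches[(C*R) x (O*N)].
// The patch matrix is never materialized; its entries are read from the input
// by index arithmetic. Assumed layouts: input [C][D][N], filter [C][R][K],
// output [K][O][N].
std::vector<float> conv1d_cdn_gemm(const std::vector<float>& input,
                                   const std::vector<float>& filter,
                                   std::size_t C, std::size_t D, std::size_t N,
                                   std::size_t R, std::size_t K,
                                   std::size_t tile = 4)
{
    assert(D >= R && tile > 0);
    assert(input.size() == C * D * N && filter.size() == C * R * K);

    const std::size_t O = D - R + 1; // output length: stride 1, no padding
    const std::size_t rows = K;      // GEMM rows: output channels
    const std::size_t cols = O * N;  // GEMM columns: (output position, batch)
    const std::size_t depth = C * R; // reduction: (input channel, filter tap)
    std::vector<float> output(rows * cols, 0.0f);

    // Walk the output matrix one tile x tile block at a time.
    for (std::size_t m0 = 0; m0 < rows; m0 += tile)
    {
        for (std::size_t n0 = 0; n0 < cols; n0 += tile)
        {
            const std::size_t m_end = std::min(m0 + tile, rows);
            const std::size_t n_end = std::min(n0 + tile, cols);
            for (std::size_t m = m0; m < m_end; ++m)
            {
                for (std::size_t n = n0; n < n_end; ++n)
                {
                    const std::size_t o = n / N; // output position
                    const std::size_t b = n % N; // batch index (innermost)
                    float acc = 0.0f;
                    for (std::size_t k = 0; k < depth; ++k)
                    {
                        const std::size_t c = k / R; // input channel
                        const std::size_t r = k % R; // filter tap
                        acc += filter[(c * R + r) * K + m] *
                               input[(c * D + (o + r)) * N + b];
                    }
                    output[m * cols + n] = acc;
                }
            }
        }
    }
    return output;
}
```

With the batch index innermost, neighbouring GEMM columns within one output position map to neighbouring input addresses. A GPU version would presumably assign each output tile to a thread block and keep partial sums in registers; that part is not shown here.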