Fenglei authored
* add gpu_timer to external function
* compiled version
* working version
* using block_begin and block_end
* add the missing ' ;'
* move slice to cuda emitter
* change size_t to uint32_t in kernel
* working version
* change block size from 1 to 64
* fix bugs
* nthreads need to be size_t in broadcast op
* add rank to kernel name hash
* change reshape to cuda_emitter
* fix bugs
* bug, remove rank from kernel
* clang format
* update slice in convolution
* resolve index conflict
* change align to align_to_blocksize, add overflow check
* add grid size check and fix pool merge bug
* code style, change names
* fix merge conflict
* change kernel_runner to kernel_launch
b5e69eaa
Name | Last commit | Last update |
---|---|---|
.. | ||
ngraph | ||
resource | ||
tools | ||
CMakeLists.txt |