-
Fenglei authored
* add gpu_timer to external function * compiled version * working version * using block_begin and block_end * add the missing ' ;' * move slice to cuda emiter * change size_t to uint32_t in kernel * working version * change block size from 1 to 64 * fix bugs * nthreads need to be size_t in broadcast op * add rank to kernel name hash * update slice in convolution * resolve index conflict * change align to align_to_blocksize, add overflow check * add gird size check and fix pool merge bug * code style, change names
f243d035
Name |
Last commit
|
Last update |
---|---|---|
.ci/travis/ubuntu | ||
cmake | ||
contrib/docker | ||
doc | ||
licenses | ||
maint | ||
python | ||
src | ||
test | ||
.clang-format | ||
.gitignore | ||
.gitmodules | ||
.travis.yml | ||
CMakeLists.txt | ||
CONTRIB.md | ||
INSTALL.md | ||
LICENSE | ||
README.md | ||
VERSION.in | ||
changes.md |