    gpu reshape optimization (#1174) · b5e69eaa
    Fenglei authored
    * add gpu_timer to external function
    
    * compiled version
    
    * working version
    
    * using block_begin and block_end
    
    * add the missing '\n;'
    
    * move slice to cuda emitter
    
    * change size_t to uint32_t in kernel
    
    * working version
    
    * change block size from 1 to 64
    
    * fix bugs
    
    * nthreads needs to be size_t in broadcast op
    
    * add rank to kernel name hash
    
    * change reshape to cuda_emitter
    
    * fix bugs
    
    * bug, remove rank from kernel
    
    * clang format
    
    * update slice in convolution
    
    * resolve index conflict
    
    * change align to align_to_blocksize, add overflow check
    
    * add grid size check and fix pool merge bug
    
    * code style, change names
    
    * fix merge conflict
    
    * change kernel_runner to kernel_launch
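
    The align_to_blocksize, overflow check, and grid size check items above
    can be illustrated with a rough host-side sketch. This is not the code
    from this commit: the signature, the max_grid_size parameter, and the
    error handling are assumptions made for illustration. Only the name
    align_to_blocksize, the uint32_t kernel index type, and the block size
    of 64 come from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not the commit's actual implementation): returns the
// number of blocks needed to cover n threads, rounded up to whole blocks.
// max_grid_size is an assumed parameter; its default is the x-dimension grid
// limit on current CUDA devices (2^31 - 1).
uint32_t align_to_blocksize(size_t n,
                            uint32_t block_size,
                            uint32_t max_grid_size = 2147483647u)
{
    if (block_size == 0)
    {
        throw std::invalid_argument("block_size must be nonzero");
    }
    // Overflow check: the kernels index with uint32_t, so the thread count
    // must fit in 32 bits.
    if (n > std::numeric_limits<uint32_t>::max())
    {
        throw std::overflow_error("thread count does not fit in uint32_t");
    }
    // Ceiling division, written so that n + block_size - 1 cannot overflow.
    size_t blocks = n / block_size + (n % block_size != 0 ? 1 : 0);
    // Grid size check: refuse launches that exceed the device limit.
    if (blocks > max_grid_size)
    {
        throw std::out_of_range("grid size exceeds device limit");
    }
    return static_cast<uint32_t>(blocks);
}
```

    With a block size of 64, a reshape over 1000 elements would need
    align_to_blocksize(1000, 64) blocks, with the last block only partly
    used, so a kernel launched this way has to guard against indices at or
    beyond n.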
Name
.ci/travis/ubuntu
cmake
contrib/docker
doc
licenses
maint
python
src
test
.clang-format
.gitignore
.gitmodules
.travis.yml
CMakeLists.txt
CONTRIB.md
INSTALL.md
LICENSE
README.md
VERSION.in
changes.md