    gpu slice optimization (#1172) · f243d035
    Fenglei authored
    * add gpu_timer to external function
    
    * compiled version
    
    * working version
    
    * using block_begin and block_end
    
    * add the missing ';'
    
    * move slice to CUDA emitter
    
    * change size_t to uint32_t in kernel
    
    * working version
    
    * change block size from 1 to 64
    
    * fix bugs
    
    * nthreads needs to be size_t in broadcast op
    
    * add rank to kernel name hash
    
    * update slice in convolution
    
    * resolve index conflict
    
    * change align to align_to_blocksize, add overflow check
    
    * add grid size check and fix pool merge bug
    
    * code style, change names
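The launch-sizing items in this commit (block size changed from 1 to 64, align_to_blocksize with an overflow check, a grid size check, and uint32_t indices inside the kernel) fit together roughly as in the host-side sketch below. It is an illustration under stated assumptions, not the code from this commit: LaunchDims, compute_launch_dims and kMaxGridX are hypothetical names, only align_to_blocksize is taken from the commit message (its signature here is assumed), and the 2^31 - 1 grid limit assumes a device of compute capability 3.0 or later.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical launch geometry for a 1-D kernel.
struct LaunchDims
{
    uint32_t grid_x;  // number of blocks
    uint32_t block_x; // threads per block
};

// Assumed signature: rounds nthreads up to a multiple of block_size and checks
// that the padded total still fits in uint32_t, the index type used in the kernel.
uint32_t align_to_blocksize(size_t nthreads, uint32_t block_size)
{
    if (block_size == 0)
    {
        throw std::invalid_argument("block_size must be non-zero");
    }
    // Ceiling division without computing nthreads + block_size - 1,
    // which could itself overflow size_t for very large nthreads.
    size_t nblocks = nthreads / block_size + (nthreads % block_size != 0 ? 1 : 0);
    const size_t u32_max = std::numeric_limits<uint32_t>::max();
    // nblocks * block_size <= u32_max exactly when nblocks <= u32_max / block_size.
    if (nblocks > u32_max / block_size)
    {
        throw std::overflow_error("aligned thread count exceeds uint32_t");
    }
    return static_cast<uint32_t>(nblocks * block_size);
}

// Assumption: gridDim.x may not exceed 2^31 - 1 (compute capability >= 3.0).
constexpr uint32_t kMaxGridX = 2147483647u;

// Hypothetical helper: 64 threads per block by default, mirroring the
// "change block size from 1 to 64" item. A zero-sized grid means the caller
// should skip the launch entirely.
LaunchDims compute_launch_dims(size_t nthreads, uint32_t block_size = 64)
{
    uint32_t aligned = align_to_blocksize(nthreads, block_size);
    uint32_t grid_x = aligned / block_size;
    if (grid_x > kMaxGridX)
    {
        throw std::runtime_error("grid size exceeds device limit");
    }
    return LaunchDims{grid_x, block_size};
}
```

For example, compute_launch_dims(1000) pads 1000 threads to 1024 and yields 16 blocks of 64; the 24 padding threads have to be masked off inside the kernel with an index bound check.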
Name
.ci/travis/ubuntu
cmake
contrib/docker
doc
licenses
maint
python
src
test
.clang-format
.gitignore
.gitmodules
.travis.yml
CMakeLists.txt
CONTRIB.md
INSTALL.md
LICENSE
README.md
VERSION.in
changes.md