    gpu reshape optimization (#1174) · b5e69eaa
    Fenglei authored
    * add gpu_timer to external function
    
    * compiled version
    
    * working version
    
    * using block_begin and block_end
    
    * add the missing '\n;'
    
    * move slice to cuda emitter
    
    * change size_t to uint32_t in kernel
    
    * working version
    
    * change block size from 1 to 64
    
    * fix bugs
    
    * nthreads needs to be size_t in broadcast op
    
    * add rank to kernel name hash
    
    * change reshape to cuda_emitter
    
    * fix bugs
    
    * bug, remove rank from kernel
    
    * clang format
    
    * update slice in convolution
    
    * resolve index conflict
    
    * change align to align_to_blocksize, add overflow check
    
    * add grid size check and fix pool merge bug
    
    * code style, change names
    
    * fix merge conflict
    
    * change kernel_runner to kernel_launch
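
The entries about `align_to_blocksize`, the overflow check, and the grid size check can be illustrated with a minimal host-side sketch. This is not the code from this commit: the function name `num_blocks_checked`, the constant `kMaxGridDimX`, and the choice of exceptions are hypothetical. It assumes kernels index elements with `uint32_t` (as the message states) and a grid x-dimension limit of 2^31 - 1, which holds on CUDA devices of compute capability 3.0 and later.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Assumed limit on gridDim.x (2^31 - 1 on compute capability 3.0+).
constexpr uint64_t kMaxGridDimX = 2147483647ULL;

// Hypothetical helper: number of blocks needed to cover `nthreads`
// work items when each block holds `block_size` threads.
uint32_t num_blocks_checked(uint64_t nthreads, uint32_t block_size)
{
    if (block_size == 0)
    {
        throw std::invalid_argument("block_size must be non-zero");
    }
    // Overflow check: the kernel is assumed to index with uint32_t,
    // so the element count itself must fit in 32 bits.
    if (nthreads > std::numeric_limits<uint32_t>::max())
    {
        throw std::overflow_error("nthreads does not fit in uint32_t");
    }
    // Ceiling division in 64 bits, so the addition cannot wrap.
    uint64_t blocks = (nthreads + block_size - 1) / block_size;
    // Grid size check: reject launches the device could not schedule.
    if (blocks > kMaxGridDimX)
    {
        throw std::overflow_error("grid size exceeds device limit");
    }
    return static_cast<uint32_t>(blocks);
}
```

With the block size of 64 mentioned above, 65 work items would need two blocks, and a block size of 1 can trip the grid size check for very large element counts.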
Name
ngraph
resource
tools
CMakeLists.txt