    nvgpu reduce to scalar optimization (#1491) · 5f40d957
    Fenglei authored
    * add cuda reduce
    
    * clang format
    
    * fix bugs
    
    * fix bug
    
    * add 1d reduce
    
    * clang format
    
    * fix bugs
    
    * unroll loop
    
    * remove debug info
    
    * revert tests
    
    * unroll 1D reduce op
    
    * add comments
    
    * using cudnn for nd to scalar reduction
    
    * remove cuda 1d reduction since cudnn version is faster
    
    * remove 1D kernel
    
    * fix bugs
    
    * 1d multi block size
    
    * remove debug
    
    * change kernel name
    
    * add reduce to scalar optimization, add test
    
    * fix bugs and tune parameters
    
    * clang format
    
    * update comments
    
    * update comments
    
    * update comments
    
    * clang format
    
    * update comments
    
    * remove wrong comments, apply clang format
    
    * resolve Bob's comment
    
    * clang format
    
    * pass shared mem size from cuLaunchKernel, set unroll loop size through host code
    
    * remove unused code, clang format
    
    * change reduce to thread with shfl for each warp first
    
    * add seed
    
    * unroll size
Name
files
models
ref_generators
util
CMakeLists.txt
algebraic_simplification.cpp
all_close_f.cpp
assertion.cpp
autodiff.in.cpp
backend_api.cpp
backend_debug_api.cpp
backend_performance.cpp
backend_test.in.cpp
build_graph.cpp
builder.cpp
builder_autobroadcast.cpp
constant_folding.cpp
convolution_test.in.cpp
copy.cpp
core_fusion.cpp
cpio.cpp
cpu_fusion.cpp
cpu_test.cpp
cse.cpp
cudnn.cpp
distributed.cpp
element_type.cpp
file_util.cpp
gpu_test.cpp
graph_partition.cpp
includes.cpp
inliner.cpp
input_output_assign.cpp
main.cpp
mkldnn.cpp
nop_elimination.cpp
onnx_import.cpp
op.cpp
pass_liveness.cpp
pass_manager.cpp
pass_memory_layout.cpp
pattern.cpp
reshape_elimination.cpp
serialize.cpp
shape.cpp
tensor.cpp
type_prop.cpp
update_reference.sh
util.cpp
uuid.cpp
zero_dim_tensor_elimination.cpp