    RNN fusion (inference) (#1459) · 4df5ea8b
    Chris Sullivan authored
    * Add op::Sigmoid to nvgpu.
    
    * Bring RNN fusion and concat passes over into GPU from IA. This is a temporary move until generalization and GPU specification can occur.
    
    * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
    
    * Formatting
    
    * Removed unnecessary extra output from the LSTM op (an RNN with sequence length 1, so y = hy).
    
    * Add RNN fusion of LSTM cells within a recurrent layer.
    
    * Formatting.
    
    * Add fusion across RNN layers.
    
    * Formatting.
    
    * Add algebraic simplification.
    
    * Added rnn fusion tests.
    
    * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
    
    * Formatting.
    
    * Removed print statements.
    
    * Formatting.
    
    * Committing missing file.
    
    * Remove concat inputs pass and mkldnn references.
    
    * Fix CMake paths.
    
    * Conflict resolution with merge from master.
    
    * Remove explicit LSTM op support; bare LSTM ops are converted to RNN ops for emission.
    
    * Formatting.
    
    * Use NGRAPH_ASSERT. Formatting of Intel copyright.
    
    * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
    
    * Fix wrong RNN header.
    
    * Formatting.
    
    * Add back lstm op to dispatch table.
    
    * Added RNN test which shows the cuDNN RNN kernel is not producing correct results.
    
    * With the update to AlgebraicSimplification to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
    
    * Bug fix in parameter tensor packing.
    
    * Alias third output element of RNN for cell state (bug fix).
    
    * Resolve numerical correctness issue with negative values in RNN (bug fix).
    Add minimal test to evaluate LSTM and compare with values calculated by hand.
    
    * Add tensor parameter sizes to the kernel hash, as they are kernel-specific.
    
    * Add 2 layer lstm fusion test against by-hand solution.
    
    * Export parameter concatenation to the graph for the cuDNN kernel at both the single RNN layer and multi-layer levels.
    
    * Formatting.
    
    * Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
    
    * Simplify macro support for gpu ops.
    
    * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
    Need to decide how to notify the user of the increased performance available with cuDNN >= 7200.
    
    * Revert lstm_analytic test to explicitly copy data to tensor params.
    
    * Removed namespace arg from NGRAPH_GPU_OP.
    
    * Refactored macros to different header so op_tbl only contains op list.
    
    * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
    
    * doubles -> floats
    
    * Reorganize pass asserts; prepare to replace them with non-throwing pass failures.
    
    * Remove Lstm op and replace it with Rnn.
    
    * Format
    
    * Utilize RETURN_IF_FALSE in the RNN pass to avoid any runtime asserts.
    Note that falling back to the raw (no passes) graph for the 2rnn_3lstm json from MXNet models
    results in a double free inside the memory layout pass. This appears to be a bug
    in Reshape pass-through.
    
    * Removed print statements. Add check on input data and recurrent data.
    
    * Don't reuse memory for non-destructive ops.
    
    * Add back Rnn test.
    
    * Formatting.
    
    * Clean up comments.
    
    * Update test per review comments.