    RNN fusion (inference) (#1459) · 4df5ea8b
    Chris Sullivan authored
    * Add op::Sigmoid to nvgpu.
    
    * Bring RNN fusion and concat passes over into GPU from IA. This is a temporary move until generalization and GPU specification can occur.
    
    * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
    
    * Formatting
    
    * Removed unnecessary extra output from the LSTM op (an RNN with sequence length 1, so y = hy).
    
    * Add RNN fusion of LSTM cells within a recurrent layer.
    
    * Formatting.
    
    * Add fusion across RNN layers.
    
    * Formatting.
    
    * Add algebraic simplification.
    
    * Added rnn fusion tests.
    
    * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
    
    * Formatting.
    
    * Removed print statements.
    
    * Formatting.
    
    * Committing missing file.
    
    * Remove concat inputs pass and mkldnn references.
    
    * Fix CMake paths.
    
    * Conflict resolution with merge from master.
    
    * Remove explicit LSTM op support; bare LSTM ops are converted to RNN ops for emission.
    
    * Formatting.
    
    * Use NGRAPH_ASSERT. Formatting of Intel copyright.
    
    * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
    
    * Fix wrong RNN header.
    
    * Formatting.
    
    * Add back lstm op to dispatch table.
    
    * Added RNN test which shows the cuDNN RNN kernel is not producing correct results.
    
    * With the update to AlgebraicSimplification to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
    
    * Bug fix in parameter tensor packing.
    
    * Alias third output element of RNN for cell state (bug fix).
    
    * Resolve numerical correctness issue with negative values in RNN (bug fix).
    Add minimal test to evaluate LSTM and compare with values calculated by hand.
    
    * Add tensor parameter sizes to the kernel hash, as they are kernel-specific.
    
    * Add 2 layer lstm fusion test against by-hand solution.
    
    * Export parameter concatenation to the graph for the cuDNN kernel at both the single RNN layer and multi-layer levels.
    
    * Formatting.
    
    * Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
    
    * Simplify macro support for gpu ops.
    
    * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
    Need to decide how to notify the user of the increased performance available with cuDNN >= 7200.
    
    * Revert lstm_analytic test to explicitly copy data to tensor params.
    
    * Removed namespace arg from NGRAPH_GPU_OP.
    
    * Refactored macros to different header so op_tbl only contains op list.
    
    * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
    
    * doubles -> floats
    
    * Reorganize pass asserts; prepare to replace them with non-throwing pass failures.
    
    * Remove Lstm op and replace it with Rnn.
    
    * Format
    
    * Utilize RETURN_IF_FALSE in the RNN pass to avoid any runtime asserts.
    Note that falling back to the raw (no passes) graph for the 2rnn_3lstm json from MXNet models
    results in a double free inside the memory layout pass. This appears to be a bug
    in Reshape pass-through.
    
    * Removed print statements. Add check on input data and recurrent data.
    
    * Don't reuse memory for non-destructive ops.
    
    * Add back Rnn test.
    
    * Formatting.
    
    * Clean up comments.
    
    * Update test per review comments.