  1. 14 Oct, 2018 1 commit
  2. 13 Oct, 2018 6 commits
  3. 12 Oct, 2018 10 commits
  4. 11 Oct, 2018 3 commits
  5. 10 Oct, 2018 3 commits
    • add back missing part (#1785) · a41c1baa
      Fenglei authored
      a41c1baa
    • nvgpu one hot update (#1773) · 6cd35432
      Fenglei authored
      * update onehot
      
      * clang
      
      * fix bugs
      
      * format
      
      * add output_datatype_size to hash
      
      * typo
      
      * hash
      6cd35432
    • Reshape Sinking (#1701) · f642bc4c
      Nick Korovaiko authored
      * reshape sinking working on mnist_conv
      
      * forgot to add reshape_sinking files
      
      * refactoring of binary case
      
      * Quantize/Dequantize case, fix add case, add assert
      
      * address bob and scott's feedback
      
      * debug
      
      * fix a bug where reshapes are removed too early
      f642bc4c
  6. 09 Oct, 2018 4 commits
  7. 08 Oct, 2018 4 commits
    • optimize operator== (#1765) · c5f0bd9d
      Robert Kimball authored
      c5f0bd9d
    • Update pad on nvgpu (#1759) · 40ff77bd
      Chris Sullivan authored
      * Add pad with fill operator using the outward-in index pattern.
      
      * Remove static pad and rename build_pad_dynamic -> build_pad. Update maxpool 1d padding.
      
      * Formatting.
      
      * Split build_pad_dynamic into build_pad and build_pad_fill.
      
      * Add test coverage for fixed bug in op::Pad for gpu.
      40ff77bd
    • IAT: Skip reshapes that are removing or adding size-1 dimensions (#1684) · 519b18ac
      Jayaram Bobba authored
      * Reshape optimizations for when unit-sized dimensions are added/removed from tensors
      
      * Added unit tests for eliminating squeeze and expand_dims operations
      
      * Bug fix to expand dims layout
      
      * Style fix
      519b18ac
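The entry above concerns reshapes that only add or remove size-1 dimensions. A minimal sketch of how such a reshape might be recognized, assuming the reshape does not also permute axes. The helper name `is_unit_dim_reshape` is hypothetical and is not taken from the nGraph source.

```python
def is_unit_dim_reshape(in_shape, out_shape):
    """Return True if the two shapes differ only by size-1 dimensions.

    Hypothetical helper for illustration; assumes no axis permutation.
    """
    def strip_units(shape):
        # Drop every dimension of extent 1, keeping the rest in order.
        return [d for d in shape if d != 1]

    return strip_units(in_shape) == strip_units(out_shape)
```

For instance, `(2, 1, 3) -> (2, 3)` (a squeeze) and `(2, 3) -> (1, 2, 3, 1)` (an expand_dims) both qualify, while `(2, 3) -> (6,)` does not.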
    • IAT: More convolution folding optimizations (#1712) · 00b4453d
      Jayaram Bobba authored
      * Check output shape when setting memory layout for slice op.
      
      * Miscellaneous fusion and other optimizations for inception-resnetv2
      - ConvBias Batchnorm folding
      - ConvBias Affine folding
      - Check if MKLDNN can slice a given layout and select layouts
        appropriately
      
      * Fixed unit test and bug in conv bias pattern
      
      * Addressed PR feedback
      
      * Addressed PR feedback
      00b4453d
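The "ConvBias Batchnorm folding" item above refers to the general technique of absorbing inference-time batch normalization into the preceding convolution's weights and bias. The sketch below shows the standard per-channel arithmetic only. It is not the nGraph implementation, and the function name, argument layout, and default `eps` are assumptions made for illustration.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one list of weights per output channel (hypothetical layout)
    bias, gamma, beta, mean, var: one value per output channel
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        # Batch norm computes g * (y - m) / sqrt(v + eps) + bt for conv output y.
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - m) * scale + bt)
    return folded_w, folded_b
```

With the folded parameters, a single convolution-plus-bias gives the same result as the original convolution followed by batch normalization, up to floating-point rounding.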
  8. 06 Oct, 2018 2 commits
  9. 05 Oct, 2018 7 commits
    • Support LRN for NVGPU Backend (#1740) · fe06f325
      gcwenger authored
      * LRN WIP
      
      * Explicit lambda captures.
      
      * Switched to Ayan's new caching routine.
      
      * Remove commented out lrn from manifest.
      
      * Fixed clang 3.9 error.
      
      * Corrected lrn hash. Only call cudnnSetLRNDescriptor once.
      
      * Simplified lrn hash. Removed redundant parameters. No longer passing CUDNN_LRN_CROSS_CHANNEL_DIM1 as parameter because it's the only choice for cudnnLRNCrossChannelForward.
      fe06f325
    • CPU: Make DEX mode the default (#1755) · c8858ef2
      Jaikrishnan Menon authored
      c8858ef2
    • Cyphers/doc1 (#1758) · 0e6c9c26
      Scott Cyphers authored
      * More op doc, fix formatting
      
      * sqrt, tan
      
      * Formatting.
      0e6c9c26
    • address klocwork issue (#1748) · 0920ed1c
      Robert Kimball authored
      0920ed1c
    • Changes to make Klocwork a little happier (#1739) · 15da6cfe
      Robert Kimball authored
      * address klocwork issue
      
      * move class init
      
      * more klocwork
      
      * more klocwork
      
      * more klocwork
      
      * comment on where the magic number is from
      
      * address review comments
      
      * address review comments
      15da6cfe
    • RNN fusion (inference) (#1459) · 4df5ea8b
      Chris Sullivan authored
      * Add op::Sigmoid to nvgpu.
      
      * Bring rnn fusion and concat passes over into GPU from IA. This is a temporary move until generalization and gpu specification can occur.
      
      * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
      
      * Formatting
      
      * Removed unnecessary extra output from LSTM op (rnn with seq. length = 1, so y = hy).
      
      * Add RNN fusion of LSTM cells within a recurrent layer.
      
      * Formatting.
      
      * Add fusion across RNN layers.
      
      * Formatting.
      
      * Add algebraic simplification.
      
      * Added rnn fusion tests.
      
      * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
      
      * Formatting.
      
      * Removed print statements.
      
      * Formatting.
      
      * Committing missing file.
      
      * Remove concat inputs pass and mkldnn references.
      
      * fix cmake paths
      
      * conflict resolution with merge from master.
      
      * remove explicit lstm op support. bare LSTM ops are converted to RNN ops for emission.
      
      * Formatting.
      
      * Use NGRAPH_ASSERT. Formatting of intel copyright.
      
      * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
      
      * fix wrong rnn header
      
      * Formatting.
      
      * Add back lstm op to dispatch table.
      
      * Added RNN test which shows cudnn rnn kernel is not producing correct results.
      
      * With update to AlgSimpl. to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
      
      * Bug fix in parameter tensor packing.
      
      * Alias third output element of RNN for cell state (bug fix).
      
      * Resolve numerical correctness issue with negative values in RNN (bug fix).
      Add minimal test to evaluate LSTM and compare with values calculated by hand.
      
      * Add tensor parameter sizes to kernel hash as they are kernel-specific.
      
      * Add 2 layer lstm fusion test against by-hand solution.
      
      * Export param concatenation to graph for cudnn kernel at both the single rnn layer and multi-layer.
      
      * Formatting.
      
      * Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
      
      * Simplify macro support for gpu ops.
      
      * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
      Need to decide how to notify user of increased performance with >= 7200.
      
      * Revert lstm_analytic test to explicitly copy data to tensor params.
      
      * Removed namespace arg from NGRAPH_GPU_OP.
      
      * Refactored macros to different header so op_tbl only contains op list.
      
      * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
      
      * doubles -> floats
      
      * Reorg. pass asserts, prepare to replace with non-throwing pass failures.
      
      * Remove Lstm op and replace it with Rnn.
      
      * Format
      
      * Utilize RETURN_IF_FALSE in rnn pass to avoid any RT asserts.
      Note that falling back to raw (no passes) graph for 2rnn_3lstm json from mxnet models
      results in a double free inside of the memory layout pass. Appears to be a bug
      in Reshape pass through.
      
      * Removed print statements. Add check on input data and recurrent data.
      
      * Don't reuse memory for non-destructive ops.
      
      * Add back Rnn test.
      
      * Formatting.
      
      * Clean up comments.
      
      * Update test per review comments.
      4df5ea8b
    • Add asserts to reference to make sure we don't overshoot iterators (#1757) · f04503b6
      Adam Procter authored
      * Add some asserts to make sure we don't overshoot certain iterators in the reference kernels
      
      * Add missing assertion.hpp include
      f04503b6