1. 08 Oct, 2018 5 commits
  2. 06 Oct, 2018 2 commits
  3. 05 Oct, 2018 13 commits
    • Support LRN for NVGPU Backend (#1740) · fe06f325
      gcwenger authored
      * LRN WIP
      
      * Explicit lambda captures.
      
      * Switched to Ayan's new caching routine.
      
      * Remove commented out lrn from manifest.
      
      * Fixed clang 3.9 error.
      
      * Corrected lrn hash. Only call cudnnSetLRNDescriptor once.
      
      * Simplified lrn hash. Removed redundant parameters. No longer passing CUDNN_LRN_CROSS_CHANNEL_DIM1 as parameter because it's the only choice for cudnnLRNCrossChannelForward.
    • CPU: Make DEX mode the default (#1755) · c8858ef2
      Jaikrishnan Menon authored
    • Cyphers/doc1 (#1758) · 0e6c9c26
      Scott Cyphers authored
      * More op doc, fix formatting
      
      * sqrt, tan
      
      * Formatting.
    • address klocwork issue (#1748) · 0920ed1c
      Robert Kimball authored
    • Changes to make Klocwork a little happier (#1739) · 15da6cfe
      Robert Kimball authored
      * address klocwork issue
      
      * move class init
      
      * more klocwork
      
      * more klocwork
      
      * more klocwork
      
      * comment on where the magic number is from
      
      * address review comments
      
      * address review comments
    • RNN fusion (inference) (#1459) · 4df5ea8b
      Chris Sullivan authored
      * Add op::Sigmoid to nvgpu.
      
      * Bring rnn fusion and concat passes over into GPU from IA. This is a temporary move until generalization and gpu specification can occur.
      
      * Add LSTM fusion and cudnn inference kernel. Next need recurrent fusion and layer fusion.
      
      * Formatting
      
      * Removed unnecessary extra output from LSTM op (rnn with seq. length = 1, so y = hy).
      
      * Add RNN fusion of LSTM cells within a recurrent layer.
      
      * Formatting.
      
      * Add fusion across RNN layers.
      
      * Formatting.
      
      * Add algebraic simplification.
      
      * Added rnn fusion tests.
      
      * Updated conditional on LSTM fusion to better distinguish bound nodes as ht vs xt.
      
      * Formatting.
      
      * Removed print statements.
      
      * Formatting.
      
      * Committing missing file.
      
      * Remove concat inputs pass and mkldnn references.
      
      * fix cmake paths
      
      * conflict resolution with merge from master.
      
      * remove explicit lstm op support. bare LSTM ops are converted to RNN ops for emission.
      
      * Formatting.
      
      * Use NGRAPH_ASSERT. Formatting of Intel copyright.
      
      * Add check on the feature size (shape) of the recurrent (hidden) input and cell state, to ensure they are the same size.
      
      * fix wrong rnn header
      
      * Formatting.
      
      * Add back lstm op to dispatch table.
      
      * Added RNN test which shows cudnn rnn kernel is not producing correct results.
      
      * With update to AlgSimpl. to simplify concat-reshape-slice, the check modified in this commit needed to be relaxed.
      
      * Bug fix in parameter tensor packing.
      
      * Alias third output element of RNN for cell state (bug fix).
      
      * Resolve numerical correctness issue with negative values in RNN (bug fix).
      Add minimal test to evaluate LSTM and compare with values calculated by hand.
      
      * Add tensor parameter sizes to kernel hash as they are kernel-specific.
      
      * Add 2 layer lstm fusion test against by-hand solution.
      
      * Export param concatenation to graph for cudnn kernel at both the single rnn layer and multi-layer.
      
      * Formatting.
      
      * Finishing touches after merge: add support for macro-expanded dispatch via op_tbl.
      
      * Simplify macro support for gpu ops.
      
      * Add CUDNN_VERSION >= 7200 defguards for RNN fusion.
      Need to decide how to notify user of increased performance with >= 7200.
      
      * Revert lstm_analytic test to explicitly copy data to tensor params.
      
      * Removed namespace arg from NGRAPH_GPU_OP.
      
      * Refactored macros to different header so op_tbl only contains op list.
      
      * Defguard on cudnn_descriptor<cudnnRNNDataDescriptor_t>.
      
      * doubles -> floats
      
      * Reorg. pass asserts, prepare to replace with non-throwing pass failures.
      
      * Remove Lstm op and replace it with Rnn.
      
      * Format
      
      * Utilize RETURN_IF_FALSE in rnn pass to avoid any RT asserts.
      Note that falling back to raw (no passes) graph for 2rnn_3lstm json from mxnet models
      results in a double free inside of the memory layout pass. Appears to be a bug
      in Reshape pass through.
      
      * Removed print statements. Add check on input data and recurrent data.
      
      * Don't reuse memory for non-destructive ops.
      
      * Add back Rnn test.
      
      * Formatting.
      
      * Clean up comments.
      
      * Update test per review comments.
    • Add asserts to reference to make sure we don't overshoot iterators (#1757) · f04503b6
      Adam Procter authored
      * Add some asserts to make sure we don't overshoot certain iterators in the reference kernels
      
      * Add missing assertion.hpp include
    • IntelGPU backend: Broadcast bug fix: (output_shape.at(0) == 1) doesn't mean that… · d9dfaeb8
      dmyershov authored
      IntelGPU backend: Broadcast bug fix: (output_shape.at(0) == 1) doesn't mean that it is scalar (#1754)
      
    • Properly support global stats in BN (#1753) · adb38ab4
      Chris Sullivan authored
      * global stats fix
      
      * Formatting.
    • address klocwork number overflow issue (#1751) · 3d21f6ed
      Robert Kimball authored
      * address klocwork number overflow issue
      
      * one more issue
    • address klocwork issues (#1750) · be0a9f03
      Robert Kimball authored
    • address klocwork issue (#1747) · 9f26b7e9
      Robert Kimball authored
    • Partial Shapes, Part 2: Adapt Tensor class to have partial shapes (#1718) · a0be5231
      Adam Procter authored
      * Adapt Tensor class to have partial shapes
      
      * Add PartialShapes to Input, Output, Function, Node classes
      
      * Terminological cleanup
  4. 04 Oct, 2018 8 commits
  5. 03 Oct, 2018 3 commits
    • add doctools js from basic theme sphinx repo (#1735) · 58df83cf
      L.S. Cook authored
      * add doctools js from basic theme sphinx repo
      
      * fixes from PR 672 RTD theme regarding sphinx build
    • IntelGPU backend: Operation Reduce implemented (#1736) · cae66197
      shssf authored
      * IntelGPU backend: Operation Reduce implemented
      
      * PR1736. Style fixed
    • cublas emitter for NVGPU backend (#1705) · 7ac35345
      Ayan Moitra authored
      * cublas emitter
      
      * clang format fixes
      
      * Initial comment incorporation from Chris
      
      * Chris's If-else change comment incorporation
      
      * incorporating Bob's comments phase 1
      
      * Remove unnecessary headers in cublas emitter hpp & cpp (as per Bob's comments)
      
      * clang format on previous commit
      
      * incorporate fenglei's refactoring comment
      
      * incorporating comments
      
      * Incorporate Chris's final comment
      
      * All comments resolved
      
      * Resolve Geoff's comments
      
      * Change cache_primitive to register_primitive
  6. 02 Oct, 2018 9 commits