  1. 03 Jul, 2018 2 commits
  2. 02 Jul, 2018 5 commits
    • move sigmoid to core fusion (#1132) · d05b5e39
      Sandeep authored
      * declare sigmoid for core fusion
      
      * add simple test for sigmoid
      
      * info fusion status
      
      * cp op as main op
      
      * builds as expected
      
      * move sigmoid fusion code
      
      * add reference kernel
      
      * sigmoid bprop reference kernel and clang-format
      
      * add delta to bprop
      
      * fprop called
      
      * compiles bprop
      
      * move tests
      
      * serializer support
      
      * address comments in code
      
      * add doc
      
      * naming similar to core ops
      
      * fix failing test
      
      * fix failing test
      
      * address clang issue
      
      * more changes
      
      * change test macro
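The bullets above add a reference kernel for sigmoid fprop and a bprop kernel that takes a delta. Here is a minimal sketch of what such element-wise reference kernels compute. It assumes flat contiguous buffers, and `sigmoid_fprop` and `sigmoid_bprop` are hypothetical names, not nGraph's reference API. The backward pass uses the identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), scaled by the incoming delta.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical reference kernel: out[i] = 1 / (1 + exp(-arg[i])).
template <typename T>
void sigmoid_fprop(const T* arg, T* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        out[i] = T(1) / (T(1) + std::exp(-arg[i]));
    }
}

// Hypothetical reference kernel for bprop: the gradient with respect to arg
// is delta[i] * s * (1 - s), where s = sigmoid(arg[i]).
template <typename T>
void sigmoid_bprop(const T* arg, const T* delta, T* out, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        T s = T(1) / (T(1) + std::exp(-arg[i]));
        out[i] = delta[i] * s * (T(1) - s);
    }
}
```

Recomputing `s` in bprop keeps the kernel independent of the forward output. A fused implementation could instead reuse the saved forward result.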
    • 18e58ea9
      L.S. Cook authored
    • MKLDNN BoundedRelu implementation for Relu6 (#1179) · eaa6091c
      Pruthvi authored
      * 1. Added MKLDNN BoundedRelu op support for Relu6
      2. CpuLayout and CPU assignment pass for the BoundedRelu op
      3. Unit test comparing the interpreter and CPU backends for BoundedReluOp
      4. MKLDNN and default emitter code for BoundedReluOp
      
      * Removed Debug prints
      
      * 1. Added support for boundedrelu to work on any constant literal
      2. unit test case for rank2, rank3, rank4 for bounded relu without serialized graph
      
      * Removed is_six() method
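BoundedRelu clamps its input to the range [0, alpha]. Relu6 is the case alpha = 6, and the later bullet generalizes the op to any constant literal. The following is a minimal reference sketch of that behavior, not the MKLDNN primitive. The names `bounded_relu` and `bounded_relu_bprop` are hypothetical. The backward pass assumes one common convention: delta passes through only where 0 < x < alpha, and the gradient is zero at and beyond the boundaries.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical reference kernel: out[i] = min(max(arg[i], 0), alpha).
// Relu6 corresponds to alpha = 6.
template <typename T>
void bounded_relu(const T* arg, T* out, std::size_t count, T alpha)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        out[i] = std::min(std::max(arg[i], T(0)), alpha);
    }
}

// Hypothetical backward kernel: the gradient flows through only where the
// input lies strictly inside (0, alpha); elsewhere it is zero.
template <typename T>
void bounded_relu_bprop(const T* arg, const T* delta, T* out, std::size_t count, T alpha)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        out[i] = (arg[i] > T(0) && arg[i] < alpha) ? delta[i] : T(0);
    }
}
```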
    • Conv+bias shape check for better error detection (#1176) · e42e5815
      Louis Feng authored
      * Reshape bias to 1D for conv + bias bprop fusion
      
      * Reshape goe2 back to 2D before replacing
      
      * added shape checks to validate conv+bias op.
      
      * removed conv+bias backprop merge for separate PR review.
      
      * fixed conv_bias_bprop test.
      
      * minor changes to error messages.
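The added shape checks validate that a conv+bias op is well formed before it is used. The sketch below shows one way such a check can look. It assumes NCHW-style data (batch, channels, spatial axes) and a 1-D bias whose length must equal the number of output channels. `validate_conv_bias` and the `Shape` alias are hypothetical, not nGraph's actual validation code.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical check: the conv output is {N, C_out, spatial...} and the bias
// must be 1-D with C_out elements.
void validate_conv_bias(const Shape& conv_output, const Shape& bias)
{
    if (conv_output.size() < 3)
    {
        throw std::invalid_argument("conv+bias: convolution output must have rank >= 3");
    }
    if (bias.size() != 1)
    {
        std::ostringstream msg;
        msg << "conv+bias: bias must be 1-D, got rank " << bias.size();
        throw std::invalid_argument(msg.str());
    }
    if (bias[0] != conv_output[1])
    {
        std::ostringstream msg;
        msg << "conv+bias: bias length " << bias[0]
            << " does not match output channel count " << conv_output[1];
        throw std::invalid_argument(msg.str());
    }
}
```

Reporting the offending sizes in the message is what makes a check like this useful for error detection. A bare assertion would only say that something is wrong.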
    • gpu slice optimization (#1172) · f243d035
      Fenglei authored
      * add gpu_timer to external function
      
      * compiled version
      
      * working version
      
      * using block_begin and block_end
      
      * add the missing ';'
      
      * move slice to cuda emitter
      
      * change size_t to uint32_t in kernel
      
      * working version
      
      * change block size from 1 to 64
      
      * fix bugs
      
      * nthreads needs to be size_t in broadcast op
      
      * add rank to kernel name hash
      
      * update slice in convolution
      
      * resolve index conflict
      
      * change align to align_to_blocksize, add overflow check
      
      * add grid size check and fix pool merge bug
      
      * code style, change names
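Several bullets in this commit deal with launch sizes: a block size of 64, uint32_t indexing inside the kernel, aligning to the block size with an overflow check, and a grid size check. The host-side sketch below shows that arithmetic. It assumes a 1-D launch and CUDA's x-dimension grid limit of 2^31 - 1 blocks. `LaunchDims` and `launch_dims_for` are hypothetical names, not the commit's `align_to_blocksize`.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical launch configuration for a 1-D kernel that indexes with uint32_t.
struct LaunchDims
{
    std::uint32_t blocks;
    std::uint32_t threads_per_block;
};

LaunchDims launch_dims_for(std::size_t nthreads, std::uint32_t block_size = 64)
{
    if (block_size == 0)
    {
        throw std::invalid_argument("block size must be positive");
    }
    // Round up without computing nthreads + block_size - 1, which could overflow.
    std::size_t blocks = nthreads / block_size + (nthreads % block_size != 0 ? 1 : 0);
    // Overflow check: the aligned thread count must be addressable with uint32_t.
    if (blocks > std::numeric_limits<std::uint32_t>::max() / block_size)
    {
        throw std::overflow_error("aligned thread count does not fit in uint32_t");
    }
    // Grid size check: gridDim.x is limited to 2^31 - 1 blocks.
    const std::size_t max_grid_x = 2147483647u;
    if (blocks > max_grid_x)
    {
        throw std::runtime_error("grid size exceeds the x-dimension limit");
    }
    return LaunchDims{static_cast<std::uint32_t>(blocks), block_size};
}
```

With the default of 64 threads per block, the overflow check always fires before the grid limit is reached. The grid check matters only for small block sizes.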
  3. 30 Jun, 2018 2 commits
    • Pruthvi/fix rnn output (#1135) · c4c24cb0
      Pruthvi authored
      * - Fixed output replacement for the multi-layer recurrent cell state tensor output
      - Modified rnn add_output to consider direction and n_layer when calculating the output size for mkldnn dst_layer and dst_iter
      
      * fix unit test failure
    • LoopKernel Collector (#1128) · 784735d6
      Nick Korovaiko authored
      * collector
      
      * keeping track of inputs; simplifying a merging strategy; adding LKGraph
      
      * LoopKernel Collector
      
      * address feedback
      
      * address feedback 2
      
      * address feedback 3
  4. 29 Jun, 2018 4 commits
  5. 28 Jun, 2018 8 commits
    • Reshape bias to 1D for cpufusion of conv+bias bprop (#1151) · 1574031c
      Nishant Patel authored
      * Reshape bias to 1D for conv + bias bprop fusion
      
      * Reshape goe2 back to 2D before replacing
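A 1-D bias matches how the bias gradient is formed. When a bias is broadcast over the batch and spatial positions, its gradient is the incoming delta summed over every axis except the channel axis. That sum yields a vector of length C. The reference sketch below assumes row-major NCHW layout. `bias_grad_nchw` is a hypothetical name, not the fused CPU kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: delta has shape {N, C, H, W} in row-major order;
// the bias gradient is the 1-D vector db[c] = sum over n, h, w of delta[n][c][h][w].
std::vector<float> bias_grad_nchw(const std::vector<float>& delta,
                                  std::size_t n, std::size_t c, std::size_t h, std::size_t w)
{
    std::vector<float> db(c, 0.0f);
    const std::size_t spatial = h * w;
    for (std::size_t in = 0; in < n; ++in)
    {
        for (std::size_t ic = 0; ic < c; ++ic)
        {
            // Start of the H x W plane for batch item `in`, channel `ic`.
            const float* plane = delta.data() + (in * c + ic) * spatial;
            for (std::size_t s = 0; s < spatial; ++s)
            {
                db[ic] += plane[s];
            }
        }
    }
    return db;
}
```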
    • check cudnn version (#1175) · cf3e2992
      Fenglei authored
    • Support dimshuffle/transpose with MKLDNN (#1129) · 846f6bfe
      Nishant Patel authored
      * Reshape 4d
      
      * Support dimshuffles/transpose with MKLDNN
      
      * Addressing PR Feedback
      
      * Use Eigen for 3D dimshuffles
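A dimshuffle is a transpose with an arbitrary axis order. The reference sketch below shows the index mapping that MKLDNN reorders or Eigen shuffles compute. It assumes row-major data and the convention out_shape[i] = in_shape[axis_order[i]]. The `transpose` helper is hypothetical, does not check that `axis_order` is a valid permutation, and is not the code path the commit uses.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical reference transpose: out_shape[i] = in_shape[axis_order[i]].
std::vector<float> transpose(const std::vector<float>& in, const Shape& in_shape,
                             const Shape& axis_order)
{
    const std::size_t rank = in_shape.size();
    if (axis_order.size() != rank)
    {
        throw std::invalid_argument("axis order must have one entry per axis");
    }
    // Row-major strides of the input.
    Shape in_strides(rank, 1);
    for (std::size_t i = rank; i-- > 1;)
    {
        in_strides[i - 1] = in_strides[i] * in_shape[i];
    }
    Shape out_shape(rank);
    for (std::size_t i = 0; i < rank; ++i)
    {
        out_shape[i] = in_shape[axis_order[i]];
    }
    std::vector<float> out(in.size());
    Shape idx(rank, 0); // output multi-index, advanced like an odometer
    for (std::size_t flat = 0; flat < out.size(); ++flat)
    {
        std::size_t src = 0;
        for (std::size_t i = 0; i < rank; ++i)
        {
            src += idx[i] * in_strides[axis_order[i]];
        }
        out[flat] = in[src];
        for (std::size_t i = rank; i-- > 0;)
        {
            if (++idx[i] < out_shape[i]) break;
            idx[i] = 0;
        }
    }
    return out;
}
```

For example, the axis order {0, 2, 3, 1} converts an NCHW tensor to NHWC.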
    • - Added workspace for rnn fprop kernel (#1153) · d861ba32
      Pruthvi authored
      - fixes segfault issue for GNMT model execution through ngraph-mxnet
    • working generate_adjoints (#1173) · aa36865c
      Matthew Brookhart authored
    • enable cudnn datatype support (#1122) · eef2b19d
      Fenglei authored
      * enable multi-datatype support for cuDNN; refactor binary ops using cuDNN
      
      * fix bugs
      
      * add tests to skip list that CUDNN does not support
      
      * no int support in cuDNN for backward pooling
      
      * no GPU.dot_4d_5d_multi_axis_big_fp64_VERY_SLOW test anymore
      
      * clang format
      
      * throw if datatype is int8 or int32 for backward pooling
      
      * comments
      
      * fix list in unit_test.manifest
      
      * add type support for alpha, beta
      
      * fix bugs
      
      * datatype support for alpha, beta
      
      * missing ()
      
      * clang format
      
      * batchnorm backward bug fix
      
      * remove debug info
      
      * change member function name to snake case. remove comments
      
      * use nullptr instead of NULL
      
      * code style, use cuDNN everywhere in comments
      
      * add cudnn host parameters memory manager.
      
      * change name to allocate_by_datatype
      
      * compiled
      
      * debug
      
      * fix bug: use list instead of vector, since vector element addresses can change each time it resizes
      
      * add CUDNN_DATA_UINT8 and CUDNN_DATA_UINT8x4
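Two bullets above concern the host-side scaling parameters. cuDNN expects `alpha` and `beta` to point to `double` when the tensor data type is double, and to `float` for the other types. The fix also stores these values in a list because growing a `std::vector` can reallocate it and invalidate pointers that were already handed to cuDNN. The sketch below illustrates both ideas. `HostScalarPool` and `get_scalar` are hypothetical stand-ins for the memory manager described above, and the `DataType` enum replaces `cudnnDataType_t` so the example builds without cuDNN.

```cpp
#include <list>

// Stand-in for cudnnDataType_t so the sketch compiles without cuDNN.
enum class DataType { Float, Double, Half, Int8, Int32 };

// Hypothetical pool of host-side scaling factors. std::list never moves its
// elements, so pointers handed out earlier stay valid as more are added;
// a std::vector could reallocate and leave them dangling.
class HostScalarPool
{
public:
    const void* get_scalar(DataType type, double value)
    {
        if (type == DataType::Double)
        {
            m_doubles.push_back(value);
            return &m_doubles.back();
        }
        // Every other data type takes float scaling parameters.
        m_floats.push_back(static_cast<float>(value));
        return &m_floats.back();
    }

private:
    std::list<float> m_floats;
    std::list<double> m_doubles;
};
```

A `std::deque` would also keep element addresses stable under `push_back`. A list is simply the container the commit names.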
    • constant broadcast folding (#1139) · 35b04e6a
      Adam Straw authored
      * constant broadcast folding
      
      * code review feedback
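Constant broadcast folding replaces a Broadcast whose input is a Constant with a single Constant that already holds the broadcast values, so no broadcast runs at execution time. The sketch below shows only the value computation. It assumes the convention that the listed broadcast axes exist only in the output and the remaining output axes map, in order, onto the input axes. `fold_broadcast` is a hypothetical helper, not nGraph's pass.

```cpp
#include <cstddef>
#include <set>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical folding helper: computes the values of Broadcast(constant)
// ahead of time. Axes in broadcast_axes exist only in the output; the
// remaining output axes form the input shape, in order.
std::vector<float> fold_broadcast(const std::vector<float>& in, const Shape& out_shape,
                                  const std::set<std::size_t>& broadcast_axes)
{
    std::size_t out_size = 1;
    for (std::size_t d : out_shape) out_size *= d;
    std::vector<float> out(out_size);
    Shape idx(out_shape.size(), 0); // output multi-index
    for (std::size_t flat = 0; flat < out_size; ++flat)
    {
        // Row-major offset into the input, skipping broadcast axes.
        std::size_t src = 0;
        for (std::size_t i = 0; i < out_shape.size(); ++i)
        {
            if (broadcast_axes.count(i) == 0)
            {
                src = src * out_shape[i] + idx[i];
            }
        }
        out[flat] = in[src];
        for (std::size_t i = out_shape.size(); i-- > 0;)
        {
            if (++idx[i] < out_shape[i]) break;
            idx[i] = 0;
        }
    }
    return out;
}
```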
    • Add extra hash parameters to broadcast and max pool (#1163) · 13f00048
      Chris Sullivan authored
      * Move maxpool and avgpool into CudaKernelBuilder and add cache parameters to kernel name for broadcast which are required for correct lookup.
      
      * Styling.
      
      * Add space before avg_pool.
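The extra cache parameters matter because compiled kernels are looked up by name. Suppose two broadcasts differ in a parameter that changes the generated code, such as the rank in the "add rank to kernel name hash" bullet of #1172. If both still produce the same name, the lookup returns the wrong kernel. The sketch below shows how such a key can be built. It assumes the key includes the op name, the data types, and every code-shaping parameter. `make_kernel_key` is hypothetical, not nGraph's CUDA emitter API.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical key builder: every parameter that changes the generated code
// must appear in the key, otherwise two different kernels collide in the cache.
std::string make_kernel_key(const std::string& op, const std::vector<std::string>& dtypes,
                            const std::vector<std::size_t>& params)
{
    std::ostringstream key;
    key << op;
    for (const auto& t : dtypes) key << "_" << t;
    for (std::size_t p : params) key << "_" << p;
    return key.str();
}
```

A key built this way can index a `std::map` or `std::unordered_map` of compiled kernels.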
  6. 27 Jun, 2018 5 commits
  7. 26 Jun, 2018 10 commits
  8. 25 Jun, 2018 4 commits