1. 14 Oct, 2018 1 commit
  2. 12 Oct, 2018 1 commit
    • Support ArgMin and ArgMax for NVGPU Backend (#1737) · 6f30b32b
      Ayan Moitra authored
      * Project initialization commit
      
      * Added unit tests for 3D tensors for argmax
      
* Refactored reduce to be used by argmax/argmin; argmax/argmin still has some issues. WIP
      
* [WIP] First working version of ArgMax/ArgMin
      
      * added reduce buffer for the cudnn api calls
      
      * added reduce buffer for the cudnn api calls
      
      * Further modifications. Using rvalues to pass enums to build reduce method
      
      * more unit tests added
      
      * Incorporate Fenglei's comments
      
      * Incorporating Chris's first set of comments
      
      * small change to test file
      
      * Resolving clang issue that was causing argmin test to fail
      
* Incorporate Chris's comments
      
      * clang format issue
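A minimal sketch of the cuDNN reduction path this entry describes, assuming a 2-D float tensor reduced over its second axis; the function name, descriptor shapes, and buffer handling are illustrative, not the actual backend code. Requesting flattened indices is what turns cuDNN's MAX reduction into argmax, which is why the commits above add a reduce buffer for the cudnn api calls; ArgMin is the same call with CUDNN_REDUCE_TENSOR_MIN.

```cpp
// Hedged sketch: argmax over axis 1 of an (n, c) float tensor using
// cuDNN's reduction API. Handle, device buffers, and workspace sizing
// are assumed to be set up by the caller.
#include <cstddef>
#include <cudnn.h>

void argmax_axis1(cudnnHandle_t handle, const float* in, float* out_max,
                  unsigned* out_idx, void* workspace, size_t ws_bytes,
                  int n, int c)
{
    cudnnReduceTensorDescriptor_t reduce_desc;
    cudnnCreateReduceTensorDescriptor(&reduce_desc);
    // FLATTENED_INDICES makes cuDNN emit argmax indices with the max values.
    cudnnSetReduceTensorDescriptor(reduce_desc, CUDNN_REDUCE_TENSOR_MAX,
                                   CUDNN_DATA_FLOAT, CUDNN_PROPAGATE_NAN,
                                   CUDNN_REDUCE_TENSOR_FLATTENED_INDICES,
                                   CUDNN_32BIT_INDICES);

    cudnnTensorDescriptor_t in_desc, out_desc;
    cudnnCreateTensorDescriptor(&in_desc);
    cudnnCreateTensorDescriptor(&out_desc);
    cudnnSetTensor4dDescriptor(in_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, c, 1, 1);
    cudnnSetTensor4dDescriptor(out_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, 1, 1, 1); // reduced axis kept with extent 1

    const float alpha = 1.0f, beta = 0.0f;
    // The indices buffer plus workspace are the "reduce buffer" the commit
    // messages refer to; both live in device memory.
    cudnnReduceTensor(handle, reduce_desc,
                      out_idx, n * sizeof(unsigned),
                      workspace, ws_bytes,
                      &alpha, in_desc, in,
                      &beta, out_desc, out_max);

    cudnnDestroyTensorDescriptor(out_desc);
    cudnnDestroyTensorDescriptor(in_desc);
    cudnnDestroyReduceTensorDescriptor(reduce_desc);
}
```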
  3. 09 Oct, 2018 1 commit
  4. 08 Oct, 2018 3 commits
  5. 04 Oct, 2018 1 commit
    • nvgpu maxpool bug fix (#1741) · 0051f201
      Fenglei authored
      * add a test failed on gpu, pass on cpu
      
      * fixed bug
      
      * get datatype size
      
* add description for test
      
      * update comment
      
      * update comments and name
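The fix trail above mentions getting the datatype size; a plausible reading, sketched below under that assumption only, is that a device buffer was being sized as if every element were a 4-byte float rather than from the tensor's element type. All names here are illustrative, not the actual patch.

```cpp
// Hedged sketch: size device buffers from the element type, not sizeof(float).
#include <cstddef>
#include <vector>

size_t shape_size(const std::vector<size_t>& shape)
{
    size_t n = 1;
    for (size_t d : shape)
        n *= d; // total element count
    return n;
}

// Bytes depend on the tensor's actual element size (1 for int8, 8 for f64...).
size_t buffer_bytes(const std::vector<size_t>& shape, size_t element_size)
{
    return shape_size(shape) * element_size;
}
```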
  6. 02 Oct, 2018 1 commit
  7. 29 Sep, 2018 1 commit
  8. 28 Sep, 2018 3 commits
  9. 26 Sep, 2018 1 commit
    • add nGraph quantize op (#1661) · d640fac3
      Adam Straw authored
      * adding nGraph Quantize op
      
      * unit test failing for floating point exception
      
      * unit test working in float
      
      * unit test working in uint8
      
      * improved type checking and polished unit test - passing
      
      * quantized axes working
      
      * inclusive project method
      
      * add round mode
      
      * TODO cleanup
      
      * code format
      
      * adding serializer support - fails build
      
      * add serializer support
      
* make CPU quantize op work; new tests for int8, clamp
      
      * fix build failure
      
      * fix GPU build issue
      
      * fix GPU unit test manifest
      
      * use quantized offset
      
      * add is_quantized field to element::Type
      
      * add reduce function to coordinate.hpp
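A minimal reference of the quantize math these commits walk through (scale, round, add the quantized offset, clamp), here for float-to-uint8 with round-to-nearest-even; the function name and the fixed output type are illustrative assumptions, not the op's actual signature.

```cpp
// Hedged sketch: reference float -> uint8 quantization.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

void quantize_ref(const float* in, uint8_t* out, size_t count,
                  float scale, uint8_t offset)
{
    for (size_t i = 0; i < count; ++i)
    {
        // The round mode matters; nearbyint rounds to nearest-even by default.
        float q = std::nearbyint(in[i] / scale) + static_cast<float>(offset);
        // Clamp into the uint8 range, as the "clamp" tests above exercise.
        out[i] = static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
    }
}
```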
  10. 18 Sep, 2018 2 commits
  11. 13 Sep, 2018 1 commit
    • Handle unsupported op in nbench (#1531) · fe676f72
      Robert Kimball authored
      * add unsupported_op exception
      
      * unsupported_op test
      
      * add printout of unsupported op in model
      
      * fix GPU dispatcher check
      
      * fix test designation
      
      * catch exceptions on single file runs too
      
      * add unsupported_op exception where needed
      
      * remove unsupported_op class
      
      * add unassigned op exception
      
      * add unit test
      
      * catch unsupported op in nbench
      
      * add cpu test back
      
      * update all latest merges
      
      * mode change
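The shape of the nbench-side handling, sketched under assumptions: the exception name matches the commit log, but the driver function is a hypothetical stand-in, not nbench's real entry point.

```cpp
// Hedged sketch: report unsupported ops per model instead of letting
// one bad model abort a whole benchmark sweep.
#include <iostream>
#include <stdexcept>
#include <string>

struct unsupported_op : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for compiling and running one serialized model.
void compile_and_run(const std::string& model_path)
{
    throw unsupported_op("ExampleOp"); // e.g. backend has no kernel for it
}

bool try_benchmark(const std::string& model_path)
{
    try
    {
        compile_and_run(model_path);
        return true;
    }
    catch (const unsupported_op& e)
    {
        // Printout of the unsupported op in the model, per the commits above.
        std::cerr << "unsupported op in " << model_path << ": " << e.what()
                  << "\n";
        return false;
    }
}
```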
  12. 12 Sep, 2018 1 commit
    • Add in_place support for ReplaceSlice (#1559) · bb6de284
      gaurides authored
* Add in_place support for ReplaceSlice
      
      * Add emit_replace_slice_inplace kernel
      
      * changed file permissions to original
      
      * Formatted code using maint/apply-code-format.sh
      
      * Removed data type check and removed dead code
      
* Removed setting mkldnn_op(true); ReplaceSlice is not an mkldnn op
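Why ReplaceSlice admits an in-place kernel, in a hedged 1-D sketch: when the output tensor aliases the input buffer, the full-tensor copy disappears and only the slice region is written. Names and the 1-D restriction are illustrative.

```cpp
// Hedged sketch: 1-D ReplaceSlice with an in-place fast path.
#include <cstddef>
#include <cstring>

void replace_slice(const float* arg, const float* repl, float* out,
                   size_t count, size_t lower, size_t upper)
{
    if (out != arg)
    {
        // Out-of-place: materialize the unchanged elements first.
        std::memcpy(out, arg, count * sizeof(float));
    }
    // In either case, overwrite just the [lower, upper) slice.
    std::memcpy(out + lower, repl, (upper - lower) * sizeof(float));
}
```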
  13. 07 Sep, 2018 1 commit
  14. 06 Sep, 2018 1 commit
    • TopK (w/ArgMax, ArgMin python wrapper) (#1560) · 3548772b
      Sang Ik Lee authored
      * Implement TopK.
      
      * Update python wrappers for TopK, ArgMin and ArgMax.
      
      * Address some reviewer comments.
      
      * Add type property check tests for TopK.
      Set correct TopK behavior for K==0.
      
      * TopK: Add 1d and 3d unit tests.
      
      * Address more reviewer comments.
      
      * Apply code style.
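A hedged 1-D reference of TopK matching the behaviors named above (k largest or smallest, and K==0 selecting all elements); the function name and the comparator switch are illustrative assumptions.

```cpp
// Hedged sketch: 1-D TopK by partially sorting indices by value.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

void topk_1d(const float* in, size_t n, size_t k, bool compute_max,
             std::vector<float>& out_vals, std::vector<size_t>& out_idx)
{
    if (k == 0)
        k = n; // per the commit log, K==0 means "return all elements"
    std::vector<size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    auto cmp = [&](size_t a, size_t b) {
        return compute_max ? in[a] > in[b] : in[a] < in[b];
    };
    // Only the first k positions need to be ordered.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(), cmp);
    out_idx.assign(idx.begin(), idx.begin() + k);
    out_vals.resize(k);
    for (size_t i = 0; i < k; ++i)
        out_vals[i] = in[out_idx[i]];
}
```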
  15. 04 Sep, 2018 2 commits
    • nvgpu reduce to scalar optimization (#1491) · 5f40d957
      Fenglei authored
      * add cuda reduce
      
      * clang format
      
      * fix bugs
      
      * fix bug
      
      * add 1d reduce
      
      * clang format
      
      * fix bugs
      
      * unroll loop
      
      * remove debug info
      
      * revert tests
      
      * unroll 1D reduce op
      
      * add comments
      
      * using cudnn for nd to scalar reduction
      
      * remove cuda 1d reduction since cudnn version is faster
      
      * remove 1D kernel
      
      * fix bugs
      
      * 1d multi block size
      
      * remove debug
      
      * change kernel name
      
      * add reduce to scalar optimization, add test
      
      * fix bugs and tune parameters
      
      * clang format
      
      * update comments
      
      * update comments
      
      * update comments
      
      * clang format
      
      * update comments
      
      * remove wrong comments, apply clang format
      
      * resolve Bob's comment
      
      * clang format
      
      * pass shared mem size from cuLaunchKernel, set unroll loop size through host code
      
* remove unused code; clang format
      
      * change reduce to thread with shfl for each warp first
      
      * add seed
      
      * unroll size
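A hedged CUDA sketch of the final shape this entry converges on: each warp reduces with shuffles first, per-warp partials meet in shared memory, and one atomic per block folds into the scalar result. The grid-stride traversal is where the unroll-size tuning above applies; block/grid configuration and names are illustrative, and `out` must be zeroed before launch.

```cuda
// Hedged sketch: sum-reduce an array to a scalar, warp shfl first.
#include <cuda_runtime.h>

__global__ void reduce_to_scalar(const float* in, float* out, int n)
{
    float sum = 0.0f;
    // Grid-stride loop over the input.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        sum += in[i];

    // Warp-level reduction via shuffles; no shared-memory traffic yet.
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffff, sum, offset);

    // Lane 0 of each warp publishes its partial sum.
    __shared__ float warp_sums[32];
    int lane = threadIdx.x & 31, warp = threadIdx.x >> 5;
    if (lane == 0)
        warp_sums[warp] = sum;
    __syncthreads();

    // First warp reduces the per-warp partials, then one atomic per block.
    if (warp == 0)
    {
        sum = (lane < (blockDim.x + 31) / 32) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        if (lane == 0)
            atomicAdd(out, sum);
    }
}
```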
    • IntelGPU backend: Sum operation optimization (#1545) · ed22bf6c
      shssf authored
      * IntelGPU backend: Sum operation optimization
      
      * PR1545. Comments addressed. Test added. Helper function refactored.
  16. 03 Sep, 2018 1 commit
  17. 29 Aug, 2018 1 commit
  18. 27 Aug, 2018 1 commit
  19. 22 Aug, 2018 1 commit
  20. 21 Aug, 2018 1 commit
    • ArgMin (#1435) · 951e77b4
      Nick Korovaiko authored
      * argmin
      
* address feedback on argmin
      
      * add new lines
      
* add new lines
      
      * address adam's nitpicks
      
      * scott's feedback
      
      * fix unit tests
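A hedged reference of the argmin reduction itself, for a row-major 2-D tensor reduced along axis 1; the function name and fixed layout are assumptions for illustration.

```cpp
// Hedged sketch: argmin over axis 1 of a row-major (rows, cols) tensor.
#include <cstddef>

void argmin_axis1(const float* in, size_t* out, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
    {
        size_t best = 0; // column index of the row minimum so far
        for (size_t c = 1; c < cols; ++c)
            if (in[r * cols + c] < in[r * cols + best])
                best = c;
        out[r] = best;
    }
}
```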
  21. 13 Aug, 2018 2 commits
  22. 08 Aug, 2018 1 commit
  23. 02 Aug, 2018 1 commit
    • LRN (#1282) · 237c4803
      Nick Korovaiko authored
      * lrn init
      
      * fix comment
      
      * mkldnn lrn (#1295)
      
      * add serializer + fix compiler warnings
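The across-channel LRN math behind this op is y[c] = x[c] / (bias + (alpha / size) * sum of x[j]^2 over the local channel window)^beta. A hedged single-position sketch follows; the function name and window clamping are illustrative assumptions.

```cpp
// Hedged sketch: LRN across channels at one spatial position.
#include <algorithm>
#include <cmath>
#include <cstddef>

void lrn_channels(const float* x, float* y, size_t channels,
                  size_t size, float alpha, float beta, float bias)
{
    for (size_t c = 0; c < channels; ++c)
    {
        // Local window of `size` channels centered on c, clamped at edges.
        size_t lo = c >= size / 2 ? c - size / 2 : 0;
        size_t hi = std::min(channels, c + size / 2 + 1);
        float sq = 0.0f;
        for (size_t j = lo; j < hi; ++j)
            sq += x[j] * x[j];
        y[c] = x[c] / std::pow(bias + (alpha / size) * sq, beta);
    }
}
```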
  24. 26 Jul, 2018 1 commit
  25. 18 Jul, 2018 2 commits
  26. 06 Jul, 2018 1 commit
  27. 02 Jul, 2018 1 commit
    • move sigmoid to core fusion (#1132) · d05b5e39
      Sandeep authored
      * declare sigmoid for core fusion
      
      * add simple test for sigmoid
      
      * info fusion status
      
      * cp op as main op
      
      * builds as expected
      
      * move sigmoid fusion code
      
      * add reference kernel
      
      * sigmoid bprop reference kernel and clang-format
      
      * add delta to bprop
      
      * fprop called
      
      * compiles bprop
      
      * move tests
      
      * serializer support
      
      * address comments in code
      
      * add doc
      
      * naming similar to core ops
      
      * fix failing test
      
      * fix failing test
      
      * address clang issue
      
      * more changes
      
      * change test macro
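Hedged versions of the reference kernels mentioned above: sigmoid s(x) = 1 / (1 + e^-x) for fprop, and delta * s * (1 - s) for bprop via the chain rule. Function names are illustrative.

```cpp
// Hedged sketch: sigmoid forward and backward reference kernels.
#include <cmath>
#include <cstddef>

void sigmoid(const float* x, float* y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = 1.0f / (1.0f + std::exp(-x[i]));
}

void sigmoid_bprop(const float* x, const float* delta, float* dx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
    {
        float s = 1.0f / (1.0f + std::exp(-x[i]));
        dx[i] = delta[i] * s * (1.0f - s); // chain rule through sigmoid
    }
}
```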
  28. 28 Jun, 2018 1 commit
  29. 20 Jun, 2018 1 commit
  30. 15 Jun, 2018 1 commit
  31. 12 Jun, 2018 1 commit
    • CUDA softmax kernel and broadcast kernel support for multiple non-consecutive axes (#1070) · 83e6aa5f
      Chris Sullivan authored
      * Added op::ReplaceSlice and enabled respective tests.
      
      * div64 -> division_by_invariant_multiplication
      
* Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
Shape is now implicitly convertible to GPUShape.
      
* Updated shape helper signatures and added conversion operators/constructors for GPUShape.
      
* Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides. These were updated as well to take advantage of GPUShape conversion operators.
      
      * Forgot to fix lambda for workspace allocations to match that of argspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
Shape is now implicitly convertible to GPUShape.
      
* Updated shape helper signatures and added conversion operators/constructors for GPUShape.
      
      * Adjust row_major_strides to avoid reversed-copy.
      
      * Moved declaration out of loop for clang.
      
      * Moved gpu_shape to gpu transformer.
      
      * Removed no longer necessary headers.
      
      * Added stdexcept header to gpu_shape.hpp
      
      * Coordinate->GPUShape
      
      * Refactored replace_slice into CudaKernelBuilder. Simplified allocations using new GPUAllocator and GPUMemoryManager.
      
      * Refactor allocations to make use of primitive emitter.
      Now memory primitives are registered at compile time and
the gpu memory address is resolved at runtime by invoking
      the primitive.
      
      * Changed check on 64bit shape to check if high bits are set.
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Replaced runtime host to device memcpys with GPUAllocator reservations in order to move them to compile time.
      
      * Forgot to remove no longer necessary buffer freeing from op emitters.
      
      * Removed replace slice.
      
      * Removed more replace_slice diffs.
      
      * Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
      
      * Added back missing changes after timeline resolution.
      
      * Added spacing between functions in GPUShape and boolean operators in shape.hpp.
      
      * Template parameters are UPPER_SNAKE_CASE.
      
      * Added unit tests for GPUMemoryManager and added checks that ensure the
      device memory is allocated prior to address resolution by the memory_primitives.
      Also exposed the allocation size of the memory manager.
      
      * Return type of shape_size should be large enough to encapsulate the full stride of the tensor.
      This should be 64bits wide regardless of the underlying value_type of the ShapeType.
      
      * Upstreaming changes to shape_size (which returns size_t).
      
      * cuDNN softmax impl. for all axis activation.
      
      * Added catch for per-axis activations.
      
* Removed commented-out headers.
      
* Added explicit function for queueing kernel argument data rather than inline in the reservation function, per @fengleitian's recommendation.
      
* Add softmax cuda kernel. It relies on atomic memory addition to global memory; this will add contention and should be optimized in the future. A multilevel reduction can be found in cs/gpu_softmax_cuda_shfl, but it requires some further engineering.
      
      * Refactored reduce coordinate transform code into a helper and applied it to broadcast.
      Broadcast added to CUDAEmitter, now supports multiple non-consecutive axes.
      
      * Removed change to data_types variable and updated/removed comments.
      
      * Refactored softmax into the emission of two fused elementwise collective ops.
      Added fused elementwise + collective kernels. Softmax is then just the combination of exp_sum_reduce + div_broadcast.
      
      * Added default param to GPUAllocator::reserve_workspace to request memory initialization for each invocation of the memory primitive.
      
      * GPU workspace memory is zero initialized by default but can be turned off if desired.
      
      * Added template parameter to CUDAEmitter::build_elementwise, REDUCE_OP_TYPE,
to specify the ngraph op type to use for the reduction in the fused ew_collective kernel.
      
      * Renamed variables and updated a comment.
      
      * Removed outdated softmax kernel to avoid confusion. Can be added later when atomic reduce is replaced.
      
      * Clang complained about lack of explicit destructor for AxisSet. Since cuda_emitter doesn't need AxisSet specifically, switch to std::set<size_t>.
This also has the benefit that in the future, if we wish to emit kernels without ngraph core (for example in a standalone binary via a serialized graph manifest), we don't depend on AxisSet.
      
      * softmax -> broadcast in build_broadcast.
      
      * Separate elementwise and elementwise_collective.
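A hedged CUDA sketch of the two-stage decomposition described above: an exp_sum_reduce that accumulates per-row sums with atomicAdd into global memory (the contention the log flags), followed by a div_broadcast that divides through. Row-major (rows, cols) layout and all names are assumptions; the usual max-subtraction for numerical stability is omitted for brevity, and `sums` must be zeroed before launch.

```cuda
// Hedged sketch: softmax along the last axis as exp_sum_reduce + div_broadcast.
#include <cuda_runtime.h>

__global__ void exp_sum_reduce(const float* in, float* exp_out,
                               float* sums, int rows, int cols)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rows * cols)
        return;
    float e = expf(in[i]);
    exp_out[i] = e;
    // Atomic add into this row's accumulator in global memory.
    atomicAdd(&sums[i / cols], e);
}

__global__ void div_broadcast(float* exp_out, const float* sums,
                              int rows, int cols)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols)
        exp_out[i] /= sums[i / cols]; // broadcast the row sum back over the row
}
```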
  32. 06 Jun, 2018 1 commit