- 14 Oct, 2018 1 commit
gcwenger authored
* Improved AvgPool unit test coverage. Fixed small bug that was revealed.
* Renamed disabled unit tests to reflect new names.
* Ran clang-format on backend_test.in.cpp to fix format.
* Renamed cpu_results -> backend_results in two unit tests.
-
- 12 Oct, 2018 1 commit
Ayan Moitra authored
* Project initialization commit
* Added unit tests for 3D tensors for argmax
* Refactored reduce to be used by argmax/argmin; argmax/argmin still has some issues. WIP
* [WIP] First working version of ArgMax/ArgMin
* Added reduce buffer for the cuDNN API calls
* Further modifications: using rvalues to pass enums to the build reduce method
* More unit tests added
* Incorporate Fenglei's comments
* Incorporate Chris's first set of comments
* Small change to test file
* Resolve clang issue that was causing the argmin test to fail
* Incorporate Chris's comments
* Fix clang-format issue
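The ArgMax/ArgMin work above lowers the ops onto the cuDNN reduce API, but their observable semantics are simple to state. A minimal CPU sketch of 1-D argmax with a first-occurrence tie-break (an illustration only, not the nGraph source; `argmax_1d` is an invented name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the index of the largest element; on ties, the first occurrence wins.
size_t argmax_1d(const std::vector<float>& data)
{
    size_t best = 0;
    for (size_t i = 1; i < data.size(); ++i)
    {
        if (data[i] > data[best])
        {
            best = i;
        }
    }
    return best;
}
```

The real op reduces along a chosen axis of an n-D tensor; the 1-D case above is just the inner loop of that reduction.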
-
- 09 Oct, 2018 1 commit
Robert Kimball authored
-
- 08 Oct, 2018 3 commits
amy.zhuang authored
-
amy.zhuang authored
-
Chris Sullivan authored
* Add pad with fill operator using the outward-in index pattern.
* Remove static pad and rename build_pad_dynamic -> build_pad. Update maxpool 1d padding.
* Formatting.
* Split build_pad_dynamic into build_pad and build_pad_fill.
* Add test coverage for fixed bug in op::Pad for gpu.
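As a rough picture of what a pad-with-fill operator computes, here is a 1-D CPU sketch under assumed semantics (interior padding and the GPU outward-in index pattern are omitted; `pad_1d` is an invented name, not the nGraph API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Pad `in` with `below` copies of `fill` before it and `above` copies after it.
std::vector<float> pad_1d(const std::vector<float>& in,
                          size_t below,
                          size_t above,
                          float fill)
{
    std::vector<float> out(below + in.size() + above, fill);
    std::copy(in.begin(), in.end(), out.begin() + below);
    return out;
}
```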
-
- 04 Oct, 2018 1 commit
Fenglei authored
* Add a test that fails on GPU but passes on CPU
* Fix the bug
* Get data type size
* Add description for the test
* Update comment
* Update comments and names
-
- 02 Oct, 2018 1 commit
shssf authored
-
- 29 Sep, 2018 1 commit
Robert Kimball authored
* Rename files
* Rename runtime TensorView to Tensor
* Rename HostTensorView to HostTensor
-
- 28 Sep, 2018 3 commits
shssf authored
-
shssf authored
-
Adam Straw authored
* Add ngraph dequantize op
* Use a floating point offset
* Code format
* Reminder to fix serializer
* Add serializer support
* Add dequantize test cases
* Cleanup and code format
* Fix build warning for implicit conversion
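Dequantize with a floating-point offset can be summarized by the affine map real = (quantized - offset) * scale. A sketch under that assumed formula (`dequantize` here is an invented helper, not the nGraph API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Map quantized uint8 values back to reals: (q - offset) * scale.
std::vector<float> dequantize(const std::vector<uint8_t>& q,
                              float scale,
                              float offset)
{
    std::vector<float> out;
    out.reserve(q.size());
    for (uint8_t v : q)
    {
        out.push_back((static_cast<float>(v) - offset) * scale);
    }
    return out;
}
```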
-
- 26 Sep, 2018 1 commit
Adam Straw authored
* Adding nGraph Quantize op
* Unit test failing with floating point exception
* Unit test working in float
* Unit test working in uint8
* Improved type checking and polished unit test; passing
* Quantized axes working
* Inclusive project method
* Add round mode
* TODO cleanup
* Code format
* Adding serializer support; fails build
* Add serializer support
* Make CPU quantize op work; new tests for int8 and clamp
* Fix build failure
* Fix GPU build issue
* Fix GPU unit test manifest
* Use quantized offset
* Add is_quantized field to element::Type
* Add reduce function to coordinate.hpp
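Quantize is the inverse map: divide by the scale, round under the chosen round mode, add the offset, and clamp to the target integer range. A uint8 sketch using round-half-away-from-zero as the round mode (one plausible choice among the modes the commit mentions; `quantize_u8` is an invented name):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize a real to uint8: round(x / scale) + offset, clamped to [0, 255].
uint8_t quantize_u8(float x, float scale, float offset)
{
    float q = std::round(x / scale) + offset; // round half away from zero
    q = std::min(255.0f, std::max(0.0f, q)); // clamp to the uint8 range
    return static_cast<uint8_t>(q);
}
```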
-
- 18 Sep, 2018 2 commits
Chris Sullivan authored
-
Fenglei authored
* Pass args instead of pointer to array
* Add 3D tiled reshape
* Working version
* Add shared memory version of 2D and 3D reshape
* Remove unused code
* Style
* Resolve commits
* Add test for 3D reshape; some 3D reshapes will be treated as 2D
-
- 13 Sep, 2018 1 commit
Robert Kimball authored
* Add unsupported_op exception
* unsupported_op test
* Add printout of unsupported op in model
* Fix GPU dispatcher check
* Fix test designation
* Catch exceptions on single file runs too
* Add unsupported_op exception where needed
* Remove unsupported_op class
* Add unassigned op exception
* Add unit test
* Catch unsupported op in nbench
* Add cpu test back
* Update all latest merges
* Mode change
-
- 12 Sep, 2018 1 commit
gaurides authored
* Add in-place support for ReplaceSlice
* Add emit_replace_slice_inplace kernel
* Changed file permissions back to original
* Formatted code using maint/apply-code-format.sh
* Removed data type check and removed dead code
* Removed setting mkldnn_op(true); ReplaceSlice is not an mkldnn op
-
- 07 Sep, 2018 1 commit
shssf authored
-
- 06 Sep, 2018 1 commit
Sang Ik Lee authored
* Implement TopK.
* Update python wrappers for TopK, ArgMin and ArgMax.
* Address some reviewer comments.
* Add type property check tests for TopK. Set correct TopK behavior for K==0.
* TopK: Add 1d and 3d unit tests.
* Address more reviewer comments.
* Apply code style.
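TopK's contract, including the K==0 case the commit calls out, can be sketched on a 1-D input. Here K==0 is assumed to mean "select all elements" (a reading of the commit note, not a confirmed spec; `topk_indices` is an invented name):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices of the k largest elements, highest value first.
// stable_sort keeps the earlier index on ties; k == 0 selects everything.
std::vector<size_t> topk_indices(const std::vector<float>& data, size_t k)
{
    std::vector<size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::stable_sort(idx.begin(), idx.end(),
                     [&](size_t a, size_t b) { return data[a] > data[b]; });
    if (k == 0 || k > idx.size())
    {
        k = idx.size();
    }
    idx.resize(k);
    return idx;
}
```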
-
- 04 Sep, 2018 2 commits
Fenglei authored
* Add cuda reduce
* Clang format
* Fix bugs
* Add 1D reduce
* Unroll loop
* Remove debug info
* Revert tests
* Unroll 1D reduce op
* Add comments
* Use cudnn for nd-to-scalar reduction
* Remove cuda 1D reduction since the cudnn version is faster
* Remove 1D kernel
* Fix bugs
* 1D multi block size
* Remove debug
* Change kernel name
* Add reduce-to-scalar optimization; add test
* Fix bugs and tune parameters
* Update comments
* Remove wrong comments; apply clang format
* Resolve Bob's comment
* Pass shared mem size from cuLaunchKernel; set unroll loop size through host code
* Remove unused code; clang format
* Change reduce to thread with shfl for each warp first
* Add seed
* Unroll size
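The warp-level shfl reduction mentioned above is a tree reduction: each round halves the active width and folds the upper half into the lower half. A CPU sketch of that pattern for a sum (illustrative only; the real kernel works on registers via `__shfl_down`-style intrinsics, not an array):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tree-reduce a vector to its sum, mirroring the halving pattern of a
// warp shuffle reduction. Pads to a power of two with the identity (0).
float reduce_sum(std::vector<float> v)
{
    size_t width = 1;
    while (width < v.size())
    {
        width *= 2; // round length up to a power of two
    }
    v.resize(width, 0.0f);
    for (size_t stride = width / 2; stride > 0; stride /= 2)
    {
        for (size_t i = 0; i < stride; ++i)
        {
            v[i] += v[i + stride]; // fold upper half into lower half
        }
    }
    return v[0];
}
```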
-
shssf authored
* IntelGPU backend: Sum operation optimization
* PR1545: comments addressed, test added, helper function refactored.
-
- 03 Sep, 2018 1 commit
shssf authored
-
- 29 Aug, 2018 1 commit
Robert Kimball authored
* Use line comments instead of multiline comments for the license header
* Update more
* Update new files
* More header updates
* Style
-
- 27 Aug, 2018 1 commit
Robert Kimball authored
* Normalize comments
* Address review comments
-
- 22 Aug, 2018 1 commit
Nick Korovaiko authored
* argmax
* Manifests and serializer
-
- 21 Aug, 2018 1 commit
Nick Korovaiko authored
* argmin
* Address feedback on argmin
* Add new lines
* Address Adam's nitpicks
* Scott's feedback
* Fix unit tests
-
- 13 Aug, 2018 2 commits
Robert Kimball authored
* enable parameter validation for all unit tests
-
Jayaram Bobba authored
* Remove validation checks from performance-critical code paths and skip layout propagation to inputs
* Add templated call method to backend for cases where users need input validation
* Added missing return
* Fix python api compile error due to ngraph api change
* Disable parameter validation in python api
* Make validating call a separate call rather than templated
-
- 08 Aug, 2018 1 commit
Robert Kimball authored
-
- 02 Aug, 2018 1 commit
Nick Korovaiko authored
* lrn init
* Fix comment
* mkldnn lrn (#1295)
* Add serializer + fix compiler warnings
-
- 26 Jul, 2018 1 commit
shssf authored
* IntelGPUBackend: Broadcast operation
* IntelGPUBackend: more tests for Broadcast operation
* Move macro to static C function in Broadcast tests
-
- 18 Jul, 2018 2 commits
Robert Kimball authored
* Make pool test check backends other than CPU
* More unit test cleanup
-
Jaikrishnan Menon authored
-
- 06 Jul, 2018 1 commit
Nishant Patel authored
* Usage of mkldnn reshape updated
* Update reshape condition for mkldnn
* Add a test case and the order in which conditions are checked
-
- 02 Jul, 2018 1 commit
Sandeep authored
* Declare sigmoid for core fusion
* Add simple test for sigmoid
* Info fusion status
* cp op as main op
* Builds as expected
* Move sigmoid fusion code
* Add reference kernel
* Sigmoid bprop reference kernel and clang-format
* Add delta to bprop
* fprop called
* Compiles bprop
* Move tests
* Serializer support
* Address comments in code
* Add doc
* Naming similar to core ops
* Fix failing test
* Address clang issue
* More changes
* Change test macro
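The sigmoid reference kernels mentioned above amount to the textbook forward and backward formulas: s(x) = 1/(1+e^-x) and ds = delta * s * (1 - s). A scalar sketch (invented names, not the nGraph reference kernels, which operate over whole tensors):

```cpp
#include <cassert>
#include <cmath>

// Forward: the logistic function.
float sigmoid_fprop(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Backward: chain the incoming gradient `delta` through s * (1 - s).
float sigmoid_bprop(float x, float delta)
{
    float s = sigmoid_fprop(x);
    return delta * s * (1.0f - s);
}
```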
-
- 28 Jun, 2018 1 commit
Nishant Patel authored
* Reshape 4d
* Support dimshuffles/transpose with MKLDNN
* Address PR feedback
* Use Eigen for 3D dimshuffles
-
- 20 Jun, 2018 1 commit
Adam Procter authored
* Fix bug with concat for 0-size tensors
* Simplify test for zero-length axes, per PR comments
-
- 15 Jun, 2018 1 commit
Robert Kimball authored
-
- 12 Jun, 2018 1 commit
Chris Sullivan authored
* Added op::ReplaceSlice and enabled respective tests.
* div64 -> division_by_invariant_multiplication
* Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations.
* Added GPUShape and reworked Shape helpers to be compatible with different shape types. Shape is now implicitly convertible to GPUShape.
* Updated shape helpers' signatures and added conversion operators/constructors for GPUShape.
* Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides; these were updated as well to take advantage of GPUShape conversion operators.
* Fixed lambda for workspace allocations to match that of argspace allocations.
* Adjusted row_major_strides to avoid a reversed copy.
* Moved declaration out of loop for clang.
* Moved gpu_shape to the gpu transformer.
* Removed no-longer-necessary headers.
* Added stdexcept header to gpu_shape.hpp
* Coordinate -> GPUShape
* Refactored replace_slice into CudaKernelBuilder. Simplified allocations using the new GPUAllocator and GPUMemoryManager.
* Refactored allocations to make use of the primitive emitter. Memory primitives are now registered at compile time and the gpu memory address is resolved at runtime by invoking the primitive.
* Changed check on 64-bit shape to check if high bits are set.
* Added const qualifier to data being copied in GPUAllocator::reserve_argspace.
* Replaced runtime host-to-device memcpys with GPUAllocator reservations in order to move them to compile time.
* Removed no-longer-necessary buffer freeing from op emitters.
* Removed replace_slice.
* Removed more replace_slice diffs.
* Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
* Added back missing changes after timeline resolution.
* Added spacing between functions in GPUShape and boolean operators in shape.hpp.
* Template parameters are UPPER_SNAKE_CASE.
* Added unit tests for GPUMemoryManager and added checks that ensure the device memory is allocated prior to address resolution by the memory primitives. Also exposed the allocation size of the memory manager.
* Return type of shape_size should be large enough to encapsulate the full stride of the tensor. This should be 64 bits wide regardless of the underlying value_type of the ShapeType.
* Upstreamed changes to shape_size (which returns size_t).
* cuDNN softmax implementation for all-axis activation.
* Added catch for per-axis activations.
* Removed commented headers.
* Added explicit function for queueing kernel argument data rather than inline in the reservation function, per @fengleitian's recommendation.
* Added softmax cuda kernel. It relies on atomic addition to global memory; this adds contention and should be optimized in the future. A multilevel reduction can be found in cs/gpu_softmax_cuda_shfl but it requires some further engineering.
* Refactored reduce coordinate transform code into a helper and applied it to broadcast. Broadcast added to CUDAEmitter; it now supports multiple non-consecutive axes.
* Removed change to data_types variable and updated/removed comments.
* Refactored softmax into the emission of two fused elementwise collective ops. Added fused elementwise + collective kernels. Softmax is then just the combination of exp_sum_reduce + div_broadcast.
* Added default param to GPUAllocator::reserve_workspace to request memory initialization for each invocation of the memory primitive.
* GPU workspace memory is zero-initialized by default but can be turned off if desired.
* Added template parameter REDUCE_OP_TYPE to CUDAEmitter::build_elementwise to specify the ngraph op type to use for the reduction in the fused ew_collective kernel.
* Renamed variables and updated a comment.
* Removed outdated softmax kernel to avoid confusion. Can be added later when the atomic reduce is replaced.
* Clang complained about the lack of an explicit destructor for AxisSet. Since cuda_emitter doesn't need AxisSet specifically, switched to std::set<size_t>. This also has the benefit that, if in the future we wish to emit kernels without ngraph core (for example in a standalone binary via a serialized graph manifest), we don't depend on AxisSet.
* softmax -> broadcast in build_broadcast.
* Separated elementwise and elementwise_collective.
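The softmax refactor described above composes two fused stages: exp_sum_reduce followed by div_broadcast. A CPU sketch of that two-stage decomposition over a 1-D input (illustrative only; the real kernels are CUDA and fuse the elementwise and collective parts, and this sketch omits the usual max-subtraction stabilization):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Softmax as two stages: (1) elementwise exp fused with a sum reduction,
// (2) elementwise division by the broadcast sum.
std::vector<float> softmax(const std::vector<float>& x)
{
    std::vector<float> e(x.size());
    float sum = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) // stage 1: exp_sum_reduce
    {
        e[i] = std::exp(x[i]);
        sum += e[i];
    }
    for (float& v : e) // stage 2: div_broadcast
    {
        v /= sum;
    }
    return e;
}
```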
-
- 06 Jun, 2018 1 commit
Nishant Patel authored
* Support 3-D pool with mkldnn
* Move execute() to test_tools.hpp
-