1. 14 Oct, 2018 1 commit
  2. 12 Oct, 2018 1 commit
    • Support ArgMin and ArgMax for NVGPU Backend (#1737) · 6f30b32b
      Ayan Moitra authored
      * Project initialization commit
      
      * Added unit tests for 3D tensors for argmax
      
* Refactored reduce to be used by argmax/argmin; argmax/argmin still has some issues. WIP
      
* [WIP] First working version of ArgMax/ArgMin
      
      * added reduce buffer for the cudnn api calls
      
      * added reduce buffer for the cudnn api calls
      
      * Further modifications. Using rvalues to pass enums to build reduce method
      
      * more unit tests added
      
      * Incorporate Fenglei's comments
      
      * Incorporating Chris's first set of comments
      
      * small change to test file
      
      * Resolving clang issue that was causing argmin test to fail
      
* Incorporate Chris's comments
      
      * clang format issue
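A minimal sketch of the cuDNN reduction path this entry describes, assuming a 2-D float tensor reduced over its second axis; the function name, descriptor shapes, and buffer handling are illustrative, not the actual backend code. Requesting flattened indices is what turns cuDNN's MAX reduction into argmax, which is why the commits above add a reduce buffer for the cudnn api calls; ArgMin is the same call with CUDNN_REDUCE_TENSOR_MIN.

```cpp
// Hedged sketch: argmax over axis 1 of an (n, c) float tensor using
// cuDNN's reduction API. Handle, device buffers, and workspace sizing
// are assumed to be set up by the caller.
#include <cstddef>
#include <cudnn.h>

void argmax_axis1(cudnnHandle_t handle, const float* in, float* out_max,
                  unsigned* out_idx, void* workspace, size_t ws_bytes,
                  int n, int c)
{
    cudnnReduceTensorDescriptor_t reduce_desc;
    cudnnCreateReduceTensorDescriptor(&reduce_desc);
    // FLATTENED_INDICES makes cuDNN emit argmax indices with the max values.
    cudnnSetReduceTensorDescriptor(reduce_desc, CUDNN_REDUCE_TENSOR_MAX,
                                   CUDNN_DATA_FLOAT, CUDNN_PROPAGATE_NAN,
                                   CUDNN_REDUCE_TENSOR_FLATTENED_INDICES,
                                   CUDNN_32BIT_INDICES);

    cudnnTensorDescriptor_t in_desc, out_desc;
    cudnnCreateTensorDescriptor(&in_desc);
    cudnnCreateTensorDescriptor(&out_desc);
    cudnnSetTensor4dDescriptor(in_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, c, 1, 1);
    cudnnSetTensor4dDescriptor(out_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               n, 1, 1, 1); // reduced axis kept with extent 1

    const float alpha = 1.0f, beta = 0.0f;
    // The indices buffer plus workspace are the "reduce buffer" the commit
    // messages refer to; both live in device memory.
    cudnnReduceTensor(handle, reduce_desc,
                      out_idx, n * sizeof(unsigned),
                      workspace, ws_bytes,
                      &alpha, in_desc, in,
                      &beta, out_desc, out_max);

    cudnnDestroyTensorDescriptor(out_desc);
    cudnnDestroyTensorDescriptor(in_desc);
    cudnnDestroyReduceTensorDescriptor(reduce_desc);
}
```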
  3. 09 Oct, 2018 1 commit
  4. 08 Oct, 2018 3 commits
  5. 04 Oct, 2018 1 commit
    • nvgpu maxpool bug fix (#1741) · 0051f201
      Fenglei authored
      * add a test failed on gpu, pass on cpu
      
      * fixed bug
      
      * get datatype size
      
* add description for test
      
      * update comment
      
      * update comments and name
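The fix trail above mentions getting the datatype size; a plausible reading, sketched below under that assumption only, is that a device buffer was being sized as if every element were a 4-byte float rather than from the tensor's element type. All names here are illustrative, not the actual patch.

```cpp
// Hedged sketch: size device buffers from the element type, not sizeof(float).
#include <cstddef>
#include <vector>

size_t shape_size(const std::vector<size_t>& shape)
{
    size_t n = 1;
    for (size_t d : shape)
        n *= d; // total element count
    return n;
}

// Bytes depend on the tensor's actual element size (1 for int8, 8 for f64...).
size_t buffer_bytes(const std::vector<size_t>& shape, size_t element_size)
{
    return shape_size(shape) * element_size;
}
```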
  6. 02 Oct, 2018 1 commit
  7. 29 Sep, 2018 1 commit
  8. 28 Sep, 2018 3 commits
  9. 26 Sep, 2018 1 commit
    • add nGraph quantize op (#1661) · d640fac3
      Adam Straw authored
      * adding nGraph Quantize op
      
      * unit test failing for floating point exception
      
      * unit test working in float
      
      * unit test working in uint8
      
      * improved type checking and polished unit test - passing
      
      * quantized axes working
      
      * inclusive project method
      
      * add round mode
      
      * TODO cleanup
      
      * code format
      
      * adding serializer support - fails build
      
      * add serializer support
      
* make CPU quantize op work; new tests for int8, clamp
      
      * fix build failure
      
      * fix GPU build issue
      
      * fix GPU unit test manifest
      
      * use quantized offset
      
      * add is_quantized field to element::Type
      
      * add reduce function to coordinate.hpp
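A minimal reference of the quantize math these commits walk through (scale, round, add the quantized offset, clamp), here for float-to-uint8 with round-to-nearest-even; the function name and the fixed output type are illustrative assumptions, not the op's actual signature.

```cpp
// Hedged sketch: reference float -> uint8 quantization.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

void quantize_ref(const float* in, uint8_t* out, size_t count,
                  float scale, uint8_t offset)
{
    for (size_t i = 0; i < count; ++i)
    {
        // The round mode matters; nearbyint rounds to nearest-even by default.
        float q = std::nearbyint(in[i] / scale) + static_cast<float>(offset);
        // Clamp into the uint8 range, as the "clamp" tests above exercise.
        out[i] = static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, q)));
    }
}
```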
  10. 18 Sep, 2018 2 commits
  11. 13 Sep, 2018 1 commit
    • Handle unsupported op in nbench (#1531) · fe676f72
      Robert Kimball authored
      * add unsupported_op exception
      
      * unsupported_op test
      
      * add printout of unsupported op in model
      
      * fix GPU dispatcher check
      
      * fix test designation
      
      * catch exceptions on single file runs too
      
      * add unsupported_op exception where needed
      
      * remove unsupported_op class
      
      * add unassigned op exception
      
      * add unit test
      
      * catch unsupported op in nbench
      
      * add cpu test back
      
      * update all latest merges
      
      * mode change
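The shape of the nbench-side handling, sketched under assumptions: the exception name matches the commit log, but the driver function is a hypothetical stand-in, not nbench's real entry point.

```cpp
// Hedged sketch: report unsupported ops per model instead of letting
// one bad model abort a whole benchmark sweep.
#include <iostream>
#include <stdexcept>
#include <string>

struct unsupported_op : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for compiling and running one serialized model.
void compile_and_run(const std::string& model_path)
{
    throw unsupported_op("ExampleOp"); // e.g. backend has no kernel for it
}

bool try_benchmark(const std::string& model_path)
{
    try
    {
        compile_and_run(model_path);
        return true;
    }
    catch (const unsupported_op& e)
    {
        // Printout of the unsupported op in the model, per the commits above.
        std::cerr << "unsupported op in " << model_path << ": " << e.what()
                  << "\n";
        return false;
    }
}
```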
  12. 12 Sep, 2018 1 commit
    • Add in_place support for ReplaceSlice (#1559) · bb6de284
      gaurides authored
* Add in_place support for ReplaceSlice
      
      * Add emit_replace_slice_inplace kernel
      
      * changed file permissions to original
      
      * Formatted code using maint/apply-code-format.sh
      
      * Removed data type check and removed dead code
      
* Removed setting mkldnn_op(true); ReplaceSlice is not an mkldnn op
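Why ReplaceSlice admits an in-place kernel, in a hedged 1-D sketch: when the output tensor aliases the input buffer, the full-tensor copy disappears and only the slice region is written. Names and the 1-D restriction are illustrative.

```cpp
// Hedged sketch: 1-D ReplaceSlice with an in-place fast path.
#include <cstddef>
#include <cstring>

void replace_slice(const float* arg, const float* repl, float* out,
                   size_t count, size_t lower, size_t upper)
{
    if (out != arg)
    {
        // Out-of-place: materialize the unchanged elements first.
        std::memcpy(out, arg, count * sizeof(float));
    }
    // In either case, overwrite just the [lower, upper) slice.
    std::memcpy(out + lower, repl, (upper - lower) * sizeof(float));
}
```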
  13. 07 Sep, 2018 1 commit
  14. 06 Sep, 2018 1 commit
    • TopK (w/ArgMax, ArgMin python wrapper) (#1560) · 3548772b
      Sang Ik Lee authored
      * Implement TopK.
      
      * Update python wrappers for TopK, ArgMin and ArgMax.
      
      * Address some reviewer comments.
      
      * Add type property check tests for TopK.
      Set correct TopK behavior for K==0.
      
      * TopK: Add 1d and 3d unit tests.
      
      * Address more reviewer comments.
      
      * Apply code style.
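A hedged 1-D reference of TopK matching the behaviors named above (k largest or smallest, and K==0 selecting all elements); the function name and the comparator switch are illustrative assumptions.

```cpp
// Hedged sketch: 1-D TopK by partially sorting indices by value.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

void topk_1d(const float* in, size_t n, size_t k, bool compute_max,
             std::vector<float>& out_vals, std::vector<size_t>& out_idx)
{
    if (k == 0)
        k = n; // per the commit log, K==0 means "return all elements"
    std::vector<size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    auto cmp = [&](size_t a, size_t b) {
        return compute_max ? in[a] > in[b] : in[a] < in[b];
    };
    // Only the first k positions need to be ordered.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(), cmp);
    out_idx.assign(idx.begin(), idx.begin() + k);
    out_vals.resize(k);
    for (size_t i = 0; i < k; ++i)
        out_vals[i] = in[out_idx[i]];
}
```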
  15. 04 Sep, 2018 2 commits
    • nvgpu reduce to scalar optimization (#1491) · 5f40d957
      Fenglei authored
      * add cuda reduce
      
      * clang format
      
      * fix bugs
      
      * fix bug
      
      * add 1d reduce
      
      * clang format
      
      * fix bugs
      
      * unroll loop
      
      * remove debug info
      
      * revert tests
      
      * unroll 1D reduce op
      
      * add comments
      
      * using cudnn for nd to scalar reduction
      
      * remove cuda 1d reduction since cudnn version is faster
      
      * remove 1D kernel
      
      * fix bugs
      
      * 1d multi block size
      
      * remove debug
      
      * change kernel name
      
      * add reduce to scalar optimization, add test
      
      * fix bugs and tune parameters
      
      * clang format
      
      * update comments
      
      * update comments
      
      * update comments
      
      * clang format
      
      * update comments
      
      * remove wrong comments, apply clang format
      
      * resolve Bob's comment
      
      * clang format
      
      * pass shared mem size from cuLaunchKernel, set unroll loop size through host code
      
* remove unused code; clang format
      
      * change reduce to thread with shfl for each warp first
      
      * add seed
      
      * unroll size
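A hedged CUDA sketch of the final shape this entry converges on: each warp reduces with shuffles first, per-warp partials meet in shared memory, and one atomic per block folds into the scalar result. The grid-stride traversal is where the unroll-size tuning above applies; block/grid configuration and names are illustrative, and `out` must be zeroed before launch.

```cuda
// Hedged sketch: sum-reduce an array to a scalar, warp shfl first.
#include <cuda_runtime.h>

__global__ void reduce_to_scalar(const float* in, float* out, int n)
{
    float sum = 0.0f;
    // Grid-stride loop over the input.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        sum += in[i];

    // Warp-level reduction via shuffles; no shared-memory traffic yet.
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffff, sum, offset);

    // Lane 0 of each warp publishes its partial sum.
    __shared__ float warp_sums[32];
    int lane = threadIdx.x & 31, warp = threadIdx.x >> 5;
    if (lane == 0)
        warp_sums[warp] = sum;
    __syncthreads();

    // First warp reduces the per-warp partials, then one atomic per block.
    if (warp == 0)
    {
        sum = (lane < (blockDim.x + 31) / 32) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        if (lane == 0)
            atomicAdd(out, sum);
    }
}
```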
    • IntelGPU backend: Sum operation optimization (#1545) · ed22bf6c
      shssf authored
      * IntelGPU backend: Sum operation optimization
      
      * PR1545. Comments addressed. Test added. Helper function refactored.
  16. 03 Sep, 2018 1 commit
  17. 29 Aug, 2018 1 commit
  18. 27 Aug, 2018 1 commit
  19. 22 Aug, 2018 1 commit
  20. 21 Aug, 2018 1 commit
    • ArgMin (#1435) · 951e77b4
      Nick Korovaiko authored
      * argmin
      
* address feedback on argmin
      
      * add new lines
      
* add new lines
      
      * address adam's nitpicks
      
      * scott's feedback
      
      * fix unit tests
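A hedged reference of the argmin reduction itself, for a row-major 2-D tensor reduced along axis 1; the function name and fixed layout are assumptions for illustration.

```cpp
// Hedged sketch: argmin over axis 1 of a row-major (rows, cols) tensor.
#include <cstddef>

void argmin_axis1(const float* in, size_t* out, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r)
    {
        size_t best = 0; // column index of the row minimum so far
        for (size_t c = 1; c < cols; ++c)
            if (in[r * cols + c] < in[r * cols + best])
                best = c;
        out[r] = best;
    }
}
```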
  21. 13 Aug, 2018 2 commits
  22. 08 Aug, 2018 1 commit
  23. 02 Aug, 2018 1 commit
    • LRN (#1282) · 237c4803
      Nick Korovaiko authored
      * lrn init
      
      * fix comment
      
      * mkldnn lrn (#1295)
      
      * add serializer + fix compiler warnings
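The across-channel LRN math behind this op is y[c] = x[c] / (bias + (alpha / size) * sum of x[j]^2 over the local channel window)^beta. A hedged single-position sketch follows; the function name and window clamping are illustrative assumptions.

```cpp
// Hedged sketch: LRN across channels at one spatial position.
#include <algorithm>
#include <cmath>
#include <cstddef>

void lrn_channels(const float* x, float* y, size_t channels,
                  size_t size, float alpha, float beta, float bias)
{
    for (size_t c = 0; c < channels; ++c)
    {
        // Local window of `size` channels centered on c, clamped at edges.
        size_t lo = c >= size / 2 ? c - size / 2 : 0;
        size_t hi = std::min(channels, c + size / 2 + 1);
        float sq = 0.0f;
        for (size_t j = lo; j < hi; ++j)
            sq += x[j] * x[j];
        y[c] = x[c] / std::pow(bias + (alpha / size) * sq, beta);
    }
}
```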
  24. 26 Jul, 2018 1 commit
  25. 18 Jul, 2018 2 commits
  26. 06 Jul, 2018 1 commit
  27. 02 Jul, 2018 1 commit
    • move sigmoid to core fusion (#1132) · d05b5e39
      Sandeep authored
      * declare sigmoid for core fusion
      
      * add simple test for sigmoid
      
      * info fusion status
      
      * cp op as main op
      
      * builds as expected
      
      * move sigmoid fusion code
      
      * add reference kernel
      
      * sigmoid bprop reference kernel and clang-format
      
      * add delta to bprop
      
      * fprop called
      
      * compiles bprop
      
      * move tests
      
      * serializer support
      
      * address comments in code
      
      * add doc
      
      * naming similar to core ops
      
      * fix failing test
      
      * fix failing test
      
      * address clang issue
      
      * more changes
      
      * change test macro
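Hedged versions of the reference kernels mentioned above: sigmoid s(x) = 1 / (1 + e^-x) for fprop, and delta * s * (1 - s) for bprop via the chain rule. Function names are illustrative.

```cpp
// Hedged sketch: sigmoid forward and backward reference kernels.
#include <cmath>
#include <cstddef>

void sigmoid(const float* x, float* y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = 1.0f / (1.0f + std::exp(-x[i]));
}

void sigmoid_bprop(const float* x, const float* delta, float* dx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
    {
        float s = 1.0f / (1.0f + std::exp(-x[i]));
        dx[i] = delta[i] * s * (1.0f - s); // chain rule through sigmoid
    }
}
```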
  28. 28 Jun, 2018 1 commit
  29. 20 Jun, 2018 1 commit
  30. 15 Jun, 2018 1 commit
  31. 12 Jun, 2018 1 commit
    • CUDA softmax kernel and broadcast kernel support for multiple non-consecutive axes (#1070) · 83e6aa5f
      Chris Sullivan authored
      * Added op::ReplaceSlice and enabled respective tests.
      
      * div64 -> division_by_invariant_multiplication
      
* Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
Shape is now implicitly convertible to GPUShape.
      
* Updated shape helper signatures and added conversion operators/constructors for GPUShape.
      
* Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides. These were updated as well to take advantage of GPUShape conversion operators.
      
      * Forgot to fix lambda for workspace allocations to match that of argspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
Shape is now implicitly convertible to GPUShape.
      
* Updated shape helper signatures and added conversion operators/constructors for GPUShape.
      
      * Adjust row_major_strides to avoid reversed-copy.
      
      * Moved declaration out of loop for clang.
      
      * Moved gpu_shape to gpu transformer.
      
      * Removed no longer necessary headers.
      
      * Added stdexcept header to gpu_shape.hpp
      
      * Coordinate->GPUShape
      
      * Refactored replace_slice into CudaKernelBuilder. Simplified allocations using new GPUAllocator and GPUMemoryManager.
      
      * Refactor allocations to make use of primitive emitter.
      Now memory primitives are registered at compile time and
the gpu memory address is resolved at runtime by invoking
      the primitive.
      
      * Changed check on 64bit shape to check if high bits are set.
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Replaced runtime host to device memcpys with GPUAllocator reservations in order to move them to compile time.
      
      * Forgot to remove no longer necessary buffer freeing from op emitters.
      
      * Removed replace slice.
      
      * Removed more replace_slice diffs.
      
      * Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
      
      * Added back missing changes after timeline resolution.
      
      * Added spacing between functions in GPUShape and boolean operators in shape.hpp.
      
      * Template parameters are UPPER_SNAKE_CASE.
      
      * Added unit tests for GPUMemoryManager and added checks that ensure the
      device memory is allocated prior to address resolution by the memory_primitives.
      Also exposed the allocation size of the memory manager.
      
      * Return type of shape_size should be large enough to encapsulate the full stride of the tensor.
      This should be 64bits wide regardless of the underlying value_type of the ShapeType.
      
      * Upstreaming changes to shape_size (which returns size_t).
      
      * cuDNN softmax impl. for all axis activation.
      
      * Added catch for per-axis activations.
      
* Removed commented-out headers.
      
* Added explicit function for queueing kernel argument data rather than inline in the reservation function, per @fengleitian's recommendation.
      
* Add softmax cuda kernel. It relies on atomic memory addition to global memory; this will add contention and should be optimized in the future. A multilevel reduction can be found in cs/gpu_softmax_cuda_shfl, but it requires some further engineering.
      
      * Refactored reduce coordinate transform code into a helper and applied it to broadcast.
      Broadcast added to CUDAEmitter, now supports multiple non-consecutive axes.
      
      * Removed change to data_types variable and updated/removed comments.
      
      * Refactored softmax into the emission of two fused elementwise collective ops.
      Added fused elementwise + collective kernels. Softmax is then just the combination of exp_sum_reduce + div_broadcast.
      
      * Added default param to GPUAllocator::reserve_workspace to request memory initialization for each invocation of the memory primitive.
      
      * GPU workspace memory is zero initialized by default but can be turned off if desired.
      
      * Added template parameter to CUDAEmitter::build_elementwise, REDUCE_OP_TYPE,
to specify the ngraph op type to use for the reduction in the fused ew_collective kernel.
      
      * Renamed variables and updated a comment.
      
      * Removed outdated softmax kernel to avoid confusion. Can be added later when atomic reduce is replaced.
      
      * Clang complained about lack of explicit destructor for AxisSet. Since cuda_emitter doesn't need AxisSet specifically, switch to std::set<size_t>.
This also has the benefit that in the future, if we wish to emit kernels without ngraph core (for example in a standalone binary via a serialized graph manifest), we don't depend on AxisSet.
      
      * softmax -> broadcast in build_broadcast.
      
      * Separate elementwise and elementwise_collective.
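A hedged CUDA sketch of the two-stage decomposition described above: an exp_sum_reduce that accumulates per-row sums with atomicAdd into global memory (the contention the log flags), followed by a div_broadcast that divides through. Row-major (rows, cols) layout and all names are assumptions; the usual max-subtraction for numerical stability is omitted for brevity, and `sums` must be zeroed before launch.

```cuda
// Hedged sketch: softmax along the last axis as exp_sum_reduce + div_broadcast.
#include <cuda_runtime.h>

__global__ void exp_sum_reduce(const float* in, float* exp_out,
                               float* sums, int rows, int cols)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rows * cols)
        return;
    float e = expf(in[i]);
    exp_out[i] = e;
    // Atomic add into this row's accumulator in global memory.
    atomicAdd(&sums[i / cols], e);
}

__global__ void div_broadcast(float* exp_out, const float* sums,
                              int rows, int cols)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < rows * cols)
        exp_out[i] /= sums[i / cols]; // broadcast the row sum back over the row
}
```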
  32. 06 Jun, 2018 1 commit