1. 29 Aug, 2018 1 commit
  2. 27 Aug, 2018 1 commit
  3. 22 Aug, 2018 1 commit
  4. 21 Aug, 2018 1 commit
    • ArgMin (#1435) · 951e77b4
      Nick Korovaiko authored
      * argmin
      
      * address feedback on argmin
      
      * add new lines
      
      * add new lines
      
      * address Adam's nitpicks
      
      * address Scott's feedback
      
      * fix unit tests
  5. 13 Aug, 2018 2 commits
  6. 08 Aug, 2018 1 commit
  7. 02 Aug, 2018 1 commit
    • LRN (#1282) · 237c4803
      Nick Korovaiko authored
      * lrn init
      
      * fix comment
      
      * mkldnn lrn (#1295)
      
      * add serializer + fix compiler warnings
  8. 26 Jul, 2018 1 commit
  9. 18 Jul, 2018 2 commits
  10. 06 Jul, 2018 1 commit
  11. 02 Jul, 2018 1 commit
    • move sigmoid to core fusion (#1132) · d05b5e39
      Sandeep authored
      * declare sigmoid for core fusion
      
      * add simple test for sigmoid
      
      * info fusion status
      
      * cp op as main op
      
      * builds as expected
      
      * move sigmoid fusion code
      
      * add reference kernel
      
      * sigmoid bprop reference kernel and clang-format
      
      * add delta to bprop
      
      * fprop called
      
      * compiles bprop
      
      * move tests
      
      * serializer support
      
      * address comments in code
      
      * add doc
      
      * naming similar to core ops
      
      * fix failing test
      
      * fix failing test
      
      * address clang issue
      
      * more changes
      
      * change test macro
  12. 28 Jun, 2018 1 commit
  13. 20 Jun, 2018 1 commit
  14. 15 Jun, 2018 1 commit
  15. 12 Jun, 2018 1 commit
    • CUDA softmax kernel and broadcast kernel support for multiple non-consecutive axes (#1070) · 83e6aa5f
      Chris Sullivan authored
      * Added op::ReplaceSlice and enabled respective tests.
      
      * div64 -> division_by_invariant_multiplication
      
      * Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
      Shape is now implicitly convertible to GPUShape.
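      
      A minimal sketch of such an implicit conversion (hypothetical types; the
      real nGraph classes carry more machinery):
      
        #include <cstddef>
        #include <cstdint>
        #include <stdexcept>
        #include <vector>
      
        using GPUShape = std::vector<uint32_t>; // device-side shapes are 32-bit
      
        struct Shape : std::vector<size_t>
        {
            using std::vector<size_t>::vector;
            // Implicit conversion: narrow each 64-bit dimension, throwing if
            // any high bits are set (cf. the 64-bit check noted further down).
            operator GPUShape() const
            {
                GPUShape out;
                for (size_t d : *this)
                {
                    if (d >> 32)
                        throw std::invalid_argument("dim exceeds 32 bits");
                    out.push_back(static_cast<uint32_t>(d));
                }
                return out;
            }
        };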
      
      * Updated shape helpers' signatures and added conversion operators/constructors for GPUShape.
      
      * Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides; these were updated as well to take advantage of GPUShape conversion operators.
      
      * Forgot to fix lambda for workspace allocations to match that of argspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
      Shape is now implicitly convertible to GPUShape.
      
      * Updated shape helpers' signatures and added conversion operators/constructors for GPUShape.
      
      * Adjust row_major_strides to avoid reversed-copy.
      
      * Moved declaration out of loop for clang.
      
      * Moved gpu_shape to gpu transformer.
      
      * Removed no longer necessary headers.
      
      * Added stdexcept header to gpu_shape.hpp
      
      * Coordinate->GPUShape
      
      * Refactored replace_slice into CudaKernelBuilder. Simplified allocations using new GPUAllocator and GPUMemoryManager.
      
      * Refactor allocations to make use of primitive emitter.
      Now memory primitives are registered at compile time and
      the gpu memory address is resolved at runtime by invoking
      the primitive.
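      
      A minimal sketch of that pattern (names hypothetical): compilation
      reserves space and records a callable, and the callable yields the device
      address only once the pool exists.
      
        #include <cstddef>
        #include <functional>
      
        using memory_primitive = std::function<void*()>;
      
        struct MemoryManagerSketch
        {
            char* m_base = nullptr; // device pool, allocated before execution
            std::size_t m_offset = 0;
      
            // Compile time: reserve space, return a runtime address resolver.
            memory_primitive reserve_workspace(std::size_t bytes)
            {
                std::size_t offset = m_offset;
                m_offset += bytes;
                return [this, offset] { return static_cast<void*>(m_base + offset); };
            }
        };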
      
      * Changed check on 64bit shape to check if high bits are set.
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Replaced runtime host to device memcpys with GPUAllocator reservations in order to move them to compile time.
      
      * Forgot to remove no longer necessary buffer freeing from op emitters.
      
      * Removed replace slice.
      
      * Removed more replace_slice diffs.
      
      * Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
      
      * Added back missing changes after timeline resolution.
      
      * Added spacing between functions in GPUShape and boolean operators in shape.hpp.
      
      * Template parameters are UPPER_SNAKE_CASE.
      
      * Added unit tests for GPUMemoryManager and added checks that ensure the
      device memory is allocated prior to address resolution by the memory_primitives.
      Also exposed the allocation size of the memory manager.
      
      * Return type of shape_size should be large enough to encapsulate the full stride of the tensor.
      This should be 64 bits wide regardless of the underlying value_type of the ShapeType.
      
      * Upstreaming changes to shape_size (which returns size_t).
      
      * cuDNN softmax impl. for all-axis activation.
      
      * Added catch for per-axis activations.
      
      * Removed commented-out headers.
      
      * Added explicit function for queueing kernel argument data rather than inline in the reservation function per @fengleitian recommendation.
      
      * Add softmax cuda kernel. It relies on atomic memory addition to global
      memory; this will add contention and should be optimized in the
      future. A multilevel reduction can be found in
      cs/gpu_softmax_cuda_shfl but it requires some further engineering.
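      
      A minimal sketch of that atomic scheme (illustrative, not the nGraph
      kernel): each thread folds its exp(x) term into one global accumulator,
      which is exactly where the contention comes from.
      
        __global__ void exp_sum_atomic(const float* in, float* denom, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)
            {
                atomicAdd(denom, expf(in[i])); // serializes on the accumulator
            }
        }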
      
      * Refactored reduce coordinate transform code into a helper and applied it to broadcast.
      Broadcast added to CUDAEmitter, now supports multiple non-consecutive axes.
      
      * Removed change to data_types variable and updated/removed comments.
      
      * Refactored softmax into the emission of two fused elementwise collective ops.
      Added fused elementwise + collective kernels. Softmax is then just the combination of exp_sum_reduce + div_broadcast.
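      
      In other words softmax(x)_i = exp(x_i) / sum_j exp(x_j), so the op factors
      cleanly into those two kernels. A host-side sketch of the decomposition:
      
        #include <cmath>
      
        void softmax_1d(const float* x, float* y, int n)
        {
            float sum = 0.f;                // exp_sum_reduce: fused exp + sum
            for (int i = 0; i < n; i++)
                sum += std::exp(x[i]);
            for (int i = 0; i < n; i++)     // div_broadcast: divide by the sum
                y[i] = std::exp(x[i]) / sum;
        }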
      
      * Added default param to GPUAllocator::reserve_workspace to request memory initialization for each invocation of the memory primitive.
      
      * GPU workspace memory is zero initialized by default but can be turned off if desired.
      
      * Added template parameter to CUDAEmitter::build_elementwise, REDUCE_OP_TYPE,
      to specify the ngraph op type to use for the reduction in the fused ew_collective kernel.
      
      * Renamed variables and updated a comment.
      
      * Removed outdated softmax kernel to avoid confusion. Can be added later when atomic reduce is replaced.
      
      * Clang complained about lack of explicit destructor for AxisSet. Since cuda_emitter doesn't need AxisSet specifically, switch to std::set<size_t>.
      This also has the benefit that in the future, if we wish to emit kernels without ngraph core (for example in a standalone binary via a
      serialized graph manifest), we don't depend on AxisSet.
      
      * softmax -> broadcast in build_broadcast.
      
      * Separate elementwise and elementwise_collective.
  16. 06 Jun, 2018 1 commit
  17. 04 Jun, 2018 1 commit
    • Modernize cmake usage (#1032) · eef750df
      Robert Kimball authored
      * Update cmake files to more modern approach
      
      * disable building libraries that are not required
      
      * handle more build cases
      
      * add versions to backend libs. add start of package target.
      
      * add create_backend to backends
      
      * temporary workaround to tbb not linking correctly with gcc
      
      * install codegen lib
      
      * force tbb to link to the cpu backend so that it is available for codegen
      
      * fix clang build error
      
      * fix warning for codegen build
      
      * update cuda header paths
      
      * change error message for opening backend shared library
      
      * set lib path
  18. 02 Jun, 2018 1 commit
  19. 26 May, 2018 1 commit
  20. 25 May, 2018 2 commits
  21. 18 May, 2018 1 commit
  22. 10 May, 2018 2 commits
  23. 09 May, 2018 2 commits
    • Add op::Or and op::And to GPU transformer (#979) · 8508410f
      Chris Sullivan authored
      * Moved emit_elementwise implementation into CUDAEmitter and added logical_and and logical_or ops.
      
      * Updated comment and formatting.
      
      * Added check for multi-output elementwise ops.
    • CUDNN and CUDA kernels for AvgPool (forward/backward) (#951) · b1b3d4d6
      Chris Sullivan authored
      * Added op::AvgPool cudnn impl. which works for 2-3 spatial dimensions and no/symmetric padding. Enabled tests.
      
      * Added cuda-c implementation of average pool which handles 1-3 spatial
      dimensions as well as asymmetric padding. This commit also introduces
      several helper functions for performing fast integer division and
      fast constant memory access.
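      
      The fast-division trick (cf. "division_by_invariant_multiplication"
      earlier in this log) replaces division by a fixed divisor with a multiply
      and a shift. A minimal sketch with hypothetical names, valid for dividends
      below 2^31:
      
        struct MagicDiv
        {
            unsigned long long mul;
            int shift;
        };
      
        // Host side, computed once per divisor d (1 <= d < 2^31).
        MagicDiv make_magic(unsigned d)
        {
            int l = 0;
            while ((1u << l) < d)
                l++; // l = ceil(log2(d))
            unsigned long long m = ((1ull << (31 + l)) + d - 1) / d; // ceil(2^(31+l)/d)
            return {m, 31 + l};
        }
      
        // Device side: x / d becomes one multiply and one shift.
        __host__ __device__ unsigned fast_div(unsigned x, MagicDiv md)
        {
            return (unsigned)(((unsigned long long)x * md.mul) >> md.shift);
        }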
      
      * Formatting. Removed bool that was used for testing to force the cuda impl. over cudnn.
      
      * Added CUDNN AvgPoolBackprop implementation.
      
      * Removed inline enum in favor of a helper struct. Removed instances of multiple declarations on a single line. Updated comments.
      
      * Removed the underscore prefix from helper functions in the anonymous namespace.
  24. 08 May, 2018 2 commits
    • add gpu concat op (#931) · 57d58e50
      Fenglei authored
      * add concat op
      
      * change to concat
      
      * add more code for gpu concat
      
      * compiles successfully, but with a bug
      
      * add emit_concat_op
      
      * runnable, but with wrong results
      
      * working version
      
      * add some comments
      
      * delete old comments.
      
      * delete old comments.
      
      * remove buggy doxygen comments
    • Computation reuse (#945) · 41c50b44
      Jayaram Bobba authored
      * Make temp memory pools static to avoid memory allocation overheads
      
      * Initial implementation for graph control to enable caching and computation reuse
      
      * Added sphinx documentation
      
      * Turned off memory buffer reuse in CPU transformer to support computation reuse. Added unit test
      
      * Change memoizable to cacheable
      
      * Change memoizable to cacheable
      
      * Rename variables
  25. 05 May, 2018 1 commit
    • add gpu reverse (#952) · af946b7d
      Fenglei authored
      * add code to gpu reverse
      
      * add reverse emitter and kernel builder
      
      * working version
  26. 30 Apr, 2018 1 commit
  27. 27 Apr, 2018 1 commit
    • gpu select (#919) · 30d24597
      Fenglei authored
      * add select op, pass data type for each operand
      
      * fix bugs and apply clang format
      
      * fix index bug
  28. 25 Apr, 2018 2 commits
    • CUDNN BatchNorm (inference/forward/backward) (#893) · 23ac5e5a
      Chris Sullivan authored
      * Added cudnn batch norm operation to GPU transformer.
      Brought batchnorm tests out of cpu_tests and into
      backend_tests. Need to add JIRA ticket for interpreter
      SKIPS.
      
      * CUDNN batchnorm is implemented. In the ForwardTraining branch
      CUDNN seems to calculate the batch mean correctly but the batch variance incorrectly.
      Currently the batchnorm output and mean are calculated correctly for tests:
      * GPU.batchnorm_fprop_b2c2h3w3_mean_var
      * GPU.batchnorm_fprop_b1c2h2w2
      * GPU.batchnorm_fprop_b2c2h2w1
      but the variance for the batches in these tests is calculated incorrectly by CUDNN.
      
      Also added an additional test and cleaned up some of the old tests.
      
      * MKLDNN internally utilizes the biased estimate of the population variance
      and the tests have been crafted to suit MKLDNN. According to the original
      batchnorm publication (https://arxiv.org/pdf/1502.03167v3.pdf), population
      (unbiased) statistics should be used for inference, and mini-batch (biased)
      statistics should be used for training (forward/backward). For the variance this
      means utilizing the following equations, respectively:
      
        (biased)   Var[X] = 1/m * Sum_i(x_i-mu)^2      :: used in training
        (unbiased) Var[X] = 1/(m-1) * Sum_i(x_i-mu)^2  :: used in inference
      
        s.t. x_i are elements of X and m = N*D*H*W.
      
      For large batch sizes in inference this may not impact convergence as m >> 1,
      but for small batch sizes it will. CUDNN internally utilizes the unbiased
      variance.
      
      Changes:
      * Added Multiply op to Forward pass of batchnorm to convert
        the unbiased variance to a biased one. The op utilizes the
        blending scaling factors to apply the bias factor.
      * Adds emission for the BatchNormBackprop kernel and cleans up
        the emitter implementation.
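      
      A small numeric sketch of that conversion (illustrative only): since the
      unbiased estimate is S/(m-1) and the biased one is S/m,
      
        // Factor the added Multiply op applies, with m = N*D*H*W.
        float unbiased_to_biased(float var_unbiased, float m)
        {
            return var_unbiased * (m - 1.0f) / m;
        }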
      
      * Added hashing to cudnn::batchnorm op.
      
      * Formatting.
      
      * Changed hashing of epsilon in cudnn batchnorm.
      
      * Remove implicit conversion and default case in switch for bn.
      
      * Added skips for IE transformer on batchnorm.
      
      * add cudnn include path to compiler.cpp
      
      * separate the two paths
      
      * PRs #892 and #825, which were recently merged, both forgot skips for the GPU backend.
      Adding them in as they are unimplemented ops.
      
      * The allocation and deletion of primitives was occurring in separate
      translation units with raw C pointers. Because of this, it was not
      clear that these were being freed appropriately, nor did it indicate
      ownership of the pointers.
      
      In this commit these raw pointers have been converted over to
      std::unique_ptrs such that the construction/destruction is managed
      automatically. Furthermore, GPUPrimitiveEmitter::insert now only
      takes an r-value reference, requiring move-semantics to indicate
      that when inserting a primitive, the GPUPrimitiveEmitter takes
      ownership of the pointer.
      
      All instances of primitive creation have been modified.
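      
      A minimal sketch of that ownership contract (hypothetical signature; the
      real emitter stores nGraph's gpu::primitive type):
      
        #include <cstddef>
        #include <memory>
        #include <utility>
        #include <vector>
      
        struct primitive { /* stand-in for the callable primitive type */ };
      
        class PrimitiveEmitterSketch
        {
        public:
            // R-value reference only: callers must std::move the pointer,
            // making the transfer of ownership explicit at the call site.
            std::size_t insert(std::unique_ptr<primitive>&& prim)
            {
                m_primitives.push_back(std::move(prim));
                return m_primitives.size() - 1; // index used to invoke later
            }
      
        private:
            std::vector<std::unique_ptr<primitive>> m_primitives;
        };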
      
      * CUDNN_SAFE_CALL
      
      * Removed redundant comment and made variable names more verbose.
      
      * Change from conditionals to case-switch in pooling to conform to
      batchnorm per @fengleitian's suggestion.
    • add cudnn include path to compiler.cpp (#902) · b0421577
      Fenglei authored
      * add cudnn include path to compiler.cpp
      
      * separate the two paths
      
      * Skipping one_hot tests for CPU as
      CI is failing. JIRA bug report: https://jira01.devtools.intel.com/browse/NGRAPH-1682.
  29. 24 Apr, 2018 1 commit
  30. 23 Apr, 2018 2 commits
  31. 21 Apr, 2018 2 commits
    • Add Inference Engine (IE) backend (#883) · 3d590dea
      Adam Straw authored
      * ie backend and manager with passing unit tests except for select/function
      
      * fix function_call and select
      
      * simplify implementation by removing support for convert and select
      
      * remove manager
    • Support concat with mkldnn and add a test case (#825) · 1a73f10c
      Nishant Patel authored
      * Support Concat with mkldnn (two inputs)
      
      * Support concat with mkldnn (multiple inputs)
      
      * Address feedback
      
      * Remove unused variable
      
      * Allow rank two tensor to mkldnn for concat & add a test case for 2D inputs
      
      * Add mkldnn_any layout to concat
      
      * Make API changes to get consistent with master