- 30 Apr, 2018 1 commit
varun-intel authored
* interpreter implementation and tests * style * correct * tolerance * skip * type * cast * double * types * format * add bn to the inference engine backend
- 28 Apr, 2018 1 commit
Yixing Lao authored
- 27 Apr, 2018 2 commits
Fenglei authored
* add select op, pass data type for each operand * fix bugs and apply clang format * fix index bug
Fenglei authored
* add enable gpu convolution tests flag in py * working version * revert convolution_test.in.app * use a skip list to skip backend for test * add comment as a reminder to implement needed ops
- 26 Apr, 2018 4 commits
Robert Kimball authored
* wip * simplified interpreter backend
Nick Korovaiko authored
* pick broadcast if exists * remove logic for sum * get at broadcast using the label-on-skip approach * tests for broadcast fix * add comments
Nick Korovaiko authored
Nick Korovaiko authored
* simplifier for sum * add comment, remove visualization passes
- 25 Apr, 2018 3 commits
Chris Sullivan authored
* Added cudnn batch norm operation to GPU transformer. Brought batchnorm tests out of cpu_tests and into backend_tests. Need to add JIRA ticket for interpreter SKIPS.
* CUDNN batchnorm is implemented. In the ForwardTraining branch, CUDNN seems to calculate the batch mean correctly but the batch variance incorrectly. Currently the batchnorm output and mean are calculated correctly for tests:
  * GPU.batchnorm_fprop_b2c2h3w3_mean_var
  * GPU.batchnorm_fprop_b1c2h2w2
  * GPU.batchnorm_fprop_b2c2h2w1
  but the variance calculated for the batches in these tests is incorrectly calculated by CUDNN. Also added an additional test and cleaned up some of the old tests.
* MKLDNN internally utilizes the biased estimate of the population variance, and the tests have been crafted to suit MKLDNN. According to the original batchnorm publication (https://arxiv.org/pdf/1502.03167v3.pdf), population (unbiased) statistics should be used for inference, and mini-batch (biased) statistics should be used for training (forward/backward). For the variance this means utilizing the following equations, respectively:
  * (biased) Var[X] = 1/m * Sum_i(x_i - mu)^2 :: used in training
  * (unbiased) Var[X] = 1/(m-1) * Sum_i(x_i - mu)^2 :: used in inference
  such that x_i are elements of X and m = N*D*H*W. For large batch sizes in inference this may not impact convergence as m >> 1, but for small batch sizes it will. CUDNN internally utilizes the unbiased variance. Changes:
  * Added a Multiply op to the forward pass of batchnorm to convert the unbiased variance to a biased one. The op utilizes the blending scaling factors to apply the bias factor.
  * Adds emission for the BatchNormBackprop kernel and cleans up the emitter implementation.
* Added hashing to cudnn::batchnorm op.
* Formatting.
* Changed hashing of epsilon in cudnn batchnorm.
* Remove implicit conversion and default case in switch for bn.
* Added skips for IE transformer on batchnorm.
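The biased/unbiased distinction above reduces to a single scale factor of (m-1)/m, which is what the added Multiply op applies to CUDNN's result. A minimal standalone sketch of the two estimators (plain C++, not ngraph or CUDNN code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// The two variance estimates discussed above:
//   biased:   Var[X] = 1/m     * Sum_i (x_i - mu)^2   (training)
//   unbiased: Var[X] = 1/(m-1) * Sum_i (x_i - mu)^2   (inference)
double variance(const std::vector<double>& x, bool biased)
{
    double mu = 0;
    for (double v : x)
        mu += v;
    mu /= x.size();
    double ss = 0;
    for (double v : x)
        ss += (v - mu) * (v - mu);
    return ss / (biased ? x.size() : x.size() - 1);
}
```

Multiplying the unbiased estimate by (m-1)/m recovers the biased one, so a single elementwise Multiply suffices to convert between the two.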
* add cudnn include path to compiler.cpp
* separate two paths
* PR #892 and #825, which were recently merged, both forgot skips for the GPU backend. Adding them in, as they are unimplemented ops.
* The allocation and deletion of primitives was occurring in separate translation units with raw C pointers. Because of this, it was not clear that these were being freed appropriately, nor did it indicate ownership of the pointers. In this commit these raw pointers have been converted over to std::unique_ptrs such that construction/destruction is managed automatically. Furthermore, GPUPrimitiveEmitter::insert now only takes an r-value reference, requiring move semantics to indicate that when inserting a primitive, the GPUPrimitiveEmitter takes ownership of the pointer. All instances of primitive creation have been modified.
* CUDNN_SAFE_CALL
* Removed redundant comment and made variable names more verbose.
* Change from conditionals to case-switch in pooling to conform to batchnorm per @fengleitian's suggestion.
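The ownership change described above can be sketched as follows; the class and member names here are illustrative stand-ins, not the actual GPUPrimitiveEmitter interface:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Primitive
{
    int id;
};

class Emitter
{
public:
    // Only an r-value unique_ptr is accepted: insert(p) fails to compile for
    // an l-value, while insert(std::move(p)) makes the ownership hand-off
    // explicit at the call site.
    size_t insert(std::unique_ptr<Primitive>&& p)
    {
        primitives_.emplace_back(std::move(p));
        return primitives_.size() - 1;
    }
    const Primitive& at(size_t i) const { return *primitives_.at(i); }

private:
    // unique_ptrs free the primitives automatically on destruction, replacing
    // the raw pointers that were previously deleted in another translation unit.
    std::vector<std::unique_ptr<Primitive>> primitives_;
};
```

A caller then writes `auto idx = emitter.insert(std::make_unique<Primitive>(Primitive{7}));`, and the emitter owns the primitive from that point on.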
Fenglei authored
* add cudnn include path to compiler.cpp * separate two paths * Skipping one_hot tests for CPU as CI is failing. JIRA bug report: https://jira01.devtools.intel.com/browse/NGRAPH-1682.
Robert Kimball authored
- 24 Apr, 2018 2 commits
Nick Korovaiko authored
* infra for algebraic simplification and simplifications for Add and Multiply (including broadcast consts)
* add tests, fix bugs
* negative tests, 0*0, 0*1, 0+0
* possible fix for 0*1
* remove stale test
* fix merge comp errors
* fix comp errors
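The simplifications listed above (x+0 -> x, x*0 -> 0, x*1 -> x) run as graph-rewrite rules over op nodes; this toy expression tree illustrates the idea and is not ngraph's pass infrastructure:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Minimal expression node standing in for a graph op node.
struct Expr
{
    std::string op;   // "add", "mul", "const", "var"
    double value = 0; // meaningful when op == "const"
    std::shared_ptr<Expr> lhs, rhs;
};

// Recursively apply the identity/annihilator rules from the commit message.
std::shared_ptr<Expr> simplify(std::shared_ptr<Expr> e)
{
    if (e->op != "add" && e->op != "mul")
        return e;
    auto l = simplify(e->lhs), r = simplify(e->rhs);
    auto is_const = [](const std::shared_ptr<Expr>& x, double v) {
        return x->op == "const" && x->value == v;
    };
    if (e->op == "add" && is_const(r, 0)) return l;               // x + 0 -> x
    if (e->op == "add" && is_const(l, 0)) return r;               // 0 + x -> x
    if (e->op == "mul" && (is_const(l, 0) || is_const(r, 0)))     // x * 0 -> 0
        return std::make_shared<Expr>(Expr{"const", 0, nullptr, nullptr});
    if (e->op == "mul" && is_const(r, 1)) return l;               // x * 1 -> x
    if (e->op == "mul" && is_const(l, 1)) return r;               // 1 * x -> x
    return std::make_shared<Expr>(Expr{e->op, 0, l, r});
}
```

In the real pass the same rules are expressed against ngraph op nodes, with broadcasted constants recognized as well.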
Robert Kimball authored
* get all ops working * enable autodiff tests for IE backend
- 23 Apr, 2018 4 commits
Adam Procter authored
* Add logical-and, logical-or ops * Restore accidentally-deleted test * add new ops to IE backend
Jayaram Bobba authored
* Enable users to request default/row-major layouts on result nodes
* copy default layout attribute when copying the result ops
* Result nodes cannot be replaced; use direct graph manipulation instead
* Add unit test to verify default layouts on result nodes when requested
Nick Korovaiko authored
* any -> skip * run style check
Fenglei authored
* add CUDNN_SAFE_CALL and CUBLAS_SAFE_CALL * using sstream * passed all unit tests * format error msg * fix ( ) bug
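The *_SAFE_CALL macros described above wrap a status-returning call and throw with a message assembled via a stringstream. A stand-in sketch with a fake status type (the real macros check cudnnStatus_t/cublasStatus_t, which are not reproduced here):

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for cudnnStatus_t / cublasStatus_t.
enum class status { ok, bad_param };

// Evaluate the call once; on failure, build a descriptive message with a
// stringstream (as the commit mentions) and throw.
#define DEMO_SAFE_CALL(call)                                                   \
    do                                                                         \
    {                                                                          \
        status s = (call);                                                     \
        if (s != status::ok)                                                   \
        {                                                                      \
            std::stringstream ss;                                              \
            ss << #call << " failed at " << __FILE__ << ":" << __LINE__;       \
            throw std::runtime_error(ss.str());                                \
        }                                                                      \
    } while (0)

// Fake API used only to exercise the macro.
status fake_api(bool succeed)
{
    return succeed ? status::ok : status::bad_param;
}
```

The do/while(0) wrapper keeps the macro safe to use as a single statement, e.g. inside an unbraced if.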
- 21 Apr, 2018 2 commits
Adam Straw authored
* ie backend and manager with passing unit tests except for select/function * fix function_call and select * simplify implementation by removing support for convert and select * remove manager
Nishant Patel authored
* Support Concat with mkldnn (two inputs)
* Support concat with mkldnn (multiple inputs)
* Address feedback
* Remove unused variable
* Allow rank-two tensors to mkldnn for concat & add a test case for 2D inputs
* Add mkldnn_any layout to concat
* Make API changes to be consistent with master
- 20 Apr, 2018 1 commit
Robert Kimball authored
* Move runtime::Manager functionality into runtime::Backend * Remove unused files * remove obsolete function
- 18 Apr, 2018 2 commits
Chris Sullivan authored
* cuda_emitter::build_pad now utilizes pad_value.
* Added TypeInfo class for dispatching c-type information from the underlying ngraph element::Type. Adjusted test to use all_close when comparing floating point values (max_pool_2d_1channel_1image_overpadded).
* Refactored max_pool_1d into cuda_emitter so that numeric_limits<c_type>::lowest() could be used for the initial max value. Test max_pool_2d_1channel_1image_padded_negative_values is now enabled and passes.
* Removed old function and switched to size_t to match ngraph.
* Added virtual dtor.
* Adding support for interior padding. All op::Pad functionality is now included.
* More info in runtime_error for checking of tensor dimensions. Removed commented code.
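The numeric_limits detail above matters for pooling windows over all-negative inputs: seeding the running max with 0 would be wrong. A small host-side illustration of the idea (plain C++ standing in for the CUDA kernel):

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// 1d max pool with a sliding window of size `window` and stride 1.
// The running max starts at numeric_limits<T>::lowest(), not T{0}, so
// windows containing only negative values still yield the true maximum.
template <typename T>
std::vector<T> max_pool_1d(const std::vector<T>& in, size_t window)
{
    std::vector<T> out;
    for (size_t i = 0; i + window <= in.size(); ++i)
    {
        T m = std::numeric_limits<T>::lowest();
        for (size_t j = 0; j < window; ++j)
            m = std::max(m, in[i + j]);
        out.push_back(m);
    }
    return out;
}
```

This is exactly why max_pool_2d_1channel_1image_padded_negative_values passes once lowest() is used as the seed.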
Nick Korovaiko authored
* CPU weight fusion initial version
* add tests for weight_fusion
* address @jbobba's feedback
* before cleaning up convolution_weight_optimization.cpp
* clean up, rename, fix perms, fix format
- 17 Apr, 2018 1 commit
Robert Kimball authored
* reenable unit test
- 16 Apr, 2018 3 commits
Adam Procter authored
Nick Korovaiko authored
* get_input_op -> get_argument * more replacing * more replacing2
Fenglei authored
- 13 Apr, 2018 4 commits
Robert Kimball authored
* remove deprecated
* remove all legacy Backend API usage
* remove deprecated files
* pull in changes from master
* fix GPU calls
* disable tests in convolution generator
* update per PR comments. Enable performance counter feature.
* update per PR comments
* fix build error
* fix conditionally compiled test :(
Nick Korovaiko authored
Robert Kimball authored
Chris Sullivan authored
* Begin prototype of cudnn_emitter.
* Added GPURuntimeContext to gpu_external_function for passing through to JIT functions.
* gpu_emitters now utilize gpu runtime context.
* Moved cublas and cudnn handles into GPURuntimeContext pointer and out of callframe EntryPoint.
* Added CUDNNEmitter, comparable to MKLDNNEmitter, which allows cudnn kernels to be defined via lambda primitives that are emitted and subsequently called during graph execution. An example implementation is provided for op::Sum.
* Added GPURuntimeContext to gpu_external_function for passing through to JIT functions.
* gpu_emitters now utilize gpu runtime context.
* Moved cublas and cudnn handles into GPURuntimeContext pointer and out of callframe EntryPoint.
* GPURuntimeContext should be stored as unique_ptr in external function.
* GPURuntimeContext should be stored as unique_ptr in external function.
* Extract raw pointer from unique_ptr for cudnn_emitter.
* Removing unrelated code from PR.
* GPURuntimeContext needs to be a strict C interface in case the native compiler and clang are utilizing different glibc ABIs. Updated to reflect this.
* Added cudnn::primitive typedef for better readability.
* Moved allocation of CudaFunctionPool to external function so that it is available during gpu emission.
* Fixed too-late initialization of cudart.
* Fixed too-late initialization of cudart.
* CUDNNEmitter moved into superset class GPUPrimitiveEmitter. The GPUPrimitiveEmitter handles the emission of all gpu primitives, including cudnn, cuda, and cublas. CUBLASEmitter support not yet included.
* Added unordered_map for caching primitives in the gpu_emitter.
* Added dtor to GPUPrimitiveEmitter to clean up compiled functions.
* Adding back a serialized model graph that was accidentally removed.
* Added a few additional helpers to use ngraph::row_major_strides.
* added whitespace per @fengleitian's comment
* added whitespace per @fengleitian's comment
* Remove implicit type conversions from size_t to int.
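The lambda-primitive-plus-cache scheme above can be sketched as follows; the names (PrimitiveEmitter, build) are illustrative stand-ins, not ngraph's actual GPUPrimitiveEmitter API:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// A primitive here is just a callable built once and invoked at execution
// time; in the real emitter it wraps a configured cudnn/cuda kernel launch.
using primitive = std::function<int(int)>;

class PrimitiveEmitter
{
public:
    // Look up a kernel by its hash string; construct it only on a cache miss.
    primitive& build(const std::string& hash)
    {
        auto it = cache_.find(hash);
        if (it != cache_.end())
            return it->second;               // cache hit: reuse the kernel
        ++builds_;                           // cache miss: construct once
        return cache_[hash] = [](int x) { return x * 2; };
    }
    int builds() const { return builds_; }

private:
    std::unordered_map<std::string, primitive> cache_;
    int builds_ = 0;
};
```

The hash string plays the role the op-parameter hashing (shapes, types, epsilon, etc.) plays in the real emitter: identical requests map to the same preconstructed kernel.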
* Add op::MaxPool, op::MaxPoolBackprop and op::Pad to GPU transformer (#817)
* Added pooling for 1 and 2 dimensions. 1d uses a cuda kernel and 2d utilizes cudnn. Padding is not yet supported.
* Normalized call signature on gpu emission for 1d max pool. Added a few comments.
* Max pool backprop impl. in progress. Amend this commit.
* Max pool backprop implemented. Note that cuDNN requests the output tensor for the maxpool operation but it is not required for computation.
* Formatting and invocation for maxpool changed.
* Fixed too-late initialization of cudart.
* Added padding kernel that is used with maxpool. Need to investigate remaining tests.
* Changed dimensionality check to correctly determine if data is 1d or not.
* Added 3d MaxPooling (forward), verified by forcing 2d case to use Nd pooling routines.
* Added 3d MaxPooling (backward), verified by forcing 2d case to use Nd pooling routines.
* Moved cudnn prologues for maxpool into ngraph runtime and out of the primitive so that the only execution occurring on the JIT runtime is the evaluation of the op kernel.
* Refactored forward and backward pooling into a single CUDNNEmitter::build_pooling interface with a runtime switch to determine if the op is forward or backward propagation.
* Cache preconstructed cudnn kernel for maxpool if it has already been constructed.
* Forgot to add padding arrays back into cudnn kernel for MaxPool in the 2d case.
* Fixed namespace issues and use join(..., '_')
* Refactored 4d/Nd tensor descriptor builder into a single function.
* Changed conditionals and comments. Now throws if MaxPool on more than 3 spatial dimensions is requested.
* Fixed forward declare for GPURuntimeContext (class -> struct).
* Clang complains about missing braces on brace-initializer. Fixed implicit conversions.
* Fixed implicit conversions (clang).
* Reverting changes on autodiff test for maxpool. @Krovatkin will update later.
- 12 Apr, 2018 3 commits
Fenglei authored
* add slice op, first version
* change size to output size
* fix bugs
* working version
* using existing function for join and strides
* clang format
* revert accidental change
Nick Korovaiko authored
* add a getter for root node
* recurrent graph rewrite
* fix perms, rename match_root -> get_match_root
* fix comp errors
* make match_root return the topmost match; fix tests
Fenglei authored
* add convolution in progress
* enable 1 test
* convolution in progress
* use filter descriptor
* filter descriptor bug fix
* tensor format
* add missing dimension calculator
* forward convolution 4d without dilation and padding working
* data dilation (deconvolution) and enable some tests
* add backprop convolution data and filter
* backprop can compile
* pass unit test, but still have problem on padding
* 2d, symmetric padding, no data dilation works now
* clean up code
* extend gpu convolution to nd
* fix some bugs
* working version for up to 3d convolution, code format.
* remove unnecessary changes
* add restriction for data dilation and asymmetric padding
* clang format
* support up to 3D convolution for now
* change comments to not implemented
* change comments to not implemented
* add query for additional GPU workspace for convolution
* clang format
* code format
* using row_major_strides
* using join
* fix bug for join
* refactor dimension calculation
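The dimension calculation refactored above boils down to a per-spatial-axis output-size formula covering padding, filter dilation, and stride. A sketch of the standard formula (a hypothetical helper, not ngraph's code; data dilation for deconvolution is omitted):

```cpp
#include <cassert>
#include <cstddef>

// Per-axis convolution output size:
//   padded     = in + pad_below + pad_above
//   eff_window = (window - 1) * dilation + 1   // dilated filter extent
//   out        = (padded - eff_window) / stride + 1
size_t conv_output_dim(size_t in, size_t window, size_t pad_below,
                       size_t pad_above, size_t dilation, size_t stride)
{
    size_t padded = in + pad_below + pad_above;
    size_t eff_window = (window - 1) * dilation + 1;
    return (padded - eff_window) / stride + 1;
}
```

For example, a 5-wide axis with a 3-wide filter and symmetric padding of 1 keeps its size (5), while dilation 2 shrinks a 7-wide axis to 3.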
- 10 Apr, 2018 3 commits
Yixing Lao authored
* new backend API in graph partition * update API
Nick Korovaiko authored
* zero dimension tensor elimination init
* more ops + refactor + tests
* revert pattern.cpp
* add internal zero-length test
* address Scott's feedback
* fix comp errors
* proper static init
* get rid of unique-ptr
* refactor hashmap into virtual get_default_values on op classes
* fix formatting
Robert Kimball authored
* back out api change
- 09 Apr, 2018 3 commits
Nick Korovaiko authored
* repacking recurrent matching as a standalone class * RecurrentMatcher * add a getter for root node * address Scott's feedback
Robert Kimball authored
* force backend compile() to make a copy of the graph
* fix copy_with_new_args on ops that have internal function pointers
* update unit test for new backend API
* add unit test for multiple simultaneous backends
* move get_subdevices virtual method to Manager class
* update GPU to latest
* update call methods
* add remove_compiled_function()
Jaikrishnan Menon authored
* CPU: Fuse zero-padded convolution backprop filters * CPU: Add a testcase for zero-padded convolution backprop filters fusion
- 06 Apr, 2018 1 commit
Nick Korovaiko authored
* initial support for recurring matching
* fix a bug where patterns weren't populated w/ matched nodes; add recurrent tests
* add a missing newline
* address feedback
* fix function comment