1. 15 Jun, 2018 2 commits
    • RNN fusion across layers (#1085) · f75b8006
      Pruthvi authored
      * - Added graph pass for fusing the RNN op across layers
      - Added INTERPRETER vs CPU test case for verifying the layer-fused RNN
      - More sanity checks in the RNN fusion graph pass
      - Added support to replace the recurrent cell state correctly in the fused RNN op
      
      * Fixed multi layer rnn fusion unit test failure
      
      * Addressed PR comments
    • gpu function call (#1111) · 7c8e9250
      Fenglei authored
      * enable tests
      
      * add function call
      
      * working version
      
      * remove test from skip list
  2. 14 Jun, 2018 2 commits
  3. 13 Jun, 2018 3 commits
    • Ubuntu 18 build support (#1101) · 838ba3f1
      Robert Kimball authored
      * backend libraries now found in tree
      
      dynamically read header search paths
      
      fix running from install
    • Group Convolution (#1041) · 4a2c3c9c
      Nick Korovaiko authored
      * group conv init
      
      * add GroupConvolution op; refine checks in fusion logic (see the grouped-convolution sketch after this entry)
      
      * add an emitter, cpu assignment
      
      * cpu_layout
      
      * add checks to algebraic simplification
      
      * updating emitter logic for groupconvolution
      
      * working before refactoring
      
      * moving primitive creation logic to mkldnn_emitter
      
      * group convolution graph test
      
      * rename an opt
      
      * address jbobba's feedback
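
      A minimal, hypothetical sketch of the grouped convolution that the GroupConvolution op implements (standalone C++, not the actual ngraph emitter or MKL-DNN primitive): the input and filter channels are split into `groups` independent slices and each slice is convolved on its own, shown here for 1-D data in NCW layout with stride 1 and no padding.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Naive 1-D grouped convolution for a single batch element.
      // in:      [c_in x w]                filters: [c_out x (c_in/groups) x k]
      // returns: [c_out x (w - k + 1)]
      std::vector<float> group_conv1d(const std::vector<float>& in,
                                      const std::vector<float>& filters,
                                      size_t c_in, size_t w,
                                      size_t c_out, size_t k, size_t groups)
      {
          const size_t c_in_g = c_in / groups;   // input channels per group
          const size_t c_out_g = c_out / groups; // output channels per group
          const size_t w_out = w - k + 1;
          std::vector<float> out(c_out * w_out, 0.f);

          for (size_t g = 0; g < groups; ++g)         // each group is an independent
              for (size_t oc = 0; oc < c_out_g; ++oc) // convolution over its own channels
                  for (size_t x = 0; x < w_out; ++x)
                  {
                      float acc = 0.f;
                      for (size_t ic = 0; ic < c_in_g; ++ic)
                          for (size_t i = 0; i < k; ++i)
                              acc += in[(g * c_in_g + ic) * w + x + i] *
                                     filters[((g * c_out_g + oc) * c_in_g + ic) * k + i];
                      out[(g * c_out_g + oc) * w_out + x] = acc;
                  }
          return out;
      }
      ```

      With groups == 1 this degenerates to an ordinary convolution, and with groups == c_in == c_out it is a depthwise convolution.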
    • gpu deconvolution (#1099) · 40069d27
      Fenglei authored
      * add pad_dilation function (see the data-dilation sketch after this entry)
      
      * add dilation to gpu_emitter
      
      * add CoordinateDiff constructor to GPUShape
      
      * remove unnecessary cast
      
      * working version for forward
      
      * forward working
      
      * forward test all pass
      
      * deconvolution forward
      
      * backward data dilation
      
      * forward test passed
      
      * initial to 0
      
      * fix bug for get_padded_shape and clang format
      
      * code style, change variable names
      
      * refactor convolution conditions
      
      * fix bug padding_below_diff
      
      * change pad_dilation to pad_dynamic, compare to pad
      
      * remove passed convolution test from skip list, clang format
      
      * change pad to use GPUShape
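
      A rough sketch of the data-dilation padding this entry builds on (a hypothetical 1-D helper, not the actual pad_dynamic kernel): for the backward-data / deconvolution path the input is expanded by inserting dilation - 1 zeros between neighbouring elements plus exterior padding, after which a regular forward convolution can be emitted over the padded buffer.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Pad-and-dilate a 1-D input: insert (dilation - 1) zeros between elements
      // and add pad_below / pad_above zeros on the outside. The output buffer is
      // zero-initialized, so only the original values need to be scattered in.
      std::vector<float> pad_dilate_1d(const std::vector<float>& in,
                                       size_t pad_below, size_t pad_above, size_t dilation)
      {
          const size_t dilated = in.empty() ? 0 : (in.size() - 1) * dilation + 1;
          std::vector<float> out(pad_below + dilated + pad_above, 0.f);
          for (size_t i = 0; i < in.size(); ++i)
              out[pad_below + i * dilation] = in[i];
          return out;
      }

      // Example: pad_dilate_1d({1, 2, 3}, 1, 1, 2) yields {0, 1, 0, 2, 0, 3, 0}.
      ```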
  4. 12 Jun, 2018 2 commits
    • CUDA softmax kernel and broadcast kernel support for multiple non-consecutive axes (#1070) · 83e6aa5f
      Chris Sullivan authored
      * Added op::ReplaceSlice and enabled respective tests.
      
      * div64 -> division_by_invariant_multiplication
      
      * Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations (sketches of this memory-manager pattern, of GPUShape, and of the softmax split follow this entry).
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
      Shape is now implicitly convertible to GPUShape.
      
      * Updated shape helpers signature and added conversion operators/constructors for GPUShape.
      
      * Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides. These were updated as well to take advantage of GPUShape conversion operators.
      
      * Forgot to fix lambda for workspace allocations to match that of argspace allocations.
      
      * Added GPUShape and reworked Shape helpers to be
      compatible with different shape types.
      Shape is now implicitly convertible to GPUShape.
      
      * Updated shape helpers signature and added conversion operators/constructors for GPUShape.
      
      * Adjust row_major_strides to avoid reversed-copy.
      
      * Moved declaration out of loop for clang.
      
      * Moved gpu_shape to gpu transformer.
      
      * Removed no longer necessary headers.
      
      * Added stdexcept header to gpu_shape.hpp
      
      * Coordinate->GPUShape
      
      * Refactored replace_slice into CudaKernelBuilder. Simplified allocations using new GPUAllocator and GPUMemoryManager.
      
      * Refactor allocations to make use of primitive emitter.
      Now memory primitives are registered at compile time and
      the gpu memory address is resolved at runtime by invoking
      the primitive.
      
      * Changed check on 64bit shape to check if high bits are set.
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Added const qualifier to data being copied in GPUAllocator::reserve_argspace
      
      * Replaced runtime host to device memcpys with GPUAllocator reservations in order to move them to compile time.
      
      * Forgot to remove no longer necessary buffer freeing from op emitters.
      
      * Removed replace slice.
      
      * Removed more replace_slice diffs.
      
      * Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
      
      * Added back missing changes after timeline resolution.
      
      * Added spacing between functions in GPUShape and boolean operators in shape.hpp.
      
      * Template parameters are UPPER_SNAKE_CASE.
      
      * Added unit tests for GPUMemoryManager and added checks that ensure the
      device memory is allocated prior to address resolution by the memory_primitives.
      Also exposed the allocation size of the memory manager.
      
      * Return type of shape_size should be large enough to encapsulate the full stride of the tensor.
      This should be 64bits wide regardless of the underlying value_type of the ShapeType.
      
      * Upstreaming changes to shape_size (which returns size_t).
      
      * cuDNN softmax impl. for all axis activation.
      
      * Added catch for per-axis activations.
      
      * Removed commented-out headers.
      
      * Added explicit function for queueing kernel argument data rather than inline in the reservation function per @fengleitian recommendation.
      
      * Add softmax cuda kernel. It relies on atomic memory addition to global
      memory; this will add contention and should be optimized in the
      future. A multilevel reduction can be found in
      cs/gpu_softmax_cuda_shfl but it requires some further engineering.
      
      * Refactored reduce coordinate transform code into a helper and applied it to broadcast.
      Broadcast added to CUDAEmitter, now supports multiple non-consecutive axes.
      
      * Removed change to data_types variable and updated/removed comments.
      
      * Refactored softmax into the emission of two fused elementwise collective ops.
      Added fused elementwise + collective kernels. Softmax is then just the combination of exp_sum_reduce + div_broadcast.
      
      * Added default param to GPUAllocator::reserve_workspace to request memory initialization for each invocation of the memory primitive.
      
      * GPU workspace memory is zero initialized by default but can be turned off if desired.
      
      * Added template parameter to CUDAEmitter::build_elementwise, REDUCE_OP_TYPE,
      to specify the ngraph op type to use for the reduction in the fused ew_collective kernel.
      
      * Renamed variables and updated a comment.
      
      * Removed outdated softmax kernel to avoid confusion. Can be added later when atomic reduce is replaced.
      
      * Clang complained about lack of explicit destructor for AxisSet. Since cuda_emitter doesn't need AxisSet specifically, switch to std::set<size_t>.
      This also has the benefit that in the future, if we wish to emit kernels without ngraph core (for example in a standalone binary via a
      serialized graph manifest), we don't depend on AxisSet.
      
      * softmax -> broadcast in build_broadcast.
      
      * Separate elementwise and elementwise_collective.
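
      A toy, host-only model of the reserve-at-compile-time / resolve-at-runtime pattern described above for GPUMemoryManager (a byte vector stands in for device memory; names are illustrative, not the real API): reservations only record an offset, a single pooled allocation is made afterwards, and each memory primitive resolves its concrete address when invoked.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <functional>
      #include <vector>

      // Stand-in for GPUMemoryManager: aggregate many reservations into one
      // allocation and hand back "memory primitives" that resolve addresses lazily.
      class MemoryManager
      {
      public:
          using MemoryPrimitive = std::function<void*()>;

          // Compile time: record the reservation, return a primitive for later use.
          MemoryPrimitive reserve_workspace(size_t size_in_bytes)
          {
              const size_t offset = m_pool_size;
              m_pool_size += size_in_bytes;
              return [this, offset]() -> void* {
                  assert(!m_pool.empty() && "pool must be allocated before resolution");
                  return m_pool.data() + offset; // run time: address resolved by invoking
              };
          }

          // One aggregated, zero-initialized allocation for all reservations.
          void allocate() { m_pool.resize(m_pool_size, 0); }

      private:
          std::vector<unsigned char> m_pool;
          size_t m_pool_size = 0;
      };
      ```

      The unit tests mentioned above would then assert that the pooled buffer exists before any primitive is invoked, and can check the total allocation size.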
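
      A condensed sketch of the GPUShape idea (simplified, and assumed to use 32-bit dimensions; not the real class): Shape's 64-bit dimensions are narrowed for the device with a check that no high bits are set, while shape_size keeps a 64-bit result regardless of the shape's value type.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <stdexcept>
      #include <vector>

      using Shape = std::vector<size_t>; // host-side shape with 64-bit dimensions

      // Device-side shape with 32-bit dimensions; constructible implicitly from Shape.
      class GPUShape : public std::vector<uint32_t>
      {
      public:
          GPUShape(const Shape& shape) // non-explicit on purpose: Shape converts implicitly
          {
              for (size_t dim : shape)
              {
                  if ((dim >> 32) != 0) // high bits set: dimension cannot be narrowed
                      throw std::invalid_argument("Shape dimension does not fit in GPUShape");
                  push_back(static_cast<uint32_t>(dim));
              }
          }
      };

      // Element count; 64 bits wide regardless of the underlying value type.
      template <typename SHAPE_TYPE>
      size_t shape_size(const SHAPE_TYPE& shape)
      {
          size_t size = 1;
          for (auto d : shape)
              size *= d;
          return size;
      }
      ```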
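
      The exp_sum_reduce + div_broadcast split mentioned above, written out as plain host code for the last-axis case (a sketch of the math only, not the fused CUDA kernels, and without the usual max-subtraction for numerical stability): the first pass fuses the elementwise exp with a reduction along the softmax axis, the second divides by the reduced sums broadcast back over that axis.

      ```cpp
      #include <cmath>
      #include <cstddef>
      #include <vector>

      // Softmax over the last axis of a [rows x cols] buffer in two passes:
      //   1) exp_sum_reduce: y = exp(x), s[row] = sum of y along the axis
      //   2) div_broadcast:  y = y / s[row], the per-row sum broadcast back
      void softmax_rows(std::vector<float>& data, size_t rows, size_t cols)
      {
          std::vector<float> sums(rows, 0.f);
          for (size_t r = 0; r < rows; ++r)     // pass 1: elementwise op fused with reduce
              for (size_t c = 0; c < cols; ++c)
              {
                  const float e = std::exp(data[r * cols + c]);
                  data[r * cols + c] = e;
                  sums[r] += e;
              }
          for (size_t r = 0; r < rows; ++r)     // pass 2: divide by the broadcast sum
              for (size_t c = 0; c < cols; ++c)
                  data[r * cols + c] /= sums[r];
      }
      ```

      On the GPU, the accumulation in the first pass is where the atomicAdd contention noted above comes from, since many threads add into the same reduced element.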
    • Replace Check (#1097) · 692101a7
      Nick Korovaiko authored
  5. 11 Jun, 2018 2 commits
  6. 08 Jun, 2018 2 commits
    • Optimized eigen kernel for spatial mean (#1094) · 0b95efa6
      Jayaram Bobba authored
      * Optimized eigen kernel for 2D reduction on a 4D tensor used for spatial mean (see the sketch after this entry)
      
      * revert change to serializer
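
      For reference, the reduction being specialized above, written as plain loops rather than the optimized Eigen kernel (NCHW layout is an assumption; the commit only says a 2-D reduction over a 4-D tensor): the two spatial axes are averaged away, leaving an [N x C] result.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Spatial mean of a 4-D NCHW tensor: reduce over H and W, producing [N x C].
      std::vector<float> spatial_mean(const std::vector<float>& in,
                                      size_t n, size_t c, size_t h, size_t w)
      {
          std::vector<float> out(n * c, 0.f);
          for (size_t i = 0; i < n; ++i)
              for (size_t j = 0; j < c; ++j)
              {
                  float sum = 0.f;
                  const float* plane = in.data() + (i * c + j) * h * w;
                  for (size_t k = 0; k < h * w; ++k)
                      sum += plane[k];
                  out[i * c + j] = sum / static_cast<float>(h * w);
              }
          return out;
      }
      ```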
    • Jmenon/dexec (#1092) · abb68627
      Jaikrishnan Menon authored
      * CPU: Direct Execution
      Part 1 with bare minimum infrastructure
      
      * Refactor: Move build related functionality to a separate TU
      and external function method
      
      * Add TU back after merge
      
      * Remove an assert
      
      * Remove commented-out code
  7. 07 Jun, 2018 2 commits
  8. 06 Jun, 2018 7 commits
  9. 05 Jun, 2018 6 commits
  10. 04 Jun, 2018 3 commits
  11. 02 Jun, 2018 1 commit
  12. 01 Jun, 2018 1 commit
  13. 31 May, 2018 4 commits
  14. 30 May, 2018 3 commits
    • add gpu reduce_window op (#1020) · f6b84d67
      Fenglei authored
      * add reduce op
      
      * hack solution to get reduction function in reduce op
      
      * hack version working on all tests
      
      * add reduce_window op
      
      * fixed the reduction checking process
      
      * add reduce window op, save progress, not compilable yet
      
      * change push_back to pre-allocate for vector
      
      * fixed datatype vector
      
      * datatype and comments
      
      * pass op instead of using template (see the reduce_window sketch after this entry)
      
      * using new GPUShape and allocator
      
      * using GPUShape
      
      * add comment, change map inside function.
      
      * change to more meaningful name
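
      A 1-D sketch of the reduce_window idea in this entry (hypothetical signature; the real op is N-dimensional): each output element is the reduction of one sliding window of the input, and, matching the "pass op instead of using template" commit, the reduction is passed as a runtime callable rather than a template parameter.

      ```cpp
      #include <cstddef>
      #include <functional>
      #include <limits>
      #include <vector>

      // Sliding-window reduction over a 1-D sequence. The reduction op is a runtime
      // argument (std::function) instead of a compile-time template parameter.
      std::vector<float> reduce_window_1d(const std::vector<float>& in,
                                          size_t window_size, size_t stride, float init,
                                          const std::function<float(float, float)>& reduce_op)
      {
          std::vector<float> out;
          if (in.size() < window_size)
              return out;
          out.reserve((in.size() - window_size) / stride + 1); // pre-allocate instead of growing
          for (size_t start = 0; start + window_size <= in.size(); start += stride)
          {
              float acc = init;
              for (size_t i = 0; i < window_size; ++i)
                  acc = reduce_op(acc, in[start + i]);
              out.push_back(acc);
          }
          return out;
      }

      // Example: max-pooling with window 2, stride 2:
      //   reduce_window_1d(x, 2, 2, std::numeric_limits<float>::lowest(),
      //                    [](float a, float b) { return b > a ? b : a; });
      ```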
    • Refactor CPUWorkspaceInsertion to simplify its use in MxNet (#988) · fa221c5f
      Nick Korovaiko authored
      * refactor CPUWorkspaceInsertion for MxNet
      
      * rename mxnet functions to adhere to ngraph naming convention
      
      * define a member static const int in a cpp file to resolve a linking issue