- 12 Jun, 2018 1 commit
Nick Korovaiko authored
- 11 Jun, 2018 2 commits
Chris Sullivan authored
* Added default param to GPUAllocator::reserve_workspace to request memory initialization for each invocation of the memory primitive.
* GPU workspace memory is zero initialized by default but can be turned off if desired.
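
This entry describes the new flag without showing it; below is a minimal hypothetical sketch of the shape such a defaulted zero-initialization parameter could take (the class and member names are invented for illustration, not the actual GPU backend API):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Sketch only: reserve_workspace gains a defaulted flag so the workspace is
// zeroed on each invocation of the memory primitive unless the caller opts out.
struct WorkspaceSketch
{
    std::vector<char> buffer;

    char* reserve_workspace(size_t size, bool zero_initialize = true)
    {
        buffer.resize(size);
        if (zero_initialize)
        {
            std::memset(buffer.data(), 0, size); // default behavior per the commit
        }
        return buffer.data();
    }
};
```
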
Robert Kimball authored
* finally have something almost acceptable
- 08 Jun, 2018 2 commits
Jayaram Bobba authored
* Optimized eigen kernel for 2D reduction on a 4D tensor used for spatial mean
* revert change to serializer
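
As a rough illustration of the reshape-then-reduce idea (an assumption about the optimization, not the actual nGraph kernel), a spatial mean over an NCHW tensor can be expressed as a single-axis reduction over a 2D view:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

int main()
{
    const Eigen::Index n = 2, c = 3, h = 4, w = 5;
    Eigen::Tensor<float, 4, Eigen::RowMajor> in(n, c, h, w);
    in.setConstant(2.0f);

    // View the NCHW tensor as (N*C, H*W), then reduce the single trailing
    // dimension: the 4D spatial mean becomes a plain 2D reduction.
    Eigen::array<Eigen::Index, 2> two_d{{n * c, h * w}};
    Eigen::array<Eigen::Index, 1> spatial{{1}};
    Eigen::Tensor<float, 1, Eigen::RowMajor> mean = in.reshape(two_d).mean(spatial);

    // Each of the N*C entries is the mean over its H*W window (2.0 here).
    return mean(0) == 2.0f ? 0 : 1;
}
```
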
Jaikrishnan Menon authored
* CPU: Direct Execution Part 1 with bare minimum infrastructure
* Refactor: Move build related functionality to a separate TU and external function method
* Add TU back after merge
* Remove an assert
* Remove commented-out code
- 07 Jun, 2018 2 commits
Robert Kimball authored
Louis Feng authored
* batch dot pattern wip.
* added batch dot op.
* batch dot compute testing.
* correct gemm parameters.
* renaming matrix fusion passes and update tests.
* clean up.
* clang format.
* more clean ups.
* clang format.
* added CPUBatchDotFusion to default cpu passes.
* added missing header.
* added element type check.
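
For reference, the computation a batch dot op performs, as a naive sketch (not the fused CPU kernel; shapes assumed to be A: [batch, m, k], B: [batch, k, n], row-major):

```cpp
#include <cstddef>
#include <vector>

// Naive reference: C[b] = A[b] x B[b] for each batch b.
void batch_dot(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>& c, size_t batch, size_t m, size_t k, size_t n)
{
    for (size_t p = 0; p < batch; ++p)
        for (size_t i = 0; i < m; ++i)
            for (size_t j = 0; j < n; ++j)
            {
                float acc = 0.0f;
                for (size_t x = 0; x < k; ++x)
                    acc += a[p * m * k + i * k + x] * b[p * k * n + x * n + j];
                c[p * m * n + i * n + j] = acc;
            }
}
```

The fusion pass's job is to recognize this pattern in the graph and hand it to a single gemm-backed kernel rather than many separate dot ops.
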
- 06 Jun, 2018 7 commits
L.S. Cook authored
crlishka authored
Fix proxy settings in contrib/docker so that builds work both on the Internet and within Intel (#1088)
* Added conditional code to only set proxies when hostname appears to be on an Intel network (.intel.com)
* Replaced Intel-network-conditional proxy setting code with a system that checks for existence of http_proxy and https_proxy, like the Makefile does
* Applied fix for NGRAPH-1862, as I ran into NVidia dockerfile hangs. Temporarily use --no-cache with docker-build, to check that proxy fix really works (and we don't get a cached version)
* Restored an original line I accidentally deleted
* Remove --no-cache, which I had added for thorough testing
Adam Straw authored
Robert Kimball authored
Robert Kimball authored
Nishant Patel authored
* Support 3-D pool with mkldnn
* Move execute() to test_tools.hpp
Nishant Patel authored
- 05 Jun, 2018 6 commits
Robert Kimball authored
Ashok Emani authored
* add StopGradient op
* add StopGradient op src
* remove adjoints and add interpreter
* fix compile issue
* use nop_elimination and add unit-test
* update cmake
* update unit-tests
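
The semantics being added are simple to state (an illustrative sketch, not the nGraph op class): StopGradient is the identity in the forward direction, while the adjoint it propagates to its input is zero, which is why the adjoints change and the nop_elimination pass become relevant:

```cpp
#include <vector>

// Illustration only: identity forward, zero backward.
struct StopGradientSketch
{
    std::vector<float> forward(const std::vector<float>& x) { return x; }

    std::vector<float> backward(const std::vector<float>& delta)
    {
        // Cut the gradient path: contribute nothing to the input's adjoint.
        return std::vector<float>(delta.size(), 0.0f);
    }
};
```
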
Nick Korovaiko authored
* slice elimination
* add comment for simplify_concat
* fix concat_slice
* another reshape-related fix
* added a missing header
* disable reshape-concat optimization
* test fix
Adam Straw authored
* strip off attributes from backend type prior to shared lib dlopen
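
The idea is that a backend selector such as "GPU:0" may carry attributes after a colon, but only the bare backend type should be used to form the shared-library name passed to dlopen. A minimal sketch (the helper name is hypothetical):

```cpp
#include <iostream>
#include <string>

// Keep everything before the first ':'; with no ':', return the input intact.
std::string strip_attributes(const std::string& backend_selector)
{
    return backend_selector.substr(0, backend_selector.find(':'));
}

int main()
{
    std::cout << strip_attributes("GPU:0") << "\n";       // GPU
    std::cout << strip_attributes("INTERPRETER") << "\n"; // INTERPRETER
}
```
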
Fenglei authored
* change convolution to use cudnn emitter
* convolution working
* add asymmetric pad
* forward with asymmetric working
* backward asymmetric
* padding to padding_below
* pad still has bug on backward
* change name
* fix convolution back prop
* fix code block
* slice back from padded output
* working code
* extra ','
* Update gpu_emitter.cpp
* split build_convolution into 3 functions
* format and fix bugs
* Update cudnn_emitter.hpp
Chris Sullivan authored
* Added per argument alignment to GPUAllocator::reserve_argspace.
* Changed alignment in tests to match update to alignment in backend.
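
Per-argument alignment typically means rounding each argument's offset up to the next multiple of the requested alignment before reserving its space; a sketch of that arithmetic (not the actual GPUAllocator code, and the alignment value is assumed):

```cpp
#include <cassert>
#include <cstddef>

// Round offset up to the next multiple of alignment.
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0);
    return ((offset + alignment - 1) / alignment) * alignment;
}

int main()
{
    // Pack three arguments of 3, 17, and 64 bytes at 8-byte alignment.
    size_t offset = 0;
    for (size_t size : {3, 17, 64})
    {
        offset = align_up(offset, 8); // arguments start at 0, 8, and 32
        offset += size;
    }
    assert(offset == 96);
    return 0;
}
```
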
- 04 Jun, 2018 3 commits
Chris Sullivan authored
These tests should fail on all systems but they are somehow passing on the CI system.
Nick Korovaiko authored
Robert Kimball authored
* Update cmake files to more modern approach
* disable building libraries that are not required
* handle more build cases
* add versions to backend libs; add start of package target
* add create_backend to backends
* temporary workaround to tbb not linking correctly with gcc
* install codegen lib
* force tbb to link to the cpu backend so that it is available for codegen
* fix clang build error
* fix warning for codegen build
* update cuda header paths
* change error message for opening backend shared library
* set lib path
- 02 Jun, 2018 1 commit
Yixing Lao authored
- 01 Jun, 2018 1 commit
Fenglei authored
- 31 May, 2018 4 commits
DawnStone authored
Louis Feng authored
Nishant Patel authored
Robert Kimball authored
* update serializer for all new ops
- 30 May, 2018 4 commits
Fenglei authored
* add reduce op
* hack solution to get reduction function in reduce op
* hack version working on all tests
* add reduce_window op
* fixed the reduction checking process
* add reduce window op; save progress, not compilable yet
* change push_back to pre-allocation for vector
* fixed datatype vector
* datatype and comments
* pass op instead of using template
* using new GPUShape and allocator
* using GPUShape
* add comment, change map inside function
* change to more meaningful name
Nick Korovaiko authored
Nick Korovaiko authored
* refactor cpu workspace insertion for mxnet
* rename mxnet functions to adhere to the ngraph naming convention
* define a static const int member in a cpp file to resolve a linking issue
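
The last bullet refers to a classic C++ pitfall, sketched below with hypothetical names: a `static const int` initialized inside the class body is only a declaration, and if it is odr-used (for example, bound to the const references taken by std::min), the linker needs an out-of-class definition in exactly one .cpp file:

```cpp
#include <algorithm>

struct Pass
{
    static const int s_rank = 4; // in-class declaration with initializer
};

// Without this definition (normally placed in one .cpp file), odr-use of
// Pass::s_rank produces an undefined-symbol link error on pre-C++17 toolchains.
const int Pass::s_rank;

int main()
{
    // std::min takes const references, which odr-uses Pass::s_rank.
    return std::min(Pass::s_rank, 10) == 4 ? 0 : 1;
}
```
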
Nishant Patel authored
- 29 May, 2018 2 commits
Chris Sullivan authored
* Added GPUShape and reworked Shape helpers to be compatible with different shape types. Shape is now implicitly convertible to GPUShape.
* Updated shape helpers' signatures and added conversion operators/constructors for GPUShape.
* Adjusted row_major_strides to avoid a reversed copy.
* Moved declaration out of loop for clang.
* Moved gpu_shape to the gpu transformer.
* Removed no-longer-necessary headers.
* Added stdexcept header to gpu_shape.hpp.
* Changed the check on 64-bit shapes to check whether the high bits are set.
* Added spacing between functions in GPUShape and boolean operators in shape.hpp.
* Template parameters are UPPER_SNAKE_CASE.
* The return type of shape_size should be large enough to encapsulate the full stride of the tensor. This should be 64 bits wide regardless of the underlying value_type of the ShapeType.

[CS:GPU::Part 2] Add GPUMemoryManager, GPUAllocator, and memory primitives. (#1034)

This is a big PR which introduces the GPUMemoryManager, the GPUAllocator, and the concept of memory primitives. A memory primitive is a closure which yields the device memory address for a reserved memory space. When a memory reservation is made, the request is recorded along with the data that should be copied (for kernel arguments, but not for workspace memory). The reservation does not yield an address eagerly; instead it returns an index which can be used to look up the memory primitive at runtime. This allows the GPUMemoryManager to delay resolution of the memory address until all reservations have been made.

Ideally, the temporary allocations needed by each kernel would be captured by the liveness lists in the GPU_External_Function, so that the pass::MemoryManager would capture these allocations along with the needed tensor allocations. For now, rather than rearchitect the gpu_emitter and external function, we utilize the GPUMemoryManager, which maintains its own internal pass::MemoryManager, and the GPUAllocator. Liveness is handled by the GPUAllocator: all workspace allocations/reservations created in the same (or a sub-) scope as the GPUAllocator persist until the GPUAllocator goes out of scope and deconstructs. At that time, the GPUAllocator marks the requested temporary buffers as free and their liveness is (effectively) removed, so the next kernels that construct a GPUAllocator can reuse the workspace memory that was needed by previous kernels.

Additional notes:
* This PR updates the CUDAEmitter to exclusively utilize GPUShape instead of Shape.

Commits:
* Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations.
* Added GPUShape and reworked Shape helpers to be compatible with different shape types.
* Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides; these were updated as well to take advantage of GPUShape conversion operators.
* Coordinate -> GPUShape.
* Refactored replace_slice into CudaKernelBuilder. Simplified allocations using the new GPUAllocator and GPUMemoryManager.
* Refactored allocations to make use of the primitive emitter. Memory primitives are now registered at compile time, and the GPU memory address is resolved at runtime by invoking the primitive.
* Added const qualifier to data being copied in GPUAllocator::reserve_argspace.
* Removed more replace_slice diffs.
* Added unit tests for GPUMemoryManager and added checks that ensure the device memory is allocated prior to address resolution by the memory primitives. Also exposed the allocation size of the memory manager.
* Added an explicit function for queueing kernel argument data rather than inlining it in the reservation function, per @fengleitian's recommendation.

[CS:GPU::Part 3] Refactoring of several ops to use GPUMemoryManager (#1035)

This PR implements the new GPUMemoryManager and allocator for all the ops which were previously implemented but required allocations and copies for kernel arguments at runtime.

Limitations: the convolution workspaces could not be added because the relevant descriptors were not available at compile time due to the codegen. If convolution is later added to the CUDNN emitter, the GPUAllocator can be used to avoid workspace allocation at runtime.

Commits:
* Replaced runtime host-to-device memcpys with GPUAllocator reservations in order to move them to compile time.
* Removed the no-longer-necessary buffer freeing from op emitters.

[CS:GPU::Part 4] Added op::ReplaceSlice and enabled respective tests. (#999)

This PR implements ReplaceSlice using the coordinate transformation strategy. A thread is launched for each element of the input tensor, and its position in the source tensor's coordinate system is calculated. If it lies within the source slice, the source value is loaded and written out; otherwise the input tensor's value is loaded.

* Relevant tests are enabled.
* This op was refactored to utilize the new GPUAllocator and memory manager.

Commits:
* Updated replace_slice op to utilize GPUShape and GPUMemoryManager.
* Added back missing changes after timeline resolution.
* Fixed clang warnings and a bug: the cudnn_handle was not initialized ahead of emission time, so any eager cudnn calls would fail. To fix this, the cudnn and cublas handle creation was moved to the external function constructor.
* Changed row_major_strides to always return vector<size_t> to avoid overflow for tensors with many dimensions. The conversion to 32 bits for GPU shapes is handled by an explicit conversion constructor from vector<size_t>.
* During the merge, the allocation line from external_function was left out; added it back.
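
The memory-primitive mechanism described above lends itself to a compact sketch (names and layout are simplified stand-ins, not the actual GPUMemoryManager): reservations return indices immediately, and the closure stored at each index resolves to a concrete address only after the single backing allocation exists:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// A memory primitive: a closure that yields the address of a reserved space.
using memory_primitive = std::function<uint8_t*()>;

class MemoryManagerSketch
{
public:
    // Compile time: record the request, hand back an index, no address yet.
    size_t reserve(size_t size)
    {
        size_t offset = m_total;
        m_total += size;
        m_primitives.emplace_back([this, offset] { return m_base + offset; });
        return m_primitives.size() - 1;
    }

    // Once, after all reservations: make the single backing allocation.
    // (A host vector stands in for the device allocation here.)
    void allocate()
    {
        m_pool.resize(m_total);
        m_base = m_pool.data();
    }

    // Runtime: invoke the primitive to resolve an index to an address.
    uint8_t* resolve(size_t index) { return m_primitives[index](); }

private:
    size_t m_total = 0;
    uint8_t* m_base = nullptr;
    std::vector<uint8_t> m_pool;
    std::vector<memory_primitive> m_primitives;
};

int main()
{
    MemoryManagerSketch mm;
    size_t a = mm.reserve(256);
    size_t b = mm.reserve(1024);
    mm.allocate(); // addresses become valid only after this point
    return (mm.resolve(b) - mm.resolve(a)) == 256 ? 0 : 1;
}
```
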
L.S. Cook authored
* Add markdown version of CONTRIB guidelines to ngraph root
* Fix weird markdown issue with cpp code block rendering
- 26 May, 2018 2 commits
Nick Korovaiko authored
* serializer pass
Jayaram Bobba authored
* Bug fix to graph control logic to always compute output tensors
* Remove stale comments
- 25 May, 2018 3 commits
Fenglei authored
Chris Sullivan authored
* cuDNN softmax impl. for all axis activation.
* Added catch for per-axis activations.
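
One plausible reading of "all axis activation" (an assumption on my part, not confirmed by the commit) is cuDNN's instance mode, which normalizes each example over its entire non-batch extent, whereas per-axis reductions need a different path; a minimal usage sketch:

```cpp
#include <cuda_runtime.h>
#include <cudnn.h>
#include <vector>

int main()
{
    // Flatten each example's whole extent into the channel dimension so that
    // CUDNN_SOFTMAX_MODE_INSTANCE normalizes across all non-batch axes.
    const int n = 2, c = 8;
    std::vector<float> host(n * c, 1.0f);

    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnTensorDescriptor_t desc;
    cudnnCreateTensorDescriptor(&desc);
    cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, 1, 1);

    float *x = nullptr, *y = nullptr;
    cudaMalloc(&x, n * c * sizeof(float));
    cudaMalloc(&y, n * c * sizeof(float));
    cudaMemcpy(x, host.data(), n * c * sizeof(float), cudaMemcpyHostToDevice);

    const float alpha = 1.0f, beta = 0.0f;
    cudnnSoftmaxForward(handle, CUDNN_SOFTMAX_ACCURATE, CUDNN_SOFTMAX_MODE_INSTANCE,
                        &alpha, desc, x, &beta, desc, y); // each row sums to 1

    cudaFree(x);
    cudaFree(y);
    cudnnDestroyTensorDescriptor(desc);
    cudnnDestroy(handle);
    return 0;
}
```
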
Nick Korovaiko authored
* add any op