Commits · 8df8e5d6ae51cea5d56fba75a7efb695e8716329 · submodule / ngraph

22 Feb, 2019 1 commit
- use calls for new backend API in unit tests (#2427) · 26bba737
  Robert Kimball authored 5 years ago
```
* use calls for new backend API in unit tests

* fix compile error

* fix compile error
```
07 Jan, 2019 1 commit

Simplified all_close_f interface and tightened default criteria (#2285) · 0eaa960c

gcwenger authored 6 years ago

* Simplified & tightened all_close_f parameters

Removed specification of mantissa bits for all_close_f in favor
of just specifying tolerance bits. Tightened up all_close_f default.
Fixed LRN unit test which had insufficient result precision to pass
tighter all_close_f tolerance.

* Addressed PR comments.

Reworked mantissa bit and tolerance constants.
Clarified and improved graph comparison tolerance calculation flexibility.
Clarified unit test tolerance testing.
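
The commit message does not show the comparison itself, so the following is only a sketch of one common way to read "tolerance bits": two floats are treated as close when their bit patterns differ by at most 2^tolerance_bits units in the last place. The names `ordered_bits` and `close_f_sketch` are hypothetical, and this is not the actual all_close_f implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: map a float to an unsigned integer whose ordering
// matches the ordering of the floats themselves.
static uint32_t ordered_bits(float value)
{
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    // Negative floats: flip every bit. Non-negative floats: set the sign bit.
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Hypothetical comparison (an assumption, not ngraph's code): a and b are
// close if they are at most 2^tolerance_bits representable steps apart.
bool close_f_sketch(float a, float b, int tolerance_bits)
{
    assert(tolerance_bits >= 0 && tolerance_bits < 32);
    if (std::isnan(a) || std::isnan(b))
    {
        return false;
    }
    uint32_t ua = ordered_bits(a);
    uint32_t ub = ordered_bits(b);
    uint32_t distance = ua > ub ? ua - ub : ub - ua;
    return distance <= (uint32_t{1} << tolerance_bits);
}
```

Under this reading, a smaller tolerance_bits value gives a stricter test, which is consistent with the LRN unit test needing more precise expected results once the default was tightened.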


03 Jan, 2019 1 commit
- update licenses for 2019 (#2275) · ba299b93
  Robert Kimball authored 6 years ago
```
* update licenses for 2019

* style
```
19 Dec, 2018 1 commit

Make explicit compile call in unit tests (#2224) · 7693f74e

Robert Kimball authored 6 years ago

* make validate public

* move compile call outside of call for unit tests

* fix compile error

* one more error


08 Dec, 2018 1 commit

move GPU specific test to GPU only (#2191) · 40dda4eb

Robert Kimball authored 6 years ago

* move GPU specific test to GPU only

* fix unit test invocation

* fix compile error

* fix compile error

* style

* fix runtime error


07 Dec, 2018 1 commit

Backend API change pre-work (#2064) · e0933553

Robert Kimball authored 6 years ago

* change compile call to return Handle

* make CPU require compile() before call()

* fix unit tests to call compile() before call()

* fix failing ops

* update unit test

* revert some changes

* more fixups

* more diff cleanup

* a few more issues addressed

* more fixes

* update API

* more updates

* fix test_ops.py

* fix

* another attempt to fix

* fix unit test

* fix test error


16 Nov, 2018 1 commit

Move ParameterVector and ResultVector to the ngraph namespace (#2054) · 803c38aa

Robert Kimball authored 6 years ago

* Move ParameterVector and ResultVector to the ngraph namespace where they belong

* update python wrapper

* more python fixes

* style

* Update setup.py

* fix some new code


05 Nov, 2018 1 commit

TopK additional tests for nvGPU backend (#1946) · 37dc586c

Ayan Moitra authored 6 years ago

* added tests for malloc mode and graph transform

* Comment incorporation

* changed comparing backend to INTERPRETER

* Comments resolved + clang

* Addressed all comments

* IntelGPU does not support topk


16 Oct, 2018 1 commit
- reset buffer size, use original input size for memcpy (#1786) · 05aa1be8
  Fenglei authored 6 years ago
```
* reset buffer size, use original input size for memcpy

* resolve comment and add test

* update comment
```
12 Oct, 2018 1 commit
- Handle workspace reservation when workspace size is zero (#1797) · 3d4f98e2
  Ayan Moitra authored 6 years ago
```
* return nullptr when workspace size is zero + modify insert method to accept const lvalue ref

* Unit test added
```
29 Aug, 2018 1 commit

Change license header to use single-line comment (#1508) · a17ec605

Robert Kimball authored 6 years ago

* use line comments instead of multiline comments for license header

* update more

* update new files

* more header updates

* style


08 Aug, 2018 1 commit

Revert changes to gpu shape and update (#1354) · b8de3b7d

Chris Sullivan authored 6 years ago

* GPUShape(int32_t) -> NVShape(uint32_t), NVDiff(int32_t)

* Update code merged from master.

* Add nvshape.hpp and nvdiff.hpp.


03 Aug, 2018 1 commit

Preallocate intermediate buffers (#1231) · 0599a628

Chris Sullivan authored 6 years ago

* Utilize GPUMemoryManager/Allocator for preallocation of intermediate tensor buffer memory.

* Formatting.

* Merge with master required rework of memory due to CFE pass. Moved function memory pool allocation to pass as a result.

* Formatting.

* Added pass source files.

* Updated tests to account for new assert check. All GPUAllocators should be deconstructed before allocation is made in GPUMemoryManager.

* GPUAllocator::close() can be used to close the allocator prior to destruction

* Removed open allocators. Replaced check with inspection of pass::MemoryManager node list.

* Formatting.

* Rename m_memory_buffers -> m_tensor_memory_buffers. Use full path to static alignment variable.

* FunctionMemoryReservation -> TensorMemoryReservation. Only return true in pass if reservation is made (bug fix).

* Moved static compilation mutex.

* Update external function with new pass name.

* GPU_ExternalFunction: Add s_memory_pool_alignment, remove optimize_and_assemble method.


05 Jun, 2018 1 commit

Added per argument alignment to GPUAllocator::reserve_argspace. (#1069) · 6638e02b

Chris Sullivan authored 6 years ago

* Added per argument alignment to GPUAllocator::reserve_argspace.

* Changed alignment in tests to match update to alignment in backend.
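
The commit does not spell out how the per-argument alignment is applied. One plausible reading, sketched here with hypothetical names (`align_up`, `ToyArgspace`), is that each argument's offset inside the argument space is rounded up to that argument's own alignment before its bytes are placed. This is an assumption, not the actual GPUAllocator::reserve_argspace code.

```cpp
#include <cassert>
#include <cstddef>

// Round offset up to the next multiple of alignment.
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical stand-in for an argument space that grows as arguments
// are reserved; only offsets are tracked, no memory is allocated.
struct ToyArgspace
{
    size_t size = 0;

    // Reserve `bytes` for one argument, aligned to that argument's own
    // requirement, and return the offset where it will live.
    size_t reserve(size_t bytes, size_t alignment)
    {
        size_t offset = align_up(size, alignment);
        size = offset + bytes;
        return offset;
    }
};
```

With this layout, tests that check offsets or total size have to use the same alignment values as the backend, which would explain why the test alignment changed in the same commit.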


29 May, 2018 1 commit

[CS:GPU::Part 1] Add GPUShape type, conversion operators, and generalized shape helpers. (#1031) · d051f5fa

Chris Sullivan authored 6 years ago

* Added GPUShape and reworked Shape helpers to be
compatible with different shape types.
Shape is now implicitly convertible to GPUShape.

* Updated shape helpers signature and add conversion operators/constructors for GPUShape.

* Adjust row_major_strides to avoid reversed-copy.

* Moved declaration out of loop for clang.

* Moved gpu_shape to gpu transformer.

* Removed no longer necessary headers.

* Added stdexcept header to gpu_shape.hpp

* Changed check on 64bit shape to check if high bits are set.

* Added spacing between functions in GPUShape and boolean operators in shape.hpp.

* Template parameters are UPPER_SNAKE_CASE.

* Return type of shape_size should be large enough to encapsulate the full stride of the tensor.
This should be 64bits wide regardless of the underlying value_type of the ShapeType.

* [CS:GPU::Part 2] Add GPUMemoryManager, GPUAllocator, and memory primitives. (#1034)

This is a big PR which introduces the GPUMemoryManager, GPUAllocator, and the concept of memory primitives.

A memory primitive is a closure which yields the device memory address for a reserved memory space. When a memory reservation is made, the request is recorded along with the data that should be copied (for kernel arguments, but not for workspace memory). The reservation does not yield an address eagerly but instead does so lazily by returning an index which can be used to look up the memory_primitive at runtime. This allows the GPUMemoryManager to delay resolution of the memory address until all reservations have been made.
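
The lazy resolution described above can be sketched on the host with a plain byte vector standing in for device memory. `ToyMemoryManager` and all of its members are hypothetical names chosen for illustration; this is not the GPUMemoryManager interface, and the copying of kernel argument data is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the idea of memory primitives.
class ToyMemoryManager
{
public:
    ToyMemoryManager() = default;
    // The closures capture `this`, so copying would leave them dangling.
    ToyMemoryManager(const ToyMemoryManager&) = delete;
    ToyMemoryManager& operator=(const ToyMemoryManager&) = delete;

    // Record a reservation and return the index of a closure that will
    // yield the address later. Nothing is allocated here.
    size_t reserve(size_t size)
    {
        assert(size > 0);
        assert(!m_allocated); // all reservations must precede allocation
        size_t offset = m_total;
        m_total += size;
        m_primitives.push_back([this, offset]() -> void* {
            if (!m_allocated)
            {
                throw std::runtime_error("address requested before allocation");
            }
            return m_pool.data() + offset;
        });
        return m_primitives.size() - 1;
    }

    // One allocation that covers every reservation made so far.
    void allocate()
    {
        m_pool.resize(m_total);
        m_allocated = true;
    }

    // Look up a primitive by index and invoke it to get the address.
    void* resolve(size_t index) const { return m_primitives.at(index)(); }

    size_t allocation_size() const { return m_total; }

private:
    size_t m_total = 0;
    bool m_allocated = false;
    std::vector<char> m_pool;
    std::vector<std::function<void*()>> m_primitives;
};
```

Because callers hold only an index, the manager is free to decide the final layout after the last reservation, and resolving an address too early fails loudly instead of returning a stale pointer.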

Ideally, the temporary allocations needed by each kernel could be captured by the liveness lists in the GPU_External_Function. This way the pass::MemoryManager would capture these allocations along with the needed tensor allocations.

For now, rather than rearchitect the gpu_emitter and external function, we utilize the GPUMemoryManager, which maintains its own internal pass::MemoryManager, and the GPUAllocator. Liveness is handled by the GPUAllocator: all workspace allocations/reservations created in the same scope as the GPUAllocator (or a nested scope) persist until the GPUAllocator goes out of scope and is destroyed. At that point, the GPUAllocator marks the requested temporary buffers as free, which effectively ends their liveness. The next kernels that construct a GPUAllocator can then reuse the workspace memory that previous kernels needed.
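
The scope-based liveness can be illustrated with a much simpler model than the one the PR uses. In this sketch a bump pointer replaces the internal pass::MemoryManager, and `ToyWorkspacePool` and `ToyScopedAllocator` are hypothetical names; only the idea that leaving scope releases the workspace is taken from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical bump-pointer pool standing in for the workspace region.
struct ToyWorkspacePool
{
    size_t in_use = 0;     // bytes currently reserved
    size_t high_water = 0; // pool size needed to satisfy every kernel
};

// Reservations made through this object live exactly as long as it does.
class ToyScopedAllocator
{
public:
    explicit ToyScopedAllocator(ToyWorkspacePool& pool)
        : m_pool(pool)
        , m_mark(pool.in_use)
    {
    }
    ToyScopedAllocator(const ToyScopedAllocator&) = delete;
    ToyScopedAllocator& operator=(const ToyScopedAllocator&) = delete;

    // Returns the offset of the reserved workspace within the pool.
    size_t reserve_workspace(size_t size)
    {
        size_t offset = m_pool.in_use;
        m_pool.in_use += size;
        m_pool.high_water = std::max(m_pool.high_water, m_pool.in_use);
        return offset;
    }

    // Leaving scope marks this allocator's buffers as free again.
    ~ToyScopedAllocator()
    {
        assert(m_pool.in_use >= m_mark);
        m_pool.in_use = m_mark;
    }

private:
    ToyWorkspacePool& m_pool;
    size_t m_mark;
};
```

Two kernels emitted one after the other, each with its own allocator, end up sharing offset 0, so the pool only needs to be as large as the most demanding kernel.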

Additional notes:
* This PR updates the CUDAEmitter to exclusively utilize GPUShape instead of Shape.

Commits:
* Added GPUMemoryManager for aggregating memory allocations and copies into a single operation for kernel arguments, and a reusable memory space for workspace allocations.

* Added GPUShape and reworked Shape helpers to be
compatible with different shape types.

* Removed several unnecessary static_casts now that GPUShape is utilized. GPUTensorViewWrapper had a few functions returning std::vector<size_t> instead of Shape/Strides. These were updated as well to take advantage of GPUShape conversion operators.

* Coordinate->GPUShape

* Refactored replace_slice into CudaKernelBuilder. Simplified allocations using new GPUAllocator and GPUMemoryManager.

* Refactor allocations to make use of primitive emitter. Now memory primitives are registered at compile time and the gpu memory address is resolved at runtime by invoking the primitive.

* Added const qualifier to data being copied in GPUAllocator::reserve_argspace

* Removed more replace_slice diffs.

* Added unit tests for GPUMemoryManager and added checks that ensure the
device memory is allocated prior to address resolution by the memory_primitives.
Also exposed the allocation size of the memory manager.

* Added explicit function for queueing kernel argument data rather than inline in the reservation function per @fengleitian recommendation.

[CS:GPU::Part 3] Refactoring of several ops to use GPUMemoryManager (#1035)

This PR implements the new GPUMemoryManager and allocator for all the ops which were previously implemented but required allocations and copies for kernel arguments at runtime.

Limitations:
The convolution workspaces could not be added because the relevant descriptors were not available at compile time due to the codegen. If convolution is later added to the CUDNN emitter, the GPUAllocator can be used to avoid workspace allocation at runtime.

Commits:
* Replaced runtime host to device memcpys with GPUAllocator reservations in order to move them to compile time.

* Forgot to remove no longer necessary buffer freeing from op emitters.

[CS:GPU::Part4] Added op::ReplaceSlice and enabled respective tests. (#999)

This PR implements ReplaceSlice using the coordinate transformation strategy. One thread is assigned to each element of the input tensor, and its position in the source tensor coordinate system is calculated. If that position lies within the source slice, the source element is loaded and written out; otherwise the input tensor element is loaded.

* Relevant tests are enabled.

* This op was refactored to utilize the new GPUAllocator and memory manager.
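
The coordinate transformation can be shown with a CPU reference in which each loop iteration plays the role of one GPU thread. This is a sketch under assumptions: row-major layout, slice bounds given as lower/upper/stride per axis with upper greater than lower, and a hypothetical function name `replace_slice_reference`. It is not the emitted CUDA kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the ReplaceSlice coordinate transform.
std::vector<float> replace_slice_reference(const std::vector<float>& input,
                                           const std::vector<size_t>& input_shape,
                                           const std::vector<float>& source,
                                           const std::vector<size_t>& lower,
                                           const std::vector<size_t>& upper,
                                           const std::vector<size_t>& stride)
{
    const size_t rank = input_shape.size();
    assert(lower.size() == rank && upper.size() == rank && stride.size() == rank);

    std::vector<float> output(input.size());
    for (size_t tid = 0; tid < input.size(); ++tid)
    {
        // Peel coordinates off the flat index, innermost axis first, and
        // map each one into the source tensor's coordinate system.
        size_t remaining = tid;
        size_t source_index = 0;
        size_t source_stride = 1;
        bool in_slice = true;
        for (size_t axis = rank; axis-- > 0;)
        {
            size_t coord = remaining % input_shape[axis];
            remaining /= input_shape[axis];
            if (coord < lower[axis] || coord >= upper[axis] ||
                (coord - lower[axis]) % stride[axis] != 0)
            {
                in_slice = false;
                break;
            }
            // Number of source elements along this axis.
            size_t extent = (upper[axis] - lower[axis] + stride[axis] - 1) / stride[axis];
            source_index += ((coord - lower[axis]) / stride[axis]) * source_stride;
            source_stride *= extent;
        }
        // Inside the slice: take the source element. Outside: keep the input.
        output[tid] = in_slice ? source[source_index] : input[tid];
    }
    return output;
}
```

Every output element is written exactly once and no thread depends on another, which is what makes the one-thread-per-element mapping straightforward.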

Commits:

* Updated replace_slice op to utilize GPUShape and GPUMemoryManager.

* Added back missing changes after timeline resolution.

* Fixed clang warnings and bug. The cudnn_handle was not initialized ahead of emission time and so any eager cudnn calls would fail.
To fix this, the cudnn and cublas handle creation was moved to the external function constructor.

* Changed row_major_strides to always return vector<size_t> to avoid overflow for tensors with many dimensions. Handle the conversion to 32 bits for GPU shapes with an explicit conversion constructor from vector<size_t>.

* During merge the allocation line from external_function was left out. Adding it back.
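
The stride and narrowing changes listed above can be sketched as two small functions. The names `row_major_strides_sketch` and `to_gpu_shape_sketch` are hypothetical, the sketch assumes a 64-bit size_t, and the real helpers are templates over the shape type, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Strides are accumulated in size_t so the running product cannot wrap
// at 32 bits. Filling from the last axis backwards avoids a reversed copy.
std::vector<size_t> row_major_strides_sketch(const std::vector<size_t>& shape)
{
    std::vector<size_t> strides(shape.size());
    size_t running = 1;
    for (size_t axis = shape.size(); axis-- > 0;)
    {
        strides[axis] = running;
        running *= shape[axis];
    }
    return strides;
}

// Explicit narrowing for GPU use: refuse any value whose high 32 bits are set.
std::vector<uint32_t> to_gpu_shape_sketch(const std::vector<size_t>& values)
{
    std::vector<uint32_t> result;
    result.reserve(values.size());
    for (size_t value : values)
    {
        if ((static_cast<uint64_t>(value) >> 32) != 0)
        {
            throw std::runtime_error("value does not fit in 32 bits");
        }
        result.push_back(static_cast<uint32_t>(value));
    }
    return result;
}
```

Keeping the wide computation and the checked narrowing as separate steps means an oversized tensor is reported as an error at conversion time instead of silently producing truncated strides.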
