Commits · 561c81e2dc73c931739cc54640f2a1ad84d2488e · submodule / ngraph

24 Jan, 2018 3 commits

Drwebb/gpu backend dot op (#413) · 94d80ffa

Tristan Webb authored 7 years ago

* Drwebb/gpu backend dot op (#387)

* GPU Dot prod emitter switch statement

* cuBLAS dot kernel call

* Flesh out arg substitution into gpu dot kernel call

* Drwebb/gpu backend dot op (#392)

* Take in CodeWriter into gpu op emitters

* Introduce GPU function gen based on pass functions

* Additional gpu emitter stubs

* link cublas into unit test and ngraph

* Use static code gen methods for GPU, add new GPU op stubs

* use pass manager to declare functions / cublas Updates

* Prune down gpu_external_function wip

* Switch back to GPU tensor views in GPU backend

* Pass in cublas handle to GPU external function

* cuMalloc memory in gpu tensor view

* Use cuda runtime malloc and free for tensor view management

* change GPU tensor view init, and use GPU tensor view for GPU call frame

* include headers as system dirs

* GPU tensor printing utility function

* cublasSetPointerMode to device mode / Fix copyright notification lowercasing

* Passing GPU dot product test using cuBLAS

Clean up

* Changes from review

94d80ffa

Change convolution reference to work with f32 (#409) · 2b0a5489
Adam Procter authored 7 years ago

2b0a5489
Remove TupleType, ValueType (#411) · d87b0065
Scott Cyphers authored 7 years ago
```
* Remove TupleType, ValueType

* Fix compile error.
```
d87b0065

23 Jan, 2018 1 commit

convolution backprop (#404) · 72a2ce72

adstraw authored 7 years ago

* fix convolution reference script

* convolution backprop

* cleanup

* fix build warnings

* Missing include

* fix build warning part 2

* move numeric_compare to its own header
code review feedback

* fix build warnings 3

* fix build warnings 4

* clang-format

* cast to avoid implicit cast warning

72a2ce72
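Commit #404 adds convolution backprop but does not spell out the math. As a minimal sketch under simplifying assumptions (1-D, single channel, no padding, stride, or dilation, cross-correlation without kernel flip), the gradients of `y[o] = sum_j x[o + j] * w[j]` with respect to the data and the filter are shown below. The function names are hypothetical; this is not ngraph's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward (for reference): y[o] = sum_j x[o + j] * w[j], with
// y.size() == x.size() - w.size() + 1 ("valid" cross-correlation).

// dL/dx[i] = sum over all (o, j) with o + j == i of dy[o] * w[j]
std::vector<float> conv1d_backprop_data(std::size_t x_size,
                                        const std::vector<float>& w,
                                        const std::vector<float>& dy)
{
    assert(!w.empty() && dy.size() + w.size() - 1 == x_size);
    std::vector<float> dx(x_size, 0.0f);
    for (std::size_t o = 0; o < dy.size(); ++o)
        for (std::size_t j = 0; j < w.size(); ++j)
            dx[o + j] += dy[o] * w[j]; // scatter each output gradient back
    return dx;
}

// dL/dw[j] = sum_o dy[o] * x[o + j]
std::vector<float> conv1d_backprop_filter(const std::vector<float>& x,
                                          std::size_t w_size,
                                          const std::vector<float>& dy)
{
    assert(w_size > 0 && dy.size() + w_size - 1 == x.size());
    std::vector<float> dw(w_size, 0.0f);
    for (std::size_t o = 0; o < dy.size(); ++o)
        for (std::size_t j = 0; j < w_size; ++j)
            dw[j] += dy[o] * x[o + j];
    return dw;
}
```

Both gradients reuse the forward indexing `o + j`; only the summation variable changes.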

20 Jan, 2018 3 commits
- using namespace flatten (#400) · ad6b0f07
  Robert Kimball authored 7 years ago
```
* wip

* using namespace cleanup
```
  ad6b0f07
- fix/enable zero_sized unit tests for non-INTERPRETER backends (#401) · 379300b7
  Robert Kimball authored 7 years ago
  
  379300b7
- Move std::vector read/write from runtime::TensorView to unit test directory (#397) · bd01bf2c
  Robert Kimball authored 7 years ago
```
* wip

* wip

* remove get_vector from runtime::TensorView class as it was for unit test only

* cleanup

* move writing vector to runtime::TensorView to the unit test dir

* merge fix

* PR review change

* update from PR comment

* update changes file
```
  bd01bf2c
19 Jan, 2018 5 commits

Negative convolution padding (#396) · c5144d48
Adam Procter authored 7 years ago

c5144d48
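Commit #396 allows negative convolution padding. One reading, assumed here rather than taken from the ngraph source, is that a negative `pad_below` or `pad_above` crops elements from that edge of the input instead of adding zeros. A 1-D, single-channel sketch with a hypothetical `conv1d` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-D cross-correlation (no kernel flip) with signed padding.
// Positive padding adds implicit zeros; negative padding crops the input.
std::vector<float> conv1d(const std::vector<float>& x,
                          const std::vector<float>& w,
                          long pad_below,
                          long pad_above)
{
    assert(!w.empty());
    const long n = static_cast<long>(x.size());
    const long k = static_cast<long>(w.size());
    const long padded = n + pad_below + pad_above; // may be smaller than n
    std::vector<float> y;
    for (long o = 0; o + k <= padded; ++o)
    {
        float acc = 0.0f;
        for (long j = 0; j < k; ++j)
        {
            const long src = o + j - pad_below; // position in the unpadded input
            if (src >= 0 && src < n)            // outside the input: implicit zero
                acc += x[static_cast<std::size_t>(src)] * w[static_cast<std::size_t>(j)];
        }
        y.push_back(acc);
    }
    return y;
}
```

With `pad_below = 1, pad_above = -1`, one zero is added on the left while the last input element is dropped, so padding and cropping can be mixed on the two edges.
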
Generalized constant-padding op (#383) · 68ef3faa
Adam Procter authored 7 years ago

68ef3faa
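Commit #383 generalizes constant padding. As an illustration only, assuming semantics in the style of XLA's Pad (padding below, above, and between elements, all filled with one constant value), a 1-D version could look like this; `pad1d` and its parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Constant padding in 1-D: `below` copies of `value` before the data,
// `interior` copies between each pair of neighbors, `above` copies after.
std::vector<float> pad1d(const std::vector<float>& in,
                         float value,
                         std::size_t below,
                         std::size_t above,
                         std::size_t interior)
{
    std::vector<float> out(below, value);
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (i > 0)
            out.insert(out.end(), interior, value); // gap between neighbors
        out.push_back(in[i]);
    }
    out.insert(out.end(), above, value);
    return out;
}
```

Interior padding only applies between existing elements, so an empty input yields just `below + above` fill values.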

Add flag to enable memory sanitizer (#393) · 0f836183

Robert Kimball authored 7 years ago

* cleanup in-memory header files

* add switch to enable memory sanitizer (works like valgrind)

* removed header file cleanup as it was causing a segfault on program termination

0f836183

Drwebb/gpu doc (#386) · 408f3b25

Tristan Webb authored 7 years ago

* Add mention of blob ref of original file from caffe2

* Mention location of source listing originally from LLVM project

408f3b25

Forward prop for average pooling (#380) · 0931b83b

Adam Procter authored 7 years ago

* Average pool type checking and kernel; type checking tests

* Fix and enable average-pool tests

* Docstring fix

* Extend AvgPool op type checking to support padding

* Untested code for padded avg-pool

* Unit tests for padded avg-pool

* Add CPU implementation

* Temp delete

* Docstring fix

* Docstring fix

* Add tests mixing padding and stride

* Temporary cut to ease merge

* Restore temporary cut for merge

* Empty commit to try to force CI to wake up

0931b83b
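Commit #380 adds forward prop for average pooling with padding and stride. A 1-D sketch of the idea, with a hypothetical `avg_pool_1d`; note the assumption that padded positions are excluded from the divisor (only real input elements in each window are averaged), which is one of several conventions and not confirmed by the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-D average pooling over a conceptually zero-padded input.
// Assumption: the divisor counts only in-bounds elements of each window.
std::vector<float> avg_pool_1d(const std::vector<float>& in,
                               std::size_t window,
                               std::size_t stride,
                               std::size_t pad_below,
                               std::size_t pad_above)
{
    assert(window > 0 && stride > 0);
    const std::size_t padded = in.size() + pad_below + pad_above;
    std::vector<float> out;
    for (std::size_t start = 0; start + window <= padded; start += stride)
    {
        float sum = 0.0f;
        std::size_t count = 0;
        for (std::size_t k = start; k < start + window; ++k)
        {
            // k is a coordinate in the padded space; map it back to the input.
            if (k >= pad_below && k - pad_below < in.size())
            {
                sum += in[k - pad_below];
                ++count;
            }
        }
        out.push_back(count ? sum / static_cast<float>(count) : 0.0f);
    }
    return out;
}
```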

18 Jan, 2018 3 commits
- bprop for MaxPool (#391) · 9264bc16
  Nick Korovaiko authored 7 years ago
  
  9264bc16
- zero-sized tensor tests with multiple data types (#378) · d43a0557
  Matthew Brookhart authored 7 years ago
  
  d43a0557
- Yixing/empty tuple (#390) · 9d0d7a7c
  Robert Kimball authored 7 years ago
```
* add test for empty tuple

* fix null function breaking
```
  9d0d7a7c
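Commit #391 above adds backprop for MaxPool. The usual rule, sketched here in 1-D without padding (an illustration, not ngraph's kernel; `max_pool_1d_backprop` is a hypothetical name), is that each output gradient flows only to the input position that produced its window's maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Route delta[o] to the argmax of window o (first maximum on ties).
// Overlapping windows that share an argmax accumulate their gradients.
std::vector<float> max_pool_1d_backprop(const std::vector<float>& in,
                                        const std::vector<float>& delta,
                                        std::size_t window,
                                        std::size_t stride)
{
    assert(window > 0 && stride > 0);
    std::vector<float> grad(in.size(), 0.0f);
    std::size_t o = 0;
    for (std::size_t start = 0; start + window <= in.size(); start += stride, ++o)
    {
        assert(o < delta.size());
        std::size_t argmax = start;
        for (std::size_t k = start + 1; k < start + window; ++k)
            if (in[k] > in[argmax])
                argmax = k;
        grad[argmax] += delta[o];
    }
    return grad;
}
```
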
17 Jan, 2018 3 commits

Add mxnet seq2seq serialized model for benchmarking (#385) · 5ad1de22
Robert Kimball authored 7 years ago
```
* add mxnet seq2seq forward and backward

* add benchmarks for seq2seq forward and backward
```
5ad1de22
Numerically stable sum so we can pass mxnet unit tests (#381) · b6c98de1
Matthew Brookhart authored 7 years ago
```
* Numerically stable sum so we can pass mxnet unit tests

* Add a small initial residual
```
b6c98de1
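Commit #381 does not say which algorithm it uses. One standard way to make a floating-point sum numerically stable is Kahan compensated summation; the sketch below illustrates that technique under this assumption and is not ngraph's Sum kernel (`kahan_sum` and `naive_sum` are hypothetical names).

```cpp
#include <cassert>
#include <vector>

// Kahan summation: a separate compensation term carries the low-order
// bits that each floating-point addition would otherwise discard.
float kahan_sum(const std::vector<float>& values)
{
    float sum = 0.0f;
    float compensation = 0.0f; // running error estimate
    for (float v : values)
    {
        float y = v - compensation;   // apply the correction to the next term
        float t = sum + y;            // low-order bits of y may be lost here
        compensation = (t - sum) - y; // recover what was lost (negated)
        sum = t;
    }
    return sum;
}

// Plain left-to-right accumulation, for comparison.
float naive_sum(const std::vector<float>& values)
{
    float sum = 0.0f;
    for (float v : values)
        sum += v;
    return sum;
}
```

Adding many values smaller than half an ulp of the running total shows the difference: the naive loop never moves off the first value, while the compensated loop accumulates them. The trick depends on strict IEEE evaluation, so it breaks under flags such as `-ffast-math` that allow reassociation.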

Drwebb/gpu external function (#367) · c5549682

Tristan Webb authored 7 years ago

* Initial GPU_ExternalFunction implementation

Other changes:

Add GPU runtime to same cmake block as GPU, include CUDA headers if GPU enabled

Initial passing (a+b)*c test

Properly link cuda libraries

Simple GPUTensorView implementation

Initial GPU emitter

GPU codegen initial function gen, no kernels yet

Rename GPU emitter and tensor_view_wrapper to match naming convention

* GPU external function based on BASE

* Fix stray base -> gpu

* TensorViewWrapper -> GPU_TensorViewWrapper

* Copy over emitter from base transformer

* Fix for naming dense layout

* Copy kernel emitters from base -> gpu and strip out kernel_utils

* Add aliases to GPU_TensorViewWrappers

* More fixes for naming descriptor::TensorViews

* Move in call_frame implementation from base -> gpu

* apply code format

* GPU codegen running A+B*C

gpu emitters
gpu ctx setup cuda_module kernels
Remove GPU_CF perf counters
Use gpu kernels in external function
Add GPU 1d dot test

Review Changes:
* Remove CPU specific kernel emitting method bodies

* Use copy_data from test/util.cpp, uncomment compileTest

* Use test_utils copy_data function

* Grab function name from pass manager for def, clean up indentation

c5549682

16 Jan, 2018 1 commit
- Implement select-and-scatter (#364) · 29231e11
  Adam Procter authored 7 years ago
  
  29231e11
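Commit #364 implements select-and-scatter, the XLA-style operation often used to express max-pool backprop. A 1-D sketch without padding, loosely following XLA's description: here `select(current, candidate)` returns true to keep the currently selected element (this argument convention is an assumption), and the chosen position receives `scatter(out, source)`. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

std::vector<float> select_and_scatter_1d(
    const std::vector<float>& operand,
    const std::vector<float>& source, // one value per window
    float init,                       // initial value of every output element
    std::size_t window,
    std::size_t stride,
    const std::function<bool(float, float)>& select,
    const std::function<float(float, float)>& scatter)
{
    assert(window > 0 && stride > 0);
    std::vector<float> out(operand.size(), init);
    std::size_t s = 0;
    for (std::size_t start = 0; start + window <= operand.size(); start += stride, ++s)
    {
        assert(s < source.size());
        // Select step: walk the window, keeping or replacing the current pick.
        std::size_t chosen = start;
        for (std::size_t k = start + 1; k < start + window; ++k)
            if (!select(operand[chosen], operand[k]))
                chosen = k;
        // Scatter step: combine the window's source value into the output.
        out[chosen] = scatter(out[chosen], source[s]);
    }
    return out;
}
```

With `select = std::greater_equal<float>()`, `scatter = std::plus<float>()`, and `init = 0`, this reproduces max-pool backprop.
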
12 Jan, 2018 1 commit
- Image batch dilation for convolution (#363) · c682fbf4
  Adam Procter authored 7 years ago
```
Sub-PR: image dilation tests (#362) via @adstraw 
```
  c682fbf4
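Commit #363 adds image batch dilation for convolution. Reading this (as an assumption) as dilation of the input data, `d - 1` zeros are inserted between neighboring input elements before the convolution window slides over them. A hypothetical 1-D helper:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Insert (dilation - 1) zeros between neighboring elements.
// dilation == 1 leaves the input unchanged.
std::vector<float> dilate1d(const std::vector<float>& in, std::size_t dilation)
{
    assert(dilation >= 1);
    if (in.empty())
        return {};
    std::vector<float> out((in.size() - 1) * dilation + 1, 0.0f);
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i * dilation] = in[i]; // originals land on a stride-`dilation` grid
    return out;
}
```

Convolving an ordinary kernel over `dilate1d(x, d)` then gives the data-dilated convolution.
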
11 Jan, 2018 2 commits
- add interpreter nan check option (#368) · 74850150
  Robert Kimball authored 7 years ago
```
* add interpreter nan check option

* add unit test
```
  74850150
- Better error message from runtime::Manager. · a2d97200
  Christian Convey authored 7 years ago
  
  a2d97200
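Commit #368 above adds a NaN-check option to the interpreter. The idea, sketched with a hypothetical `check_no_nan` helper rather than the interpreter's actual code, is to scan each op's output after it executes and fail fast, naming the op that first produced a NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Throw if any element of an op's output is NaN, reporting op and index.
void check_no_nan(const std::string& op_name, const std::vector<float>& output)
{
    for (std::size_t i = 0; i < output.size(); ++i)
    {
        if (std::isnan(output[i]))
        {
            throw std::runtime_error(op_name + " produced NaN at index " +
                                     std::to_string(i));
        }
    }
}
```

Checking after every op, instead of only at the function's outputs, pinpoints where a NaN first appears before it propagates.
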
10 Jan, 2018 3 commits

Pattern matching for sum (#293) · 4345e39d

Nick Korovaiko authored 7 years ago

* the first stab at pattern for sum

test refactoring, debug msg clean up, formatting fixes

removing v1 and cleaning up v2 + formatting

rollback the changes in reduce_ops

rename v2 -> sum_pred

remove unused funcs

switch to new c-tors

remove TensorViewType

removing an assert

fix a docstring to match a c-tor

* fixes after rebase

4345e39d

Implement reduce-window in interpreter and CPU (#359) · c5ffe8e9
Adam Procter authored 7 years ago

c5ffe8e9
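Commit #359 implements reduce-window. In 1-D and without padding, the operation folds a binary reduction over each window of the input, starting from an initial value. The sketch below (hypothetical `reduce_window_1d`, not ngraph's kernel) works with any reduction, such as sum or max.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

std::vector<float> reduce_window_1d(const std::vector<float>& in,
                                    float init,
                                    std::size_t window,
                                    std::size_t stride,
                                    const std::function<float(float, float)>& reduce)
{
    assert(window > 0 && stride > 0);
    std::vector<float> out;
    for (std::size_t start = 0; start + window <= in.size(); start += stride)
    {
        float acc = init; // fold the window left to right
        for (std::size_t k = start; k < start + window; ++k)
            acc = reduce(acc, in[k]);
        out.push_back(acc);
    }
    return out;
}
```

Max pooling is the special case where `reduce` is max and `init` is the lowest representable value.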

Switch from Eigen to OpenMP for loops for DS2 kernels (#345) · 7df687c1

Matthew Brookhart authored 7 years ago

* speed up reduceslice with kernel emitter

* const-ify and fix a clang warning

* add elementwise ops, slice to for loops

* add broadcast codegen

* add Exp

* fix bugs introduced in eigen kernels

* fix another introduced bug in Eigen

* Fix an Atomic Bug with Sum, do some cleanup

* unit tests pass

* Add Reshape Op, passes Tests

* rewrite sum to correctly handle multi-threading

* Code Cleanup

* add some extra unary ops

* Address review comments

* fix an error in the review comment refactor

* Add Power op

* Add (most) of the Logic Ops

* Make Concat default to OpenMP kernel

* fix n-D reshape issue

7df687c1

09 Jan, 2018 2 commits

Remove an optimization for caching a list of ordered ops (#360) · 7e89f1bb

Nick Korovaiko authored 7 years ago

* remove caching of ordered_ops

* graph_util logging msgs

* small cleanup

* remove files for the TopologicalSort pass

* remove NGRAPH_DEBUG from graph_util.hpp

7e89f1bb
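Commit #360 removes the cached list of ordered ops, so the order is recomputed when needed. Computing that order is a topological sort. A minimal sketch using Kahn's algorithm over integer node ids, a hypothetical stand-in for ngraph's nodes and not the code in graph_util:

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// users[i] lists the nodes that consume node i's output.
// Returns an order in which every node appears after all of its inputs.
std::vector<std::size_t> topological_sort(const std::vector<std::vector<std::size_t>>& users)
{
    std::vector<std::size_t> in_degree(users.size(), 0);
    for (const auto& outs : users)
        for (std::size_t u : outs)
            ++in_degree[u];

    std::queue<std::size_t> ready; // nodes whose inputs are all scheduled
    for (std::size_t i = 0; i < users.size(); ++i)
        if (in_degree[i] == 0)
            ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty())
    {
        std::size_t n = ready.front();
        ready.pop();
        order.push_back(n);
        for (std::size_t u : users[n])
            if (--in_degree[u] == 0)
                ready.push(u);
    }
    if (order.size() != users.size())
        throw std::runtime_error("graph has a cycle");
    return order;
}
```

For `(a + b) * c` with ids a=0, b=1, c=2, add=3, mul=4, the parameters come first, then the add, then the multiply.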

Optimizations to reduce compile time (#357) · 7f3dc2d7

Robert Kimball authored 7 years ago

* much faster compile time
* Remove all variables and just directly access inputs, output, and temps.
* compare layouts when checking if two ops are equal
* make performance counters available to all backends

7f3dc2d7

08 Jan, 2018 1 commit
- Definitions of XLA ConvNet MNIST ops (#324) · 524d04fc
  Adam Procter authored 7 years ago
  
  524d04fc
05 Jan, 2018 3 commits

Zero padding for convolution (#352) · 8c4ae5ea
Adam Procter authored 7 years ago

8c4ae5ea
Remove descriptor::Value and runtime::Value (#355) · 06f9efd9
Robert Kimball authored 7 years ago
```
* general cleanup

* remove runtime::Value

* more cleanup

* more cleanup
```
06f9efd9

Drwebb/gpu runtime boilerplate (#314) · feab44b5

Tristan Webb authored 7 years ago

* Simple boilerplate for GPU runtime files

  - GPUBackend
  - GPU ExternalFunction
  - GPUManager
  - GPUCallFrame

* Test for constructing all GPU runtime classes

* Comment out calls, constructors haven't been defined

* Clang CUDA source example to later test compiling

Clang cuda example from:
https://gist.github.com/anonymous/855e277884eb6b388cd2f00d956c2fd4

* Initial nvptx compiler copied from CPU compiler sources

* Define FunctionMap and Instruction for gpu external function

* Rename Compiler -> NVPTXCompiler for gpu compile. Add call to compile for test

* Rename StaticCompiler -> NVPTXStaticCompiler for GPU code gen

* Add nvptx_compiler and nvptx_execution_engine to gpu sources

* Compiling source unit test using hardcoded PTX

* (a+b)*c test for GPU

* WIP Fix compile

* Remove accidentally included file

* Fix compile, and LLVM link errors from nvptx_compiler.cpp

* Stub out parts needed for GPU manager

* Test GPU runtime method stubs

* Cleanup

* Add GPU runtime to same cmake block as GPU, include CUDA headers if GPU enabled

* Kill reflexive assertion

* change GPU naming convention to match CPU

* Snake case functions and identifiers in test case

* Change element type to match changes in master

* Make CUDA headers accessible for codegen with GPU transformer

* clang-format

* apply-code-format

feab44b5

30 Dec, 2017 2 commits

Forward prop for max pooling (#305) · d901282e

Adam Procter authored 7 years ago

* Definition and type checking for max pool

* Implement kernel, integrate into INTERPRETER, add a few unit tests, make function result type mismatch error message more informative (still need to update tests to reflect that)

* Temporarily delete unit tests to ease merge

* Temporarily delete unit tests to ease merge

* Restore deleted unit tests

* Fix a broken error message check in the unit tests

* Update to handle various TensorViewType-related things going away; add NGVM support

* Add codegen case

* Change various get_blah_shape methods to return const refs, and while we're here, make a similar change where it should have been done in convolution

* Use NDArray for max-pool tests

d901282e
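Commit #305 adds forward prop for max pooling. Without the batch, channel, and multi-dimensional bookkeeping, the kernel takes the maximum over each window. A 1-D sketch with a hypothetical `max_pool_1d`, not the INTERPRETER or codegen kernel:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Each output is the maximum of `window` consecutive inputs;
// windows start `stride` elements apart, and no padding is applied.
std::vector<float> max_pool_1d(const std::vector<float>& in,
                               std::size_t window,
                               std::size_t stride)
{
    assert(window > 0 && stride > 0);
    std::vector<float> out;
    for (std::size_t start = 0; start + window <= in.size(); start += stride)
    {
        auto first = in.begin() + static_cast<std::ptrdiff_t>(start);
        out.push_back(*std::max_element(first, first + static_cast<std::ptrdiff_t>(window)));
    }
    return out;
}
```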

recreate ops (#325) · 66d06693

varun-intel authored 7 years ago

* recreate ops

* style

* recompute ops

* style

* fix

* recreate ops

* style

* recompute ops

* style

* fix

* some

* more

* style

* remove a line

* const

* style

* NodeMap was using non-standard operator[] behavior.

* Missing include

66d06693

29 Dec, 2017 1 commit

Get value types out of public API, multi-values from Function (#340) · d092cb91

Scott Cyphers authored 7 years ago

* Function can have multiple results
Remove external use of ValueType, TupleType, Tuple
Remove many external uses of Output and Input

* corresponding CPU backend changes

* Update master changes.

* Remove type arg from Function, add changes.md

* Merge changes.

* Move bodies to .cpp, add brief doc

* Merge CPU changes.

* Remove xla includes from non-xla files

* Remove xla from tests

* First part of xla tuple support

* change fprop_cache to assume multi-output bprop functions

* New wrappers for handling tuples with XLA

* Review comments

* remove old xla files

* fix merge errors

* hand edit models to use multi output instead of tuples

d092cb91

28 Dec, 2017 4 commits

support build from ngraph repo with argon as external · 1c5abc19
Yixing Lao authored 7 years ago

1c5abc19
Add bigger models to performance benchmarks (#342) · 2d2fc8c2
Robert Kimball authored 7 years ago
```
* add larger test models
```
2d2fc8c2

Build and execute TBB flow graphs in the CPU backend (#304) · c2c33748

Jai Menon authored 7 years ago

* CMake: TBB integration placeholder

* CMake: Integrate TBB

* CMake: Indent

* CMake: Rewrite TBB integration

* CMake: More TBB integration changes

* CMake: Install TBB headers and DSOs

* CMake: Don't install the TBB debug DSO

* CMake: Propagate ngraph's configured compiler setting over to MKL-DNN

* CMake: Restore TBB debug DSO installation

* CMake: Add installed headers to search path.
This needs to be cleaned up along with other header search cleanup

* CPU: Build and execute TBB flowgraphs

* CPU: TBB fixes

* CPU: More TBB fixes

* CPU: Allow both TBB and serial codegen for now

* TBB: get_arguments -> get_input_ops

* CPU: Use node methods

* CPU: Add TBB headers in the build directory to the search path

* TBB: Incorporate various changes from master

* CMake: Indentation fix

* CMake: Indentation fix

* CMake: TBB is mandatory so remove additional predicates

* TBB: Add a test

* CMake: Fix linker flags with GCC

c2c33748

Fprop Cache Util Function (#312) · bc63f7bb

Matthew Brookhart authored 7 years ago

* in progress

* working cache_fprop, no tests

* style fix

* all inputs to bprop (except adjoints) are cached from fprop

* fix typos, make sure to check count == 0

* fix code format

bc63f7bb

27 Dec, 2017 2 commits
- Bob/benchmark cleanup (#338) · 8f3da6b8
  Robert Kimball authored 7 years ago
```
* cleanup

* cleanup

* expand

* wip

* undo
```
  8f3da6b8
- Bob/nan (#335) · 2466bacd
  Robert Kimball authored 7 years ago
```
* nan unit test

* fix NAN issue

* add INFINITY support
```
  2466bacd