- 13 Apr, 2018 1 commit
-
-
Chris Sullivan authored
* Begin prototype of cudnn_emitter.
* Added GPURuntimeContext to gpu_external_function for passing through to JIT functions.
* gpu_emitters now utilize gpu runtime context.
* Moved cublas and cudnn handles into GPURuntimeContext pointer and out of callframe EntryPoint.
* Added CUDNNEmitter, comparable to MKLDNNEmitter, which allows for cudnn kernels to be defined via lambda primitives that are emitted and subsequently called during graph execution. An example implementation is provided for op::Sum.
* Added GPURuntimeContext to gpu_external_function for passing through to JIT functions.
* gpu_emitters now utilize gpu runtime context.
* Moved cublas and cudnn handles into GPURuntimeContext pointer and out of callframe EntryPoint.
* GPURuntimeContext should be stored as unique_ptr in external function.
* Extract raw pointer from unique for cudnn_emitter.
* Removing unrelated code from PR.
* GPURuntimeContext needs to be a strict C interface in case the native compiler and clang are utilizing different glibc ABIs. Updated to reflect this.
* Added cudnn::primitive typedef for better readability.
* Moved allocation of CudaFunctionPool to external function so that it is available during gpu emission.
* Fixed too-late initialization of cudart.
* CUDNNEmitter moved into superset class GPUPrimitiveEmitter. The GPUPrimitiveEmitter handles the emission of all gpu primitives, including cudnn, cuda, and cublas. CUBLASEmitter support not yet included.
* Added unordered_map for caching primitives in the gpu_emitter.
* Added dtor to GPUPrimitiveEmitter to clean up compiled functions.
* Adding back a serialized model graph that was accidentally rem
* Added a few additional helpers to use ngraph::row_major_strides.
* added whitespace per @fengleitian's comment
* Remove implicit type conversions from size_t to int.
* Add op::MaxPool, op::MaxPoolBackprop and op::Pad to GPU transformer (#817)
* Added pooling for 1 and 2 dimensions. 1d uses a cuda kernel and 2d utilizes cudnn. Padding is not yet supported.
* Normalized call signature on gpu emission for 1d max pool. Added a few comments.
* Max pool backprop impl. in progress. Amend this commit.
* Max pool backprop implemented. Note that cuDNN requests the output tensor for the maxpool operation but it is not required for computation.
* Formatting and invocation for maxpool changed.
* Fixed too-late initialization of cudart.
* Added padding kernel that is used with maxpool. Need to investigate remaining tests.
* Changed dimensionality check to correctly determine if data is 1d or not.
* Added 3d MaxPooling (forward), verified by forcing 2d case to use Nd pooling routines.
* Added 3d MaxPooling (backward), verified by forcing 2d case to use Nd pooling routines.
* Moved cudnn prologues for maxpool into ngraph runtime and out of primitive so that the only execution occurring on the JIT runtime is the evaluation of the op kernel.
* Refactored forward and backward pooling into single CUDNNEmitter::build_pooling interface with a runtime switch to determine if the op is forward or backward propagation.
* Cache preconstructed cudnn kernel for maxpool if it has already been constructed.
* Forgot to add padding arrays back into cudnn kernel for MaxPool in the 2d case.
* Fixed namespace issues and use join(...,'_')
* Refactored 4d/Nd tensor descriptor builder into single function.
* Changed conditionals and comments. Now throws if MaxPool on more than 3 spatial dimensions is requested.
* Fixed forward declare for GPURuntimeContext (class -> struct).
* Clang complains about missing braces on brace-initializer. Fixed implicit conversions.
* Fixed implicit conversions (clang).
* Reverting changes on autodiff test for maxpool. @Krovatkin will update later.
-
- 12 Apr, 2018 6 commits
-
-
Jaikrishnan Menon authored
-
Fenglei authored
* add slice op, first version
* change size to output size
* fix bugs
* working version
* using existing function for join and strides
* clang format
* revert accidental change
-
Nick Korovaiko authored
* add a getter for root node
* recurrent graph rewrite
* fix perms, rename match_root -> get_match_root
* fix comp errors
* make match_root return the topmost match; fix tests
-
Fenglei authored
* add convolution in progress
* enable 1 test
* convolution in progress
* use filter descriptor
* filter descriptor bug fix
* tensor format
* add missed dimension calculator
* forward convolution 4d without dilation and padding working
* data dilation (deconvolution) and enable some tests
* add backprop convolution data and filter
* backprop can compile
* pass unit test, but still have problem on padding
* 2d, symmetric padding, no data dilation works now
* clean up code
* extend gpu convolution to nd
* fix some bugs
* working version for up to 3d convolution, code format.
* remove unnecessary changes
* add restriction for data dilation and asymmetric padding
* clang format
* support up to 3D convolution for now
* change comments to not implemented
* add query for additional GPU workspace for convolution
* clang format
* code format
* using row_major_strides
* using join
* fix bug for join
* refactor dimension calculation
-
tsocha authored
* Enable BatchNorm op
* Enable function call op
* Enable get output element op
-
Jaikrishnan Menon authored
-
- 10 Apr, 2018 6 commits
-
-
Yixing Lao authored
* new backend API in graph partition * update API
-
Matthew Brookhart authored
-
Nick Korovaiko authored
* zero dimension tensor elimination init
* more ops + refactor + tests
* revert pattern.cpp
* add internal zero-length test
* address Scott's feedback
* fix comp errors
* proper static init
* get rid of unique-ptr
* refactor hashmap into virtual get_default_values on op classes
* fix formatting
-
Robert Kimball authored
* back out api change
-
Sang Ik Lee authored
* Remove the no longer supported alternative installation method for python binding.
* Put back CMakeLists.txt as it is used by travis ci dockerfile.
* Remove python/CMakeLists.txt and update Travis CI
-
Jaikrishnan Menon authored
* CPU: Optimize 4D "nGraph" Reshapes (shuffle+reshape)
* CPU: Add kernel sources
* CPU: Replace 2D with 3D reshape
* CPU: Fixes
* CPU: Simplify
-
- 09 Apr, 2018 10 commits
-
-
Robert Kimball authored
* remove parameter check from Function::get_ops() * create validate pass to hold parameter validation
-
raramer01 authored
* unskipping passing gpu tests
* skipping failing gpu tests
* import pytest as needed
* fix style issues
* unskip passing test
* add additional skip reason, unable to compile
-
Nick Korovaiko authored
* repacking recurrent matching as a standalone class
* RecurrentMatcher
* add a getter for root node
* address Scott's feedback
-
L.S. Cook authored
* WIP editing so far for review and feedback
* Add missing env var export for neon install new process
* Add modified venv setup for TF
* More edits for FW integration and landpage
* Revise from PR feedback
* More PR feedback and editing for clarity
* Minor rewording, clearer explanation
* Final pass edit
* more editing
-
DawnStone authored
* adding support for GPU backend to contrib/docker; added gpu dockerfiles; renamed Dockerfile for centos74; fixed NGRAPH_GPU_ENABLE cmake flag name
* Check for GPU support on the host system and fall back to CPU if not present
* removed double option for PREBUILT_LLVM
* updated README.md with additional references for GPU support
* added clarifying comments; cleaned up duplicate settings
* removed deprecated targets from the contrib/docker/Makefile
* resolved absolute vs. conditional assignment for variables based on reference OS
* removed example using a custom DOCKERFILE from README file
-
Robert Kimball authored
* force backend compile() to make a copy of the graph; fix copy_with_new_args on ops that have function pointers internal; update unit test for new backend API; add unit test for multiple simultaneous backends
* move get_subdevices virtual method to Manager class
* update GPU to latest
* update call methods
* add remove_compiled_function()
-
Michał Karzyński authored
[Py] Change Python version for tox
-
Tomasz Socha authored
-
Robert Kimball authored
-
Jaikrishnan Menon authored
* CPU: Fuse zero-padded convolution backprop filters * CPU: Add a testcase for zero-padded convolution backprop filters fusion
-
- 08 Apr, 2018 1 commit
-
-
Matthew Brookhart authored
-
- 06 Apr, 2018 6 commits
-
-
Nick Korovaiko authored
* initial support for recurring matching
* fix a bug where patterns weren't populated w/ matched nodes; add recurrent tests
* add a missing newline
* address feedback
* fix function comment
-
arogowie-intel authored
* Add/update Python wrappers for nGraph operations.
  - NotEqual, OneHot, Power, Sqrt, Relu, Sign, Sin, Sinh, Tan, Subtract, Select, Tanh, Sum, Reduce, Softmax, ReplaceSlice, Reverse
  - Add UT for Relu, Sign, Sin, Sinh, Sqrt, Tan, Tanh.
* Add UT for cases when Cos and Sin are giving incorrect results.
* Alphabetically sorted imports.
* Small refactoring.
  - Update docstrings
  - Remove unnecessary auxiliary local variable.
-
tsocha authored
-
Nick Korovaiko authored
* make Input descriptors node owners * rename src_node to m_src_node
-
Jaikrishnan Menon authored
-
Nishant Patel authored
* Update README to make it consistent with ngraph-neon
* Add instruction to clone pybind in README
* Address feedback on README
-
- 05 Apr, 2018 7 commits
-
-
Nick Korovaiko authored
* visualization tracing * visualize -> m_visualize. add a programmatic way to enable visualization. tweak pass names
-
tsocha authored
- Enable Padding op
- Suppress multiline comment warning
- improve tox configuration
-
Ashok Emani authored
* enable TensorView to use pre-allocated mem
* proper check for nullptr
* add unittest for custom mem with tensorview and feedback
* minor fix from feedback
* support GPU TensorView custom mem
* feedback fix and code format
-
Jaikrishnan Menon authored
-
Jaikrishnan Menon authored
* CPU: Optimize Sum reductions
* CPU: Optimize 1D reduce all case
* CPU: Optimize 4D reduce all sum
* CPU: Tweaks
* Formatting fixes
-
Michał Karzyński authored
* Enable Travis CI
* Enable Travis CI - update README
* Travis CI - Parallel testing
-
Robert Kimball authored
-
- 04 Apr, 2018 3 commits
-
-
DawnStone authored
* saved a simplified contrib/docker/Makefile and helper scripts
* fixed flow for make targets
* restored original definition for setting the PARALLEL option on the command line per github comments
* remove double build for make install targets
* added a save for the ngraph_dist_gcc.tgz to maintain existing behavior
* fixed passing the PARALLEL value to the make targets
* integrated latest working build-ngraph-and-test script
* integrated the latest working Makefile
* removed reference to the THIRD_PARTY_CACHE_DIR (for future)
* updated the contrib/docker/README.md file
-
Nick Korovaiko authored
* refactor Adjoints to support multi-output ops
* passing tests
* switch to generate_adjoints(deltas) and backprop_node
* remove debugging code
* fix error msg
* fix typo adjoitns
* fix comp errors in mnist_mlp
-
tsocha authored
* [Py] Fix problem with double set layout
* Extend UT for coverage double set layout
-