Commit b1a22df1 authored by L.S. Cook, committed by Scott Cyphers

Architecture and feature docs2 (#2038)

* editing docs

* more doc updates

* Cleanup theme, update backends for PlaidML, remove stale font

* Add PlaidML description and doc update that should have been added with PR 1888

* Latest release doc updates

* Add PlaidML description and doc update for PR 1888
* Update glossary with tensor description and quantization def
* Refactor landpage with QuickStart guides
* Add better details about nGraph features and roadmap

* Placeholder detail for comparison section

* Add section link

* order sections alphabetically for now

* update compiler illustration

* Address feedback from doc review

* Update illustration wording

* Formatting and final edits

* keep tables consistent

* Clarify doc on bridge and compiler docs

* yay for more feedback and improvements

* edit with built doc

* Fix typo

* Another phase of PR review editing

* Final review comment resolved

* Update README with different sort of navigation options

* Remove unnecessary repetition

* Add links to announcement blogs for previous contributions to ONNX and PyTorch projects

* Better link

* Add syllabus for perf criterion

* Editing and readability on README and add page for performance-validated workloads

* Update README

* Update illustrations with detail pertinent to br

* Documenting diagram and updating about features faqs doc

* Latest Beta doc updates

* clarify wording on arch doc

* Update deprecated INSTALL.md file and CONTRIB.md

* Legacy framework support for neon removed; instead show how nGraph enables custom or customizable frameworks

* nGraph Compiler stack beta

* Add markdown version of some docs as requested

* update full ng stack diagram

* Make sure diagrams work after moving them to tld

* Update ABOUT tld info doc with PR review feedback
parent 6eefbce4
About nGraph Compiler stack
===========================
nGraph Compiler stack architecture
----------------------------------
The diagram below represents our current Beta release stack. Please note
that the stack diagram is simplified to show how nGraph executes deep
learning workloads with two hardware backends; however, many other
deep learning frameworks and backends are currently functioning.

![Current Beta release stack](doc/sphinx/source/graphics/stackngrknl.png)

Starting from the top of the diagram, we present a simplified view of
the nGraph Intermediate Representation (IR). The nGraph IR is a format
that works with a framework such as TensorFlow\* or MXNet\* when there
is a corresponding "Bridge" or import method, such as from NNVM or via
[ONNX](http://onnx.ai). Once nGraph's Core ops can begin parsing the
graph as a computation graph, components lower in the stack can
pattern-match subgraphs for device-specific optimizations; these
subgraphs are then encapsulated. This encapsulation is represented on
the diagram as the colored background between the `ngraph` kernel(s)
and the stack above.

Note that everything at or below the "Kernel APIs" and "Subgraph
APIs" gets executed "automatically" during training runs. In other
words, the accelerations are automatic: parts of the graph that
are not encapsulated default to the framework implementation when
executed. For example, if nGraph optimizes ResNet50 for TensorFlow,
the same optimization can be readily applied to the NNVM/MXNet
implementation of ResNet50. This works efficiently because the
nGraph Intermediate Representation (IR), which keeps the input
and output semantics of encapsulated subgraphs, rebuilds an
encapsulated subgraph that can efficiently use or re-use
operations. Such an approach significantly cuts down on the
time needed to compile; when we're not relying upon the framework's
ops alone, memory management and data layouts can be more efficiently
applied to the hardware backends in use.

The nGraph Core uses a strongly-typed and platform-neutral Intermediate
Representation (IR) to construct a "stateless" graph. Each node, or `op`,
in the graph corresponds to one step in a computation, where each step
produces zero or more tensor outputs from zero or more tensor inputs.
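
To make the idea of a "stateless" graph more concrete, here is a minimal Python sketch. It is not the nGraph API: the `Op` class and the `evaluate` function are hypothetical names chosen for illustration, and tensors are modeled as plain lists of floats. Each op names its inputs and wraps a pure function that returns a tuple of outputs; the graph itself stores no values, so every call to `evaluate` starts from a fresh dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Tensor = List[float]  # simplification: a 1-D list stands in for a tensor


@dataclass(frozen=True)
class Op:
    """Hypothetical graph node: one computation step with 0+ inputs/outputs."""
    name: str
    inputs: Tuple[str, ...]                # names of producers feeding this op
    fn: Callable[..., Tuple[Tensor, ...]]  # pure function -> tuple of outputs


def evaluate(ops: List[Op],
             feeds: Dict[str, Tuple[Tensor, ...]]) -> Dict[str, Tuple[Tensor, ...]]:
    """Run ops in the given (already topologically sorted) order.

    All intermediate values live in a dict local to this call, so the
    graph itself never holds state between runs.
    """
    values: Dict[str, Tuple[Tensor, ...]] = dict(feeds)
    for op in ops:
        # For brevity, each input reads the first output of its producer.
        args = [values[src][0] for src in op.inputs]
        values[op.name] = op.fn(*args)
    return values


graph = [
    Op("one", (), lambda: ([1.0, 1.0],)),  # zero inputs, one output
    Op("add", ("x", "one"), lambda a, b: ([p + q for p, q in zip(a, b)],)),
    Op("relu", ("add",), lambda a: ([max(v, 0.0) for v in a],)),
]

result = evaluate(graph, {"x": ([1.0, -3.0],)})
print(result["relu"][0])  # [2.0, 0.0]
```

Because `evaluate` never writes into `graph`, running it again with different inputs cannot be affected by a previous run.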

After construction, our Hybrid transformer takes the IR, further
partitions it into subgraphs, and assigns them to the best-performing
backend. There are two hardware backends shown in the stack diagram
to demonstrate nGraph's graph partitioning. The Hybrid transformer
assigns complex operations (subgraphs) to the Intel® Nervana™ Neural
Network Processor (NNP), or to a different CPU backend, to expedite
the computation; the remaining operations default to CPU. In the
future, we will further expand the capabilities of the Hybrid transformer
by enabling more features, such as localized cost modeling and memory
sharing, when the next generation of NNP is released. In the meantime,
your deep learning software engineering or modeling can be confidently
built upon this stable anchorage.
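
The partitioning step can be pictured with a small Python sketch. This is an illustrative assumption, not the Hybrid transformer's actual algorithm: the capability table `SUPPORTED`, the `PREFERENCE` order, and the op names are hypothetical, and the graph is simplified to a topologically ordered list. Each op goes to the first backend in the preference list that supports it, and consecutive ops assigned to the same backend are grouped into one subgraph.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical capability table; not real nGraph backend data.
SUPPORTED: Dict[str, Set[str]] = {
    "NNP": {"Convolution", "Dot", "Relu"},
    "CPU": {"Convolution", "Dot", "Relu", "Add", "Softmax", "Reshape"},
}
PREFERENCE = ["NNP", "CPU"]  # try the accelerator first, fall back to CPU


def assign(op_type: str) -> str:
    """Return the first preferred backend that can run this op type."""
    for backend in PREFERENCE:
        if op_type in SUPPORTED[backend]:
            return backend
    raise ValueError(f"no backend supports {op_type}")


def partition(ops: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group (name, op_type) pairs into maximal consecutive same-backend runs."""
    subgraphs: List[Tuple[str, List[str]]] = []
    for name, op_type in ops:
        backend = assign(op_type)
        if subgraphs and subgraphs[-1][0] == backend:
            subgraphs[-1][1].append(name)  # extend the current subgraph
        else:
            subgraphs.append((backend, [name]))  # start a new subgraph
    return subgraphs


ops = [("conv1", "Convolution"), ("relu1", "Relu"), ("reshape", "Reshape"),
       ("fc", "Dot"), ("softmax", "Softmax")]
print(partition(ops))
# [('NNP', ['conv1', 'relu1']), ('CPU', ['reshape']), ('NNP', ['fc']), ('CPU', ['softmax'])]
```

Grouping only consecutive ops keeps the sketch short; a real partitioner works on the graph's dependency structure rather than on a flat list.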

The Intel® Architecture (IA) transformer provides two modes that reduce
compilation time, and these modes have already proven useful for training,
deploying, and retraining a deep learning workload in production. For
example, in our tests, DEX mode reduced ResNet50 compilation time by 30X.

We are excited to continue our work in enabling distributed training,
and we plan to scale it to 256 nodes in Q4 '18. Additionally, we
are testing model parallelism in addition to data parallelism.

In this Beta release, nGraph via Bridge code supports only Just-in-Time
(JIT) compilation; the ONNX importer does not support anything that
nGraph cannot support. While nGraph currently has very limited support
for dynamic graphs, it is possible to get dynamic graphs working. Future
releases will add better support and use-case examples for features such
as Ahead-of-Time compilation.
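
Just-in-Time compilation means a graph is compiled when it is first executed rather than ahead of time. The minimal Python sketch below illustrates that general idea only; `JitFunction` and `compile_add` are hypothetical names, not nGraph APIs. It compiles once per distinct set of input shapes and caches the result. In this simplified model, inputs whose shapes keep changing trigger a fresh compilation each time, which hints at why dynamic graphs are harder to support.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, ...]
Kernel = Callable[..., List[float]]


class JitFunction:
    """Hypothetical JIT wrapper: compile on first call per input-shape signature."""

    def __init__(self, compile_fn: Callable[[Tuple[Shape, ...]], Kernel]):
        self._compile_fn = compile_fn
        self._cache: Dict[Tuple[Shape, ...], Kernel] = {}
        self.compile_count = 0

    def __call__(self, *inputs: List[float]) -> List[float]:
        # 1-D lists stand in for tensors, so a shape is just (length,).
        signature = tuple((len(x),) for x in inputs)
        if signature not in self._cache:
            self._cache[signature] = self._compile_fn(signature)
            self.compile_count += 1
        return self._cache[signature](*inputs)


def compile_add(signature: Tuple[Shape, ...]) -> Kernel:
    """Stand-in 'compiler': validates shapes once, returns a Python function."""
    if len(signature) != 2 or signature[0] != signature[1]:
        raise ValueError(f"add expects two inputs of equal shape, got {signature}")
    return lambda a, b: [p + q for p, q in zip(a, b)]


add = JitFunction(compile_add)
print(add([1.0, 2.0], [3.0, 4.0]))  # [4.0, 6.0]   (compiles for ((2,), (2,)))
print(add([5.0, 6.0], [7.0, 8.0]))  # [12.0, 14.0] (reuses the cached kernel)
print(add([1.0], [2.0]))            # [3.0]        (new shape: compiles again)
print(add.compile_count)            # 2
```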
Features
--------
The nGraph Intermediate Representation (IR) contains a combination
of device-specific and non-device-specific optimizations:

- **Fusion** -- Fuse multiple ops to decrease memory usage.
- **Data layout abstraction** -- Make abstraction easier and faster
  with nGraph translating element order to work best for a given or
  available device.
- **Data reuse** -- Save results and reuse them for subgraphs with the
  same input.
- **Graph scheduling** -- Run similar subgraphs in parallel via
  multi-threading.
- **Graph partitioning** -- Partition subgraphs to run on different
  devices to speed up computation; make better use of spare CPU cycles
  with nGraph.
- **Memory management** -- Prevent peak memory usage by intercepting
  a graph with or by a "saved checkpoint," and enable data auditing.
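
As an illustration of **Fusion**, the following Python sketch rewrites a tiny graph so that an `Add` whose only consumer is a `Relu` becomes a single fused `AddRelu` node; the intermediate sum then never needs its own buffer. The node format, the `AddRelu` op type, and the `fuse_add_relu` function are hypothetical and do not reflect nGraph's actual fusion passes.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, str, List[str]]  # (name, op_type, input names), topologically ordered


def fuse_add_relu(nodes: List[Node]) -> List[Node]:
    """Fuse each Add whose sole consumer is a Relu into one AddRelu node."""
    consumers: Dict[str, List[str]] = {}
    for name, _, inputs in nodes:
        for src in inputs:
            consumers.setdefault(src, []).append(name)
    op_type_of = {name: op_type for name, op_type, _ in nodes}

    fused: List[Node] = []
    absorbed: Set[str] = set()  # Relu nodes already folded into a fused node
    for name, op_type, inputs in nodes:
        if name in absorbed:
            continue
        users = consumers.get(name, [])
        if op_type == "Add" and len(users) == 1 and op_type_of[users[0]] == "Relu":
            relu = users[0]
            # Reuse the Relu's name so downstream nodes still find their input.
            fused.append((relu, "AddRelu", inputs))
            absorbed.add(relu)
        else:
            fused.append((name, op_type, inputs))
    return fused


example = [
    ("a", "Parameter", []),
    ("b", "Parameter", []),
    ("sum", "Add", ["a", "b"]),
    ("act", "Relu", ["sum"]),
    ("out", "Dot", ["act", "b"]),
]
print(fuse_add_relu(example))
# [('a', 'Parameter', []), ('b', 'Parameter', []), ('act', 'AddRelu', ['a', 'b']), ('out', 'Dot', ['act', 'b'])]
```

The sole-consumer check matters: if the `Add` result were also read elsewhere, it would still have to be stored, so fusing would save nothing.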
Current nGraph Compiler full stack
----------------------------------
![](doc/sphinx/source/graphics/full-ngstck.png)
In addition to IA and NNP transformers, the nGraph Compiler stack has transformers
for multiple GPU types and an upcoming Intel deep learning accelerator. To
support the growing number of transformers, we plan to expand the capabilities
of the Hybrid transformer with a cost model and memory sharing. With these new
features, even if nGraph has multiple backends targeting the same hardware, it
will partition the graph into multiple subgraphs and determine the best way to
execute each subgraph.
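
In its simplest form, a cost model of the kind described above could pick the backend with the lowest estimated cost for each subgraph. The Python sketch below assumes such estimates already exist; the subgraph names, backend names, and numbers in `ESTIMATES` are invented for illustration, and a real cost model would also need to account for data-transfer and memory-sharing effects that this sketch ignores.

```python
from typing import Dict

# Hypothetical cost estimates (arbitrary units) per subgraph and backend.
# A backend that cannot run a subgraph is simply absent from its entry.
ESTIMATES: Dict[str, Dict[str, float]] = {
    "conv_block": {"CPU": 4.0, "PlaidML-GPU": 1.5, "NNP": 1.2},
    "postprocess": {"CPU": 0.3, "PlaidML-GPU": 0.9},
}


def choose_backends(estimates: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Map each subgraph to the backend with the lowest estimated cost."""
    choice: Dict[str, str] = {}
    for subgraph, costs in estimates.items():
        if not costs:
            raise ValueError(f"no backend can run {subgraph}")
        choice[subgraph] = min(costs, key=costs.__getitem__)
    return choice


print(choose_backends(ESTIMATES))  # {'conv_block': 'NNP', 'postprocess': 'CPU'}
```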
Contributor Guidelines
======================

https://ngraph.nervanasys.com/docs/latest/project/code-contributor-README.html

License
FAQs
----
### Why nGraph?
We developed nGraph to simplify the realization of optimized deep learning
performance across frameworks and hardware platforms. The value we're offering
to the developer community is empowerment: we are confident that Intel®
Architecture already provides the best computational resources available
for the breadth of ML/DL tasks.
### How do I connect a framework?
The nGraph Library manages framework bridges for some of the more widely-known
frameworks. A bridge acts as an intermediary between the nGraph core and the
framework, and the result is a function that can be compiled from a framework.
A fully-compiled function that makes use of bridge code thus becomes a
"function graph", or what we sometimes call an **nGraph graph**.
Low-level nGraph APIs are not accessible *dynamically* via bridge code; this
is the nature of stateless graphs. However, note that a graph with a
"saved" checkpoint can be "continued" to run from a previously-applied
checkpoint, or it can be loaded as a static graph for further inspection.
For a more detailed dive into how custom bridge code can be implemented, see our
documentation on [Working with other frameworks]. To learn how TensorFlow and MXNet
currently make use of custom bridge code, see [Integrate supported frameworks].
![JiT Compiling for computation](doc/sphinx/source/graphics/bridge-to-graph-compiler.png)
Although we only directly support a few frameworks at this time, we provide
documentation to help developers and engineers create custom solutions.
### How do I run an inference model?
Framework bridge code is *not* the only way to connect a model (function graph) to
nGraph's Core ops. We've also built an importer for models that have been
exported from a framework and saved as a serialized file, such as ONNX. To learn
how to convert such serialized files to an nGraph model, please see the "How to"
documentation.
### What's next?
The Gold release is targeted for April 2019; it will feature broader workload
coverage, including support for quantized graphs, and more detail on our
advanced support for ``int8``. You can read more about design decisions and
what is tentatively in the pipeline for development in our
[arXiv paper](https://arxiv.org/pdf/1801.08058.pdf) from the 2018 SysML conference.
[Working with other frameworks]: http://ngraph.nervanasys.com/docs/latest/frameworks/index.html
[Integrate supported frameworks]: http://ngraph.nervanasys.com/docs/latest/framework-integration-guides.html
Currently two platforms are known to work:

- Ubuntu 16.04
- CentOS 7.4

Ubuntu 16.04 Prerequisites
==========================

Compilers currently known to work are gcc-5.4.0, clang-3.9, and gcc-4.8.5.

If you are using gcc-5.4.0 or clang-3.9, it is recommended to add the
option `-DNGRAPH_USE_PREBUILT_LLVM=TRUE` to the `cmake` command. This causes
the build system to fetch a pre-built tarball of LLVM+Clang from `llvm.org`,
which substantially cuts down on build times.

If you are using gcc-4.8, it may be necessary to add symlinks from `gcc` to
`gcc-4.8`, and from `g++` to `g++-4.8`, in your PATH, even if you
specify CMAKE_C_COMPILER and CMAKE_CXX_COMPILER when building. (You should
NOT supply the `-DNGRAPH_USE_PREBUILT_LLVM` flag in this case, because the
prebuilt tarball supplied on llvm.org is not compatible with a gcc-4.8
based build.)

CentOS 7.4 Prerequisites
========================

CentOS supplies an older version of CMake that is not compatible with
LLVM-5.0.1, which we build as an external dependency. There are two options:

1. (requires root privileges) install the `cmake3` package from EPEL, or
2. (does not require root privileges) build CMake (3.1 or newer) from source,
   and run it from its build directory.

General Instructions
====================

These instructions assume that your system has been prepared in accordance
with the above prerequisites.

```
$ cd ngraph
$ mkdir build
$ cd build
$ cmake .. \
    -DCMAKE_C_COMPILER=<path to C compiler> \
    -DCMAKE_CXX_COMPILER=<path to C++ compiler>
$ make -j install
```

Tested Platforms:

- Ubuntu 16.04 and 18.04
- CentOS 7.4

Our latest instructions for how to build the library are available
[in the documentation](https://ngraph.nervanasys.com/docs/latest/buildlb.html).

Use `cmake -LH` after cloning the repo to see the currently-supported
build options. We recommend using, at the least, something like:

    $ cmake ../ -DCMAKE_INSTALL_PREFIX=~/ngraph_dist -DNGRAPH_USE_PREBUILT_LLVM=TRUE \
        -DNGRAPH_ONNX_IMPORT_ENABLE=ON
# nGraph Library [![Build Status][build-status-badge]][build-status]

Welcome to the open-source repository for the **Intel® nGraph Library**. Our code
base provides a Compiler and runtime suite of tools (APIs) designed to give
developers maximum flexibility for their software design, allowing them to
create or customize a scalable solution using any framework while also avoiding
device-level hardware lock-in that is so common with many AI vendors. A neural
network model compiled with nGraph can run on any of our currently-supported
backends, and it will be able to run on any backends we support in the future
with minimal disruption to your model. With nGraph, you can co-evolve your
software and hardware's capabilities to stay at the forefront of your industry.

![nGraph ecosystem][ngraph-ecosystem]

The **nGraph Compiler** is Intel's graph compiler for Artificial Neural Networks.
Documentation in this repo describes how you can program any framework
to run training and inference computations on a variety of Backends including
Intel® Architecture Processors (CPUs), Intel® Nervana™ Neural Network Processors
(NNPs), cuDNN-compatible graphics cards (GPUs), custom VPUs like [Movidius], and
many others. The default CPU Backend also provides an interactive *Interpreter*
mode that can be used to zero in on a DL model and create custom nGraph
optimizations that can be used to further accelerate training or inference, in
whatever scenario you need.

nGraph provides both a C++ API for framework developers and a Python API which
can run inference on models imported from ONNX.

See the [Release Notes] for recent changes.

| Framework      | bridge available? | ONNX support?  |
|----------------|-------------------|----------------|
| TensorFlow*    | yes               | yes            |
| MXNet*         | yes               | yes            |
| PaddlePaddle   | yes               | yes            |
| PyTorch*       | no                | yes            |
| Chainer*       | no                | yes            |
| CNTK*          | no                | yes            |
| Caffe2*        | no                | yes            |

| Backend                                       | current support   | future support |
|-----------------------------------------------|-------------------|----------------|
| Intel® Architecture CPU                       | yes               | yes            |
| Intel® Nervana™ Neural Network Processor (NNP)| yes               | yes            |
| Intel [Movidius™ Myriad™ 2] VPUs              | coming soon       | yes            |
| Intel® Architecture GPUs                      | via PlaidML       | yes            |
| AMD* GPUs                                     | via PlaidML       | yes            |
| NVIDIA* GPUs                                  | via PlaidML       | some           |
| Field Programmable Gate Arrays (FPGA)         | no                | yes            |

## Documentation

See our [install] docs for how to get started.

For this early release, we provide [framework integration guides] to
compile MXNet and TensorFlow-based projects. If you already have a
trained model, we've put together a getting started guide for
[how to import] a deep learning model and start working with the nGraph
APIs.

## Support

Please submit your questions, feature requests and bug reports via
[GitHub issues].

## How to Contribute

We welcome community contributions to nGraph. If you have an idea how
to improve the Library:

* See the [contrib guide] for code formatting and style guidelines.
* Share your proposal via [GitHub issues].

...

modifications are necessary, may provide feedback to guide you. When
accepted, your pull request will be merged to the repository.

[install]: http://ngraph.nervanasys.com/docs/latest/buildlb.html
[framework integration guides]: http://ngraph.nervanasys.com/docs/latest/framework-integration-guides.html
[release notes]: http://ngraph.nervanasys.com/docs/latest/project/release-notes.html
[Github issues]: https://github.com/NervanaSystems/ngraph/issues
[contrib guide]: http://ngraph.nervanasys.com/docs/latest/project/code-contributor-README.html
[pull request]: https://github.com/NervanaSystems/ngraph/pulls
[how to import]: http://ngraph.nervanasys.com/docs/latest/howto/import.html
[ngraph-ecosystem]: doc/sphinx/source/graphics/599px-Intel-ngraph-ecosystem.png "nGraph Ecosystem"
[build-status]: https://travis-ci.org/NervanaSystems/ngraph/branches
[build-status-badge]: https://travis-ci.org/NervanaSystems/ngraph.svg?branch=master
[develop-without-lockin]: doc/sphinx/source/graphics/develop-without-lockin.png "Develop on any part of the stack without lock-in"
[Movidius™ Myriad™ 2]:https://www.movidius.com/solutions/vision-processing-unit

# nGraph Compiler Stack Beta

[![Build Status][build-status-badge]][build-status] [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/NervanaSystems/ngraph/blob/master/LICENSE)

<div align="left">
<h3>
<a href="https://ngraph.nervanasys.com/docs/latest/project/about.html">
Architecture and features</a> | <a href="#Ecosystem" >nGraph ecosystem</a><span> </span> <span> | </span>
<a href="https://ngraph.nervanasys.com/docs/latest/project/release-notes.html">
Beta release notes</a><span> | </span> <br />
<a href="https://ngraph.nervanasys.com/docs/latest">Documentation</a><span> | </span>
<a href="#How-to-contribute" >How to contribute</a>
</h3>
</div>

## Quick start

To begin using nGraph with popular frameworks to accelerate deep learning
workloads on CPU for inference, please refer to the links below.

| Framework / Version        | Installation guide                     | Notes
|----------------------------|----------------------------------------|-----------------------------------
| TensorFlow* 1.12           | [Pip package] or [Build from source]   | 17 [Validated workloads]
| MXNet* 1.4                 | [Enable the module] or [Source compile]| 17 [Validated workloads]
| ONNX 1.3                   | [Pip package]                          | 13 [Functional] workloads with DenseNet-121, Inception-v1, ResNet-50, Inception-v2, ShuffleNet, SqueezeNet, VGG-19, and 7 more

Frameworks using nGraph Compiler stack to execute workloads have shown
**3X** to **45X** performance boost when compared to native framework
implementations. We've also seen performance boosts running workloads that
are not included on the list of [Validated workloads], thanks to our
powerful subgraph pattern matching and thanks to the collaborative efforts
we've put into the DL community, such as with [nGraph-ONNX adaptable] APIs
and [nGraph for PyTorch developers].

Additional work is also being done via [PlaidML] which will feature running
compute for Deep Learning with GPU acceleration and support for MacOS. See our
[Architecture and features] for what the stack looks like today and watch our
[Release Notes] for recent changes.

## What is nGraph Compiler?

nGraph Compiler aims to accelerate developing and deploying AI workloads
using any deep learning framework with a variety of hardware targets.
We strongly believe in providing freedom, performance, and ease-of-use to AI
developers.

The diagram below shows what deep learning frameworks and hardware targets
we support. More details on these current and future plans are in the ecosystem
section.

![nGraph ecosystem][ngraph-ecosystem]

While the ecosystem shown above is all functioning, we have validated
performance metrics for deep learning inference on CPU processors such
as Intel® Xeon®. Please refer to the [Beta release notes] to learn more.
The Gold release is targeted for April 2019; it will feature broader workload
coverage, including support for quantized graphs, and more detail on our
advanced support for ``int8``.

Our documentation has extensive information about how to use nGraph Compiler
stack to create an nGraph computational graph, integrate custom frameworks,
and interact with supported backends. If you wish to contribute to the
project, please don't hesitate to ask questions in [GitHub issues] after
reviewing our contribution guide below.

## How to contribute

We welcome community contributions to nGraph. If you have an idea how
to improve it:

* See the [contrib guide] for code formatting and style guidelines.
* Share your proposal via [GitHub issues].

...

modifications are necessary, may provide feedback to guide you. When
accepted, your pull request will be merged to the repository.

![nGraph Compiler Stack][ngraph-compiler-stack-readme]

| Backend                                       | current support   | future support |
|-----------------------------------------------|-------------------|----------------|
| Intel® Architecture CPU                       | yes               | yes            |
| Intel® Nervana™ Neural Network Processor (NNP)| yes               | yes            |
| Intel [Movidius™ Myriad™ 2] VPUs              | coming soon       | yes            |
| Intel® Architecture GPUs                      | via PlaidML       | yes            |
| AMD* GPUs                                     | via PlaidML       | yes            |
| NVIDIA* GPUs                                  | via PlaidML       | some           |
| Field Programmable Gate Arrays (FPGA)         | no                | yes            |

[Architecture and features]:https://ngraph.nervanasys.com/docs/latest/project/about.html
[Documentation]: https://ngraph.nervanasys.com/docs/latest
[build the Library]: https://ngraph.nervanasys.com/docs/latest/buildlb.html
[Getting Started Guides]: Getting-started-guides
[Validated workloads]: https://ngraph.nervanasys.com/docs/latest/frameworks/validation-testing.html
[Functional]: https://github.com/NervanaSystems/ngraph-onnx/
[How to contribute]: How-to-contribute
[framework integration guides]: http://ngraph.nervanasys.com/docs/latest/framework-integration-guides.html
[release notes]: https://ngraph.nervanasys.com/docs/latest/project/release-notes.html
[Github issues]: https://github.com/NervanaSystems/ngraph/issues
[contrib guide]: https://ngraph.nervanasys.com/docs/latest/project/code-contributor-README.html
[pull request]: https://github.com/NervanaSystems/ngraph/pulls
[how to import]: https://ngraph.nervanasys.com/docs/latest/howto/import.html
[ngraph-ecosystem]: doc/sphinx/source/graphics/599px-Intel-ngraph-ecosystem.png "nGraph Ecosystem"
[ngraph-compiler-stack-readme]: doc/sphinx/source/graphics/ngraph-compiler-stack-readme.png "nGraph Compiler Stack"
[build-status]: https://travis-ci.org/NervanaSystems/ngraph/branches
[build-status-badge]: https://travis-ci.org/NervanaSystems/ngraph.svg?branch=master
[develop-without-lockin]: doc/sphinx/source/graphics/develop-without-lockin.png "Develop on any part of the stack without lock-in"
[Movidius™ Myriad™ 2]:https://www.movidius.com/solutions/vision-processing-unit
[PlaidML]: https://github.com/plaidml/plaidml
[Pip package]: https://github.com/NervanaSystems/ngraph-onnx#installing-ngraph-onnx
[Build from source]: https://github.com/NervanaSystems/ngraph-tf
[Source compile]: https://github.com/NervanaSystems/ngraph-mxnet/blob/master/NGRAPH_README.md
[nGraph-ONNX]: https://github.com/NervanaSystems/ngraph-onnx/blob/master/README.md
[nGraph-ONNX adaptable]: https://ai.intel.com/adaptable-deep-learning-solutions-with-ngraph-compiler-and-onnx/
[nGraph for PyTorch developers]: https://ai.intel.com/investing-in-the-pytorch-developer-community
* Intel® Xeon® (CPU processor)
* Intel® Architecture (IA)
* Intel® nGraph™
* :ref:`mxnet_intg`
* :ref:`tensorflow_intg`
* :ref:`neon_intg`

A framework is "supported" when there is a framework :term:`bridge` that can be
cloned from one of our GitHub repos and built to connect to nGraph device backends,

...

See the `ngraph tensorflow bridge README`_ for how to install the `DSO`_ for the
nGraph-TensorFlow bridge.

.. _neon_intg:

neon |trade|
============

Use ``neon`` as a frontend for nGraph backends
-----------------------------------------------

``neon`` is an open source Deep Learning framework that has a history
of `being the fastest`_ framework `for training CNN-based models with GPUs`_.
Detailed info about neon's features and functionality can be found in the
`neon docs`_. This section covers installing neon on an existing
system that already has an ``ngraph_dist`` installed.

.. important:: As of version |version|, these instructions presume that your
   system already has the library installed to the default location, as outlined
   in our :doc:`buildlb` documentation.

#. Set the ``NGRAPH_CPP_BUILD_PATH`` and the ``LD_LIBRARY_PATH``. You can use
   the ``env`` command to see if these paths have been set already and if they
   have not, they can be set with something like:

   .. code-block:: bash

      export NGRAPH_CPP_BUILD_PATH=$HOME/ngraph_dist/
      export LD_LIBRARY_PATH=$HOME/ngraph_dist/lib/

#. The neon framework uses the :command:`pip` package manager during installation;
   install it with Python version 3.5 or higher:

   .. code-block:: console

      $ sudo apt-get install python3-pip python3-venv
      $ python3 -m venv neon_venv
      $ cd neon_venv
      $ . bin/activate
      (neon_venv) ~/frameworks$

#. Go to the "python" subdirectory of the ``ngraph`` repo we cloned during the
   previous :doc:`buildlb`, and complete these actions:

   .. code-block:: console

      (neon_venv)$ cd /opt/libraries/ngraph/python
      (neon_venv)$ git clone --recursive -b allow-nonconstructible-holders https://github.com/jagerman/pybind11.git
      (neon_venv)$ export PYBIND_HEADERS_PATH=/opt/libraries/ngraph/python/pybind11
      (neon_venv)$ pip install -U .

#. Finally we're ready to install the ``neon`` integration:

   .. code-block:: console

      (neon_venv)$ git clone git@github.com:NervanaSystems/ngraph-neon
      (neon_venv)$ cd ngraph-neon
      (neon_venv)$ make install

#. To test a training example, you can run the following from ``ngraph-neon/examples/cifar10``

   .. code-block:: console

      (neon_venv)$ python cifar10_conv.py

#. (Optional) For experimental or alternative approaches to distributed training
   methodologies, including data parallel training, see the :doc:`distr/index`
   and :doc:`How to <howto/index>` articles on :doc:`howto/distribute-train`.

.. _nGraph-MXNet: https://github.com/NervanaSystems/ngraph-mxnet/blob/master/NGRAPH_README.md
.. _MXNet: http://mxnet.incubator.apache.org
.. _DSO: http://csweb.cs.wfu.edu/%7Etorgerse/Kokua/More_SGI/007-2360-010/sgi_html/ch03.html
.. _ngraph-neon python README: https://github.com/NervanaSystems/ngraph/blob/master/python/README.md
.. _ngraph neon repo's README: https://github.com/NervanaSystems/ngraph-neon/blob/master/README.md
.. _neon docs: https://github.com/NervanaSystems/neon/tree/master/doc
.. _being the fastest: https://github.com/soumith/convnet-benchmarks
.. _for training CNN-based models with GPUs: https://www.microway.com/hpc-tech-tips/deep-learning-frameworks-survey-tensorflow-torch-theano-caffe-neon-ibm-machine-learning-stack
.. _ngraph tensorflow bridge README: https://github.com/NervanaSystems/ngraph-tf

Working with other frameworks
##############################

An engineer may want to work with a deep learning framework that does not yet
have bridge code written. For non-supported or "generic" frameworks, it is
expected that engineers will use the nGraph library to create custom bridge code,
and/or to design and document a user interface (UI) with specific runtime
options for whatever custom use case they need.

.. framework/index:

Integrate Other Frameworks
###########################

In this section, written for framework architects or engineers who want
to optimize brand new, generic, or less widely-supported frameworks, we provide

...

   :maxdepth: 1

   generic.rst
   validation-testing.rst

.. frameworks/validation-testing:

Validation and testing
######################

* **Validating** -- To provide optimizations with nGraph, we first
  confirm that a given workload is "validated" as being functional;
  that is, we can successfully load its serialized graph as an nGraph
  :term:`function graph`. Following here is a list of 14 workloads
  we've tested with success.

.. csv-table::
   :header: "Workload", "Validated"
   :widths: 27, 53
   :escape: ~

   DenseNet-121, Functional
   Inception-v1, Functional
   Inception-v2, Functional
   ResNet-50, Functional
   Shufflenet, Functional
   SqueezeNet, Functional
   VGG-19, Functional
   ZFNet-512, Functional
   MNIST, Functional
   Emotion-FERPlus, Functional
   BVLC AlexNet, Functional
   BVLC GoogleNet, Functional
   BVLC CaffeNet, Functional
   BVLC R-CNN ILSVRC13, Functional

* **Testing & Performance Optimizations** for workloads that have been
  "validated" with nGraph are also available via the nGraph
  :abbr:`IR (Intermediate Representation)`. For example, a common use
  case for data scientists is to train a new model with a large dataset,
  and so nGraph already has several accelerations available "out of the
  box" for the workloads noted below.

TensorFlow
==========

.. csv-table::
   :header: "TensorFlow Workloads", "Performance"
   :widths: 27, 53
   :escape: ~

   Resnet50 v1 and v2, 50% of P40
   Inception V3 and V4, 50% of P40
   Inception-ResNetv2, 50% of P40
   MobileNet v1, 50% of P40
   SqueezeNet v1.1, 50% of P40
   SSD-VGG16, 50% of P40
   R-FCN, 50% of P40
   Faster RCNN, 50% of P40
   Yolo v2, 50% of P40
   GNMT, Greater than or equal to :abbr:`Direct Optimization (DO)`
   Transformer-LT, 50% of P40
   Wide & Deep, 50% of P40
   WaveNet, Functional
   U-Net, Greater than DO
   DRAW, 50% of P40
   A3C, 50% of P40

MXNet
=====

.. csv-table::
   :header: "MXNet Workloads", "Performance"
   :widths: 27, 53
   :escape: ~

   Resnet50 v1 and v2, 50% of P40
   DenseNet (121 161 169 201), 50% of P40
   InceptionV3, 50% of P40
   InceptionV4, 50% of P40
   Inception-ResNetv2, 50% of P40
   MobileNet v1, 50% of P40
   SqueezeNet v1 and v1.1, 50% of P40
   VGG16, Functional (No DO available)
   Faster RCNN, 50% of P40
   SSD-VGG16, 50% of P40
   GNMT, Greater than or equal to :abbr:`Direct Optimization (DO)`
   Transformer-LT, 50% of P40
   Wide & Deep, 50% of P40
   WaveNet, Functional
   DeepSpeech2, 50% of P40
   DCGAN, 50% of P40
   A3C, Greater than or equal to DO

.. about:

Architecture, Features, FAQs
############################

* :ref:`architecture`
* :ref:`features`
* :ref:`faq`
* :ref:`whats_next`

.. _architecture:

nGraph Compiler stack architecture
==================================

The diagram below represents our current Beta release stack. Note that the
diagram is simplified to show how nGraph executes deep learning workloads on
two hardware backends; many other deep learning frameworks and backends are
already functional.


.. figure:: ../graphics/stackngrknl.png
   :width: 771px
   :alt: Current Beta release stack

   Simplified stack diagram of the nGraph Compiler and its components (Beta)

Starting from the top of the diagram, we present a simplified view of how the
nGraph :abbr:`Intermediate Representation (IR)` can receive a graph from a
framework such as TensorFlow\* or MXNet\* when there is a corresponding
"Bridge" or import method, such as from NNVM or via `ONNX`_. Once the graph
has been parsed into a computation graph of nGraph :doc:`../ops/index`,
subgraphs can be pattern-matched for device-specific optimizations and then
encapsulated. This encapsulation is represented on the diagram as the colored
background between the ``ngraph`` kernel(s) and the stack above.

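
A minimal sketch of the pattern-matching step, written in Python for
illustration only: the ``Node`` class and the ``FusedMatMulAdd`` op are
hypothetical and are not part of nGraph's API. The function looks for a
``MatMul`` feeding the first input of a two-input ``Add`` and replaces the
pair with a single node that a device-specific backend could run as one
kernel. It assumes the ``MatMul`` result has no other consumers.

.. code-block:: python

   from dataclasses import dataclass, field


   @dataclass
   class Node:
       """Hypothetical op node: an op type plus the nodes it consumes."""
       op: str
       inputs: list = field(default_factory=list)


   def fuse_matmul_add(node):
       """Rewrite every Add(MatMul(a, b), c) into FusedMatMulAdd(a, b, c)."""
       # Rewrite the inputs first so that deeper matches are found too.
       node.inputs = [fuse_matmul_add(i) for i in node.inputs]
       if node.op == "Add" and node.inputs and node.inputs[0].op == "MatMul":
           matmul, bias = node.inputs  # assumes Add has exactly two inputs
           return Node("FusedMatMulAdd", matmul.inputs + [bias])
       return node


   x, w, b = Node("Parameter"), Node("Parameter"), Node("Parameter")
   graph = Node("Relu", [Node("Add", [Node("MatMul", [x, w]), b])])
   fused = fuse_matmul_add(graph)
   print(fused.op, fused.inputs[0].op)  # Relu FusedMatMulAdd
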
Note that everything at or below the "Kernel APIs" and "Subgraph APIs" gets
executed "automatically" during training runs. In other words, the
accelerations are automatic: parts of the graph that are not encapsulated
default to the framework implementation when executed. For example, if nGraph
optimizes ResNet50 for TensorFlow, the same optimization can be readily applied
to the NNVM/MXNet implementation of ResNet50. This works efficiently because
the nGraph :abbr:`IR (Intermediate Representation)`, which keeps the input and
output semantics of encapsulated subgraphs, rebuilds each encapsulated subgraph
so that it can efficiently use, or reuse, operations. This approach
significantly cuts down on compilation time; when we are not relying on the
framework's ops alone, memory management and data layouts can be applied more
efficiently to the hardware backends in use.

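
The fallback behavior can be sketched as follows. This is a simplified,
hypothetical model rather than Bridge code: it walks the ops in topological
order as a flat list and groups consecutive ops that nGraph supports into
encapsulated clusters, while any other op stays with the framework's own
implementation. Real graphs are DAGs, so a production clustering pass must
also avoid creating cycles between clusters.

.. code-block:: python

   def encapsulate(ops, supported):
       """Split a topologically ordered op list into nGraph/framework runs."""
       segments = []  # list of (owner, [op names]) pairs
       for op in ops:
           owner = "ngraph" if op in supported else "framework"
           if segments and segments[-1][0] == owner:
               segments[-1][1].append(op)  # extend the current run
           else:
               segments.append((owner, [op]))  # start a new run
       return segments


   ops = ["Conv", "Relu", "CustomOp", "MatMul", "Softmax"]
   for owner, run in encapsulate(ops, {"Conv", "Relu", "MatMul", "Softmax"}):
       print(owner, run)

Here ``CustomOp`` (a hypothetical op name) runs in the framework, and the two
runs around it become encapsulated nGraph subgraphs.
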
The :doc:`nGraph Core <../ops/index>` uses a strongly typed and
platform-neutral :abbr:`IR (Intermediate Representation)` to construct a
"stateless" graph. Each node, or ``op``, in the graph corresponds to one
:term:`step` in a computation, where each step produces zero or more tensor
outputs from zero or more tensor inputs.

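
To make the "stateless" graph concrete, the sketch below models each ``op`` as
a pure function from zero or more inputs to a tuple of zero or more outputs.
Plain integers stand in for tensors, and the ``Op`` class and ``run`` helper
are hypothetical illustrations, not the nGraph Core API. The graph itself
holds no values; each call to ``run`` keeps intermediate results in a local
dictionary.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, Tuple


   @dataclass(frozen=True)
   class Op:
       """One computation step: zero or more inputs -> tuple of outputs."""
       name: str
       fn: Callable[..., tuple]
       # Each input is the (producer name, output index) of an earlier value.
       inputs: Tuple[Tuple[str, int], ...] = ()


   def run(graph, feeds):
       """Evaluate ops listed in topological order without mutating them."""
       values = dict(feeds)  # (name, output index) -> value; local state only
       for op in graph:
           args = [values[src] for src in op.inputs]
           for i, out in enumerate(op.fn(*args)):
               values[(op.name, i)] = out
       return values


   graph = [
       Op("split", lambda x: (x // 10, x % 10), (("x", 0),)),  # 1 in, 2 out
       Op("add", lambda a, b: (a + b,), (("split", 0), ("split", 1))),
       Op("const", lambda: (100,)),  # 0 inputs, 1 output
       Op("scale", lambda s, c: (s * c,), (("add", 0), ("const", 0))),
   ]
   print(run(graph, {("x", 0): 42})[("scale", 0)])  # 600
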
After construction, our Hybrid transformer takes the IR, further partitions it
into subgraphs, and assigns them to the best-performing backend. The stack
diagram shows two hardware backends to demonstrate nGraph's graph
partitioning. The Hybrid transformer assigns complex operations (subgraphs) to
the Intel® Nervana™ :abbr:`Neural Network Processor (NNP)`, or to a different
CPU backend, to expedite the computation; the remaining operations default to
CPU. In the future, we will further expand the capabilities of the Hybrid
transformer by enabling more features, such as localized cost modeling and
memory sharing, when the next generation of
:abbr:`NNP (Neural Network Processor)` is released. In the meantime, you can
confidently build your deep learning software or models on this stable
foundation.

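
A hypothetical sketch of the assignment step follows. The backend names and
the per-op cost table are assumptions made for illustration (the text above
notes that localized cost modeling is still a planned feature), and this is
not the Hybrid transformer's actual algorithm. Each op goes to the cheapest
backend that lists it, any op that no listed backend supports falls back to
``"CPU"``, and consecutive ops on the same backend are grouped into one
subgraph.

.. code-block:: python

   def assign_backends(ops, costs, default="CPU"):
       """Assign ops (in topological order) to backends and group them."""
       subgraphs = []  # list of (backend, [op names]) pairs
       for op in ops:
           # Every backend whose hypothetical cost table lists this op.
           candidates = [(table[op], name)
                         for name, table in costs.items() if op in table]
           backend = min(candidates)[1] if candidates else default
           if subgraphs and subgraphs[-1][0] == backend:
               subgraphs[-1][1].append(op)
           else:
               subgraphs.append((backend, [op]))
       return subgraphs


   costs = {
       "NNP": {"Conv": 1, "MatMul": 1},
       "CPU": {"Conv": 5, "MatMul": 4, "TopK": 2},
   }
   print(assign_backends(["Conv", "MatMul", "TopK", "Custom"], costs))
   # [('NNP', ['Conv', 'MatMul']), ('CPU', ['TopK', 'Custom'])]
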
The Intel® Architecture (IA) transformer provides two modes that reduce
compilation time and have already proven useful for training, deploying, and
retraining deep learning workloads in production. For example, in our tests,
DEX mode reduced ResNet50 compilation time by 30X. We are excited to continue
our work on enabling distributed training, and we plan to expand distributed
training to 256 nodes in Q4 ’18. We are also testing model parallelism in
addition to data parallelism.

.. note:: In this Beta release, nGraph used via Bridge code supports only
   :abbr:`Just-In-Time (JIT)` compilation; the nGraph ONNX companion tool
   supports dynamic graphs and will add support for Ahead-of-Time
   compilation in the official release.


.. _features:

Features
========

The nGraph :abbr:`Intermediate Representation (IR)` contains a combination of
device-specific and non-device-specific optimizations:

* **Fusion** -- Fuse multiple ``ops`` to reduce memory traffic and improve
  data locality.
* **Memory management** -- Reduce peak memory usage by intercepting a graph
  with a "saved checkpoint," and enable data auditing.
* **Data layout abstraction** -- nGraph translates element order into the
  layout that works best for the given or available device, making
  abstraction easier and faster.
* **Data reuse** -- Save results and reuse them for subgraphs with the same
  input.
* **Graph scheduling** -- Run similar subgraphs in parallel via
  multi-threading.
* **Graph partitioning** -- Partition subgraphs to run on different devices to
  speed up computation and make better use of spare CPU cycles.
* **Direct EXecution (DEX) mode** -- Execute kernels for each op directly
  instead of using codegen when traversing the computation graph.

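
The **Data reuse** item can be illustrated with a small memoization sketch.
The ``SubgraphCache`` class is hypothetical, not an nGraph interface; it
assumes each subgraph is a pure function and that its inputs are hashable
(plain numbers here, standing in for tensors). A repeated call with the same
subgraph and the same input returns the saved result instead of recomputing
it.

.. code-block:: python

   class SubgraphCache:
       """Save subgraph results and reuse them for identical inputs."""

       def __init__(self):
           self._results = {}
           self.hits = 0

       def run(self, subgraph_id, fn, inputs):
           key = (subgraph_id, inputs)  # inputs must be a hashable tuple
           if key in self._results:
               self.hits += 1  # reuse the saved result
           else:
               self._results[key] = fn(*inputs)  # compute and save
           return self._results[key]


   def square_sum(a, b):
       return a * a + b * b


   cache = SubgraphCache()
   print(cache.run("square_sum", square_sum, (3, 4)))  # computed: 25
   print(cache.run("square_sum", square_sum, (3, 4)))  # reused: 25
   print(cache.hits)  # 1
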
.. important:: See :doc:`../ops/index` to learn about the ops nGraph uses for
   graph computations.

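
The difference between **DEX** mode and code generation can be sketched with
a toy Python analogy. The kernel table and graph format are hypothetical and
do not reflect how nGraph's CPU backend is implemented. ``direct_execute``
calls a prebuilt kernel for each op while traversing the graph, whereas
``codegen_execute`` first emits and compiles source for the whole graph. Both
return the same result; in this analogy, the direct path saves time by
skipping the generate-and-compile step.

.. code-block:: python

   import operator

   # Hypothetical kernel table: op name -> prebuilt kernel.
   KERNELS = {"Add": operator.add, "Multiply": operator.mul,
              "Negative": operator.neg}

   # Ops in topological order: (output name, op name, input names).
   GRAPH = [("t0", "Add", ("a", "b")),
            ("t1", "Multiply", ("t0", "c")),
            ("out", "Negative", ("t1",))]


   def direct_execute(graph, env):
       """DEX-style: dispatch each op's kernel directly during traversal."""
       env = dict(env)  # do not mutate the caller's inputs
       for out, op, ins in graph:
           env[out] = KERNELS[op](*(env[i] for i in ins))
       return env["out"]


   def codegen_execute(graph, env):
       """Codegen-style: emit source for the graph, compile it, then run it."""
       lines = ["def compiled(env, K):"]
       for out, op, ins in graph:
           args = ", ".join(f"env[{i!r}]" for i in ins)
           lines.append(f"    env[{out!r}] = K[{op!r}]({args})")
       lines.append("    return env['out']")
       namespace = {}
       exec(compile("\n".join(lines), "<generated>", "exec"), namespace)
       return namespace["compiled"](dict(env), KERNELS)


   inputs = {"a": 2, "b": 3, "c": 4}
   print(direct_execute(GRAPH, inputs), codegen_execute(GRAPH, inputs))  # -20 -20
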

.. Our design philosophy is that the graph is not a script for running kernels;
   rather, our compilation will match ``ops`` to appropriate available kernels
   ...
   new Core ops should be infrequent and that most functionality instead gets
   added with new functions that build sub-graphs from existing core ops.


.. _portable:

.. figure:: ../graphics/develop-without-lockin.png


.. _faq:

FAQs
====

Why nGraph?
-----------
The value we're offering to the developer community is empowerment: we are
confident that Intel® Architecture already provides the best computational
resources available for the breadth of ML/DL tasks.

How does it work?
-----------------

... our `arXiv paper`_ from the 2018 SysML conference.


.. _arXiv paper: https://arxiv.org/pdf/1801.08058.pdf
.. _ONNX: http://onnx.ai
.. _NNVM: http://
.. _nGraph ONNX companion tool: https://github.com/NervanaSystems/ngraph-onnx
.. _Intel® MKL-DNN: https://github.com/intel/mkl-dnn
.. _Movidius: https://developer.movidius.com/