Commit 014e4fbd authored by Leona C's avatar Leona C Committed by Scott Cyphers

Fix ops, cleanup ToC, add latest illustrations (#3700)

* Fix ops, cleanup ToC, add latest illustrations

* PR feedback
parent f0bc6c12
......@@ -139,9 +139,9 @@
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
{% block menu %}
{% set toctree = toctree(maxdepth=2, collapse=theme_collapse_navigation, includehidden=True) %}
{% set toctree = toctree(maxdepth=2, collapse=theme_collapse_navigation, includehidden=False) %}
{% if toctree %}
{{ toctree }}
<span class="toctree-expand">{{ toctree }}</span>
{% else %}
<!-- Local TOC -->
<div class="local-toc">{{ toc }}</div>
......
......@@ -1622,6 +1622,7 @@ a.wy-text-neutral:hover {
h1, h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend {
margin-top: 0.33em;
padding-top: 0.31em;
font-weight: 700;
font-family: Nunito, 'Nunito Sans', "IntelClear-Regular", Lato, Helvetica, "Helvetica Neue", sans;
}
......@@ -1630,8 +1631,8 @@ p {
line-height: 23px;
padding-bottom: 0.11em;
padding-top: 0.11em;
font-size: 1.19em;
margin-bottom: 0.27em;
font-size: 1.07em;
margin-bottom: 0.31em;
margin-top: 0.27em;
font-family: Nunito, 'Nunito Sans', Helvetica, 'Helvetica Neue', sans;
}
......@@ -1696,7 +1697,6 @@ code.code-large, .rst-content tt.code-large {
.wy-plain-list-disc, .rst-content .section ul, .rst-content .toctree-wrapper ul, article ul {
list-style: disc;
line-height: 24px;
margin-bottom: 24px;
}
.wy-plain-list-disc li, .rst-content .section ul li, .rst-content .toctree-wrapper ul li, article ul li {
......@@ -1721,7 +1721,6 @@ code.code-large, .rst-content tt.code-large {
.wy-plain-list-decimal, .rst-content .section ol, .rst-content ol.arabic, article ol {
list-style: decimal;
line-height: 24px;
margin-bottom: 0.5em;
}
.wy-plain-list-decimal li, .rst-content .section ol li, .rst-content ol.arabic li, article ol li {
......@@ -2428,8 +2427,9 @@ div[class^='highlight'] pre {
line-height: 1.0em;
}
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
font-size: 101% !important;
color: #2e2b27;
font-size: 91% !important;
color: #152a58;
text-decoration: underline 2px dotted #cdcac5;
background-color: #fff;
line-height: 0.8955em;
}
......@@ -2654,9 +2654,9 @@ span[id*='MathJax-Span'] {
/* border-right: 5px solid #999;*/
}
.wy-menu-vertical header, .wy-menu-vertical p.caption {
height: 33.1px;
height: 29.1px;
display: inline-block;
line-height: 35.3px;
line-height: 31.3px;
padding: 0.1 0.1 0.1 0.1;
margin-left: 0.43em;
padding-left: 0.23em;
......@@ -2666,7 +2666,7 @@ span[id*='MathJax-Span'] {
font-weight: bolder;
border-top: 5px dotted #FFA400;
border-bottom: 29px solid #d3d3d3;
font-size: 139%;
font-size: 131%;
color: #393F4D;
width: auto;
white-space: nowrap;
......@@ -2720,15 +2720,16 @@ span[id*='MathJax-Span'] {
font-size: 0.8em;
line-height: 1.6em;
color: #393f4d;
max-height: 0;
overflow: hidden;
}
.wy-menu-vertical li.on a, .wy-menu-vertical li.current > a {
font-family: Nunito, 'Nunito Sans', Helvetica, 'Helvetica Neue', sans;
position: relative;
padding-left: 1.1em;
padding-left: 0.55em;
background-color: #fafbfd;
color: #5c5955;
border-right: 4px solid #FFA400;
margin-left: 23px;
}
.wy-menu-vertical li.on a:hover, .wy-menu-vertical li.current > a:hover {
......@@ -2806,10 +2807,9 @@ span[id*='MathJax-Span'] {
}
.wy-menu-vertical a {
display: inline-block;
line-height: 1.21em;
line-height: 0.93em;
padding: 0.4045em 1.31em;
display: block;
position: relative;
font-size: 96%;
font-family: Nunito, 'Nunito Sans', Helvetica, 'Helvetica Neue', sans;
}
......@@ -2830,6 +2830,8 @@ span[id*='MathJax-Span'] {
color: #555;
}
.wy-side-nav-search {
display: block;
width: 300px;
......
......@@ -9,7 +9,6 @@ Working with Backends
* :ref:`ngraph_bridge`
* :ref:`opencl`
.. _what_is_backend:
What is a backend?
......
.. features.rst
.. _features:
Features
========
What follows here are a few notable features of the nGraph Compiler stack.
.. as well as a brief illustration or demonstration of that feature.
* **Fusion** -- Fuse multiple ops to decrease memory usage.
* **Data layout abstraction** -- Make abstraction easier and faster with nGraph
translating element order to work best for a given or available device.
* **Data reuse** -- Save results and reuse for subgraphs with the same input.
* **Graph scheduling** -- Run similar subgraphs in parallel via multi-threading.
* **Graph partitioning** -- Partition subgraphs to run on different devices to
speed up computation; make better use of spare CPU cycles with nGraph.
* **Memory management** -- Prevent peak memory usage by intercepting a graph
  with a "saved checkpoint," and enable data auditing.
.. frameworks/generic-configs.rst:
.. frameworks/generic_configs.rst:
.. _generic_configs:
Integrating new frameworks
==========================
......@@ -20,7 +22,7 @@ something like:
Find or display version
=======================
-----------------------
If you're working with the :doc:`../python_api/index`, the following command
may be useful:
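One possibility is a quick check from the interpreter (a minimal sketch, not
necessarily the exact command elsewhere in this document; it assumes the
installed package exposes ``ngraph.__version__``):

.. code-block:: python

   # Print the installed nGraph Python package version; __version__ is
   # assumed to be exposed by the wheel, as is conventional
   import ngraph
   print('nGraph version:', ngraph.__version__)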
......@@ -35,7 +37,7 @@ documentation.
Activate logtrace-related environment variables
===============================================
-----------------------------------------------
Another configuration option is to activate ``NGRAPH_CPU_DEBUG_TRACER``,
a runtime environment variable that supports extra logging and debug detail.
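For instance, a minimal sketch of enabling the tracer from Python before any
computation is compiled (using ``"1"`` as the activation value is an
assumption; consult the backend documentation for the exact semantics):

.. code-block:: python

   import os

   # Must be set before the CPU backend is created, or the tracer
   # will not pick it up for the current process
   os.environ['NGRAPH_CPU_DEBUG_TRACER'] = '1'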
......
.. frameworks/index.rst
Using Frameworks
================
Working with Frameworks
=======================
.. include:: overview.rst
.. toctree::
:maxdepth: 1
......@@ -10,3 +13,4 @@ Using Frameworks
onnx_integ.rst
paddle_integ.rst
tensorflow_connect.rst
generic_configs.rst
.. frameworks/overview.rst
.. _fw_overview:
Overview
========
A framework is "supported" with a framework :term:`bridge` that can be written or
cloned and used to connect to nGraph device backends while maintaining the
framework's programmatic or user interface. A `bridge currently exists`_ for the
TensorFlow framework. We also maintain a bridge for :doc:`paddle_integ`. Intel
previously contributed work to an MXNet bridge; however, support for this
bridge is no longer active.
`ONNX`_ on its own is not a framework; however, it can be used with nGraph's
:doc:`../python_api/index` to import and execute ONNX models.
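A minimal sketch of that flow, assuming the ``ngraph-onnx`` companion package
and its ``import_onnx_model`` helper (exact import paths and return types have
varied between releases):

.. code-block:: python

   import onnx
   import ngraph as ng
   from ngraph_onnx.onnx_importer.importer import import_onnx_model

   # Deserialize the ONNX protobuf and convert it to an nGraph function
   ng_function = import_onnx_model(onnx.load('model.onnx'))

   # Compile the function on the CPU backend; the resulting callable
   # can be invoked with NumPy arrays matching the model's inputs
   runtime = ng.runtime(backend_name='CPU')
   model = runtime.computation(ng_function)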
.. figure:: ../graphics/overview-framework-bridges.svg
:width: 960px
:alt: JiT compiling of a computation
:abbr:`Just-in-Time (JiT)` Compiling for computation. nGraph `Core`
components are colored in blue.
Once connected via the bridge, the framework can then run and train a deep
learning model with various workloads on various backends using nGraph Compiler
as an optimizing compiler available through the framework.
While a :abbr:`Deep Learning (DL)` :term:`framework` is ultimately meant for
end use by data scientists, or for deployment in cloud container environments,
nGraph Core ops and the nGraph C++ Library are designed for framework builders
themselves. We invite anyone working on new and novel frameworks or neural
network designs to explore our highly-modularized stack of components that
can be implemented or integrated in countless ways.
Please read this section if you are considering incorporating components from
the nGraph Compiler stack in your framework or neural network design. Contents
here are also useful if you are working on something built from scratch, or on
an existing framework that is less widely supported than popular frameworks
like TensorFlow and PyTorch.
.. figure:: ../graphics/overview-translation-flow.svg
:width: 725px
:alt: Translation flow to nGraph function graph
.. _bridge currently exists: https://github.com/tensorflow/ngraph-bridge/README.md
.. _ONNX: http://onnx.ai/
.. _tune the workload to extract best performance: https://ai.intel.com/accelerating-deep-learning-training-inference-system-level-optimizations
.. _a few small: https://software.intel.com/en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi
\ No newline at end of file
.. frameworks/validated/list.rst:
#################################
Validated workloads by framework
#################################
.. _validated:
We validated performance [#f1]_ for the following TensorFlow\* and MXNet\*
workloads:
* :ref:`tensorflow_valid`
* :ref:`mxnet_valid`
* :ref:`onnx_valid`
* :ref:`testing_latency`
Validated workloads
###################
.. _tensorflow_valid:
We have validated performance [#f1]_ for the following workloads:
TensorFlow
==========
.. contents::
:local:
.. _cpu_tensorflow:
CPU TensorFlow
==============
.. csv-table::
:header: "TensorFlow Workload", "Genre of Deep Learning"
:header: "TensorFlow Workload", "Genre of Deep Learning"
:widths: 27, 53
:escape: ~
......@@ -28,12 +27,11 @@ TensorFlow
Inception V4, Image recognition
Inception-ResNetv2, Image recognition
MobileNet v1, Image recognition
MobileNet v2, Image recognition
Faster RCNN, Object detection
VGG16, Image recognition
SSD-VGG16, Object detection
SSD-MobileNetv1, Object detection
R-FCN, Object detection
Faster RCNN, Object detection
Yolo v2, Object detection
Transformer-LT, Language translation
Wide & Deep, Recommender system
......@@ -44,40 +42,10 @@ TensorFlow
A3C, Reinforcement learning
.. _mxnet_valid:
MXNet
=====
.. csv-table::
:header: "MXNet Workload", "Genre of Deep Learning"
:widths: 27, 53
:escape: ~
Resnet50 v1, Image recognition
Resnet50 v2, Image recognition
DenseNet-121, Image recognition
InceptionV3, Image recognition
InceptionV4, Image recognition
Inception-ResNetv2, Image recognition
MobileNet v1, Image recognition
SqueezeNet v1 and v1.1, Image recognition
VGG16, Image recognition
Faster RCNN, Object detection
SSD-VGG16, Object detection
GNMT, Language translation
Transformer-LT, Language translation
Wide & Deep, Recommender system
WaveNet, Speech generation
DeepSpeech2, Speech recognition
DCGAN, Generative adversarial network
A3C, Reinforcement learning
.. _onnx_valid:
.. _cpu_onnx:
ONNX
====
CPU ONNX
========
Additionally, we validated that the following workloads are functional through
the `nGraph ONNX importer`_. ONNX models can be downloaded from the `ONNX Model Zoo`_.
......@@ -87,15 +55,14 @@ Additionally, we validated the following workloads are functional through
:widths: 27, 53
:escape: ~
ResNet-50, Image recognition
ResNet-50-v2, Image recognition
DenseNet-121, Image recognition
Inception-v1, Image recognition
Inception-v2, Image recognition
ResNet-50, Image recognition
Mobilenet, Image recognition
Shufflenet, Image recognition
SqueezeNet, Image recognition
VGG-19, Image recognition
VGG-16, Image recognition
ZFNet-512, Image recognition
MNIST, Image recognition
Emotion-FERPlus, Image recognition
......@@ -106,6 +73,40 @@ Additionally, we validated the following workloads are functional through
ArcFace, Face Detection and Recognition
.. _gpu_tensorflow:
GPU TensorFlow
==============
.. csv-table::
:header: "TensorFlow Workload", "Genre of Deep Learning"
:escape: ~
Resnet50 v2, Image recognition
Inception V3, Image recognition
Inception V4, Image recognition
Inception-ResNetv2, Image recognition
VGG-16, Image recognition
.. _gpu_onnx:
GPU ONNX
========
.. csv-table::
:header: "ONNX Workload", "Genre of Deep Learning"
:escape: ~
Inception V1, Image recognition
Inception V2, Image recognition
ResNet-50, Image recognition
SqueezeNet, Image recognition
.. important:: Please see Intel's `Optimization Notice`_ for details on disclaimers.
.. rubric:: Footnotes
......@@ -133,4 +134,3 @@ Additionally, we validated the following workloads are functional through
to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding the specific
instruction sets covered by this notice.
......@@ -21,7 +21,7 @@ nGraph Compiler stack
.. only:: release
nGraph Compiler stack documentation for version |version|.
nGraph Compiler stack documentation for version |version|.
Documentation for the latest (master) development branch can be found
at https://ngraph.nervanasys.com/docs/latest
......@@ -47,11 +47,16 @@ packages, scripts, and other files that use licensing.
.. toctree::
:maxdepth: 1
:caption: Getting Started
:titlesonly:
introduction.rst
features.rst
.. toctree::
:maxdepth: 1
:caption: Framework Support
frameworks/index.rst
frameworks/validated/list.rst
frameworks/generic-configs.rst
.. toctree::
......@@ -68,15 +73,9 @@ packages, scripts, and other files that use licensing.
.. toctree::
:maxdepth: 1
:caption: nGraph Python API
:caption: APIs
python_api/index.rst
.. toctree::
:maxdepth: 1
:caption: Backend Developers
backends/index.rst
backends/cpp-api.rst
......@@ -93,9 +92,7 @@ packages, scripts, and other files that use licensing.
:caption: Project Metadata
project/release-notes.rst
project/introduction.rst
project/contribution-guide.rst
project/doc-contributor-README.rst
project/index.rst
project/extras/index.rst
glossary.rst
......
.. project/introduction.rst:
.. introduction.rst:
.. _introduction:
Introduction
############
......@@ -47,7 +48,7 @@ However, kernel libraries have three main problems:
nGraph Compiler addresses the first two problems, and nGraph Compiler combined
with PlaidML addresses the third problem. nGraph applies graph-level
optimizations by taking the computational graph from a deep learning framework
such as TensorFlow\* and reconstructing it with nGraph's
such as TensorFlow and reconstructing it with nGraph's
:abbr:`IR (Intermediate Representation)`. nGraph IR centralizes computational
graphs from various frameworks and provides a unified way to connect backends
for targeted hardware. To address the third problem, nGraph is integrated with
......@@ -68,10 +69,13 @@ graph, but the choice of operations in the graph may not be optimal.
.. _figure-A:
.. figure:: ../graphics/kernel-problem-1.png
.. figure:: graphics/kernel-problem-1.png
:width: 100%
:alt:
**Figure A**: The mathematical operations in a Deep Learning stack can be
simplified significantly with a graph compiler
The computation is constructed to execute ``(A+B)*C``. With nGraph, we can
further optimize the graph to be represented as ``A*C``. From the first graph
......@@ -94,10 +98,12 @@ diagram.
.. _figure-B:
.. figure:: ../graphics/kernel-problem-2.png
.. figure:: graphics/kernel-problem-2.png
:width: 100%
:alt:
**Figure B**: A many-to-many problem
Each framework must be manually integrated with each hardware-specific kernel
library. Additionally, each integration is unique to the framework and its set
of deep learning operators, view on memory layout, feature set, etc. Each
......@@ -129,10 +135,12 @@ work for what will ultimately be a fragile setup that is costly to maintain.
.. _figure-C:
.. figure:: ../graphics/kernel-problem-3.png
.. figure:: graphics/kernel-problem-3.png
:width: 100%
:alt:
**Figure C**: Inevitable scaling problem
Integrating PlaidML with nGraph provides flexibility to support the latest deep
learning models in the absence of hand-optimized kernels for new operations.
......
......@@ -195,7 +195,7 @@ Backprop
C++ Interface
=============
.. doxygenclass:: ngraph::op::AvgPool
.. doxygenclass:: ngraph::op::v0::AvgPool
:project: ngraph
:members:
......@@ -17,7 +17,7 @@ Description
C++ Interface
=============
.. doxygenclass:: ngraph::op::AvgPoolBackprop
.. doxygenclass:: ngraph::op::v0::AvgPoolBackprop
:project: ngraph
:members:
......
......@@ -138,7 +138,7 @@ Batched, Padded, Dilated, Strided Convolution
C++ Interface
=============
.. doxygenclass:: ngraph::op::Convolution
.. doxygenclass:: ngraph::op::v0::Convolution
:project: ngraph
:members:
......@@ -78,6 +78,6 @@ Given an input data batch tensor :math:`T_{in}`, the output tensor is defined by
C++ Interface
=============
.. doxygenclass:: ngraph::op::MaxPool
.. doxygenclass:: ngraph::op::v0::MaxPool
:project: ngraph
:members:
.. about:
Architecture, Features, FAQs
############################
* :ref:`architecture`
* :ref:`features`
* :ref:`faq`
* :ref:`whats_next`
.. _architecture:
nGraph Compiler stack architecture
==================================
The diagram below represents our current Beta release stack. In the
diagram, nGraph components are colored in gray. Please note that the
stack diagram is simplified to show how nGraph executes deep learning
workloads with two hardware backends; however, many other deep learning
frameworks and backends are currently supported.
.. figure:: ../graphics/stackngrknl-notice.png
:alt:
Bridge
------
Starting from the top of the stack, nGraph receives a computational
graph from a deep learning framework such as TensorFlow\* or MXNet\*.
The computational graph is converted to an nGraph internal
representation by a bridge created for the corresponding framework.
An nGraph bridge examines the whole graph to pattern match subgraphs
which nGraph knows how to execute, and these subgraphs are encapsulated.
Parts of the graph that are not encapsulated will default to the framework's
implementation when executed.
nGraph Core
-----------
nGraph uses a strongly-typed and platform-neutral
``Intermediate Representation (IR)`` to construct a "stateless"
computational graph. Each node, or op, in the graph corresponds to one
``step`` in a computation, where each step produces zero or more tensor
outputs from zero or more tensor inputs.
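As an illustration, a hedged sketch with the nGraph Python API of the
``(A+B)*C`` computation shown in the introduction; each op is a node producing
tensor outputs from tensor inputs, and the graph itself holds no state:

.. code-block:: python

   import numpy as np
   import ngraph as ng

   # Three parameter nodes and two ops form a small stateless graph
   A = ng.parameter(shape=[2, 2], name='A', dtype=np.float32)
   B = ng.parameter(shape=[2, 2], name='B', dtype=np.float32)
   C = ng.parameter(shape=[2, 2], name='C', dtype=np.float32)
   model = (A + B) * C

   # Compile for a backend, then execute with concrete tensors
   runtime = ng.runtime(backend_name='CPU')
   computation = runtime.computation(model, A, B, C)
   result = computation(np.full((2, 2), 1, np.float32),
                        np.full((2, 2), 2, np.float32),
                        np.full((2, 2), 3, np.float32))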
This allows nGraph to apply its state-of-the-art optimizations instead
of having to follow how a particular framework implements op execution,
memory management, data layouts, etc.
In addition, using nGraph IR allows faster optimization delivery for
many of the supported frameworks. For example, if nGraph optimizes
ResNet\* for TensorFlow\*, the same optimization can be readily applied
to MXNet\* or ONNX\* implementations of ResNet\*.
Hybrid Transformer
------------------
The Hybrid transformer takes the nGraph IR and partitions it into
subgraphs, which can then be assigned to the best-performing backend.
There are two hardware backends shown in the stack diagram to
demonstrate this graph partitioning. The Hybrid transformer assigns
complex operations (subgraphs) to Intel® Nervana™ Neural Network
Processor (NNP) to expedite the computation, and the remaining
operations default to CPU. In the future, we will further expand the
capabilities of Hybrid transformer by enabling more features, such as
localized cost modeling and memory sharing.
Once the subgraphs are assigned, the corresponding backend will execute
the IR.
Backends
--------
Focusing our attention on the CPU backend, when the IR is passed to the
Intel® Architecture (IA) transformer, it can be executed in two modes:
Direct EXecution (DEX) and code generation (``codegen``).
In ``codegen`` mode, nGraph generates and compiles code which can either
call into highly optimized kernels like MKL-DNN or JITers like Halide.
Although our team wrote kernels for some operations, nGraph also
leverages existing kernel libraries such as MKL-DNN, Eigen, and MLSL.
The MLSL library is called when nGraph executes distributed training. At the
time of the nGraph Beta release, nGraph achieved state-of-the-art
results for ResNet50 with 16 nodes and 32 nodes for TensorFlow\* and
MXNet\*. We are excited to continue our work in enabling distributed
training, and we plan to expand to 256 nodes in Q4 ’18. Additionally, we
are testing model parallelism in addition to data parallelism.
The other mode of execution is Direct EXecution (DEX). In DEX mode,
nGraph can execute the operations by directly calling associated kernels
as it walks through the IR instead of compiling via ``codegen``. This
mode reduces the compilation time, and it will be useful for training,
deploying, and retraining a deep learning workload in production. In our
tests, DEX mode reduced ResNet50 compilation time by 30X.
nGraph further tries to speed up the computation by leveraging
multi-threading and graph scheduling libraries such as OpenMP and TBB
Flow Graph.
.. _features:
Features
########
nGraph performs a combination of device-specific and non-device-specific
optimizations:
- **Fusion** -- Fuse multiple ops to decrease memory usage.
- **Data layout abstraction** -- Make abstraction easier and faster
with nGraph translating element order to work best for a given or
available device.
- **Data reuse** -- Save results and reuse for subgraphs with the same
input.
- **Graph scheduling** -- Run similar subgraphs in parallel via
multi-threading.
- **Graph partitioning** -- Partition subgraphs to run on different
devices to speed up computation; make better use of spare CPU cycles
with nGraph.
- **Memory management** -- Prevent peak memory usage by intercepting a
  graph with a "saved checkpoint," and enable data auditing.
.. _faq:
FAQs
####
Why nGraph?
===========
The value we're offering to the developer community is empowerment: we are
confident that Intel® Architecture already provides the best computational
resources available for the breadth of ML/DL tasks.
How does it work?
=================
The :doc:`nGraph Core <../ops/index>` uses a **strongly-typed** and
**platform-neutral** :abbr:`Intermediate Representation (IR)` to construct a
"stateless" graph. Each node, or *op*, in the graph corresponds to one
:term:`step` in a computation, where each step produces zero or more tensor
outputs from zero or more tensor inputs.
How do I connect a framework?
=============================
The nGraph Library manages framework bridges for some of the more widely-known
frameworks. A bridge acts as an intermediary between the nGraph core and the
framework, and the result is a function that can be compiled from a framework.
A fully-compiled function that makes use of bridge code thus becomes a "function
graph", or what we sometimes call an **nGraph graph**.
.. important:: See :doc:`../ops/index` to learn about Core Ops.
Our design philosophy is that the graph is not a script for running kernels;
rather, our compilation matches ``ops`` to the appropriate available kernels
(or to spare CPU cycles when no kernel is available). Thus, we expect
additions of new Core ops to be infrequent; most functionality should instead
be added with new functions that build sub-graphs from existing core ops.
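For instance, a sigmoid can be composed from existing core ops rather than
added as a new Core op; a hedged Python sketch (the specific op helpers such
as ``ng.exp`` and ``ng.negative`` are assumptions against the Python API):

.. code-block:: python

   import numpy as np
   import ngraph as ng

   def sigmoid_subgraph(x):
       # sigmoid(x) = 1 / (1 + exp(-x)), built purely from core ops,
       # so no new Core op needs to be defined
       one = ng.constant(1.0, dtype=np.float32)
       return one / (one + ng.exp(ng.negative(x)))

   x = ng.parameter(shape=[4], name='x', dtype=np.float32)
   model = sigmoid_subgraph(x)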
For a more detailed dive into how custom bridge code can be implemented, see our
documentation on how to :doc:`../core/constructing-graphs/execute`. To learn how TensorFlow and
MXNet currently make use of custom bridge code, see the section on
:doc:`../frameworks/index`.
.. figure:: ../graphics/bridge-to-graph-compiler.png
:width: 733px
:alt: Compiling a computation
JiT Compiling for computation
How do I run an inference model?
================================
Framework bridge code is *not* the only way to connect a model (function graph)
to nGraph's :doc:`../ops/index`. We've also built an importer for models that
have been exported from a framework and saved as a serialized file, such as ONNX.
To learn how to convert such serialized files to an nGraph model, please see
the :doc:`../core/constructing-graphs/import` documentation.
.. _whats_next:
What's next?
############
We developed nGraph to simplify the realization of optimized deep learning
performance across frameworks and hardware platforms. You can read more about
design decisions and what is tentatively in the pipeline for development in
our `arXiv paper`_ from the 2018 SysML conference.
.. _arXiv paper: https://arxiv.org/pdf/1801.08058.pdf
.. _ONNX: http://onnx.ai
.. _NNVM: https://github.com/dmlc/nnvm
.. _nGraph ONNX companion tool: https://github.com/NervanaSystems/ngraph-onnx
.. _Intel® MKL-DNN: https://github.com/intel/mkl-dnn
.. _Movidius: https://developer.movidius.com/
.. project/contribution-guide.rst:
.. contribution_guide:
.. _contribution_guide:
##################
Contribution guide
##################
.. contents::
License
=======
......
......@@ -15,6 +15,8 @@
.. limitations under the License.
.. ---------------------------------------------------------------------------
:orphan:
Contributing to documentation
=============================
......
......@@ -10,7 +10,5 @@ This section contains documentation about the project and how to contribute.
.. toctree::
:maxdepth: 2
about.rst
contribution-guide.rst
governance.rst
doc-contributor-README.rst