.. about: 

Architecture, Features, FAQs
############################

* :ref:`architecture`
* :ref:`features`
* :ref:`faq`
* :ref:`whats_next`


.. _architecture:

nGraph Compiler stack architecture
==================================

The diagram below represents our current Beta release stack. In the
diagram, nGraph components are colored in gray. Note that the stack
diagram is simplified to show how nGraph executes deep learning
workloads with two hardware backends; in practice, many other deep
learning frameworks and backends are supported.

.. figure:: ../graphics/stackngrknl-notice.png
   :alt: The nGraph Compiler stack with two hardware backends

Bridge
------

Starting from the top of the stack, nGraph receives a computational
graph from a deep learning framework such as TensorFlow\* or MXNet\*.
The computational graph is converted to nGraph's internal
representation by a bridge created for the corresponding framework.

An nGraph bridge examines the whole graph and pattern-matches the
subgraphs that nGraph knows how to execute; these subgraphs are then
encapsulated. Parts of the graph that are not encapsulated fall back to
the framework's native implementation when executed.
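
For instance, with the TensorFlow bridge, importing the ``ngraph_bridge``
module is enough to register nGraph with TensorFlow; supported subgraphs
are then rerouted to nGraph transparently. A minimal sketch, assuming the
``ngraph-tf`` bridge is installed and a TensorFlow 1.x-style API:

.. code-block:: python

   import tensorflow as tf
   import ngraph_bridge  # importing the bridge registers nGraph with TensorFlow

   # Build a TensorFlow graph as usual; subgraphs nGraph can execute are
   # encapsulated and handed to nGraph, the rest runs in native TensorFlow.
   x = tf.placeholder(tf.float32, shape=(2, 3))
   y = x * 2.0 + 1.0

   with tf.Session() as sess:
       print(sess.run(y, feed_dict={x: [[1., 2., 3.], [4., 5., 6.]]}))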

nGraph Core
-----------

nGraph uses a strongly-typed and platform-neutral
:abbr:`Intermediate Representation (IR)` to construct a "stateless"
computational graph. Each node, or *op*, in the graph corresponds to one
:term:`step` in a computation, where each step produces zero or more tensor
outputs from zero or more tensor inputs.

This allows nGraph to apply its state-of-the-art optimizations instead
of having to follow how a particular framework implements op execution,
memory management, data layouts, and so on.

In addition, using nGraph IR allows faster optimization delivery for
many of the supported frameworks. For example, if nGraph optimizes
ResNet\* for TensorFlow\*, the same optimization can be readily applied
to MXNet\* or ONNX\* implementations of ResNet\*.
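
As an illustration, here is a minimal sketch of building and executing such
a graph with the nGraph Python API (assuming the ``ngraph`` Python wheel is
installed; operator overloads and exact signatures have varied between
releases):

.. code-block:: python

   import numpy as np
   import ngraph as ng

   # Two Parameter ops: steps with zero tensor inputs and one tensor output.
   A = ng.parameter(shape=[2, 2], name='A', dtype=np.float32)
   B = ng.parameter(shape=[2, 2], name='B', dtype=np.float32)

   # An Add op: one step with two tensor inputs and one tensor output.
   model = A + B

   # Compile the stateless graph for a backend and execute it.
   runtime = ng.runtime(backend_name='CPU')
   computation = runtime.computation(model, A, B)
   result = computation(np.ones((2, 2), dtype=np.float32),
                        np.full((2, 2), 2.0, dtype=np.float32))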

Hybrid Transformer
------------------

Hybrid transformer takes the nGraph IR and partitions it into
subgraphs, which can then be assigned to the best-performing backend.
Two hardware backends are shown in the stack diagram to demonstrate
this graph partitioning. The Hybrid transformer assigns complex
operations (subgraphs) to the Intel® Nervana™ Neural Network
Processor (NNP) to expedite the computation, and the remaining
operations default to the CPU. In the future, we will further expand the
capabilities of Hybrid transformer by enabling more features, such as
localized cost modeling and memory sharing.

Once the subgraphs are assigned, the corresponding backend will execute
the IR.
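
The partitioning logic itself lives inside nGraph, but the idea can be
sketched in a few lines. The backend names and the ``supported_by_nnp``
set below are purely illustrative, not nGraph API:

.. code-block:: python

   # Hypothetical sketch of greedy subgraph assignment (not nGraph API).
   def partition(ops, supported_by_nnp):
       """Assign each op to 'NNP' when supported, else 'CPU', grouping
       consecutive same-backend ops into a single subgraph."""
       subgraphs = []
       for op in ops:  # ops arrive in topological order
           backend = 'NNP' if op in supported_by_nnp else 'CPU'
           if subgraphs and subgraphs[-1][0] == backend:
               subgraphs[-1][1].append(op)
           else:
               subgraphs.append((backend, [op]))
       return subgraphs

   print(partition(['Conv', 'Relu', 'MatMul', 'Softmax'], {'Conv', 'MatMul'}))
   # [('NNP', ['Conv']), ('CPU', ['Relu']), ('NNP', ['MatMul']), ('CPU', ['Softmax'])]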

Backends
--------

Focusing on the CPU backend: when the IR is passed to the Intel®
Architecture (IA) transformer, it can be executed in two modes:
Direct EXecution (DEX) and code generation (``codegen``).

In ``codegen`` mode, nGraph generates and compiles code that can either
call into highly optimized kernels like MKL-DNN or JITers like Halide.
Although our team has written kernels for some operations, nGraph mostly
leverages existing kernel libraries such as MKL-DNN, Eigen, and MLSL.

The MLSL library is called when nGraph executes distributed training. At
the time of the nGraph Beta release, nGraph achieved state-of-the-art
results for ResNet50 at 16 and 32 nodes on TensorFlow\* and MXNet\*. We
are excited to continue our work in enabling distributed training, and
we plan to expand to 256 nodes in Q4 '18. We are also testing model
parallelism in addition to data parallelism.

The other mode of execution is Direct EXecution (DEX). In DEX mode,
nGraph can execute the operations by directly calling associated kernels
as it walks through the IR instead of compiling via ``codegen``. This
mode reduces the compilation time, and it will be useful for training,
deploying, and retraining a deep learning workload in production. In our
tests, DEX mode reduced ResNet50 compilation time by 30X.

nGraph further tries to speed up the computation by leveraging
multi-threading and graph scheduling libraries such as OpenMP and TBB
Flow Graph.
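
The choice of threading runtime is made per backend at build time, but the
OpenMP-based kernels honor the standard OpenMP environment variables, which
must be set before the backend is initialized. The values below are
examples, not recommendations:

.. code-block:: python

   import os

   # Read by the OpenMP runtime (and thus by MKL-DNN kernels) at load time,
   # so set these before the nGraph backend is initialized.
   os.environ.setdefault('OMP_NUM_THREADS', '16')
   os.environ.setdefault('KMP_AFFINITY', 'granularity=fine,compact,1,0')

   import ngraph as ng  # the backend now picks up the threading settings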

.. _features:

Features
########

nGraph performs a combination of device-specific and non-device-specific
optimizations:

-  **Fusion** -- Fuse multiple ops to decrease memory usage.
-  **Data layout abstraction** -- Make abstraction easier and faster
   with nGraph translating element order to work best for a given or
   available device.
-  **Data reuse** -- Save results and reuse for subgraphs with the same
   input.
-  **Graph scheduling** -- Run similar subgraphs in parallel via
   multi-threading.
-  **Graph partitioning** -- Partition subgraphs to run on different
   devices to speed up computation; make better use of spare CPU cycles
   with nGraph.
-  **Memory management** -- Prevent peak memory usage by intercepting a
   graph with a "saved checkpoint," and enable data auditing.

.. _faq:

FAQs
####

Why nGraph? 
===========

The value we're offering to the developer community is empowerment: we are 
confident that Intel® Architecture already provides the best computational 
resources available for the breadth of ML/DL tasks. 

How does it work?
=================

The :doc:`nGraph Core <../ops/index>` uses a **strongly-typed** and 
**platform-neutral** :abbr:`Intermediate Representation (IR)` to construct a 
"stateless" graph. Each node, or *op*, in the graph corresponds to one 
:term:`step` in a computation, where each step produces zero or more tensor 
outputs from zero or more tensor inputs. 


How do I connect a framework? 
=============================

The nGraph Library manages framework bridges for some of the more widely 
known frameworks. A bridge acts as an intermediary between the nGraph core 
and the framework, and the result is a function that nGraph can compile 
from the framework. A fully-compiled function that makes use of bridge code 
thus becomes a "function graph", or what we sometimes call an **nGraph graph**.

.. important:: See :doc:`../ops/index` to learn about Core Ops.

Our design philosophy is that the graph is not a script for running kernels; 
rather, our compilation matches ``ops`` to the appropriate available kernels 
(or generates code when no kernel is available). Thus, we expect that adding 
new Core ops should be infrequent, and that most functionality will instead 
be added with new functions that build sub-graphs from existing core ops.
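
For example, a "new" operation can often be expressed as a small sub-graph
of existing core ops, with no changes to the core itself. A sketch using
the nGraph Python API (operator overloads assumed to be available):

.. code-block:: python

   import numpy as np
   import ngraph as ng

   def squared_difference(a, b):
       """A composite op built purely from core ops (Subtract, Multiply)."""
       diff = a - b
       return diff * diff

   A = ng.parameter(shape=[2, 2], name='A', dtype=np.float32)
   B = ng.parameter(shape=[2, 2], name='B', dtype=np.float32)
   model = squared_difference(A, B)  # a sub-graph, not a new core op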

For a more detailed dive into how custom bridge code can be implemented, see our 
documentation on how to :doc:`../core/constructing-graphs/execute`. To learn how TensorFlow and 
MXNet currently make use of custom bridge code, see the section on 
:doc:`../frameworks/index`.

.. figure:: ../graphics/bridge-to-graph-compiler.png
    :width: 733px
    :alt: Compiling a computation

    JIT compiling for computation


How do I run an inference model?
================================

Framework bridge code is *not* the only way to connect a model (function graph) 
to nGraph's :doc:`../ops/index`. We've also built an importer for models that 
have been exported from a framework and saved as a serialized file, such as 
ONNX. To learn how to convert such serialized files to an nGraph model, please 
see the :doc:`../core/constructing-graphs/import` documentation.
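
As a sketch, importing and running an ONNX model with the
`nGraph ONNX companion tool`_ looks roughly like this (the exact import
path and return type have varied between releases):

.. code-block:: python

   import onnx
   import ngraph as ng
   from ngraph_onnx.onnx_importer.importer import import_onnx_model

   # Load a serialized ONNX model and convert it to an nGraph model.
   model = import_onnx_model(onnx.load('resnet50.onnx'))  # hypothetical file

   # Compile for the CPU backend; call the computation with input tensors.
   runtime = ng.runtime(backend_name='CPU')
   infer = runtime.computation(model)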


.. _whats_next:

What's next?
############
  
We developed nGraph to simplify the realization of optimized deep learning 
performance across frameworks and hardware platforms. You can read more about 
design decisions and what is tentatively in the pipeline for development in 
our `arXiv paper`_ from the 2018 SysML conference.


.. _arXiv paper: https://arxiv.org/pdf/1801.08058.pdf
.. _ONNX: http://onnx.ai
.. _NNVM: https://github.com/dmlc/nnvm
.. _nGraph ONNX companion tool: https://github.com/NervanaSystems/ngraph-onnx
.. _Intel® MKL-DNN: https://github.com/intel/mkl-dnn
.. _Movidius: https://developer.movidius.com/