ngraph / Commits / e5a69122

Commit e5a69122 (unverified), authored Nov 25, 2018 by Michał Karzyński; committed by GitHub, Nov 25, 2018.

Update ABOUT.md (#2107)

parent 79878d71

Showing 3 changed files with 98 additions and 79 deletions:
- ABOUT.md (+98, -79)
- doc/sphinx/source/graphics/full-ngstck.png (+0, -0)
- doc/sphinx/source/graphics/stackngrknl.png (+0, -0)

ABOUT.md
nGraph Compiler stack architecture
----------------------------------

The diagram below represents our current Beta release stack.
In the diagram, nGraph components are colored in gray. Please note
that the stack diagram is simplified to show how nGraph executes deep
learning workloads with two hardware backends; however, many other
deep learning frameworks and backends are currently functional.

![](doc/sphinx/source/graphics/stackngrknl.png)
#### Bridge

Starting from the top of the stack, nGraph receives a computational
graph from a deep learning framework such as TensorFlow* or MXNet*.
The graph is converted to an nGraph internal representation by a
bridge created for the corresponding framework.

An nGraph bridge examines the whole graph to pattern match subgraphs
which nGraph knows how to execute; these subgraphs are then
encapsulated. Parts of the graph that are not encapsulated default to
the framework's own implementation when executed.
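The encapsulation step can be pictured with a small sketch (all names here are hypothetical, not the actual bridge API): walk a framework graph in order, collect maximal runs of ops nGraph supports, and wrap each run in a single encapsulated node, leaving everything else to the framework.

```python
# Hypothetical sketch of bridge-style encapsulation: contiguous runs of
# supported ops become one encapsulated nGraph subgraph; the rest stay
# with the framework. Real bridges pattern match on the graph structure,
# not a flat op list.
SUPPORTED = {"MatMul", "Add", "Relu"}  # illustrative set of nGraph ops

def encapsulate(ops):
    """Group a topologically ordered op list into framework ops and
    encapsulated nGraph subgraphs."""
    result, run = [], []
    for op in ops:
        if op in SUPPORTED:
            run.append(op)
        else:
            if run:
                result.append(("ngraph", tuple(run)))
                run = []
            result.append(("framework", op))
    if run:
        result.append(("ngraph", tuple(run)))
    return result

graph = ["MatMul", "Add", "Relu", "CustomOp", "MatMul", "Add"]
print(encapsulate(graph))
# [('ngraph', ('MatMul', 'Add', 'Relu')), ('framework', 'CustomOp'),
#  ('ngraph', ('MatMul', 'Add'))]
```

The unsupported `CustomOp` stays with the framework, while the two supported runs around it each become a single encapsulated unit.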
#### nGraph Core

nGraph uses a strongly-typed and platform-neutral Intermediate
Representation (IR) to construct a "stateless" computational graph.
Each node, or `op`, in the graph corresponds to one step in a
computation, where each step produces zero or more tensor outputs
from zero or more tensor inputs.

This allows nGraph to apply its state-of-the-art optimizations instead
of having to follow how a particular framework implements op
execution, memory management, data layouts, and so on.

In addition, using nGraph IR allows faster optimization delivery for
many of the supported frameworks. For example, if nGraph optimizes
ResNet* for TensorFlow*, the same optimization can be readily applied
to MXNet* or ONNX* implementations of ResNet*.
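A toy model of such an IR (illustrative only; nGraph's real IR is a C++ class hierarchy) makes the two properties above concrete: every op carries a statically-checked output type and shape derived from its inputs, and the graph itself holds no execution state.

```python
from dataclasses import dataclass

# Toy strongly-typed, stateless IR in the spirit described above.
# Each node produces one tensor output from zero or more tensor inputs.
@dataclass(frozen=True)
class Op:
    kind: str              # e.g. "Parameter", "Add", "MatMul"
    inputs: tuple = ()     # zero or more tensor-producing ops
    shape: tuple = ()      # static output tensor shape
    dtype: str = "f32"

def add(a: Op, b: Op) -> Op:
    # Element-wise ops require matching types and shapes.
    assert a.shape == b.shape and a.dtype == b.dtype
    return Op("Add", (a, b), a.shape, a.dtype)

def matmul(a: Op, b: Op) -> Op:
    assert a.shape[1] == b.shape[0] and a.dtype == b.dtype
    return Op("MatMul", (a, b), (a.shape[0], b.shape[1]), a.dtype)

x = Op("Parameter", shape=(2, 3))
w = Op("Parameter", shape=(3, 4))
b = Op("Parameter", shape=(2, 4))
y = add(matmul(x, w), b)   # each step: tensors in, tensors out
print(y.kind, y.shape)     # Add (2, 4)
```

Because nodes are immutable and shape/type errors are caught at construction time, a graph like this can be optimized and re-targeted without replaying any framework-specific execution state.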
#### Hybrid Transformer

The Hybrid transformer takes the nGraph IR and partitions it into
subgraphs, which can then be assigned to the best-performing backend.
Two hardware backends are shown in the stack diagram to demonstrate
this graph partitioning: the Hybrid transformer assigns complex
operations (subgraphs) to the Intel® Nervana™ Neural Network Processor
(NNP) to expedite the computation, and the remaining operations
default to CPU. In the future, we will further expand the capabilities
of the Hybrid transformer by enabling more features, such as localized
cost modeling and memory sharing.

Once the subgraphs are assigned, the corresponding backend executes
the IR.
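The assignment step might look like this sketch (the op set and rule are hypothetical; the real Hybrid transformer operates on nGraph IR subgraphs): subgraphs consisting entirely of complex ops the accelerator supports go to NNP, and everything else defaults to CPU.

```python
# Hypothetical placement rule in the spirit of the Hybrid transformer:
# subgraphs made only of "complex" ops the accelerator supports are
# assigned to NNP; the remaining subgraphs default to CPU.
NNP_OPS = {"Convolution", "MatMul"}   # illustrative accelerator op set

def assign_backend(subgraph):
    """subgraph: list of op names -> backend name."""
    if all(op in NNP_OPS for op in subgraph):
        return "NNP"
    return "CPU"

subgraphs = [["Convolution", "MatMul"], ["Reshape", "Concat"], ["MatMul"]]
for sg in subgraphs:
    print(assign_backend(sg), sg)
# NNP ['Convolution', 'MatMul']
# CPU ['Reshape', 'Concat']
# NNP ['MatMul']
```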
#### Backends

Focusing on the CPU backend: when the IR is passed to the Intel®
Architecture (IA) transformer, it can be executed in two modes, Direct
EXecution (DEX) and code generation (`codegen`).

In `codegen` mode, nGraph generates and compiles code which can call
into highly optimized kernels such as MKL-DNN, or into JITers such as
Halide. Although our team wrote kernels for some operations, nGraph
leverages existing kernel libraries such as MKL-DNN, Eigen, and MLSL.

The MLSL library is called when nGraph executes distributed training.
At the time of the Beta release, nGraph achieved state-of-the-art
results for ResNet50 on 16 and 32 nodes for TensorFlow* and MXNet*.
We are excited to continue our work in enabling distributed training,
and we plan to expand to 256 nodes in Q4 '18. Additionally, we are
testing model parallelism in addition to data parallelism.

The other mode of execution is DEX. In DEX mode, nGraph executes
operations by directly calling the associated kernels as it walks
through the IR, instead of compiling via `codegen`. This mode reduces
compilation time and is useful for training, deploying, and retraining
a deep learning workload in production. In our tests, DEX mode reduced
ResNet50 compilation time by 30X.

nGraph further speeds up computation by leveraging multi-threading and
graph scheduling libraries such as OpenMP and TBB Flow Graph.
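DEX-style execution can be sketched as a walk over the IR in topological order, dispatching each op to a prebuilt kernel with no compile step (the kernel table and IR encoding here are illustrative, not nGraph's internals):

```python
import operator

# Sketch of direct execution: walk an IR node list in topological order
# and call a prebuilt kernel per op; nothing is generated or compiled.
KERNELS = {
    "Add": operator.add,
    "Mul": operator.mul,
    "Neg": lambda a: -a,
}

def dex_execute(nodes, feeds):
    """nodes: list of (name, op, input_names); feeds: name -> value."""
    values = dict(feeds)
    for name, op, ins in nodes:          # topological order assumed
        values[name] = KERNELS[op](*(values[i] for i in ins))
    return values

ir = [
    ("t0", "Mul", ("x", "y")),
    ("t1", "Neg", ("t0",)),
    ("out", "Add", ("t1", "x")),
]
result = dex_execute(ir, {"x": 3, "y": 4})
print(result["out"])   # -(3*4) + 3 = -9
```

The trade-off matches the text: there is no per-model compilation cost, but each op pays a dispatch indirection instead of running generated, specialized code.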
Features
--------

nGraph performs a combination of device-specific and
non-device-specific optimizations:

- **Fusion** -- Fuse multiple ops to decrease memory usage.
- **Data layout abstraction** -- Make abstraction easier and faster,
  with nGraph translating element order to work best for a given or
  available device.
- **Data reuse** -- Save results and reuse them for subgraphs with the
  same input.
- **Graph scheduling** -- Run similar subgraphs in parallel via
  multi-threading.
- **Graph partitioning** -- Partition subgraphs to run on different
  devices to speed up computation and make better use of spare CPU
  cycles.
- **Memory management** -- Prevent peak memory usage by intercepting
  a graph with or by a "saved checkpoint," and enable data auditing.
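Fusion, for instance, can be pictured as a peephole pass that collapses adjacent ops into one kernel so the intermediate tensor is never materialized (a toy sketch with a made-up op encoding, not nGraph's actual fusion machinery):

```python
# Toy peephole fusion pass: a Mul immediately followed by an Add that
# consumes its result is replaced by one fused multiply-add, so the
# intermediate Mul output never needs memory. Illustrative only.
# Each op is encoded as (kind, output_name, *input_names).
def fuse_mul_add(ops):
    fused, i = [], 0
    while i < len(ops):
        if (i + 1 < len(ops)
                and ops[i][0] == "Mul"
                and ops[i + 1][0] == "Add"
                and ops[i][1] in ops[i + 1][2:]):  # Add consumes Mul output
            mul, add_ = ops[i], ops[i + 1]
            other = [a for a in add_[2:] if a != mul[1]]
            fused.append(("FusedMulAdd", add_[1], *mul[2:], *other))
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

program = [("Mul", "t0", "a", "b"), ("Add", "t1", "t0", "c")]
print(fuse_mul_add(program))   # [('FusedMulAdd', 't1', 'a', 'b', 'c')]
```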
Beta Limitations
----------------
In this Beta release, nGraph only supports Just In Time compilation,
but we plan to add support for Ahead of Time compilation in the official
release of nGraph. nGraph currently has limited support for dynamic graphs.
Current nGraph Compiler full stack
----------------------------------
In addition to IA and NNP transformers, nGraph Compiler stack has transformers
for multiple GPU types and an upcoming Intel deep learning accelerator. To
support the growing number of transformers, we plan to expand the capabilities
of the hybrid transformer with a cost model and memory sharing. With these new
features, even if nGraph has multiple backends targeting the same hardware, it
will partition the graph into multiple subgraphs and determine the best way to
execute each subgraph.
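With a cost model in place, choosing among several backends for each subgraph reduces to comparing estimated costs. The backend names and numbers below are made up for illustration:

```python
# Hypothetical per-subgraph cost model: with several backends available
# (possibly targeting the same hardware), pick the cheapest estimate
# for each subgraph. All costs and names here are illustrative.
COST = {
    "cpu-dex":     {"conv_block": 9.0, "rnn_block": 4.0},
    "cpu-codegen": {"conv_block": 6.0, "rnn_block": 5.5},
    "gpu":         {"conv_block": 2.0, "rnn_block": 7.0},
}

def place(subgraphs):
    """Map each subgraph to the backend with the lowest estimated cost."""
    return {sg: min(COST, key=lambda b: COST[b][sg]) for sg in subgraphs}

print(place(["conv_block", "rnn_block"]))
# {'conv_block': 'gpu', 'rnn_block': 'cpu-dex'}
```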
doc/sphinx/source/graphics/full-ngstck.png: image replaced (193 KB -> 333 KB)

doc/sphinx/source/graphics/stackngrknl.png: image replaced (205 KB -> 268 KB)