ngraph > Commits > e10fa7d2

Commit e10fa7d2 authored Nov 28, 2018 by Adam Procter
Merge remote-tracking branch 'origin/master' into r0.10

Parents: f8a0f784, ca4437bb
Showing 5 changed files with 165 additions and 184 deletions (+165 / -184):

  cmake/external_mkldnn.cmake             +2    -0
  doc/sphinx/source/branding-notice.rst   +36   -15
  doc/sphinx/source/conf.py               +1    -1
  doc/sphinx/source/index.rst             +8    -6
  doc/sphinx/source/project/about.rst     +118  -162
cmake/external_mkldnn.cmake
@@ -114,6 +114,7 @@ if(${CMAKE_VERSION} VERSION_LESS 3.2)
         -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
         -DCMAKE_INSTALL_PREFIX=${EXTERNAL_PROJECTS_ROOT}/mkldnn
         -DMKLROOT=${MKL_ROOT}
+        "-DARCH_OPT_FLAGS=-march=${NGRAPH_TARGET_ARCH} -mtune=${NGRAPH_TARGET_ARCH}"
         TMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/tmp"
         STAMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/stamp"
         DOWNLOAD_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/download"
@@ -145,6 +146,7 @@ else()
         -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
         -DCMAKE_INSTALL_PREFIX=${EXTERNAL_PROJECTS_ROOT}/mkldnn
         -DMKLROOT=${MKL_ROOT}
+        "-DARCH_OPT_FLAGS=-march=${NGRAPH_TARGET_ARCH} -mtune=${NGRAPH_TARGET_ARCH}"
         TMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/tmp"
         STAMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/stamp"
         DOWNLOAD_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/download"
doc/sphinx/source/branding-notice.rst
@@ -6,19 +6,22 @@
 Branding Notice
 ===============

-The Intel® nGraph™ library is an open source project providing code and component
-reference for many kinds of machine learning, deep learning, and DNN applications.
+The Intel® nGraph Library and Compiler stack is an open source project providing
+code and component reference for many kinds of machine learning, deep learning,
+and DNN applications.

-Documentation may include references to frontend frameworks, modules, extensions,
-or other libraries that may be wholly or partially open source, or that may be
-claimed as the property of others.
+Our documentation may include references to various frontends / frameworks,
+modules, extensions, or other libraries that may be wholly or partially open
+source, or that may be claimed as the property of others.

 Intel, the Intel logo and Intel Nervana are trademarks of Intel Corporation or
 its subsidiaries in the U.S. and/or other countries.

-Intel nGraph library core documentation
----------------------------------------
+Documentation notice
+--------------------

 .. note:: The branding notice below applies to code and documentation
-   contributions intended to be added directly to Intel nGraph library core.
+   contributions intended to be added directly to the Intel nGraph repo.

 Use the first or most prominent usage with symbols as described below.
@@ -39,14 +42,15 @@ repeated use of the trademark / branding symbols.

 * Intel® nGraph™
-* Intel® nGraph™ library
+* Intel® nGraph Library
 * nGraph library
 * ``ngraph`` API
 * ``ngraph`` library
 * ``ngraph`` backend
 * nGraph abstraction layer
 * neon™ frontend framework
+* Intel® nGraph Compiler
+* Intel® nGraph Backend
+* Intel® nGraph API
 * Movidius™ Myriad™
 * Intel® Math Kernel Library
@@ -59,3 +63,20 @@ repeated use of the trademark / branding symbols.

 * Intel® Nervana™ Graph (deprecated)
+
+Optimization Notices
+====================
+
+Software and workloads used in performance tests may have been optimized for
+performance only on Intel microprocessors. Performance tests, such as SYSmark
+and MobileMark, are measured using specific computer systems, components,
+software, operations and functions. Any change to any of those factors may
+cause the results to vary. You should consult other information and performance
+tests to assist you in fully evaluating your contemplated purchases, including
+the performance of that product when combined with other products. For more
+complete information visit http://www.intel.com/benchmarks.
+
+Intel technologies' features and benefits depend on system configuration and may
+require enabled hardware, software or service activation. Performance varies
+depending on system configuration. No computer system can be absolutely secure.
+Check with your system manufacturer or retailer or learn more at intel.com.
doc/sphinx/source/conf.py
@@ -62,7 +62,7 @@ source_suffix = '.rst'
 master_doc = 'index'

 # General information about the project.
-project = u'Intel® nGraph Library'
+project = u'nGraph Compiler stack'
 copyright = '2018, Intel Corporation'
 author = 'Intel Corporation'
doc/sphinx/source/index.rst
@@ -28,12 +28,13 @@ See the latest :doc:`project/release-notes`.
    :width: 599px

-nGraph is an open-source C++ library, compiler, and runtime accelerator for
-software engineering in the :abbr:`Deep Learning (DL)` ecosystem. nGraph
+nGraph is an open-source C++ library, compiler stack, and runtime accelerator
+for software engineering in the :abbr:`Deep Learning (DL)` ecosystem. nGraph
 simplifies development and makes it possible to design, write, compile, and
-deploy :abbr:`Deep Neural Network (DNN)`-based solutions. A more detailed
-explanation of the feature set of nGraph Compiler, as well as a high-level
-overview, can be found on our project :doc:`project/about`.
+deploy :abbr:`Deep Neural Network (DNN)`-based solutions that can be adapted and
+deployed across many frameworks and backends. A more detailed explanation, as
+well as a high-level overview, can be found on our project :doc:`project/about`.
+For more generalized discussion on the ecosystem, see the `ecosystem`_ document.

 .. _quickstart:
@@ -89,7 +90,7 @@ We have many documentation pages to help you get started.
    Intel Movidius™ Myriad™ 2 (VPU), Coming soon, Yes

-.. note:: The Library code is under active development as we're continually
+.. note:: The code in this repo is under active development as we're continually
    adding support for more kinds of DL models and ops, compiler optimizations,
    and backend optimizations.
@@ -131,3 +132,4 @@ Indices and tables
 .. _contributions: https://github.com/NervanaSystems/ngraph#how-to-contribute
 .. _TensorFlow bridge to nGraph: https://github.com/NervanaSystems/ngraph-tf/blob/master/README.md
 .. _Compiling MXNet with nGraph: https://github.com/NervanaSystems/ngraph-mxnet/blob/master/README.md
+.. _ecosystem: https://github.com/NervanaSystems/ngraph/blob/master/ecosystem-overview.md
doc/sphinx/source/project/about.rst
.. about:

Architecture, Features, FAQs
@@ -15,154 +15,124 @@ Architecture, Features, FAQs
 nGraph Compiler stack architecture
 ==================================

-The diagram below represents our current |release| release stack. Please
-note that the stack diagram is simplified to show how nGraph executes deep
-learning workloads with two hardware backends; however, many other deep
-learning frameworks and backends currently are functioning.
+The diagram below represents our current Beta release stack. In the
+diagram, nGraph components are colored in gray. Please note that the
+stack diagram is simplified to show how nGraph executes deep learning
+workloads with two hardware backends; however, many other deep learning
+frameworks and backends currently are functioning.
 .. figure:: ../graphics/stackngrknl.png
    :width: 455px
    :alt: Current Beta release stack

    Simplified stack diagram for nGraph Compiler and components Beta
-Starting from the top of the diagram, we present a simplified view of the
-nGraph Intermediate Representation (IR). The nGraph IR is a format which works
-with a framework such as TensorFlow\* or MXNet\* when there is a corresponding
-"Bridge" or import method, such as from NNVM or via `ONNX`_. Once the nGraph
-IR can begin using nGraph's Core ops, components lower in the stack can begin
-parsing and pattern-matching subgraphs for device-specific optimizations; these
-are then encapsulated. This encapsulation is represented on the diagram as the
-colored background between the ``ngraph`` kernel(s) and the stack above.
-Note that everything at or below the **Kernel APIs** and **Subgraph APIs** gets
-executed "automatically" during training runs. In other words, the accelerations
-are automatic: parts of the graph that are not encapsulated default to framework
-implementation when executed. For example, if nGraph optimizes ResNet50 for
-TensorFlow, the same optimization can be readily applied to the NNVM/MXNet
-implementation of ResNet50. This works efficiently because the nGraph
-:abbr:`(IR) Intermediate Representation`, which keeps the input and output
-semantics of encapsulated subgraphs, rebuilds an encapsulated subgraph that can
-efficiently make use or re-use of operations. Such an approach significantly
-cuts down on the time needed to compile; when we're not relying upon the
-framework's ops alone, memory management and data layouts can be more
-efficiently applied to the hardware backends in use.
-The :doc:`nGraph Core <../ops/index>` uses a strongly-typed and platform-neutral
-:abbr:`(IR) Intermediate Representation` to construct a "stateless" graph. Each
-node, or ``op``, in the graph corresponds to one :term:`step` in a computation,
-where each step produces zero or more tensor outputs from zero or more tensor
-inputs.
-After construction, our Hybrid transformer takes the IR, further partitions it
-into subgraphs, and assigns them to the best-performing backend. There are two
-hardware backends shown in the stack diagram to demonstrate nGraph's graph
-partitioning. The Hybrid transformer assigns complex operations (subgraphs) to
-the Intel® Nervana™ :abbr:`Neural Network Processor (NNP)`, or to a different
-CPU backend to expedite the computation, and the remaining operations default
-to CPU. In the future, we will further expand the capabilities of Hybrid
-transformer by enabling more features, such as localized cost modeling and
-memory sharing, when the next generation of :abbr:`NNP (Neural Network
-Processor)` is released. In the meantime, your deep learning software
-engineering or modeling can be confidently built upon this stable anchorage.
-The Intel® Architecture :abbr:`IA (Intel® Architecture)` transformer provides
-two modes that reduce compilation time, and have already been shown as useful
-for training, deploying, and retraining a deep learning workload in production.
-For example, in our tests, DEX mode reduced ResNet50 compilation time by 30X.
-
-We are excited to continue our work in enabling distributed training, and we
-plan to expand the nodes to 256 in Q4 ‘18. Additionally, we are testing model
-parallelism in addition to data parallelism.
-.. note:: In this Beta release, nGraph via Bridge code supports only
-   :abbr:`Just In Time (JiT)` compilation; the nGraph ONNX companion tool
-   supports dynamic graphs and will add additional support for Ahead of Time
-   compilation in the official release.
+Bridge
+^^^^^^
+
+Starting from the top of the stack, nGraph receives a computational
+graph from a deep learning framework such as TensorFlow\* or MXNet\*.
+The computational graph is converted to an nGraph internal
+representation by a bridge created for the corresponding framework.
+
+An nGraph bridge examines the whole graph to pattern match subgraphs
+which nGraph knows how to execute, and these subgraphs are encapsulated.
+Parts of the graph that are not encapsulated will default to framework
+implementation when executed.
+
+nGraph Core
+^^^^^^^^^^^
+
+nGraph uses a strongly-typed and platform-neutral
+``Intermediate Representation (IR)`` to construct a "stateless"
+computational graph. Each node, or op, in the graph corresponds to one
+``step`` in a computation, where each step produces zero or more tensor
+outputs from zero or more tensor inputs.
+
+This allows nGraph to apply its state-of-the-art optimizations instead
+of having to follow how a particular framework implements op execution,
+memory management, data layouts, etc.
+
+In addition, using nGraph IR allows faster optimization delivery for
+many of the supported frameworks. For example, if nGraph optimizes
+ResNet\* for TensorFlow\*, the same optimization can be readily applied
+to MXNet\* or ONNX\* implementations of ResNet\*.
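
To make the "stateless graph" idea above concrete, here is a minimal sketch
of building a tiny function with the nGraph C++ API of this era; the shapes,
the op choices, and the ``make_simple_graph`` helper are illustrative
assumptions, not part of this commit:

    #include <memory>
    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    std::shared_ptr<Function> make_simple_graph()
    {
        // Parameters are typed, shaped graph inputs; no op carries runtime state.
        auto a = std::make_shared<op::Parameter>(element::f32, Shape{2, 3});
        auto b = std::make_shared<op::Parameter>(element::f32, Shape{2, 3});

        // Each node produces tensor outputs from tensor inputs.
        auto sum  = std::make_shared<op::Add>(a, b);
        auto relu = std::make_shared<op::Relu>(sum);

        // A Function bundles result nodes with the parameters they depend on.
        return std::make_shared<Function>(NodeVector{relu}, ParameterVector{a, b});
    }

Because the graph holds no execution state, the same ``Function`` can be
handed to any backend for compilation.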
+Hybrid Transformer
+^^^^^^^^^^^^^^^^^^
+
+Hybrid transformer takes the nGraph IR, and partitions it into
+subgraphs, which can then be assigned to the best-performing backend.
+There are two hardware backends shown in the stack diagram to
+demonstrate this graph partitioning. The Hybrid transformer assigns
+complex operations (subgraphs) to the Intel® Nervana™ Neural Network
+Processor (NNP) to expedite the computation, and the remaining
+operations default to CPU. In the future, we will further expand the
+capabilities of Hybrid transformer by enabling more features, such as
+localized cost modeling and memory sharing.
+
+Once the subgraphs are assigned, the corresponding backend will execute
+the IR.
+Backends
+^^^^^^^^
+
+Focusing our attention on the CPU backend, when the IR is passed to the
+Intel® Architecture (IA) transformer, it can be executed in two modes:
+Direct EXecution (DEX) and code generation (``codegen``).
+
+In ``codegen`` mode, nGraph generates and compiles code which can either
+call into highly optimized kernels like MKL-DNN or JITers like Halide.
+Although our team wrote kernels for some operations, nGraph also
+leverages existing kernel libraries such as MKL-DNN, Eigen, and MLSL.
+
+The MLSL library is called when nGraph executes distributed training. At
+the time of the nGraph Beta release, nGraph achieved state-of-the-art
+results for ResNet50 with 16 nodes and 32 nodes for TensorFlow\* and
+MXNet\*. We are excited to continue our work in enabling distributed
+training, and we plan to expand to 256 nodes in Q4 ‘18. Additionally, we
+are testing model parallelism in addition to data parallelism.
+
+The other mode of execution is Direct EXecution (DEX). In DEX mode,
+nGraph can execute the operations by directly calling associated kernels
+as it walks through the IR instead of compiling via ``codegen``. This
+mode reduces the compilation time, and it will be useful for training,
+deploying, and retraining a deep learning workload in production. In our
+tests, DEX mode reduced ResNet50 compilation time by 30X.
+
+nGraph further tries to speed up the computation by leveraging
+multi-threading and graph scheduling libraries such as OpenMP and TBB
+Flow Graph.
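
The compile-and-execute flow described in the Backends section can be
sketched with the v0.10-era C++ runtime API; the ``run_on_cpu`` helper below
is hypothetical, and whether the backend runs via DEX or ``codegen`` is a
backend build/configuration detail that this sketch does not control:

    #include <vector>
    #include <ngraph/ngraph.hpp>
    #include <ngraph/runtime/backend.hpp>

    void run_on_cpu(const std::shared_ptr<ngraph::Function>& f)
    {
        using namespace ngraph;

        // Select the IA (CPU) transformer/backend by name.
        auto backend = runtime::Backend::create("CPU");

        // Device tensors for the two f32{2,3} inputs and one output.
        auto t_a = backend->create_tensor(element::f32, Shape{2, 3});
        auto t_b = backend->create_tensor(element::f32, Shape{2, 3});
        auto t_r = backend->create_tensor(element::f32, Shape{2, 3});

        std::vector<float> a(6, 1.0f), b(6, 2.0f);
        t_a->write(a.data(), 0, a.size() * sizeof(float));
        t_b->write(b.data(), 0, b.size() * sizeof(float));

        // Compile the IR for the chosen backend, then call it
        // (outputs precede inputs in the call).
        backend->compile(f);
        backend->call_with_validate(f, {t_r}, {t_a, t_b});
    }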
 .. _features:

 Features
 ========
-The nGraph :abbr:`(IR) Intermediate Representation` contains a combination of
-device-specific and non-device-specific optimization:
-
-* **Fusion** -- Fuse multiple ops to decrease memory usage.
-* **Data layout abstraction** -- Make abstraction easier and faster with nGraph
-  translating element order to work best for a given or available device.
-* **Data reuse** -- Save results and reuse for subgraphs with the same input.
-* **Graph scheduling** -- Run similar subgraphs in parallel via multi-threading.
-* **Graph partitioning** -- Partition subgraphs to run on different devices to
-  speed up computation; make better use of spare CPU cycles with nGraph.
-* **Memory management** -- Prevent peak memory usage by intercepting a graph
-  with or by a "saved checkpoint," and to enable data auditing.
-.. important:: See :doc:`../ops/index` to learn the nGraph means for graph
-   computations.
-.. Our design philosophy is that the graph is not a script for running kernels;
-   rather, our compilation will match ``ops`` to appropriate available kernels
-   (or when available, such as with CPU cycles). Thus, we expect that adding of
-   new Core ops should be infrequent and that most functionality instead gets
-   added with new functions that build sub-graphs from existing core ops.
-.. _portable:
-
-Portable
---------
-
-One of nGraph's key features is **framework neutrality**. While we currently
-support :doc:`three popular <../framework-integration-guides>` frameworks with
-pre-optimized deployment runtimes for training :abbr:`Deep Neural Network (DNN)`
-models, you are not limited to these when choosing among frontends. Architects
-of any framework (even those not listed above) can use our documentation for how
-to :doc:`compile and run <../howto/execute>` a training model and design or tweak
-a framework to bridge directly to the nGraph compiler. With a *portable* model
-at the core of your :abbr:`DL (Deep Learning)` ecosystem, it's no longer
-necessary to bring large datasets to the model for training; you can take your
-model -- in whole, or in part -- to where the data lives and save potentially
-significant or quantifiable machine resources.
-.. _adaptable:
-
-Adaptable
----------
-
-We've recently begun support for the `ONNX`_ format. Developers who already have
-a "trained" :abbr:`DNN (Deep Neural Network)` model can use nGraph to bypass
-significant framework-based complexity and :doc:`import it <../howto/import>`
-to test or run on targeted and efficient backends with our user-friendly
-Python-based API. See the `ngraph onnx companion tool`_ to get started.
-.. _deployable:
-
-Deployable
-----------
-
-It's no secret that the :abbr:`DL (Deep Learning)` ecosystem is evolving
-rapidly. Benchmarking comparisons can be blown steeply out of proportion by
-subtle tweaks to batch or latency numbers here and there. Where traditional
-GPU-based training excels, inference can lag and vice versa. Sometimes what we
-care about is not "speed at training a large dataset" but rather latency
-compiling a complex multi-layer algorithm locally, and then outputting back to
-an edge network, where it can be analyzed by an already-trained model.
-
-Indeed, when choosing among topologies, it is important to not lose sight of
-the ultimate deployability and machine-runtime demands of your component in
-the larger ecosystem. It doesn't make sense to use a heavy-duty backhoe to
-plant a flower bulb. Furthermore, if you are trying to develop an entirely
-new genre of modeling for a :abbr:`DNN (Deep Neural Network)` component, it
-may be especially beneficial to consider ahead of time how portable and
-mobile you want that model to be within the rapidly-changing ecosystem.
-With nGraph, any modern CPU can be used to design, write, test, and deploy
-a training or inference model. You can then adapt and update that same core
-model to run on a variety of backends.
+nGraph performs a combination of device-specific and non-device-specific
+optimizations:
+
+- **Fusion** -- Fuse multiple ops to decrease memory usage.
+- **Data layout abstraction** -- Make abstraction easier and faster
+  with nGraph translating element order to work best for a given or
+  available device.
+- **Data reuse** -- Save results and reuse for subgraphs with the same
+  input.
+- **Graph scheduling** -- Run similar subgraphs in parallel via
+  multi-threading.
+- **Graph partitioning** -- Partition subgraphs to run on different
+  devices to speed up computation; make better use of spare CPU cycles
+  with nGraph.
+- **Memory management** -- Prevent peak memory usage by intercepting a
+  graph with or by a "saved checkpoint," and to enable data auditing.
+
+Beta Limitations
+----------------
+
+In this Beta release, nGraph only supports Just In Time compilation, but
+we plan to add support for Ahead of Time compilation in the official
+release of nGraph. nGraph currently has limited support for dynamic
+graphs.
.. _no-lockin:
@@ -212,10 +182,13 @@ framework, and the result is a function that can be compiled from a framework.
 A fully-compiled function that makes use of bridge code thus becomes a "function
 graph", or what we sometimes call an **nGraph graph**.

 .. note:: Low-level nGraph APIs are not accessible *dynamically* via bridge code;
    this is the nature of stateless graphs. However, do note that a graph with a
    "saved" checkpoint can be "continued" to run from a previously-applied
    checkpoint, or it can be loaded as a static graph for further inspection.
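
As a rough sketch of that save-and-inspect workflow, assuming the JSON
serializer that ships with nGraph in this period (the file name and the
``roundtrip`` helper are illustrative):

    #include <fstream>
    #include <memory>
    #include <ngraph/serializer.hpp>

    std::shared_ptr<ngraph::Function>
    roundtrip(const std::shared_ptr<ngraph::Function>& f)
    {
        // Serialize the stateless graph to JSON for later inspection...
        std::ofstream out("checkpoint.json");
        out << ngraph::serialize(f, /*indent=*/4);
        out.close();

        // ...and load it back as a static graph.
        std::ifstream in("checkpoint.json");
        return ngraph::deserialize(in);
    }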
 .. important:: See :doc:`../ops/index` to learn about Core Ops.

 Our design philosophy is that the graph is not a script for running kernels;
 rather, our compilation will match ``ops`` to appropriate available kernels
 (or when available, such as with CPU cycles). Thus, we expect that the addition
 of new Core ops should be infrequent and that most functionality instead gets
 added with new functions that build sub-graphs from existing core ops.
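
A hedged illustration of that philosophy: new functionality can be composed
from existing core ops rather than added as a new Core op. The example builds
a sigmoid out of simpler ops (nGraph may also offer it as a built-in op);
``make_sigmoid`` is a hypothetical helper, not an nGraph API:

    #include <memory>
    #include <ngraph/ngraph.hpp>

    using namespace ngraph;

    // sigmoid(x) = 1 / (1 + exp(-x)), expressed purely with existing core ops.
    std::shared_ptr<Node> make_sigmoid(const std::shared_ptr<Node>& x)
    {
        // Scalar-style constant broadcast to x's shape (assumes the single-value
        // broadcast behavior of op::Constant::create in this era).
        auto one   = op::Constant::create(x->get_element_type(), x->get_shape(), {1});
        auto exp   = std::make_shared<op::Exp>(std::make_shared<op::Negative>(x));
        auto denom = std::make_shared<op::Add>(one, exp);
        return std::make_shared<op::Divide>(one, denom);
    }

Because the result is an ordinary subgraph, every downstream pass (fusion,
layout, backend lowering) applies to it with no compiler changes.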
For a more detailed dive into how custom bridge code can be implemented, see our
documentation on how to :doc:`../howto/execute`. To learn how TensorFlow and
@@ -228,23 +201,6 @@ MXNet currently make use of custom bridge code, see the section on
 JiT Compiling for computation

-Given that we have no way to predict how many other frameworks designed around
-model, workload, or framework-specific purposes there may be, it would be
-impossible for us to create bridges for every framework that currently exists
-(or that will exist in the future). Although we only support a few frameworks,
-we provide documentation to help developers and engineers figure out how to
-get custom solutions working, such as for edge cases.
-
-.. csv-table::
-   :header: "Framework", "Bridge Available?", "ONNX Support?"
-   :widths: 27, 10, 10
-
-   TensorFlow, Yes, Yes
-   MXNet, Yes, Yes
-   PaddlePaddle, Coming Soon, Yes
-   PyTorch, No, Yes
-   Other, Write your own, Custom
How do I run an inference model?
--------------------------------
@@ -269,7 +225,7 @@ our `arXiv paper`_ from the 2018 SysML conference.
 .. _arXiv paper: https://arxiv.org/pdf/1801.08058.pdf
 .. _ONNX: http://onnx.ai
-.. _NNVM: http://
+.. _NNVM: https://github.com/dmlc/nnvm
 .. _nGraph ONNX companion tool: https://github.com/NervanaSystems/ngraph-onnx
 .. _Intel® MKL-DNN: https://github.com/intel/mkl-dnn
 .. _Movidius: https://developer.movidius.com/