Commit e10fa7d2 authored by Adam Procter

Merge remote-tracking branch 'origin/master' into r0.10

parents f8a0f784 ca4437bb
@@ -114,6 +114,7 @@ if(${CMAKE_VERSION} VERSION_LESS 3.2)
        -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
        -DCMAKE_INSTALL_PREFIX=${EXTERNAL_PROJECTS_ROOT}/mkldnn
        -DMKLROOT=${MKL_ROOT}
        "-DARCH_OPT_FLAGS=-march=${NGRAPH_TARGET_ARCH} -mtune=${NGRAPH_TARGET_ARCH}"
        TMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/tmp"
        STAMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/stamp"
        DOWNLOAD_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/download"
@@ -145,6 +146,7 @@ else()
        -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
        -DCMAKE_INSTALL_PREFIX=${EXTERNAL_PROJECTS_ROOT}/mkldnn
        -DMKLROOT=${MKL_ROOT}
        "-DARCH_OPT_FLAGS=-march=${NGRAPH_TARGET_ARCH} -mtune=${NGRAPH_TARGET_ARCH}"
        TMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/tmp"
        STAMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/stamp"
        DOWNLOAD_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/download"

@@ -6,26 +6,29 @@

Branding Notice
===============

The Intel® nGraph Library and Compiler stack is an open source project providing
code and component reference for many kinds of machine learning, deep learning,
and DNN applications.

Our documentation may include references to various frontends / frameworks,
modules, extensions, or other libraries that may be wholly or partially open
source, or that may be claimed as the property of others.

Intel, the Intel logo and Intel Nervana are trademarks of Intel Corporation or
its subsidiaries in the U.S. and/or other countries.

Documentation notice
--------------------

.. note:: The branding notice below applies to code and documentation
   contributions intended to be added directly to the Intel nGraph repo.

Use the first or most prominent usage with symbols as described below.
Subsequent references on the same document, or on a file with an
already-present prominent form (such as Sphinx\* documentation sidebars),
may be done as an abbreviated form (sub-bullet items) and/or without the
repeated use of the trademark / branding symbols.

* Intel® Nervana™ Neural Network Processor

@@ -39,15 +42,16 @@ repeated use of the trademark / branding symbols.

* Intel® nGraph™
* Intel® nGraph Library
* Intel® nGraph Compiler
* Intel® nGraph Backend
* Intel® nGraph API
* Movidius™ Myriad™
* nGraph library
* ``ngraph`` API
* ``ngraph`` library
* ``ngraph`` backend
* nGraph abstraction layer
* neon™ frontend framework
* Intel® Math Kernel Library
* Intel® MKL
@@ -59,3 +63,20 @@ repeated use of the trademark / branding symbols.

* Intel® Nervana™ Graph (deprecated)

Optimization Notices
====================
Software and workloads used in performance tests may have been optimized for
performance only on Intel microprocessors. Performance tests, such as SYSmark
and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance
tests to assist you in fully evaluating your contemplated purchases, including
the performance of that product when combined with other products. For more
complete information visit http://www.intel.com/benchmarks.
Intel technologies' features and benefits depend on system configuration and may
require enabled hardware, software or service activation. Performance varies
depending on system configuration. No computer system can be absolutely secure.
Check with your system manufacturer or retailer or learn more at intel.com.
@@ -62,7 +62,7 @@ source_suffix = '.rst'

master_doc = 'index'

# General information about the project.
project = u'nGraph Compiler stack'
copyright = '2018, Intel Corporation'
author = 'Intel Corporation'

@@ -28,12 +28,13 @@ See the latest :doc:`project/release-notes`.

   :width: 599px

nGraph is an open-source C++ library, compiler stack, and runtime accelerator
for software engineering in the :abbr:`Deep Learning (DL)` ecosystem. nGraph
simplifies development and makes it possible to design, write, compile, and
deploy :abbr:`Deep Neural Network (DNN)`-based solutions that can be adapted
and deployed across many frameworks and backends. A more detailed explanation,
as well as a high-level overview, can be found on our project
:doc:`project/about`. For a more general discussion of the ecosystem, see the
`ecosystem`_ document.

.. _quickstart:

@@ -89,7 +90,7 @@ We have many documentation pages to help you get started.

   Intel Movidius™ Myriad™ 2 (VPU), Coming soon, Yes

.. note:: The code in this repo is under active development as we're continually
   adding support for more kinds of DL models and ops, compiler optimizations,
   and backend optimizations.

@@ -131,3 +132,4 @@ Indices and tables

.. _contributions: https://github.com/NervanaSystems/ngraph#how-to-contribute
.. _TensorFlow bridge to nGraph: https://github.com/NervanaSystems/ngraph-tf/blob/master/README.md
.. _Compiling MXNet with nGraph: https://github.com/NervanaSystems/ngraph-mxnet/blob/master/README.md
.. _ecosystem: https://github.com/NervanaSystems/ngraph/blob/master/ecosystem-overview.md
.. about:

Architecture, Features, FAQs

@@ -15,154 +15,124 @@ Architecture, Features, FAQs

nGraph Compiler stack architecture
==================================

The diagram below represents our current Beta release stack. In the diagram,
nGraph components are colored in gray. Please note that the stack diagram is
simplified to show how nGraph executes deep learning workloads with two
hardware backends; however, many other deep learning frameworks and backends
are currently functioning.

.. figure:: ../graphics/stackngrknl.png
   :alt: Simplified stack diagram for nGraph Compiler and components

Bridge
^^^^^^

Starting from the top of the stack, nGraph receives a computational graph
from a deep learning framework such as TensorFlow\* or MXNet\*. The
computational graph is converted to an nGraph internal representation by a
bridge created for the corresponding framework.

An nGraph bridge examines the whole graph to pattern match subgraphs which
nGraph knows how to execute, and these subgraphs are encapsulated. Parts of
the graph that are not encapsulated will default to framework implementation
when executed.
nGraph Core
^^^^^^^^^^^

nGraph uses a strongly-typed and platform-neutral
:abbr:`Intermediate Representation (IR)` to construct a "stateless"
computational graph. Each node, or op, in the graph corresponds to one
``step`` in a computation, where each step produces zero or more tensor
outputs from zero or more tensor inputs.

This allows nGraph to apply its state-of-the-art optimizations instead of
having to follow how a particular framework implements op execution, memory
management, data layouts, etc.

In addition, using nGraph IR allows faster optimization delivery for many of
the supported frameworks. For example, if nGraph optimizes ResNet\* for
TensorFlow\*, the same optimization can be readily applied to MXNet\* or
ONNX\* implementations of ResNet\*.
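
To make the stateless-graph idea concrete, here is a minimal sketch of how
such a graph can be constructed with the ``ngraph`` C++ API (the shapes and
the choice of ops are illustrative, not taken from the repo):

.. code-block:: cpp

   #include <ngraph/ngraph.hpp>

   using namespace ngraph;

   // Parameters are the graph's tensor inputs; each op is one step that
   // produces tensor outputs from tensor inputs and holds no state.
   auto a = std::make_shared<op::Parameter>(element::f32, Shape{2, 3});
   auto b = std::make_shared<op::Parameter>(element::f32, Shape{2, 3});

   auto sum  = std::make_shared<op::Add>(a, b);         // step: a + b
   auto prod = std::make_shared<op::Multiply>(sum, a);  // step: (a + b) * a

   // A Function bundles result nodes and parameters into a callable graph.
   auto f = std::make_shared<Function>(NodeVector{prod},
                                       ParameterVector{a, b});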

Hybrid Transformer
^^^^^^^^^^^^^^^^^^

Hybrid transformer takes the nGraph IR and partitions it into subgraphs,
which can then be assigned to the best-performing backend. There are two
hardware backends shown in the stack diagram to demonstrate this graph
partitioning. The Hybrid transformer assigns complex operations (subgraphs)
to the Intel® Nervana™ Neural Network Processor (NNP) to expedite the
computation, and the remaining operations default to CPU. In the future, we
will further expand the capabilities of Hybrid transformer by enabling more
features, such as localized cost modeling and memory sharing.

Once the subgraphs are assigned, the corresponding backend will execute
the IR.
Backends
^^^^^^^^

Focusing our attention on the CPU backend, when the IR is passed to the
Intel® Architecture (IA) transformer, it can be executed in two modes:
Direct EXecution (DEX) and code generation (``codegen``).

In ``codegen`` mode, nGraph generates and compiles code which can either call
into highly optimized kernels like MKL-DNN or JITers like Halide. Although
our team wrote kernels for nGraph for some operations, nGraph leverages
existing kernel libraries such as MKL-DNN, Eigen, and MLSL.

The MLSL library is called when nGraph executes distributed training. At the
time of the nGraph Beta release, nGraph achieved state-of-the-art results for
ResNet50 with 16 and 32 nodes for TensorFlow\* and MXNet\*. We are excited to
continue our work in enabling distributed training, and we plan to expand to
256 nodes in Q4 '18. Additionally, we are testing model parallelism in
addition to data parallelism.

The other mode of execution is Direct EXecution (DEX). In DEX mode, nGraph
can execute the operations by directly calling associated kernels as it walks
through the IR instead of compiling via ``codegen``. This mode reduces the
compilation time, and it will be useful for training, deploying, and
retraining a deep learning workload in production. In our tests, DEX mode
reduced ResNet50 compilation time by 30X.

nGraph further tries to speed up the computation by leveraging
multi-threading and graph scheduling libraries such as OpenMP and TBB
Flow Graph.
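
As a rough illustration of handing a graph to a backend, the sketch below
compiles and executes the function ``f`` from the earlier example on the CPU
backend. It is based on the public ``ngraph`` runtime API as we understand it
for this release; error handling is omitted and tensor shapes are assumed to
match the earlier sketch:

.. code-block:: cpp

   #include <ngraph/runtime/backend.hpp>

   // Select the CPU (IA transformer) backend; other backends can be
   // requested by name the same way when available.
   auto backend = runtime::Backend::create("CPU");

   // Allocate backend-side tensors for the two inputs and the output.
   auto t_a = backend->create_tensor(element::f32, Shape{2, 3});
   auto t_b = backend->create_tensor(element::f32, Shape{2, 3});
   auto t_r = backend->create_tensor(element::f32, Shape{2, 3});

   // Compile the function, then execute it with validation.
   backend->compile(f);
   backend->call_with_validate(f, {t_r}, {t_a, t_b});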

.. _features:

Features
========

nGraph performs a combination of device-specific and non-device-specific
optimizations:

- **Fusion** -- Fuse multiple ops to decrease memory usage.
- **Data layout abstraction** -- Make abstraction easier and faster with
  nGraph translating element order to work best for a given or available
  device.
- **Data reuse** -- Save results and reuse for subgraphs with the same
  input.
- **Graph scheduling** -- Run similar subgraphs in parallel via
  multi-threading.
- **Graph partitioning** -- Partition subgraphs to run on different devices
  to speed up computation; make better use of spare CPU cycles with nGraph.
- **Memory management** -- Prevent peak memory usage by intercepting a graph
  with or by a "saved checkpoint," and to enable data auditing.

Beta Limitations
----------------

In this Beta release, nGraph only supports Just In Time compilation, but we
plan to add support for Ahead of Time compilation in the official release of
nGraph. nGraph currently has limited support for dynamic graphs.

.. _no-lockin:
@@ -212,10 +182,13 @@ framework, and the result is a function that can be compiled from a framework.

A fully-compiled function that makes use of bridge code thus becomes a
"function graph", or what we sometimes call an **nGraph graph**.

.. important:: See :doc:`../ops/index` to learn about Core Ops.

Our design philosophy is that the graph is not a script for running kernels;
rather, our compilation will match ``ops`` to appropriate available kernels
(or when available, such as with CPU cycles). Thus, we expect that adding
new Core ops should be infrequent and that most functionality instead gets
added with new functions that build sub-graphs from existing core ops.
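
As a hypothetical example of that philosophy, a scaled-shift operation can be
built as a small sub-graph of existing core ops rather than added as a new
core op. ``make_scale_shift`` below is our illustrative name, not a function
from the repo:

.. code-block:: cpp

   #include <ngraph/ngraph.hpp>

   using namespace ngraph;

   // y = x * scale + shift, composed from the core Multiply and Add ops.
   std::shared_ptr<Node> make_scale_shift(const std::shared_ptr<Node>& x,
                                          const std::shared_ptr<Node>& scale,
                                          const std::shared_ptr<Node>& shift)
   {
       auto scaled = std::make_shared<op::Multiply>(x, scale);
       return std::make_shared<op::Add>(scaled, shift);
   }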

For a more detailed dive into how custom bridge code can be implemented, see
our documentation on how to :doc:`../howto/execute`. To learn how TensorFlow
and MXNet currently make use of custom bridge code, see the section on

@@ -228,23 +201,6 @@ MXNet currently make use of custom bridge code, see the section on

JiT Compiling for computation

How do I run an inference model?
--------------------------------
@@ -269,7 +225,7 @@ our `arXiv paper`_ from the 2018 SysML conference.

.. _arXiv paper: https://arxiv.org/pdf/1801.08058.pdf
.. _ONNX: http://onnx.ai
.. _NNVM: https://github.com/dmlc/nnvm
.. _nGraph ONNX companion tool: https://github.com/NervanaSystems/ngraph-onnx
.. _Intel® MKL-DNN: https://github.com/intel/mkl-dnn
.. _Movidius: https://developer.movidius.com/