Commit a8b789fc authored by Leona C, committed by Sang Ik Lee

Compiler passes section collab (#2533)

* Cleaner API doc reference for compile call

* Add a useful table for nGraph namespaces

* Remove layout namespace

* Show exploding kernel problem on illustration like IEEE preso

* WIP branch for new documentation restructuring that is a huge pain

* Fix the doc reorg mess

* Fix underline

* List of passes disclaimer note

* Update disclaimers on README

* More cleanup of doc reorg

* Update core docs

* Update overview on core

* Add PR feedback

* Get rid of all the gazillion of doc build errors from rearranging stuff

* Add section on tutorials

* Update branch

* Cleanup intro

* Add better detail to overview

* Revise buildlb instructions and add better title for contributing to doc

* Note about unit tests

* Editing

* Update core overview namespace table and fix more broken links due to ToC changes

* Add doc on pass manager register and run passes code from unit tests

* Add doc on pass manager register and run passes code from unit tests

* Make the compiler passes section more awesome

* Consistent sentence case on all ToC headings

* Update for gold docs

* Add better detail about execution interface

* Minor edits

* Revert strange change

* Update with bucketed list of passes

* Fix build error
parent 6d2f182b
@@ -27,6 +27,26 @@ How to use?
#. A single iteration of the executable is executed by calling the ``call``
   method on the ``Executable`` object.

.. figure:: ../graphics/execution-interface.png
   :width: 650px

   The execution interface for nGraph

The nGraph execution API for ``Executable`` objects is a simple, five-method
interface; each backend implements the following five functions:
* The ``create_tensor()`` method allows the bridge to create tensor objects
  in host memory or an accelerator's memory.
* The ``write()`` and ``read()`` methods are used to transfer raw data into
  and out of tensors that reside in off-host memory.
* The ``compile()`` method instructs the backend to prepare an nGraph function
  for later execution.
* And, finally, the ``call()`` method is used to invoke an nGraph function
  against a particular set of tensors.
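
Put together, a minimal round trip through this interface might look like the
sketch below. This is an illustration only: the backend name, shapes, and data
are arbitrary, and the exact ``write()``/``read()`` signatures (for example,
whether they take an offset argument) vary slightly between nGraph versions.

.. code-block:: cpp

   #include <vector>

   #include "ngraph/ngraph.hpp"

   using namespace ngraph;

   int main()
   {
       // Build a trivial function: C = A + B.
       Shape shape{2, 2};
       auto a = std::make_shared<op::Parameter>(element::f32, shape);
       auto b = std::make_shared<op::Parameter>(element::f32, shape);
       auto f = std::make_shared<Function>(a + b, ParameterVector{a, b});

       // compile(): prepare the function for later execution.
       auto backend = runtime::Backend::create("CPU");
       auto exec = backend->compile(f);

       // create_tensor(): allocate tensors in the backend's memory.
       auto t_a = backend->create_tensor(element::f32, shape);
       auto t_b = backend->create_tensor(element::f32, shape);
       auto t_c = backend->create_tensor(element::f32, shape);

       // write(): transfer raw input data into the backend tensors.
       std::vector<float> va{1, 2, 3, 4}, vb{5, 6, 7, 8};
       t_a->write(va.data(), 0, va.size() * sizeof(float));
       t_b->write(vb.data(), 0, vb.size() * sizeof(float));

       // call(): run a single iteration of the compiled function.
       exec->call({t_c}, {t_a, t_b});

       // read(): transfer the result back out of the backend tensor.
       std::vector<float> vc(4);
       t_c->read(vc.data(), 0, vc.size() * sizeof(float));
       return 0;
   }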
.. _backend-api:
...
.. buildlb.rst:
###########################
nGraph Library for backends
###########################

This section details how to build the C++ version of the nGraph Library, which
is targeted toward developers working on kernel-specific operations,
optimizations, or on deep learning solutions that leverage custom backends.

* :ref:`ubuntu`
* :ref:`centos`
@@ -132,7 +136,7 @@ The process documented here will work on Ubuntu\* 16.04 (LTS) or on Ubuntu
.. code-block:: console

   $ cmake .. [-DNGRAPH_USE_PREBUILT_LLVM=OFF] [-DNGRAPH_TARGET_ARCH=skylake-avx512]
#. Run ``$ make`` and ``make install`` to install ``libngraph.so`` and the
   header files to ``~/ngraph_dist``:
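
For orientation, a typical end-to-end sequence looks roughly like the sketch
below; the ``cmake`` flag shown is optional and illustrative, and the install
prefix follows the ``~/ngraph_dist`` default described above.

.. code-block:: console

   $ mkdir build && cd build
   $ cmake .. -DNGRAPH_USE_PREBUILT_LLVM=TRUE
   $ make -j $(nproc)
   $ make install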
@@ -223,10 +227,16 @@ according to those conventions. These scripts require the command
Testing the build
=================

We use the `googletest framework`_ from Google for unit tests. The ``cmake``
command automatically downloads a copy of the needed ``gtest`` files when
it configures the build directory.

To perform unit tests on the install:

#. Create and configure the build directory as described in our
   :doc:`buildlb` guide.

#. Enter the build directory and run ``make check``:
   .. code-block:: console
@@ -238,8 +248,8 @@ Adding framework support
========================
After building and installing nGraph on your system, there are two likely
paths for what you'll want to do next: either compile a framework to run a DL
training model, or load an import of an "already-trained" model for inference
on an Intel nGraph-enabled backend.

For the former case, this early |version|, :doc:`frameworks/index`,
...
.. howto/index:
Constructing graphs
===================

.. toctree::
...
@@ -62,8 +62,8 @@ descriptions:
   :escape: ~

   ``ngraph``, The Intel nGraph C++ API, `ngraph`_, Implicit namespace omitted from most API documentation
   ``builder``, "Convenience functions that create additional graph nodes to implement commonly-used recipes; for example, auto-broadcast", `builder`_, Coming Soon
   ``descriptor``, Descriptors are compile-time representations of objects that will appear at run-time, `descriptor`_, Coming Soon
   ``op``, Ops used in graph construction, `op`_, :doc:`../ops/index`
   ``runtime``, The objects and methods used for executing the graph, `runtime`_, :doc:`../backend-support/cpp-api`
...
.. core/passes/list-of-passes:

List of passes
##############

The kinds of compiler passes available can be broken down into different buckets:

Graph Optimization Passes
=========================

.. csv-table::
   :header: "Graph Optimization Passes", "More Detail"
   :widths: 29, 31
   :escape: ~

   ``AlgebraicSimplification``, :ref:`algebraic_simpl`
   ``CommonSubexpressionElimination``, :ref:`common_subex_elim`
   ``ConstantFolding``, :ref:`constant_fold`
   ``CoreFusion``, :ref:`core_fusion`
   ``ReshapeElimination``, :ref:`reshape_transpose_elim`
   ``ReshapeSinking``, :ref:`reshape_transpose_sink`

Node Optimization Passes
========================

.. csv-table::
   :header: "Node Optimization Passes", "More Detail"
   :widths: 29, 31
   :escape: ~

   ``NopElimination``, ""
   ``ZeroDimTensorElimination``, ""

Memory Assignment Passes
========================

.. csv-table::
   :header: "Memory Assignment Passes", "More Detail"
   :widths: 29, 31
   :escape: ~

   ``AssignLayout``, ""
   ``Liveness``, ""
   ``MemoryLayout``, ""
   ``PropagateCacheability``, ""

Codegen Passes
==============

.. csv-table::
   :header: "Codegen Passes", "More Detail"
   :widths: 29, 31
   :escape: ~

   ``CommonFunctionCollection``, ""

Debug Passes
============

.. csv-table::
   :header: "Debug Passes", "More Detail"
   :widths: 29, 31
   :escape: ~

   ``DumpSorted``, ""
   ``MemoryVisualize``, ""
   ``Serialization``, ""
   ``VisualizeTree``, ""

Maintenance Passes
==================

.. csv-table::
   :header: "Maintenance Passes", "More Detail"
   :widths: 29, 31
   :escape: ~

   ``GetOutputElementElimination``, ""
   ``LikeReplacement``, ""
   ``ValidateGraph``, ""
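
As a quick illustration of how passes from these tables are used, here is a
minimal sketch (the pass choices and output file name are illustrative) that
registers one node optimization pass and one debug pass with the pass manager:

.. code-block:: cpp

   #include <memory>

   #include "ngraph/function.hpp"
   #include "ngraph/pass/manager.hpp"
   #include "ngraph/pass/nop_elimination.hpp"
   #include "ngraph/pass/visualize_tree.hpp"

   using namespace ngraph;

   void optimize_and_dump(std::shared_ptr<Function> f)
   {
       pass::Manager pass_manager;
       // Node optimization: remove no-op nodes from the graph.
       pass_manager.register_pass<pass::NopElimination>();
       // Debug: write a visualization of the optimized graph.
       pass_manager.register_pass<pass::VisualizeTree>("after_passes.png");
       // Run every registered pass, in order, over the function.
       pass_manager.run_passes(f);
   }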
.. important:: All of the above passes are currently implementable; more
...
@@ -12,20 +12,32 @@ Compiler passes
Overview
--------

*Generic graph optimization passes*

This section discusses how to use nGraph to create a Pass Manager for your
backend, and provides both a simple and a complex example to follow.

The pass manager infrastructure in nGraph makes it easy to reuse and mix the
generic optimization passes. It also permits you to roll your own device-specific
optimizations; that is, the same unified interface and APIs may be used to
cover both things.

Invoking these passes is fairly straightforward, as illustrated by the
following steps and the code below.
#. Create a "pass manager" object (line 1)
#. Populate it with the desired pass or passes (lines 2-4)
#. Invoke the pass manager with a pointer to your unoptimized graph, and
   it will return a pointer to an optimized graph (lines 5-6)
.. literalinclude:: ../../../../../test/cpu_fusion.cpp
   :language: cpp
   :lines: 2085-2092
   :linenos:
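
If you do not have the test sources at hand, the registration pattern in that
excerpt looks roughly like the following sketch; the passes chosen here are
illustrative, not the exact contents of ``cpu_fusion.cpp``:

.. code-block:: cpp

   #include <memory>

   #include "ngraph/function.hpp"
   #include "ngraph/pass/algebraic_simplification.hpp"
   #include "ngraph/pass/cse.hpp"
   #include "ngraph/pass/manager.hpp"

   using namespace ngraph;

   void run_generic_passes(std::shared_ptr<Function> func)
   {
       // 1. Create a "pass manager" object.
       pass::Manager pass_manager;
       // 2. Populate it with the desired passes.
       pass_manager.register_pass<pass::AlgebraicSimplification>();
       pass_manager.register_pass<pass::CommonSubexpressionElimination>();
       // 3. Invoke it on the function; the graph is optimized in place.
       pass_manager.run_passes(func);
   }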
nGraph Core includes a large library of hardware-agnostic passes useful
for almost any kind of hardware backend. Some of these passes are likely familiar
@@ -33,29 +45,67 @@ to people who are comfortable with classical compiler designs. Others, like the
reshape/transpose elimination and sinking passes, are quite specific to deep
learning.
A simple example
----------------

Here's a fairly straightforward function graph with four ops:
:doc:`../../ops/convolution`, :doc:`../../ops/broadcast`, :doc:`../../ops/add`,
and :doc:`../../ops/relu`. With nGraph, backends have the ability to rewrite the
graph in ways that are specific to the underlying device/hardware's capabilities.
When, for example, the device is an Intel® Architecture :abbr:`IA (Intel® Architecture)`
CPU, it can support a fused ``ConvolutionBiasReLU`` kernel. The backend is able
to rewrite the graph into its own custom ops that more closely match the
hardware-specific primitives; here they get matched via Intel® MKL-DNN.
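
Constructed directly with the nGraph API, the pre-fusion graph might look like
the sketch below; the shapes and convolution parameters are illustrative only.

.. code-block:: cpp

   #include <memory>

   #include "ngraph/ngraph.hpp"

   using namespace ngraph;

   std::shared_ptr<Function> make_conv_bias_relu()
   {
       // Inputs: an image batch, a filter batch, and a per-channel bias.
       auto data = std::make_shared<op::Parameter>(element::f32, Shape{1, 3, 224, 224});
       auto filters = std::make_shared<op::Parameter>(element::f32, Shape{64, 3, 7, 7});
       auto bias = std::make_shared<op::Parameter>(element::f32, Shape{64});

       // Convolution with stride 2 and symmetric padding of 3.
       auto conv = std::make_shared<op::Convolution>(
           data, filters,
           Strides{2, 2},         // window movement strides
           Strides{1, 1},         // window dilation strides
           CoordinateDiff{3, 3},  // padding below
           CoordinateDiff{3, 3}); // padding above

       // Broadcast the bias across the batch and spatial axes, add, apply ReLU.
       auto bcast = std::make_shared<op::Broadcast>(bias, conv->get_shape(), AxisSet{0, 2, 3});
       auto add = std::make_shared<op::Add>(conv, bcast);
       auto relu = std::make_shared<op::Relu>(add);

       return std::make_shared<Function>(NodeVector{relu},
                                         ParameterVector{data, filters, bias});
   }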
.. _figure-simple-compiler:

.. figure:: ../../graphics/simple-compiler-passes.png
   :width: 750px
   :alt: Simple kernel fusion

   Figure A: On the left is a fully-formed function graph prior to fusion.
   After graph rewrite, the CPU backend implements a number of custom fusions.

A complex example
-----------------

The effectiveness of graph-level optimization with nGraph is more striking to look
at in terms of an actual input graph, such as one from the framework bridge. Here
is a slightly more complicated example, drawn from a topology called MobileNet
which makes heavy use of group convolution.
In group convolution, sometimes called depthwise convolution, a batch's different
feature channels get divided into groups that are processed independently, rather
than every convolution kernel seeing all of the input feature channels.
With "Group Convolution Fusion", it is possible to optimize a subgraph that has
implemented group convolution by many instances of "ordinary" convolution.
*Figure B* shows an excerpt from ``MobileNet v1``, a topology which makes heavy
use of group convolution. Here, an image batch and a filter batch first undergo
a "preprocessing" phase where segments along the channel axis are sliced out:
one per channel group. Next, there are separate convolutions on each channel
group before finally concatenating the result back together.
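
Expressed directly in nGraph ops, that slice/convolve/concatenate pattern might
look like the following sketch (two groups, with illustrative shapes):

.. code-block:: cpp

   #include <memory>

   #include "ngraph/ngraph.hpp"

   using namespace ngraph;

   std::shared_ptr<Node> group_convolution_by_slices()
   {
       const size_t groups = 2;
       // 8 input channels split into 2 groups of 4; 16 output channels, 8 per group.
       auto data = std::make_shared<op::Parameter>(element::f32, Shape{1, 8, 28, 28});
       auto filters = std::make_shared<op::Parameter>(element::f32, Shape{16, 4, 3, 3});

       NodeVector group_outputs;
       for (size_t g = 0; g < groups; ++g)
       {
           // Slice out this group's feature channels and its filters.
           auto data_slice = std::make_shared<op::Slice>(
               data, Coordinate{0, g * 4, 0, 0}, Coordinate{1, (g + 1) * 4, 28, 28});
           auto filter_slice = std::make_shared<op::Slice>(
               filters, Coordinate{g * 8, 0, 0, 0}, Coordinate{(g + 1) * 8, 4, 3, 3});

           // An "ordinary" convolution per channel group.
           group_outputs.push_back(std::make_shared<op::Convolution>(
               data_slice, filter_slice, Strides{1, 1}, Strides{1, 1},
               CoordinateDiff{0, 0}, CoordinateDiff{0, 0}));
       }

       // Concatenate the per-group results back together along the channel axis.
       return std::make_shared<op::Concat>(group_outputs, 1);
   }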
.. _figure-mobilenet-gc:
.. figure:: ../../graphics/mobilenet-group-conv.png
   :width: 700px
   :alt: MobileNet example

   Figure B: Each of these grouped convolution complexes -- the
   operations within the rectangles on the left -- is very wide; each is too
   wide to fit legibly on the illustration.

The group convolution fusion is able to replace each of those giant subgraphs
with a single CPU group convolution node. This ends up being beneficial in
several ways:

* Reduces sheer node count,
* Provides mappability to MKL-DNN, which has an accelerated group convolution
  implementation, and
* Eliminates unnecessary temporary nodes.
.. distr/index.rst:
################################
Distributed training with nGraph
################################

.. important:: Distributed training is not officially supported in version
   |version|; however, some configuration options have worked for nGraph devices
   with mixed or limited success in testing environments.

Why distributed training?
@@ -47,7 +47,8 @@ distributed training. Deployments using nGraph Library with supported backends
can be configured to train with data parallelism and will soon work with model
parallelism. Distributing workloads is increasingly important, as more data and
bigger models mean the ability to :doc:`../core/constructing-graphs/distribute-train`
work with larger and larger datasets, or to work with models having many layers
that aren't designed to fit to a single device.

Distributed training with data parallelism splits the data and each worker
node has the same model; during each iteration, the gradients are aggregated
...
@@ -32,7 +32,7 @@ Glossary
function graph
   The nGraph Library uses a function graph to represent an
   ``op``'s parameters and results.
fusion
...
@@ -51,7 +51,7 @@ nGraph Compiler stack
.. toctree::
   :maxdepth: 1
   :caption: Backend Support

   backend-support/index.rst
   backend-support/cpp-api.rst
@@ -66,7 +66,7 @@ nGraph Compiler stack
.. toctree::
   :maxdepth: 1
   :caption: Diagnostics and Visualization

   diagnostics/nbench.rst
   diagnostics/performance-profile.rst
@@ -86,10 +86,10 @@ nGraph Compiler stack
   project/release-notes.rst
   project/contribution-guide.rst
   project/governance.rst
   project/doc-contributor-README.rst
   project/index.rst
   glossary.rst

Indices and tables
==================
...
@@ -15,11 +15,10 @@
.. limitations under the License.
.. ---------------------------------------------------------------------------
Contributing to documentation
=============================

.. important:: Read this for changes affecting **anything** in ``ngraph/doc``

For updates to the Intel® nGraph Library ``/doc`` repo, please submit a PR with
any changes or ideas you'd like integrated. This helps us maintain trackability
...