Commit a8b789fc authored by Leona C, committed by Sang Ik Lee

Compiler passes section collab (#2533)

* Cleaner API doc reference for compile call

* Add a useful table for nGraph namespaces

* Remove layout namespace

* Show exploding kernel problem on illustration like IEEE preso

* WIP branch for new documentation restructuring that is a huge pain

* Fix the doc reorg mess

* Fix underline

* List of passes disclaimer note

* Update disclaimers on README

* More cleanup of doc reorg

* Update core docs

* Update overview on core

* Add PR feedback

* Get rid of all the gazillion of doc build errors from rearranging stuff

* Add section on tutorials

* Update branch

* Cleanup intro

* Add better detail to overview

* Revise buildlb instructions and add better title for contributing to doc

* Note about unit tests

* Editing

* Update core overview namespace table and fix more broken links due to ToC changes

* Add doc on pass manager register and run passes code from unit tests

* Add doc on pass manager register and run passes code from unit tests

* Make the compiler passes section more awesome

* Consistent sentence case on all ToC headings

* Update for gold docs

* Add better detail about execution interface

* Minor edits

* Revert strange change

* Update with bucketed list of passes

* Fix build error
parent 6d2f182b
......@@ -27,6 +27,26 @@ How to use?
#. A single iteration of the executable is executed by calling the ``call``
method on the ``Executable`` object.
.. figure:: ../graphics/execution-interface.png
:width: 650px
The execution interface for nGraph
The nGraph execution API for ``Executable`` objects is a simple, five-method
interface; each backend implements the following five functions:
* The ``create_tensor()`` method allows the bridge to create tensor objects
in host memory or an accelerator's memory.
* The ``write()`` and ``read()`` methods are used to transfer raw data into
and out of tensors that reside in off-host memory.
* The ``compile()`` method instructs the backend to prepare an nGraph function
for later execution.
* And, finally, the ``call()`` method is used to invoke an nGraph function
against a particular set of tensors.
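
Below is a minimal sketch of how these five methods fit together on the CPU
backend. The graph-construction calls (``op::Parameter``, ``op::Add``,
``Function``) and the exact ``write()``/``read()`` signatures shown here are
illustrative and may differ slightly between nGraph versions:

.. code-block:: cpp

   #include <memory>
   #include <vector>
   #include <ngraph/ngraph.hpp>

   using namespace ngraph;

   int main()
   {
       // Build a trivial function: result = a + b
       auto a = std::make_shared<op::Parameter>(element::f32, Shape{4});
       auto b = std::make_shared<op::Parameter>(element::f32, Shape{4});
       auto f = std::make_shared<Function>(std::make_shared<op::Add>(a, b),
                                           ParameterVector{a, b});

       // compile(): prepare the nGraph function for later execution
       auto backend = runtime::Backend::create("CPU");
       auto exec = backend->compile(f);

       // create_tensor(): allocate tensors in the backend's memory
       auto t_a = backend->create_tensor(element::f32, Shape{4});
       auto t_b = backend->create_tensor(element::f32, Shape{4});
       auto t_r = backend->create_tensor(element::f32, Shape{4});

       // write(): transfer raw host data into the backend tensors
       std::vector<float> va{1, 2, 3, 4}, vb{5, 6, 7, 8}, vr(4);
       t_a->write(va.data(), 0, va.size() * sizeof(float));
       t_b->write(vb.data(), 0, vb.size() * sizeof(float));

       // call(): invoke the compiled function on a particular set of tensors
       exec->call({t_r}, {t_a, t_b});

       // read(): transfer the result back out of the backend tensor
       t_r->read(vr.data(), 0, vr.size() * sizeof(float));
       return 0;
   }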
.. _backend-api:
......
.. buildlb.rst:
######################
Build the C++ Library
######################
###########################
nGraph Library for backends
###########################
This section details how to build the C++ version of the nGraph Library, which
is targeted toward developers working on kernel-specific operations,
optimizations, or on deep learning solutions that leverage custom backends.
* :ref:`ubuntu`
* :ref:`centos`
......@@ -132,7 +136,7 @@ The process documented here will work on Ubuntu\* 16.04 (LTS) or on Ubuntu
.. code-block:: console
$ cmake .. [-DNGRAPH_USE_PREBUILT_LLVM=TRUE] [-DNGRAPH_TARGET_ARCH=skylake-avx512]
$ cmake .. [-DNGRAPH_USE_PREBUILT_LLVM=OFF] [-DNGRAPH_TARGET_ARCH=skylake-avx512]
#. Run ``$ make`` and ``make install`` to install ``libngraph.so`` and the
header files to ``~/ngraph_dist``:
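
   For example, assuming a build directory named ``build`` (an illustrative
   sketch; adjust the path to your own layout):

   .. code-block:: console

      $ cd build
      $ make
      $ make install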
......@@ -223,10 +227,16 @@ according to those conventions. These scripts require the command
Testing the build
=================
We use the `googletest framework`_ from Google for unit tests. The
``NGRAPH_UNIT_TEST_ENABLE`` build flag is enabled by default when building
with cmake, so to perform unit tests, simply enter the build directory and
run ``make check``:
We use the `googletest framework`_ from Google for unit tests. The ``cmake``
command automatically downloads a copy of the needed ``gtest`` files when
it configures the build directory.
To perform unit tests on the install:
#. Create and configure the build directory as described in our
:doc:`buildlb` guide.
#. Enter the build directory and run ``make check``:
.. code-block:: console
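
   # an illustrative sketch; assumes the build directory is named "build"
   $ cd build
   $ make check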
......@@ -238,8 +248,8 @@ Adding framework support
========================
After building and installing nGraph on your system, there are two likely
paths for what you'll want to do next: either compile a framework to run a
DL model, or load an import of an "already-trained" model for inference
paths for what you'll want to do next: either compile a framework to run a DL
training model, or load an import of an "already-trained" model for inference
on an Intel nGraph-enabled backend.
For the former case, this early |version|, :doc:`frameworks/index`,
......
.. howto/index:
Constructing Graphs
Constructing graphs
===================
.. toctree::
......
......@@ -62,8 +62,8 @@ descriptions:
:escape: ~
``ngraph``, The Intel nGraph C++ API, `ngraph`_, Implicit namespace omitted from most API documentation
``builder``, "Convenience functions that create additional graph nodes to implement commonly-used recipes; for example, auto-broadcast", `builder`_, " "
``descriptor``, Descriptors are compile-time representations of objects that will appear at run-time, `descriptor`_, " "
``builder``, "Convenience functions that create additional graph nodes to implement commonly-used recipes; for example, auto-broadcast", `builder`_, Coming Soon
``descriptor``, Descriptors are compile-time representations of objects that will appear at run-time, `descriptor`_, Coming Soon
``op``, Ops used in graph construction, `op`_, :doc:`../ops/index`
``runtime``, The objects and methods used for executing the graph, `runtime`_, :doc:`../backend-support/cpp-api`
......
.. core/passes/list-of-passes:
List of passes
==============
##############
The kinds of compiler passes available can be broken down into different buckets:
Graph Optimization Passes
=========================
.. csv-table::
:header: "Pass Name", "More Detail"
:header: "Graph Optimization Passes", "More Detail"
:widths: 29, 31
:escape: ~
``AlgebraicSimplification``, :ref:`algebraic_simpl`
``AssignLayout``, Coming Soon
``CallGraphPass``, Coming Soon
``CommonFunctionCollection``, Coming Soon
``CommonSubexpressionElimination``, :ref:`common_subex_elim`
``ConstantFolding``, :ref:`constant_fold`
``CoreFusion``, Coming Soon
``DumpSorted``, Coming Soon
``FunctionPass``, Coming Soon
``GetOutputElementElimination``, Coming Soon
``GraphRewrite``, Coming Soon
``LikeReplacement``, Coming Soon
``Liveness``, Coming Soon
``Manager``, Coming Soon
``ManagerState``, Coming Soon
``MemoryLayout``, Coming Soon
``MemoryManager``, Coming Soon
``MemoryVisualize``, Coming Soon
``ModulePass``, Coming Soon
``NodePass``, Coming Soon
``NopElimination``, Coming Soon
``PassBase``, Coming Soon
``PassConfig``, Coming Soon
``PrefixReshapeElimination``, Coming Soon
``PropagateCacheability``, Coming Soon
``RecurrentGraphRewrite``, Coming Soon
``CoreFusion``, :ref:`core_fusion`
``ReshapeElimination``, :ref:`reshape_transpose_elim`
``ReshapeSinking``, :ref:`reshape_transpose_sink`
``Serialization``, Coming Soon
``ValidateGraph``, Coming Soon
``VisualizeTree``, Coming Soon
``ZeroDimTensorElimination``, Coming Soon
Node Optimization Passes
========================
.. csv-table::
:header: "Node Optimization Passes", "More Detail"
:widths: 29, 31
:escape: ~
``NopElimination``, ""
``ZeroDimTensorElimination``, ""
Memory Assignment Passes
========================
.. csv-table::
:header: "Memory Assignment Passes", "More Detail"
:widths: 29, 31
:escape: ~
``AssignLayout``, ""
``Liveness``, ""
``MemoryLayout``, ""
``PropagateCacheability``, ""
Codegen Passes
==============
.. csv-table::
:header: "Codegen Passes", "More Detail"
:widths: 29, 31
:escape: ~
``CommonFunctionCollection``, ""
Debug Passes
============
.. csv-table::
:header: "Debug Passes", "More Detail"
:widths: 29, 31
:escape: ~
``DumpSorted``, ""
``MemoryVisualize``, ""
``Serialization``, ""
``VisualizeTree``, ""
Maintenance Passes
==================
.. csv-table::
:header: "Maintenance Passes", "More Detail"
:widths: 29, 31
:escape: ~
``GetOutputElementElimination``, ""
``LikeReplacement``, ""
``ValidateGraph``, ""
.. important:: All of the above passes are currently implementable; more
......
......@@ -12,20 +12,32 @@ Compiler passes
Overview: Generic graph optimization passes
-------------------------------------------
Overview
--------
*Generic graph optimization passes*
This section discusses how to use nGraph to create a Pass Manager for your
backend, and provides both a simple and a complex example to follow.
The pass manager infrastructure in nGraph makes it easy to reuse and mix the
generic optimization passes. It also permits you to roll your own device-specific
optimizations; that is, the same unified interface and APIs may be used to
cover both cases.
Invoking these passes is fairly straightforward:
Invoking these passes is fairly straightforward, as illustrated by the
following steps and the code below.
#. Create a "pass manager" object (line 1)
#. Populate it with the desired pass or passes (lines 2-4)
#. Invoke the pass manager with a pointer to your unoptimized graph, and
it will return a pointer to an optimized graph (lines 5-6)
#. Create a "pass manager" object.
#. Populate it with the desired pass(es).
#. Invoke the pass manager with a pointer to your unoptimized graph; it will
   return a pointer to an optimized graph.
.. literalinclude:: ../../../../../test/cpu_fusion.cpp
:language: cpp
:lines: 2085-2092
:linenos:
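
If the test excerpt above isn't handy, the same three steps look roughly like
the following standalone sketch (the header paths and the particular passes
registered here are assumptions; substitute whichever passes your backend
needs):

.. code-block:: cpp

   #include <memory>
   #include <ngraph/pass/manager.hpp>
   #include <ngraph/pass/algebraic_simplification.hpp>
   #include <ngraph/pass/nop_elimination.hpp>

   using namespace ngraph;

   void optimize(std::shared_ptr<Function> func)
   {
       // 1. Create a "pass manager" object
       pass::Manager pass_manager;

       // 2. Populate it with the desired passes
       pass_manager.register_pass<pass::NopElimination>();
       pass_manager.register_pass<pass::AlgebraicSimplification>();

       // 3. Invoke the pass manager on the unoptimized graph; the
       //    function it points to is updated with the optimized graph
       pass_manager.run_passes(func);
   }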
nGraph Core includes a large library of hardware-agnostic passes useful
for almost any kind of hardware backend. Some of these passes are likely familiar
......@@ -33,29 +45,67 @@ to people who are comfortable with classical compiler designs. Others, like the
reshape/transpose elimination and sinking passes, are quite specific to deep
learning.
Example of Passes
A simple example
----------------
Here's a fairly straightforward function graph with four ops:
:doc:`../../ops/convolution`, :doc:`../../ops/broadcast`, :doc:`../../ops/add`,
and :doc:`../../ops/relu`. With nGraph, backends have the ability to rewrite the
graph in ways that are specific to the underlying device/hardware's capabilities.
When, for example, the device is an Intel® Architecture :abbr:`IA (Intel® Architecture)`
CPU, it can support a fused ``ConvolutionBiasReLU`` kernel. The backend is able
to rewrite the graph into its own custom ops that more closely match the
hardware-specific primitives; here they get matched via Intel® MKL-DNN.
.. _figure-simple-compiler:
.. figure:: ../../graphics/simple-compiler-passes.png
:width: 750px
:alt: Simple kernel fusion
Figure A: On the left side of *Figure A* is a fully-formed function
graph prior to fusion. After graph rewrite, the CPU implements a number of
custom fusions.
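
As a rough sketch, the pre-fusion graph on the left of *Figure A* could be
constructed along these lines (the shapes, the default convolution strides,
and the broadcast axes here are illustrative assumptions, not values taken
from the figure):

.. code-block:: cpp

   #include <memory>
   #include <ngraph/ngraph.hpp>

   using namespace ngraph;

   // Parameters: an image batch, convolution filters, and a per-channel bias
   auto data    = std::make_shared<op::Parameter>(element::f32, Shape{1, 3, 224, 224});
   auto filters = std::make_shared<op::Parameter>(element::f32, Shape{64, 3, 7, 7});
   auto bias    = std::make_shared<op::Parameter>(element::f32, Shape{64});

   // Convolution -> Broadcast (bias) -> Add -> Relu
   auto conv  = std::make_shared<op::Convolution>(data, filters);
   auto bcast = std::make_shared<op::Broadcast>(bias, conv->get_shape(),
                                                AxisSet{0, 2, 3});
   auto sum   = std::make_shared<op::Add>(conv, bcast);
   auto relu  = std::make_shared<op::Relu>(sum);

   auto f = std::make_shared<Function>(NodeVector{relu},
                                       ParameterVector{data, filters, bias});

A CPU-specific fusion pass can then match this ``Convolution``/``Add``/``Relu``
pattern and replace it with a single fused kernel, as shown on the right side
of the figure.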
A complex example
-----------------
The effectiveness of graph-level optimization with nGraph is more striking to look
at in terms of an actual input graph, such as one from the framework bridge.
at in terms of an actual input graph, such as one from the framework bridge. Here
is a slightly more complicated example, drawn from a topology called MobileNet,
which makes heavy use of group convolution.
In group convolution, sometimes called depthwise convolution, a batch's different
feature channels get divided into groups that are processed independently, rather
than every convolution kernel seeing all of the input feature channels.
With "Group Convolution Fusion", it is possible to optimize a subgraph that has
implemented group convolution by many instances of "ordinary" convolution.
*Figure B* shows an excerpt from ``MobileNet v1``, a topology which makes heavy
use of group convolution. Here, an image batch and a filter batch first undergo
a "preprocessing" phase where segments along the channel axis are sliced out:
one per channel group. Next, there are separate convolutions on each channel
group before finally concatenating the result back together.
*Figure A* shows an excerpt from ``MobileNet v1``, a topology which makes heavy
use of group convolution.
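
To make that pattern concrete, here is a hedged sketch of how a bridge might
emit one such grouped-convolution complex as per-group slices, ordinary
convolutions, and a concatenation (the group count, shapes, and channel layout
are illustrative assumptions); this is exactly the kind of subgraph that Group
Convolution Fusion can later collapse into a single node:

.. code-block:: cpp

   #include <memory>
   #include <ngraph/ngraph.hpp>

   using namespace ngraph;

   // Illustrative shapes: 32 input channels split into 4 groups of 8,
   // producing 8 output channels per group.
   const size_t groups = 4;
   auto data    = std::make_shared<op::Parameter>(element::f32, Shape{1, 32, 56, 56});
   auto filters = std::make_shared<op::Parameter>(element::f32, Shape{32, 8, 3, 3});

   NodeVector group_convs;
   for (size_t g = 0; g < groups; ++g)
   {
       // Slice one channel group out of the data and the filter batch
       auto data_slice = std::make_shared<op::Slice>(
           data, Coordinate{0, g * 8, 0, 0}, Coordinate{1, (g + 1) * 8, 56, 56});
       auto filter_slice = std::make_shared<op::Slice>(
           filters, Coordinate{g * 8, 0, 0, 0}, Coordinate{(g + 1) * 8, 8, 3, 3});

       // An "ordinary" convolution restricted to this group
       group_convs.push_back(
           std::make_shared<op::Convolution>(data_slice, filter_slice));
   }

   // Concatenate the per-group results along the channel axis
   auto result = std::make_shared<op::Concat>(group_convs, 1);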
.. _figure-mobilenet-gc:
.. figure:: ../../graphics/mobilenet-group-conv.png
:width: 700px
:alt:
:alt: MobileNet example
Figure A: Each of these grouped convolution complexes -- the
Figure B: Each of these grouped convolution complexes -- the
operations within the rectangles on the left -- is very wide; each is too
wide to fit legibly on the illustration.
wide to fit legibly on the illustration.
The group convolution fusion is able to replace each of those giant subgraphs
with a single CPU group convolution node. This ends up being a win in several
ways:
with a single CPU group convolution node. This ends up being beneficial in
several ways:
* sheer node count,
* mappability to MKL-DNN (which has an accelerated group convolution implementation),
* elimination of unnecessary temporaries, and so on.
\ No newline at end of file
* Reduces the sheer node count,
* Provides mappability to MKL-DNN, which has an accelerated group convolution implementation, and
* Eliminates unnecessary temporary nodes.
\ No newline at end of file
.. distr/index.rst:
##############################
Distributed Training in nGraph
##############################
################################
Distributed training with nGraph
################################
.. important:: Distributed training is not officially supported in version |version|;
however, some configuration options have worked for nGraph devices with mixed or
limited success in testing environments.
.. important:: Distributed training is not officially supported in version
|version|; however, some configuration options have worked for nGraph devices
with mixed or limited success in testing environments.
Why distributed training?
......@@ -47,7 +47,8 @@ distributed training. Deployments using nGraph Library with supported backends
can be configured to train with data parallelism and will soon work with model
parallelism. Distributing workloads is increasingly important, as more data and
bigger models mean the ability to :doc:`../core/constructing-graphs/distribute-train`
work with larger and larger datasets, or to work with models having many layers that aren't designed to fit to a single device.
work with larger and larger datasets, or to work with models having many layers
that aren't designed to fit on a single device.
Distributed training with data parallelism splits the data and each worker
node has the same model; during each iteration, the gradients are aggregated
......
......@@ -32,7 +32,7 @@ Glossary
function graph
The Intel nGraph Library uses a function graph to represent an
The nGraph Library uses a function graph to represent an
``op``'s parameters and results.
fusion
......
......@@ -51,7 +51,7 @@ nGraph Compiler stack
.. toctree::
:maxdepth: 1
:caption: Backend support
:caption: Backend Support
backend-support/index.rst
backend-support/cpp-api.rst
......@@ -66,7 +66,7 @@ nGraph Compiler stack
.. toctree::
:maxdepth: 1
:caption: Diagnostics and visualization
:caption: Diagnostics and Visualization
diagnostics/nbench.rst
diagnostics/performance-profile.rst
......@@ -86,10 +86,10 @@ nGraph Compiler stack
project/release-notes.rst
project/contribution-guide.rst
project/governance.rst
project/doc-contributor-README.rst
project/index.rst
glossary.rst
project/doc-contributor-README.rst
Indices and tables
==================
......
......@@ -15,11 +15,10 @@
.. limitations under the License.
.. ---------------------------------------------------------------------------
Contributing Documentation
==========================
Contributing to documentation
=============================
Read this for changes affecting anything in ``ngraph/doc``
----------------------------------------------------------
.. important:: Read this for changes affecting **anything** in ``ngraph/doc``
For updates to the Intel® nGraph Library ``/doc`` repo, please submit a PR with
any changes or ideas you'd like integrated. This helps us maintain trackability
......