Commit 22ea1f95 authored by L.S. Cook's avatar L.S. Cook Committed by Scott Cyphers

Leona/doc cleanup 2 (#946)

* doc updates

* test add section on transformers to graph basics

* Fix typo on abs

* Adding more background and detail for graph-building concepts unique to nGraph

* First pass at updating nGraph basics for StackOverflow kinds of questions

* Forgot to add a file

* Update for new naming and capitalization conventions

* add edits from first PR review

* More updates from PR review
parent e30b3c61
...@@ -4,77 +4,9 @@
Integrate Supported Frameworks
###############################

* :ref:`mxnet_intg`
* :ref:`tensorflow_intg`
* :ref:`neon_intg`
.. _neon_intg:
neon |trade|
============
Use ``neon`` as a frontend for nGraph backends
-----------------------------------------------
``neon`` is an open source Deep Learning framework that has a history
of `being the fastest`_ framework `for training CNN-based models with GPUs`_.
Detailed info about neon's features and functionality can be found in the
`neon docs`_. This section covers installing neon on an existing
system that already has an ``ngraph_dist`` installed.
.. important:: The numbered instructions below pick up from where
the :doc:`install` instructions left off, and they presume that your system
already has the ngraph library installed at ``$HOME/ngraph_dist``
as the default location. If the |nGl| code has not yet been installed to
your system, you can follow the instructions on the `ngraph-neon python README`_
to install everything at once.
#. Set the ``NGRAPH_CPP_BUILD_PATH`` and the ``LD_LIBRARY_PATH`` path to the
location where you built the nGraph libraries. (This example shows the default
location):
.. code-block:: bash
export NGRAPH_CPP_BUILD_PATH=$HOME/ngraph_dist/
export LD_LIBRARY_PATH=$HOME/ngraph_dist/lib/
#. The neon framework uses the :command:`pip` package manager during installation;
install it with Python version 3.5 or higher:
.. code-block:: console
$ sudo apt-get install python3-pip python3-venv
$ python3 -m venv frameworks
$ cd frameworks
$ . bin/activate
(frameworks) ~/frameworks$
#. Go to the "python" subdirectory of the ``ngraph`` repo we cloned during the
previous :doc:`install`, and complete these actions:
.. code-block:: console
(frameworks)$ cd /opt/libraries/ngraph/python
(frameworks)$ git clone --recursive -b allow-nonconstructible-holders https://github.com/jagerman/pybind11.git
(frameworks)$ export PYBIND_HEADERS_PATH=/opt/libraries/ngraph/python/pybind11
(frameworks)$ pip install -U .
#. Finally we're ready to install the `neon` integration:
.. code-block:: console
(frameworks)$ git clone git@github.com:NervanaSystems/ngraph-neon
(frameworks)$ cd ngraph-neon
(frameworks)$ make install
#. To test a training example, you can run the following from ``ngraph-neon/examples/cifar10``:
.. code-block:: console
(frameworks)$ python cifar10_conv.py
.. _mxnet_intg:
...@@ -172,6 +104,77 @@ See the `ngraph tensorflow bridge README`_ for how to install the
nGraph-TensorFlow bridge.
.. _neon_intg:
neon |trade|
============
Use ``neon`` as a frontend for nGraph backends
-----------------------------------------------
``neon`` is an open source Deep Learning framework that has a history
of `being the fastest`_ framework `for training CNN-based models with GPUs`_.
Detailed info about neon's features and functionality can be found in the
`neon docs`_. This section covers installing neon on an existing
system that already has an ``ngraph_dist`` installed.
.. important:: The numbered instructions below pick up from where
the :doc:`install` instructions left off, and they presume that your system
already has the ngraph library installed at ``$HOME/ngraph_dist``
as the default location. If the |nGl| code has not yet been installed to
your system, you can follow the instructions on the `ngraph-neon python README`_
to install everything at once.
#. Set the ``NGRAPH_CPP_BUILD_PATH`` and the ``LD_LIBRARY_PATH`` path to the
location where you built the nGraph libraries. (This example shows the default
location):
.. code-block:: bash
export NGRAPH_CPP_BUILD_PATH=$HOME/ngraph_dist/
export LD_LIBRARY_PATH=$HOME/ngraph_dist/lib/
#. The neon framework uses the :command:`pip` package manager during installation;
install it with Python version 3.5 or higher:
.. code-block:: console
$ sudo apt-get install python3-pip python3-venv
$ python3 -m venv frameworks
$ cd frameworks
$ . bin/activate
(frameworks) ~/frameworks$
#. Go to the "python" subdirectory of the ``ngraph`` repo we cloned during the
previous :doc:`install`, and complete these actions:
.. code-block:: console
(frameworks)$ cd /opt/libraries/ngraph/python
(frameworks)$ git clone --recursive -b allow-nonconstructible-holders https://github.com/jagerman/pybind11.git
(frameworks)$ export PYBIND_HEADERS_PATH=/opt/libraries/ngraph/python/pybind11
(frameworks)$ pip install -U .
#. Finally we're ready to install the `neon` integration:
.. code-block:: console
(frameworks)$ git clone git@github.com:NervanaSystems/ngraph-neon
(frameworks)$ cd ngraph-neon
(frameworks)$ make install
#. To test a training example, you can run the following from ``ngraph-neon/examples/cifar10``:
.. code-block:: console
(frameworks)$ python cifar10_conv.py
.. _MXNet: http://mxnet.incubator.apache.org
.. _DSO: http://csweb.cs.wfu.edu/%7Etorgerse/Kokua/More_SGI/007-2360-010/sgi_html/ch03.html
.. _ngraph-neon python README: https://github.com/NervanaSystems/ngraph/blob/master/python/README.md
......
...@@ -17,6 +17,12 @@ Glossary
A component of nGraph that acts as a backend for a framework,
allowing the framework to define and execute computations.
data-flow graph
Data-flow graphs are used to implement deep learning models. In
a data-flow graph, nodes represent operations on data and edges
represent data flowing between those operations.
framework
A machine learning environment, such as TensorFlow, MXNet, or
......
.. graph-basics:

#############
Graph Basics
#############

Overview
========

This section provides a brief overview of some concepts used in the nGraph
Library. It also introduces new ideas regarding our unique departure from the
first generation of deep learning software design.

The historical dominance of GPUs at the beginning of the current
:abbr:`DL (Deep Learning)` boom means that many framework authors made
GPU-specific design decisions at a very deep level. Those assumptions created
an "ecosystem" of frameworks that all behave essentially the same at the
framework's hardware abstraction layer:

* The framework expects to own memory allocation.
* The framework expects the execution device to be a GPU.
* The framework expects complete control of the GPU, and that the device
  doesn't need to be shared.
* The framework expects that developers will write things in a `SIMT-friendly`_
  manner, thus requiring only a limited set of data layout conventions.
Some of these design decisions have implications that do not translate well to
the newer or more demanding generation of **adaptable software**. For example,
most frameworks that expect full control of the GPU devices experience their
own per-device inefficiency for resource utilization whenever the system
encounters a bottleneck.
If a model requires an operation that has not been implemented on the GPU,
execution must wait for copies to propagate between the CPU and the GPU(s),
and this inefficiency slows down the whole system. Most framework owners will
tell you to refactor the model to remove the unimplemented operation, rather
than attempt to run multiple models in parallel or to figure out how to build
graphs more efficiently. For data scientists facing a large curve of
uncertainty about how large (or how small) their model's compute-power needs
will be, investing heavily in frameworks reliant upon GPUs may not be the best
decision.
Meanwhile, the shift toward greater diversity in deep learning **hardware devices**
requires that these assumptions be revisited. Incorporating direct support for
all of the different hardware targets out there, each of which has its own
preferences when it comes to the above factors, is a very heavy burden
on framework owners.
Adding the nGraph compiler to the system lightens that burden by raising the
abstraction level, and by letting any hardware-specific backends make these
decisions automatically. The nGraph Compiler is designed to be able to take into
account the needs of each target hardware platform, and to achieve maximum
performance.
This makes things easier for framework owners and also, as new models are
developed, for data scientists, who no longer need to keep nearly as many
low-level hardware details in mind when architecting complex models for
anything other than :abbr:`JIT (Just-in-Time)` compilation.
While first-generation frameworks tended to need a trade-off between being
"specialized" and being "adaptable" (the trade-off between training and
inference), the nGraph Library permits algorithms implemented in a DNN to be
both specialized and adaptable. The new generation of software design in and
around AI ecosystems can and should be much more flexible.
* :ref:`framework_bridges`
* :ref:`about_transformers`
* :ref:`graph_shaping`
.. _framework_bridges:
Framework bridges
=================
In the nGraph ecosystem, a framework is what the data scientist uses to solve
a specific (and usually large-scale) deep learning computational problem with
the use of a high-level, data science-oriented language.
A framework :term:`bridge` is a software layer (typically a plugin *for* or an
extension *to* a framework) that translates the data science-oriented language
into a compute-oriented language called a :term:`data-flow graph`. The bridge
can then present the problem to the nGraph :abbr:`Abstraction Layer (AL)` which
is responsible for execution on an optimized backend by performing graph
transformations that replace subgraphs of the computation with more optimal
(in terms of machine code) subgraphs. Throughout this process, ``ops`` represent
tensor operations.
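The data-flow idea above can be sketched in a few lines of C++. This is purely
illustrative: the ``Node`` type and ``eval`` function here are invented for the
example and are not part of the nGraph API. Nodes hold operations, edges carry
values between them, and evaluating the result node pulls data through the graph:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Illustrative only -- a toy data-flow node, not an nGraph class.
// Each node owns an operation; the `inputs` pointers are the graph's
// edges, and values "flow" along them when the graph is evaluated.
struct Node {
    std::function<double(const std::vector<double>&)> op;
    std::vector<std::shared_ptr<Node>> inputs;

    double eval() const {
        std::vector<double> in;
        for (const auto& n : inputs) in.push_back(n->eval());
        return op(in);
    }
};
```

With this sketch, an elementwise sum is just a node whose two input edges come
from the nodes producing its operands; the same structure scales to whole
models.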
Either the framework can provide its own graph of functions to be compiled and
optimized via :abbr:`AoT (Ahead-of-Time)` compilation and sent back to the
framework, or an entity (framework or user) who requires the flexibility of
shaping ops directly can use our graph construction functions to experiment
with building runtime APIs for their framework, thus exposing more flexible
multi-threaded compute power options to the framework.
See the section on :doc:`howto/execute` for a detailed walk-through describing
how this translation can be programmed to happen automatically via a framework.
.. _about_transformers:
Transformer ops
================
A framework bridge may define its own bridge-specific ops, as long as they can
be converted to transformer ops, usually by first converting them to core ops.
For example, if a framework has a ``PaddedCell`` op, nGraph pattern replacement
facilities can be used to convert it into one of our core ops. More detail on
transformer ops will be coming soon.
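The substitution step can be sketched as follows. This is a toy illustration
with invented names (``lower_bridge_ops`` and the ``"Pad"``/``"Cell"``
expansion are hypothetical), not nGraph's actual pattern facilities, which
operate on graph nodes rather than on strings:

```cpp
#include <string>
#include <vector>

// Illustrative only: replace a bridge-specific op ("PaddedCell") with a
// hypothetical sequence of core ops. Real pattern replacement matches
// subgraphs and rewires node edges; this just shows the substitution idea.
std::vector<std::string> lower_bridge_ops(const std::vector<std::string>& ops) {
    std::vector<std::string> out;
    for (const auto& op : ops) {
        if (op == "PaddedCell") {
            out.push_back("Pad");   // hypothetical core-op expansion
            out.push_back("Cell");  // hypothetical core-op expansion
        } else {
            out.push_back(op);
        }
    }
    return out;
}
```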
.. _graph_shaping:
Graph shaping
=============
Tensors
-------
...@@ -68,9 +164,9 @@ and results in a tensor with the same element type and shape:
(A+B)_I = A_I + B_I

Here, :math:`X_I` means the value of a coordinate :math:`I` for the tensor
:math:`X`. So the value of the sum of two tensors is a tensor whose value at a
coordinate is the sum of the two inputs' elements at that coordinate. Unlike
many frameworks, this says nothing about storage or arrays.
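A minimal sketch of the elementwise rule in plain C++ (illustrative only;
nGraph itself builds graph ops rather than looping over raw buffers, and the
``elementwise_add`` helper is invented for this example):

```cpp
#include <cassert>
#include <vector>

// Illustrative only: the elementwise sum of two same-shaped tensors,
// stored as flat buffers. The rule (A+B)_I = A_I + B_I holds at every
// coordinate I, regardless of how the tensor is laid out in memory.
std::vector<float> elementwise_add(const std::vector<float>& a,
                                   const std::vector<float>& b) {
    assert(a.size() == b.size());  // shapes must agree
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```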
An ``Add`` op is used to represent an elementwise tensor sum. To
construct an Add op, each of the two inputs of the ``Add`` must be
...@@ -117,8 +213,12 @@ corresponding to the array provided as the nth argument, and the outputs
of all result ops will be written into the result arrays in row-major
order.
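Row-major order means the last coordinate varies fastest; the offset
arithmetic can be sketched as follows (an illustrative helper, not an nGraph
function):

```cpp
#include <cstddef>

// Illustrative only: flat offset of coordinate (i, j) in a row-major
// matrix with `cols` columns. Element (i, j) lands at i * cols + j,
// so the column index j varies fastest as the offset increases.
std::size_t row_major_offset(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}
```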
An Example
==========

::

...@@ -142,6 +242,7 @@ An Example
auto f = std::make_shared<Function>(Nodes{t1}, Parameters{a, b, c});
}

We use shared pointers for all ops. For each parameter, we need to specify
element type and shape attributes. When the function is called, each
argument must conform to the corresponding parameter element type and
...@@ -164,3 +265,5 @@ After the graph is constructed, we create the function, passing the
`Function` constructor the nodes that are results and the parameters
that are arguments.
.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
\ No newline at end of file
...@@ -4,6 +4,10 @@
Install
########
* :ref:`ubuntu`
* :ref:`centos`
Build Environments
==================
...@@ -20,10 +24,10 @@ with the following packages and prerequisites:
Clear Linux\* OS for Intel Architecture, Clang 5.0.1, CMake 3.10.2, experimental, bundles ``machine-learning-basic dev-utils python3-basic python-basic-dev``
Other configurations may work, but should be considered experimental with
limited support. On Ubuntu 16.04 with ``gcc-5.4.0`` or ``clang-3.9``, for
example, we recommend adding ``-DNGRAPH_USE_PREBUILT_LLVM=TRUE`` to the
:command:`cmake` command in step 4 below. This fetches a pre-built tarball of
LLVM+Clang from llvm.org, and it will substantially reduce build time.
If using ``gcc`` version 4.8, it may be necessary to add symlinks from ``gcc``
to ``gcc-4.8``, and from ``g++`` to ``g++-4.8``, in your :envvar:`PATH`, even
...@@ -40,13 +44,10 @@ The CMake procedure installs ``ngraph_dist`` to the installing user's ``$HOME``
directory as the default location. See the :file:`CMakeLists.txt` file for
details about how to change or customize the install location.
.. _ubuntu:

Ubuntu 16.04
------------

The process documented here will work on Ubuntu\* 16.04 (LTS)
...@@ -77,7 +78,7 @@ The process documented here will work on Ubuntu\* 16.04 (LTS)
$ mkdir build && cd build
#. Generate the GNU Makefiles in the customary manner (from within the
``build`` directory). If running ``gcc-5.4.0`` or ``clang-3.9``, remember
that you can also append ``cmake`` with the prebuilt LLVM option to
speed up the build. Another option if your deployment system has Intel®
...@@ -87,7 +88,7 @@ The process documented here will work on Ubuntu\* 16.04 (LTS)

.. code-block:: console

$ cmake ../ [-DNGRAPH_USE_PREBUILT_LLVM=TRUE] [-DNGRAPH_TARGET_ARCH=skylake-avx512]
#. Run ``$ make`` and ``make install`` to install ``libngraph.so`` and the
header files to ``$HOME/ngraph_dist``:
...@@ -100,11 +101,15 @@ The process documented here will work on Ubuntu\* 16.04 (LTS)
#. (Optional, requires `doxygen`_, `Sphinx`_, and `breathe`_). Run ``make html``
inside the ``doc/sphinx`` directory of the cloned source to build a copy of
the `website docs`_ locally. The low-level API docs with inheritance and
collaboration diagrams can be found inside the ``/docs/doxygen/`` directory.
See the :doc:`project/doc-contributor-README` for more details about how to
build documentation for nGraph.
.. _centos:
CentOS 7.4
----------

The process documented here will work on CentOS 7.4.
...@@ -138,23 +143,26 @@ The process documented here will work on CentOS 7.4.
$ ./bootstrap
$ make && sudo make install
#. Clone the `NervanaSystems` ``ngraph`` repo via SSH and use CMake 3.4.3 to
install the nGraph libraries to ``$HOME/ngraph_dist``. Another option, if your
deployment system has Intel® Advanced Vector Extensions (Intel® AVX), is to
target the accelerations available directly by compiling the build as follows
during the cmake step: ``-DNGRAPH_TARGET_ARCH=skylake-avx512``.
.. code-block:: console

$ cd /opt/libraries
$ git clone https://github.com/NervanaSystems/ngraph.git
$ cd ngraph && mkdir build && cd build
$ cmake ../ [-DNGRAPH_TARGET_ARCH=skylake-avx512]
$ make && sudo make install
macOS\* development
--------------------

.. note:: Although we do not currently offer full support for the macOS
   platform, some configurations and features may work.
The repository includes two scripts (``maint/check-code-format.sh`` and
``maint/apply-code-format.sh``) that are used respectively to check adherence
...@@ -203,9 +211,10 @@ on an Intel nGraph-enabled backend.
For the former case, this early |version|, :doc:`framework-integration-guides`,
can help you get started with training a model on a supported framework:

* :doc:`MXNet<framework-integration-guides>` framework,
* :doc:`TensorFlow<framework-integration-guides>` framework, and
* :doc:`neon<framework-integration-guides>` framework.
For the latter case, if you've followed a tutorial from `ONNX`_, and you have an
exported, serialized model, you can skip the section on frameworks and go directly
......
...@@ -13,7 +13,7 @@ Description
===========
Produces a single output tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the absolute value of the
value at the corresponding ``arg`` coordinate.

Inputs
......
...@@ -4,8 +4,8 @@
BatchNorm
#########

NOTE: This describes what the ``BatchNorm`` op should look like. The current
version will be made a CPU transformer op.
.. code-block:: cpp

...@@ -20,27 +20,27 @@ Produces a normalized output.
Inputs
------
+---------------------+-------------------------+-----------------------------+
| Name                | Element Type            | Shape                       |
+=====================+=========================+=============================+
| ``input``           | same as ``gamma``       | \(..., C, ...\)             |
+---------------------+-------------------------+-----------------------------+
| ``gamma``           | any                     | \(C\)                       |
+---------------------+-------------------------+-----------------------------+
| ``beta``            | same as ``gamma``       | \(C\)                       |
+---------------------+-------------------------+-----------------------------+
| ``global_mean``     | same as ``gamma``       | \(C\)                       |
+---------------------+-------------------------+-----------------------------+
| ``global_variance`` | same as ``gamma``       | \(C\)                       |
+---------------------+-------------------------+-----------------------------+
| ``use_global``      | ``bool``                | \(\)                        |
+---------------------+-------------------------+-----------------------------+
Attributes
----------

+------------------+--------------------+---------------------+
| Name             | Type               | Notes               |
+==================+====================+=====================+
| ``epsilon``      | same as ``input``  | Bias for variance   |
...@@ -50,15 +50,15 @@ Attributes
Outputs
-------

+---------------------+-------------------------+-----------------------------+
| Name                | Element Type            | Shape                       |
+=====================+=========================+=============================+
| ``normalized``      | same as ``gamma``       | same as ``input``           |
+---------------------+-------------------------+-----------------------------+
| ``batch_mean``      | same as ``gamma``       | \(C\)                       |
+---------------------+-------------------------+-----------------------------+
| ``batch_variance``  | same as ``gamma``       | \(C\)                       |
+---------------------+-------------------------+-----------------------------+
The ``batch_mean`` and ``batch_variance`` are computed per-channel from ``input``.
The values only need to be computed if ``use_global`` is ``false`` or they are used.
......
...@@ -12,9 +12,8 @@ Concat
Description
===========
Produces a single output tensor formed by concatenating the ``Nodes`` of
``args`` along the given axis; all inputs must have the same element type,
and their shapes must agree on every axis except the concatenation axis.

Inputs
------
......