Commit 22ea1f95 authored by L.S. Cook, committed by Scott Cyphers

Leona/doc cleanup 2 (#946)

* doc updates

* test add section on transformers to graph basics

* Fix typo on abs

* Adding more background and detail for graph-building concepts unique to nGraph

* First pass at updating nGraph basics for StackOverflow kinds of questions

* Forgot to add a file

* Update for new naming and capitalization conventions

* add edits from first PR review

* More updates from PR review
parent e30b3c61
......@@ -4,77 +4,9 @@
Integrate Supported Frameworks
###############################
* :ref:`neon_intg`
* :ref:`mxnet_intg`
* :ref:`tensorflow_intg`
.. _neon_intg:
neon |trade|
============
Use ``neon`` as a frontend for nGraph backends
-----------------------------------------------
``neon`` is an open source Deep Learning framework that has a history
of `being the fastest`_ framework `for training CNN-based models with GPUs`_.
Detailed info about neon's features and functionality can be found in the
`neon docs`_. This section covers installing neon on an existing
system that already has an ``ngraph_dist`` installed.
.. important:: The numbered instructions below pick up from where
the :doc:`install` instructions left off, and they presume that your system
already has the ngraph library installed at ``$HOME/ngraph_dist``
as the default location. If the |nGl| code has not yet been installed to
your system, you can follow the instructions on the `ngraph-neon python README`_
to install everything at once.
#. Set the ``NGRAPH_CPP_BUILD_PATH`` and the ``LD_LIBRARY_PATH`` path to the
location where you built the nGraph libraries. (This example shows the default
location):
.. code-block:: bash
export NGRAPH_CPP_BUILD_PATH=$HOME/ngraph_dist/
export LD_LIBRARY_PATH=$HOME/ngraph_dist/lib/
#. The neon framework uses the :command:`pip` package manager during installation;
install it with Python version 3.5 or higher:
.. code-block:: console
$ sudo apt-get install python3-pip python3-venv
$ python3 -m venv frameworks
$ cd frameworks
$ . bin/activate
(frameworks) ~/frameworks$
#. Go to the "python" subdirectory of the ``ngraph`` repo we cloned during the
previous :doc:`install`, and complete these actions:
.. code-block:: console
(frameworks)$ cd /opt/libraries/ngraph/python
(frameworks)$ git clone --recursive -b allow-nonconstructible-holders https://github.com/jagerman/pybind11.git
(frameworks)$ export PYBIND_HEADERS_PATH=/opt/libraries/ngraph/python/pybind11
(frameworks)$ pip install -U .
#. Finally we're ready to install the `neon` integration:
.. code-block:: console
(frameworks)$ git clone git@github.com:NervanaSystems/ngraph-neon
(frameworks)$ cd ngraph-neon
(frameworks)$ make install
#. To test a training example, you can run the following from ``ngraph-neon/examples/cifar10``
.. code-block:: console
(frameworks)$ python cifar10_conv.py
* :ref:`neon_intg`
.. _mxnet_intg:
......@@ -172,6 +104,77 @@ See the `ngraph tensorflow bridge README`_ for how to install the
nGraph-TensorFlow bridge.
.. _neon_intg:
neon |trade|
============
Use ``neon`` as a frontend for nGraph backends
-----------------------------------------------
``neon`` is an open source Deep Learning framework that has a history
of `being the fastest`_ framework `for training CNN-based models with GPUs`_.
Detailed info about neon's features and functionality can be found in the
`neon docs`_. This section covers installing neon on an existing
system that already has an ``ngraph_dist`` installed.
.. important:: The numbered instructions below pick up from where
the :doc:`install` instructions left off, and they presume that your system
already has the ngraph library installed at ``$HOME/ngraph_dist``
as the default location. If the |nGl| code has not yet been installed to
your system, you can follow the instructions on the `ngraph-neon python README`_
to install everything at once.
#. Set the ``NGRAPH_CPP_BUILD_PATH`` and the ``LD_LIBRARY_PATH`` path to the
location where you built the nGraph libraries. (This example shows the default
location):
.. code-block:: bash
export NGRAPH_CPP_BUILD_PATH=$HOME/ngraph_dist/
export LD_LIBRARY_PATH=$HOME/ngraph_dist/lib/
#. The neon framework uses the :command:`pip` package manager during installation;
install it with Python version 3.5 or higher:
.. code-block:: console
$ sudo apt-get install python3-pip python3-venv
$ python3 -m venv frameworks
$ cd frameworks
$ . bin/activate
(frameworks) ~/frameworks$
#. Go to the "python" subdirectory of the ``ngraph`` repo we cloned during the
previous :doc:`install`, and complete these actions:
.. code-block:: console
(frameworks)$ cd /opt/libraries/ngraph/python
(frameworks)$ git clone --recursive -b allow-nonconstructible-holders https://github.com/jagerman/pybind11.git
(frameworks)$ export PYBIND_HEADERS_PATH=/opt/libraries/ngraph/python/pybind11
(frameworks)$ pip install -U .
#. Finally we're ready to install the `neon` integration:
.. code-block:: console
(frameworks)$ git clone git@github.com:NervanaSystems/ngraph-neon
(frameworks)$ cd ngraph-neon
(frameworks)$ make install
#. To test a training example, you can run the following from ``ngraph-neon/examples/cifar10``
.. code-block:: console
(frameworks)$ python cifar10_conv.py
.. _MXNet: http://mxnet.incubator.apache.org
.. _DSO: http://csweb.cs.wfu.edu/%7Etorgerse/Kokua/More_SGI/007-2360-010/sgi_html/ch03.html
.. _ngraph-neon python README: https://github.com/NervanaSystems/ngraph/blob/master/python/README.md
......
......@@ -17,6 +17,12 @@ Glossary
A component of nGraph that acts as a backend for a framework,
allowing the framework to define and execute computations.
data-flow graph
Data-flow graphs are used to implement deep learning models. In
a data-flow graph, nodes represent operations on data and edges
represent data flowing between those operations.
framework
A machine learning environment, such as TensorFlow, MXNet, or
......
.. _graph-basics:

#############
Graph Basics
#############
Overview
========
This section provides a brief overview of some concepts used in the nGraph
Library. It also introduces new ideas regarding our unique departure from the
first generation of deep learning software design.
The historical dominance of GPUs at the beginning of the current
:abbr:`DL (Deep Learning)` boom means that many framework authors made
GPU-specific design decisions at a very deep level. Those assumptions created
an "ecosystem" of frameworks that all behave essentially the same at the
framework's hardware abstraction layer:
This section describes the basic concepts you need to know when
constructing a graph.
* The framework expects to own memory allocation.
* The framework expects the execution device to be a GPU.
* The framework expects complete control of the GPU, and that the device doesn't
need to be shared.
* The framework expects that developers will write things in a `SIMT-friendly`_
manner, thus requiring only a limited set of data layout conventions.
Some of these design decisions have implications that do not translate well to
the newer or more demanding generation of **adaptable software**. For example,
most frameworks that expect full control of the GPU devices experience their
own per-device inefficiency for resource utilization whenever the system
encounters a bottleneck.
Framework Bridges
------------------
Most framework owners will tell you to refactor the model in order to remove the
unimplemented copy, rather than attempt to run multiple models in parallel, or
attempt to figure out how to build graphs more efficiently. In other words, if
a model requires any operation that hasn't been implemented on GPU, it must wait
for copies to propagate from the CPU to the GPU(s). An effect of this
inefficiency is that it slows down the system. For data scientists facing a
large curve of uncertainty about how large (or how small) the compute-power
needs of their model will be, investing heavily in frameworks reliant upon
GPUs may not be the best decision.
Frontends (or users who require the flexibility of constructing
Ops directly) can utilize a set of graph construction functions
to construct graphs.
Meanwhile, the shift toward greater diversity in deep learning **hardware devices**
requires that these assumptions be revisited. Incorporating direct support for
all of the different hardware targets out there, each of which has its own
preferences when it comes to the above factors, is a very heavy burden
on framework owners.
A framework bridge constructs a function which is compiled/optimized
by a sequence of graph transformations that replace subgraphs of the
computation with more optimal subgraphs. Throughout this process, ops
represent tensor operations.
Adding the nGraph compiler to the system lightens that burden by raising the
abstraction level, and by letting any hardware-specific backends make these
decisions automatically. The nGraph Compiler is designed to be able to take into
account the needs of each target hardware platform, and to achieve maximum
performance.
This makes things easier not only on framework owners, but also (as new models
are developed) on data scientists, who will not have to keep nearly as many
low-level hardware details in mind when architecting their models for anything
other than a :abbr:`Just-in-Time (JIT)` compilation.
While the first generation frameworks tended to need to make a tradeoff between
being "specialized" and "adaptable" (the trade-off between training and inference),
nGraph Library permits algorithms implemented in a DNN to be both specialized
and adaptable. The new generation of software design in and around AI ecosystems
can and should be much more flexible.
* :ref:`framework_bridges`
* :ref:`about_transformers`
* :ref:`graph_shaping`
.. _framework_bridges:
Framework bridges
=================
In the nGraph ecosystem, a framework is what the data scientist uses to solve
a specific (and usually large-scale) deep learning computational problem with
the use of a high-level, data science-oriented language.
A framework :term:`bridge` is a software layer (typically a plugin *for* or an
extension *to* a framework) that translates the data science-oriented language
into a compute-oriented language called a :term:`data-flow graph`. The bridge
can then present the problem to the nGraph :abbr:`Abstraction Layer (AL)` which
is responsible for execution on an optimized backend by performing graph
transformations that replace subgraphs of the computation with more optimal
(in terms of machine code) subgraphs. Throughout this process, ``ops`` represent
tensor operations.
Either the framework can provide its own graph of functions to be compiled and
optimized via :abbr:`Ahead-of-Time (AoT)` compilation to send back to the
framework, or an entity (framework or user) who requires the flexibility of
shaping ops directly can use our graph construction functions to experiment with
building runtime APIs for their framework, thus exposing more flexible
multi-threaded compute power options to the framework's users.
See the section on :doc:`howto/execute` for a detailed walk-through describing
how this translation can be programmed to happen automatically via a framework.
.. _about_transformers:
Transformer ops
================
A framework bridge may define its own bridge-specific ops, as long as they can be
converted to transformer ops. This is usually achieved by them first being
converted to core ops on the way. For example, if a framework has a
``PaddedCell`` op, nGraph pattern replacement facilities can be used to convert
it into one of our core ops. More detail on transformer ops will be coming soon.
.. _graph_shaping:
Graph shaping
=============
Tensors
-------
......@@ -68,9 +164,9 @@ and results in a tensor with the same element type and shape:
(A+B)_I = A_I + B_I
Here, :math:`X_I` means the value of a coordinate :math:`I` for the tensor
:math:`X`. So the value of sum of two tensors is a tensor whose value at a
coordinate is the sum of the elements are that coordinate for the two inputs.
Unlike many frameworks, it says nothing about storage or arrays.
:math:`X`. So the value of the sum of two tensors is a tensor whose value at a
coordinate is the sum of the elements at that coordinate for the two inputs.
Unlike many frameworks, it says nothing about storage or arrays.
An ``Add`` op is used to represent an elementwise tensor sum. To
construct an Add op, each of the two inputs of the ``Add`` must be
......@@ -117,8 +213,12 @@ corresponding to the array provided as the nth argument, and the outputs
of all result ops will be written into the result arrays in row-major
order.
An Example
----------
==========
::
......@@ -142,6 +242,7 @@ An Example
auto f = std::make_shared<Function>(Nodes{t1}, Parameters{a, b, c});
}
We use shared pointers for all ops. For each parameter, we need to specify
element type and shape attributes. When the function is called, each
argument must conform to the corresponding parameter element type and
......@@ -164,3 +265,5 @@ After the graph is constructed, we create the function, passing the
`Function` constructor the nodes that are results and the parameters
that are arguments.
.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
\ No newline at end of file
......@@ -4,6 +4,10 @@
Install
########
* :ref:`ubuntu`
* :ref:`centos`
Build Environments
==================
......@@ -20,10 +24,10 @@ with the following packages and prerequisites:
Clear Linux\* OS for Intel Architecture, Clang 5.0.1, CMake 3.10.2, experimental, bundles ``machine-learning-basic dev-utils python3-basic python-basic-dev``
Other configurations may work, but should be considered experimental with
limited support. On Ubuntu 16.04 with ``gcc-5.4.0`` or ``clang-3.9``, for
example, we recommend adding ``-DNGRAPH_USE_PREBUILT_LLVM=TRUE`` to the
:command:`cmake` command in step 4 below. This fetches a pre-built tarball
of LLVM+Clang from `llvm.org`_, and will substantially reduce build time.
limited support. On Ubuntu 16.04 with gcc-5.4.0 or clang-3.9, for example, we
recommend adding ``-DNGRAPH_USE_PREBUILT_LLVM=TRUE`` to the cmake command in
step 4 below. This fetches a pre-built tarball of LLVM+Clang from llvm.org,
and it will substantially reduce build time.
If using ``gcc`` version 4.8, it may be necessary to add symlinks from ``gcc``
to ``gcc-4.8``, and from ``g++`` to ``g++-4.8``, in your :envvar:`PATH`, even
......@@ -40,13 +44,10 @@ The CMake procedure installs ``ngraph_dist`` to the installing user's ``$HOME``
directory as the default location. See the :file:`CMakeLists.txt` file for
details about how to change or customize the install location.
The instructions below also presume cloning the nGraph source via an SSH-enabled
Github account. If you don't have SSH keys set up on your GitHub account, you can
still follow the instructions below and clone via HTTPS.
.. _ubuntu:
Ubuntu
------
Ubuntu 16.04
-------------
The process documented here will work on Ubuntu\* 16.04 (LTS)
......@@ -77,7 +78,7 @@ The process documented here will work on Ubuntu\* 16.04 (LTS)
$ mkdir build && cd build
#. Generate the GNUMakefiles in the customary manner (from within the
#. Generate the GNU Makefiles in the customary manner (from within the
``build`` directory). If running ``gcc-5.4.0`` or ``clang-3.9``, remember
that you can also append ``cmake`` with the prebuilt LLVM option to
speed-up the build. Another option if your deployment system has Intel®
......@@ -87,7 +88,7 @@ The process documented here will work on Ubuntu\* 16.04 (LTS)
.. code-block:: console
$ cmake ../ [-DNGRAPH_USE_PREBUILT_LLVM=TRUE]
$ cmake ../ [-DNGRAPH_USE_PREBUILT_LLVM=TRUE] [-DNGRAPH_TARGET_ARCH=skylake-avx512]
#. Run ``$ make`` and ``make install`` to install ``libngraph.so`` and the
header files to ``$HOME/ngraph_dist``:
......@@ -101,10 +102,14 @@ The process documented here will work on Ubuntu\* 16.04 (LTS)
inside the ``doc/sphinx`` directory of the cloned source to build a copy of
the `website docs`_ locally. The low-level API docs with inheritance and
collaboration diagrams can be found inside the ``/docs/doxygen/`` directory.
See the :doc:`project/doc-contributor-README` for more details about how to
build documentation for nGraph.
.. _centos:
CentOS
------
CentOS 7.4
-----------
The process documented here will work on CentOS 7.4.
......@@ -138,23 +143,26 @@ The process documented here will work on CentOS 7.4.
$ ./bootstrap
$ make && sudo make install
#. Clone the `NervanaSystems` ``ngraph`` repo and use Cmake 3.4.3 to
install the nGraph libraries to ``$HOME/ngraph_dist``.
#. Clone the `NervanaSystems` ``ngraph`` repo via SSH and use CMake 3.4.3 to
install the nGraph libraries to ``$HOME/ngraph_dist``. Another option, if your
deployment system has Intel® Advanced Vector Extensions (Intel® AVX), is to
target the available accelerations directly by compiling the build as follows
during the cmake step: ``-DNGRAPH_TARGET_ARCH=skylake-avx512``.
.. code-block:: console
$ cd /opt/libraries
$ git clone https://github.com/NervanaSystems/ngraph.git
$ cd ngraph && mkdir build && cd build
$ cmake ../
$ cmake ../ [-DNGRAPH_TARGET_ARCH=skylake-avx512]
$ make && sudo make install
macOS\* development
--------------------
.. note:: Although we do not offer support for the macOS platform; some
configurations and features may work.
.. note:: Although we do not currently offer full support for the macOS platform,
some configurations and features may work.
The repository includes two scripts (``maint/check-code-format.sh`` and
``maint/apply-code-format.sh``) that are used respectively to check adherence
......@@ -203,9 +211,10 @@ on an Intel nGraph-enabled backend.
For the former case, this early |version|, :doc:`framework-integration-guides`,
can help you get started with a training a model on a supported framework.
* :doc:`neon<framework-integration-guides>` framework,
* :doc:`MXNet<framework-integration-guides>` framework,
* :doc:`TensorFlow<framework-integration-guides>` framework, and
* :doc:`neon<framework-integration-guides>` framework,
For the latter case, if you've followed a tutorial from `ONNX`_, and you have an
exported, serialized model, you can skip the section on frameworks and go directly
......
......@@ -13,7 +13,7 @@ Description
===========
Produces a single output tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the absoloute value of the
where the value at each coordinate of ``output`` is the absolute value of the
value at each ``arg`` coordinate.
Inputs
......
......@@ -4,8 +4,8 @@
BatchNorm
#########
NOTE: This describes what the BatchNorm op should look like. The current version
will be made a CPU transformer op.
NOTE: This describes what the ``BatchNorm`` op should look like. The current
version will be made a CPU transformer op.
.. code-block:: cpp
......@@ -20,27 +20,27 @@ Produces a normalized output.
Inputs
------
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| Name | Element Type | Shape |
+=====================+=========================+================================+
+=====================+=========================+=============================+
| ``input`` | same as ``gamma`` | \(..., C, ...\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| ``gamma`` | any | \(C\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| ``beta`` | same as ``gamma`` | \(C\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| ``global_mean`` | same as ``gamma`` | \(C\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| ``global_variance`` | same as ``gamma`` | \(C\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| ``use_global`` | ``bool`` | \(\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
Attributes
----------
+-----------------+--------------------+----------------------+
+------------------+--------------------+---------------------+
| Name | Type | Notes |
+==================+====================+=====================+
| ``epsilon`` | same as ``input`` | Bias for variance |
......@@ -50,15 +50,15 @@ Attributes
Outputs
-------
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| Name | Element Type | Shape |
+=====================+=========================+================================+
+=====================+=========================+=============================+
| ``normalized`` | same as ``gamma`` | same as ``input`` |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| ``batch_mean`` | same as ``gamma`` | \(C\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
| ``batch_variance`` | same as ``gamma`` | \(C\) |
+---------------------+-------------------------+--------------------------------+
+---------------------+-------------------------+-----------------------------+
The ``batch_mean`` and ``batch_variance`` are computed per-channel from ``input``.
The values only need to be computed if ``use_global`` is ``false`` or they are used.
......
......@@ -12,9 +12,8 @@ Concat
Description
===========
Produces a single output tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the absoloute value of the
value at each ``arg`` coordinate.
Produces a single output tensor that is the concatenation of the ``args``
tensors along the specified axis; all inputs must have the same element type.
Inputs
------
......