Commit 00447a8e authored by Leona C, committed by Robert Kimball

Leona/doc distr (#2417)

* Update docs for PR 2353

* Fix missing flag designator

* Refer to the Intel® MLSL properly as a spelled-out acronym for first instance

* Fix missing compilation designator

* Finalize editing on CPU backend

* Update doc version to be even with what is in master branch

* Remove a typo from the doc build instructions: a stray ``cd`` accidentally appended to a command line

* Add PR feedback to doc
parent 9a6177f4
@@ -73,14 +73,15 @@ author = 'Intel Corporation'
# built documents.
#
# The short X.Y version.
-version = '0.15'
+version = '0.14'
# The Documentation full version, including alpha/beta/rc tags. Some features
# available in the latest code will not necessarily be documented first
-release = '0.15.0-rc0'
+release = '0.14.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
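As an illustrative aside (not part of this commit), the convention shown above keeps a short ``X.Y`` ``version`` alongside the full ``release`` string. The short form can be derived from the release string so the two values cannot drift apart; the helper name below is hypothetical, not something this ``conf.py`` defines:

```python
def short_version(release: str) -> str:
    """Derive the short X.Y Sphinx `version` from the full `release` string."""
    # '0.14.1' -> ['0', '14', '1'] -> '0.14'; pre-release suffixes such as
    # '0.15.0-rc0' are handled because only the first two dot-separated
    # fields are kept.
    return ".".join(release.split(".")[:2])

print(short_version("0.14.1"))      # -> 0.14
print(short_version("0.15.0-rc0"))  # -> 0.15
```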
.. howto/distribute-train.rst

-Train using multiple nGraph CPU backends with data parallel
-===========================================================
+Distribute training across multiple nGraph backends
+===================================================

In the :doc:`previous section <../howto/derive-for-training>`, we described the
steps needed to create a "trainable" nGraph model. Here we demonstrate how to
train a data parallel model by distributing the graph to more than one device.

-As of release version 0.12, the default build is with OpenMPI. To use the
-`Intel MLSL`_ library, set the following compilation flag at build time:
-``-DNGRAPH_DISTRIBUTED_ENABLE=TRUE``.
+These options are currently supported for available backends:
+
+* Use ``-DNGRAPH_DISTRIBUTED_OMPI_ENABLE=TRUE`` to enable distributed training
+  with OpenMPI. Use of this flag requires that OpenMPI be a pre-existing library
+  in the system. If it is not present on the system, install `OpenMPI`_ version
+  ``2.1.1`` or later before running the compile.
+
+* Use ``-DNGRAPH_DISTRIBUTED_MLSL_ENABLE=TRUE`` to enable the option for the
+  :abbr:`Intel® Machine Learning Scaling Library (MLSL)` for Linux* OS:
+
+  .. important:: The Intel® MLSL option applies to Intel® Architecture CPUs
+     (``CPU``) and ``Interpreter`` backends only. For all other backends,
+     ``OpenMPI`` is presently the only supported option. We recommend the
+     use of `Intel MLSL`_ for CPU backends to avoid an extra download step.

-To deploy data-parallel training on backends supported by nGraph API, the
-``AllReduce`` op should be added after the steps needed to complete the
-:doc:`backpropagation <../howto/derive-for-training>`.
+To deploy data-parallel training, the ``AllReduce`` op should be added after the
+steps needed to complete the :doc:`backpropagation <../howto/derive-for-training>`;
+the new code is highlighted below:

.. literalinclude:: ../../../examples/mnist_mlp/dist_mnist_mlp.cpp
   :language: cpp
   :lines: 180-196
   :emphasize-lines: 8-11

We need to initialize and finalize distributed training with a ``Distributed``
object; see the `full raw code`_.

-Finally, to run the training using two nGraph devices, invoke
-
-.. code-block:: console
-
-   $ mpirun
+Finally, to run the training using two nGraph devices, invoke :command:`mpirun`,
+which is distributed with the `Intel MLSL`_ library. This will launch two nGraph
+CPU backends. See the `full code`_ in the ``examples`` folder at
+``/doc/examples/mnist_mlp/dist_mnist_mlp.cpp``.

.. code-block:: console
@@ -35,4 +46,5 @@ is distributed with the `Intel MLSL`_ library. This will launch two nGraph CPU backends.
.. _Intel MLSL: https://github.com/intel/MLSL/releases
.. _full raw code: https://github.com/NervanaSystems/ngraph/blob/master/doc/examples/mnist_mlp/dist_mnist_mlp.cpp
.. _OpenMPI: https://www.open-mpi.org/software/ompi/v2.1/
.. _full code: https://github.com/NervanaSystems/ngraph/blob/master/doc/examples/mnist_mlp/dist_mnist_mlp.cpp
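As a conceptual aside (not part of this commit, and not the nGraph API), the ``AllReduce`` step that the doc page above adds after backpropagation can be sketched in plain Python: each worker computes gradients on its own slice of the batch, the gradients are reduced element-wise across workers, and every worker receives the same combined tensor so the model replicas stay in sync. The function below simulates this across in-process "workers":

```python
def all_reduce(worker_grads):
    """Sum each gradient element across workers; every worker gets the result.

    worker_grads: one flat gradient list per worker, all the same length.
    This simulates the collective in-process; a real backend would perform
    it over MPI/MLSL across devices.
    """
    summed = [sum(vals) for vals in zip(*worker_grads)]
    # Every worker ends up holding an identical copy of the reduced tensor.
    return [list(summed) for _ in worker_grads]

# Two workers, each with gradients from its own slice of the data batch.
grads = [[0.25, -0.5, 1.0],
         [0.75,  0.0, -1.0]]
reduced = all_reduce(grads)
# Both workers now hold [1.0, -0.5, 0.0] and can apply the same
# (optionally averaged) weight update, keeping the replicas in sync.
```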
@@ -127,7 +127,7 @@ To build documentation locally, run:
.. code-block:: console
-   $ sudo apt-get install python3-sphinxcd
+   $ sudo apt-get install python3-sphinx
$ pip3 install [-I] Sphinx==1.7.5 [--user]
$ pip3 install [-I] breathe numpy [--user]
$ cd doc/sphinx/
@@ -167,4 +167,4 @@ stable reST documentation.
.. _doxygen: http://www.doxygen.org/index.html
-.. 45555555555555555555555555555
\ No newline at end of file
+.. 45555555555555555555555555555