Commit 00447a8e authored by Leona C's avatar Leona C Committed by Robert Kimball

Leona/doc distr (#2417)

* Update docs for PR 2353

* Fix missing flag designator

* Spell out the Intel® MLSL acronym on first mention

* Fix missing compilation designator

* Finalize editing on CPU backend

* Update doc version to match what is in the master branch

* Remove a typo from the doc build instructions (a stray "cd" accidentally appended to the command line)

* Add PR feedback to doc
parent 9a6177f4
@@ -73,14 +73,15 @@ author = 'Intel Corporation'
 # built documents.
 #
 # The short X.Y version.
-version = '0.15'
+version = '0.14'
 # The Documentation full version, including alpha/beta/rc tags. Some features
 # available in the latest code will not necessarily be documented first
-release = '0.15.0-rc0'
+release = '0.14.1'
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
 #
 # This is also used if you do content translation via gettext catalogs.
 # Usually you set "language" from the command line for these cases.
 .. howto/distribute-train.rst

-Train using multiple nGraph CPU backends with data parallel
-===========================================================
+Distribute training across multiple nGraph backends
+===================================================

 In the :doc:`previous section <../howto/derive-for-training>`, we described the
 steps needed to create a "trainable" nGraph model. Here we demonstrate how to
 train a data parallel model by distributing the graph to more than one device.

-As of release version 0.12, the default build is with OpenMPI. To use the
-`Intel MLSL`_ library, set the following compilation flag at build time:
-``-DNGRAPH_DISTRIBUTED_ENABLE=TRUE``.
+These options are currently supported for available backends:
+
+* Use ``-DNGRAPH_DISTRIBUTED_OMPI_ENABLE=TRUE`` to enable distributed training
+  with OpenMPI. Use of this flag requires that OpenMPI be a pre-existing library
+  in the system. If it's not present on the system, install `OpenMPI`_ version
+  ``2.1.1`` or later before running the compile.

-To deploy data-parallel training on backends supported by nGraph API, the
-``AllReduce`` op should be added after the steps needed to complete the
-:doc:`backpropagation <../howto/derive-for-training>`.
+* Use ``-DNGRAPH_DISTRIBUTED_MLSL_ENABLE=TRUE`` to enable the option for
+  :abbr:`Intel® Machine Learning Scaling Library (MLSL)` for Linux* OS:
+
+  .. important:: The Intel® MLSL option applies to Intel® Architecture CPUs
+     (``CPU``) and ``Interpreter`` backends only. For all other backends,
+     ``OpenMPI`` is presently the only supported option. We recommend the
+     use of `Intel MLSL` for CPU backends to avoid an extra download step.
+
+Finally, to run the training using two nGraph devices, invoke
+
+.. code-block:: console
+
+   $ mpirun
+
+To deploy data-parallel training, the ``AllReduce`` op should be added after the
+steps needed to complete the :doc:`backpropagation <../howto/derive-for-training>`;
+the new code is highlighted below:

 .. literalinclude:: ../../../examples/mnist_mlp/dist_mnist_mlp.cpp
    :language: cpp
    :lines: 180-196
    :emphasize-lines: 8-11

-We need to initialize and finalize distributed training with ``Distributed`` object;
-see the `full raw code`_.
-
-Finally, to run the training using two nGraph devices, invoke :command:`mpirun` which
-is distributed with `Intel MLSL`_ library. This will launch two nGraph CPU backends.
+See the `full code`_ in the ``examples`` folder ``/doc/examples/mnist_mlp/dist_mnist_mlp.cpp``.

 .. code-block:: console

@@ -35,4 +46,5 @@ is distributed with `Intel MLSL`_ library. This will launch two nGraph CPU back
 .. _Intel MLSL: https://github.com/intel/MLSL/releases
-.. _full raw code: https://github.com/NervanaSystems/ngraph/blob/master/doc/examples/mnist_mlp/dist_mnist_mlp.cpp
+.. _OpenMPI: https://www.open-mpi.org/software/ompi/v2.1/
+.. _full code: https://github.com/NervanaSystems/ngraph/blob/master/doc/examples/mnist_mlp/dist_mnist_mlp.cpp
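For readers who want to see how the pieces in this change fit together end to end, here is a minimal sketch of configuring a distributed build and launching a two-device run. The flag names are taken from the diff above; the build directory, the choice of OpenMPI, the ``-np 2`` process count, and the ``dist_mnist_mlp`` binary name are illustrative assumptions rather than part of this commit:

.. code-block:: console

   # Configure nGraph with one of the distributed-training options (pick one flag).
   # The OMPI option assumes OpenMPI 2.1.1 or later is already installed.
   $ cmake .. -DNGRAPH_DISTRIBUTED_OMPI_ENABLE=TRUE
   # or, on Linux with Intel MLSL available:
   # cmake .. -DNGRAPH_DISTRIBUTED_MLSL_ENABLE=TRUE

   # After building the MNIST MLP example, launch two data-parallel nGraph CPU backends.
   # The binary name is assumed here for illustration.
   $ mpirun -np 2 ./dist_mnist_mlp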
@@ -127,7 +127,7 @@ To build documentation locally, run:

 .. code-block:: console

-   $ sudo apt-get install python3-sphinxcd
+   $ sudo apt-get install python3-sphinx
    $ pip3 install [-I] Sphinx==1.7.5 [--user]
    $ pip3 install [-I] breathe numpy [--user]
    $ cd doc/sphinx/
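The hunk above ends at a fold, so the remaining build steps are not shown. As a hedged illustration only, assuming the repository's ``doc/sphinx/`` directory carries a standard Sphinx ``Makefile``, the local build would typically be finished with:

.. code-block:: console

   # Standard Sphinx build target; the Makefile is assumed, not shown in this diff.
   $ make html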