Commit 84719348 authored by L.S. Cook, committed by Scott Cyphers

Edit distr docs (#1405)

* Clarify ng backends as devices

* revise some intros and add menu link
parent 3ceb6499
@@ -6,17 +6,17 @@ Distributed Training in nGraph
Why distributed training?
-------------------------
A tremendous amount of data is required to train DNNs in diverse areas -- from
computer vision to natural language processing. Meanwhile, computation used in
AI training has been increasing exponentially. And even though significant
improvements have been made in algorithms and hardware, using one machine to
train a very large :term:`NN` is usually not optimal. The use of multiple nodes,
then, becomes important for making deep learning training feasible with large
datasets.
Data parallelism is the most popular parallel architecture to accelerate deep
learning with large datasets. The first algorithm we support is `based on the
synchronous`_ :term:`SGD` method, and partitions the dataset among workers
where each worker executes the same neural network model. For every iteration,
the nGraph backend computes the gradients in back-propagation, aggregates the
gradients across all workers, and then updates the weights.
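
For reference, using our own notation (not text from the diff), the synchronous
data-parallel step with :math:`N` workers can be written as:

.. math::

   w_{t+1} = w_t - \eta \, \frac{1}{N} \sum_{k=1}^{N} \nabla L_k(w_t)

where each worker :math:`k` computes :math:`\nabla L_k(w_t)` on its own shard of
the data and the sum is produced by an allreduce.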
@@ -24,6 +24,8 @@ across all workers, and then update the weights.
How? (Generic frameworks)
-------------------------
* :doc:`../howto/distribute-train`
The essential operation for synchronizing gradients across all workers in
data-parallel training is “allreduce”, favored for its simplicity and
scalability over parameter servers. The AllReduce op is one of the nGraph
Library’s core ops. To
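
As a generic illustration of what “allreduce” means (plain MPI shown here, not
the nGraph AllReduce op itself; the gradient values are made up), every rank
contributes its local buffer and every rank receives the element-wise sum:

.. code-block:: cpp

   // Illustration only: sum "gradients" across all MPI ranks so that every
   // rank ends up with the same aggregated values.
   #include <mpi.h>
   #include <vector>

   int main(int argc, char** argv)
   {
       MPI_Init(&argc, &argv);

       // Pretend these are this worker's locally computed gradients.
       std::vector<float> grads = {0.1f, 0.2f, 0.3f};

       // Element-wise sum across all ranks; the result replaces the
       // local values in place on every rank.
       MPI_Allreduce(MPI_IN_PLACE, grads.data(),
                     static_cast<int>(grads.size()),
                     MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

       MPI_Finalize();
       return 0;
   }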
@@ -94,7 +96,7 @@ communication collective ops such as allgather, scatter, gather, etc. in
the future.
.. _based on the synchronous: https://arxiv.org/pdf/1602.06709.pdf
.. _one could train ResNet-50 with Imagenet-1k data: https://blog.surf.nl/en/imagenet-1k-training-on-intel-xeon-phi-in-less-than-40-minutes/
.. _arxiv.org/pdf/1709.05011.pdf: https://arxiv.org/pdf/1709.05011.pdf
.. _Intel MLSL: https://github.com/intel/MLSL/releases
\ No newline at end of file
@@ -13,8 +13,8 @@ To use this mode of training, first install a supported version of `OpenMPI`_
Next, create an nGraph build with the cmake flag ``-DNGRAPH_DISTRIBUTED_ENABLE=TRUE``.
To deploy data-parallel training on backends supported by the nGraph API, the
``AllReduce`` op should be added after the steps needed to complete the
:doc:`backpropagation <../howto/derive-for-training>`.
.. literalinclude:: ../../../examples/mnist_mlp/dist_mnist_mlp.cpp
@@ -26,8 +26,8 @@ Also since we are using OpenMPI in this example, we need to initialize and
finalize MPI with ``MPI::Init();`` and ``MPI::Finalize();`` at the beginning
and the end of the code used to deploy to devices; see the `full raw code`_.
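
A minimal skeleton of that structure, assuming the OpenMPI C++ bindings used in
the example (the training itself is elided; the real program is in the
`full raw code`_):

.. code-block:: cpp

   #include <mpi.h>

   int main()
   {
       MPI::Init();       // initialize MPI before any distributed nGraph work

       // ... build the function with AllReduce, compile it on a backend,
       //     and run the training loop here (see dist_mnist_mlp.cpp) ...

       MPI::Finalize();   // finalize MPI after all communication is done
       return 0;
   }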
Finally, to run the training using two nGraph devices, invoke :command:`mpirun`.
This will launch two nGraph CPU backends.
.. code-block:: console
...