Commit 84719348 authored by L.S. Cook, committed by Scott Cyphers

Edit distr docs (#1405)

* Clarify ng backends as devices

* revise some intros and add menu link
parent 3ceb6499
Distributed Training in nGraph
==============================

Why distributed training?
-------------------------

A tremendous amount of data is required to train DNNs in diverse areas -- from
computer vision to natural language processing. Meanwhile, the computation used
in AI training has been increasing exponentially. Even though significant
improvements have been made in algorithms and hardware, using one machine to
train a very large :term:`NN` is usually not optimal. The use of multiple
nodes, then, becomes important for making deep learning training feasible with
large datasets.

Data parallelism is the most popular parallel architecture for accelerating
deep learning with large datasets. The first algorithm we support is `based on
the synchronous`_ :term:`SGD` method; it partitions the dataset among workers,
and each worker executes the same neural network model. For every iteration,
the nGraph backend computes the gradients in back-propagation, aggregates the
gradients across all workers, and then updates the weights.
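
In the synchronous case, each of the :math:`N` workers computes a local
gradient on its shard of the data, and the aggregated update (the notation
here is ours, for illustration) takes the form:

.. math::

   w_{t+1} = w_t - \eta \cdot \frac{1}{N} \sum_{i=1}^{N} g_i^{(t)}

where :math:`g_i^{(t)}` is the gradient computed by worker :math:`i` at
iteration :math:`t`, and :math:`\eta` is the learning rate.
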
How? (Generic frameworks)
-------------------------

* :doc:`../howto/distribute-train`

The essential operation for data-parallel training is "allreduce", which
synchronizes gradients across all workers; we use it for its simplicity and
scalability compared to parameter servers. The ``AllReduce`` op is one of the
nGraph Library's core ops.

Support for additional communication collective ops, such as allgather,
scatter, and gather, is planned for the future.
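
As a rough sketch (ours, not from the original document), wrapping a computed
gradient in ``AllReduce`` with the nGraph C++ API might look like this; the
function and parameter names are illustrative:

.. code-block:: cpp

   #include <memory>
   #include <ngraph/ngraph.hpp>

   using namespace ngraph;

   // Illustrative only: wrap a gradient node in AllReduce so that every
   // worker receives the element-wise sum of that gradient across all
   // workers. Requires a build with -DNGRAPH_DISTRIBUTED_ENABLE=TRUE.
   std::shared_ptr<Node> sync_gradient(const std::shared_ptr<Node>& local_grad)
   {
       return std::make_shared<op::AllReduce>(local_grad);
   }
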
.. _based on the synchronous: https://arxiv.org/pdf/1602.06709.pdf
.. _one could train ResNet-50 with Imagenet-1k data: https://blog.surf.nl/en/imagenet-1k-training-on-intel-xeon-phi-in-less-than-40-minutes/
.. _arxiv.org/pdf/1709.05011.pdf: https://arxiv.org/pdf/1709.05011.pdf
.. _Intel MLSL: https://github.com/intel/MLSL/releases

To use this mode of training, first install a supported version of `OpenMPI`_.
Next, create an nGraph build with the cmake flag ``-DNGRAPH_DISTRIBUTED_ENABLE=TRUE``.
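
For example, an out-of-source build might look like the following (the
directory layout here is an assumption, not from the original document):

.. code-block:: console

   $ mkdir build && cd build
   $ cmake .. -DNGRAPH_DISTRIBUTED_ENABLE=TRUE
   $ make -j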

To deploy data-parallel training on backends supported by the nGraph API, the
``AllReduce`` op should be added after the steps needed to complete the
:doc:`backpropagation <../howto/derive-for-training>`.

.. literalinclude:: ../../../examples/mnist_mlp/dist_mnist_mlp.cpp

Also, since we are using OpenMPI in this example, we need to initialize and
finalize MPI with ``MPI::Init();`` and ``MPI::Finalize();`` at the beginning
and the end of the code used to deploy to devices; see the `full raw code`_.
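
A minimal sketch of that scaffolding (``run_training()`` is a hypothetical
placeholder for the real training code):

.. code-block:: cpp

   #include <mpi.h>

   // Hypothetical helper: builds the nGraph function, adds AllReduce, and
   // runs the training loop; see the full raw code for the real version.
   void run_training();

   int main(int argc, char* argv[])
   {
       MPI::Init();      // initialize MPI before any distributed work
       run_training();
       MPI::Finalize();  // finalize MPI once training is done
       return 0;
   }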

Finally, to run the training using two nGraph devices, invoke
:command:`mpirun`. This will run on a single machine and launch two processes,
each using an nGraph CPU backend.

.. code-block:: console
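   # Illustrative invocation: "-np 2" launches two processes; the binary name
   # is an assumption based on the example file, not the original document.
   $ mpirun -np 2 ./dist_mnist_mlp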