Edit distr docs (#1405)

* Clarify ng backends as devices
* revise some intros and add menu link

Commit 84719348 authored Aug 13, 2018 by L.S. Cook Committed by Scott Cyphers Aug 13, 2018

Showing with 17 additions and 14 deletions

doc/sphinx/source/distr/index.rst +13 -10
doc/sphinx/source/howto/distribute-train.rst +4 -4

--- a/doc/sphinx/source/distr/index.rst
+++ b/doc/sphinx/source/distr/index.rst
@@ -6,17 +6,17 @@ Distributed Training in nGraph
 Why distributed training?
 -------------------------

-A tremendous amount of data is required to train deep neural networks in diverse 
-areas -- from computer vision to natural language processing. Meanwhile, 
-computation used in AI training has been increasing exponentially. And even 
-though significant improvements have been made in algorithms and hardware, 
-using one machine to train a very large neural network / model is usually not 
-optimal. The use of multiple nodes, then, becomes important for making deep 
-learning training feasible with a large datasets.   
+A tremendous amount of data is required to train DNNs in diverse areas -- from 
+computer vision to natural language processing. Meanwhile, computation used in 
+AI training has been increasing exponentially. And even though significant 
+improvements have been made in algorithms and hardware, using one machine to 
+train a very large :term:`NN` is usually not optimal. The use of multiple nodes, 
+then, becomes important for making deep learning training feasible with large 
+datasets.   

 Data parallelism is the most popular parallel architecture to accelerate deep 
-learning with large datasets. The first algorithm we support is based on the 
-`synchronous`_ :term:`SGD` method, and partitions the dataset among workers 
+learning with large datasets. The first algorithm we support is `based on the 
+synchronous`_ :term:`SGD` method, and partitions the dataset among workers 
 where each worker executes the same neural network model. For every iteration, 
 nGraph backend computes the gradients in back-propagation, aggregates the gradients 
 across all workers, and then update the weights. 
@@ -24,6 +24,8 @@ across all workers, and then update the weights.
 How? (Generic frameworks)
 -------------------------

+* :doc:`../howto/distribute-train`
+
 To synchronize gradients across all workers, the essential operation for data 
 parallel training, due to its simplicity and scalability over parameter servers, 
 is “allreduce”. The AllReduce op is one of the nGraph Library’s core ops. To 
@@ -94,7 +96,7 @@ communication collective ops such as allgather, scatter, gather, etc. in
 the future. 


-.. _synchronous: https://arxiv.org/pdf/1602.06709.pdf 
+.. _based on the synchronous: https://arxiv.org/pdf/1602.06709.pdf 
 .. _one could train ResNet-50 with Imagenet-1k data: https://blog.surf.nl/en/imagenet-1k-training-on-intel-xeon-phi-in-less-than-40-minutes/
 .. _arxiv.org/pdf/1709.05011.pdf: https://arxiv.org/pdf/1709.05011.pdf
 .. _Intel MLSL: https://github.com/intel/MLSL/releases
\ No newline at end of file
--- a/doc/sphinx/source/howto/distribute-train.rst
+++ b/doc/sphinx/source/howto/distribute-train.rst
@@ -13,8 +13,8 @@ To use this mode of training, first install a supported version of `OpenMPI`_

 Next, create an nGraph build with the cmake flag ``-DNGRAPH_DISTRIBUTED_ENABLE=TRUE``.  

-To deploy data-parallel training on multi-node/device, the ``AllReduce`` op 
-should be added after the steps needed to complete the 
+To deploy data-parallel training on backends supported by nGraph API, the 
+``AllReduce`` op should be added after the steps needed to complete the 
 :doc:`backpropagation <../howto/derive-for-training>`.

 .. literalinclude:: ../../../examples/mnist_mlp/dist_mnist_mlp.cpp
@@ -26,8 +26,8 @@ Also since we are using OpenMPI in this example, we need to initialize and
 finalize MPI with ``MPI::Init();`` and ``MPI::Finalize();`` at the beginning
 and the end of the code used to deploy to devices; see the `full raw code`_. 

-Finally, to run the training on two nGraph devices, invoke :command:`mpirun`. 
-This will run on a single machine and launch two processes. 
+Finally, to run the training using two nGraph devices, invoke :command:`mpirun`. 
+This will launch two nGraph CPU backends.


 .. code-block:: console