.. howto/distribute-train.rst


Distribute training across multiple nGraph backends
====================================================

.. important:: Distributed training is not officially supported in version
   |version|; however, the following configuration options have worked for
   nGraph devices with mixed or limited success in testing.

In the :doc:`previous section <../constructing-graphs/derive-for-training>`,
we described the steps needed to create a "trainable" nGraph model. Here we
demonstrate how to train a data-parallel model by distributing the graph
across more than one device.

Frameworks can implement distributed training with nGraph versions prior to
`0.13`:

* Use ``-DNGRAPH_DISTRIBUTED_ENABLE=OMPI`` to enable distributed training
  with OpenMPI. This flag requires OpenMPI to be a pre-existing library on
  the system; if it is not present, install `OpenMPI`_ version ``2.1.1`` or
  later before running the compile.

* Use ``-DNGRAPH_DISTRIBUTED_ENABLE=MLSL`` to enable the option for the
  :abbr:`Intel® Machine Learning Scaling Library (MLSL)` for Linux* OS.

  .. note:: The Intel® MLSL option applies to Intel® Architecture CPU
     (``CPU``) and ``Interpreter`` backends only. For all other backends,
     ``OpenMPI`` is presently the only supported option. We recommend
     `Intel MLSL`_ for CPU backends to avoid an extra download step.

A representative build invocation using these flags is sketched at the end
of this section.

To deploy data-parallel training, the ``AllReduce`` op should be added after
the steps needed to complete the
:doc:`backpropagation <../constructing-graphs/derive-for-training>`;
the new code is highlighted below:

.. literalinclude:: ../../../../examples/mnist_mlp/dist_mnist_mlp.cpp
   :language: cpp
   :lines: 178-194
   :emphasize-lines: 8-11

See the `full code`_ in the ``examples`` folder:
``/doc/examples/mnist_mlp/dist_mnist_mlp.cpp``.

Finally, to run the training using two nGraph devices, invoke ``mpirun``:

.. code-block:: console

   $ mpirun -np 2 dist_mnist_mlp

.. _Intel MLSL: https://github.com/intel/MLSL/releases
.. _OpenMPI: https://www.open-mpi.org/software/ompi/v2.1/
.. _full code: https://github.com/NervanaSystems/ngraph/blob/master/doc/examples/mnist_mlp/dist_mnist_mlp.cpp
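For reference, the distributed-training flags described above are passed to
``cmake`` when configuring the nGraph build. The following is a minimal
sketch assuming a standard out-of-tree build from a clone of the nGraph
repository; paths and any additional CMake options will vary with your
environment. Substitute ``MLSL`` for ``OMPI`` to select Intel® MLSL instead:

.. code-block:: console

   $ cd ngraph
   $ mkdir build && cd build
   $ cmake .. -DNGRAPH_DISTRIBUTED_ENABLE=OMPI
   $ make -j$(nproc)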
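To make the ``AllReduce`` step concrete, the sketch below illustrates the
general pattern: a gradient produced by backpropagation is wrapped in an
``ngraph::op::AllReduce`` node so that all processes exchange and combine
their local gradients before the weight update. This is an illustration
only, not the code from ``dist_mnist_mlp.cpp``; the names ``W``,
``delta_W``, and ``learning_rate`` are hypothetical placeholders, and
``op::AllReduce`` is available only in an nGraph build configured with one
of the ``NGRAPH_DISTRIBUTED_ENABLE`` options above.

.. code-block:: cpp

   // Minimal sketch: synchronize a local gradient across processes with
   // op::AllReduce before applying a plain gradient-descent update.
   // Requires nGraph built with -DNGRAPH_DISTRIBUTED_ENABLE=OMPI or =MLSL.
   #include <memory>
   #include <ngraph/ngraph.hpp>

   using namespace ngraph;

   std::shared_ptr<Node> make_updated_weight(
       const std::shared_ptr<Node>& W,             // current weight tensor
       const std::shared_ptr<Node>& delta_W,       // local gradient from backprop
       const std::shared_ptr<Node>& learning_rate) // assumed broadcast to W's shape
   {
       // Every process launched by mpirun contributes its local gradient;
       // AllReduce returns the combined tensor to all of them.
       auto synced_delta_W = std::make_shared<op::AllReduce>(delta_W);

       // W <- W - learning_rate * synced_delta_W, expressed as graph nodes.
       auto step = std::make_shared<op::Multiply>(learning_rate, synced_delta_W);
       return std::make_shared<op::Subtract>(W, step);
   }

Because the combined gradient is identical on every process, each process
applies the same update and the model replicas stay in sync from one
iteration to the next.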