Basic Concepts
==============

.. important:: Distributed training is not officially supported as of version
   |version|; however, some configuration options have worked for nGraph
   devices in testing environments.

Data scientists with locally-scalable rack or cloud-based resources will likely
find it worthwhile to experiment with different modes or variations of
distributed training. Deployments using nGraph Library with supported backends
can be configured to train with data parallelism and will soon work with model
parallelism. Distributing workloads is increasingly important, as more data and
bigger models mean the ability to :doc:`distribute training across devices <../core/constructing-graphs/distribute-train>`,
to work with larger and larger datasets, or to work with models having many
layers that aren't designed to fit on a single device.

Distributed training with data parallelism splits the data so that each worker
node holds an identical copy of the model; during each iteration, the gradients
are aggregated across all workers with an op that performs "allreduce", and the
result is applied to update the weights. A minimal sketch of this pattern
appears at the end of this section.

Using multiple machines helps to scale and speed up deep learning. With large
mini-batch training, one could train ResNet-50 with ImageNet-1k data to *Top 5*
classifier accuracy in minutes using thousands of CPU nodes. See
`arxiv.org/abs/1709.05011`_.

.. _arxiv.org/abs/1709.05011: https://arxiv.org/abs/1709.05011
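
The sketch below illustrates the data-parallel pattern described above. It uses
``mpi4py`` and NumPy purely for illustration and does **not** use the nGraph
API; the model, data, and hyperparameters are hypothetical. Each worker
computes gradients on its own shard of the data, an allreduce sums the
gradients across workers, and every worker applies the same averaged update so
the model replicas stay in sync.

.. code-block:: python

   # Minimal data-parallel SGD sketch (illustration only, not the nGraph API).
   import numpy as np
   from mpi4py import MPI

   comm = MPI.COMM_WORLD
   rank, size = comm.Get_rank(), comm.Get_size()

   # Toy model: linear regression y = X @ w with a squared-error loss.
   rng = np.random.default_rng(seed=rank)   # each worker sees different data
   w = np.zeros(4)                          # identical initial weights on every worker
   lr = 0.1

   for step in range(100):
       # Each worker draws its own mini-batch (its shard of the data).
       X = rng.normal(size=(32, 4))
       y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=32)

       # Local gradient of the mean squared error.
       grad = 2.0 * X.T @ (X @ w - y) / len(y)

       # Allreduce: sum gradients across all workers, then average.
       global_grad = np.empty_like(grad)
       comm.Allreduce(grad, global_grad, op=MPI.SUM)
       global_grad /= size

       # Every worker applies the same averaged gradient, keeping weights in sync.
       w -= lr * global_grad

   if rank == 0:
       print("learned weights:", w)

Launched with, for example, ``mpirun -np 4 python train.py``, each process acts
as one worker; because every worker starts from the same weights and applies
the same averaged gradient, the replicas remain identical throughout training.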