Commit 79878d71 authored by L.S. Cook, committed by Michał Karzyński

Architecture and feature docs (#2092)

parent 7b665771
# nGraph Compiler Stack

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/NervanaSystems/ngraph/blob/master/LICENSE) [![Build Status][build-status-badge]][build-status]

<div align="left">
  <h4>
    <a href="./ABOUT.md">Architecture & features</a> | <a href="./ecosystem-overview.md">Ecosystem</a> | <a href="https://ngraph.nervanasys.com/docs/latest/project/release-notes.html">Release notes</a> | <a href="https://ngraph.nervanasys.com/docs/latest">Documentation</a> | <a href="#How-to-contribute">Contribution guide</a>
  </h4>
</div>
## Quick start
To begin using nGraph with popular frameworks to accelerate deep learning
workloads on CPU for inference, please refer to the links below.
| Framework (Version)  | Installation guide                       | Notes
|----------------------|------------------------------------------|-----------------------------------
| TensorFlow* 1.12     | [Pip package] or [Build from source]     | 17 [Validated workloads]
| MXNet* 1.4           | [Enable the module] or [Source compile]  | 17 [Validated workloads]
| ONNX 1.3             | [Pip package]                            | 14 [Validated workloads]
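
After installing a bridge, a quick way to confirm the setup is to run a trivial
graph through it. The snippet below is a minimal sketch, assuming the
`ngraph_bridge` module name used by the ngraph-tf project (importing it is what
activates the nGraph backend for TensorFlow):

```python
# Minimal smoke test: run a tiny TensorFlow graph with the nGraph bridge
# active. Assumes `import ngraph_bridge` registers the backend, per the
# ngraph-tf project's usage; package and module names may differ by version.
import tensorflow as tf
import ngraph_bridge  # noqa: F401 -- the import side effect enables nGraph

a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])

with tf.Session() as sess:
    print(sess.run(a + b))  # expected: [4. 6.]
```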
Frameworks using nGraph Compiler stack to execute workloads have shown
**up to 45X** performance boost when compared to native framework
implementations. We've also seen performance boosts running workloads that
are not included on the list of [Validated workloads], thanks to our
powerful subgraph pattern matching and thanks to the collaborative efforts
We strongly believe in providing freedom, performance, and ease-of-use to AI
developers.
The diagram below shows what deep learning frameworks and hardware targets
we support. More details on these current and future plans are in the
[ecosystem] section.
![nGraph wireframe][ngraph_wireframes_with_notice]
While the ecosystem shown above is all functioning, we have validated
performance for deep learning inference on CPU processors such as Intel® Xeon®.
Please refer to the [Release notes] to learn more. The Gold release
is targeted for April 2019; it will feature broader workload coverage,
including quantized graphs, and more detail on our advanced support for
``int8``.
Our documentation has extensive information about how to use nGraph Compiler
stack to create an nGraph computational graph, integrate custom frameworks,
and interact with supported backends. If you wish to contribute to the
project, please don't hesitate to ask questions in [GitHub issues] after
reviewing our contribution guide below.
modifications are necessary, may provide feedback to guide you. When
accepted, your pull request will be merged to the repository.
![nGraph Compiler Stack][ngraph-compiler-stack-readme]
| Backend | current support | future support |
|-----------------------------------------------|-------------------|----------------|
| Intel® Architecture CPU | yes | yes |
| Intel® Nervana™ Neural Network Processor (NNP)| yes | yes |
| Intel [Movidius™ Myriad™ 2] VPUs | coming soon | yes |
| Intel® Architecture GPUs | via PlaidML | yes |
| AMD* GPUs | via PlaidML | yes |
| NVIDIA* GPUs | via PlaidML | some |
| Field Programmable Gate Arrays (FPGA) | no | yes |
[Ecosystem]: ecosystem-overview.md
[Architecture and features]: https://ngraph.nervanasys.com/docs/latest/project/about.html
[Release notes]: https://ngraph.nervanasys.com/docs/latest/project/release-notes.html
[Documentation]: https://ngraph.nervanasys.com/docs/latest
[build the Library]: https://ngraph.nervanasys.com/docs/latest/buildlb.html
[contrib guide]: https://ngraph.nervanasys.com/docs/latest/project/code-contributor-README.html
[pull request]: https://github.com/NervanaSystems/ngraph/pulls
[how to import]: https://ngraph.nervanasys.com/docs/latest/howto/import.html
[ngraph_wireframes_with_notice]: doc/sphinx/source/graphics/ngraph_wireframes_with_notice.png "nGraph wireframe"
[ngraph-compiler-stack-readme]: doc/sphinx/source/graphics/ngraph-compiler-stack-readme.png "nGraph Compiler Stack"
[build-status]: https://travis-ci.org/NervanaSystems/ngraph/branches
[build-status-badge]: https://travis-ci.org/NervanaSystems/ngraph.svg?branch=master
[PlaidML]: https://github.com/plaidml/plaidml
[Pip package]: https://github.com/NervanaSystems/ngraph-onnx#installing-ngraph-onnx
[Build from source]: https://github.com/NervanaSystems/ngraph-tf
[Enable the module]: https://github.com/NervanaSystems/ngraph/blob/mbrookhart/mxnet_tutorial/doc/sphinx/source/shared/mxnet_tutorial.rst
[Source compile]: https://github.com/NervanaSystems/ngraph-mxnet/blob/master/NGRAPH_README.md
[nGraph-ONNX]: https://github.com/NervanaSystems/ngraph-onnx/blob/master/README.md
[nGraph-ONNX adaptable]: https://ai.intel.com/adaptable-deep-learning-solutions-with-ngraph-compiler-and-onnx/
# Install script for directory: /opt/libraries/ngraph/doc/sphinx
# Set the install prefix
if(NOT DEFINED CMAKE_INSTALL_PREFIX)
set(CMAKE_INSTALL_PREFIX "/usr/local")
endif()
string(REGEX REPLACE "/$" "" CMAKE_INSTALL_PREFIX "${CMAKE_INSTALL_PREFIX}")
# Set the install configuration name.
if(NOT DEFINED CMAKE_INSTALL_CONFIG_NAME)
if(BUILD_TYPE)
string(REGEX REPLACE "^[^A-Za-z0-9_]+" ""
CMAKE_INSTALL_CONFIG_NAME "${BUILD_TYPE}")
else()
set(CMAKE_INSTALL_CONFIG_NAME "")
endif()
message(STATUS "Install configuration: \"${CMAKE_INSTALL_CONFIG_NAME}\"")
endif()
# Set the component getting installed.
if(NOT CMAKE_INSTALL_COMPONENT)
if(COMPONENT)
message(STATUS "Install component: \"${COMPONENT}\"")
set(CMAKE_INSTALL_COMPONENT "${COMPONENT}")
else()
set(CMAKE_INSTALL_COMPONENT)
endif()
endif()
# Install shared libraries without execute permission?
if(NOT DEFINED CMAKE_INSTALL_SO_NO_EXE)
set(CMAKE_INSTALL_SO_NO_EXE "1")
endif()
# Is this installation the result of a crosscompile?
if(NOT DEFINED CMAKE_CROSSCOMPILING)
set(CMAKE_CROSSCOMPILING "FALSE")
endif()
if(CMAKE_INSTALL_COMPONENT)
set(CMAKE_INSTALL_MANIFEST "install_manifest_${CMAKE_INSTALL_COMPONENT}.txt")
else()
set(CMAKE_INSTALL_MANIFEST "install_manifest.txt")
endif()
string(REPLACE ";" "\n" CMAKE_INSTALL_MANIFEST_CONTENT
"${CMAKE_INSTALL_MANIFEST_FILES}")
file(WRITE "/opt/libraries/ngraph/doc/sphinx/${CMAKE_INSTALL_MANIFEST}"
"${CMAKE_INSTALL_MANIFEST_CONTENT}")
across all workers with an op that performs "allreduce", and applied to update
the weights.
Using multiple machines helps to scale and speed up deep learning. With large
mini-batch training, one could train ResNet-50 with Imagenet-1k data to the
*Top 5* classifier in minutes using thousands of CPU nodes. See
`arxiv.org/abs/1709.05011`_.
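
The synchronous pattern itself is compact. The following is a minimal sketch
using ``mpi4py`` and NumPy; it is an illustration only, standing in for
nGraph's own distributed ops:

.. code-block:: python

   # Synchronous data-parallel SGD, sketched with mpi4py/NumPy.
   import numpy as np
   from mpi4py import MPI

   comm = MPI.COMM_WORLD

   weights = np.zeros(10)           # parameters, replicated on every worker
   local_grad = np.random.rand(10)  # gradient from this worker's mini-batch

   # Sum the gradients across all workers, then average them.
   global_grad = np.empty_like(local_grad)
   comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
   global_grad /= comm.Get_size()

   weights -= 0.01 * global_grad    # identical update on every worker
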
communication collective ops such as allgather, scatter, gather, etc. in
the future.
.. _arxiv.org/abs/1709.05011: https://arxiv.org/format/1709.05011
.. _based on the synchronous: https://arxiv.org/format/1602.06709
.. _Intel MLSL: https://github.com/intel/MLSL/releases
Validation and testing
######################

We validated performance for the following TensorFlow* and MXNet* workloads:

TensorFlow
==========

.. csv-table::
   :header: "TensorFlow Workloads", "Type"
   :widths: 27, 53
   :escape: ~

   Resnet50 v1 and v2, Image recognition
   Inception V3 and V4, Image recognition
   Inception-ResNetv2, Image recognition
   MobileNet v1, Image recognition
   SqueezeNet v1.1, Image recognition
   DenseNet-121, Image recognition
   SSD-VGG16, Object detection
   SSD-MobileNetv1, Object detection
   Faster RCNN, Object detection
   Yolo v2, Object detection
   Wide & Deep, Recommender system
   NCF, Recommender system
   WaveNet, Speech generation
   U-Net, Image segmentation
   DCGAN, Generative adversarial network
   DRAW, Image generation
   A3C, Reinforcement learning

MXNet
=====

.. csv-table::
   :header: "MXNet Workloads", "Type"
   :widths: 27, 53
   :escape: ~

   Resnet50 v1 and v2, Image recognition
   "DenseNet (121, 161, 169, 201)", Image recognition
   InceptionV3, Image recognition
   InceptionV4, Image recognition
   Inception-ResNetv2, Image recognition
   MobileNet v1, Image recognition
   SqueezeNet v1 and v1.1, Image recognition
   VGG16, Image recognition
   Faster RCNN, Object detection
   SSD-VGG16, Object detection
   GNMT, Language translation
   Transformer-LT, Language translation
   Wide & Deep, Recommender system
   WaveNet, Speech generation
   DeepSpeech2, Speech recognition
   DCGAN, Generative adversarial network
   A3C, Reinforcement learning

ONNX
=====

Additionally, we validated that the following workloads are functional through
the nGraph ONNX importer:

.. csv-table::
   :header: "Workload", "Type"
   :widths: 27, 53
   :escape: ~

   DenseNet-121, Image recognition
   Inception-v1, Image recognition
   Inception-v2, Image recognition
   ResNet-50, Image recognition
   Shufflenet, Image recognition
   SqueezeNet, Image recognition
   VGG-19, Image recognition
   ZFNet-512, Image recognition
   MNIST, Image recognition
   Emotion-FERPlus, Image recognition
   BVLC AlexNet, Image recognition
   BVLC GoogleNet, Image recognition
   BVLC CaffeNet, Image recognition
   BVLC R-CNN ILSVRC13, Object detection

.. important:: Please see Intel's `Optimization Notice`_ for details on disclaimers.

.. _Optimization Notice: https://software.intel.com/en-us/articles/optimization-notice

.. Notice revision #20110804: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
SGD
   :abbr:`Stochastic Gradient Descent (SGD)`, also known as incremental
   gradient descent, is an iterative method for optimizing a
   differentiable objective function.
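
   For a learning rate :math:`\eta`, one step on a training example
   :math:`x_i` updates the weights as

   .. math::

      w_{t+1} = w_t - \eta \, \nabla L(w_t; x_i)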
validated
   To provide optimizations with nGraph, we first confirm that a given
   workload is "validated" as being functional; that is, we can
   successfully load its serialized graph as an nGraph :term:`function
   graph`.
nGraph Compiler stack architecture
==================================
The diagram below represents our current |release| release stack. Please
note that the stack diagram is simplified to show how nGraph executes deep
learning workloads with two hardware backends; however, many other deep
learning frameworks and backends are currently functioning.
.. figure:: ../graphics/stackngrknl.png
   :width: 455px
   :alt: Current Beta release stack

   Simplified stack diagram for nGraph Compiler and components Beta

Starting from the top of the diagram, we present a simplified view of the nGraph
Intermediate Representation (IR). The nGraph IR is a format which works with a
framework such as TensorFlow* or MXNet* when there is a corresponding "Bridge"
or import method, such as from NNVM or via `ONNX`_. Once the nGraph IR can begin
using nGraph's Core ops, components lower in the stack can begin parsing and
pattern-matching subgraphs for device-specific optimizations; these are then
encapsulated. This encapsulation is represented on the diagram as the colored
background between the ``ngraph`` kernel(s) and the stack above.
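
As a concrete illustration, a small computation can be built directly from
Core ops. The following is a sketch that assumes the ``ngraph`` Python
bindings and their ``parameter``/``runtime`` helpers:

.. code-block:: python

   # Build and run a tiny nGraph computation (sketch; assumes the Python
   # bindings' parameter/runtime API).
   import numpy as np
   import ngraph as ng

   A = ng.parameter(shape=[2, 2], dtype=np.float32, name='A')
   B = ng.parameter(shape=[2, 2], dtype=np.float32, name='B')
   model = (A + B) * A                       # a graph of Core ops (nGraph IR)

   runtime = ng.runtime(backend_name='CPU')  # select a backend
   compute = runtime.computation(model, A, B)

   a = np.array([[1, 2], [3, 4]], dtype=np.float32)
   b = np.array([[5, 6], [7, 8]], dtype=np.float32)
   print(compute(a, b))
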
Note that everything at or below the **Kernel APIs** and **Subgraph APIs** gets
executed "automatically" during training runs. In other words, the accelerations
are automatic: parts of the graph that are not encapsulated default to framework
implementation when executed. For example, if nGraph optimizes ResNet50 for
Features
========
The nGraph :abbr:`Intermediate Representation (IR)` contains a combination of
device-specific and non-device-specific optimizations:

* **Fusion** -- Fuse multiple ops to decrease memory usage.
* **Data layout abstraction** -- Make abstraction easier and faster with nGraph
  translating element order to work best for a given or available device.
* **Data reuse** -- Save results and reuse for subgraphs with the same input.
added with new functions that build sub-graphs from existing core ops.

.. _portable:

Portable
# Framework & runtime support
One of nGraph’s key features is framework neutrality. We currently support
popular deep learning frameworks such as TensorFlow and MXNet with stable
bridges to pass computational graphs to nGraph. Additionally, nGraph
Compiler has functional bridges to PaddlePaddle and PyTorch (via [ONNXIFI]).
For these frameworks, we have successfully tested functionality with a few
deep learning workloads, and we plan to bring stable support for them in
upcoming releases.
To further promote framework neutrality, the nGraph team has been actively
contributing to the ONNX project. Developers who already have a "trained"
DNN (Deep Neural Network) model can use nGraph to bypass significant
framework-based complexity and [import it] to test or run on targeted and
efficient backends with our user-friendly Python-based API.
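
For illustration, importing a trained model can be this short. The sketch below
is based on the importer usage documented by the ngraph-onnx project; the
module path, return type, and the `model.onnx` file name are assumptions that
may differ by version (see [import it] for the authoritative steps):

```python
# Sketch: import a trained ONNX model into nGraph and prepare it to run on
# the CPU backend. Paths and signatures follow the ngraph-onnx project's
# documented usage and may differ by version.
import onnx
import ngraph as ng
from ngraph_onnx.onnx_importer.importer import import_onnx_model

onnx_protobuf = onnx.load('model.onnx')       # hypothetical model file
ng_models = import_onnx_model(onnx_protobuf)  # list of imported models
model = ng_models[0]

runtime = ng.runtime(backend_name='CPU')
computation = runtime.computation(model['output'], *model['inputs'])
# computation(input_array, ...) now executes the model via nGraph.
```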
nGraph is also integrated as an execution provider for [ONNX Runtime],
which is a runtime for [WinML] on Windows and Azure that accelerates deep
learning workloads.
The table below summarizes our current progress on supported frameworks.
If you are an architect of a framework wishing to take advantage of the
speed and multi-device support of nGraph Compiler, please refer to the
[Framework integration guide] section.
| Framework & Runtime         | Supported          | Validated
|-----------------------------|--------------------|-------------
| TensorFlow* 1.12            | :heavy_check_mark: | :heavy_check_mark:
| MXNet* 1.4                  | :heavy_check_mark: | :heavy_check_mark:
| ONNX 1.3                    | :heavy_check_mark: | :heavy_check_mark:
| ONNX Runtime                | Functional         | No
| PyTorch (via ONNXIFI)       | Functional         | No
| PaddlePaddle                | Functional         | No
## Hardware & backend support
The current release of nGraph primarily focuses on accelerating inference
performance on CPU. However, we are also working on adding support for more
hardware and backends. As with the frameworks, we believe in giving AI
developers the freedom to deploy their deep learning workloads to the
desired hardware without lock-in. We currently have functioning backends
for Intel, Nvidia*, and AMD* GPUs, either leveraging kernel libraries
such as clDNN and cuDNN directly or utilizing PlaidML to compile for codegen
and emit OpenCL, OpenGL, LLVM, CUDA, and Metal. Please refer to the
[Architecture and features] section to learn more about how we plan to take
advantage of both solutions using a hybrid transformer. We expect to have
stable support for the aforementioned GPUs in the early second half of 2019.
In a similar time frame, we plan to release multinode support.

Additionally, we are excited about providing support for our upcoming deep
learning accelerators, such as NNP (Neural Network Processor), via the nGraph
Compiler stack, and early adopters will be able to test them in 2019.
| Backend                                        | Supported
|-----------------------------------------------|-------------------
| Intel® Architecture CPU | :heavy_check_mark:
| Intel® Architecture GPUs | Functional via clDNN and PlaidML
| AMD* GPUs | Functional via PlaidML
| Nvidia* GPUs | Functional via cuDNN and PlaidML
| Intel® Nervana™ Neural Network Processor (NNP)| Functional
| Upcoming DL accelerators | Functional and will be announced in the near future
[Architecture and features]: ./ABOUT.md
[Upcoming DL accelerators]: https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/vision-accelerator-design-product-brief.pdf
[import it]: http://ngraph.nervanasys.com/docs/latest/howto/import.html
[ONNXIFI]: https://github.com/onnx/onnx/blob/master/docs/ONNXIFI.md
[ONNX Runtime]:https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx
[WinML]: http://docs.microsoft.com/en-us/windows/ai
[How to]: https://ngraph.nervanasys.com/docs/latest/howto/index.html
[Framework integration guide]: https://ngraph.nervanasys.com/docs/latest/frameworks/index.html