Merge remote-tracking branch 'origin/master' into r0.10

e10fa7d2 · Adam Procter · f8a0f784 · ca4437bb
Commit e10fa7d2 authored Nov 28, 2018 by Adam Procter
5 changed files
--- a/cmake/external_mkldnn.cmake
+++ b/cmake/external_mkldnn.cmake
@@ -114,6 +114,7 @@ if(${CMAKE_VERSION} VERSION_LESS 3.2)
            -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
            -DCMAKE_INSTALL_PREFIX=${EXTERNAL_PROJECTS_ROOT}/mkldnn
            -DMKLROOT=${MKL_ROOT}
+            "-DARCH_OPT_FLAGS=-march=${NGRAPH_TARGET_ARCH} -mtune=${NGRAPH_TARGET_ARCH}"
        TMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/tmp"
        STAMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/stamp"
        DOWNLOAD_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/download"
@@ -145,6 +146,7 @@ else()
            -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER}
            -DCMAKE_INSTALL_PREFIX=${EXTERNAL_PROJECTS_ROOT}/mkldnn
            -DMKLROOT=${MKL_ROOT}
+            "-DARCH_OPT_FLAGS=-march=${NGRAPH_TARGET_ARCH} -mtune=${NGRAPH_TARGET_ARCH}"
        TMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/tmp"
        STAMP_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/stamp"
        DOWNLOAD_DIR "${EXTERNAL_PROJECTS_ROOT}/mkldnn/download"

--- a/doc/sphinx/source/branding-notice.rst
+++ b/doc/sphinx/source/branding-notice.rst
@@ -6,19 +6,22 @@
 Branding Notice
 ===============
-The Intel® nGraph™ library is an open source project providing code and component 
-reference for many kinds of machine learning, deep learning, and DNN applications. 
-Documentation may include references to frontend frameworks, modules, extensions, 
-or other libraries that may be wholly or partially open source, or that may be 
-claimed as the property of others.  
+The Intel® nGraph Library and Compiler stack is an open source project providing 
+code and component reference for many kinds of machine learning, deep learning, 
+and DNN applications. 
+Our documentation may include references to various frontends / frameworks, 
+modules, extensions, or other libraries that may be wholly or partially open 
+source, or that may be claimed as the property of others.
+Intel, the Intel logo and Intel Nervana are trademarks of Intel Corporation or 
+its subsidiaries in the U.S. and/or other countries.
-Intel nGraph library core documentation
---------------------------------------
+Documentation notice
+---------------------
 .. note:: The branding notice below applies to code and documentation 
-   contributions intended to be added directly to Intel nGraph library core.   
+   contributions intended to be added directly to the Intel nGraph repo.   
 Use the first or most prominent usage with symbols as described below.
@@ -39,14 +42,15 @@ repeated use of the trademark / branding symbols.
 * Intel® nGraph™
-* Intel® nGraph™ library 
-    * nGraph library
-    * ``ngraph`` API
-    * ``ngraph`` library
-    * ``ngraph`` backend
-    * nGraph abstraction layer
-    * neon™ frontend framework 
+* Intel® nGraph Library 
+* Intel® nGraph Compiler
+* Intel® nGraph Backend 
+* Intel® nGraph API 
+* Movidius™ Myriad™ 
 * Intel® Math Kernel Library
@@ -59,3 +63,20 @@ repeated use of the trademark / branding symbols.
 * Intel® Nervana™ Graph (deprecated)
+Optimization Notices
+====================
+Software and workloads used in performance tests may have been optimized for 
+performance only on Intel microprocessors. Performance tests, such as SYSmark 
+and MobileMark, are measured using specific computer systems, components, 
+software, operations and functions. Any change to any of those factors may 
+cause the results to vary. You should consult other information and performance 
+tests to assist you in fully evaluating your contemplated purchases, including 
+the performance of that product when combined with other products.  For more 
+complete information visit http://www.intel.com/benchmarks.  
+Intel technologies' features and benefits depend on system configuration and may 
+require enabled hardware, software or service activation. Performance varies 
+depending on system configuration. No computer system can be absolutely secure. 
+Check with your system manufacturer or retailer or learn more at intel.com. 

--- a/doc/sphinx/source/conf.py
+++ b/doc/sphinx/source/conf.py
@@ -62,7 +62,7 @@ source_suffix = '.rst'
 master_doc = 'index'
 # General information about the project.
-project = u'Intel® nGraph Library'
+project = u'nGraph Compiler stack'
 copyright = '2018, Intel Corporation'
 author = 'Intel Corporation'

--- a/doc/sphinx/source/index.rst
+++ b/doc/sphinx/source/index.rst
@@ -28,12 +28,13 @@ See the latest :doc:`project/release-notes`.
   :width: 599px
-nGraph is an open-source C++ library, compiler, and runtime accelerator for 
-software engineering in the :abbr:`Deep Learning (DL)` ecosystem. nGraph 
+nGraph is an open-source C++ library, compiler stack, and runtime accelerator 
+for software engineering in the :abbr:`Deep Learning (DL)` ecosystem. nGraph 
 simplifies development and makes it possible to design, write, compile, and
-deploy :abbr:`Deep Neural Network (DNN)`-based solutions. A more detailed 
-explanation of the feature set of nGraph Compiler, as well as a high-level 
-overview, can be found on our project :doc:`project/about`. 
+deploy :abbr:`Deep Neural Network (DNN)`-based solutions that can be adapted and 
+deployed across many frameworks and backends. A more detailed explanation, as
+well as a high-level overview, can be found on our project :doc:`project/about`.  
+For a more general discussion of the ecosystem, see the `ecosystem`_ document.
 .. _quickstart:
@@ -89,7 +90,7 @@ We have many documentation pages to help you get started.
   Intel Movidius™ Myriad™ 2 (VPU), Coming soon, Yes
-.. note:: The Library code is under active development as we're continually 
+.. note:: The code in this repo is under active development as we're continually 
   adding support for more kinds of DL models and ops, compiler optimizations, 
   and backend optimizations.
@@ -131,3 +132,4 @@ Indices and tables
 .. _contributions: https://github.com/NervanaSystems/ngraph#how-to-contribute
 .. _TensorFlow bridge to nGraph: https://github.com/NervanaSystems/ngraph-tf/blob/master/README.md
 .. _Compiling MXNet with nGraph: https://github.com/NervanaSystems/ngraph-mxnet/blob/master/README.md
+.. _ecosystem: https://github.com/NervanaSystems/ngraph/blob/master/ecosystem-overview.md

--- a/doc/sphinx/source/project/about.rst
+++ b/doc/sphinx/source/project/about.rst
-.. about: 
+. about: 
 Architecture, Features, FAQs
@@ -15,154 +15,124 @@ Architecture, Features, FAQs
 nGraph Compiler stack architecture
 ==================================
-The diagram below represents our current |release| release stack. Please 
-note that the stack diagram is simplified to show how nGraph executes deep 
-learning workloads with two hardware backends; however, many other deep 
-learning frameworks and backends currently are functioning. 
+The diagram below represents our current Beta release stack. In the
+diagram, nGraph components are colored in gray. Please note that the
+stack diagram is simplified to show how nGraph executes deep learning
+workloads with two hardware backends; however, many other deep learning
+frameworks and backends currently are functioning.
 .. figure:: ../graphics/stackngrknl.png
-    :width: 455px
-    :alt: Current Beta release stack
-    Simplified stack diagram for nGraph Compiler and components Beta 
+   :alt: 
-Starting from the top of the diagram, we present a simplified view of the nGraph 
-Intermediate Representation (IR). The nGraph IR is a format which works with a 
-framework such as TensorFlow* or MXNet* when there is a corresponding "Bridge"
-or import method, such as from NNVM or via `ONNX`_. Once the nGraph IR can begin 
-using nGraph's Core ops, components lower in the stack can begin parsing and 
-pattern-matching subgraphs for device-specific optimizations; these are then 
-encapsulated. This encapsulation is represented on the diagram as the colored 
-background between the ``ngraph`` kernel(s) and the the stack above.
-Note that everything at or below the **Kernel APIs** and **Subgraph APIs** gets 
-executed "automatically" during training runs. In other words, the accelerations 
-are automatic: parts of the graph that are not encapsulated default to framework 
-implementation when executed. For example, if nGraph optimizes ResNet50 for 
-TensorFlow, the same optimization can be readily applied to the NNVM/MXNet 
-implementation of ResNet50. This works efficiently because the nGraph 
-:abbr:`(IR) Intermediate Representation`, which keeps the input and output 
-semantics of encapsulated subgraphs, rebuilds an encapsulated subgraph that can 
-efficiently make use or re-use of operations. Such an  approach significantly 
-cuts down on the time needed to compile; when we're not relying upon the 
-framework's ops alone, memory management and data layouts can be more efficiently 
-applied to the hardware backends in use.    
-The :doc:`nGraph Core <../ops/index>` uses a strongly-typed and platform-neutral 
-:abbr:`(IR) Intermediate Representation` to construct a "stateless" graph. Each 
-node, or ``op``, in the graph corresponds to one :term:`step` in a computation, 
-where each step produces zero or more tensor outputs from zero or more tensor 
-inputs.  
-After construction, our Hybrid transformer takes the IR, further partitions it 
-into subgraphs, and assigns them to the best-performing backend. There are two 
-hardware backends shown in the stack diagram to demonstrate nGraph's graph 
-partitioning. The Hybrid transformer assigns complex operations (subgraphs) to 
-the Intel® Nervana™ :abbr:`Neural Network Processor (NNP)`, or to a different 
-CPU backend to expedite the computation, and the remaining operations default 
-to CPU. In the future, we will further expand the capabilities of Hybrid 
-transformer by enabling more features, such as localized cost modeling and 
-memory sharing, when the next generation of :abbr:`NNP (Neural Network Processor)` 
-is released. In the meantime, your deep learning software engineering or modeling 
-can be confidently built upon this stable anchorage.  
-The  Intel® Architecture :abbr:`IA (Intel® Architecture)` transformer provides 
-two modes that reduce compilation time, and have already been shown as useful 
-for training, deploying, and retraining a deep learning workload in production. 
-For example, in our tests, DEX mode reduced ResNet50 compilation time by 30X. 
-We are excited to continue our work in enabling distributed training, and we 
-plan to expand the nodes to 256 in Q4 ‘18. Additionally, we are testing model 
-parallelism in addition to data parallelism.  
-.. note::  In this Beta release, nGraph via Bridge code supports only :abbr:`Just In Time (JiT)` 
-   compilation; the nGraph ONNX companion tool supports dynamic graphs and will 
-   add additional support for Ahead of Time compilation in the official release. 
+Bridge
+^^^^^^
+Starting from the top of the stack, nGraph receives a computational
+graph from a deep learning framework such as TensorFlow\* or MXNet\*.
+The computational graph is converted to an nGraph internal
+representation by a bridge created for the corresponding framework.
+An nGraph bridge examines the whole graph to pattern match subgraphs
+which nGraph knows how to execute, and these subgraphs are encapsulated.
+Parts of the graph that are not encapsulated will default to framework
+implementation when executed.
+nGraph Core
+^^^^^^^^^^^
+nGraph uses a strongly-typed and platform-neutral
+``Intermediate Representation (IR)`` to construct a "stateless"
+computational graph. Each node, or op, in the graph corresponds to one
+``step`` in a computation, where each step produces zero or more tensor
+outputs from zero or more tensor inputs.
+This allows nGraph to apply its state-of-the-art optimizations instead
+of having to follow how a particular framework implements op execution,
+memory management, data layouts, etc.
+In addition, using nGraph IR allows faster optimization delivery for
+many of the supported frameworks. For example, if nGraph optimizes
+ResNet\* for TensorFlow\*, the same optimization can be readily applied
+to MXNet\* or ONNX\* implementations of ResNet\*.
+Hybrid Transformer
+^^^^^^^^^^^^^^^^^^
+The Hybrid transformer takes the nGraph IR and partitions it into
+subgraphs, which can then be assigned to the best-performing backend.
+There are two hardware backends shown in the stack diagram to
+demonstrate this graph partitioning. The Hybrid transformer assigns
+complex operations (subgraphs) to Intel® Nervana™ Neural Network
+Processor (NNP) to expedite the computation, and the remaining
+operations default to CPU. In the future, we will further expand the
+capabilities of Hybrid transformer by enabling more features, such as
+localized cost modeling and memory sharing.
+Once the subgraphs are assigned, the corresponding backend will execute
+the IR.
+Backends
+^^^^^^^^
+Focusing our attention on the CPU backend, when the IR is passed to the
+Intel® Architecture (IA) transformer, it can be executed in two modes:
+Direct EXecution (DEX) and code generation (``codegen``).
+In ``codegen`` mode, nGraph generates and compiles code which can either
+call into highly optimized kernels like MKL-DNN or JITers like Halide.
+Although our team wrote kernels for nGraph for some operations, nGraph
+leverages existing kernel libraries such as MKL-DNN, Eigen, and MLSL.
+The MLSL library is called when nGraph executes distributed training. At the
+time of the nGraph Beta release, nGraph achieved state-of-the-art
+results for ResNet50 with 16 nodes and 32 nodes for TensorFlow\* and
+MXNet\*. We are excited to continue our work in enabling distributed
+training, and we plan to expand to 256 nodes in Q4 ’18. Additionally, we
+are testing model parallelism in addition to data parallelism.
+The other mode of execution is Direct EXecution (DEX). In DEX mode,
+nGraph can execute the operations by directly calling associated kernels
+as it walks through the IR instead of compiling via ``codegen``. This
+mode reduces the compilation time, and it will be useful for training,
+deploying, and retraining a deep learning workload in production. In our
+tests, DEX mode reduced ResNet50 compilation time by 30X.
+nGraph further tries to speed up the computation by leveraging
+multi-threading and graph scheduling libraries such as OpenMP and TBB
+Flow Graph.
 .. _features:
 Features
 ========
-The nGraph :abbr:`(IR) Intermediate Representation` contains a combination of 
-device-specific and non-device-specific optimization :
+nGraph performs a combination of device-specific and non-device-specific
+optimizations:
-* **Fusion** -- Fuse multiple ops to to decrease memory usage.
-* **Data layout abstraction** -- Make abstraction easier and faster with nGraph 
-  translating element order to work best for a given or available device.
-* **Data reuse** -- Save results and reuse for subgraphs with the same input.
-* **Graph scheduling** -- Run similar subgraphs in parallel via multi-threading.
-* **Graph partitioning** -- Partition subgraphs to run on different devices to 
-  speed up computation; make better use of spare CPU cycles with nGraph. 
-* **Memory management** -- Prevent peak memory usage by intercepting a graph 
-  with or by a "saved checkpoint," and to enable data auditing. 
-* **Data layout abstraction** -- Make abstraction easier and faster with nGraph 
-  translating element order to work best for whatever given or available device.  
+-  **Fusion** -- Fuse multiple ops to decrease memory usage.
+-  **Data layout abstraction** -- Make abstraction easier and faster
+   with nGraph translating element order to work best for a given or
+   available device.
+-  **Data reuse** -- Save results and reuse for subgraphs with the same
+   input.
+-  **Graph scheduling** -- Run similar subgraphs in parallel via
+   multi-threading.
+-  **Graph partitioning** -- Partition subgraphs to run on different
+   devices to speed up computation; make better use of spare CPU cycles
+   with nGraph.
+-  **Memory management** -- Prevent peak memory usage by intercepting a
+   graph with or by a "saved checkpoint," and to enable data auditing.
+-  **Data layout abstraction** -- Make abstraction easier and faster
+   with nGraph translating element order to work best for whatever given
+   or available device.
+Beta Limitations
+----------------
+In this Beta release, nGraph only supports Just In Time compilation, but
+we plan to add support for Ahead of Time compilation in the official
+release of nGraph. nGraph currently has limited support for dynamic
+graphs.
-.. important:: See :doc:`../ops/index` to learn the nGraph means for graph computations.
-.. Our design philosophy is that the graph is not a script for running kernels; 
-   rather, our compilation will match ``ops`` to appropriate available kernels
-   (or when available, such as with CPU cycles). Thus, we expect that adding of 
-   new Core ops should be infrequent and that most functionality instead gets 
-   added with new functions that build sub-graphs from existing core ops.   
-.. _portable:
-Portable
--------
-One of nGraph's key features is **framework neutrality**. While we currently 
-support :doc:`three popular <../framework-integration-guides>` frameworks with 
-pre-optimized deployment runtimes for training :abbr:`Deep Neural Network (DNN)`, 
-models, you are not limited to these when choosing among frontends. Architects 
-of any framework (even those not listed above) can use our documentation for how
-to :doc:`compile and run <../howto/execute>` a training model and design or tweak 
-a framework to bridge directly to the nGraph compiler. With a *portable* model 
-at the core of your :abbr:`DL (Deep Learning)` ecosystem, it's no longer necessary 
-to bring large datasets to the model for training; you can take your model -- in 
-whole, or in part -- to where the data lives and save potentially significant 
-or quantifiable machine resources.  
-.. _adaptable: 
-Adaptable
---------
-We've recently begun support for the `ONNX`_ format. Developers who already have 
-a "trained" :abbr:`DNN (Deep Neural Network)` model can use nGraph to bypass 
-significant framework-based complexity and :doc:`import it <../howto/import>` 
-to test or run on targeted and efficient backends with our user-friendly 
-Python-based API. See the `ngraph onnx companion tool`_ to get started. 
-.. _deployable:
-Deployable
----------
-It's no secret that the :abbr:`DL (Deep Learning)` ecosystem is evolving 
-rapidly. Benchmarking comparisons can be blown steeply out of proportion by 
-subtle tweaks to batch or latency numbers here and there. Where traditional 
-GPU-based training excels, inference can lag and vice versa. Sometimes what we
-care about is not "speed at training a large dataset" but rather latency 
-compiling a complex multi-layer algorithm locally, and then outputting back to 
-an edge network, where it can be analyzed by an already-trained model. 
-Indeed, when choosing among topologies, it is important to not lose sight of 
-the ultimate deployability and machine-runtime demands of your component in
-the larger ecosystem. It doesn't make sense to use a heavy-duty backhoe to 
-plant a flower bulb. Furthermore, if you are trying to develop an entirely 
-new genre of modeling for a :abbr:`DNN (Deep Neural Network)` component, it 
-may be especially beneficial to consider ahead of time how portable and 
-mobile you want that model to be within the rapidly-changing ecosystem.  
-With nGraph, any modern CPU can be used to design, write, test, and deploy 
-a training or inference model. You can then adapt and update that same core 
-model to run on a variety of backends  
 .. _no-lockin:
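The new Bridge and Hybrid Transformer text in this hunk describes splitting a computational graph into subgraphs according to which backend can execute them, with the remaining operations falling back to CPU. The Python sketch below illustrates only that grouping step, under stated assumptions: the graph is a simple linear chain of op-type names already in topological order, and `NNP_SUPPORTED`, `assign_backend`, `partition`, and `Subgraph` are hypothetical names for illustration. They are not part of nGraph's C++ API, and a real partitioner works on a DAG and, per the text, may later add cost modeling instead of checking support alone.

```python
from dataclasses import dataclass, field
from itertools import groupby

# Hypothetical capability table: op types the accelerator backend can run.
# These names are illustrative, not nGraph's real op or backend identifiers.
NNP_SUPPORTED = {"Convolution", "MatMul", "Add", "Relu"}


@dataclass
class Subgraph:
    backend: str
    ops: list = field(default_factory=list)


def assign_backend(op_type: str) -> str:
    """Send supported ops to the accelerator; everything else defaults to CPU."""
    return "NNP" if op_type in NNP_SUPPORTED else "CPU"


def partition(ops_in_topo_order):
    """Group consecutive ops that share a backend into one subgraph.

    Simplification: the graph is treated as a linear chain, so every run of
    same-backend ops is a connected subgraph.
    """
    return [
        Subgraph(backend, list(run))
        for backend, run in groupby(ops_in_topo_order, key=assign_backend)
    ]


if __name__ == "__main__":
    graph = ["Convolution", "Add", "Relu", "NonMaxSuppression", "MatMul", "Softmax"]
    for sg in partition(graph):
        print(sg.backend, sg.ops)
```

Running the module prints one line per subgraph: `Convolution`, `Add`, and `Relu` stay together on the accelerator, while each unsupported op lands in its own CPU subgraph.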
@@ -212,10 +182,13 @@ framework, and the result is a function that can be compiled from a framework.
 A fully-compiled function that makes use of bridge code thus becomes a "function
 graph", or what we sometimes call an **nGraph graph**.  
-.. note:: Low-level nGraph APIs are not accessible *dynamically* via bridge code;
-   this is the nature of stateless graphs. However, do note that a graph with a 
-   "saved" checkpoint can be "continued" to run from a previously-applied 
-   checkpoint, or it can loaded as static graph for further inspection.
+.. important:: See :doc:`../ops/index` to learn about Core Ops.
+Our design philosophy is that the graph is not a script for running kernels; 
+rather, our compilation will match ``ops`` to appropriate available kernels
+(or when available, such as with CPU cycles). Thus, we expect that adding 
+new Core ops should be infrequent and that most functionality instead gets 
+added with new functions that build sub-graphs from existing core ops.   
 For a more detailed dive into how custom bridge code can be implemented, see our 
 documentation on how to :doc:`../howto/execute`. To learn how TensorFlow and 
@@ -228,23 +201,6 @@ MXNet currently make use of custom bridge code, see the section on
    JiT Compiling for computation
-Given that we have no way to predict how many other frameworks designed around 
-model, workload, or framework-specific purposes there may be, it would be  
-impossible for us to create bridges for every framework that currently exists 
-(or that will exist in the future). Although we only support a few frameworks, 
-we provide documentation to help developers and engineers figure out how to 
-get custom solutions working, such as for edge cases. 
-.. csv-table::
-   :header: "Framework", "Bridge Available?", "ONNX Support?"
-   :widths: 27, 10, 10
-   TensorFlow, Yes, Yes
-   MXNet, Yes, Yes
-   PaddlePaddle, Coming Soon, Yes
-   PyTorch, No, Yes
-   Other, Write your own, Custom
 How do I run an inference model?
 --------------------------------
@@ -269,7 +225,7 @@ our `arXiv paper`_ from the 2018 SysML conference.
 .. _arXiv paper: https://arxiv.org/pdf/1801.08058.pdf
 .. _ONNX: http://onnx.ai
-.. _NNVM: http://
+.. _NNVM: https://github.com/dmlc/nnvm
 .. _nGraph ONNX companion tool: https://github.com/NervanaSystems/ngraph-onnx
 .. _Intel® MKL-DNN: https://github.com/intel/mkl-dnn
 .. _Movidius: https://developer.movidius.com/
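The Backends section added to about.rst contrasts ``codegen`` with Direct EXecution (DEX), in which nGraph calls the associated kernels directly as it walks through the IR. A minimal Python sketch of such a walk follows. It is not nGraph's implementation: `Node`, `KERNELS`, and `execute_direct` are hypothetical, Python floats stand in for tensors, and each node produces exactly one output, a simplification of the "zero or more tensor outputs" described in the nGraph Core text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One step of a computation: an op applied to zero or more input nodes."""
    name: str
    op: str
    inputs: tuple = ()
    value: float = 0.0  # only read for "Constant" nodes


# Hypothetical kernel table. A real backend would dispatch to optimized
# library kernels; Python callables stand in for them here.
KERNELS = {
    "Add": lambda a, b: a + b,
    "Multiply": lambda a, b: a * b,
    "Relu": lambda a: max(a, 0.0),
    "Exp": math.exp,
}


def execute_direct(nodes):
    """Walk nodes in topological order and call each op's kernel directly.

    No code is generated or compiled. Results are stored by node name so
    that later nodes can look up their inputs.
    """
    results = {}
    for node in nodes:
        if node.op == "Constant":
            results[node.name] = node.value
        else:
            args = [results[name] for name in node.inputs]
            results[node.name] = KERNELS[node.op](*args)
    return results


if __name__ == "__main__":
    graph = [
        Node("x", "Constant", value=-2.0),
        Node("y", "Constant", value=3.0),
        Node("s", "Add", ("x", "y")),
        Node("p", "Multiply", ("s", "x")),
        Node("r", "Relu", ("p",)),
    ]
    print(execute_direct(graph)["r"])  # prints 0.0
```

In this sketch nothing is generated or compiled before execution starts, which is the intuition behind the shorter compilation times the text reports for DEX mode.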