Commit a2732033 authored by L.S. Cook, committed by Scott Cyphers

Leona/patternmatchdoc (#1057)

* editing how to execute computation file for clarity and linenos

* Add placeholder for runtime docs

* Update section on backends, interpreter, and FPGA options

* add updated master to fix python_ci

* Weird autosummary issue reverted

* Clarify new section

* fix up docs

* Update pattern matcher doc based on Nik's presentation slides WIP

* Update doc structure and examples

* remove old folder

* Fix broken Tensorview refs

* Helping people document code more efficiently

* PR review edits

* Finish PR review comment fixes so far

* split patternmatcher PR

* small fixes to PM docs

* remove mark tags from source code

* Final PR cleanup edits
parent 7758cf5d
@@ -56,7 +56,7 @@ source_suffix = '.rst'
master_doc = 'index'
# General information about the project.
project = u'Intel® nGraph™ library'
project = u'Intel® nGraph Library'
copyright = '2018, Intel Corporation'
author = 'Intel Corporation'
......
.. generic-frameworks.rst
.. frameworks/generic.rst
Activating nGraph on generic frameworks
========================================
Activate nGraph |trade| on generic frameworks
=============================================
This section details some of the *configuration options* and some of the
*environment variables* that can be used to tune for optimal performance when
@@ -62,12 +62,12 @@ use these configuration settings.
* ``KMP_AFFINITY`` Enables the runtime library to bind threads to physical
processing units.
* ``KMP_SETTINGS`` Enables (``true``) or disables (``false``) the printing of
OpenMP* runtime library environment variables during program execution.
OpenMP\* runtime library environment variables during program execution.
* ``OMP_NUM_THREADS`` Specifies the number of threads to use.
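These variables are normally exported from the shell before launching the
framework. As a hedged illustration (the affinity string below is a common
starting point, not an nGraph requirement), they can also be set
programmatically from C++ before the OpenMP runtime initializes:

.. code-block:: cpp

   #include <cstdlib>

   // Set OpenMP tuning knobs for the current process; this must happen
   // before the OpenMP runtime initializes for the values to take effect.
   setenv("KMP_AFFINITY", "granularity=fine,compact,1,0", /*overwrite=*/1);
   setenv("KMP_SETTINGS", "true", 1);
   setenv("OMP_NUM_THREADS", "28", 1);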
nGraph-enabled Intel® Xeon®
===========================
nGraph-enabled Intel® Xeon®
============================
The list below includes recommendations on data layout, parameters, and
application configuration to achieve best performance running DNN workloads on
@@ -88,8 +88,10 @@ and activated as follows:
Memory allocation
-----------------
Buffer pointers should be aligned at the 64-byte boundary. NUMA policy should be
configured for local memory allocation (``numactl --localloc``)
Buffer pointers should be aligned on 64-byte boundaries. NUMA policy should be
configured for local memory allocation (``numactl --localalloc``).
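As a minimal sketch of the alignment requirement (assuming C++17's
``std::aligned_alloc``; any equivalent aligned allocator also works):

.. code-block:: cpp

   #include <cstdlib>

   // Allocate 1024 floats (4096 bytes, a multiple of 64) aligned on a
   // 64-byte boundary; std::aligned_alloc requires size % alignment == 0.
   float* buffer = static_cast<float*>(std::aligned_alloc(64, 1024 * sizeof(float)));
   // ... fill the buffer and hand it to the backend ...
   std::free(buffer);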
Convolution shapes
^^^^^^^^^^^^^^^^^^
@@ -129,13 +131,11 @@ Intra-op and inter-op parallelism
* ``intra_op_parallelism_threads``
* ``inter_op_parallelism_threads``
Some frameworks, like Tensorflow, use these settings to improve performance;
however, they are often not sufficient to achieve optimal performance.
Framework-based adjustments cannot access the underlying NUMA configuration in
multi-socket Intel Xeon processor-based platforms, which is a key requirement for
many kinds of inference-engine computations. See the next section on
NUMA performance to learn more about this performance feature available to systems
utilizing nGraph.
Some frameworks, like TensorFlow\*, use these settings to improve performance;
however, they are often not sufficient for optimal performance. Framework-based adjustments cannot access the underlying NUMA configuration in multi-socket
Intel Xeon processor-based platforms, which is a key requirement for many kinds
of inference-engine computations. See the next section on NUMA performance to
learn more about this performance feature available to systems utilizing nGraph.
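As a concrete illustration of the settings above, here is a hedged sketch
using TensorFlow's C++ API (the setter names come from the public
``ConfigProto`` interface and may differ across TensorFlow versions):

.. code-block:: cpp

   #include "tensorflow/core/public/session_options.h"

   // Pin the sizes of TensorFlow's two thread pools; reasonable values
   // depend on core count and should be measured per workload.
   tensorflow::SessionOptions options;
   options.config.set_intra_op_parallelism_threads(4);
   options.config.set_inter_op_parallelism_threads(2);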
NUMA performance
......
.. optimize/index:
.. framework/index:
#############################
Integrate Generic Frameworks
#############################
This section, written for framework architects or engineers who want
to optimize a generic, brand new or less widely-supported framework, we
provide some of our learnings from the work we've done in developing
"framework direct optimizations (DO)" and custom bridge code, such as
that for our `ngraph tensorflow bridge`_ code.
In this section, written for framework architects or engineers who want
to optimize brand new, generic, or less widely-supported frameworks, we provide
some of our learnings from our "framework Direct Optimization (framework DO)"
work and custom bridge code, such as that for our `ngraph tensorflow bridge`_.
.. important:: This section contains articles for framework owners or developers
who want to incorporate the nGraph library directly into their framework and
@@ -21,6 +22,7 @@ that for our `ngraph tensorflow bridge`_ code.
generic.rst
When using a framework to run a model or deploy an algorithm on nGraph
devices, there are some additional configuration options that can be
incorporated -- manually on the command line or via scripting -- to improve
@@ -48,8 +50,10 @@ the pack by providing a means for the data scientist to obtain reproducible
results. The other challenge is to provide sufficient documentation, or to
provide sufficient hints for how to do any "fine-tuning" for specific use cases.
How this has worked in creating the :doc:`the direct optimizations <../framework-integration-guides>`
we've shared with the developer community, our `engineering teams carefully tune the workload to extract best performance`_
Here is how this has worked in creating the
:doc:`direct optimizations <../framework-integration-guides>` we've shared
with the developer community: our engineering teams carefully
`tune the workload to extract best performance`_
from a specific :abbr:`DL (Deep Learning)` model embedded in a specific framework
that is training a specific dataset. Our forks of the frameworks adjust the code
and/or explain how to set the parameters that achieve reproducible results.
@@ -82,10 +86,11 @@ updates.
.. [#1] Benchmarking performance of DL systems is a young discipline; it is a
good idea to be vigilant for results based on atypical distortions in the
configuration parameters. Every topology is different, and performance
increases or slowdowns can be attributed to multiple means.
changes can be attributed to multiple causes. Also watch out for the word "theoretical" in comparisons; actual performance should not be
compared to theoretical performance.
.. _ngraph tensorflow bridge: http://ngraph.nervanasys.com/docs/latest/framework-integration-guides.html#tensorflow
.. _engineering teams carefully tune the workload to extract best performance: https://ai.intel.com/accelerating-deep-learning-training-inference-system-level-optimizations
.. _tune the workload to extract best performance: https://ai.intel.com/accelerating-deep-learning-training-inference-system-level-optimizations
.. _a few small: https://software.intel.com/en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi
.. _Movidius: https://www.movidius.com/
\ No newline at end of file
.. fusion/graph-rewrite.rst:
Using ``GraphRewrite`` to fuse ops
-----------------------------------
Exact pattern matching
~~~~~~~~~~~~~~~~~~~~~~
For the example of :math:`-(-A) = A`, graphs of varying complexity can be
created and rewritten with recipes for pattern-matching + graph-rewrite. To
get started, here is a simple example with a trivial graph, followed by a
more complex example:
|image3|
.. code-block:: cpp
Shape shape{};
auto a = make_shared<op::Parameter>(element::i32, shape);
auto absn = make_shared<op::Abs>(a);
auto neg1 = make_shared<op::Negative>(absn);
auto neg2 = make_shared<op::Negative>(neg1);
|image4|
.. code-block:: cpp
Shape shape{};
auto a = make_shared<op::Parameter>(element::i32, shape);
auto b = make_shared<op::Parameter>(element::i32, shape);
auto c = a + b;
auto absn = make_shared<op::Abs>(c);
auto neg1 = make_shared<op::Negative>(absn);
auto neg2 = make_shared<op::Negative>(neg1);
Label AKA ``.`` in regexes
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|image5|
For the code below, ``element::f32`` will still match the integer-typed
Graph1 and Graph2:
.. code-block:: cpp
// Note: element::f32 will still match the integer Graph1 and Graph2
auto lbl = std::make_shared<pattern::op::Label>(element::f32, Shape{});
auto neg1 = make_shared<op::Negative>(lbl);
auto neg2 = make_shared<op::Negative>(neg1);
Constructing labels from existing nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Double Negative w/ Add
^^^^^^^^^^^^^^^^^^^^^^
|image6|
.. code-block:: cpp
auto a = make_shared<op::Parameter>(element::i32, shape);
//`lbl` borrows the type and shape information from `a`
auto lbl = std::make_shared<pattern::op::Label>(a);
auto neg1 = make_shared<op::Negative>(lbl);
auto neg2 = make_shared<op::Negative>(neg1);
Double Negative w/ Sub
^^^^^^^^^^^^^^^^^^^^^^
|image7|
Predicates are of type ``std::function<bool(std::shared_ptr<Node>)>``
.. code-block:: cpp
//predicates are of type std::function<bool(std::shared_ptr<Node>)>
auto add_or_sub = [](std::shared_ptr<Node> n) {
    return std::dynamic_pointer_cast<op::Add>(n) != nullptr ||
           std::dynamic_pointer_cast<op::Subtract>(n) != nullptr;
};
auto lbl = std::make_shared<pattern::op::Label>(
element::f32,
Shape{},
add_or_sub
);
auto neg1 = make_shared<op::Negative>(lbl);
auto neg2 = make_shared<op::Negative>(neg1);
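As a hedged usage sketch (the candidate graph below is illustrative), the
predicate-carrying pattern can be handed to a ``pattern::Matcher`` and tested
against candidate roots:

.. code-block:: cpp

   // Build a candidate graph rooted at -(-(a + b)) and test it against
   // the pattern above; Add satisfies the add_or_sub predicate.
   auto a = make_shared<op::Parameter>(element::f32, Shape{});
   auto b = make_shared<op::Parameter>(element::f32, Shape{});
   auto sum = make_shared<op::Add>(a, b);
   auto candidate = make_shared<op::Negative>(make_shared<op::Negative>(sum));

   pattern::Matcher matcher(neg2);
   bool matched = matcher.match(candidate); // expected: true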
Passes that use Matcher
=======================
* CPUFusion (GraphRewrite)
* CoreFusion (GraphRewrite)
* ReshapeElimination (GraphRewrite)
* AlgebraicSimplification
* CPUPostLayoutOptimizations (GraphRewrite)
* CPURnnMatFusion
* and many more...
Register ``simplify_neg`` handler
---------------------------------
::
static std::unordered_map<std::type_index, std::function<bool(std::shared_ptr<Node>)>>
initialize_const_values_to_ops()
{
return std::unordered_map<std::type_index, std::function<bool(std::shared_ptr<Node>)>>({
{TI(op::Add), simplify_add},
{TI(op::Multiply), simplify_multiply},
{TI(op::Sum), simplify_sum},
{TI(op::Negative), simplify_neg}
});
}
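The handlers themselves are callables keyed by the op's type index. A
hypothetical sketch of what a ``simplify_neg`` handler could look like (the
actual handler in nGraph's ``AlgebraicSimplification`` pass may differ):

::

    // Fold Negative(Negative(x)) into x; returns true if the graph changed.
    static bool simplify_neg(std::shared_ptr<Node> n)
    {
        auto outer = std::dynamic_pointer_cast<op::Negative>(n);
        if (!outer)
        {
            return false;
        }
        auto inner = std::dynamic_pointer_cast<op::Negative>(outer->get_argument(0));
        if (!inner)
        {
            return false;
        }
        ngraph::replace_node(n, inner->get_argument(0));
        return true;
    }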
Add a fusion
~~~~~~~~~~~~
:math:`max(0, A) = Relu(A)`
Pattern for capturing
~~~~~~~~~~~~~~~~~~~~~
|image11|
:math:`max(0, A) = Relu(A)`
::
    namespace ngraph
    {
        namespace pass
        {
            class CoreFusion;
        }
    }

    class ngraph::pass::CoreFusion : public ngraph::pass::GraphRewrite
    {
    public:
        CoreFusion()
            : GraphRewrite()
        {
            construct_relu_pattern();
        }

        // This should go in a .cpp file.
        void construct_relu_pattern()
        {
            auto iconst0 = ngraph::make_zero(element::i32, Shape{});
            auto val = make_shared<pattern::op::Label>(iconst0);
            auto zero = make_shared<pattern::op::Label>(iconst0, nullptr, NodeVector{iconst0});

            auto broadcast_pred = [](std::shared_ptr<Node> n) {
                return static_cast<bool>(std::dynamic_pointer_cast<op::Broadcast>(n));
            };
            auto skip_broadcast = std::make_shared<pattern::op::Skip>(zero, broadcast_pred);
            auto max = make_shared<op::Maximum>(skip_broadcast, val);

            pattern::graph_rewrite_callback callback = [val, zero](pattern::Matcher& m) {
                NGRAPH_DEBUG << "In a callback for construct_relu_pattern against "
                             << m.get_match_root()->get_name();

                auto pattern_map = m.get_pattern_map();
                auto mzero = m.get_pattern_map()[zero];
                if (!ngraph::is_zero(mzero))
                {
                    NGRAPH_DEBUG << "zero constant = " << mzero->get_name() << " not equal to 0\n";
                    return false;
                }
                auto mpattern = m.get_match_root();

                auto cg = shared_ptr<Node>(new op::Relu(pattern_map[val]));
                ngraph::replace_node(m.get_match_root(), cg);
                return true;
            };

            auto m = make_shared<pattern::Matcher>(max, callback);
            this->add_matcher(m);
        }
    };
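To run the fusion, register the pass with a pass manager. A hedged usage
sketch (``pass::Manager`` is used the same way elsewhere in the nGraph docs):

::

    // f is a std::shared_ptr<ngraph::Function> to optimize.
    ngraph::pass::Manager pass_manager;
    pass_manager.register_pass<ngraph::pass::CoreFusion>();
    pass_manager.run_passes(f);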
Recurrent patterns
------------------
:math:`(((A + 0) + 0) + 0) = A`

Equivalent to ``"A(BC)+A"`` in regexes.
|image12|
|image13|
::
Shape shape{};
auto a = make_shared<op::Parameter>(element::i32, shape);
auto b = make_shared<op::Parameter>(element::i32, shape);
auto rpattern = std::make_shared<pattern::op::Label>(b);
auto iconst0 = ngraph::make_zero(element::i32, shape);
auto abs = make_shared<op::Abs>(a);
auto add1 = iconst0 + b;
auto add2 = iconst0 + add1;
auto add3 = iconst0 + add2;
auto padd = iconst0 + rpattern;
std::set<std::shared_ptr<pattern::op::Label>> empty_correlated_matches;
RecurrentMatcher rm(padd, rpattern, empty_correlated_matches, nullptr);
ASSERT_TRUE(rm.match(add3));
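After a successful match, the nodes bound to ``rpattern`` at each repetition
can be retrieved; a hedged sketch (method name assumed from the
``RecurrentMatcher`` interface as of this writing):

::

    // One binding per repetition of the `iconst0 + rpattern` sub-pattern.
    NodeVector bound = rm.get_bound_nodes_for_pattern(rpattern);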
.. |image3| image:: mg/pr1_graph2.png
.. |image4| image:: mg/pr1_graph3.png
.. |image5| image:: mg/pr1_pattern2.png
.. |image6| image:: mg/pr1_graph4.png
.. |image7| image:: mg/pr1_graph5.png
.. |image8| image:: mg/pr2_graph1.png
.. |image9| image:: mg/pr2_graph2.png
.. |image10| image:: mg/pr2_pattern2.png
.. |image11| image:: mg/fusion_pattern.png
.. |image12| image:: mg/rp_graph1.png
.. |image13| image:: mg/rp_pattern.png
\ No newline at end of file
.. fusion/index.rst:
Optimize Graphs
===============
with nGraph Compiler fusions
-----------------------------
The nGraph Compiler is an optimizing compiler. As such, it performs a series
of optimization passes over a given function graph to translate it into a
semantically-equivalent and inherently-optimized graph with superior runtime
characteristics for any of nGraph's current or future backends. Indeed, a
framework's capability to increase training performance or to reduce inference
latency by simply adding another device of *any* specialized form factor (CPU,
GPU, VPU, or FPGA) is one of the :doc:`key benefits <../project/about>` of
developing upon a framework that uses the nGraph Compiler.
In handling a :term:`function graph`, there are many ways to describe what
happens when we translate the framework's output of ops into an nGraph
graph. :term:`Fusion` is the term we shall use in our documentation, but the
action also can be described as: *combining*, *folding*, *collapsing*, or
*merging* of graph functions. The most common use case is to *fuse* a subgraph
from the function graph into :doc:`one of the nGraph Core ops <../ops/index>`.
Optimization passes may include algebraic simplifications, domain-specific
simplifications, and fusion. Most passes share the same mode of operation (or
the same operational structure) and consist of two stages:
#. Locating a list of potential transformation candidates (usually, subgraphs)
in the given graph.
#. Transforming the selected candidates into semantically-equivalent subgraphs
that run faster and/or with less memory.
Optimization passes can be programmed ahead of time if you know what your graph
will look like when it's ready to be executed, or the optimization passes can
be figured out manually with *Interpreter* mode on a stateless graph.
Let us first consider an example. A user would like to execute a simple graph
that describes the following arithmetic expression:
:math:`a + b * 1` or :math:`Add(a, Mul(b, 1))`
In the above expressions, `1` is an identity element; any element multiplied by
the identity element is equal to itself. This is the same as saying:
:math:`b * 1 = b`
The writer of an optimization pass which uses algebraic simplification would
probably want to first ``locate`` all multiplication expressions where
multiplicands are multiplied by `1` (for stage 1) and to then ``transform``,
``simplify``, or ``replace`` those expressions with just their multiplicands
(for stage 2).
To make the work of an optimization pass writer easier, the nGraph library
includes facilities that enable the *finding* of relevant candidates using
pattern matching (via ``pattern/matcher.hpp``), and the *transforming* of the
original graph into a condensed version (via ``pass/graph_rewrite.hpp``).
Let's consider each of these two facilities in more detail, along with the
many ways they can help the work of the optimization pass writer.
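To make the two stages concrete, here is a hedged sketch (illustrative names
only, not the library's actual simplification pass) of a matcher for the
``b * 1`` case above:

.. code-block:: cpp

   // Stage 1: a pattern for Multiply(b, 1). A production pass would also
   // verify in the callback that the matched constant really is 1.
   auto b = std::make_shared<pattern::op::Label>(element::f32, Shape{});
   auto one = op::Constant::create(element::f32, Shape{}, {1});
   auto mul = std::make_shared<op::Multiply>(b, one);

   // Stage 2: replace the matched `b * 1` subgraph with the multiplicand.
   pattern::graph_rewrite_callback callback = [b](pattern::Matcher& m) {
       auto pattern_map = m.get_pattern_map();
       ngraph::replace_node(m.get_match_root(), pattern_map[b]);
       return true;
   };
   auto simplify_mul_one = std::make_shared<pattern::Matcher>(mul, callback);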
.. toctree::
:maxdepth: 1
graph-rewrite.rst
@@ -33,6 +33,12 @@ Glossary
The Intel nGraph library uses a function graph to represent an
``op``'s parameters and results.
fusion
Fusion is the fusing, combining, merging, collapsing, or refactoring
of a graph's functional operations (``ops``) into one or more of
nGraph's core ops.
op
An op represents an operation. Ops are stateless and have zero
@@ -98,6 +104,14 @@ Glossary
Tensors are maps from *coordinates* to scalar values, all of the
same type, called the *element type* of the tensor.
Tensorview
The interface backends implement for tensor use. When there are no more
references to the tensor view, it will be freed when convenient for the
backend.
model description
A description of a program's fundamental operations that are
......
@@ -166,17 +166,16 @@ you switch between odd/even generations of variables on each update.
Backends are responsible for managing storage. If the storage is off-CPU, caches
are used to minimize copying between device and CPU. We can allocate storage for
the three parameters and the return value as follows:
the three parameters and the return value.
.. literalinclude:: ../../../examples/abc/abc.cpp
:language: cpp
:lines: 41-46
Each tensor is a shared pointer to a :doc:`../programmable/index/tensorview`,
the interface backends implement for tensor use. When there are no more references to the
Each tensor is a shared pointer to a :term:`Tensorview`, which is the interface
backends implement for tensor use. When there are no more references to the
tensor view, it will be freed when convenient for the backend. See the
:doc:`../programmable/index` documentation for details on ``TensorView ``.
:doc:`../programmable/index` documentation for details on ``TensorView``.
.. _initialize_inputs:
......
@@ -71,8 +71,9 @@ Python-based API. See the `ngraph onnx companion tool`_ to get started.
TensorFlow, Yes, Yes
MXNet, Yes, Yes
PaddlePaddle, Coming Soon, Yes
neon, none needed, Yes
PyTorch, Not yet, Yes
PyTorch, Coming Soon, Yes
CNTK, Not yet, Yes
Other, Not yet, Doable
@@ -140,13 +141,14 @@ Contents
install.rst
graph-basics.rst
fusion/index.rst
howto/index.rst
ops/index.rst
project/index.rst
framework-integration-guides.rst
optimize/index.rst
frameworks/index.rst
programmable/index.rst
python_api/index.rst
project/index.rst
......
.. about:
About
=====
Overview
========
Welcome to the documentation site for nGraph™, an open-source C++ Compiler,
Library, and runtime suite for running training and inference on
:abbr:`Deep Neural Network (DNN)` models. nGraph is framework-neutral and can be
targeted for programming and deploying :abbr:`Deep Learning (DL)` applications
on the most modern compute and edge devices.
Features
--------
:ref:`no-lockin`
:ref:`framework-flexibility`
.. _no-lockin:
Develop without lock-in
~~~~~~~~~~~~~~~~~~~~~~~
.. figure:: ../graphics/develop-without-lockin.png
:width: 650px
Indeed, capabilities to increase training performance or to reduce inference
latency by simply adding another device of *any* specialized form factor --
whether it be more compute (CPU), GPU or VPU processing power, custom ASIC or
FPGA, or a yet-to-be-invented generation of NNP or accelerator -- are the key
benefits for framework developers working with nGraph. Our commitment to bake
flexibility into our ecosystem ensures developers' freedom to design user-facing
APIs for various hardware deployments directly into the framework.
Developers working on edge devices augmented by machine learning, on large
distributed training clusters, or on frameworks without restrictive lock-in
also benefit: nGraph lets their users switch or upgrade backends quickly and
easily.
Welcome to nGraph™, an open-source C++ compiler library for running and
training :abbr:`Deep Neural Network (DNN)` models. This project is
framework-neutral and can target a variety of modern devices or platforms.
.. figure:: ../graphics/ngraph-ecosystem.png
:width: 585px
@@ -14,11 +46,11 @@ nGraph currently supports :doc:`three popular <../framework-integration-guides>`
frameworks for :abbr:`Deep Learning (DL)` models through what we call
a :term:`bridge` that can be integrated during the framework's build time.
For developers working with other frameworks (even those not listed above),
we've created a :doc:`How to Guide <../howto/index>` so you can learn how to create
custom bridge code that can be used to :doc:`compile and run <../howto/execute>`
a training model.
we've created a :doc:`How to Guide <../howto/index>` so you can learn how to
create custom bridge code that can be used to
:doc:`compile and run <../howto/execute>` a training model.
We've recently added initial support for the `ONNX`_ format. Developers who
Additionally, we've recently added initial support for the `ONNX`_ format. Developers who
already have a "trained" model can use nGraph to bypass a lot of the
framework-based complexity and :doc:`../howto/import` to test or run it
on targeted and efficient backends with our user-friendly ``ngraph_api``.
@@ -29,17 +61,14 @@ about how to adapt models to train and run efficiently on different devices.
Supported platforms
--------------------
Initially-supported backends include:
* Intel® Architecture Processors (CPUs),
* Intel® Nervana™ Neural Network Processor™ (NNPs), and
* NVIDIA\* CUDA (GPUs).
Tentatively in the pipeline, we plan to add support for more backends,
including:
* :abbr:`Field-Programmable Gate Arrays (FPGAs)`
* `Movidius`_ compute stick
.. note:: The library code is under active development as we're continually
adding support for more kinds of DL models and ops, framework compiler
@@ -82,6 +111,8 @@ tensor outputs from zero or more tensor inputs. For a more detailed dive into
how this works, read our documentation on how to :doc:`../howto/execute`.
.. _framework-flexibility:
How do I connect it to a framework?
------------------------------------
......
.. project/index.rst
Project Docs
============
More about nGraph
==================
This section contains documentation about the project and how to contribute.
......