Commit bbe1c66c authored by LS Cook

WIP first pass edit review

parent cea849f7
@@ -5,9 +5,17 @@ Executing a Computation

Executing a Computation
#######################

This section explains how to manually perform the steps that would normally be
performed by a framework :term:`bridge` to execute a computation. The Intel®
nGraph library is targeted toward automatic construction; it is far easier for
a processing unit (GPU, CPU, or custom silicon) to run a computation than it
is for a user to map out how that computation happens.

Here we will do all of the bridge steps manually. Unfortunately, things that
make by-hand graph construction simpler tend to make automatic construction
more difficult, and vice versa.

In order to successfully run a computation, the entity (framework or user)
must be able to do all of these things:

* :ref:`define_cmp`
* :ref:`specify_bkd`

@@ -24,39 +32,51 @@ Define a Computation

Define a Computation
====================

To a :term:`framework`, a computation is simply a transformation of inputs to
outputs. While a *framework bridge* can programmatically construct the graph
from a framework's representation of the computation, graph construction can
be somewhat more tedious for users. To a user, who is usually interested in
the specific nodes (vertices) or edges of a computation that reveal "what is
happening where", it can be helpful to think of a computation as a zoomed-out,
*stateless* dataflow graph in which every node is well-defined and every edge
is a possible route for executing the computation.

.. TODO
.. image for representing nodes and edges
Most of the public portion of the nGraph API is in the ``ngraph`` namespace,
so we will omit the namespace. Any namespace other than ``std`` is assumed to
be nested within ``ngraph``; for example, ``op::Add`` refers to
``ngraph::op::Add``.

A computation's graph is constructed from ops, each of which is an instance of
a subclass of ``op::Op``, which, in turn, is a subclass of ``Node``. Not all
graphs are computations, but all graphs are composed entirely of instances of
``Node``. Computation graphs contain only ``op::Op`` nodes.

We mostly use shared pointers for nodes. In the nGraph APIs for
:doc:`ops/index`, this is expressed as ``std::shared_ptr<Node>``, which allows
nodes to be deallocated once they are no longer referenced. The one exception
to this rule is that there cannot be a circular path through shared pointers:
a cycle would prevent the reference counts from ever reaching 0, so it would
be impossible to ever deallocate the nodes on the cycle.

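This is a property of C++ ``shared_ptr`` reference counting in general, not of
nGraph in particular; a minimal sketch of the problem:

.. code-block:: cpp

   #include <memory>

   struct N
   {
       std::shared_ptr<N> next;
   };

   int main()
   {
       auto a = std::make_shared<N>();
       auto b = std::make_shared<N>();
       a->next = b;
       b->next = a; // circular path: both reference counts stay at least 1
   }                // neither node is freed when a and b go out of scope
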
Every node has zero or more *inputs*, zero or more *outputs*, and zero or more
*attributes*. For the purposes of :ref:`define_cmp`, nodes should be thought
of as essentially immutable; that is, when constructing a node, we need to
supply all of its inputs up front. To get this process started, we need some
nodes that have no inputs.

For a more concrete example, consider a relay race. The course is drawn ahead
of time, so each runner knows the specific place (node) at which to hand off
the baton to the next runner. A runner waiting for the baton is essentially an
empty node until the baton arrives, and a runner who has surrendered the baton
to the next runner (node) no longer holds it.

``op::Parameter`` specifies the tensors that will be passed to the
computation. Parameters receive their values from outside of the graph, so
they have no inputs. They have attributes for the element type and the shape
of the tensor that will be passed to them.

.. code-block:: cpp

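   // Sketch only -- the original snippet is collapsed in this hunk. Graph
   // construction is assumed from the ``Function`` and tensors used below:
   // three f32 parameters of shape {2, 3}, combined as the product (a + b) * c.
   Shape shape{2, 3};
   auto a = make_shared<op::Parameter>(element::f32, shape);
   auto b = make_shared<op::Parameter>(element::f32, shape);
   auto c = make_shared<op::Parameter>(element::f32, shape);

   auto t0 = make_shared<op::Add>(a, b);
   auto t1 = make_shared<op::Multiply>(t0, c);
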
@@ -92,33 +112,34 @@ Once the graph is built, we need to package it in a ``Function``:

Once the graph is built, we need to package it in a ``Function``:

.. code-block:: cpp

   auto f = make_shared<Function>(NodeVector{t1}, ParameterVector{a, b, c});

The first argument to the constructor specifies the nodes that the function
will return; in this case, the product. A ``NodeVector`` is a vector of shared
pointers of ``op::Node``. The second argument specifies the parameters of the
function, in the order they are to be passed to the compiled function. A
``ParameterVector`` is a vector of shared pointers to ``op::Parameter``.

.. important:: The parameter vector must include **every** parameter used in
   the computation of the results.

.. _specify_bkd:

Specify the backend upon which to run the computation
=====================================================

For a framework bridge, a *backend* is the environment that can perform the
computations; it can be a CPU, a GPU, or an NNP. A *transformer* can compile
computations for a backend, allocate and deallocate tensors, and invoke
computations.

Factory-like managers for classes of backends can compile a ``Function`` and
allocate backends. A backend is somewhat analogous to a multi-threaded
process.

There are two backends for the CPU: the optimized "CPU" backend, which makes
use of mkl-dnn, and the "INTERPRETER" backend, which runs reference versions
of kernels where implementation clarity is favored over speed. The
"INTERPRETER" backend is mainly used for testing.

To select the "CPU" backend, To select the "CPU" backend,
@@ -150,21 +171,19 @@ the ``ExternalFunction``.

Allocate backend storage for the inputs and outputs
===================================================

At the graph level, functions are stateless. They do have internal state
related to execution, but there is no user-visible state. Variables must be
passed as arguments. If the function updates variables, it must return the
updated variables.

To invoke a function, tensors must be provided for every input and every
output. At this time, a tensor used as an input cannot also be used as an
output. If variables are being updated, you should use a double-buffering
approach where you switch between odd/even generations of variables on each
update.

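As an illustration of the double-buffering pattern (a sketch only: the two
state tensors, the step count, and the single-variable update function are
hypothetical, not part of the original example):

.. code-block:: cpp

   // Two generations of one variable; input/output roles swap on each step.
   auto t_even = backend->make_primary_tensor_view(element::f32, shape);
   auto t_odd  = backend->make_primary_tensor_view(element::f32, shape);

   const int n_steps = 10; // hypothetical number of updates
   for (int step = 0; step < n_steps; ++step)
   {
       auto& t_in  = (step % 2 == 0) ? t_even : t_odd;
       auto& t_out = (step % 2 == 0) ? t_odd  : t_even;
       cf->call({t_in}, {t_out}); // assumed signature: call(inputs, outputs)
   }
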
Backends are responsible for managing storage. If the storage is off-CPU,
caches are used to minimize copying between device and CPU. We can allocate
storage for the three parameters and return value as follows:

.. code-block:: cpp

@@ -173,20 +192,18 @@ as follows:

   auto t_c = backend->make_primary_tensor_view(element::f32, shape);
   auto t_result = backend->make_primary_tensor_view(element::f32, shape);

Each tensor is a shared pointer to a ``runtime::TensorView``, the interface
backends implement for tensor use. When there are no more references to the
tensor view, it will be freed when convenient for the backend.

.. _initialize_inputs:

Initialize the inputs
=====================

Normally the framework bridge reads/writes bytes to the tensor, assuming a
row-major element layout. To simplify writing unit tests, we have developed a
class for making tensor literals. We can use these to initialize our tensors:

.. code-block:: cpp

@@ -194,12 +211,11 @@ to initialize our tensors:

   copy_data(t_b, test::NDArray<float, 2>({{7, 8, 9}, {10, 11, 12}}).get_vector());
   copy_data(t_c, test::NDArray<float, 2>({{1, 0, -1}, {-1, 1, 2}}).get_vector());

The ``test::NDArray`` needs to know the element type (``float``) and rank
(``2``) of the tensors, and figures out the shape during template expansion.

The ``runtime::TensorView`` interface has ``write`` and ``read`` methods for
copying data to/from the tensor.
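
For example, the result can be copied back into a host buffer after execution
(a sketch; the ``(pointer, byte offset, byte count)`` argument order is an
assumption about the ``read`` signature, not taken from the original):

.. code-block:: cpp

   // Copy the 2x3 f32 result into a row-major host vector.
   std::vector<float> result(2 * 3);
   t_result->read(result.data(), 0, result.size() * sizeof(float));
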
.. _invoke_cmp:

...