Commit bbe1c66c authored by LS Cook

WIP first pass edit review

parent cea849f7
@@ -5,9 +5,17 @@ Executing a Computation

Executing a Computation
#######################

This section explains how to manually perform the steps that would normally be
performed by a framework :term:`bridge` to execute a computation. The Intel®
nGraph library is targeted toward automatic construction; it is far easier for
a processing unit (GPU, CPU, or custom silicon) to run a computation than it
is for a user to map out how that computation happens.

Here we will do all of the bridge steps manually. Unfortunately, things that
make by-hand graph construction simpler tend to make automatic construction
more difficult, and vice versa.

In order to successfully run a computation, the entity (framework or user)
must be able to do all of these things:

* :ref:`define_cmp`
* :ref:`specify_bkd`

@@ -24,39 +32,51 @@ Define a Computation

Define a Computation
====================

To a :term:`framework`, a computation is simply a transformation of inputs to
outputs. While a *framework bridge* can programmatically construct the graph
from a framework's representation of the computation, graph construction can
be somewhat more tedious for users. To a user, who is usually interested in
the specific nodes (vertices) or edges of a computation that reveal "what is
happening where", it can be helpful to think of a computation as a zoomed-out,
*stateless* dataflow graph in which every node is well-defined and every edge
is a possible route for executing the computation.

.. TODO
.. image for representing nodes and edges
Most of the public portion of the nGraph API is in the ``ngraph`` namespace,
so we will omit the namespace. Any namespace other than ``std`` is assumed to
be nested within ``ngraph``; for example, ``op::Add`` refers to
``ngraph::op::Add``.

A computation's graph is constructed from ops, each of which is an instance of
a subclass of ``op::Op``, which, in turn, is a subclass of ``Node``. Not all
graphs are computations, but all graphs are composed entirely of instances of
``Node``. Computation graphs contain only ``op::Op`` nodes.

We mostly use shared pointers for nodes. In the nGraph APIs for
:doc:`ops/index`, this is expressed as ``std::shared_ptr<Node>``, which allows
nodes to be deallocated once they are no longer referenced. The one exception
to this rule is that there cannot be a circular path through shared pointers:
a cycle would prevent the reference counts from ever reaching 0, so it would
be impossible to ever deallocate the nodes on the cycle.

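This is a property of C++ ``shared_ptr`` reference counting in general, not of
nGraph in particular; a minimal sketch of the problem:

.. code-block:: cpp

   #include <memory>

   struct N
   {
       std::shared_ptr<N> next;
   };

   int main()
   {
       auto a = std::make_shared<N>();
       auto b = std::make_shared<N>();
       a->next = b;
       b->next = a; // circular path: both reference counts stay at least 1
   }                // neither node is freed when a and b go out of scope
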
Every node has zero or more *inputs*, zero or more *outputs*, and zero or more
*attributes*. For the purposes of :ref:`define_cmp`, nodes should be thought
of as essentially immutable; that is, when constructing a node, we need to
supply all of its inputs up front. To get this process started, we need some
nodes that have no inputs.

For a more concrete example, consider a relay race. The course is drawn ahead
of time, so each runner knows the specific place (node) at which to hand off
the baton to the next runner. A runner waiting for the baton is essentially an
empty node until the baton arrives, and a runner who has surrendered the baton
to the next runner (node) no longer holds it.

``op::Parameter`` specifies the tensors that will be passed to the
computation. Parameters receive their values from outside of the graph, so
they have no inputs. They have attributes for the element type and the shape
of the tensor that will be passed to them.

.. code-block:: cpp

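   // Sketch only -- the original snippet is collapsed in this hunk. Graph
   // construction is assumed from the ``Function`` and tensors used below:
   // three f32 parameters of shape {2, 3}, combined as the product (a + b) * c.
   Shape shape{2, 3};
   auto a = make_shared<op::Parameter>(element::f32, shape);
   auto b = make_shared<op::Parameter>(element::f32, shape);
   auto c = make_shared<op::Parameter>(element::f32, shape);

   auto t0 = make_shared<op::Add>(a, b);
   auto t1 = make_shared<op::Multiply>(t0, c);
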
@@ -92,33 +112,34 @@ Once the graph is built, we need to package it in a ``Function``:

Once the graph is built, we need to package it in a ``Function``:

.. code-block:: cpp

   auto f = make_shared<Function>(NodeVector{t1}, ParameterVector{a, b, c});

The first argument to the constructor specifies the nodes that the function
will return; in this case, the product. A ``NodeVector`` is a vector of shared
pointers of ``op::Node``. The second argument specifies the parameters of the
function, in the order they are to be passed to the compiled function. A
``ParameterVector`` is a vector of shared pointers to ``op::Parameter``.

.. important:: The parameter vector must include **every** parameter used in
   the computation of the results.

.. _specify_bkd:

Specify the backend upon which to run the computation
=====================================================

For a framework bridge, a *backend* is the environment that can perform the
computations; it can be a CPU, a GPU, or an NNP. A *transformer* can compile
computations for a backend, allocate and deallocate tensors, and invoke
computations.

Factory-like managers for classes of backends can compile a ``Function`` and
allocate backends. A backend is somewhat analogous to a multi-threaded
process.

There are two backends for the CPU: the optimized "CPU" backend, which makes
use of mkl-dnn, and the "INTERPRETER" backend, which runs reference versions
of kernels where implementation clarity is favored over speed. The
"INTERPRETER" backend is mainly used for testing.

To select the "CPU" backend, To select the "CPU" backend,
@@ -150,21 +171,19 @@ the ``ExternalFunction``.

Allocate backend storage for the inputs and outputs
===================================================

At the graph level, functions are stateless. They do have internal state
related to execution, but there is no user-visible state. Variables must be
passed as arguments. If the function updates variables, it must return the
updated variables.

To invoke a function, tensors must be provided for every input and every
output. At this time, a tensor used as an input cannot also be used as an
output. If variables are being updated, you should use a double-buffering
approach where you switch between odd/even generations of variables on each
update.

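As an illustration of the double-buffering pattern (a sketch only: the two
state tensors, the step count, and the single-variable update function are
hypothetical, not part of the original example):

.. code-block:: cpp

   // Two generations of one variable; input/output roles swap on each step.
   auto t_even = backend->make_primary_tensor_view(element::f32, shape);
   auto t_odd  = backend->make_primary_tensor_view(element::f32, shape);

   const int n_steps = 10; // hypothetical number of updates
   for (int step = 0; step < n_steps; ++step)
   {
       auto& t_in  = (step % 2 == 0) ? t_even : t_odd;
       auto& t_out = (step % 2 == 0) ? t_odd  : t_even;
       cf->call({t_in}, {t_out}); // assumed signature: call(inputs, outputs)
   }
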
Backends are responsible for managing storage. If the storage is off-CPU,
caches are used to minimize copying between device and CPU. We can allocate
storage for the three parameters and return value as follows:

.. code-block:: cpp

@@ -173,20 +192,18 @@ as follows:

   auto t_c = backend->make_primary_tensor_view(element::f32, shape);
   auto t_result = backend->make_primary_tensor_view(element::f32, shape);

Each tensor is a shared pointer to a ``runtime::TensorView``, the interface
backends implement for tensor use. When there are no more references to the
tensor view, it will be freed when convenient for the backend.

.. _initialize_inputs:

Initialize the inputs
=====================

Normally the framework bridge reads/writes bytes to the tensor, assuming a
row-major element layout. To simplify writing unit tests, we have developed a
class for making tensor literals. We can use these to initialize our tensors:

.. code-block:: cpp

@@ -194,12 +211,11 @@ to initialize our tensors:

   copy_data(t_b, test::NDArray<float, 2>({{7, 8, 9}, {10, 11, 12}}).get_vector());
   copy_data(t_c, test::NDArray<float, 2>({{1, 0, -1}, {-1, 1, 2}}).get_vector());

The ``test::NDArray`` needs to know the element type (``float``) and rank
(``2``) of the tensors, and figures out the shape during template expansion.

The ``runtime::TensorView`` interface has ``write`` and ``read`` methods for
copying data to/from the tensor.
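
For example, the result can be copied back into a host buffer after execution
(a sketch; the ``(pointer, byte offset, byte count)`` argument order is an
assumption about the ``read`` signature, not taken from the original):

.. code-block:: cpp

   // Copy the 2x3 f32 result into a row-major host vector.
   std::vector<float> result(2 * 3);
   t_result->read(result.data(), 0, result.size() * sizeof(float));
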
.. _invoke_cmp:

...