Commit bbe1c66c authored Feb 28, 2018 by LS Cook
WIP first pass edit review
parent cea849f7
Showing 1 changed file with 90 additions and 74 deletions
doc/sphinx/source/howto/execute.rst
...
...
@@ -5,9 +5,17 @@ Executing a Computation
#######################
This section explains how to manually perform the steps that would normally be
performed by a framework :term:`bridge` to execute a computation. The Intel®
nGraph library is targeted toward automatic construction; it is far easier for
a processing unit (GPU, CPU, or custom silicon) to run a computation than it is
for a user to map out how that computation happens.

Here we will do all the bridge steps manually. Unfortunately, things that make
by-hand graph construction simpler tend to make automatic construction more
difficult, and vice versa.

In order to successfully run a computation, the entity (framework or user) must
be able to do all of these things:
* :ref:`define_cmp`
* :ref:`specify_bkd`
...
...
@@ -24,39 +32,51 @@ Define a Computation
====================
To a :term:`framework`, a computation is simply a transformation of inputs to
outputs. While a *framework bridge* can programmatically construct the graph
from a framework's representation of the computation, graph construction can be
somewhat more tedious for users. To a user, who is usually interested in
specific nodes (vertices) or edges of a computation that reveal "what is
happening where", it can be helpful to think of a computation as a zoomed-out
and *stateless* dataflow graph where all of the nodes are well-defined and all
of the edges are possible routes for executing the computation.
.. TODO
.. image for representing nodes and edges
Most of the public portion of the nGraph API is in the ``ngraph`` namespace, so
we will omit the namespace. Namespaces other than ``std`` are assumed to be
namespaces within ``ngraph``; for example, ``op::Add`` refers to
``ngraph::op::Add``.
A computation's graph is constructed from ops; each is a member of a subclass
of ``op::Op``, which, in turn, is a subclass of ``Node``. Not all graphs are
computations, but all graphs are composed entirely of instances of ``Node``.
Computation graphs contain only ``op::Op`` nodes.

We mostly use shared pointers for nodes. In the nGraph APIs for :doc:`ops/index`,
this is presented as ``std::shared_ptr<Node>``, which allows nodes to be
deallocated when they are no longer referenced. The one exception to this rule
is that there cannot be a circular path through shared pointers, as this would
prevent the reference counts from ever reaching 0; in other words, it would be
impossible to entirely deallocate the nodes on such a path.
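As a minimal sketch of this shared-pointer style (the exact constructor
arguments shown here are assumptions, not taken from this file):

.. code-block:: cpp

   // A minimal sketch of shared-pointer graph construction; the exact
   // constructor arguments are assumptions.
   Shape shape{2, 3};
   auto a = std::make_shared<op::Parameter>(element::f32, shape);
   auto b = std::make_shared<op::Parameter>(element::f32, shape);
   // The Add node holds shared pointers to its inputs; the graph stays acyclic.
   auto sum = std::make_shared<op::Add>(a, b);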
Every node has zero or more *inputs*, zero or more *outputs*, and zero or more
*attributes*. For the purposes of :ref:`define_cmp`, nodes should be thought of
as essentially immutable; that is, when constructing a node, we need to supply
all of its inputs. To get this process started, we need some nodes that have no
inputs.
For a more concrete example, we can use the analogy of a relay race. The course
is drawn ahead of time: each runner must hand off the baton to the next runner
at a specific place (node). A runner waiting for the baton is essentially an
empty node until the baton arrives, and a runner who has surrendered the baton
to the next runner no longer holds it.
``op::Parameter`` specifies the tensors that will be passed to the computation.
They receive their values from outside of the graph, so they have no inputs.
They have attributes for the element type and the shape of the tensor that will
be passed to them.
.. code-block:: cpp
...
...
@@ -92,33 +112,34 @@ Once the graph is built, we need to package it in a ``Function``:
auto f = make_shared<Function>(NodeVector{t1}, ParameterVector{a, b, c});
The first argument to the constructor specifies the nodes that the function will
return; in this case, the product. A ``NodeVector`` is a vector of shared
pointers of ``op::Node``. The second argument specifies the parameters of the
function, in the order they are to be passed to the compiled function. A
``ParameterVector`` is a vector of shared pointers to ``op::Parameter``.

.. important:: The parameter vector must include **every** parameter used in
   the computation of the results.
.. _specify_bkd:
Specify the backend upon which to run the computation
=====================================================
For a framework bridge, a *backend* is the environment that can perform the
computations; it can be a CPU, a GPU, or an NNP. A *transformer* can compile
computations for a backend, allocate and deallocate tensors, and invoke
computations.

Factory-like managers for classes of backend can compile a ``Function`` and
allocate backends. A backend is somewhat analogous to a multi-threaded
process.
There are two backends for the CPU: the optimized "CPU" backend, which makes
use of mkl-dnn, and the "INTERPRETER" backend, which runs reference versions of
kernels where implementation clarity is favored over speed. The "INTERPRETER"
backend is mainly used for testing.
To select the "CPU" backend,
...
...
@@ -150,21 +171,19 @@ the ``ExternalFunction``.
Allocate backend storage for the inputs and outputs
===================================================
At the graph level, functions are stateless. They do have internal state related
to execution, but there is no user-visible state. Variables must be passed as
arguments. If the function updates variables, it must return the updated
variables.
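For example, a function that returns an ordinary result along with an updated
variable might be packaged like this; the node names here are hypothetical.

.. code-block:: cpp

   // 'output' is the ordinary result; 'new_state' is the updated value of
   // the variable passed in through the 'state' parameter. All of these
   // names are hypothetical.
   auto f = make_shared<Function>(NodeVector{output, new_state},
                                  ParameterVector{input, state});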
To invoke a function, tensors must be provided for every input and every output.
At this time, a tensor used as an input cannot also be used as an output. If
variables are being updated, you should use a double-buffering approach where
you switch between odd/even generations of variables on each update.
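A double-buffered update loop might look like the following sketch; the
invocation itself is elided because the call API is covered separately, and
``n_steps`` is a hypothetical loop bound.

.. code-block:: cpp

   // Two generations of storage for one variable.
   auto t_state_even = backend->make_primary_tensor_view(element::f32, shape);
   auto t_state_odd  = backend->make_primary_tensor_view(element::f32, shape);

   auto* in  = &t_state_even; // generation read by the function
   auto* out = &t_state_odd;  // generation written by the function
   for (int step = 0; step < n_steps; ++step)
   {
       // ... invoke the compiled function with *in as the variable's input
       // tensor and *out as the updated-variable output ...
       std::swap(in, out); // alternate odd/even generations on each update
   }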
Backends are responsible for managing storage. If the storage is off-CPU, caches
are used to minimize copying between device and CPU. We can allocate storage for
the three parameters and return value as follows:
.. code-block:: cpp
...
...
@@ -173,20 +192,18 @@ as follows:
auto t_c = backend->make_primary_tensor_view(element::f32, shape);
auto t_result = backend->make_primary_tensor_view(element::f32, shape);
Each tensor is a shared pointer to a ``runtime::TensorView``, the interface
backends implement for tensor use. When there are no more references to the
tensor view, it will be freed when convenient for the backend.
.. _initialize_inputs:
Initialize the inputs
=====================
Normally the framework bridge reads/writes bytes to the tensor, assuming a
row-major element layout. To simplify writing unit tests, we have developed a
class for making tensor literals. We can use these to initialize our tensors:
.. code-block:: cpp
...
...
@@ -194,12 +211,11 @@ to initialize our tensors:
copy_data(t_b, test::NDArray<float, 2>({{7, 8, 9}, {10, 11, 12}}).get_vector());
copy_data(t_c, test::NDArray<float, 2>({{1, 0, -1}, {-1, 1, 2}}).get_vector());
The ``test::NDArray`` needs to know the element type (``float``) and rank (``2``)
of the tensors, and figures out the shape during template expansion.
The ``runtime::TensorView`` interface has ``write`` and ``read`` methods for
copying data to/from the tensor.
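For example, the result might be read back into a host buffer as follows; the
``(pointer, offset, byte count)`` signature shown here is an assumption.

.. code-block:: cpp

   // Copy the 2x3 result into a host-side vector; this read signature is
   // an assumption, not taken from this file.
   std::vector<float> result(2 * 3);
   t_result->read(result.data(), 0, result.size() * sizeof(float));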
.. _invoke_cmp:
...
...