.. execute-cmp.rst

######################
Execute a computation
######################

This section explains how to manually perform the steps that would normally be 
performed by a framework :term:`bridge` to execute a computation. The nGraph 
library is targeted toward automatic construction; it is far easier for a 
processing unit (GPU, CPU, or an `Intel Nervana NNP`_) to run a computation than 
it is for a human to map out how that computation happens. Unfortunately, things 
that make by-hand graph construction simpler tend to make automatic construction 
more difficult, and vice versa.

Here we will do all the bridge steps manually. The :term:`model description` 
walk-through below is based on the :file:`abc.cpp` code in the ``/doc/examples/`` 
directory. We'll be deconstructing the steps that must happen (either 
programmatically or manually) in order to successfully execute a computation:

* :ref:`define_cmp`
* :ref:`specify_backend`
* :ref:`compile_cmp`
* :ref:`allocate_backend_storage`
* :ref:`initialize_inputs`
* :ref:`invoke_cmp`
* :ref:`access_outputs`

The full code is at the :ref:`end of this page <all_together>`.


.. _define_cmp:

Define the computation
======================

To a :term:`framework`, a computation is simply a transformation of inputs to 
outputs. While a :term:`bridge` can programmatically construct the graph 
from a framework's representation of the computation, graph construction can be 
somewhat more tedious when done manually. For anyone interested in specific 
nodes (vertices) or edges of a computation that reveal "what is happening where", 
it can be helpful to think of a computation as a zoomed-out and *stateless* 
:term:`data-flow graph` where all of the nodes are well-defined tensor 
operations and all of the edges denote use of an output from one operation as an 
input for another operation.

Most of the public portion of the nGraph API is in the ``ngraph`` namespace, so 
we will omit the namespace. Namespaces other than ``std`` are assumed to be 
namespaces within ``ngraph``; for example, ``op::Add`` refers to 
``ngraph::op::Add``. A computation's graph is constructed from ops; each is an 
instance of a subclass of ``op::Op``, which, in turn, is a subclass of ``Node``. 
Not all graphs are computations, but all graphs are composed entirely of 
instances of ``Node``. Computation graphs contain only ``op::Op`` nodes.

We mostly use :term:`shared pointers<shared pointer>` for nodes, i.e.
``std::shared_ptr<Node>``, so that they will be automatically deallocated when 
they are no longer needed. More detail on shared pointers is given in the 
glossary.

Every node has zero or more *inputs*, zero or more *outputs*, and zero or more 
*attributes*.

The specifics of what each core ``Op`` permits can be found in our 
:doc:`../../ops/index` docs. For our purpose of 
:ref:`defining a computation <define_cmp>`, nodes should be thought of as 
essentially immutable; that is, when constructing a node, we need to supply all 
of its inputs. We get this process started with ops that have no inputs, since 
every other op needs inputs that have already been constructed.

``op::Parameter`` specifies the tensors that will be passed to the computation. 
They receive their values from outside of the graph, so they have no inputs. 
They have attributes for the element type and the shape of the tensor that will 
be passed to them.

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 25-29

The above code makes three parameter nodes, each of which will hold a tensor of 
32-bit floats with shape ``(2, 3)`` and a row-major element layout.
To create a graph for ``(a + b) * c``, first make an ``op::Add`` node with inputs 
from ``a`` and ``b``, and an ``op::Multiply`` node from the add node and ``c``:

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 31-32

When the ``op::Add`` op is constructed, it will check that the element types and 
shapes of its inputs match; to support multiple frameworks, nGraph does not do 
automatic type conversion or broadcasting. In this case, they match, and the 
unique output of ``t0`` will be a tensor of 32-bit floats with shape ``(2, 3)``. 
Similarly, ``op::Multiply`` checks that its inputs match and sets the element 
type and shape of its unique output.

Once the graph is built, we need to package it in a ``Function``:

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 35-36

The first argument to the constructor specifies the nodes that the function will 
return; in this case, the product. A ``NodeVector`` is a vector of shared 
pointers to ``Node``. The second argument specifies the parameters of the 
function, in the order they are to be passed to the compiled function. A 
``ParameterVector`` is a vector of shared pointers to ``op::Parameter``. 

.. important:: The parameter vector must include **every** parameter used in 
   the computation of the results.

.. _specify_backend:

Specify the backend upon which to run the computation
=====================================================

For a framework bridge, a *backend* is the environment that can perform the 
computations; it can be done with a CPU, GPU, or an Intel Nervana NNP. A 
*transformer* can compile computations for a backend, allocate and deallocate 
tensors, and invoke computations.

Factory-like managers for classes of backends can compile a ``Function`` and 
allocate backends. A backend is somewhat analogous to a multi-threaded 
process.

There are two backends for the CPU: the optimized ``"CPU"`` backend, which uses 
the `Intel MKL-DNN`_, and the ``"INTERPRETER"`` backend, which runs reference 
versions of kernels that favor implementation clarity over speed. The 
``"INTERPRETER"`` backend can be slow, and is primarily intended for testing. 
See the documentation on :doc:`runtime options for various backends <../../backend-support/index>` 
for additional details.

To continue with our original example and select the ``"CPU"`` backend:

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 38-39

.. _compile_cmp:

Compile the computation 
=======================

Compilation produces an object that acts as a factory for ``CallFrame`` 
instances. A ``CallFrame`` is a *function* and its associated *state* that can 
run in a single thread at a time. A ``CallFrame`` may be reused, but any 
particular ``CallFrame`` must only be running in one thread at any time. If 
more than one thread needs to execute the function at the same time, create 
multiple ``CallFrame`` objects from the ``ExternalFunction``.


.. _allocate_backend_storage:

Allocate backend storage for the inputs and outputs
===================================================

At the graph level, functions are stateless. They do have internal state related 
to execution, but there is no user-visible state. Variables must be passed as 
arguments. If the function updates variables, it must return the updated 
variables.

To invoke a function, tensors must be provided for every input and every output. 
At this time, a tensor used as an input cannot also be used as an output. If 
variables are being updated, you should use a double-buffering approach where 
you switch between odd/even generations of variables on each update.

Backends are responsible for managing storage. If the storage is off-CPU, caches 
are used to minimize copying between device and CPU. We can allocate storage for 
the three parameters and the return value.

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 41-46

Each tensor is a shared pointer to a :term:`Tensorview`, which is the interface 
backends implement for tensor use. When there are no more references to the 
tensor view, it will be freed when convenient for the backend. See the 
:doc:`../../backend-support/cpp-api` documentation for details on how to work 
with ``Tensor``.


.. _initialize_inputs:

Initialize the inputs
=====================

Next we need to copy some data into the tensors.

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 48-55

The ``runtime::Tensor`` interface has ``write`` and ``read`` methods for 
copying data to/from the tensor.

.. _invoke_cmp:

Invoke the computation
======================

To invoke the function, we simply pass the argument and result tensors to the 
call frame:

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 57-58


.. _access_outputs:

Access the outputs
==================

We can use the ``read`` method to access the result:

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :lines: 60-77

.. _all_together:

Put it all together
===================

.. literalinclude:: ../../../../examples/abc/abc.cpp
   :language: cpp
   :linenos:
   :caption: "The (a + b) * c example for executing a computation on nGraph"


.. _Intel MKL-DNN: https://01.org/mkl-dnn
.. _Intel Nervana NNP: https://ai.intel.com/intel-nervana-neural-network-processors-nnp-redefine-ai-silicon/