Commit 5f5288a4 authored by Scott Cyphers

Finish example.

parent 7119f47f
@@ -86,62 +86,144 @@ a 32-bit float with shape ``(2, 3)``. Similarly, ``op::Multiply``
checks that its inputs match and sets the element type and shape of
its unique output.
Once the graph is built, we need to package it in a ``Function``:

.. code-block:: cpp

   auto f = make_shared<Function>(NodeVector{t1}, ParameterVector{a, b, c});

The first argument to the constructor specifies the nodes that the
function will return; in this case, the product. A ``NodeVector`` is a
vector of shared pointers to ``op::Node``. The second argument
specifies the parameters of the function, in the order they are to be
passed to the compiled function. A ``ParameterVector`` is a vector of
shared pointers to ``op::Parameter``. *The parameter vector must
include* **every** *parameter used in the computation of the results.*
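
The rule that the parameter vector must cover every parameter can be checked
by walking the graph back from the results. The following is a minimal sketch
of that check, not nGraph code: ``ToyNode``, ``collect_parameters`` and
``parameters_cover_results`` are hypothetical names introduced only for
illustration.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for a graph node; this is not an nGraph class.
struct ToyNode
{
    bool is_parameter = false;
    std::vector<std::shared_ptr<ToyNode>> inputs;
};

using ToyNodeVector = std::vector<std::shared_ptr<ToyNode>>;

// Collect every parameter reachable from `node` by following its inputs.
void collect_parameters(const std::shared_ptr<ToyNode>& node,
                        std::set<const ToyNode*>& visited,
                        std::set<const ToyNode*>& found)
{
    assert(node != nullptr);
    if (!visited.insert(node.get()).second)
    {
        return; // already visited through another path
    }
    if (node->is_parameter)
    {
        found.insert(node.get());
    }
    for (const auto& input : node->inputs)
    {
        collect_parameters(input, visited, found);
    }
}

// Returns true when every parameter the results depend on is listed.
bool parameters_cover_results(const ToyNodeVector& results,
                              const ToyNodeVector& parameters)
{
    std::set<const ToyNode*> visited;
    std::set<const ToyNode*> found;
    for (const auto& result : results)
    {
        collect_parameters(result, visited, found);
    }
    std::set<const ToyNode*> listed;
    for (const auto& parameter : parameters)
    {
        listed.insert(parameter.get());
    }
    for (const ToyNode* parameter : found)
    {
        if (listed.count(parameter) == 0)
        {
            return false; // a parameter is used but was not listed
        }
    }
    return true;
}
```

With three toy parameters feeding a single result, listing all three passes
the check, while omitting any one of them fails it.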
.. _specify_bkd:
Specify the backend upon which to run the computation
=====================================================
.. TODO
Describe how to specify nGraph++ backends.
A *backend* is an environment that can execute computations, such as
the CPU or an NNP. A *transformer* can compile computations for a
backend, allocate and deallocate tensors, and invoke computations.
The current selection process is showing signs of age and will be
changed. The general idea is that there are factory-like managers for
classes of backend. Managers can compile a ``Function`` and allocate
backends. A backend is somewhat analogous to a multi-threaded
process.
There are two backends for the CPU: the optimized "CPU" backend, which
makes use of MKL-DNN, and the "INTERPRETER" backend, which runs
reference versions of kernels where implementation clarity is favored
over speed. The "INTERPRETER" backend is mainly used for testing.
To select the "CPU" backend:

.. code-block:: cpp

   auto manager = runtime::Manager::get("CPU");
   auto backend = manager->allocate_backend();

.. _compile_cmp:
Compile the computation
=======================
.. TODO
Compilation produces an ``ExternalFunction`` (a misleading name), which
is a factory for producing a ``CallFrame``: a function and its associated
state. A ``CallFrame`` may be reused, but any particular ``CallFrame``
must only be running in one thread at any time. If more than one thread
needs to execute the function at the same time, create multiple
``CallFrame`` objects from the ``ExternalFunction``.
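
The one-frame-per-thread rule can be illustrated without nGraph. In the
sketch below, ``ToyExternalFunction``, ``ToyCallFrame`` and
``run_concurrently`` are hypothetical stand-ins, and the arithmetic inside
``call`` is arbitrary; only the ownership pattern is the point. The shared
factory is read-only, and each thread creates its own frame.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for a call frame; this is not the nGraph class.
// It owns scratch state, so only one thread may be inside call() at a time.
class ToyCallFrame
{
public:
    float call(float x)
    {
        // Detect misuse: a frame must never be entered by two threads at once.
        bool was_busy = m_busy.exchange(true);
        assert(!was_busy);
        (void)was_busy;       // avoid an unused-variable warning under NDEBUG
        m_scratch = x * 2.0f; // per-frame working storage
        float result = m_scratch + 1.0f;
        m_busy.store(false);
        return result;
    }

private:
    std::atomic<bool> m_busy{false};
    float m_scratch = 0.0f;
};

// Hypothetical stand-in for the compiled function: an immutable factory.
class ToyExternalFunction
{
public:
    std::shared_ptr<ToyCallFrame> make_call_frame() const
    {
        return std::make_shared<ToyCallFrame>();
    }
};

// Run the function once per input, giving each thread its own frame.
std::vector<float> run_concurrently(const ToyExternalFunction& external,
                                    const std::vector<float>& inputs)
{
    std::vector<float> outputs(inputs.size());
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < inputs.size(); ++i)
    {
        workers.emplace_back([&external, &inputs, &outputs, i] {
            auto frame = external.make_call_frame(); // never shared
            outputs[i] = frame->call(inputs[i]);     // each thread writes its own slot
        });
    }
    for (auto& worker : workers)
    {
        worker.join();
    }
    return outputs;
}
```

Reusing a single frame sequentially would also be fine; the extra frames are
only needed when calls overlap in time.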
Describe how to compile a computation with nGraph++ ops/libs/etc. How to avoid
unnecessary compiler actions that might otherwise happen by default.

.. code-block:: cpp

   auto external = manager->compile(f);
   auto cf = backend->make_call_frame(external);

.. _allocate_bkd_storage:
Allocate backend storage for the inputs and outputs
===================================================
.. TODO
At the graph level, functions are stateless. They do
have internal state related to execution, but there is no user-visible
state. Variables must be passed as arguments. If the function updates
variables, it must return the updated variables.
To invoke a function, tensors must be provided for every input and
every output. At this time, a tensor used as an input cannot also be
used as an output. If variables are being updated, you should use a
double-buffering approach where you switch between odd/even
generations of variables on each update.
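
The double-buffering approach can be sketched with plain vectors standing in
for tensors. ``Buffer``, ``update_step`` and ``run_updates`` are hypothetical
names, and the "add one" update is an arbitrary placeholder for a compiled
function; the sketch only shows how the two generations swap roles so that no
buffer is used as both input and output of the same call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;

// Hypothetical update step: reads `input` and writes `output`.
// As with the tensors described above, the two must be distinct buffers.
void update_step(const Buffer& input, Buffer& output)
{
    assert(&input != &output);
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        output[i] = input[i] + 1.0f;
    }
}

// Apply `steps` updates, alternating between two generations of the variable.
Buffer run_updates(const Buffer& initial, int steps)
{
    Buffer generations[2] = {initial, Buffer(initial.size())};
    std::size_t current = 0; // index of the generation holding live values
    for (int step = 0; step < steps; ++step)
    {
        std::size_t next = 1 - current;
        update_step(generations[current], generations[next]);
        current = next; // the freshly written buffer becomes the input
    }
    return generations[current];
}
```

After each update the roles swap, so the storage allocated up front is reused
for every generation.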
Explain how transformer(s) do(es) neat things.
Backends are responsible for managing storage. If the storage is
off-CPU, caches are used to minimize copying between device and
CPU. We can allocate storage for the three parameters and return value
as follows:

.. code-block:: cpp

   auto t_a = backend->make_primary_tensor_view(element::f32, shape);
   auto t_b = backend->make_primary_tensor_view(element::f32, shape);
   auto t_c = backend->make_primary_tensor_view(element::f32, shape);
   auto t_result = backend->make_primary_tensor_view(element::f32, shape);

Each tensor is a shared pointer to a ``runtime::TensorView``, the
interface backends implement for tensor use. When there are no more
references to the tensor view, it will be freed when convenient for
the backend.
.. _initialize_inputs:
Initialize the inputs
=====================
.. TODO
Normally the framework bridge reads/writes bytes to the tensor,
assuming a row-major element layout. To simplify writing unit tests,
we have developed a class for making tensor literals. We can use these
to initialize our tensors:

.. code-block:: cpp

   copy_data(t_a, test::NDArray<float, 2>({{1, 2, 3}, {4, 5, 6}}).get_vector());
   copy_data(t_b, test::NDArray<float, 2>({{7, 8, 9}, {10, 11, 12}}).get_vector());
   copy_data(t_c, test::NDArray<float, 2>({{1, 0, -1}, {-1, 1, 2}}).get_vector());

Action that initializes inputs?
The ``test::NDArray`` needs to know the element type (``float``) and
rank (``2``) of the tensors, and figures out the shape during template
expansion.
The ``runtime::TensorView`` interface has ``write`` and ``read``
methods for copying data to/from the tensor.
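
To make the row-major assumption concrete, the sketch below flattens a
two-dimensional literal into the element order a row-major tensor expects. It
is not how ``test::NDArray`` is implemented; ``flatten_row_major`` and
``row_major_index`` are hypothetical helpers. As with the literal class, the
shape is deduced at compile time from the template arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flatten a Rows x Cols literal into row-major order:
// element (i, j) is stored at offset i * Cols + j.
template <std::size_t Rows, std::size_t Cols>
std::vector<float> flatten_row_major(const float (&values)[Rows][Cols])
{
    std::vector<float> flat;
    flat.reserve(Rows * Cols);
    for (std::size_t i = 0; i < Rows; ++i)
    {
        for (std::size_t j = 0; j < Cols; ++j)
        {
            flat.push_back(values[i][j]);
        }
    }
    return flat;
}

// Offset of element (i, j) in a row-major matrix with `cols` columns.
std::size_t row_major_index(std::size_t i, std::size_t j, std::size_t cols)
{
    assert(j < cols);
    return i * cols + j;
}
```

For the ``(2, 3)`` literal ``{{1, 2, 3}, {4, 5, 6}}``, the flattened order is
1, 2, 3, 4, 5, 6, so element ``(1, 0)`` sits at offset 3.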
.. _invoke_cmp:
Invoke the computation
======================
.. TODO
To invoke the function, we simply pass argument and result tensors to
the call frame:

.. code-block:: cpp

   cf->call({t_a, t_b, t_c}, {t_result});

.. _access_outputs:
Access the outputs
==================
.. TODO
We can use the ``read`` method to access the result:
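
.. code-block:: cpp

   // Host buffer with the same (2, 3) shape as the result tensor.
   float r[2][3];
   // Copy sizeof(r) bytes, starting at tensor offset 0, into r.
   t_result->read(&r, 0, sizeof(r));
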
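
A host-side reference is a convenient way to check the values read back. The
sketch below assumes the graph computes the elementwise expression
``(a + b) * c``; that definition lies outside this section, so treat it as an
assumption. ``reference_add_multiply`` is a hypothetical helper that works on
flat row-major data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the assumed computation: elementwise (a + b) * c.
// All three inputs are flat row-major buffers of equal length.
std::vector<float> reference_add_multiply(const std::vector<float>& a,
                                          const std::vector<float>& b,
                                          const std::vector<float>& c)
{
    assert(a.size() == b.size() && b.size() == c.size());
    std::vector<float> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        result[i] = (a[i] + b[i]) * c[i];
    }
    return result;
}
```

Under that assumption, the inputs used above give
``{{8, 0, -12}, {-14, 16, 36}}``, which can be compared element by element
with the contents of ``r``.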