Commit 98740cd7 authored by L.S. Cook, committed by Scott Cyphers

Docs/editing (#1026)

* editing how to execute computation file for clarity and linenos

* Add placeholder for runtime docs

* Update section on backends, interpreter, and FPGA options

* add updated master to fix python_ci

* Weird autosummary issue reverted

* Clarify new section

* remove renamed file

* sentence structure
parent 0d125c51
...@@ -8,14 +8,14 @@ This section explains how to manually perform the steps that would normally be ...@@ -8,14 +8,14 @@ This section explains how to manually perform the steps that would normally be
performed by a framework :term:`bridge` to execute a computation. The nGraph performed by a framework :term:`bridge` to execute a computation. The nGraph
library is targeted toward automatic construction; it is far easier for a library is targeted toward automatic construction; it is far easier for a
processing unit (GPU, CPU, or an `Intel Nervana NNP`_) to run a computation than processing unit (GPU, CPU, or an `Intel Nervana NNP`_) to run a computation than
it is for a user to map out how that computation happens. Unfortunately, things it is for a human to map out how that computation happens. Unfortunately, things
that make by-hand graph construction simpler tend to make automatic construction that make by-hand graph construction simpler tend to make automatic construction
more difficult, and vice versa. more difficult, and vice versa.
Here we will do all the bridge steps manually. The :term:`model description` Here we will do all the bridge steps manually. The :term:`model description`
we're explaining is based on the :file:`abc.cpp` file in the ``/doc/examples/`` walk-through below is based on the :file:`abc.cpp` code in the ``/doc/examples/``
directory. We'll be deconstructing the steps that an entity (framework or directory. We'll be deconstructing the steps that must happen (either programmatically
user) must be able to carry out in order to successfully execute a computation: or manually) in order to successfully execute a computation:
* :ref:`define_cmp` * :ref:`define_cmp`
* :ref:`specify_bkd` * :ref:`specify_bkd`
...@@ -25,7 +25,7 @@ user) must be able to carry out in order to successfully execute a computation: ...@@ -25,7 +25,7 @@ user) must be able to carry out in order to successfully execute a computation:
* :ref:`invoke_cmp` * :ref:`invoke_cmp`
* :ref:`access_outputs` * :ref:`access_outputs`
The final code is at the :ref:`end of this page <all_together>`. The full code is at the :ref:`end of this page <all_together>`.
.. _define_cmp: .. _define_cmp:
...@@ -34,42 +34,37 @@ Define the computation ...@@ -34,42 +34,37 @@ Define the computation
====================== ======================
To a :term:`framework`, a computation is simply a transformation of inputs to To a :term:`framework`, a computation is simply a transformation of inputs to
outputs. While a *framework bridge* can programmatically construct the graph outputs. While a :term:`bridge` can programmatically construct the graph
from a framework's representation of the computation, graph construction can be from a framework's representation of the computation, graph construction can be
somewhat more tedious for users. To a user, who is usually interested in somewhat more tedious when done manually. For anyone interested in specific
specific nodes (vertices) or edges of a computation that reveal "what is nodes (vertices) or edges of a computation that reveal "what is happening where",
happening where", it can be helpful to think of a computation as a zoomed-out it can be helpful to think of a computation as a zoomed-out and *stateless*
and *stateless* dataflow graph where all of the nodes are well-defined tensor :term:`data-flow graph` where all of the nodes are well-defined tensor
operations and all of the edges denote use of an output from one operation as operations and all of the edges denote use of an output from one operation as an
an input for another operation. input for another operation.
.. TODO
.. image for representing nodes and edges of (a+b)*c
Most of the public portion of the nGraph API is in the ``ngraph`` namespace, so Most of the public portion of the nGraph API is in the ``ngraph`` namespace, so
we will omit the namespace. Use of namespaces other than ``std`` will be we will omit the namespace. Use of namespaces other than ``std`` will be
namespaces in ``ngraph``. For example, the ``op::Add`` is assumed to refer to namespaces in ``ngraph``. For example, the ``op::Add`` is assumed to refer to
``ngraph::op::Add``. ``ngraph::op::Add``. A computation's graph is constructed from ops; each is a
member of a subclass of ``op::Op``, which, in turn, is a subclass of ``Node``.
A computation's graph is constructed from ops; each is a member of a subclass of Not all graphs are computation, but all graphs are composed entirely of
``op::Op``, which, in turn, is a subclass of ``Node``. Not all graphs are instances of ``Node``. Computation graphs contain only ``op::Op`` nodes.
computation, but all graphs are composed entirely of instances of ``Node``.
Computation graphs contain only ``op::Op`` nodes.
We mostly use :term:`shared pointers<shared pointer>` for nodes, i.e. We mostly use :term:`shared pointers<shared pointer>` for nodes, i.e.
``std::shared_ptr<Node>`` so that they will be automatically ``std::shared_ptr<Node>``, so that they will be automatically deallocated when
deallocated when they are no longer needed. A brief summary of shared they are no longer needed. More detail on shared pointers is given in the
pointers is given in the glossary. glossary.
Every node has zero or more *inputs*, zero or more *outputs*, and zero or more Every node has zero or more *inputs*, zero or more *outputs*, and zero or more
*attributes*. The specifics for each ``type`` permitted on a core ``Op``-specific *attributes*.
basis can be discovered in our :doc:`../ops/index` docs. For our
purpose to :ref:`define a computation <define_cmp>`, nodes should be thought of The specifics for each ``type`` permitted on a core ``Op``-specific basis can be
as essentially immutable; that is, when constructing a node, we need to supply discovered in our :doc:`../ops/index` docs. For our purpose to
all of its inputs. We get this process started with ops that have no inputs, :ref:`define a computation <define_cmp>`, nodes should be thought of as essentially
since any op with inputs is going to first need some inputs. immutable; that is, when constructing a node, we need to supply all of its
inputs. We get this process started with ops that have no inputs, since any op
with inputs is going to first need some inputs.
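The node model described in this hunk (immutable nodes held through shared pointers, with input-free nodes as the starting point) can be illustrated with a small, self-contained C++ sketch. This is not the nGraph API: everything in the ``toy`` namespace (``Node``, ``Parameter``, ``Add``, ``Multiply``, ``elementwise``, ``run_abc``) is a hypothetical stand-in written for this illustration, and tensors are flattened to plain ``std::vector<float>`` instead of carrying a shape and element type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy model only; none of these types are part of nGraph.
namespace toy
{
    // A tensor flattened to a row-major vector of floats.
    using Tensor = std::vector<float>;

    class Node;
    using NodePtr = std::shared_ptr<const Node>;

    // A node receives all of its inputs at construction and never changes them.
    class Node
    {
    public:
        explicit Node(std::vector<NodePtr> inputs)
            : m_inputs(std::move(inputs))
        {
        }
        virtual ~Node() = default;

        // Compute this node's output from the externally supplied parameter values.
        virtual Tensor evaluate(const std::vector<Tensor>& args) const = 0;

    protected:
        const std::vector<NodePtr> m_inputs;
    };

    // A parameter has no inputs; its value comes from outside the graph.
    class Parameter : public Node
    {
    public:
        explicit Parameter(std::size_t index)
            : Node(std::vector<NodePtr>{})
            , m_index(index)
        {
        }
        Tensor evaluate(const std::vector<Tensor>& args) const override
        {
            return args.at(m_index);
        }

    private:
        const std::size_t m_index;
    };

    // Apply a binary function element by element to two equally sized tensors.
    template <typename F>
    Tensor elementwise(const Tensor& x, const Tensor& y, F fn)
    {
        assert(x.size() == y.size());
        Tensor out(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
        {
            out[i] = fn(x[i], y[i]);
        }
        return out;
    }

    class Add : public Node
    {
    public:
        Add(NodePtr lhs, NodePtr rhs)
            : Node(std::vector<NodePtr>{std::move(lhs), std::move(rhs)})
        {
        }
        Tensor evaluate(const std::vector<Tensor>& args) const override
        {
            return elementwise(m_inputs[0]->evaluate(args), m_inputs[1]->evaluate(args),
                               [](float x, float y) { return x + y; });
        }
    };

    class Multiply : public Node
    {
    public:
        Multiply(NodePtr lhs, NodePtr rhs)
            : Node(std::vector<NodePtr>{std::move(lhs), std::move(rhs)})
        {
        }
        Tensor evaluate(const std::vector<Tensor>& args) const override
        {
            return elementwise(m_inputs[0]->evaluate(args), m_inputs[1]->evaluate(args),
                               [](float x, float y) { return x * y; });
        }
    };

    // Build the graph for (a + b) * c, then evaluate it on the given values.
    inline Tensor run_abc(const Tensor& a_val, const Tensor& b_val, const Tensor& c_val)
    {
        auto a = std::make_shared<Parameter>(0u);
        auto b = std::make_shared<Parameter>(1u);
        auto c = std::make_shared<Parameter>(2u);
        auto sum = std::make_shared<Add>(a, b);
        auto product = std::make_shared<Multiply>(sum, c);
        return product->evaluate({a_val, b_val, c_val});
    }
}
```

Calling ``toy::run_abc`` with three six-element vectors (a flattened ``(2, 3)`` shape) returns the element-wise result of ``(a + b) * c``. The graph itself holds no values, which is what makes it stateless: the same graph can be evaluated again with different arguments.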
``op::Parameter`` specifies the tensors that will be passed to the computation. ``op::Parameter`` specifies the tensors that will be passed to the computation.
They receive their values from outside of the graph, so they have no inputs. They receive their values from outside of the graph, so they have no inputs.
...@@ -78,12 +73,12 @@ be passed to them. ...@@ -78,12 +73,12 @@ be passed to them.
.. literalinclude:: ../../../examples/abc.cpp .. literalinclude:: ../../../examples/abc.cpp
:language: cpp :language: cpp
:lines: 26-29 :lines: 25-29
Here we have made three parameter nodes, each a 32-bit float of shape ``(2, 3)`` The above code makes three parameter nodes where each is a 32-bit float of
using a row-major element layout. shape ``(2, 3)`` and a row-major element layout.
We can create a graph for ``(a+b)*c`` by creating an ``op::Add`` node with inputs To create a graph for ``(a + b) * c``, first make an ``op::Add`` node with inputs
from ``a`` and ``b``, and an ``op::Multiply`` node from the add node and ``c``: from ``a`` and ``b``, and an ``op::Multiply`` node from the add node and ``c``:
.. literalinclude:: ../../../examples/abc.cpp .. literalinclude:: ../../../examples/abc.cpp
...@@ -130,9 +125,11 @@ process. ...@@ -130,9 +125,11 @@ process.
There are two backends for the CPU: the optimized ``"CPU"`` backend, which uses There are two backends for the CPU: the optimized ``"CPU"`` backend, which uses
the `Intel MKL-DNN`_, and the ``"INTERPRETER"`` backend, which runs reference the `Intel MKL-DNN`_, and the ``"INTERPRETER"`` backend, which runs reference
versions of kernels that favor implementation clarity over speed. The versions of kernels that favor implementation clarity over speed. The
``"INTERPRETER"`` backend can be slow, and is primarily intended for testing. ``"INTERPRETER"`` backend can be slow, and is primarily intended for testing.
See the documentation on :doc:`runtime options for various backends <../programmable/index>`
for additional details.
To select the ``"CPU"`` backend, To continue with our original example and select the ``"CPU"`` backend:
.. literalinclude:: ../../../examples/abc.cpp .. literalinclude:: ../../../examples/abc.cpp
:language: cpp :language: cpp
...@@ -151,10 +148,6 @@ in a single thread at a time. A ``CallFrame`` may be reused, but any particular ...@@ -151,10 +148,6 @@ in a single thread at a time. A ``CallFrame`` may be reused, but any particular
thread needs to execute the function at the same time, create multiple thread needs to execute the function at the same time, create multiple
``CallFrame`` objects from the ``ExternalFunction``. ``CallFrame`` objects from the ``ExternalFunction``.
.. literalinclude:: ../../../examples/abc.cpp
:language: cpp
:lines: 43-44
.. _allocate_bkd_storage: .. _allocate_bkd_storage:
...@@ -179,9 +172,12 @@ the three parameters and the return value as follows: ...@@ -179,9 +172,12 @@ the three parameters and the return value as follows:
:language: cpp :language: cpp
:lines: 41-46 :lines: 41-46
Each tensor is a shared pointer to a ``runtime::TensorView``, the interface
backends implement for tensor use. When there are no more references to the Each tensor is a shared pointer to a :doc:`../programmable/index/tensorview`,
tensor view, it will be freed when convenient for the backend. the interface backends implement for tensor use. When there are no more references to the
tensor view, it will be freed when convenient for the backend. See the
:doc:`../programmable/index` documentation for details on ``TensorView``.
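The reference-counting behavior mentioned in this hunk can be sketched with standard C++ alone. ``TensorStorage`` and ``storage_released_after_last_reference`` below are hypothetical names invented for this illustration; they are not nGraph types. Note one difference from the text: ``std::shared_ptr`` destroys the object as soon as the last owner goes away, whereas the documentation says a backend frees a tensor view "when convenient", so a real backend may defer releasing the underlying memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy model only; not an nGraph type.
namespace toy
{
    // Hypothetical stand-in for storage that a backend would own.
    struct TensorStorage
    {
        explicit TensorStorage(std::size_t n)
            : data(n, 0.0f)
        {
        }
        std::vector<float> data;
    };

    // Returns true if the storage was destroyed once its last owner went away.
    inline bool storage_released_after_last_reference()
    {
        std::weak_ptr<TensorStorage> observer;
        {
            auto t_a = std::make_shared<TensorStorage>(6);
            auto alias = t_a; // second owner of the same storage
            observer = t_a;   // observes the storage without owning it
            assert(t_a.use_count() == 2);
        } // both owners leave scope here, so the storage is destroyed
        return observer.expired();
    }
}
```

The ``std::weak_ptr`` is used only to observe the object's lifetime without extending it; code that just wants automatic cleanup needs nothing beyond the ``std::shared_ptr`` itself.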
.. _initialize_inputs: .. _initialize_inputs:
...@@ -228,6 +224,7 @@ Put it all together ...@@ -228,6 +224,7 @@ Put it all together
.. literalinclude:: ../../../examples/abc.cpp .. literalinclude:: ../../../examples/abc.cpp
:language: cpp :language: cpp
:linenos:
:caption: "The (a + b) * c example for executing a computation on nGraph" :caption: "The (a + b) * c example for executing a computation on nGraph"
......
...@@ -145,9 +145,11 @@ Contents ...@@ -145,9 +145,11 @@ Contents
project/index.rst project/index.rst
framework-integration-guides.rst framework-integration-guides.rst
optimize/index.rst optimize/index.rst
programmable/index.rst
python_api/index.rst python_api/index.rst
Indices and tables Indices and tables
================== ==================
......
...@@ -144,17 +144,13 @@ The process documented here will work on CentOS 7.4. ...@@ -144,17 +144,13 @@ The process documented here will work on CentOS 7.4.
$ make && sudo make install $ make && sudo make install
#. Clone the `NervanaSystems` ``ngraph`` repo via HTTPS and use Cmake 3.4.3 to #. Clone the `NervanaSystems` ``ngraph`` repo via HTTPS and use Cmake 3.4.3 to
install the nGraph libraries to ``$HOME/ngraph_dist``. Another option, if your install the nGraph libraries to ``$HOME/ngraph_dist``.
deployment system has Intel® Advanced Vector Extensions (Intel® AVX), is to
target the accelerations available directly by compiling the build as follows
during the cmake step: ``-DNGRAPH_TARGET_ARCH=skylake-avx512``.
.. code-block:: console .. code-block:: console
$ cd /opt/libraries $ cd /opt/libraries
$ git clone https://github.com/NervanaSystems/ngraph.git $ git clone https://github.com/NervanaSystems/ngraph.git
$ cd ngraph && mkdir build && cd build $ cd ngraph && mkdir build && cd build
$ cmake ../ [-DNGRAPH_TARGET_ARCH=skylake-avx512] $ cmake ../
$ make && sudo make install $ make && sudo make install
......
.. index.rst
#######################
Interact with Backends
#######################
Backend
========
Backends are responsible for function execution and value allocation. They
can be used to :doc:`carry out a programmed computation<../howto/execute>`
from a framework by using a CPU or GPU; or they can be used with an *Interpreter*
mode, which is primarily intended for testing, to analyze a program, or for a
framework developer to develop a custom UI or API.
.. figure:: ../graphics/runtime.png
:width: 650px
.. doxygenclass:: ngraph::runtime::Backend
:project: ngraph
:members:
TensorView
===========
.. doxygenclass:: ngraph::runtime::TensorView
:project: ngraph
:members:
...@@ -3,19 +3,6 @@ ngraph.exceptions ...@@ -3,19 +3,6 @@ ngraph.exceptions
.. automodule:: ngraph.exceptions .. automodule:: ngraph.exceptions
.. rubric:: Exceptions .. rubric:: Exceptions
.. autosummary:: .. autosummary::
......