Commit 76e36f2a authored by L.S. Cook, committed by Scott Cyphers

Final PR review edits plus repair abc.cpp example docs that broke whe… (#987)

* Final PR review edits plus repair abc.cpp example docs that broke when code was added

* Word
parent f4a36aaf
@@ -22,23 +22,23 @@ framework's hardware abstraction layer:
 * The framework expects complete control of the GPU, and that the device doesn't
   need to be shared.
 * The framework expects that developers will write things in a `SIMT-friendly`_
-  manner, thus requring only a limited set of data layout conventions.
+  manner.
 
 Some of these design decisions have implications that do not translate well to
-the newer or more demanding generation of **adaptable software**. For example,
+the newer, more demanding generation of **adaptable software**. For example,
 most frameworks that expect full control of the GPU devices experience their
-own per-device inefficiency for resource utilization whenever the system
-encounters a bottleneck.
-
-Most framework owners will tell you to refactor the model in order to remove the
-unimplemented copy, rather than attempt to run multiple models in parallel, or
-attempt to figure out how to build graphs more efficiently. In other words, if
-a model requires any operation that hasn't been implemented on GPU, it must wait
-for copies to propagate from the CPU to the GPU(s). An effect of this
-inefficiency is that it slows down the system. Data scientists who are facing a
-large curve of uncertainty in how large (or how small) the compute-power needs
-of their model will be, investing heavily in frameworks reliant upon GPUs may
-not be the best decision.
+own per-device inefficiency for resource utilization whenever the system is
+oversubscribed.
+
+Most framework owners will tell you to refactor the model in order to remove
+operations that are not implemented on the GPU, rather than attempt to run
+multiple models in parallel, or attempt to figure out how to build graphs
+more efficiently. In other words, if a model requires any operation that
+hasn't been implemented on GPU, it must wait for copies to propagate from
+the CPU to the GPU(s). An effect of this inefficiency is that it slows down
+the system. For data scientists who are facing a large curve of uncertainty in
+how large (or how small) the compute-power needs of their model will be,
+investing heavily in frameworks reliant upon GPUs may not be the best decision.
 
 Meanwhile, the shift toward greater diversity in deep learning **hardware devices**
 requires that these assumptions be revisited. Incorporating direct support for
@@ -166,7 +166,8 @@ and results in a tensor with the same element type and shape:
 Here, :math:`X_I` means the value of a coordinate :math:`I` for the tensor
 :math:`X`. So the value of the sum of two tensors is a tensor whose value at a
 coordinate is the sum of the elements' two inputs. Unlike many frameworks, it
-says nothing about storage or arrays.
+does not require the user or the framework bridge to specify anything about
+storage or arrays.
 
 An ``Add`` op is used to represent an elementwise tensor sum. To
 construct an Add op, each of the two inputs of the ``Add`` must be
@@ -266,4 +267,4 @@ After the graph is constructed, we create the function, passing the
 that are arguments.
 
-.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
\ No newline at end of file
+.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
@@ -136,7 +136,7 @@ To select the ``"CPU"`` backend,
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 39-40
+   :lines: 38-39
 
 .. _compile_cmp:
@@ -177,7 +177,7 @@ the three parameters and the return value as follows:
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 46-51
+   :lines: 41-46
 
 Each tensor is a shared pointer to a ``runtime::TensorView``, the interface
 backends implement for tensor use. When there are no more references to the
@@ -192,7 +192,7 @@ Next we need to copy some data into the tensors.
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 53-60
+   :lines: 48-55
 
 The ``runtime::TensorView`` interface has ``write`` and ``read`` methods for
 copying data to/from the tensor.
@@ -207,7 +207,7 @@ call frame:
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 63
+   :lines: 57-58
 
 .. _access_outputs:
@@ -219,7 +219,7 @@ We can use the ``read`` method to access the result:
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 65-67
+   :lines: 60-77
 
 .. _all_together:
@@ -143,7 +143,7 @@ The process documented here will work on CentOS 7.4.
    $ ./bootstrap
    $ make && sudo make install
 
-#. Clone the `NervanaSystems` ``ngraph`` repo via SSH and use Cmake 3.4.3 to
+#. Clone the `NervanaSystems` ``ngraph`` repo via HTTPS and use Cmake 3.4.3 to
    install the nGraph libraries to ``$HOME/ngraph_dist``. Another option, if your
    deployment system has Intel® Advanced Vector Extensions (Intel® AVX), is to
    target the accelerations available directly by compiling the build as follows