Commit 76e36f2a authored by L.S. Cook, committed by Scott Cyphers

Final PR review edits plus repair abc.cpp example docs that broke when code was added (#987)

* Final PR review edits plus repair abc.cpp example docs that broke when code was added

* Word
parent f4a36aaf
@@ -22,23 +22,23 @@ framework's hardware abstraction layer:
* The framework expects complete control of the GPU, and that the device doesn't
  need to be shared.
* The framework expects that developers will write things in a `SIMT-friendly`_
  manner.

Some of these design decisions have implications that do not translate well to
the newer, more demanding generation of **adaptable software**. For example,
most frameworks that expect full control of the GPU devices experience their
own per-device inefficiency for resource utilization whenever the system is
oversubscribed.

Most framework owners will tell you to refactor the model in order to remove
operations that are not implemented on the GPU, rather than attempt to run
multiple models in parallel, or attempt to figure out how to build graphs
more efficiently. In other words, if a model requires any operation that
hasn't been implemented on GPU, it must wait for copies to propagate from
the CPU to the GPU(s). An effect of this inefficiency is that it slows down
the system. For data scientists who are facing a large curve of uncertainty in
how large (or how small) the compute-power needs of their model will be,
investing heavily in frameworks reliant upon GPUs may not be the best decision.

Meanwhile, the shift toward greater diversity in deep learning **hardware devices**
requires that these assumptions be revisited. Incorporating direct support for
@@ -166,7 +166,8 @@ and results in a tensor with the same element type and shape:
Here, :math:`X_I` means the value of a coordinate :math:`I` for the tensor
:math:`X`. So the value of the sum of two tensors is a tensor whose value at a
coordinate is the sum of the two inputs' elements at that coordinate. Unlike
many frameworks, it does not require the user or the framework bridge to
specify anything about storage or arrays.
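
In this notation, the sum is defined pointwise; at every coordinate :math:`I`,

.. math::

   (X + Y)_I = X_I + Y_I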
An ``Add`` op is used to represent an elementwise tensor sum. To
construct an ``Add`` op, each of the two inputs of the ``Add`` must be
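
A minimal sketch of that construction, assuming the v0 C++ API
(``op::Parameter``, ``op::Add``, and ``using namespace ngraph``; names and
namespaces may differ across releases):

.. code-block:: cpp

   // Two parameter nodes with matching element type and shape ...
   auto a = std::make_shared<op::Parameter>(element::f32, Shape{2, 3});
   auto b = std::make_shared<op::Parameter>(element::f32, Shape{2, 3});

   // ... serve as the two inputs of the elementwise Add node.
   auto sum = std::make_shared<op::Add>(a, b);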
@@ -266,4 +267,4 @@ After the graph is constructed, we create the function, passing the
that are arguments.
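
As a hedged sketch, with ``t1`` standing in for the result node and ``a``,
``b``, ``c`` for the parameter nodes (the ``op::ParameterVector`` helper is a
v0 assumption):

.. code-block:: cpp

   // Bundle the result node with the parameter nodes that are its arguments.
   auto f = std::make_shared<Function>(t1, op::ParameterVector{a, b, c});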
.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
\ No newline at end of file
@@ -136,7 +136,7 @@ To select the ``"CPU"`` backend,
.. literalinclude:: ../../../examples/abc.cpp
   :language: cpp
   :lines: 38-39
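
Backend selection amounts to asking the runtime for a backend by name; a
sketch against the v0 factory (the exact call may differ in other releases):

.. code-block:: cpp

   // Ask the runtime for the CPU backend by name.
   auto backend = runtime::Backend::create("CPU");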
.. _compile_cmp:
@@ -177,7 +177,7 @@ the three parameters and the return value as follows:
.. literalinclude:: ../../../examples/abc.cpp
   :language: cpp
   :lines: 41-46
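
Roughly, one tensor is allocated per parameter plus one for the result; a
sketch assuming the v0 ``create_tensor`` call and a ``Shape{2, 3}`` shape:

.. code-block:: cpp

   Shape shape{2, 3};
   auto t_a      = backend->create_tensor(element::f32, shape);
   auto t_b      = backend->create_tensor(element::f32, shape);
   auto t_c      = backend->create_tensor(element::f32, shape);
   auto t_result = backend->create_tensor(element::f32, shape);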
Each tensor is a shared pointer to a ``runtime::TensorView``, the interface
backends implement for tensor use. When there are no more references to the
@@ -192,7 +192,7 @@ Next we need to copy some data into the tensors.
.. literalinclude:: ../../../examples/abc.cpp
   :language: cpp
   :lines: 48-55
The ``runtime::TensorView`` interface has ``write`` and ``read`` methods for
copying data to/from the tensor.
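
A sketch of one such copy, assuming the v0 signature
``write(const void* p, size_t tensor_offset, size_t n)``:

.. code-block:: cpp

   // Copy six floats from host memory into tensor t_a.
   std::vector<float> v_a{1, 2, 3, 4, 5, 6};
   t_a->write(&v_a[0], 0, sizeof(float) * v_a.size());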
@@ -207,7 +207,7 @@ call frame:
.. literalinclude:: ../../../examples/abc.cpp
   :language: cpp
   :lines: 57-58
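
Invoking the compiled function then resembles the following (assuming the v0
``Backend::call``, which takes output tensors before input tensors):

.. code-block:: cpp

   // Run f; outputs first, then inputs, in parameter order.
   backend->call(f, {t_result}, {t_a, t_b, t_c});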
.. _access_outputs:
@@ -219,7 +219,7 @@ We can use the ``read`` method to access the result:
.. literalinclude:: ../../../examples/abc.cpp
   :language: cpp
   :lines: 60-77
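
Reading back is symmetric with ``write``, under the same v0 assumptions:

.. code-block:: cpp

   // Copy the six result elements back to host memory.
   std::vector<float> result(2 * 3);
   t_result->read(&result[0], 0, sizeof(float) * result.size());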
.. _all_together:
...
@@ -143,7 +143,7 @@ The process documented here will work on CentOS 7.4.
   $ ./bootstrap
   $ make && sudo make install
#. Clone the `NervanaSystems` ``ngraph`` repo via HTTPS and use CMake 3.4.3 to
   install the nGraph libraries to ``$HOME/ngraph_dist``. Another option, if your
   deployment system has Intel® Advanced Vector Extensions (Intel® AVX), is to
   target the available accelerations directly by compiling the build as follows
...