Commit 76e36f2a authored by L.S. Cook, committed by Scott Cyphers

Final PR review edits plus repair abc.cpp example docs that broke whe… (#987)

* Final PR review edits plus repair abc.cpp example docs that broke when code was added

* Word
parent f4a36aaf
@@ -22,23 +22,23 @@ framework's hardware abstraction layer:
 * The framework expects complete control of the GPU, and that the device doesn't
   need to be shared.
 * The framework expects that developers will write things in a `SIMT-friendly`_
-  manner, thus requring only a limited set of data layout conventions.
+  manner.
 
 Some of these design decisions have implications that do not translate well to
-the newer or more demanding generation of **adaptable software**. For example,
+the newer, more demanding generation of **adaptable software**. For example,
 most frameworks that expect full control of the GPU devices experience their
-own per-device inefficiency for resource utilization whenever the system
-encounters a bottleneck.
-
-Most framework owners will tell you to refactor the model in order to remove the
-unimplemented copy, rather than attempt to run multiple models in parallel, or
-attempt to figure out how to build graphs more efficiently. In other words, if
-a model requires any operation that hasn't been implemented on GPU, it must wait
-for copies to propagate from the CPU to the GPU(s). An effect of this
-inefficiency is that it slows down the system. Data scientists who are facing a
-large curve of uncertainty in how large (or how small) the compute-power needs
-of their model will be, investing heavily in frameworks reliant upon GPUs may
-not be the best decision.
+own per-device inefficiency for resource utilization whenever the system is
+oversubscribed.
+
+Most framework owners will tell you to refactor the model in order to remove
+operations that are not implemented on the GPU, rather than attempt to run
+multiple models in parallel, or attempt to figure out how to build graphs
+more efficiently. In other words, if a model requires any operation that
+hasn't been implemented on GPU, it must wait for copies to propagate from
+the CPU to the GPU(s). An effect of this inefficiency is that it slows down
+the system. For data scientists who are facing a large curve of uncertainty in
+how large (or how small) the compute-power needs of their model will be,
+investing heavily in frameworks reliant upon GPUs may not be the best decision.
 
 Meanwhile, the shift toward greater diversity in deep learning **hardware devices**
 requires that these assumptions be revisited. Incorporating direct support for
@@ -166,7 +166,8 @@ and results in a tensor with the same element type and shape:
 Here, :math:`X_I` means the value of a coordinate :math:`I` for the tensor
 :math:`X`. So the value of the sum of two tensors is a tensor whose value at a
 coordinate is the sum of the elements' two inputs. Unlike many frameworks, it
-says nothing about storage or arrays.
+does not require the user or the framework bridge to specify anything about
+storage or arrays.
 
 An ``Add`` op is used to represent an elementwise tensor sum. To
 construct an Add op, each of the two inputs of the ``Add`` must be
@@ -266,4 +267,4 @@ After the graph is constructed, we create the function, passing the
 that are arguments.
 
-.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
\ No newline at end of file
+.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
@@ -136,7 +136,7 @@ To select the ``"CPU"`` backend,
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 39-40
+   :lines: 38-39
 
 .. _compile_cmp:
@@ -177,7 +177,7 @@ the three parameters and the return value as follows:
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 46-51
+   :lines: 41-46
 
 Each tensor is a shared pointer to a ``runtime::TensorView``, the interface
 backends implement for tensor use. When there are no more references to the
@@ -192,7 +192,7 @@ Next we need to copy some data into the tensors.
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 53-60
+   :lines: 48-55
 
 The ``runtime::TensorView`` interface has ``write`` and ``read`` methods for
 copying data to/from the tensor.
@@ -207,7 +207,7 @@ call frame:
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 63
+   :lines: 57-58
 
 .. _access_outputs:
@@ -219,7 +219,7 @@ We can use the ``read`` method to access the result:
 .. literalinclude:: ../../../examples/abc.cpp
    :language: cpp
-   :lines: 65-67
+   :lines: 60-77
 
 .. _all_together:
@@ -143,7 +143,7 @@ The process documented here will work on CentOS 7.4.
    $ ./bootstrap
    $ make && sudo make install
 
-#. Clone the `NervanaSystems` ``ngraph`` repo via SSH and use Cmake 3.4.3 to
+#. Clone the `NervanaSystems` ``ngraph`` repo via HTTPS and use Cmake 3.4.3 to
    install the nGraph libraries to ``$HOME/ngraph_dist``. Another option, if your
    deployment system has Intel® Advanced Vector Extensions (Intel® AVX), is to
    target the accelerations available directly by compiling the build as follows