.. _graph-basics:

#############
Graph Basics
#############

Overview
========

This section provides a brief overview of some concepts used in the nGraph 
Library. It also introduces new ideas regarding our unique departure from the 
first generation of deep learning software design. 

The historical dominance of GPUs at the beginning of the current 
:abbr:`DL (Deep Learning)` boom means that many framework authors made 
GPU-specific design decisions at a very deep level. Those assumptions created 
an "ecosystem" of frameworks that all behave essentially the same at the
framework's hardware abstraction layer: 

* The framework expects to own memory allocation. 
* The framework expects the execution device to be a GPU. 
* The framework expects complete control of the GPU, and that the device doesn't 
  need to be shared. 
* The framework expects that developers will write things in a `SIMT-friendly`_ 
  manner.    
  
Some of these design decisions have implications that do not translate well to 
the newer, more demanding generation of **adaptable software**. For example, 
most frameworks that expect full control of the GPU devices suffer poor 
per-device resource utilization whenever the system is oversubscribed. 

Most framework owners will tell you to refactor the model to remove operations 
that are not implemented on the GPU, rather than attempt to run multiple models 
in parallel or figure out how to build graphs more efficiently. In other words, 
if a model requires any operation that hasn't been implemented on the GPU, it 
must wait for copies to propagate between the CPU and the GPU(s), which slows 
down the entire system. For data scientists facing significant uncertainty 
about how large (or how small) their model's compute needs will be, investing 
heavily in frameworks reliant upon GPUs may not be the best decision.  

Meanwhile, the shift toward greater diversity in deep learning **hardware devices** 
requires that these assumptions be revisited. Incorporating direct support for 
all of the different hardware targets out there, each of which has its own 
preferences when it comes to the above factors, is a very heavy burden 
on framework owners.

Adding the nGraph Compiler to the system lightens that burden by raising the 
abstraction level, and by letting any hardware-specific backends make these 
decisions automatically. The nGraph Compiler is designed to take the needs of 
each target hardware platform into account and to achieve maximum performance.

This makes things easier not only for framework owners, but also (as new models 
are developed) for data scientists, who will not have to keep nearly as many 
low-level hardware details in mind when architecting complex models for 
anything other than :abbr:`Just-in-Time (JIT)` compilation.     

While first-generation frameworks tended to force a trade-off between being 
"specialized" and "adaptable" (the trade-off between training and inference), 
the nGraph Library permits algorithms implemented in a DNN to be both 
specialized and adaptable. The new generation of software design in and around 
AI ecosystems can and should be much more flexible.   


* :ref:`framework_bridges`
* :ref:`about_transformers`
* :ref:`graph_shaping`
 


.. _framework_bridges:

Framework bridges
=================

In the nGraph ecosystem, a framework is what the data scientist uses to solve 
a specific (and usually large-scale) deep learning computational problem using 
a high-level, data science-oriented language. 

A framework :term:`bridge` is a software layer (typically a plugin *for* or an 
extension *to* a framework) that translates the data science-oriented language 
into a compute-oriented language called a :term:`data-flow graph`. The bridge 
can then present the problem to the nGraph :abbr:`Abstraction Layer (AL)` which 
is responsible for execution on an optimized backend by performing graph 
transformations that replace subgraphs of the computation with more optimal 
(in terms of machine code) subgraphs. Throughout this process, ``ops`` represent 
tensor operations. 

Either the framework can provide its own graph of functions to be compiled and 
optimized via :abbr:`Ahead-of-Time (AoT)` compilation and sent back to the 
framework, or an entity (framework or user) that requires the flexibility of 
shaping ops directly can use our graph construction functions to experiment 
with building runtime APIs for their framework, thus exposing more flexible 
multi-threaded compute options.

See the section on :doc:`howto/execute` for a detailed walk-through describing 
how this translation can be programmed to happen automatically via a framework. 


.. _about_transformers:

Transformer ops
================

A framework bridge may define its own bridge-specific ops, as long as they can 
be converted to transformer ops. This is usually achieved by first converting 
them to core ops along the way. For example, if a framework has a 
``PaddedCell`` op, nGraph pattern replacement facilities can be used to convert 
it into one of our core ops. More detail on transformer ops will be coming soon.  


.. _graph_shaping:

Graph shaping
=============

Tensors
-------

*Tensors* are maps from coordinates to scalar values, all of the same
type, called the *element type* of the tensor. Coordinates are tuples
of non-negative integers; all the coordinates for a tensor have the
same length, called the *rank* of the tensor. We often use
:math:`n`-tensor for tensors with rank :math:`n`.

The :term:`shape` of a tensor is a tuple of non-negative integers that 
represents an exclusive upper bound for coordinate values. A tensor has an 
element for every coordinate less than the shape, so the *size* of the tensor 
is the product of the values in the shape.
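
For instance, a tensor with shape ``{2, 3}`` has rank 2 and size 6. A minimal
sketch of this relationship, using the ``Shape`` type that also appears in the
example at the end of this section (the function name and shape values here
are arbitrary choices for illustration)::

   #include <functional>
   #include <numeric>
   #include <ngraph.hpp>

   using namespace ngraph;

   void shape_and_size()
   {
       // A shape of {2, 3}: coordinates range over {0, 1} x {0, 1, 2}.
       Shape shape{2, 3};

       // The rank is the length of the shape tuple: 2.
       std::size_t rank = shape.size();

       // The size is the product of the values in the shape: 2 * 3 = 6.
       std::size_t size = std::accumulate(shape.begin(), shape.end(),
                                          std::size_t{1},
                                          std::multiplies<std::size_t>());
   }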

An :math:`n`-dimensional array is the usual implementation for a
tensor, and the two terms are often used interchangeably, but a tensor
could just as easily be represented by a function that returns 0 for
every coordinate or a function that adds the elements of two other
tensors at the same coordinate and returns that sum.

Ops
---

A computation graph is a composition of tensor computations, called
``ops``, which are nodes in the graph. In the graph, every :term:`op`
*input* must be associated with an op *output*, and every op output
must have a fixed element type and shape to correspond with the
tensors used in the computation. Every op has zero or more inputs and
zero or more outputs.  The outputs represent tensors that will be
provided during execution. Ops may also have additional attributes
that do not change during execution.

Every ``op`` is a ``Node``, but not all nodes are ops. This is because
pattern graphs are another kind of graph that includes ops combined
with nodes that describe how to match subgraphs during graph
optimization.

Constructed ops have element types and shapes for each of their outputs, which 
are determined during op construction from the element types and shapes 
associated with the inputs, as well as additional attributes of the ops. For 
example, tensor addition is defined for two tensors with the same element type 
and shape, and results in a tensor with that same element type and shape:

.. math::

  (A+B)_I = A_I + B_I

Here, :math:`X_I` means the value at coordinate :math:`I` of the tensor 
:math:`X`. So the sum of two tensors is a tensor whose value at each coordinate 
is the sum of the two inputs' elements at that coordinate. Unlike many 
frameworks, nGraph does not require the user or the framework bridge to 
specify anything about storage or arrays.
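
For a concrete instance, if :math:`A` and :math:`B` are 2-tensors of shape 
:math:`(2, 2)` with :math:`A_{(0,1)} = 2` and :math:`B_{(0,1)} = 20`, then

.. math::

  (A+B)_{(0,1)} = A_{(0,1)} + B_{(0,1)} = 2 + 20 = 22,

and the same holds at every other coordinate.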

An ``Add`` op is used to represent an elementwise tensor sum. To
construct an Add op, each of the two inputs of the ``Add`` must be
assigned some output of some already-created op. All outputs of
constructed ops have element types and shapes, so when the Add is
constructed, it verifies that the two input tensors have the same
element type and shape and then sets its output to have the same
element type and shape.
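
A minimal sketch of this construction pattern, reusing only the calls shown in
the example at the end of this section; the function name and the element type
and shape values are arbitrary choices for illustration::

   #include <memory>
   #include <ngraph.hpp>

   using namespace ngraph;

   void make_add()
   {
       // Two parameters with matching element type and shape
       Shape shape{4, 8};
       auto x = std::make_shared<op::Parameter>(element::f32, shape);
       auto y = std::make_shared<op::Parameter>(element::f32, shape);

       // Each input of the Add is assigned the unique output of an
       // already-created op (here, a parameter). Construction verifies that
       // the two element types and shapes agree, and the output of the Add
       // is set to that same element type and shape: a float32 {4, 8} tensor.
       auto sum = std::make_shared<op::Add>(x, y);
   }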

Since all nodes supplying outputs for inputs to a new node must exist
before the new node can be created, it is impossible to construct a
cyclic graph.  Furthermore, type-checking is performed as the ops are
constructed.


Functions
---------

Ops are grouped together in a ``Function``, which describes a computation that 
can be invoked on tensor arguments to compute tensor results. When a bridge 
calls the function, it provides tensors in the form of row-major arrays for 
each argument and each computed result. The same array can be used for more 
than one argument, but each result must use a distinct array, and argument 
arrays cannot be used as result arrays.

Function definition begins with creating one or more ``Parameter`` ops, which 
represent the tensors that will be supplied as arguments to the function.  
Parameters have no inputs; they have attributes for the element type and shape 
of the tensor that will be provided as an argument. The unique output of the 
``Parameter`` will have the provided element type and shape.

A ``Function`` has ``Parameters``, a vector of ``Parameter`` ops in which no 
``Parameter`` op may appear more than once.  A ``Parameter`` op has no inputs 
and has attributes for its shape and element type; arrays passed to the 
function must have the same shape and element type as the corresponding 
parameter.  The ``Function`` also has ``Nodes``, a vector of ops that are the 
results being computed.

During execution, the output of the nth ``Parameter`` op will be the tensor
corresponding to the array provided as the nth argument, and the outputs
of all result ops will be written into the result arrays in row-major
order.
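
As a rough illustration of that row-major layout, here is a plain C++ helper
(hypothetical, not part of the nGraph API)::

   #include <cstddef>
   #include <vector>

   // Row-major offset of a coordinate within a tensor of the given shape.
   std::size_t row_major_offset(const std::vector<std::size_t>& shape,
                                const std::vector<std::size_t>& coord)
   {
       std::size_t offset = 0;
       for (std::size_t d = 0; d < shape.size(); ++d)
       {
           offset = offset * shape[d] + coord[d];
       }
       return offset;
   }

   // For a result tensor of shape {2, 3}, the element at coordinate (1, 2)
   // is written at offset 1 * 3 + 2 == 5, the last slot in the result array.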





An Example
==========

::

   #include <memory>
   #include <ngraph.hpp>

   using namespace ngraph;

   // f(a, b, c) = (a + b) * c
   void make_function()
   {

       // First construct the graph
       Shape shape{32, 32};
       auto a = std::make_shared<op::Parameter>(element::f32, shape);
       auto b = std::make_shared<op::Parameter>(element::f32, shape);
       auto c = std::make_shared<op::Parameter>(element::f32, shape);
       auto t0 = std::make_shared<op::Add>(a, b);
       auto t1 = std::make_shared<op::Multiply>(t0, c);

       auto f = std::make_shared<Function>(Nodes{t1}, Parameters{a, b, c});
   }


We use shared pointers for all ops. For each parameter, we need to supply
element type and shape attributes. When the function is called, each
argument must conform to the corresponding parameter's element type and
shape.

During typical graph construction, all ops have one output and some
number of inputs, which makes it easy to construct the graph by
assigning each unique output of a constructor argument node to an
input of the op being constructed.  For example, for ``Add`` we need to
supply node outputs to each of its two inputs, which we take from the
unique outputs of the parameters ``a`` and ``b``.

We do not perform any implicit element type coercion or shape
conversion (such as broadcasts) since these can be
framework-dependent, so all the shapes for the add and multiply must
be the same. If there is a mismatch, the constructor will throw an
exception.
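
A minimal sketch of that behavior; since the exact exception type is not
specified here, the sketch simply catches ``std::exception``, and the function
name is an arbitrary choice for illustration::

   #include <iostream>
   #include <memory>
   #include <ngraph.hpp>

   using namespace ngraph;

   void shape_mismatch()
   {
       auto a = std::make_shared<op::Parameter>(element::f32, Shape{32, 32});
       auto b = std::make_shared<op::Parameter>(element::f32, Shape{32, 16});

       try
       {
           // The shapes differ and nGraph performs no implicit broadcast,
           // so constructing the Add fails.
           auto bad = std::make_shared<op::Add>(a, b);
       }
       catch (const std::exception& e)
       {
           std::cerr << "op construction failed: " << e.what() << "\n";
       }
   }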

After the graph is constructed, we create the function, passing the
``Function`` constructor the nodes that are results and the parameters
that are arguments.


.. _SIMT-friendly: https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads