Commit 6bf066d7 authored by Jaikrishnan Menon's avatar Jaikrishnan Menon

Merge branch 'master' into cpu_layout2

parents 18f7c5fb fd2bf807
......@@ -52,6 +52,9 @@ doc/source/generated
.cache/
nervana_aeon.egg-info/
# emacs
*~
# vim
*.swp
*.swo
......
......@@ -17,4 +17,4 @@ See our [install] docs for how to get started.
For this early release, we provide framework integration guides to compile
MXNet and TensorFlow-based projects.
[install]: doc/sphinx/source/installation.rst
[install]: http://ngraph.nervanasys.com/docs/cpp/installation.html
\ No newline at end of file
......@@ -2220,7 +2220,7 @@ div[class^='highlight'] pre {
background-color: #272525;
display: block;
text-align: right;
font-size: 90%;
font-size: 95%;
cursor: pointer;
color: #27AE60;
*zoom: 1;
......@@ -2488,8 +2488,8 @@ div[class^='highlight'] pre {
line-height: 1.0em;
}
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
font-size: 101% !important;
color: #72a1ab;
font-size: 100% !important;
color: #528481;
line-height: 0.91em;
}
.rst-content tt.xref, a .rst-content tt, .rst-content tt.xref, .rst-content code.xref, a .rst-content tt, a .rst-content code {
......
......@@ -2489,7 +2489,7 @@ div[class^='highlight'] pre {
}
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
font-size: 101% !important;
color: #72a1ab;
color: #528481;
line-height: 0.91em;
}
.rst-content tt.xref, a .rst-content tt, .rst-content tt.xref, .rst-content code.xref, a .rst-content tt, a .rst-content code {
......
.. api.rst:
API
###
.. Don't add Python APIs that will break the build.
Sections
========
.. autodiff.rst
Autodiff
########
The ``autodiff`` ...
.. TODO update for cpp
:orphan:
.. glossary:
Glossary
......
......@@ -3,8 +3,22 @@
Graph Basics
============
This section describes the basic concepts you need to know when constructing
a graph.
This section describes the basic concepts you need to know when
constructing a graph.
Framework Bridges
------------------
Frontends (or users who require the flexibility of constructing
Ops directly) can use a set of graph construction functions
to build graphs.
A framework bridge constructs a function which is compiled/optimized
by a sequence of graph transformations that replace subgraphs of the
computation with more optimal subgraphs. Throughout this process, ops
represent tensor operations.
Tensors
-------
......@@ -150,38 +164,3 @@ After the graph is constructed, we create the function, passing the
`Function` constructor the nodes that are results and the parameters
that are arguments.
Defining ops
============
A framework bridge constructs a function which is compiled/optimized
by a sequence of graph transformations that replace subgraphs of the
computation with more optimal subgraphs. Throughout this process, ops
represent tensor operations.
*Core ops* are ops that are available and generally useful to all
framework bridges and that can be compiled by all transformers. A
framework bridge may define framework-specific ops to simplify graph
construction, provided that the bridge can enable every transformer to
replace all such ops with equivalent subgraphs composed of core
ops. Similarly, transformers may define transformer-specific ops to
represent kernels or other intermediate operations. If a framework
supports extending the set of ops it offers, a bridge may even expose
transformer-specific ops to the framework user.
It is easiest to define a new op by adapting an existing op. Some of
the tasks that must be performed are:
- Op constructor:
* Checking type-consistency of arguments
* Specifying the result type for a call
- Serializer/Deserializer
- Transformer handlers:
* Interpreter (reference) implementation of behavior. The
implementation should favor clarity over efficiency.
......@@ -22,6 +22,9 @@ of :abbr:`Deep Learning (DL)` (DL) systems. Here you will find a suite of
components, APIs, and documentation that can be used to compile and run
:abbr:`Deep Neural Network (DNN)` models defined in a variety of frameworks.
.. figure:: graphics/ngraph-hub.png
For this early release, we provide :doc:`framework-integration-guides` to compile
and run MXNet and TensorFlow-based projects.
......@@ -32,54 +35,26 @@ Architecture CPUs (CPU), the Intel® Nervana Neural Network Processor™ (NNP),
and NVIDIA\* GPUs. Currently-supported compiler optimizations include efficient
memory management and data layout abstraction.
Further overview details can be found on our :doc:`about` page.
Further project details can be found on our :doc:`project/about` page.
=======
Sections
=========
.. toctree::
:maxdepth: 1
:caption: Table Of Contents
:name: tocmaster
:caption: Table of Contents
installation.rst
testing-libngraph.rst
framework-integration-guides.rst
graph-basics.rst
.. toctree::
:maxdepth: 1
:caption: Algorithms
:name:
.. toctree::
:maxdepth: 1
:caption: Reference API
api.rst
autodiff.rst
glossary.rst
.. toctree::
:maxdepth: 1
:caption: Ops
ops/abs.rst
ops/convolution.rst
.. toctree::
:maxdepth: 1
:caption: Project Docs
about.rst
release-notes.rst
code-contributor-README.rst
.. toctree::
:maxdepth: 0
:hidden:
branding-notice.rst
doc-contributor-README.rst
ops/index.rst
project/index.rst
Indices and tables
......@@ -87,4 +62,5 @@ Indices and tables
* :ref:`search`
* :ref:`genindex`
......@@ -4,23 +4,32 @@
Abs
###
Description
===========
Elementwise absolute value operation.
Produces a single output tensor of the same element type and shape as the input,
where the value at each coordinate of the output is the absolute value of the
value at each input coordinate.
Produces a single output tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the absolute value of the
value at each ``arg`` coordinate.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Input Name      | Element Type            | Shape                          |
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``input``       | Any                     | Any                            |
| ``arg``         | Any                     | Any                            |
+-----------------+-------------------------+--------------------------------+
+------------------+-------------------------+----------------------------------------------------+
| Output Name      | Element Type            | Shape                                              |
+==================+=========================+====================================================+
| ``output``       | Same as ``input``       | Same as input.                                     |
+------------------+-------------------------+----------------------------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | Same as ``arg``         | Same as ``arg``.               |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
......@@ -28,14 +37,15 @@ Mathematical Definition
.. math::
output_{i_0, \ldots, i_{n-1}} = \mathrm{abs}(input_{i_0, \ldots, i_{n-1}})
\mathtt{output}_{i_0, \ldots, i_{n-1}} = \left|\mathtt{arg}_{i_0,
\ldots, i_{n-1}}\right|
Backprop
========
.. math::
\overline{input} \leftarrow \mathrm{sgn}(input)\Delta
\overline{\texttt{arg}} \leftarrow \Delta\ \mathrm{sgn}(\texttt{arg})
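The forward rule and its adjoint can be sketched as a plain-Python reference, with flat lists standing in for tensors (the helper names here are illustrative, not part of the ngraph API):

```python
def sgn(x):
    """Sign of x: -1, 0, or +1."""
    return (x > 0) - (x < 0)

def abs_forward(arg):
    """Elementwise absolute value."""
    return [abs(x) for x in arg]

def abs_backprop(arg, delta):
    """Adjoint of Abs: arg_bar = delta * sgn(arg), elementwise."""
    return [d * sgn(x) for x, d in zip(arg, delta)]
```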
C++ Interface
......@@ -43,8 +53,3 @@ C++ Interface
.. doxygenclass:: ngraph::op::Abs
:members:
Python Interface
================
is not merged yet, but could go here!
.. acos.rst:
####
Acos
####
Description
===========
Elementwise acos operation.
Produces a tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the inverse cosine of the
value at the corresponding coordinate of ``arg``.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``arg``         | Any                     | Any                            |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | Same as ``arg``         | Same as ``arg``.               |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \cos^{-1}(\texttt{arg}_{i_0, \ldots, i_{n-1}})
Backprop
========
.. math::
\overline{\texttt{arg}} \leftarrow -\frac{\Delta}{\sqrt{1-\texttt{arg}^2}}
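A minimal plain-Python sketch of the rules above, with flat lists standing in for tensors (illustrative helpers, not the ngraph API):

```python
import math

def acos_forward(arg):
    """Elementwise inverse cosine."""
    return [math.acos(x) for x in arg]

def acos_backprop(arg, delta):
    """Adjoint of Acos: arg_bar = -delta / sqrt(1 - arg^2), elementwise."""
    return [-d / math.sqrt(1.0 - x * x) for x, d in zip(arg, delta)]
```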
C++ Interface
=============
.. doxygenclass:: ngraph::op::Acos
:members:
.. add.rst:
###
Add
###
Description
===========
Elementwise add operation.
Produces a tensor of the same element type and shape as the two inputs,
where the value at each coordinate of ``output`` is the sum of the
values at the corresponding input coordinates.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``arg0``        | any                     | any                            |
+-----------------+-------------------------+--------------------------------+
| ``arg1``        | same as ``arg0``        | same as ``arg0``               |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | same as ``arg0``        | same as ``arg0``               |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \texttt{arg0}_{i_0, \ldots, i_{n-1}} + \texttt{arg1}_{i_0, \ldots, i_{n-1}}
Backprop
========
.. math::
\overline{\texttt{arg0}} &\leftarrow \Delta \\
\overline{\texttt{arg1}} &\leftarrow \Delta
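The definition and backprop above can be sketched in plain Python, with flat lists standing in for tensors (illustrative helpers, not the ngraph API):

```python
def add_forward(arg0, arg1):
    """Elementwise sum of two equally-shaped inputs."""
    return [a + b for a, b in zip(arg0, arg1)]

def add_backprop(delta):
    """Adjoints of Add: both inputs receive delta unchanged."""
    return list(delta), list(delta)
```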
C++ Interface
=============
.. doxygenclass:: ngraph::op::Add
:members:
.. asin.rst:
####
Asin
####
Description
===========
Elementwise asin operation.
Produces a tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the inverse sine of the
value at the corresponding coordinate of ``arg``.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``arg``         | Any                     | Any                            |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | Same as ``arg``         | Same as ``arg``.               |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \sin^{-1}(\texttt{arg}_{i_0, \ldots, i_{n-1}})
Backprop
========
.. math::
\overline{\texttt{arg}} \leftarrow \frac{\Delta}{\sqrt{1-\texttt{arg}^2}}
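A minimal plain-Python sketch of the rules above, with flat lists standing in for tensors (illustrative helpers, not the ngraph API):

```python
import math

def asin_forward(arg):
    """Elementwise inverse sine."""
    return [math.asin(x) for x in arg]

def asin_backprop(arg, delta):
    """Adjoint of Asin: arg_bar = delta / sqrt(1 - arg^2), elementwise."""
    return [d / math.sqrt(1.0 - x * x) for x, d in zip(arg, delta)]
```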
C++ Interface
=============
.. doxygenclass:: ngraph::op::Asin
:members:
.. atan.rst:
####
Atan
####
Description
===========
Elementwise atan operation.
Produces a tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the inverse tangent of the
value at the corresponding coordinate of ``arg``.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``arg``         | Any                     | Any                            |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | Same as ``arg``         | Same as ``arg``.               |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \tan^{-1}(\texttt{arg}_{i_0, \ldots, i_{n-1}})
Backprop
========
.. math::
\overline{\texttt{arg}} \leftarrow \frac{\Delta}{1+\texttt{arg}^2}
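A minimal plain-Python sketch of the rules above, with flat lists standing in for tensors (illustrative helpers, not the ngraph API):

```python
import math

def atan_forward(arg):
    """Elementwise inverse tangent."""
    return [math.atan(x) for x in arg]

def atan_backprop(arg, delta):
    """Adjoint of Atan: arg_bar = delta / (1 + arg^2), elementwise."""
    return [d / (1.0 + x * x) for x, d in zip(arg, delta)]
```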
C++ Interface
=============
.. doxygenclass:: ngraph::op::Atan
:members:
.. avg_pool.rst:
#######
AvgPool
#######
Description
===========
Average Pooling operation.
Average pooling partitions its input into windows and produces the average of each window.
Inputs
------
+-----------------+----------------+--------------------------------+--------------------+
| Name            | Element Type   | Shape                          | Notes              |
+=================+================+================================+====================+
| ``data``        | Any            | :math:`(N,C,d_1,\ldots,d_n)`   | :math:`n>0, d_i>0` |
+-----------------+----------------+--------------------------------+--------------------+
Attributes
----------
+----------------------+-----------------+----------------------------------+
| Name                 | Type            | Notes                            |
+======================+=================+==================================+
| ``w``                | ``Shape[n]``    | Window shape. :math:`w_i\le d_i` |
+----------------------+-----------------+----------------------------------+
| ``s``                | ``Strides[n]``  | Window strides.                  |
+----------------------+-----------------+----------------------------------+
| ``p``                | ``Shape[n]``    | Padding below.                   |
+----------------------+-----------------+----------------------------------+
| ``q``                | ``Shape[n]``    | Padding above.                   |
+----------------------+-----------------+----------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | Any                     | :math:`(N,C,d'_1,\ldots,d'_n)` |
+-----------------+-------------------------+--------------------------------+
Average pooling takes as its input a batch tensor `data` of shape
:math:`(N,C,d_1,\ldots,d_n)`, where :math:`N` is the batch
size, and :math:`C > 0` is the
number of channels (sometimes called features). The dimensions
:math:`(d_1,\ldots,d_n)` correspond to the shape of an
:math:`n`-dimensional data item in a batch. For example, where
:math:`n=2`, the data may represent a two-dimensional image. It also
takes four attributes:
1. *window shape*,
2. *window movement strides*, (optional)
3. *padding below*, (optional)
4. *padding above*, (optional).
The shape of `output` is :math:`(N,C,d'_1,\ldots,d'_n)`, where
:math:`d'_i = \lceil \frac{p_i + d_i + q_i - w_i + 1}{s_i} \rceil`.
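The per-axis output size formula can be checked with a short sketch (an illustrative helper, not the ngraph API):

```python
import math

def pooled_dims(d, w, s, p, q):
    """Per-axis pooled output size: ceil((p_i + d_i + q_i - w_i + 1) / s_i)."""
    return [math.ceil((pi + di + qi - wi + 1) / si)
            for di, wi, si, pi, qi in zip(d, w, s, p, q)]
```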
*In the absence of padding*, given an input data batch tensor
:math:`T_\textit{in}`, the output tensor is defined by the equation
.. math::
T_\textit{out}[a,c,i_1,\ldots,i_n] =
\frac{\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
T_\textit{in}[a,c,j_1,\ldots,j_n]}{\prod_{i=1}^n{w_i}}
*In the presence of padding*, we do not always want to divide by the
number of elements in the window, since some
of the output points are determined by a window that partly hangs
beyond the edge of the tensor. In this case we can define the output
via a few intermediate steps.
First define the *sum tensor* :math:`T_\textit{sum}`, with shape
:math:`(N,C,d'_1,\ldots,d'_n)`, as follows.
.. math::
T_\textit{sum}[a,c,i_1,\ldots,i_n] =
\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
\textit{val}[a,c,j_1,\ldots,j_n]
where
.. math::
\textit{val}[a,c,j_1,\ldots,j_n] =
\begin{cases}
T_\textit{in}[a,c,j_1,\ldots,j_n]&\text{if for all } k, p_k \le j_k < p_k + d_k\\
0&\text{otherwise}.
\end{cases}
Second, define the *divisor tensor* :math:`T_\textit{div}`, with shape :math:`(N,C,d'_1,\ldots,d'_n)`, as follows.
.. math::
T_\textit{div}[a,c,i_1,\ldots,i_n] =
\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
\textit{val}[a,c,j_1,\ldots,j_n]
where
.. math::
\textit{val}[a,c,j_1,\ldots,j_n] =
\begin{cases}
1&\text{if for all }k, p_k \le j_k < p_k + d_k\\
0&\text{otherwise}.
\end{cases}
Finally, define :math:`T_\textit{out}` as the result of elementwise
dividing :math:`T_\textit{sum}` by :math:`T_\textit{div}`. Note that
at positions where :math:`T_\textit{div}` is zero, values may be
infinity or NaN. (This corresponds to a condition where the pooling
window is completely out of bounds, encompassing no valid values.)
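The sum-tensor / divisor-tensor construction can be sketched in one dimension as follows (a plain-Python reference under the conventions above; not the ngraph API):

```python
def avg_pool_1d(t_in, w, s, p, q):
    """Padded 1-D average pooling via sum and divisor tensors.

    Window positions that fall entirely within the padding produce NaN,
    matching the divide-by-zero case described in the text.
    """
    d = len(t_in)
    d_out = -(-(p + d + q - w + 1) // s)  # ceiling division
    out = []
    for i in range(d_out):
        t_sum, t_div = 0.0, 0
        for j in range(s * i, s * i + w):
            if p <= j < p + d:  # coordinate lies inside the unpadded input
                t_sum += t_in[j - p]
                t_div += 1
        out.append(t_sum / t_div if t_div else float("nan"))
    return out
```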
Backprop
========
C++ Interface
=============
.. doxygenclass:: ngraph::op::AvgPool
:members:
.. avg_pool_backprop.rst:
###############
AvgPoolBackprop
###############
Average Pooling backprop operation.
C++ Interface
=============
.. doxygenclass:: ngraph::op::AvgPoolBackprop
:members:
Python Interface
================
is not merged yet, but could go here!
......@@ -4,21 +4,50 @@
Convolution
###########
Description
===========
A batched convolution operation.
Basic Operation
===============
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Input Name      | Element Type            | Shape                          |
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``image_batch`` | Any                     | ``(N, C_in, d_1, ..., d_n)``   |
+-----------------+-------------------------+--------------------------------+
| ``filters``     | Same as ``image_batch`` | ``(N, C_in, df_1, ..., df_n)`` |
+-----------------+-------------------------+--------------------------------+
Attributes
----------
+-----------------------------+-----------------------------+---------------------------------------+
| Name                        | Type                        | Notes                                 |
+=============================+=============================+=======================================+
| ``window_movement_strides`` | ``Strides[n]``              | How far to slide the window along     |
|                             |                             | each axis at each step.               |
+-----------------------------+-----------------------------+---------------------------------------+
| ``window_dilation_strides`` | ``Strides[n]``              | Per-axis dilation to apply to the     |
|                             |                             | filters.                              |
+-----------------------------+-----------------------------+---------------------------------------+
| ``padding_below``           | ``Shape[n]``                | How many padding elements to add      |
|                             |                             | below the 0-coordinate on each axis.  |
+-----------------------------+-----------------------------+---------------------------------------+
| ``padding_above``           | ``Shape[n]``                | How many padding elements to add      |
|                             |                             | above the max-coordinate on each axis.|
+-----------------------------+-----------------------------+---------------------------------------+
| ``image_dilation_strides``  | ``Strides[n]``              | Per-axis dilation to apply to the     |
|                             |                             | image batch.                          |
+-----------------------------+-----------------------------+---------------------------------------+
Outputs
-------
+------------------+-------------------------+----------------------------------------------------+
| Output Name      | Element Type            | Shape                                              |
| Name             | Element Type            | Shape                                              |
+==================+=========================+====================================================+
| ``features_out`` | Same as ``image_batch`` | ``(N, C_in, d_1 - df_1 + 1, ..., d_n - df_n + 1)`` |
+------------------+-------------------------+----------------------------------------------------+
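For the unit-stride, undilated, unpadded case shown in the table above, the spatial output dims follow a simple per-axis rule; a small sketch (illustrative helper, not the ngraph API):

```python
def conv_output_dims(d, df):
    """Output spatial dims for stride-1, undilated, unpadded convolution:
    d_i' = d_i - df_i + 1 on each axis."""
    return [di - dfi + 1 for di, dfi in zip(d, df)]
```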
......@@ -27,38 +56,6 @@ It must be the case that after dilation and padding are applied, the filter fits
.. TODO image add
Window Parameters
=================
+-----------------------------+-----------------------------+------------------------------------+
| Parameter Name              | Type                        | Meaning                            |
+=============================+=============================+====================================+
| ``window_movement_strides`` | ``Strides`` of length ``n`` | How far to slide the window along  |
|                             |                             | each axis at each step.            |
+-----------------------------+                             +------------------------------------+
| ``window_dilation_strides`` |                             | Per-axis dilation to apply to the  |
|                             |                             | filters.                           |
+-----------------------------+-----------------------------+------------------------------------+
.. TODO: pictorial example of the effect of window movement stride.
.. TODO: pictorial example of window before and after dilation.
Image Batch Parameters
======================
+----------------------------+-----------------------------+---------------------------------------+
| Parameter Name             | Type                        | Meaning                               |
+============================+=============================+=======================================+
| ``padding_below``          | ``Padding`` of length ``n`` | How many padding elements to add      |
|                            |                             | below the 0-coordinate on each axis.  |
+----------------------------+                             +---------------------------------------+
| ``padding_above``          |                             | How many padding elements to add      |
|                            |                             | above the max-coordinate on each axis.|
+----------------------------+-----------------------------+---------------------------------------+
| ``image_dilation_strides`` | ``Strides`` of length ``n`` | Per-axis dilation to apply to the     |
|                            |                             | image batch.                          |
+----------------------------+-----------------------------+---------------------------------------+
Mathematical Definition
=======================
......
.. ops/index.rst
Core Ops
========
An ``Op``'s primary role is to function as a node in a directed acyclic
computation graph.
*Core ops* are ops that are available and generally useful to all framework
bridges and that can be compiled by all transformers. A framework bridge may
define framework-specific ops to simplify graph construction, provided that the
bridge can enable every transformer to replace all such ops with equivalent
subgraphs composed of core ops. Similarly, transformers may define
transformer-specific ops to represent kernels or other intermediate operations.
If a framework supports extending the set of ops it offers, a bridge may even
expose transformer-specific ops to the framework user.
Our design philosophy is that the graph is not a script for running kernels;
rather, our compilation will match ``ops`` to appropriate kernels for the
backend(s) in use. Thus, we expect additions of new Core ops to be
infrequent; most functionality should instead be added via new functions
that build sub-graphs from existing core ops.
It is easiest to define a new op by adapting an existing op. Some of the tasks
that must be performed are:
- Op constructor:
* Checking type-consistency of arguments
* Specifying the result type for a call
- Serializer/Deserializer
- Transformer handlers:
* Interpreter (reference) implementation of behavior. The
implementation should favor clarity over efficiency.
Alphabetical list of Core ``ops``
----------------------------------
Not currently a comprehensive list.
.. toctree::
:maxdepth: 1
abs.rst
acos.rst
add.rst
asin.rst
atan.rst
avg_pool.rst
avg_pool_backprop.rst
convolution.rst
......@@ -8,6 +8,8 @@ of :abbr:`Deep Learning (DL)` (DL) systems. Here you will find a suite of
components, APIs, and documentation that can be used to compile and run
:abbr:`Deep Neural Network (DNN)` models defined in a variety of frameworks.
.. figure:: ../graphics/ngraph-hub.png
The nGraph library translates a framework’s representation of computations into
an :abbr:`Intermediate Representation (IR)` designed to promote computational
efficiency on target hardware. Initially-supported backends include Intel
......@@ -15,8 +17,6 @@ Architecture CPUs, the Intel® Nervana Neural Network Processor™ (NNP),
and NVIDIA\* GPUs. Currently-supported compiler optimizations include efficient
memory management and data layout abstraction.
.. figure:: graphics/fig.jpeg
The *nGraph core* uses a strongly-typed and platform-neutral stateless graph
representation for computations. Each node, or *op*, in the graph corresponds
to one step in a computation, where each step produces zero or more tensor
......
:orphan:
.. branding-notice:
Branding Notice
===============
The Intel® nGraph™ library is an open source project providing code and component
reference for many kinds of machine learning, deep learning, and DNN applications.
Documentation may include references to frontend frameworks, modules, extensions,
or other libraries that may be wholly or partially open source, or that may be
claimed as the property of others.
Intel nGraph library core documentation
---------------------------------------
.. note:: The branding notice below applies to code and documentation
contributions intended to be added directly to Intel nGraph library core.
Use the first or most prominent usage with symbols as described below.
Subsequent references on the same document, or on a file with an
already-present prominent form (such as Sphinx\* documentation sidebars),
may be done as an abbreviated form (sub-bullet items) and/or without the
repeated use of the trademark / branding symbols.
* Intel® Nervana™ Neural Network Processor
* Intel® Nervana™ NNP
* Intel® Xeon Phi™ (CPU processor)
* Intel® Xeon® (CPU processor)
* Intel® nGraph™
* Intel® nGraph™ library
* nGraph library
* ``ngraph`` API
* ``ngraph`` library
* ``ngraph`` backend
* nGraph abstraction layer
* neon™ frontend framework
* Intel® Math Kernel Library
* Intel® MKL
* Intel® Math Kernel Library for Deep Neural Networks
* Intel® MKL-DNN
* Intel® Nervana™ Graph (deprecated)
......@@ -56,14 +56,14 @@ source file (``.rst``):
::
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
And the raw code will render as follows
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
......@@ -86,7 +86,7 @@ line numbers, and add a caption "One way to define neon axes within the dqn_atar
::
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
:caption:
......@@ -94,7 +94,7 @@ line numbers, and add a caption "One way to define neon axes within the dqn_atar
and the generated output will show readers of your helpful documentation
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
:caption:
......
.. project/index.rst
Project Docs
============
This section contains documentation about the project and how to contribute.
.. toctree::
:maxdepth: 1
about.rst
release-notes.rst
code-contributor-README.rst
doc-contributor-README.rst
../glossary.rst
......@@ -28,10 +28,11 @@ After building and installing the nGraph library to your system, the next
logical step is to compile a framework that you can use to run a
training/inference model with one of the backends that are now enabled.
For this early |release| release, we're providing integration guides for:
For this early |release| release, we're providing :doc:`framework-integration-guides`,
for:
* `MXNet`_,
* `TensorFlow`_, and
* :doc:`framework-integration-guides` framework,
* :doc:`framework-integration-guides` framework, and
* neon™ `frontend framework`_.
Integration guides for other frameworks are tentatively forthcoming.
......
......@@ -149,7 +149,7 @@ if (NGRAPH_CPU_ENABLE AND LLVM_INCLUDE_DIR AND
message(FATAL_ERROR "TBB is needed by the CPU backend and was not found")
else()
message(STATUS "Found TBB and imported target ${TBB_IMPORTED_TARGETS}")
endif()
endif()
endif()
include_directories(SYSTEM ${LLVM_INCLUDE_DIR} ${MKLDNN_INCLUDE_DIR})
......@@ -172,6 +172,7 @@ if (NGRAPH_CPU_ENABLE AND LLVM_INCLUDE_DIR AND
runtime/cpu/cpu_tensor_view.cpp
runtime/cpu/cpu_tensor_view_wrapper.cpp
runtime/cpu/cpu_layout_descriptor.cpp
runtime/cpu/cpu_tracing.cpp
runtime/cpu/mkldnn_utils.cpp
runtime/cpu/ops/convert_layout.cpp
runtime/cpu/ops/matmul_bias.cpp
......@@ -182,14 +183,23 @@ if (NGRAPH_CPU_ENABLE AND LLVM_INCLUDE_DIR AND
# The built-in headers are in a version-specific directory
# This must be kept in sync with the LLVM + Clang version in use
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_FLAGS "-fno-rtti")
set(HEADER_SEARCH_DEFINES
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\""
"MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\""
"CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\""
"NGRAPH_HEADERS_PATH=\"${NGRAPH_INCLUDE_PATH}\""
"INSTALLED_HEADERS_PATH=\"${CMAKE_INSTALL_PREFIX}/include\""
)
if (NGRAPH_TBB_ENABLE)
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_DEFINITIONS
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\";MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\";CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\";TBB_HEADERS_PATH=\"${TBB_ROOT}/include\";NGRAPH_HEADERS_PATH=\"${NGRAPH_INCLUDE_PATH}\";INSTALLED_HEADERS_PATH=\"${CMAKE_INSTALL_PREFIX}/include\";NGRAPH_TBB_ENABLE;")
set_source_files_properties(runtime/cpu/cpu_external_function.cpp PROPERTIES COMPILE_DEFINITIONS "NGRAPH_TBB_ENABLE")
else()
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_DEFINITIONS
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\";MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\";CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\";NGRAPH_HEADERS_PATH=\"${NGRAPH_INCLUDE_PATH}\";INSTALLED_HEADERS_PATH=\"${CMAKE_INSTALL_PREFIX}/include\";")
set(HEADER_SEARCH_DEFINES ${HEADER_SEARCH_DEFINES}
"TBB_HEADERS_PATH=\"${TBB_ROOT}/include\""
"NGRAPH_TBB_ENABLE"
)
endif()
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_DEFINITIONS "${HEADER_SEARCH_DEFINES}")
set(NGRAPH_CPU_DEBUGINFO_ENABLE 0 CACHE STRING "Enable debuginfo in the CPU backend")
# GPU backend currently requires CPU because they share compiler.cpp,
......
......@@ -31,10 +31,11 @@ namespace ngraph
public:
/// \brief Constructs an absolute value operation.
///
/// Output `[d1, ...]`
///
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Abs(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Abs", arg)
{
......
......@@ -26,23 +26,16 @@ namespace ngraph
{
/// \brief Elementwise inverse cosine (arccos) operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ----- | --------------------------------- | ----------------------------------------------- |
/// | `arg` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | --------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \arccos(\texttt{arg}[i_1,\dots,i_n])\f$ |
class Acos : public UnaryElementwiseArithmetic
{
public:
/// \brief Constructs an arccos operation.
///
/// \param arg Node that produces the input tensor.
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Acos(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Acos", arg)
{
......
......@@ -26,25 +26,18 @@ namespace ngraph
{
/// \brief Elementwise addition operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ------ | --------------------------------- | ------------------------------------------------------ |
/// | `arg0` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
/// | `arg1` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of the same shape and element type as `arg0`. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | -------------------------------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \texttt{arg0}[i_1,\dots,i_n] + \texttt{arg1}[i_1,\dots,i_n]\f$ |
class Add : public BinaryElementwiseArithmetic
{
public:
/// \brief Constructs an addition operation.
///
/// \param arg0 Node that produces the first input tensor.
/// \param arg1 Node that produces the second input tensor.
/// \param arg0 Node that produces the first input tensor.<br>
/// `[d0, ...]`
/// \param arg1 Node that produces the second input tensor.<br>
/// `[d0, ...]`
///
/// Output `[d0, ...]`
///
Add(const std::shared_ptr<Node>& arg0, const std::shared_ptr<Node>& arg1)
: BinaryElementwiseArithmetic("Add", arg0, arg1)
{
......
......@@ -26,23 +26,16 @@ namespace ngraph
{
/// \brief Elementwise inverse sine (arcsin) operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ----- | --------------------------------- | ----------------------------------------------- |
/// | `arg` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | --------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \arcsin(\texttt{arg}[i_1,\dots,i_n])\f$ |
class Asin : public UnaryElementwiseArithmetic
{
public:
/// \brief Constructs an arcsin operation.
///
/// \param arg Node that produces the input tensor.
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Asin(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Asin", arg)
{
......
......@@ -26,23 +26,16 @@ namespace ngraph
{
/// \brief Elementwise inverse tangent (arctan) operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ----- | --------------------------------- | ----------------------------------------------- |
/// | `arg` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | --------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \arctan(\texttt{arg}[i_1,\dots,i_n])\f$ |
class Atan : public UnaryElementwiseArithmetic
{
public:
/// \brief Constructs an arctan operation.
///
/// \param arg Node that produces the input tensor.
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Atan(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Atan", arg)
{
......
......@@ -24,55 +24,21 @@ namespace ngraph
{
/// \brief Batched average pooling operation, with optional padding and window stride.
///
/// Average pooling takes as its input a data batch tensor of shape \f$(N,C,d_1,\dots,d_n)\f$ where \f$n > 0\f$, every \f$d_i > 0\f$, and where \f$N\f$ is
/// the batch size, and \f$C > 0\f$ is the number of channels (sometimes called features). The dimensions \f$(d_1,\dots,d_n)\f$ correspond to the shape of
/// an \f$n\f$-dimensional data item in a batch. For example, where \f$n=2\f$, the data may represent a two-dimensional image. It also takes four parameters:
///
/// 1. <i>(the window shape)</i> a size vector \f$(w_1,\dots,w_n)\f$ where every \f$w_i \le d_i\f$;
/// 2. <i>(the window movement strides, optional)</i> a vector of positive integers \f$(s_1,\dots,s_n)\f$;
/// 3. <i>(the padding below, optional)</i> a vector of nonnegative integers \f$(p_1,\dots,p_n)\f$; and
/// 4. <i>(the padding above, optional)</i> a vector of nonnegative integers \f$(q_1,\dots,q_n)\f$.
///
/// The output has the shape \f$(N,C,d'_1,\dots,d'_n)\f$, where \f$d'_i = \lceil \frac{p_i + d_i + q_i - w_i + 1}{s_i} \rceil\f$.
///
/// *In the absence of padding*, given an input data batch tensor \f$T_\textit{in}\f$, the output tensor is defined by the equation
///
/// \f[
/// T_\textit{out}[a,c,i_1,\dots,i_n] = \frac{\sum_{j_1 = s_1 i_1, \dots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \dots, j_n = s_n i_n + w_n - 1} T_\textit{in}[a,c,j_1,\dots,j_n]}{\prod_{i=1}^n{w_i}}
/// \f]
///
/// *In the presence of padding*, we do not always want to divide by the full window size, since some of the output points are
/// determined by a window that partly hangs beyond the edge of the tensor. In this case we can define the output via a few intermediate steps.
///
/// First define the <i>sum tensor</i> \f$T_\textit{sum}\f$, with shape \f$(N,C,d'_1,\dots,d'_n)\f$, as follows.
///
/// \f[
/// T_\textit{sum}[a,c,i_1,\dots,i_n] = \sum_{j_1 = s_1 i_1, \dots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \dots, j_n = s_n i_n + w_n - 1} \textit{val}[a,c,j_1,\dots,j_n]
/// \f]
///
/// where \f$\textit{val}[a,c,j_1,\dots,j_n] = T_\textit{in}[a,c,j_1,\dots,j_n]\f$ if for all \f$k\f$, \f$p_k \le j_k < p_k + d_k\f$; else \f$0\f$.
///
/// Second, define the <i>divisor tensor</i> \f$T_\textit{div}\f$, with shape \f$(N,C,d'_1,\dots,d'_n)\f$, as follows.
///
/// \f[
/// T_\textit{div}[a,c,i_1,\dots,i_n] = \sum_{j_1 = s_1 i_1, \dots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \dots, j_n = s_n i_n + w_n - 1} \textit{val}[a,c,j_1,\dots,j_n]
/// \f]
///
/// where \f$\textit{val}[a,c,j_1,\dots,j_n] = 1\f$ if for all \f$k\f$, \f$p_k \le j_k < p_k + d_k\f$; else \f$0\f$.
///
/// Finally, define \f$T_\textit{out}\f$ as the result of elementwise dividing \f$T_\textit{sum}\f$ by \f$T_\textit{div}\f$.
/// Note that at positions where \f$T_\textit{div}\f$ is zero, values may be infinity or NaN. (This corresponds to a condition where the pooling window is completely
/// out of bounds, encompassing no valid values.)
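The sum/divisor construction above can be sketched in one dimension for a single channel. The `avg_pool_1d` helper below is hypothetical (not part of ngraph): padded positions contribute nothing to the sum and nothing to the divisor, so each output averages only over the in-bounds window positions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D average pooling over a single channel, computed via the
// sum/divisor construction: out-of-bounds (padding) positions add 0 to both
// the window sum and the divisor count.
std::vector<double> avg_pool_1d(const std::vector<double>& data,
                                size_t window,
                                size_t stride,
                                size_t pad_below,
                                size_t pad_above)
{
    size_t d = data.size();
    // d' = ceil((p + d + q - w + 1) / s)
    size_t out_len = (pad_below + d + pad_above - window + 1 + stride - 1) / stride;
    std::vector<double> out(out_len);
    for (size_t i = 0; i < out_len; ++i)
    {
        double sum = 0;     // T_sum: sum of in-bounds values
        double divisor = 0; // T_div: count of in-bounds positions
        for (size_t k = 0; k < window; ++k)
        {
            size_t j = stride * i + k; // index along the padded axis
            if (j >= pad_below && j < pad_below + d)
            {
                sum += data[j - pad_below];
                divisor += 1;
            }
        }
        out[i] = sum / divisor; // inf/NaN when the window covers no valid values
    }
    return out;
}
```

With `data = {2, 4, 6, 8}`, a window of 2, stride 2, and one element of padding on each side, the first and last windows each overlap the data in only one position, so they average over a single value rather than the full window size.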
class AvgPool : public RequiresTensorViewArgs
{
public:
/// \brief Constructs a batched average pooling operation.
///
/// \param arg The node producing the input data batch tensor.
/// \param window_shape The window shape.
/// \param window_movement_strides The window movement strides.
/// \param padding_below The below-padding shape.
/// \param padding_above The above-padding shape.
/// \param arg The node producing the input data batch tensor.<br>
/// `[d1, ..., dn]`
/// \param window_shape The window shape.<br>
/// `[n]`
/// \param window_movement_strides The window movement strides.<br>
/// `[n]`
/// \param padding_below The below-padding shape.<br>
/// `[n]`
/// \param padding_above The above-padding shape.<br>
/// `[n]`
AvgPool(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides,
......@@ -81,17 +47,22 @@ namespace ngraph
/// \brief Constructs a batched, unpadded average pooling operation (i.e., all padding shapes are set to 0).
///
/// \param arg The node producing the input data batch tensor.
/// \param window_shape The window shape.
/// \param window_movement_strides The window movement strides.
/// \param arg The node producing the input data batch tensor.<br>
/// `[d1, ..., dn]`
/// \param window_shape The window shape.<br>
/// `[n]`
/// \param window_movement_strides The window movement strides.<br>
/// `[n]`
AvgPool(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides);
/// \brief Constructs an unstrided batched average pooling operation (i.e., all window movement strides are 1 and all padding shapes are set to 0).
///
/// \param arg The node producing the input data batch tensor.
/// \param window_shape The window shape.
/// \param arg The node producing the input data batch tensor.<br>
/// `[d1, ..., dn]`
/// \param window_shape The window shape.<br>
/// `[n]`
AvgPool(const std::shared_ptr<Node>& arg, const Shape& window_shape);
virtual std::shared_ptr<Node> copy_with_new_args(
......
......@@ -29,8 +29,6 @@ namespace ngraph
public:
/// \brief Constructs a batched convolution operation.
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
/// \param data_batch The node producing the input data batch tensor.<br>
/// `[N, C_IN, D1, ... Df]`
/// \param filters The node producing the filters tensor.<br>
......@@ -45,6 +43,9 @@ namespace ngraph
/// `[f]`
/// \param data_dilation_strides The data dilation strides.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides,
......@@ -67,6 +68,9 @@ namespace ngraph
/// `[f]`
/// \param padding_above The padding-above sizes.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides,
......@@ -84,6 +88,9 @@ namespace ngraph
/// `[f]`
/// \param window_dilation_strides The window dilation strides.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides,
......@@ -97,6 +104,9 @@ namespace ngraph
/// `[C_OUT, C_IN, F1, ... Ff]`
/// \param window_movement_strides The window movement strides.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides);
......@@ -107,6 +117,9 @@ namespace ngraph
/// `[N, C_IN, D1, ... Df]`
/// \param filters The node producing the filters tensor.<br>
/// `[C_OUT, C_IN, F1, ... Ff]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters);
......
......@@ -19,6 +19,7 @@
#include "ngraph/runtime/cpu/cpu_call_frame.hpp"
#include "ngraph/runtime/cpu/cpu_external_function.hpp"
#include "ngraph/runtime/cpu/cpu_tensor_view.hpp"
#include "ngraph/runtime/cpu/cpu_tracing.hpp"
using namespace std;
using namespace ngraph;
......@@ -28,6 +29,12 @@ runtime::cpu::CPU_CallFrame::CPU_CallFrame(std::shared_ptr<CPU_ExternalFunction>
: m_external_function(external_function)
, m_compiled_function(compiled_function)
{
setup_runtime_context();
}
runtime::cpu::CPU_CallFrame::~CPU_CallFrame()
{
cleanup_runtime_context();
}
void runtime::cpu::CPU_CallFrame::tensor_call(
......@@ -54,7 +61,12 @@ void runtime::cpu::CPU_CallFrame::tensor_call(
}
// Invoke compiled computation
m_compiled_function(inputs.data(), outputs.data());
m_compiled_function(inputs.data(), outputs.data(), ctx);
if (runtime::cpu::IsTracingEnabled())
{
GenerateTimeline(m_external_function->get_op_attrs(), ctx->op_durations);
}
}
void runtime::cpu::CPU_CallFrame::call(
......@@ -116,3 +128,20 @@ vector<runtime::PerformanceCounter> runtime::cpu::CPU_CallFrame::get_performance
}
return rc;
}
void runtime::cpu::CPU_CallFrame::setup_runtime_context()
{
ctx = new CPURuntimeContext;
ctx->op_durations = nullptr;
if (runtime::cpu::IsTracingEnabled())
{
ctx->op_durations = new int64_t[m_external_function->get_op_attrs().size()];
}
}
void runtime::cpu::CPU_CallFrame::cleanup_runtime_context()
{
delete[] ctx->op_durations;
delete ctx;
}
......@@ -23,6 +23,7 @@
#include "ngraph/function.hpp"
#include "ngraph/runtime/call_frame.hpp"
#include "ngraph/runtime/cpu/cpu_layout_descriptor.hpp"
#include "ngraph/runtime/cpu/cpu_runtime_context.hpp"
#include "ngraph/runtime/tensor_view.hpp"
namespace ngraph
......@@ -36,7 +37,7 @@ namespace ngraph
class CPU_CallFrame;
class CPU_ExternalFunction;
using EntryPoint_t = void(void** inputs, void** outputs);
using EntryPoint_t = void(void** inputs, void** outputs, CPURuntimeContext* ctx);
using EntryPoint = std::function<EntryPoint_t>;
......@@ -46,6 +47,7 @@ namespace ngraph
public:
CPU_CallFrame(std::shared_ptr<CPU_ExternalFunction> external_function,
EntryPoint compiled_function);
~CPU_CallFrame();
/// @brief Invoke the function with values matching the signature of the function.
///
......@@ -65,9 +67,13 @@ namespace ngraph
std::vector<ngraph::runtime::PerformanceCounter>
get_performance_data() const override;
void setup_runtime_context();
void cleanup_runtime_context();
protected:
std::shared_ptr<CPU_ExternalFunction> m_external_function;
EntryPoint m_compiled_function;
CPURuntimeContext* ctx;
};
}
}
......
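The signature change above means every generated entry point now receives a `CPURuntimeContext*` in addition to its input and output pointer arrays. A minimal sketch of the calling convention, using a hypothetical `demo_compiled_function` in place of real generated code:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Mirrors the runtime-context plumbing in this change: generated functions
// take a context pointer so they can record per-op timings.
struct CPURuntimeContext
{
    int64_t* op_durations; // one slot per emitted op; nullptr when tracing is off
};

using EntryPoint_t = void(void** inputs, void** outputs, CPURuntimeContext* ctx);
using EntryPoint = std::function<EntryPoint_t>;

// Hypothetical stand-in for a generated function: adds two floats and, when
// tracing is on, records a fake duration for "op 0".
inline void demo_compiled_function(void** inputs, void** outputs, CPURuntimeContext* ctx)
{
    float a = *static_cast<float*>(inputs[0]);
    float b = *static_cast<float*>(inputs[1]);
    *static_cast<float*>(outputs[0]) = a + b;
    if (ctx && ctx->op_durations)
    {
        ctx->op_durations[0] = 7; // e.g. microseconds, per the Timescale alias
    }
}

// Invokes the "compiled" function the way CPU_CallFrame::tensor_call does:
// raw pointer arrays plus the runtime context.
inline float invoke_demo(float a, float b, int64_t* durations)
{
    CPURuntimeContext ctx{durations};
    float result = 0;
    void* ins[] = {&a, &b};
    void* outs[] = {&result};
    EntryPoint fn = demo_compiled_function;
    fn(ins, outs, &ctx);
    return result;
}
```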
......@@ -1014,7 +1014,7 @@ void runtime::cpu::CPU_Emitter::EmitFunctionCall(
writer << "\n};\n";
writer << "\n";
writer << function->get_name() << "(args, out);\n";
writer << function->get_name() << "(args, out, ctx);\n";
}
writer.indent--;
writer << "}\n";
......@@ -1093,13 +1093,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer << "{ // " << n->get_name() << " 3\n";
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1129,13 +1129,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer << "{ // " << n->get_name() << " 5\n";
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1161,13 +1161,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer << "{ // " << n->get_name() << " 7\n";
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1183,13 +1183,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1211,13 +1211,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -2194,13 +2194,13 @@ void runtime::cpu::CPU_Emitter::EmitReduceWindow(
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -2238,24 +2238,24 @@ void runtime::cpu::CPU_Emitter::EmitSelectAndScatter(
string type = n->get_output_element_type(0).c_type_string();
writer << "auto f_select = [](" << type << " x, " << type << " y) -> char\n{";
writer << "auto f_select = [&](" << type << " x, " << type << " y) -> char\n{";
writer.indent++;
writer << "\n";
writer << "char result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << selection_function->get_name() << "(args, out);\n";
writer << selection_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
writer << "auto f_scatter = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f_scatter = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << scatter_function->get_name() << "(args, out);\n";
writer << scatter_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......
......@@ -94,6 +94,7 @@
#include "ngraph/runtime/cpu/cpu_emitter.hpp"
#include "ngraph/runtime/cpu/cpu_external_function.hpp"
#include "ngraph/runtime/cpu/cpu_tensor_view.hpp"
#include "ngraph/runtime/cpu/cpu_tracing.hpp"
#include "ngraph/runtime/cpu/ops/matmul_bias.hpp"
#include "ngraph/runtime/cpu/pass/cpu_fusion.hpp"
#include "ngraph/runtime/cpu/pass/cpu_layout.hpp"
......@@ -265,6 +266,7 @@ void runtime::cpu::CPU_ExternalFunction::compile()
#include "ngraph/runtime/aligned_buffer.hpp"
#include "ngraph/runtime/cpu/cpu_eigen_utils.hpp"
#include "ngraph/runtime/cpu/cpu_kernels.hpp"
#include "ngraph/runtime/cpu/cpu_runtime_context.hpp"
#include "ngraph/runtime/kernel/avg_pool.hpp"
#include "ngraph/runtime/kernel/broadcast.hpp"
#include "ngraph/runtime/kernel/concat.hpp"
......@@ -402,7 +404,8 @@ using namespace ngraph::runtime;
writer << "// Declare all functions\n";
for (shared_ptr<Function> f : pass_manager.get_state().get_functions())
{
writer << "extern \"C\" void " << f->get_name() << "(void** inputs, void** outputs);\n";
writer << "extern \"C\" void " << f->get_name()
<< "(void** inputs, void** outputs, cpu::CPURuntimeContext* ctx);\n";
}
writer << "\n";
......@@ -481,7 +484,7 @@ using namespace ngraph::runtime;
}
writer << "extern \"C\" void " << current_function->get_name();
writer << "(void** inputs, void** outputs)\n";
writer << "(void** inputs, void** outputs, cpu::CPURuntimeContext* ctx)\n";
writer << "{\n";
writer.indent++;
......@@ -491,6 +494,13 @@ using namespace ngraph::runtime;
writer << "tbb::flow::graph G;\n\n";
}
// Execution tracing support
if (runtime::cpu::IsTracingEnabled() && current_function->get_name() == function_name)
{
writer << "cpu::Timestamp start_ts;\n"
<< "int profiler_count = 0;\n\n";
}
bool temporaries_used = false;
size_t worst_case_tmp_size = 0;
for (shared_ptr<Node> node : current_function->get_ordered_ops())
......@@ -614,12 +624,14 @@ using namespace ngraph::runtime;
throw ngraph_error("Unhandled op during code generation : " + node->description());
}
vector<TensorViewWrapper> in;
vector<string> node_input_names, node_output_names;
for (const descriptor::Input& input : node->get_inputs())
{
const descriptor::Output& output = input.get_output();
shared_ptr<descriptor::TensorView> tv = output.get_tensor_view();
in.push_back(
TensorViewWrapper(tv, m_variable_name_map[tv->get_tensor().get_name()]));
node_input_names.emplace_back(tv->get_tensor().get_name());
}
vector<TensorViewWrapper> out;
for (const descriptor::Output& output : node->get_outputs())
......@@ -627,11 +639,17 @@ using namespace ngraph::runtime;
shared_ptr<descriptor::TensorView> tv = output.get_tensor_view();
out.push_back(
TensorViewWrapper(tv, m_variable_name_map[tv->get_tensor().get_name()]));
node_output_names.emplace_back(tv->get_tensor().get_name());
}
// Emit operation prologue
if (!node->is_parameter() && !node->is_constant())
{
if (current_function->get_name() == function_name)
{
m_op_attrs.emplace_back(
node->description(), node_output_names, node_input_names);
}
if (m_use_tbb)
{
writer << "tbb::flow::continue_node<tbb::flow::continue_msg> "
......@@ -644,6 +662,11 @@ using namespace ngraph::runtime;
{
emit_debug_function_entry(writer, node.get(), in, out);
}
if (runtime::cpu::IsTracingEnabled() &&
current_function->get_name() == function_name)
{
writer << "start_ts = cpu::Clock::now();\n";
}
}
// Emit operation body
......@@ -668,7 +691,7 @@ using namespace ngraph::runtime;
{
names.push_back(tv.get_name());
}
writer << func_name << "(" << join(names) << ");\n";
writer << func_name << "(" << join(names) << ", ctx);\n";
}
// Emit operation epilogue
......@@ -679,6 +702,13 @@ using namespace ngraph::runtime;
{
emit_debug_function_exit(writer, node.get(), in, out);
}
if (runtime::cpu::IsTracingEnabled() &&
current_function->get_name() == function_name)
{
writer << "ctx->op_durations[profiler_count++] = "
<< "(std::chrono::duration_cast<cpu::Timescale>(cpu::Clock::now() - "
"start_ts)).count();\n";
}
if (m_use_tbb)
{
writer.indent--;
......@@ -908,6 +938,7 @@ string runtime::cpu::CPU_ExternalFunction::emit_op_as_function(const Node& node,
writer << tvw.get_type() << "* " << tvw.get_name();
out.push_back(tvw);
}
writer << ",\ncpu::CPURuntimeContext* ctx";
writer.indent--;
writer << "\n)\n";
writer << "{\n";
......
......@@ -18,9 +18,11 @@
#include <functional>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>
#include "ngraph/codegen/code_writer.hpp"
#include "ngraph/codegen/compiler.hpp"
......@@ -48,6 +50,21 @@ namespace ngraph
using OpMap = std::unordered_map<std::type_index, OpFunction>;
struct OpAttributes
{
std::string Description;
std::vector<std::string> Outputs;
std::vector<std::string> Inputs;
OpAttributes(const std::string& desc,
const std::vector<std::string>& outputs,
const std::vector<std::string>& inputs)
: Description(desc)
, Outputs(outputs)
, Inputs(inputs)
{
}
};
class CPU_ExternalFunction : public ngraph::runtime::ExternalFunction,
public std::enable_shared_from_this<CPU_ExternalFunction>
{
......@@ -61,6 +78,7 @@ namespace ngraph
const LayoutDescriptorPtrs& get_parameter_layout_descriptors();
const LayoutDescriptorPtrs& get_result_layout_descriptors();
const std::vector<OpAttributes>& get_op_attrs() const { return m_op_attrs; }
protected:
void compile();
......@@ -95,6 +113,7 @@ namespace ngraph
LayoutDescriptorPtrs parameter_layout_descriptors;
LayoutDescriptorPtrs result_layout_descriptors;
std::vector<OpAttributes> m_op_attrs;
};
}
}
......
// ----------------------------------------------------------------------------
// Copyright 2018 Nervana Systems Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// ----------------------------------------------------------------------------
#pragma once
#include <chrono>
#include <cstdint>
namespace ngraph
{
namespace runtime
{
namespace cpu
{
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::time_point<Clock> Timestamp;
typedef std::chrono::microseconds Timescale;
extern "C" {
struct CPURuntimeContext
{
int64_t* op_durations;
};
}
}
}
}
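The `Clock`/`Timestamp`/`Timescale` aliases above are what the emitted profiling code uses: it takes a timestamp before each op body and stores the elapsed microseconds into that op's `ctx->op_durations` slot. A sketch of that pattern (the `demo` namespace and `time_op` helper are illustrative, not part of ngraph):

```cpp
#include <chrono>
#include <cstdint>

namespace demo
{
    // Same aliases as in cpu_runtime_context.hpp
    typedef std::chrono::high_resolution_clock Clock;
    typedef std::chrono::time_point<Clock> Timestamp;
    typedef std::chrono::microseconds Timescale;

    // Times a single op body the way the generated code does:
    //   start_ts = Clock::now();  <op body>;  duration_cast<Timescale>(...)
    inline int64_t time_op(void (*op)())
    {
        Timestamp start_ts = Clock::now();
        op(); // the emitted op body runs here
        return std::chrono::duration_cast<Timescale>(Clock::now() - start_ts).count();
    }
}
```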
// ----------------------------------------------------------------------------
// Copyright 2018 Nervana Systems Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// ----------------------------------------------------------------------------
#include <fstream>
#include <map>
#include "cpu_tracing.hpp"
void ngraph::runtime::cpu::to_json(nlohmann::json& json, const TraceEvent& event)
{
std::map<std::string, std::string> args;
for (size_t i = 0; i < event.Inputs.size(); i++)
{
args["Input" + std::to_string(i + 1)] = event.Inputs[i];
}
for (size_t i = 0; i < event.Outputs.size(); i++)
{
args["Output" + std::to_string(i + 1)] = event.Outputs[i];
}
json = nlohmann::json{{"ph", event.Phase},
{"cat", event.Category},
{"name", event.Name},
{"pid", event.PID},
{"tid", event.TID},
{"ts", event.Timestamp},
{"dur", event.Duration},
{"args", args}};
}
void ngraph::runtime::cpu::GenerateTimeline(const std::vector<OpAttributes>& op_attrs,
int64_t* op_durations)
{
nlohmann::json timeline;
std::list<TraceEvent> trace;
std::ofstream out("timeline.json");
int64_t ts = 0;
for (size_t i = 0; i < op_attrs.size(); i++)
{
trace.emplace_back("X",
"Op",
op_attrs[i].Description,
0,
0,
ts,
op_durations[i],
op_attrs[i].Outputs,
op_attrs[i].Inputs);
ts += op_durations[i];
}
timeline["traceEvents"] = trace;
out << timeline;
out.close();
return;
}
bool ngraph::runtime::cpu::IsTracingEnabled()
{
return (std::getenv("NGRAPH_CPU_TRACING") != nullptr);
}
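`GenerateTimeline` lays the ops end-to-end on a single timeline: each event's `ts` field is the running sum of all preceding durations. A small sketch of that accumulation (the `event_start_times` helper is illustrative only):

```cpp
#include <cstdint>
#include <vector>

// Computes the per-event start timestamps the same way GenerateTimeline's
// loop does: ts starts at 0 and advances by each op's duration in turn.
inline std::vector<int64_t> event_start_times(const std::vector<int64_t>& op_durations)
{
    std::vector<int64_t> starts;
    int64_t ts = 0;
    for (int64_t dur : op_durations)
    {
        starts.push_back(ts);
        ts += dur;
    }
    return starts;
}
```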
// ----------------------------------------------------------------------------
// Copyright 2018 Nervana Systems Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// ----------------------------------------------------------------------------
#pragma once
#include <cstdint>
#include <list>
#include <string>
#include <vector>
#include "ngraph/json.hpp"
#include "ngraph/runtime/cpu/cpu_external_function.hpp"
namespace ngraph
{
namespace runtime
{
namespace cpu
{
struct TraceEvent
{
// This should be a single character, but the nlohmann::json
// encoder does not serialize plain char fields as expected,
// so it is stored as a string instead.
std::string Phase;
std::string Category;
const std::string& Name;
unsigned int PID;
unsigned int TID;
int64_t Timestamp;
int64_t Duration;
const std::vector<std::string>& Outputs;
const std::vector<std::string>& Inputs;
TraceEvent(const std::string& ph,
const std::string& cat,
const std::string& name,
unsigned int pid,
unsigned int tid,
int64_t ts,
int64_t dur,
const std::vector<std::string>& outputs,
const std::vector<std::string>& inputs)
: Phase(ph)
, Category(cat)
, Name(name)
, PID(pid)
, TID(tid)
, Timestamp(ts)
, Duration(dur)
, Outputs(outputs)
, Inputs(inputs)
{
}
};
void to_json(nlohmann::json& json, const TraceEvent& event);
void GenerateTimeline(const std::vector<OpAttributes>& op_attrs, int64_t* op_durations);
bool IsTracingEnabled();
}
}
}
......@@ -25,7 +25,7 @@ if (NGRAPH_CPU_ENABLE AND NOT APPLE)
add_executable(resource_generator ${SRC})
add_dependencies(resource_generator ext_llvm eigen ext_mkldnn)
set(HEADER_PATHS
set(HEADER_SEARCH_DEFINES
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\""
"MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\""
"CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\""
......@@ -33,16 +33,11 @@ if (NGRAPH_CPU_ENABLE AND NOT APPLE)
)
if(NGRAPH_TBB_ENABLE)
list(APPEND HEADER_PATHS "TBB_HEADERS_PATH=\"${TBB_ROOT}/include\"")
list(APPEND HEADER_SEARCH_DEFINES "TBB_HEADERS_PATH=\"${TBB_ROOT}/include\"")
set(HEADER_SEARCH_DEFINES ${HEADER_SEARCH_DEFINES} "NGRAPH_TBB_ENABLE")
endif()
if(NGRAPH_TBB_ENABLE)
set(NGRAPH_TBB_OPTION "NGRAPH_TBB_ENABLE")
else()
set(NGRAPH_TBB_OPTION "")
endif()
message("HEADER_PATHS ${HEADER_PATHS}")
message("HEADER_SEARCH_DEFINES ${HEADER_SEARCH_DEFINES}")
set_source_files_properties(main.cpp PROPERTIES COMPILE_DEFINITIONS "${HEADER_PATHS};${NGRAPH_TBB_OPTION}")
set_source_files_properties(main.cpp PROPERTIES COMPILE_DEFINITIONS "${HEADER_SEARCH_DEFINES}")
endif()