Commit 6bf066d7 authored by Jaikrishnan Menon

Merge branch 'master' into cpu_layout2

parents 18f7c5fb fd2bf807
......@@ -52,6 +52,9 @@ doc/source/generated
.cache/
nervana_aeon.egg-info/
# emacs
*~
# vim
*.swp
*.swo
......
......@@ -17,4 +17,4 @@ See our [install] docs for how to get started.
For this early release, we provide framework integration guides to compile
MXNet and TensorFlow-based projects.
[install]: doc/sphinx/source/installation.rst
[install]: http://ngraph.nervanasys.com/docs/cpp/installation.html
\ No newline at end of file
......@@ -2220,7 +2220,7 @@ div[class^='highlight'] pre {
background-color: #272525;
display: block;
text-align: right;
font-size: 90%;
font-size: 95%;
cursor: pointer;
color: #27AE60;
*zoom: 1;
......@@ -2488,8 +2488,8 @@ div[class^='highlight'] pre {
line-height: 1.0em;
}
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
font-size: 101% !important;
color: #72a1ab;
font-size: 100% !important;
color: #528481;
line-height: 0.91em;
}
.rst-content tt.xref, a .rst-content tt, .rst-content tt.xref, .rst-content code.xref, a .rst-content tt, a .rst-content code {
......
......@@ -2489,7 +2489,7 @@ div[class^='highlight'] pre {
}
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
font-size: 101% !important;
color: #72a1ab;
color: #528481;
line-height: 0.91em;
}
.rst-content tt.xref, a .rst-content tt, .rst-content tt.xref, .rst-content code.xref, a .rst-content tt, a .rst-content code {
......
.. api.rst:
API
###
.. Don't add Python APIs that will break the build.
Sections
========
.. autodiff.rst
Autodiff
########
The ``autodiff`` ...
.. TODO update for cpp
:orphan:
.. glossary:
Glossary
......
......@@ -3,8 +3,22 @@
Graph Basics
============
This section describes the basic concepts you need to know when constructing
a graph.
This section describes the basic concepts you need to know when
constructing a graph.
Framework Bridges
------------------
Frontends (or users who require the flexibility of constructing
Ops directly) can utilize a set of graph construction functions
to construct graphs.
A framework bridge constructs a function which is compiled/optimized
by a sequence of graph transformations that replace subgraphs of the
computation with more optimal subgraphs. Throughout this process, ops
represent tensor operations.
Tensors
-------
......@@ -150,38 +164,3 @@ After the graph is constructed, we create the function, passing the
`Function` constructor the nodes that are results and the parameters
that are arguments.
Defining ops
============
A framework bridge constructs a function which is compiled/optimized
by a sequence of graph transformations that replace subgraphs of the
computation with more optimal subgraphs. Throughout this process, ops
represent tensor operations.
*Core ops* are ops that are available and generally useful to all
framework bridges and that can be compiled by all transformers. A
framework bridge may define framework-specific ops to simplify graph
construction, provided that the bridge can enable every transformer to
replace all such ops with equivalent subgraphs composed of core
ops. Similarly, transformers may define transformer-specific ops to
represent kernels or other intermediate operations. If a framework
supports extending the set of ops it offers, a bridge may even expose
transformer-specific ops to the framework user.
It is easiest to define a new op by adapting an existing op. Some of
the tasks that must be performed are:
- Op constructor:
* Checking type-consistency of arguments
* Specifying the result type for a call
- Serializer/Deserializer
- Transformer handlers:
* Interpreter (reference) implementation of behavior. The
implementation should favor clarity over efficiency.
......@@ -22,6 +22,9 @@ of :abbr:`Deep Learning (DL)` (DL) systems. Here you will find a suite of
components, APIs, and documentation that can be used to compile and run
:abbr:`Deep Neural Network (DNN)` (DNN) models defined in a variety of frameworks.
.. figure:: graphics/ngraph-hub.png
For this early release, we provide :doc:`framework-integration-guides` to compile
and run MXNet and TensorFlow-based projects.
......@@ -32,54 +35,26 @@ Architecture CPUs (CPU), the Intel® Nervana Neural Network Processor™ (NNP),
and NVIDIA\* GPUs. Currently-supported compiler optimizations include efficient
memory management and data layout abstraction.
Further overview details can be found on our :doc:`about` page.
Further project details can be found on our :doc:`project/about` page.
=======
Sections
=========
.. toctree::
:maxdepth: 1
:caption: Table Of Contents
:name: tocmaster
:caption: Table of Contents
installation.rst
testing-libngraph.rst
framework-integration-guides.rst
graph-basics.rst
.. toctree::
:maxdepth: 1
:caption: Algorithms
:name:
.. toctree::
:maxdepth: 1
:caption: Reference API
api.rst
autodiff.rst
glossary.rst
.. toctree::
:maxdepth: 1
:caption: Ops
ops/abs.rst
ops/convolution.rst
.. toctree::
:maxdepth: 1
:caption: Project Docs
about.rst
release-notes.rst
code-contributor-README.rst
.. toctree::
:maxdepth: 0
:hidden:
branding-notice.rst
doc-contributor-README.rst
ops/index.rst
project/index.rst
Indices and tables
......@@ -88,3 +63,4 @@ Indices and tables
* :ref:`search`
* :ref:`genindex`
......@@ -4,23 +4,32 @@
Abs
###
Description
===========
Elementwise absolute value operation.
Produces a single output tensor of the same element type and shape as the input,
where the value at each coordinate of the output is the absolute value of the
value at each input coordinate.
Produces a single output tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the absolute value of the
value at each ``arg`` coordinate.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Input Name | Element Type | Shape |
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``input`` | Any | Any |
| ``arg`` | Any | Any |
+-----------------+-------------------------+--------------------------------+
+------------------+-------------------------+----------------------------------------------------+
| Output Name | Element Type | Shape |
+==================+=========================+====================================================+
| ``output`` | Same as ``input`` | Same as input. |
+------------------+-------------------------+----------------------------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``output`` | Same as ``arg`` | Same as ``arg``. |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
......@@ -28,14 +37,15 @@ Mathematical Definition
.. math::
output_{i_0, \ldots, i_{n-1}} = \mathrm{abs}(input_{i_0, \ldots, i_{n-1}})
\mathtt{output}_{i_0, \ldots, i_{n-1}} = \left|\mathtt{arg}_{i_0,
\ldots, i_{n-1}}\right|
Backprop
========
.. math::
\overline{input} \leftarrow \mathrm{sgn}(input)\Delta
\overline{\texttt{arg}} \leftarrow \Delta\ \mathrm{sgn}(\texttt{arg})
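The forward and backprop definitions above can be illustrated with a minimal sketch over a flattened tensor. This is not the ``ngraph::op::Abs`` implementation; the helper names ``abs_forward``, ``sgn``, and ``abs_backprop`` are hypothetical, and plain ``std::vector<float>`` stands in for a tensor.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward pass: output[i] = |arg[i]| for every coordinate of a flattened tensor.
std::vector<float> abs_forward(const std::vector<float>& arg)
{
    std::vector<float> output(arg.size());
    for (std::size_t i = 0; i < arg.size(); ++i)
    {
        output[i] = std::fabs(arg[i]);
    }
    return output;
}

// sgn(x) is -1, 0, or +1 (hypothetical helper, not an nGraph function).
float sgn(float x)
{
    return static_cast<float>((x > 0.0f) - (x < 0.0f));
}

// Backward pass: arg_bar[i] = delta[i] * sgn(arg[i]).
std::vector<float> abs_backprop(const std::vector<float>& arg,
                                const std::vector<float>& delta)
{
    assert(arg.size() == delta.size());
    std::vector<float> arg_bar(arg.size());
    for (std::size_t i = 0; i < arg.size(); ++i)
    {
        arg_bar[i] = delta[i] * sgn(arg[i]);
    }
    return arg_bar;
}
```

With this choice of ``sgn``, the adjoint at ``arg = 0`` is zero, which is one common convention for the non-differentiable point.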
C++ Interface
......@@ -43,8 +53,3 @@ C++ Interface
.. doxygenclass:: ngraph::op::Abs
:members:
Python Interface
================
is not merged yet, but could go here!
.. acos.rst:
####
Acos
####
Description
===========
Elementwise acos operation.
Produces a tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the inverse cosine of the
value at the corresponding coordinate of ``arg``.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``arg`` | Any | Any |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``output`` | Same as ``arg`` | Same as ``arg``. |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \cos^{-1}(\texttt{arg}_{i_0, \ldots, i_{n-1}})
Backprop
========
.. math::
\overline{\texttt{arg}} \leftarrow -\frac{\Delta}{\sqrt{1-\texttt{arg}^2}}
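As a sanity check on the backprop rule above, the following sketch applies it elementwise and compares it against a central-difference estimate of the derivative of ``acos``. The function names are hypothetical and the code is independent of the nGraph API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Backward pass: arg_bar[i] = -delta[i] / sqrt(1 - arg[i]^2).
std::vector<double> acos_backprop(const std::vector<double>& arg,
                                  const std::vector<double>& delta)
{
    assert(arg.size() == delta.size());
    std::vector<double> arg_bar(arg.size());
    for (std::size_t i = 0; i < arg.size(); ++i)
    {
        arg_bar[i] = -delta[i] / std::sqrt(1.0 - arg[i] * arg[i]);
    }
    return arg_bar;
}

// Central-difference estimate of d/dx acos(x); used only for checking.
double acos_numeric_derivative(double x, double h = 1e-6)
{
    return (std::acos(x + h) - std::acos(x - h)) / (2.0 * h);
}
```

The rule is only finite for ``arg`` strictly between -1 and 1; at the endpoints the denominator is zero.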
C++ Interface
=============
.. doxygenclass:: ngraph::op::Acos
:members:
.. add.rst:
###
Add
###
Description
===========
Elementwise add operation.
Produces a tensor of the same element type and shape as the two inputs,
where the value at each coordinate of ``output`` is the sum of the
values at the corresponding input coordinates.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``arg0`` | any | any |
+-----------------+-------------------------+--------------------------------+
| ``arg1`` | same as ``arg0`` | same as ``arg0`` |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``output`` | same as ``arg0`` | same as ``arg0`` |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \texttt{arg0}_{i_0, \ldots, i_{n-1}} + \texttt{arg1}_{i_0, \ldots, i_{n-1}}
Backprop
========
.. math::
\overline{\texttt{arg0}} &\leftarrow \Delta \\
\overline{\texttt{arg1}} &\leftarrow \Delta
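A minimal sketch of the two definitions above, using flattened ``std::vector<float>`` buffers in place of tensors. The names ``add_forward`` and ``add_backprop`` are hypothetical; the point is only that each input receives the incoming adjoint unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Forward pass: output[i] = arg0[i] + arg1[i].
std::vector<float> add_forward(const std::vector<float>& arg0,
                               const std::vector<float>& arg1)
{
    assert(arg0.size() == arg1.size());
    std::vector<float> output(arg0.size());
    for (std::size_t i = 0; i < arg0.size(); ++i)
    {
        output[i] = arg0[i] + arg1[i];
    }
    return output;
}

// Backward pass: both arg0_bar and arg1_bar are copies of delta.
std::pair<std::vector<float>, std::vector<float>>
    add_backprop(const std::vector<float>& delta)
{
    return {delta, delta};
}
```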
C++ Interface
=============
.. doxygenclass:: ngraph::op::Add
:members:
.. asin.rst:
####
Asin
####
Description
===========
Elementwise asin operation.
Produces a tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the inverse sine of the
value at the corresponding coordinate of ``arg``.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``arg`` | Any | Any |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``output`` | Same as ``arg`` | Same as ``arg``. |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \sin^{-1}(\texttt{arg}_{i_0, \ldots, i_{n-1}})
Backprop
========
.. math::
\overline{\texttt{arg}} \leftarrow \frac{\Delta}{\sqrt{1-\texttt{arg}^2}}
C++ Interface
=============
.. doxygenclass:: ngraph::op::Asin
:members:
.. atan.rst:
####
Atan
####
Description
===========
Elementwise atan operation.
Produces a tensor of the same element type and shape as ``arg``,
where the value at each coordinate of ``output`` is the inverse tangent of the
value at the corresponding coordinate of ``arg``.
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``arg`` | Any | Any |
+-----------------+-------------------------+--------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``output`` | Same as ``arg`` | Same as ``arg``. |
+-----------------+-------------------------+--------------------------------+
Mathematical Definition
=======================
.. math::
\texttt{output}_{i_0, \ldots, i_{n-1}} = \tan^{-1}(\texttt{arg}_{i_0, \ldots, i_{n-1}})
Backprop
========
.. math::
\overline{\texttt{arg}} \leftarrow \frac{\Delta}{1+\texttt{arg}^2}
C++ Interface
=============
.. doxygenclass:: ngraph::op::Atan
:members:
.. avg_pool.rst:
#######
AvgPool
#######
Description
===========
Average Pooling operation.
Average pooling windows its input and produces an average for each window.
Inputs
------
+-----------------+----------------+--------------------------------+--------------------+
| Name | Element Type | Shape | Notes |
+=================+================+================================+====================+
| ``data`` | Any | :math:`(N,C,d_1,\ldots,d_n)` | :math:`n>0, d_i>0` |
+-----------------+----------------+--------------------------------+--------------------+
Attributes
----------
+----------------------+-----------------+----------------------------------+
| Name | Type | Notes |
+======================+=================+==================================+
| ``w`` | ``Shape[n]`` | Window shape. :math:`w_i\le d_i` |
+----------------------+-----------------+----------------------------------+
| ``s`` | ``Strides[n]`` | Window strides. |
+----------------------+-----------------+----------------------------------+
| ``p`` | ``Shape[n]`` | Padding below. |
+----------------------+-----------------+----------------------------------+
| ``q`` | ``Shape[n]`` | Padding above. |
+----------------------+-----------------+----------------------------------+
Outputs
-------
+-----------------+-------------------------+--------------------------------+
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``output`` | Any | :math:`(N,C,d'_1,\ldots,d'_n)` |
+-----------------+-------------------------+--------------------------------+
Average pooling takes as its input a batch tensor `data` of shape
:math:`(N,C,d_1,\ldots,d_n)` where :math:`N` is the batch
size, and :math:`C > 0` is the
number of channels (sometimes called features). The dimensions
:math:`(d_1,\ldots,d_n)` correspond to the shape of an
:math:`n`-dimensional data item in a batch. For example, where
:math:`n=2`, the data may represent a two-dimensional image. It also
takes four attributes:
1. *window shape*,
2. *window movement strides*, (optional)
3. *padding below*, (optional)
4. *padding above*, (optional).
The shape of `output` is :math:`(N,C,d'_1,\ldots,d'_n)`, where
:math:`d'_i = \lceil \frac{p_i + d_i + q_i - w_i + 1}{s_i} \rceil`.
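The output-shape formula can be evaluated per spatial axis with integer arithmetic. The sketch below is an illustration only; ``avg_pool_output_dims`` is a hypothetical helper, not part of ``ngraph::op::AvgPool``, and it takes the spatial dimensions ``d`` together with the attributes ``w``, ``s``, ``p``, and ``q`` as plain vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes ceil((p_i + d_i + q_i - w_i + 1) / s_i) for each spatial axis i.
std::vector<std::size_t> avg_pool_output_dims(const std::vector<std::size_t>& d,
                                              const std::vector<std::size_t>& w,
                                              const std::vector<std::size_t>& s,
                                              const std::vector<std::size_t>& p,
                                              const std::vector<std::size_t>& q)
{
    const std::size_t n = d.size();
    assert(w.size() == n && s.size() == n && p.size() == n && q.size() == n);
    std::vector<std::size_t> out(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        const std::size_t padded = p[i] + d[i] + q[i];
        assert(s[i] > 0 && w[i] > 0 && w[i] <= padded);
        const std::size_t numer = padded - w[i] + 1;
        out[i] = (numer + s[i] - 1) / s[i]; // integer ceiling division
    }
    return out;
}
```

For example, an unpadded axis with :math:`d_i = 7`, :math:`w_i = 3`, and :math:`s_i = 2` gives :math:`\lceil 5/2 \rceil = 3`, matching windows starting at coordinates 0, 2, and 4.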
*In the absence of padding*, given an input data batch tensor
:math:`T_\textit{in}`, the output tensor is defined by the equation
.. math::
T_\textit{out}[a,c,i_1,\ldots,i_n] =
\frac{\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
T_\textit{in}[a,c,j_1,\ldots,j_n]}{\prod_{i=1}^n{w_i}}
*In the presence of padding*, we do not always want to divide by the
number of elements in the window, since some
of the output points are determined by a window that is partly hanging
beyond the edge of the tensor. In this case we can define the output
via a few intermediate steps.
First define the *sum tensor* :math:`T_\textit{sum}`, with shape
:math:`(N,C,d'_1,\ldots,d'_n)`, as follows.
.. math::
T_\textit{sum}[a,c,i_1,\ldots,i_n] =
\frac{\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
\textit{val}[a,c,j_1,\ldots,j_n]}{\prod_{i=1}^n{w_i}}
where
.. math::
\textit{val}[a,c,j_1,\ldots,j_n] =
\begin{cases}
T_\textit{in}[a,c,j_1,\ldots,j_n]&\text{if for all } k, p_k \le j_k < p_k + d_k\\
0&\text{otherwise}.
\end{cases}
Second, define the *divisor tensor* :math:`T_\textit{div}`, with shape :math:`(N,C,d'_1,\ldots,d'_n)`, as follows.
.. math::
T_\textit{div}[a,c,i_1,\ldots,i_n] =
\frac{\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
\textit{val}[a,c,j_1,\ldots,j_n]}{\prod_{i=1}^n{w_i}}
where
.. math::
\textit{val}[a,c,j_1,\ldots,j_n] =
\begin{cases}
1&\text{if for all }k, p_k \le j_k < p_k + d_k\\
0&\text{otherwise}.
\end{cases}
Finally, define :math:`T_\textit{out}` as the result of elementwise
dividing :math:`T_\textit{sum}` by :math:`T_\textit{div}`. Note that
at positions where :math:`T_\textit{div}` is zero, values may be
infinity or nan. (This corresponds to a condition where the pooling
window is completely out of bounds, encompassing no valid values.)
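The padded definition can be sketched for the one-dimensional, single-channel case. This is an illustration under stated assumptions, not the nGraph kernel: ``avg_pool_1d_padded`` is a hypothetical name, window coordinates ``j`` are taken in the padded coordinate system and the input is read at ``j - p``, and the common factor :math:`1/\prod w_i` shared by :math:`T_\textit{sum}` and :math:`T_\textit{div}` is omitted because it cancels in the quotient.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-D average pooling with padding below (p) and above (q).
// Each output is (sum of in-bounds values) / (count of in-bounds positions).
std::vector<float> avg_pool_1d_padded(const std::vector<float>& in,
                                      std::size_t w,
                                      std::size_t s,
                                      std::size_t p,
                                      std::size_t q)
{
    assert(w > 0 && s > 0);
    const std::size_t d = in.size();
    const std::size_t padded = p + d + q;
    assert(w <= padded);
    const std::size_t out_len = (padded - w + 1 + s - 1) / s; // ceiling division
    std::vector<float> out(out_len);
    for (std::size_t i = 0; i < out_len; ++i)
    {
        float sum = 0.0f;   // unscaled sum tensor entry
        float count = 0.0f; // unscaled divisor tensor entry
        for (std::size_t j = s * i; j < s * i + w; ++j)
        {
            if (j >= p && j < p + d) // position lies inside the real data
            {
                sum += in[j - p];
                count += 1.0f;
            }
        }
        out[i] = sum / count; // NaN when the window holds no valid values
    }
    return out;
}
```

For the input ``{1, 2, 3, 4}`` with ``w = 2``, ``s = 2``, and one padding element on each side, the first and last windows each cover a single valid value, so the edge outputs are not diluted by the padding.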
Backprop
========
C++ Interface
=============
.. doxygenclass:: ngraph::op::AvgPool
:members:
.. avg_pool_backprop.rst:
###############
AvgPoolBackprop
###############
Average Pooling backprop operation.
C++ Interface
=============
.. doxygenclass:: ngraph::op::AvgPoolBackprop
:members:
Python Interface
================
is not merged yet, but could go here!
......@@ -4,21 +4,50 @@
Convolution
###########
Description
===========
A batched convolution operation.
Basic Operation
===============
Inputs
------
+-----------------+-------------------------+--------------------------------+
| Input Name | Element Type | Shape |
| Name | Element Type | Shape |
+=================+=========================+================================+
| ``image_batch`` | Any | ``(N, C_in, d_1, ..., d_n)`` |
+-----------------+-------------------------+--------------------------------+
| ``filters`` | Same as ``image_batch`` | ``(N, C_in, df_1, ..., df_n)`` |
+-----------------+-------------------------+--------------------------------+
Attributes
----------
+-----------------------------+-----------------------------+---------------------------------------+
| Name | Type | Notes |
+=============================+=============================+=======================================+
| ``window_movement_strides`` | ``Strides[n]`` | How far to slide the window along |
| | | each axis at each step. |
+-----------------------------+-----------------------------+---------------------------------------+
| ``window_dilation_strides`` | ``Strides[n]`` | Per-axis dilation to apply to the |
| | | filters. |
+-----------------------------+-----------------------------+---------------------------------------+
| ``padding_below`` | ``Shape[n]`` | How many padding elements to add |
| | | below the 0-coordinate on each axis. |
+-----------------------------+-----------------------------+---------------------------------------+
| ``padding_above`` | ``Shape[n]`` | How many padding elements to add |
| | | above the max-coordinate on each axis.|
+-----------------------------+-----------------------------+---------------------------------------+
| ``image_dilation_strides`` | ``Strides[n]`` | Per-axis dilation to apply to the |
| | | image batch. |
+-----------------------------+-----------------------------+---------------------------------------+
Outputs
-------
+------------------+-------------------------+----------------------------------------------------+
| Output Name | Element Type | Shape |
| Name | Element Type | Shape |
+==================+=========================+====================================================+
| ``features_out`` | Same as ``image_batch`` | ``(N, C_in, d_1 - df_1 + 1, ..., d_n - df_n + 1)`` |
+------------------+-------------------------+----------------------------------------------------+
......@@ -27,38 +56,6 @@ It must be the case that after dilation and padding are applied, the filter fits
.. TODO image add
Window Parameters
=================
+-----------------------------+-----------------------------+------------------------------------+
| Parameter Name | Type | Meaning |
+=============================+=============================+====================================+
| ``window_movement_strides`` | ``Strides`` of length ``n`` | How far to slide the window along |
| | | each axis at each step. |
+-----------------------------+ +------------------------------------+
| ``window_dilation_strides`` | | Per-axis dilation to apply to the |
| | | filters. |
+-----------------------------+-----------------------------+------------------------------------+
.. TODO: pictorial example of the effect of window movement stride.
.. TODO: pictorial example of window before and after dilation.
Image Batch Parameters
======================
+----------------------------+-----------------------------+---------------------------------------+
| Parameter Name | Type | Meaning |
+============================+=============================+=======================================+
| ``padding_below`` | ``Padding`` of length ``n`` | How many padding elements to add |
| | | below the 0-coordinate on each axis. |
+----------------------------+ +---------------------------------------+
| ``padding_above`` | | How many padding elements to add |
| | | above the max-coordinate on each axis.|
+----------------------------+-----------------------------+---------------------------------------+
| ``image_dilation_strides`` | ``Strides`` of length ``n`` | Per-axis dilation to apply to the |
| | | image batch. |
+----------------------------+-----------------------------+---------------------------------------+
Mathematical Definition
=======================
......
.. ops/index.rst
Core Ops
========
An ``Op``'s primary role is to function as a node in a directed acyclic
computation graph.
*Core ops* are ops that are available and generally useful to all framework
bridges and that can be compiled by all transformers. A framework bridge may
define framework-specific ops to simplify graph construction, provided that the
bridge can enable every transformer to replace all such ops with equivalent
subgraphs composed of core ops. Similarly, transformers may define
transformer-specific ops to represent kernels or other intermediate operations.
If a framework supports extending the set of ops it offers, a bridge may even
expose transformer-specific ops to the framework user.
Our design philosophy is that the graph is not a script for running kernels;
rather, our compilation will match ``ops`` to appropriate kernels for the
backend(s) in use. Thus, we expect that adding new core ops should be
infrequent and that most functionality instead gets added with new functions
that build sub-graphs from existing core ops.
It is easiest to define a new op by adapting an existing op. Some of the tasks
that must be performed are:
- Op constructor:
* Checking type-consistency of arguments
* Specifying the result type for a call
- Serializer/Deserializer
- Transformer handlers:
* Interpreter (reference) implementation of behavior. The
implementation should favor clarity over efficiency.
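The op-constructor tasks listed above (checking type-consistency of arguments and specifying the result type) can be sketched as follows. Everything here is hypothetical and self-contained: ``TensorType`` and ``ElementwiseBinaryOp`` are stand-ins invented for illustration and do not reflect the actual nGraph class hierarchy or type system.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor type: element type name plus shape.
struct TensorType
{
    std::string element_type;
    std::vector<std::size_t> shape;
};

// Hypothetical elementwise binary op. The constructor validates its
// arguments and records the result type for the call.
class ElementwiseBinaryOp
{
public:
    ElementwiseBinaryOp(const std::string& name,
                        const TensorType& arg0,
                        const TensorType& arg1)
        : m_name(name)
    {
        // Type-consistency checks: both arguments must agree.
        if (arg0.element_type != arg1.element_type)
        {
            throw std::invalid_argument(name + ": argument element types differ");
        }
        if (arg0.shape != arg1.shape)
        {
            throw std::invalid_argument(name + ": argument shapes differ");
        }
        // Result type: same element type and shape as the arguments.
        m_result_type = arg0;
    }

    const std::string& name() const { return m_name; }
    const TensorType& result_type() const { return m_result_type; }

private:
    std::string m_name;
    TensorType m_result_type;
};
```

Doing the checks in the constructor means an ill-typed graph is rejected while it is being built rather than when a transformer later tries to compile it.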
Alphabetical list of Core ``ops``
----------------------------------
Not currently a comprehensive list.
.. toctree::
:maxdepth: 1
abs.rst
acos.rst
add.rst
asin.rst
atan.rst
avg_pool.rst
avg_pool_backprop.rst
convolution.rst
......@@ -8,6 +8,8 @@ of :abbr:`Deep Learning (DL)` (DL) systems. Here you will find a suite of
components, APIs, and documentation that can be used to compile and run
:abbr:`Deep Neural Network (DNN)` models defined in a variety of frameworks.
.. figure:: ../graphics/ngraph-hub.png
The nGraph library translates a framework’s representation of computations into
an :abbr:`Intermediate Representation (IR)` designed to promote computational
efficiency on target hardware. Initially-supported backends include Intel
......@@ -15,8 +17,6 @@ Architecture CPUs, the Intel® Nervana Neural Network Processor™ (NNP),
and NVIDIA\* GPUs. Currently-supported compiler optimizations include efficient
memory management and data layout abstraction.
.. figure:: graphics/fig.jpeg
The *nGraph core* uses a strongly-typed and platform-neutral stateless graph
representation for computations. Each node, or *op*, in the graph corresponds
to one step in a computation, where each step produces zero or more tensor
......
:orphan:
.. branding-notice:
Branding Notice
===============
The Intel® nGraph™ library is an open source project providing code and component
reference for many kinds of machine learning, deep learning, and DNN applications.
Documentation may include references to frontend frameworks, modules, extensions,
or other libraries that may be wholly or partially open source, or that may be
claimed as the property of others.
Intel nGraph library core documentation
---------------------------------------
.. note:: The branding notice below applies to code and documentation
contributions intended to be added directly to Intel nGraph library core.
Use the first or most prominent usage with symbols as described below.
Subsequent references on the same document, or on a file with an
already-present prominent form (such as Sphinx\* documentation sidebars),
may be done as an abbreviated form (sub-bullet items) and/or without the
repeated use of the trademark / branding symbols.
* Intel® Nervana™ Neural Network Processor
* Intel® Nervana™ NNP
* Intel® Xeon Phi™ (CPU processor)
* Intel® Xeon® (CPU processor)
* Intel® nGraph™
* Intel® nGraph™ library
* nGraph library
* ``ngraph`` API
* ``ngraph`` library
* ``ngraph`` backend
* nGraph abstraction layer
* neon™ frontend framework
* Intel® Math Kernel Library
* Intel® MKL
* Intel® Math Kernel Library for Deep Neural Networks
* Intel® MKL-DNN
* Intel® Nervana™ Graph (deprecated)
......@@ -56,14 +56,14 @@ source file (``.rst``):
::
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
And the raw code will render as follows
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
......@@ -86,7 +86,7 @@ line numbers, and add a caption "One way to define neon axes within the dqn_atar
::
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
:caption:
......@@ -94,7 +94,7 @@ line numbers, and add a caption "One way to define neon axes within the dqn_atar
and the generated output will show readers of your helpful documentation
.. literalinclude:: ../../../src/ngraph/descriptor/primary_tensor_view.cpp
.. literalinclude:: ../../../../src/ngraph/descriptor/primary_tensor_view.cpp
:language: cpp
:lines: 20-31
:caption:
......
.. project/index.rst
Project Docs
============
This section contains documentation about the project and how to contribute.
.. toctree::
:maxdepth: 1
about.rst
release-notes.rst
code-contributor-README.rst
doc-contributor-README.rst
../glossary.rst
......@@ -28,10 +28,11 @@ After building and installing the nGraph library to your system, the next
logical step is to compile a framework that you can use to run a
training/inference model with one of the backends that are now enabled.
For this early |release| release, we're providing integration guides for:
For this early |release| release, we're providing :doc:`framework-integration-guides`,
for:
* `MXNet`_,
* `TensorFlow`_, and
* :doc:`framework-integration-guides` framework,
* :doc:`framework-integration-guides` framework, and
* neon™ `frontend framework`_.
Integration guides for other frameworks are tentatively forthcoming.
......
......@@ -172,6 +172,7 @@ if (NGRAPH_CPU_ENABLE AND LLVM_INCLUDE_DIR AND
runtime/cpu/cpu_tensor_view.cpp
runtime/cpu/cpu_tensor_view_wrapper.cpp
runtime/cpu/cpu_layout_descriptor.cpp
runtime/cpu/cpu_tracing.cpp
runtime/cpu/mkldnn_utils.cpp
runtime/cpu/ops/convert_layout.cpp
runtime/cpu/ops/matmul_bias.cpp
......@@ -182,14 +183,23 @@ if (NGRAPH_CPU_ENABLE AND LLVM_INCLUDE_DIR AND
# The built-in headers are in a version-specific directory
# This must be kept in sync with the LLVM + Clang version in use
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_FLAGS "-fno-rtti")
set(HEADER_SEARCH_DEFINES
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\""
"MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\""
"CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\""
"NGRAPH_HEADERS_PATH=\"${NGRAPH_INCLUDE_PATH}\""
"INSTALLED_HEADERS_PATH=\"${CMAKE_INSTALL_PREFIX}/include\""
)
if (NGRAPH_TBB_ENABLE)
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_DEFINITIONS
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\";MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\";CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\";TBB_HEADERS_PATH=\"${TBB_ROOT}/include\";NGRAPH_HEADERS_PATH=\"${NGRAPH_INCLUDE_PATH}\";INSTALLED_HEADERS_PATH=\"${CMAKE_INSTALL_PREFIX}/include\";NGRAPH_TBB_ENABLE;")
set_source_files_properties(runtime/cpu/cpu_external_function.cpp PROPERTIES COMPILE_DEFINITIONS "NGRAPH_TBB_ENABLE")
else()
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_DEFINITIONS
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\";MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\";CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\";NGRAPH_HEADERS_PATH=\"${NGRAPH_INCLUDE_PATH}\";INSTALLED_HEADERS_PATH=\"${CMAKE_INSTALL_PREFIX}/include\";")
set(HEADER_SEARCH_DEFINES ${HEADER_SEARCH_DEFINES}
"TBB_HEADERS_PATH=\"${TBB_ROOT}/include\""
"NGRAPH_TBB_ENABLE"
)
endif()
set_source_files_properties(codegen/compiler.cpp PROPERTIES COMPILE_DEFINITIONS "${HEADER_SEARCH_DEFINES}")
set(NGRAPH_CPU_DEBUGINFO_ENABLE 0 CACHE STRING "Enable debuginfo in the CPU backend")
# GPU backend currently requires CPU because they share compiler.cpp,
......
......@@ -31,10 +31,11 @@ namespace ngraph
public:
/// \brief Constructs an absolute value operation.
///
/// Output `[d1, ...]`
///
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Abs(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Abs", arg)
{
......
......@@ -26,23 +26,16 @@ namespace ngraph
{
/// \brief Elementwise inverse cosine (arccos) operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ----- | --------------------------------- | ----------------------------------------------- |
/// | `arg` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | --------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \arccos(\texttt{arg}[i_1,\dots,i_n])\f$ |
class Acos : public UnaryElementwiseArithmetic
{
public:
/// \brief Constructs an arccos operation.
///
/// \param arg Node that produces the input tensor.
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Acos(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Acos", arg)
{
......
......@@ -26,25 +26,18 @@ namespace ngraph
{
/// \brief Elementwise addition operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ------ | --------------------------------- | ------------------------------------------------------ |
/// | `arg0` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
/// | `arg1` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of the same shape and element type as `arg0`. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | -------------------------------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \texttt{arg0}[i_1,\dots,i_n] + \texttt{arg1}[i_1,\dots,i_n]\f$ |
class Add : public BinaryElementwiseArithmetic
{
public:
/// \brief Constructs an addition operation.
///
/// \param arg0 Node that produces the first input tensor.
/// \param arg1 Node that produces the second input tensor.
/// \param arg0 Node that produces the first input tensor.<br>
/// `[d0, ...]`
/// \param arg1 Node that produces the second input tensor.<br>
/// `[d0, ...]`
///
/// Output `[d0, ...]`
///
Add(const std::shared_ptr<Node>& arg0, const std::shared_ptr<Node>& arg1)
: BinaryElementwiseArithmetic("Add", arg0, arg1)
{
......
......@@ -26,23 +26,16 @@ namespace ngraph
{
/// \brief Elementwise inverse sine (arcsin) operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ----- | --------------------------------- | ----------------------------------------------- |
/// | `arg` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | --------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \arcsin(\texttt{arg}[i_1,\dots,i_n])\f$ |
class Asin : public UnaryElementwiseArithmetic
{
public:
/// \brief Constructs an arcsin operation.
///
/// \param arg Node that produces the input tensor.
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Asin(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Asin", arg)
{
......
......@@ -26,23 +26,16 @@ namespace ngraph
{
/// \brief Elementwise inverse tangent (arctan) operation.
///
/// ## Inputs
///
/// | | Type | Description |
/// | ----- | --------------------------------- | ----------------------------------------------- |
/// | `arg` | \f$N[d_1,\dots,d_n]~(n \geq 0)\f$ | A tensor of any shape and numeric element type. |
///
/// ## Output
///
/// | Type | Description |
/// | ---------------------- | --------------------------------------------------------------------------------------- |
/// | \f$N[d_1,\dots,d_n]\f$ | The tensor \f$T\f$, where \f$T[i_1,\dots,i_n] = \arctan(\texttt{arg}[i_1,\dots,i_n])\f$ |
class Atan : public UnaryElementwiseArithmetic
{
public:
/// \brief Constructs an arctan operation.
///
/// \param arg Node that produces the input tensor.
/// \param arg Node that produces the input tensor.<br>
/// `[d1, ...]`
///
/// Output `[d1, ...]`
///
Atan(const std::shared_ptr<Node>& arg)
: UnaryElementwiseArithmetic("Atan", arg)
{
......
......@@ -24,55 +24,21 @@ namespace ngraph
{
/// \brief Batched average pooling operation, with optional padding and window stride.
///
/// Average pooling takes as its input a data batch tensor of shape \f$(N,C,d_1,\dots,d_n)\f$ where \f$n > 0\f$, every \f$d_i > 0\f$, and where \f$N\f$ is
/// the batch size, and \f$C > 0\f$ is the number of channels (sometimes called features). The dimensions \f$(d_1,\dots,d_n)\f$ correspond to the shape of
/// an \f$n\f$-dimensional data item in a batch. For example, where \f$n=2\f$, the data may represent a two-dimensional image. It also takes four parameters:
///
/// 1. <i>(the window shape)</i> a size vector \f$(w_1,\dots,w_n)\f$ where every \f$w_i \le d_i\f$;
/// 2. <i>(the window movement strides, optional)</i> a vector of positive integers \f$(s_1,\dots,s_n)\f$;
/// 3. <i>(the padding below, optional)</i> a vector of non-negative integers \f$(p_1,\dots,p_n)\f$; and
/// 4. <i>(the padding above, optional)</i> a vector of non-negative integers \f$(q_1,\dots,q_n)\f$.
///
/// The output has the shape \f$(N,C,d'_1,\dots,d'_n)\f$, where \f$d'_i = \lceil \frac{p_i + d_i + q_i - w_i + 1}{s_i} \rceil\f$.
///
/// *In the absence of padding*, given an input data batch tensor \f$T_\textit{in}\f$, the output tensor is defined by the equation
///
/// \f[
/// T_\textit{out}[a,c,i_1,\dots,i_n] = \frac{\sum_{j_1 = s_1 i_1, \dots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \dots, j_n = s_n i_n + w_n - 1} T_\textit{in}[a,c,j_1,\dots,j_n]}{\prod_{i=1}^n{w_i}}
/// \f]
///
/// *In the presence of padding*, we do not always want to divide by the number of elements in the window, since some of the output points are
/// determined by a window that is partly hanging beyond the edge of the tensor. In this case we can define the output via a few intermediate steps.
///
/// First define the <i>sum tensor</i> \f$T_\textit{sum}\f$, with shape \f$(N,C,d'_1,\dots,d'_n)\f$, as follows.
///
/// \f[
/// T_\textit{sum}[a,c,i_1,\dots,i_n] = \frac{\sum_{j_1 = s_1 i_1, \dots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \dots, j_n = s_n i_n + w_n - 1} \textit{val}[a,c,j_1,\dots,j_n]}{\prod_{i=1}^n{w_i}}
/// \f]
///
/// where \f$\textit{val}[a,c,j_1,\dots,j_n] = T_\textit{in}[a,c,j_1,\dots,j_n]\f$ if for all \f$k\f$, \f$p_k \le j_k < p_k + d_k\f$; else \f$0\f$.
///
/// Second, define the <i>divisor tensor</i> \f$T_\textit{div}\f$, with shape \f$(N,C,d'_1,\dots,d'_n)\f$, as follows.
///
/// \f[
/// T_\textit{div}[a,c,i_1,\dots,i_n] = \frac{\sum_{j_1 = s_1 i_1, \dots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \dots, j_n = s_n i_n + w_n - 1} \textit{val}[a,c,j_1,\dots,j_n]}{\prod_{i=1}^n{w_i}}
/// \f]
///
/// where \f$\textit{val}[a,c,j_1,\dots,j_n] = 1\f$ if for all \f$k\f$, \f$p_k \le j_k < p_k + d_k\f$; else \f$0\f$.
///
/// Finally, define \f$T_\textit{out}\f$ as the result of elementwise dividing \f$T_\textit{sum}\f$ by \f$T_\textit{div}\f$.
/// Note that at positions where \f$T_\textit{div}\f$ is zero, values may be infinity or nan. (This corresponds to a condition where the pooling window is completely
/// out of bounds, encompassing no valid values.)
class AvgPool : public RequiresTensorViewArgs
{
public:
/// \brief Constructs a batched average pooling operation.
///
/// \param arg The node producing the input data batch tensor.
/// \param window_shape The window shape.
/// \param window_movement_strides The window movement strides.
/// \param padding_below The below-padding shape.
/// \param padding_above The above-padding shape.
/// \param arg The node producing the input data batch tensor.<br>
/// `[d1, ..., dn]`
/// \param window_shape The window shape.<br>
/// `[n]`
/// \param window_movement_strides The window movement strides.<br>
/// `[n]`
/// \param padding_below The below-padding shape.<br>
/// `[n]`
/// \param padding_above The above-padding shape.<br>
/// `[n]`
AvgPool(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides,
......@@ -81,17 +47,22 @@ namespace ngraph
/// \brief Constructs a batched, unpadded average pooling operation (i.e., all padding shapes are set to 0).
///
/// \param arg The node producing the input data batch tensor.
/// \param window_shape The window shape.
/// \param window_movement_strides The window movement strides.
/// \param arg The node producing the input data batch tensor.<br>
/// `[d1, ..., dn]`
/// \param window_shape The window shape.<br>
/// `[n]`
/// \param window_movement_strides The window movement strides.<br>
/// `[n]`
AvgPool(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides);
/// \brief Constructs an unstrided, unpadded batched average pooling operation (i.e., all window movement strides are 1 and all padding shapes are set to 0).
///
/// \param arg The node producing the input data batch tensor.
/// \param window_shape The window shape.
/// \param arg The node producing the input data batch tensor.<br>
/// `[d1, ..., dn]`
/// \param window_shape The window shape.<br>
/// `[n]`
AvgPool(const std::shared_ptr<Node>& arg, const Shape& window_shape);
virtual std::shared_ptr<Node> copy_with_new_args(
......
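The padded average-pooling definition in the `AvgPool` comments above can be illustrated with a one-dimensional sketch. This is not the nGraph kernel; `avg_pool_1d` and its parameter names are hypothetical, and it assumes a single batch item and a single channel. It applies the output-size formula with the ceiling and divides each window sum by the number of in-bounds elements rather than by the window size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D illustration of padded average pooling.
// w: window size, s: stride, p: padding below, q: padding above.
std::vector<double> avg_pool_1d(const std::vector<double>& in,
                                size_t w, size_t s, size_t p, size_t q)
{
    const size_t d = in.size();
    const size_t padded = p + d + q;
    assert(w > 0 && s > 0 && w <= padded);

    // Integer form of ceil((p + d + q - w + 1) / s).
    const size_t out_size = (padded - w + s) / s;

    std::vector<double> out(out_size);
    for (size_t i = 0; i < out_size; i++)
    {
        double sum = 0.0; // sum of in-bounds values in the window
        size_t count = 0; // number of in-bounds positions in the window
        for (size_t j = s * i; j < s * i + w; j++)
        {
            if (j >= p && j < p + d)
            {
                sum += in[j - p];
                count++;
            }
        }
        // A window that lies entirely in the padding gives 0.0 / 0.0 (NaN).
        out[i] = sum / static_cast<double>(count);
    }
    return out;
}
```

For the input `{1, 2, 3, 4}` with `w = 2`, `s = 1`, `p = q = 1`, the first window covers one padding cell and the value 1, so the first output is 1 rather than 0.5.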
......@@ -29,8 +29,6 @@ namespace ngraph
public:
/// \brief Constructs a batched convolution operation.
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
/// \param data_batch The node producing the input data batch tensor.<br>
/// `[N, C_IN, D1, ... Df]`
/// \param filters The node producing the filters tensor.<br>
......@@ -45,6 +43,9 @@ namespace ngraph
/// `[f]`
/// \param data_dilation_strides The data dilation strides.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides,
......@@ -67,6 +68,9 @@ namespace ngraph
/// `[f]`
/// \param padding_above The padding-above sizes.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides,
......@@ -84,6 +88,9 @@ namespace ngraph
/// `[f]`
/// \param window_dilation_strides The window dilation strides.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides,
......@@ -97,6 +104,9 @@ namespace ngraph
/// `[C_OUT, C_IN, F1, ... Ff]`
/// \param window_movement_strides The window movement strides.<br>
/// `[f]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters,
const Strides& window_movement_strides);
......@@ -107,6 +117,9 @@ namespace ngraph
/// `[N, C_IN, D1, ... Df]`
/// \param filters The node producing the filters tensor.<br>
/// `[C_OUT, C_IN, F1, ... Ff]`
///
/// Output `[N, C_OUT, R1, ... Rf]`
///
Convolution(const std::shared_ptr<Node>& data_batch,
const std::shared_ptr<Node>& filters);
......
......@@ -19,6 +19,7 @@
#include "ngraph/runtime/cpu/cpu_call_frame.hpp"
#include "ngraph/runtime/cpu/cpu_external_function.hpp"
#include "ngraph/runtime/cpu/cpu_tensor_view.hpp"
#include "ngraph/runtime/cpu/cpu_tracing.hpp"
using namespace std;
using namespace ngraph;
......@@ -28,6 +29,12 @@ runtime::cpu::CPU_CallFrame::CPU_CallFrame(std::shared_ptr<CPU_ExternalFunction>
: m_external_function(external_function)
, m_compiled_function(compiled_function)
{
setup_runtime_context();
}
runtime::cpu::CPU_CallFrame::~CPU_CallFrame()
{
cleanup_runtime_context();
}
void runtime::cpu::CPU_CallFrame::tensor_call(
......@@ -54,7 +61,12 @@ void runtime::cpu::CPU_CallFrame::tensor_call(
}
// Invoke compiled computation
m_compiled_function(inputs.data(), outputs.data());
m_compiled_function(inputs.data(), outputs.data(), ctx);
if (runtime::cpu::IsTracingEnabled())
{
GenerateTimeline(m_external_function->get_op_attrs(), ctx->op_durations);
}
}
void runtime::cpu::CPU_CallFrame::call(
......@@ -116,3 +128,20 @@ vector<runtime::PerformanceCounter> runtime::cpu::CPU_CallFrame::get_performance
}
return rc;
}
void runtime::cpu::CPU_CallFrame::setup_runtime_context()
{
ctx = new CPURuntimeContext;
ctx->op_durations = nullptr;
if (runtime::cpu::IsTracingEnabled())
{
ctx->op_durations = new int64_t[m_external_function->get_op_attrs().size()];
}
}
void runtime::cpu::CPU_CallFrame::cleanup_runtime_context()
{
delete[] ctx->op_durations;
delete ctx;
}
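As a possible alternative to the manual `new` / `delete[]` pairing in `setup_runtime_context` and `cleanup_runtime_context`, the sketch below keeps the duration buffer in a `std::vector` while still exposing the raw pointer that generated code expects. `RuntimeContext` only mirrors the layout of `CPURuntimeContext`, and `RuntimeContextHolder` is a hypothetical name that is not part of this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in with the same layout as CPURuntimeContext in this change.
struct RuntimeContext
{
    int64_t* op_durations;
};

// Hypothetical owner of the context. The vector releases the buffer
// automatically and starts out zero-filled.
class RuntimeContextHolder
{
public:
    RuntimeContextHolder(size_t op_count, bool tracing_enabled)
        : m_durations(tracing_enabled ? op_count : 0)
    {
        m_ctx.op_durations = m_durations.empty() ? nullptr : m_durations.data();
    }

    // Non-copyable: a copy would point into the other object's buffer.
    RuntimeContextHolder(const RuntimeContextHolder&) = delete;
    RuntimeContextHolder& operator=(const RuntimeContextHolder&) = delete;

    // Raw pointer to pass to the compiled entry point.
    RuntimeContext* get() { return &m_ctx; }

    // Bounds-checked read of a recorded duration.
    int64_t duration(size_t i) const
    {
        assert(i < m_durations.size());
        return m_durations[i];
    }

private:
    std::vector<int64_t> m_durations;
    RuntimeContext m_ctx;
};
```

One design consequence: because the buffer is value-initialized, a timeline generated before every slot has been written would show zero durations instead of indeterminate values.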
......@@ -23,6 +23,7 @@
#include "ngraph/function.hpp"
#include "ngraph/runtime/call_frame.hpp"
#include "ngraph/runtime/cpu/cpu_layout_descriptor.hpp"
#include "ngraph/runtime/cpu/cpu_runtime_context.hpp"
#include "ngraph/runtime/tensor_view.hpp"
namespace ngraph
......@@ -36,7 +37,7 @@ namespace ngraph
class CPU_CallFrame;
class CPU_ExternalFunction;
using EntryPoint_t = void(void** inputs, void** outputs);
using EntryPoint_t = void(void** inputs, void** outputs, CPURuntimeContext* ctx);
using EntryPoint = std::function<EntryPoint_t>;
......@@ -46,6 +47,7 @@ namespace ngraph
public:
CPU_CallFrame(std::shared_ptr<CPU_ExternalFunction> external_function,
EntryPoint compiled_function);
~CPU_CallFrame();
/// @brief Invoke the function with values matching the signature of the function.
///
......@@ -65,9 +67,13 @@ namespace ngraph
std::vector<ngraph::runtime::PerformanceCounter>
get_performance_data() const override;
void setup_runtime_context();
void cleanup_runtime_context();
protected:
std::shared_ptr<CPU_ExternalFunction> m_external_function;
EntryPoint m_compiled_function;
CPURuntimeContext* ctx;
};
}
}
......
......@@ -1014,7 +1014,7 @@ void runtime::cpu::CPU_Emitter::EmitFunctionCall(
writer << "\n};\n";
writer << "\n";
writer << function->get_name() << "(args, out);\n";
writer << function->get_name() << "(args, out, ctx);\n";
}
writer.indent--;
writer << "}\n";
......@@ -1093,13 +1093,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer << "{ // " << n->get_name() << " 3\n";
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1129,13 +1129,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer << "{ // " << n->get_name() << " 5\n";
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1161,13 +1161,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer << "{ // " << n->get_name() << " 7\n";
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1183,13 +1183,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -1211,13 +1211,13 @@ void runtime::cpu::CPU_Emitter::EmitReduce(codegen::CodeWriter& writer,
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -2194,13 +2194,13 @@ void runtime::cpu::CPU_Emitter::EmitReduceWindow(
writer.indent++;
string type = f_result_element_type.c_type_string();
writer << "auto f = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << reduction_function->get_name() << "(args, out);\n";
writer << reduction_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......@@ -2238,24 +2238,24 @@ void runtime::cpu::CPU_Emitter::EmitSelectAndScatter(
string type = n->get_output_element_type(0).c_type_string();
writer << "auto f_select = [](" << type << " x, " << type << " y) -> char\n{";
writer << "auto f_select = [&](" << type << " x, " << type << " y) -> char\n{";
writer.indent++;
writer << "\n";
writer << "char result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << selection_function->get_name() << "(args, out);\n";
writer << selection_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
writer << "auto f_scatter = [](" << type << " x, " << type << " y) -> " << type << "\n{";
writer << "auto f_scatter = [&](" << type << " x, " << type << " y) -> " << type << "\n{";
writer.indent++;
writer << "\n";
writer << type << " result;\n";
writer << "void* args[] = {&x, &y};\n";
writer << "void* out[] = {&result};\n";
writer << scatter_function->get_name() << "(args, out);\n";
writer << scatter_function->get_name() << "(args, out, ctx);\n";
writer << "return result;\n";
writer.indent--;
writer << "};\n";
......
......@@ -94,6 +94,7 @@
#include "ngraph/runtime/cpu/cpu_emitter.hpp"
#include "ngraph/runtime/cpu/cpu_external_function.hpp"
#include "ngraph/runtime/cpu/cpu_tensor_view.hpp"
#include "ngraph/runtime/cpu/cpu_tracing.hpp"
#include "ngraph/runtime/cpu/ops/matmul_bias.hpp"
#include "ngraph/runtime/cpu/pass/cpu_fusion.hpp"
#include "ngraph/runtime/cpu/pass/cpu_layout.hpp"
......@@ -265,6 +266,7 @@ void runtime::cpu::CPU_ExternalFunction::compile()
#include "ngraph/runtime/aligned_buffer.hpp"
#include "ngraph/runtime/cpu/cpu_eigen_utils.hpp"
#include "ngraph/runtime/cpu/cpu_kernels.hpp"
#include "ngraph/runtime/cpu/cpu_runtime_context.hpp"
#include "ngraph/runtime/kernel/avg_pool.hpp"
#include "ngraph/runtime/kernel/broadcast.hpp"
#include "ngraph/runtime/kernel/concat.hpp"
......@@ -402,7 +404,8 @@ using namespace ngraph::runtime;
writer << "// Declare all functions\n";
for (shared_ptr<Function> f : pass_manager.get_state().get_functions())
{
writer << "extern \"C\" void " << f->get_name() << "(void** inputs, void** outputs);\n";
writer << "extern \"C\" void " << f->get_name()
<< "(void** inputs, void** outputs, cpu::CPURuntimeContext* ctx);\n";
}
writer << "\n";
......@@ -481,7 +484,7 @@ using namespace ngraph::runtime;
}
writer << "extern \"C\" void " << current_function->get_name();
writer << "(void** inputs, void** outputs)\n";
writer << "(void** inputs, void** outputs, cpu::CPURuntimeContext* ctx)\n";
writer << "{\n";
writer.indent++;
......@@ -491,6 +494,13 @@ using namespace ngraph::runtime;
writer << "tbb::flow::graph G;\n\n";
}
// Execution tracing support
if (runtime::cpu::IsTracingEnabled() && current_function->get_name() == function_name)
{
writer << "cpu::Timestamp start_ts;\n"
<< "int profiler_count = 0;\n\n";
}
bool temporaries_used = false;
size_t worst_case_tmp_size = 0;
for (shared_ptr<Node> node : current_function->get_ordered_ops())
......@@ -614,12 +624,14 @@ using namespace ngraph::runtime;
throw ngraph_error("Unhandled op during code generation : " + node->description());
}
vector<TensorViewWrapper> in;
vector<string> node_input_names, node_output_names;
for (const descriptor::Input& input : node->get_inputs())
{
const descriptor::Output& output = input.get_output();
shared_ptr<descriptor::TensorView> tv = output.get_tensor_view();
in.push_back(
TensorViewWrapper(tv, m_variable_name_map[tv->get_tensor().get_name()]));
node_input_names.emplace_back(tv->get_tensor().get_name());
}
vector<TensorViewWrapper> out;
for (const descriptor::Output& output : node->get_outputs())
......@@ -627,11 +639,17 @@ using namespace ngraph::runtime;
shared_ptr<descriptor::TensorView> tv = output.get_tensor_view();
out.push_back(
TensorViewWrapper(tv, m_variable_name_map[tv->get_tensor().get_name()]));
node_output_names.emplace_back(tv->get_tensor().get_name());
}
// Emit operation prologue
if (!node->is_parameter() && !node->is_constant())
{
if (current_function->get_name() == function_name)
{
m_op_attrs.emplace_back(
node->description(), node_output_names, node_input_names);
}
if (m_use_tbb)
{
writer << "tbb::flow::continue_node<tbb::flow::continue_msg> "
......@@ -644,6 +662,11 @@ using namespace ngraph::runtime;
{
emit_debug_function_entry(writer, node.get(), in, out);
}
if (runtime::cpu::IsTracingEnabled() &&
current_function->get_name() == function_name)
{
writer << "start_ts = cpu::Clock::now();\n";
}
}
// Emit operation body
......@@ -668,7 +691,7 @@ using namespace ngraph::runtime;
{
names.push_back(tv.get_name());
}
writer << func_name << "(" << join(names) << ");\n";
writer << func_name << "(" << join(names) << ", ctx);\n";
}
// Emit operation epilogue
......@@ -679,6 +702,13 @@ using namespace ngraph::runtime;
{
emit_debug_function_exit(writer, node.get(), in, out);
}
if (runtime::cpu::IsTracingEnabled() &&
current_function->get_name() == function_name)
{
writer << "ctx->op_durations[profiler_count++] = "
<< "(std::chrono::duration_cast<cpu::Timescale>(cpu::Clock::now() - "
"start_ts)).count();\n";
}
if (m_use_tbb)
{
writer.indent--;
......@@ -908,6 +938,7 @@ string runtime::cpu::CPU_ExternalFunction::emit_op_as_function(const Node& node,
writer << tvw.get_type() << "* " << tvw.get_name();
out.push_back(tvw);
}
writer << ",\ncpu::CPURuntimeContext* ctx";
writer.indent--;
writer << "\n)\n";
writer << "{\n";
......
......@@ -18,9 +18,11 @@
#include <functional>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>
#include "ngraph/codegen/code_writer.hpp"
#include "ngraph/codegen/compiler.hpp"
......@@ -48,6 +50,21 @@ namespace ngraph
using OpMap = std::unordered_map<std::type_index, OpFunction>;
struct OpAttributes
{
std::string Description;
std::vector<std::string> Outputs;
std::vector<std::string> Inputs;
OpAttributes(const std::string& desc,
const std::vector<std::string>& outputs,
const std::vector<std::string>& inputs)
: Description(desc)
, Outputs(outputs)
, Inputs(inputs)
{
}
};
class CPU_ExternalFunction : public ngraph::runtime::ExternalFunction,
public std::enable_shared_from_this<CPU_ExternalFunction>
{
......@@ -61,6 +78,7 @@ namespace ngraph
const LayoutDescriptorPtrs& get_parameter_layout_descriptors();
const LayoutDescriptorPtrs& get_result_layout_descriptors();
const std::vector<OpAttributes>& get_op_attrs() const { return m_op_attrs; }
protected:
void compile();
......@@ -95,6 +113,7 @@ namespace ngraph
LayoutDescriptorPtrs parameter_layout_descriptors;
LayoutDescriptorPtrs result_layout_descriptors;
std::vector<OpAttributes> m_op_attrs;
};
}
}
......
// ----------------------------------------------------------------------------
// Copyright 2018 Nervana Systems Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ----------------------------------------------------------------------------
#pragma once
#include <chrono>
#include <cstdint>
namespace ngraph
{
namespace runtime
{
namespace cpu
{
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::time_point<Clock> Timestamp;
typedef std::chrono::microseconds Timescale;
extern "C" {
struct CPURuntimeContext
{
int64_t* op_durations;
};
}
}
}
}
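The `Clock`, `Timestamp`, and `Timescale` typedefs above are used by the emitted code to time each op. The sketch below reproduces that pattern outside of code generation, assuming each op can be modeled as a `std::function<void()>`; `time_ops` is a hypothetical helper, not a function in this change.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::time_point<Clock> Timestamp;
typedef std::chrono::microseconds Timescale;

// Runs each op in order and records its wall-clock duration in microseconds,
// following the start_ts / profiler_count pattern of the emitted code.
std::vector<int64_t> time_ops(const std::vector<std::function<void()>>& ops)
{
    std::vector<int64_t> op_durations(ops.size());
    Timestamp start_ts;
    size_t profiler_count = 0;
    for (const auto& op : ops)
    {
        start_ts = Clock::now();
        op();
        op_durations[profiler_count++] =
            std::chrono::duration_cast<Timescale>(Clock::now() - start_ts).count();
    }
    assert(profiler_count == ops.size());
    return op_durations;
}
```

Since `Timescale` is microseconds, ops that finish in under a microsecond are recorded with a duration of 0.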
// ----------------------------------------------------------------------------
// Copyright 2018 Nervana Systems Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ----------------------------------------------------------------------------
#include <cstdlib> // std::getenv
#include <fstream>
#include <map>
#include "cpu_tracing.hpp"
void ngraph::runtime::cpu::to_json(nlohmann::json& json, const TraceEvent& event)
{
std::map<std::string, std::string> args;
for (size_t i = 0; i < event.Inputs.size(); i++)
{
args["Input" + std::to_string(i + 1)] = event.Inputs[i];
}
for (size_t i = 0; i < event.Outputs.size(); i++)
{
args["Output" + std::to_string(i + 1)] = event.Outputs[i];
}
json = nlohmann::json{{"ph", event.Phase},
{"cat", event.Category},
{"name", event.Name},
{"pid", event.PID},
{"tid", event.TID},
{"ts", event.Timestamp},
{"dur", event.Duration},
{"args", args}};
}
void ngraph::runtime::cpu::GenerateTimeline(const std::vector<OpAttributes>& op_attrs,
int64_t* op_durations)
{
nlohmann::json timeline;
std::list<TraceEvent> trace;
std::ofstream out("timeline.json");
int64_t ts = 0;
for (size_t i = 0; i < op_attrs.size(); i++)
{
trace.emplace_back("X",
"Op",
op_attrs[i].Description,
0,
0,
ts,
op_durations[i],
op_attrs[i].Outputs,
op_attrs[i].Inputs);
ts += op_durations[i];
}
timeline["traceEvents"] = trace;
out << timeline;
out.close();
return;
}
bool ngraph::runtime::cpu::IsTracingEnabled()
{
return (std::getenv("NGRAPH_CPU_TRACING") != nullptr);
}
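To make the shape of the resulting `timeline.json` concrete, here is a dependency-free sketch of what `GenerateTimeline` assembles: one complete (`"X"`) event per op, with each timestamp equal to the sum of the preceding durations. It builds the JSON by hand instead of using `nlohmann::json`, omits the `args` field, and does not escape strings; `OpRecord` and `build_timeline_json` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for OpAttributes that keeps only the description.
struct OpRecord
{
    std::string description;
};

// Serializes the ops as back-to-back complete events on pid 0 / tid 0.
// Assumes descriptions contain no characters that need JSON escaping.
std::string build_timeline_json(const std::vector<OpRecord>& ops,
                                const std::vector<int64_t>& durations_us)
{
    assert(ops.size() == durations_us.size());
    std::ostringstream out;
    out << "{\"traceEvents\":[";
    int64_t ts = 0;
    for (size_t i = 0; i < ops.size(); i++)
    {
        if (i > 0)
        {
            out << ",";
        }
        out << "{\"ph\":\"X\",\"cat\":\"Op\",\"name\":\"" << ops[i].description
            << "\",\"pid\":0,\"tid\":0,\"ts\":" << ts
            << ",\"dur\":" << durations_us[i] << "}";
        ts += durations_us[i]; // the next event starts where this one ends
    }
    out << "]}";
    return out.str();
}
```

Because timestamps are accumulated from durations rather than measured, the timeline shows ops laid end to end and hides any time spent between them.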
// ----------------------------------------------------------------------------
// Copyright 2018 Nervana Systems Inc.
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ----------------------------------------------------------------------------
#pragma once
#include <cstdint>
#include <list>
#include <string>
#include <vector>
#include "ngraph/json.hpp"
#include "ngraph/runtime/cpu/cpu_external_function.hpp"
namespace ngraph
{
namespace runtime
{
namespace cpu
{
struct TraceEvent
{
// This should be a single character
// but the JSON encoder nlohmann::json
// is broken and doesn't handle character fields
std::string Phase;
std::string Category;
const std::string& Name;
unsigned int PID;
unsigned int TID;
int64_t Timestamp;
int64_t Duration;
const std::vector<std::string>& Outputs;
const std::vector<std::string>& Inputs;
TraceEvent(const std::string& ph,
const std::string& cat,
const std::string& name,
unsigned int pid,
unsigned int tid,
int64_t ts,
int64_t dur,
const std::vector<std::string>& outputs,
const std::vector<std::string>& inputs)
: Phase(ph)
, Category(cat)
, Name(name)
, PID(pid)
, TID(tid)
, Timestamp(ts)
, Duration(dur)
, Outputs(outputs)
, Inputs(inputs)
{
}
};
void to_json(nlohmann::json& json, const TraceEvent& event);
void GenerateTimeline(const std::vector<OpAttributes>& op_attrs, int64_t* op_durations);
bool IsTracingEnabled();
}
}
}
......@@ -25,7 +25,7 @@ if (NGRAPH_CPU_ENABLE AND NOT APPLE)
add_executable(resource_generator ${SRC})
add_dependencies(resource_generator ext_llvm eigen ext_mkldnn)
set(HEADER_PATHS
set(HEADER_SEARCH_DEFINES
"EIGEN_HEADERS_PATH=\"${EIGEN_INCLUDE_DIR}\""
"MKLDNN_HEADERS_PATH=\"${MKLDNN_INCLUDE_DIR}\""
"CLANG_BUILTIN_HEADERS_PATH=\"${LLVM_LIB_DIR}/clang/5.0.1/include\""
......@@ -33,16 +33,11 @@ if (NGRAPH_CPU_ENABLE AND NOT APPLE)
)
if(NGRAPH_TBB_ENABLE)
list(APPEND HEADER_PATHS "TBB_HEADERS_PATH=\"${TBB_ROOT}/include\"")
list(APPEND HEADER_SEARCH_DEFINES "TBB_HEADERS_PATH=\"${TBB_ROOT}/include\"")
set(HEADER_SEARCH_DEFINES ${HEADER_SEARCH_DEFINES} "NGRAPH_TBB_ENABLE")
endif()
if(NGRAPH_TBB_ENABLE)
set(NGRAPH_TBB_OPTION "NGRAPH_TBB_ENABLE")
else()
set(NGRAPH_TBB_OPTION "")
endif()
message("HEADER_PATHS ${HEADER_PATHS}")
message("HEADER_SEARCH_DEFINES ${HEADER_SEARCH_DEFINES}")
set_source_files_properties(main.cpp PROPERTIES COMPILE_DEFINITIONS "${HEADER_PATHS};${NGRAPH_TBB_OPTION}")
set_source_files_properties(main.cpp PROPERTIES COMPILE_DEFINITIONS "${HEADER_SEARCH_DEFINES}")
endif()