Commit 8a5a4c89 authored by Leona C, committed by Sang Ik Lee

Robust debugging docs (#4060)

* Robust debugging docs

* Add section on nbench and address comments from review

* Collaborate with Gauri to revise profiling section

* Revise and PR feedback

* Move note

* Fix wording

* Order sections more logically and fix a comma

* Phrasing fix on nbench_tf summary

* More prominent notice of experimental debug flags

* Better description for diagnostic tools

* Remove miscellaneous framework support

* clean up section

* Remove deprecated links

* Update sitemap to not use a page title

* Useful descriptions

* PR feedback

* Not a flag

* Prebuilt MLIR compile flag available

* Remove duplicate flag

* update pass manager example

* Meta documentation note in release notes

* Ensure docs build with lastest upstream ops

* Transpose op doc fixes

* Better intra-doc links

* Commas in csv format are important

* Final review with Gauri

* Remove dupes, CPU-specific envvars

* changes re: Comments from Gauri
Co-authored-by: Robert Kimball <robert.kimball@intel.com>
Co-authored-by: Sang Ik Lee <sang.ik.lee@intel.com>
parent 8e46ff86
......@@ -1414,15 +1414,16 @@ input[type="radio"][disabled], input[type="checkbox"][disabled] {
border-left-width: 0;
}
.wy-table thead, .rst-content table.docutils thead, .rst-content table.field-list thead {
color: #000;
text-align: left;
color: #1b1b1e;
text-align: center;
vertical-align: bottom;
white-space: nowrap;
}
.wy-table thead th, .rst-content table.docutils thead th, .rst-content table.field-list thead th {
font-family: Nunito, 'Nunito Sans', sans;
font-variant: small-caps;
border-bottom: solid 1px #c1c7d7;
font-family: monospace;
font-size: 1.33em;
background-color: #3e4451;
color: #e0e0e0;
}
.wy-table td, .rst-content table.docutils td, .rst-content table.field-list td {
background-color: transparent;
......@@ -1491,7 +1492,6 @@ input[type="radio"][disabled], input[type="checkbox"][disabled] {
.wy-table-horizontal td, .wy-table-horizontal th {
border-width: 0 0 1px 0;
border-bottom: 1px solid #e1e4e5;
font-variant: small-caps;
}
.wy-table-horizontal tbody > tr:last-child td {
border-bottom-width: 0;
......@@ -1512,8 +1512,8 @@ input[type="radio"][disabled], input[type="checkbox"][disabled] {
.wy-table-responsive table th {
white-space: pre-wrap;
font-family: Nunito, 'Nunito Sans', sans;
font-variant: small-caps;
font-family: monospace;
}
......@@ -2310,7 +2310,7 @@ div[class^='highlight'] pre {
margin-bottom: 12px;
}
.rst-content .toc-backref {
color: #7b7064;
color: #1b1b1e;
}
.rst-content .align-right {
float: right;
......@@ -3000,7 +3000,7 @@ footer span.commit code, footer span.commit .rst-content tt, .rst-content footer
}
.rst-footer-buttons {
*zoom: 1;
zoom: 1;
}
.rst-footer-buttons:before, .rst-footer-buttons:after {
display: table;
......
......@@ -32,12 +32,12 @@ steps and the code below.
#. Create a "pass manager" object (line 1)
#. Populate it with the desired pass or passes (lines 2-4)
#. Invoke the pass manager with a pointer to your unoptimized graph, and
it will return a pointer to an optimized graph (lines 5-6)
it will return a pointer to an optimized graph (lines 5-8)
.. literalinclude:: ../../../../../test/cpu_fusion.cpp
.. literalinclude:: ../../../../../test/pass_memory_layout.cpp
:language: cpp
:lines: 2085-2092
:lines: 222-230
:linenos:
nGraph Core includes a large library of hardware-agnostic passes useful
......
......@@ -13,4 +13,3 @@ Working with Frameworks
onnx_integ.rst
paddle_integ.rst
tensorflow_connect.rst
other/index.rst
.. frameworks/other/index.rst:
.. _fw_other:
.. contents::
Integrating other frameworks
============================
This section details some of the *configuration options* and some of the
*environment variables* that can be used to tune for optimal performance when
your system already has a version of nGraph installed with one or more of our
supported :doc:`../../backends/index`.
Regardless of the framework, after the :doc:`../../buildlb` step, a good place
to start usually involves making the libraries available to the framework. On
Linux\* systems built on Intel® Architecture, that command tends to look
something like:
.. code-block:: console
export NGRAPH_CPP_BUILD_PATH=path/to/ngraph_dist/
export LD_LIBRARY_PATH=path/to/ngraph_dist/lib/
Find or display version
-----------------------
If you're working with the :doc:`../../python_api/index`, the following command
may be useful:
.. code-block:: console
python3 -c "import ngraph as ng; print('nGraph version: ',ng.__version__)";
To manually build a newer version than is available from the latest `PyPI`_
(:abbr:`Python Package Index (PyPI)`), see our nGraph Python API `BUILDING.md`_
documentation.
Activate logtrace-related environment variables
-----------------------------------------------
Another configuration option is to activate ``NGRAPH_CPU_DEBUG_TRACER``,
a runtime environment variable that supports extra logging and debug detail.
This is a useful tool for data scientists interested in outputs from logtrace
files that can, for example, help in tracking down model convergences. It can
also help engineers who might want to add their new ``Backend`` to an existing
framework to compare intermediate tensors/values to references from a CPU
backend.
To activate this tool, set the ``env`` var ``NGRAPH_CPU_DEBUG_TRACER=1``.
It will dump ``trace_meta.log`` and ``trace_bin_data.log``. The names of the
logfiles can be customized.
To specify the names of logs with those flags:
::
NGRAPH_TRACER_LOG="meta.log"
NGRAPH_BIN_TRACER_LOG="bin.log"
The meta_log contains::
kernel_name, serial_number_of_op, tensor_id, symbol_of_in_out, num_elements, shape, binary_data_offset, mean_of_tensor, variance_of_tensor
A line example from a unit-test might look like::
K=Add S=0 TID=0_0 >> size=4 Shape{2, 2} bin_data_offset=8 mean=1.5 var=1.25
The binary_log line contains::
tensor_id, binary data (tensor data)
A reference for the implementation of parsing these logfiles can also be found
in the unit test for this feature.
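As an illustration of the format described above, a minimal parser for one ``trace_meta.log`` line might look like the following sketch. It is based only on the example line shown here; the authoritative field layout is defined by the unit test referenced above, and ``parse_meta_line`` is a hypothetical helper, not part of nGraph.

```python
import re

def parse_meta_line(line):
    """Parse one trace_meta.log line into a dict (illustrative sketch;
    field layout inferred from the example line in the docs)."""
    m = re.match(
        r"K=(?P<kernel>\S+) S=(?P<serial>\d+) TID=(?P<tid>\S+) (?P<sym>\S+) "
        r"size=(?P<size>\d+) Shape\{(?P<shape>[^}]*)\} "
        r"bin_data_offset=(?P<offset>\d+) mean=(?P<mean>\S+) var=(?P<var>\S+)",
        line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "kernel": d["kernel"],                       # kernel_name
        "serial": int(d["serial"]),                  # serial_number_of_op
        "tensor_id": d["tid"],
        "in_out": d["sym"],                          # symbol_of_in_out
        "num_elements": int(d["size"]),
        "shape": tuple(int(x) for x in d["shape"].split(",")) if d["shape"] else (),
        "offset": int(d["offset"]),                  # binary_data_offset
        "mean": float(d["mean"]),
        "variance": float(d["var"]),
    }

rec = parse_meta_line(
    "K=Add S=0 TID=0_0 >> size=4 Shape{2, 2} bin_data_offset=8 mean=1.5 var=1.25")
```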
FMV
---
FMV stands for Function Multi-Versioning, a GCC feature that allows a binary
to carry multiple architecture-specific versions of a function and select the
best one at runtime. It provides a generic way to bring architecture-based
optimizations to the :abbr:`Operating System (OS)` that is handling your ML
environment. See the `GCC wiki for details`_.
If your nGraph build is a Neural Network configured on Clear Linux\* OS
for Intel® Architecture, and it includes at least one older CPU, the
`following article may be helpful`_.
Training Deep Neural Networks
-----------------------------
Before tweaking various environment variables, be aware that how the computation
gets executed depends on the data layout that the model is using. ``NHWC`` and
``NCHW`` are common layouts in Deep Learning models. Your ultimate
runtime can vary greatly -- even when all other factors are exactly the same --
when this detail is overlooked.
For CPU (and most cuDNN) backends, the preferred layout is currently ``NCHW``.
* **N** -- Number of images per batch
* **C** -- Channel of the image (expressed as a number like 3 for RGB and 1
for grayscale)
* **H** -- Height of the image
* **W** -- Width of the image
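For example, converting a shape between the two layouts is just a reordering of dimensions; the sketch below is illustrative only (``nhwc_to_nchw`` is a hypothetical helper, not an nGraph API).

```python
def nhwc_to_nchw(shape):
    """Reorder an (N, H, W, C) shape tuple into (N, C, H, W).

    Only the logical shape changes here; in a real tensor the data
    itself must also be transposed to match the new layout.
    """
    n, h, w, c = shape
    return (n, c, h, w)

# A batch of 32 RGB images, 224x224, in NHWC:
nchw = nhwc_to_nchw((32, 224, 224, 3))  # (32, 3, 224, 224)
```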
Intel® Math Kernel Library for Deep Neural Networks
---------------------------------------------------
.. important:: Intel® MKL-DNN is automatically enabled as part of an
nGraph default :doc:`build <../../buildlb>`; you do *not* need to add it
separately or as an additional component to be able to use these
configuration settings.
The following `KMP`_ options were originally tuned for training models with
the ``NCHW`` data layout using Intel® `MKL-DNN`_; however, other
configurations can be explored.
* ``KMP_BLOCKTIME`` Sets the time, in milliseconds, that a thread should wait
after completing the execution of a parallel region, before sleeping.
* ``KMP_AFFINITY`` Enables the runtime library to bind threads to physical
processing units. A useful article that explains more about how to use this
option for various CPU backends is here: https://web.archive.org/web/20190401182248/https://www.nas.nasa.gov/hecc/support/kb/Using-Intel-OpenMP-Thread-Affinity-for-Pinning_285.html
* ``KMP_SETTINGS`` Enables (``true``) or disables (``false``) the printing of
OpenMP\* runtime library environment variables during program execution.
* ``OMP_NUM_THREADS`` Specifies the number of threads to use.
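A starting configuration might be set from Python before the framework is imported; the values below are assumptions to tune for your own system and model, not recommendations for every workload.

```python
import os

# Illustrative KMP/OMP starting point for an NCHW model on a CPU backend.
# These must be set before the OpenMP runtime initializes, i.e. before the
# framework itself is imported.
os.environ["KMP_BLOCKTIME"] = "1"            # ms a thread waits before sleeping
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
os.environ["KMP_SETTINGS"] = "TRUE"          # print OpenMP env vars at startup
os.environ["OMP_NUM_THREADS"] = "4"          # should not exceed physical cores
```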
nGraph-enabled Intel® Xeon®
---------------------------
The list below includes recommendations on data layout, parameters, and
application configuration to achieve best performance running DNN workloads on
Intel® Xeon® (CPU processor) systems.
Threading
---------
The number of threads set by ``OMP_NUM_THREADS`` should not exceed the number of
physical cores. The threads should be pinned to their respective physical cores
and activated as follows:
* When ``HT=off``, ``KMP_AFFINITY=compact,granularity=fine``
* When ``HT=on``, ``KMP_AFFINITY=compact,1,0,granularity=fine``
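These two cases can be captured in a small helper (``kmp_affinity`` is a hypothetical function for illustration):

```python
def kmp_affinity(hyperthreading_on):
    """Return the recommended KMP_AFFINITY value for pinning threads to
    physical cores. The extra ",1,0" permute/offset fields place
    consecutive threads on distinct physical cores when each core
    exposes two hardware threads (HT on)."""
    if hyperthreading_on:
        return "compact,1,0,granularity=fine"
    return "compact,granularity=fine"
```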
Memory allocation
-----------------
Buffer pointers should be aligned on 64-byte boundaries. NUMA policy should be
configured for local memory allocation (``numactl --localalloc``).
Convolution shapes
^^^^^^^^^^^^^^^^^^
* When **running inference, or training for forward-propagation and weight
updates**, for best performance:
- the number of input channels should be 1, 3, or a multiple of SIMD-width (8
for AVX2 systems, 16 for AVX512 systems).
- the number of output channels should be a multiple of SIMD-width (8 for AVX2
systems, 16 for AVX512 systems).
* When **training backward propagation**, the number of input and output
channels should be a multiple of SIMD-width (8 for AVX2 systems, 16 for AVX512
systems); additionally:
- padding should not exceed :math:`0.5x` where :math:`x` is the kernel size.
- kernel width should be less than 14.
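The channel-count guidance above can be expressed as a quick check (an illustrative sketch; ``conv_channels_ok`` is a hypothetical helper, and ``simd_width`` is 8 on AVX2 systems and 16 on AVX512 systems):

```python
def conv_channels_ok(in_channels, out_channels, simd_width, backward=False):
    """Check the convolution channel-count guidance above."""
    out_ok = out_channels % simd_width == 0
    if backward:
        # Backward propagation: input channels must also be a SIMD-width multiple.
        in_ok = in_channels % simd_width == 0
    else:
        # Inference / forward + weight updates: 1, 3, or a SIMD-width multiple.
        in_ok = in_channels in (1, 3) or in_channels % simd_width == 0
    return in_ok and out_ok
```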
``OMP_NUM_THREADS``
^^^^^^^^^^^^^^^^^^^
The best resource for this configuration option is the Intel® OpenMP\* docs
at the following link: `Intel OpenMP documentation`_. ``OMP_NUM_THREADS``
defaults to the number of logical cores. To check the number of cores on your
system, you can run the following on the command-line to see the details
of your CPU:
.. code-block:: console
$ lscpu
Intra-op and inter-op parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``intra_op_parallelism_threads``
* ``inter_op_parallelism_threads``
Some frameworks, like TensorFlow\*, use these settings to improve performance;
however, they are often not sufficient for optimal performance. Framework-based
adjustments cannot access the underlying NUMA configuration in multi-socket
Intel® Xeon® processor-based platforms, which is a key requirement for
many kinds of inference-engine computations. See the next section on NUMA
performance to learn more about this performance feature available to systems
utilizing nGraph.
NUMA performance
~~~~~~~~~~~~~~~~~
NUMA stands for :abbr:`Non-Uniform Memory Access (NUMA)`. It indicates how each
CPU can access memory attached to each socket.
Without the "knowledge" of CPU socket and NUMA configuration, a simple thread
affinity (as in the case of thread pool) does not lead to optimal performance.
In fact, it can sometimes prohibitively decrease throughput; a core from socket
0 might have to continually access cache lines from the memory bank of socket 1,
increasing bandwidth demands on the Intel® Ultra-Path Interconnect (Intel® UPI).
This situation is exacerbated by the larger numbers of sockets found in 4-, 8-,
and 16-socket systems. We believe that users need to be aware of system-level
optimizations, in addition to framework-specific configuration parameters, to
achieve the best performance for NN workloads on CPU platforms. The nGraph
Compiler stack applies such optimizations in its transformers for Intel®
Architecture (IA), and thus can make more efficient use of the underlying
hardware.
.. _PyPI: https://pypi.org/project/ngraph-core
.. _KMP: https://software.intel.com/en-us/node/522691
.. _MKL-DNN: https://github.com/intel/mkl-dnn
.. _Intel OpenMP documentation: https://www.openmprtl.org/documentation
.. _Movidius: https://www.movidius.com/
.. _BUILDING.md: https://github.com/NervanaSystems/ngraph/blob/master/python/BUILDING.md
.. _GCC wiki for details: https://gcc.gnu.org/wiki/FunctionMultiVersioning
.. _following article may be helpful: https://clearlinux.org/documentation/clear-linux/tutorials/fmv
......@@ -46,7 +46,6 @@ nGraph Compiler Stack Documentation
frameworks/tensorflow_connect.rst
frameworks/onnx_integ.rst
frameworks/paddle_integ.rst
frameworks/other/index.rst
.. toctree::
:maxdepth: 1
......@@ -68,8 +67,7 @@ nGraph Compiler Stack Documentation
:caption: Backend Support
Basic Concepts <backends/index.rst>
backends/plaidml-ng-api/index.rst
Integrating Other Backends <backends/cpp-api.rst>
Adding New Backends <backends/cpp-api.rst>
.. toctree::
......@@ -89,9 +87,14 @@ nGraph Compiler Stack Documentation
.. toctree::
:maxdepth: 1
:caption: Debugging Graphs
inspection/index.rst
:caption: Diagnostics
inspection/debug_core.rst
inspection/debug_tf.rst
inspection/debug_onnx.rst
inspection/debug_paddle.rst
inspection/viz_tools.rst
inspection/profiling.rst
.. toctree::
......
.. inspection/debug_core.rst:
.. contents::
.. _debug_core:
Diagnostics
###########
.. important:: Many of the following flags are experimental and subject to change.
Build nGraph with various compile flags and environment variables to diagnose performance
and memory issues. See also :doc:`profiling`.
Compile Flags
=============
.. csv-table::
:header: "Compile Flag", "Description", "Default Value"
:widths: 20, 35, 5
:escape: ~
``NGRAPH_CODE_COVERAGE_ENABLE``, Enable code coverage data collection, ``FALSE``
``NGRAPH_DEBUG_ENABLE``, Enable output for ``NGRAPH_DEBUG`` statements, ``FALSE``
``NGRAPH_DEPRECATED_ENABLE``, Enable compiler deprecation pragmas for deprecated APIs (recommended only for development use), ``FALSE``
``NGRAPH_DEX_ONLY``, Build CPU DEX without codegen, ``FALSE``
``NGRAPH_DISTRIBUTED_ENABLE``, Enable distributed training using MLSL/OpenMPI, ``OFF``
``NGRAPH_DISTRIBUTED_MLSL_ENABLE``, Use MLSL, ``OFF``
``NGRAPH_DOC_BUILD_ENABLE``, Automatically build documentation, ``OFF``
``NGRAPH_FAST_MATH_ENABLE``, Enable fast math, ``ON``
``NGRAPH_HALIDE``, ,``OFF``
``NGRAPH_INTERPRETER_ENABLE``, Control the building of the ``INTERPRETER`` backend, ``TRUE``
``NGRAPH_INTERPRETER_STATIC_LIB_ENABLE``, Enable building the INTERPRETER backend as a static library, ``FALSE``
``NGRAPH_JSON_ENABLE``, Enable JSON based serialization and tracing features, ``TRUE``
``NGRAPH_LIB_VERSIONING_ENABLE``, Enable shared library versioning, ``FALSE``
``NGRAPH_MLIR_ENABLE``, Control the building of MLIR backend, ``FALSE``
``NGRAPH_NOP_ENABLE``, Control the building of the NOP backend, ``TRUE``
``NGRAPH_ONNX_IMPORT_ENABLE``, Enable ONNX importer, ``FALSE``
``NGRAPH_PLAIDML_ENABLE``, Enable the PlaidML backend, ``${PLAIDML_FOUND}``
``NGRAPH_PYTHON_BUILD_ENABLE``, Enable build of ``NGRAPH`` python package wheel, ``FALSE``
``NGRAPH_STATIC_LIB_ENABLE``, Enable building ``NGRAPH`` as a static library, ``FALSE``
``NGRAPH_TBB_ENABLE``, Only if (``NGRAPH_CPU_ENABLE``) Control usage of TBB for CPU backend, ``TRUE``
``NGRAPH_TOOLS_ENABLE``, Control the building of tools, ``TRUE``
``NGRAPH_UNIT_TEST_ENABLE``, Control the building of unit tests, ``TRUE``
``NGRAPH_USE_PREBUILT_LLVM``, Use a precompiled LLVM, ``FALSE``
``NGRAPH_USE_PREBUILT_MLIR``, Use the `precompiled MLIR`_, ``FALSE``
Environment Variables
=====================
.. important:: Many of the following flags are experimental and subject to change.
.. csv-table::
:header: "Environment Variable", "Description"
:widths: 20, 35
:escape: ~
``NGRAPH_DISABLE_LOGGING``, Disable printing all logs irrespective of build type
``NGRAPH_DISABLED_FUSIONS``, Disable specified fusions; specified as a ``;``-separated list and supports regex
``NGRAPH_ENABLE_REPLACE_CHECK``, Enables strict type checking in the copy constructor ``copy_with_new_args``
``NGRAPH_ENABLE_SERIALIZE_TRACING``, Generates one ``json`` file per pass to run with ``nbench`` for localized execution rather than whole-stack execution
``NGRAPH_ENABLE_TRACING``, Enables creating graph execution timelines to be viewed in ``chrome://tracing``; see also :doc:`viz_tools`
``NGRAPH_ENABLE_VISUALIZE_TRACING``, Enables creating a visual graph for each pass (``.svg`` files by default); see also :doc:`viz_tools`
``NGRAPH_FAIL_MATCH_AT``, Allows one to specify node name patterns to abort pattern matching at particular nodes. Helps debug an offending fusion
``NGRAPH_GTEST_INFO``, Enables printing info about a specific test
``NGRAPH_INTER_OP_PARALLELISM``, See :ref:`interop_intraop`
``NGRAPH_INTRA_OP_PARALLELISM``, See :ref:`interop_intraop`
``NGRAPH_PASS_ATTRIBUTES``, Specify pass-specific attributes as a semicolon-separated list to be enabled or disabled. Naming of pass attributes is up to the backends; see also `pass config`_
``NGRAPH_PASS_ENABLES``, Specify a semi-colon separated list to enable or disable a pass on core or backend. This will override the default enable/disable values
``NGRAPH_PROFILE_PASS_ENABLE``, Dump the name and execution time of each pass; shows per-pass time taken to compile
``NGRAPH_PROVENANCE_ENABLE``, Enable adding provenance info to nodes. This will also be added to serialized files.
``NGRAPH_SERIALIZER_OUTPUT_SHAPES``, Enable adding output shapes in the serialized graph
``NGRAPH_VISUALIZE_EDGE_JUMP_DISTANCE``, Calculated in code; helps prevent *long* edges between two nodes very far apart
``NGRAPH_VISUALIZE_EDGE_LABELS``, Set it to 1 in ``~/.bashrc``; adds a label to a graph edge when ``NGRAPH_ENABLE_VISUALIZE_TRACING=1``
``NGRAPH_VISUALIZE_TREE_OUTPUT_SHAPES``, Set it to 1 in ``~/.bashrc``; adds the output shape of a node when ``NGRAPH_ENABLE_VISUALIZE_TRACING=1``
``NGRAPH_VISUALIZE_TREE_OUTPUT_TYPES``, Set it to 1 in ``~/.bashrc``; adds the output type of a node when ``NGRAPH_ENABLE_VISUALIZE_TRACING=1``
``NGRAPH_VISUALIZE_TRACING_FORMAT``, Default format is ``.svg``. See also :doc:`viz_tools`
``OMP_NUM_THREADS``, See: `OpenMPI Runtime Library Documentation`_
.. _debug_tracer:
Debug Tracer
------------
Another diagnostic configuration option is to activate ``NGRAPH_CPU_DEBUG_TRACER``,
a runtime environment variable that supports extra logging and debug detail.
This is a useful tool for data scientists interested in outputs from logtrace
files that can, for example, help in tracking down model convergences. It can
also help engineers who might want to add their new ``Backend`` to an existing
framework to compare intermediate tensors/values to references from a CPU
backend.
To activate this tool, set the ``env`` var ``NGRAPH_CPU_DEBUG_TRACER=1``.
It will dump ``trace_meta.log`` and ``trace_bin_data.log``. The names of the
logfiles can be customized.
To specify the names of logs with those flags:
::
NGRAPH_TRACER_LOG="meta.log"
NGRAPH_BIN_TRACER_LOG="bin.log"
.. _interop_intraop:
Intra-op and inter-op parallelism
---------------------------------
* ``intra_op_parallelism_threads``
* ``inter_op_parallelism_threads``
Some frameworks, like TensorFlow\*, use these settings to improve performance;
however, they are often not sufficient for optimal performance. Framework-based
adjustments cannot access the underlying NUMA configuration in multi-socket
Intel® Xeon® processor-based platforms, which is a key requirement for
many kinds of inference-engine computations.
The meta_log contains::
kernel_name, serial_number_of_op, tensor_id, symbol_of_in_out, num_elements, shape, binary_data_offset, mean_of_tensor, variance_of_tensor
A line example from a unit-test might look like::
K=Add S=0 TID=0_0 >> size=4 Shape{2, 2} bin_data_offset=8 mean=1.5 var=1.25
The binary_log line contains::
tensor_id, binary data (tensor data)
A reference for the implementation of parsing these logfiles can also be found
in the unit test for this feature.
.. _pass config: https://github.com/NervanaSystems/ngraph/blob/a4a3031bb40f19ec28704f76de39762e1f27e031/src/ngraph/pass/pass_config.cpp#L54
.. _OpenMPI Runtime Library Documentation: https://www.openmprtl.org/documentation
.. _precompiled MLIR: https://github.com/IntelAI/mlir
\ No newline at end of file
.. inspection/debug_onnx:
.. _debug_onnx:
Debug ONNX
==========
.. note:: These flags are all disabled by default
.. csv-table::
:header: "Flag", "Description"
:widths: 20, 35
:escape: ~
``ONNXRUNTIME_NGRAPH_DUMP_OPS``, Dumps ONNX ops
``ONNXRUNTIME_NGRAPH_LRU_CACHE_SIZE``, Modify LRU cache size (``NGRAPH_EP_LRU_CACHE_DEFAULT_SIZE 500``)
\ No newline at end of file
.. inspection/debug_paddle.rst:
.. _debug_paddle:
Debug PaddlePaddle\*
====================
PaddlePaddle has its `own env vars`_.
.. _own env vars: https://github.com/PaddlePaddle/Paddle/blob/cdd46d7e022add8de56995e681fa807982b02124/python/paddle/fluid/__init__.py#L161-L227
\ No newline at end of file
.. inspection/debug_tf:
.. _debug_tf:
Debug TensorFlow\*
==================
.. note:: These flags are all disabled by default
For profiling with TensorFlow\* and ``nbench``, see :ref:`nbench_tf`.
.. csv-table::
:header: "Flag", "Description"
:widths: 20, 35
:escape: ~
``NGRAPH_ENABLE_SERIALIZE=1``,Generate nGraph-level serialized graphs
``NGRAPH_TF_VLOG_LEVEL=5``, Generate ngraph-tf logging info for different passes
``NGRAPH_TF_LOG_PLACEMENT=1``, Generate op placement log at stdout
``NGRAPH_TF_DUMP_CLUSTERS=1``, Dump Encapsulated TF Graphs formatted as ``NGRAPH_cluster_<cluster_num>``
``NGRAPH_TF_DUMP_GRAPHS=1``,"Dump TF graphs for different passes: precapture, capture, unmarked, marked, clustered, declustered, encapsulated"
``TF_CPP_MIN_VLOG_LEVEL=1``, Enable TF CPP logs
``NGRAPH_TF_DUMP_DECLUSTERED_GRAPHS=1``, Dump graphs with final clusters assigned. Use this to view TF computation graph with colored nodes indicating clusters
``NGRAPH_TF_USE_LEGACY_EXECUTOR``, This flag will be obsolete soon.
.. inspection/index:
.. _inspection:
Visualization Tools
###################
nGraph provides serialization and deserialization facilities, along with the
ability to create image formats or a PDF.
When visualization is enabled, ``svg`` files for your graph get generated. The
default format can be adjusted by setting the ``NGRAPH_VISUALIZE_TRACING_FORMAT``
flag to another format, like PNG or PDF.
.. note:: Large graphs are usually not legible with formats like PDF.
Large graphs may require additional work to get into a human-readable format.
On the back end, very long edges will need to be cut to make (for example) a
hard-to-render training graph tractable. This can be a tedious process, so
incorporating the help of a rendering engine or third-party tool like those
listed below may be useful.
.. Additional scripts
.. ==================
.. We have provided a script to convert the `most common default output`_, nGraph
.. ``JSON``, to an output that is better able to handle detailed graphs; however,
.. we do not offer user support for this script. The script will produce a
.. ``.graphml`` file that can be imported and inspected with third-party tools
.. like:
#. `Gephi`_
#. `Cytoscape`_
.. #. `Netron`_ support tentatively planned to come soon
.. _CMakeLists.txt: https://github.com/NervanaSystems/ngraph/blob/master/CMakeLists.txt
.. _most common default output: https://github.com/NervanaSystems/ngraph/contrib/tools/graphml/ngraph_json_to_graphml.py
.. _visualize_tree.cpp: https://github.com/NervanaSystems/ngraph/blob/master/src/ngraph/pass/visualize_tree.cpp
.. _Netron: https://github.com/lutzroeder/netron/blob/master/README.md
.. _Gephi: https://gephi.org
.. _Cytoscape: https://cytoscape.org
:orphan:
.. _inspection:
Debug Tools
###########
.. toctree::
:maxdepth: 1
debug_core.rst
debug_tf.rst
debug_onnx.rst
debug_paddle.rst
viz_tools.rst
profiling.rst
.. inspection/profiling.rst:
.. _profiling:
Performance testing with ``nbench``
###################################
The nGraph Compiler stack includes the ``nbench`` tool, which provides
additional methods of assessing or debugging performance issues.
If you follow the build process under :doc:`../buildlb`, the
``NGRAPH_TOOLS_ENABLE`` flag defaults to ``ON`` and automatically
builds ``nbench``. As its name suggests, ``nbench`` can be used
to benchmark any nGraph-serialized model with a given backend.
To benchmark an already-serialized nGraph ``.json`` model with, for
example, a ``CPU`` backend, run ``nbench`` as follows.
.. code-block:: console
$ cd ngraph/build/src/tools
$ nbench/nbench -b CPU -i 1 -f <serialized_json file>
Samples for testing can be found under ``ngraph/test/models``.
.. _nbench:
``nbench``
==========
.. code-block:: none
Benchmark an nGraph JSON model with a given backend.
SYNOPSIS
nbench [-f <filename>] [-b <backend>] [-i <iterations>]
OPTIONS
-f|--file Serialized model file
-b|--backend Backend to use (default: CPU)
-d|--directory Directory to scan for models. All models are benchmarked.
-i|--iterations Iterations (default: 10)
-s|--statistics Display op statistics
-v|--visualize Visualize a model (WARNING: requires Graphviz installed)
--timing_detail Gather detailed timing
-w|--warmup_iterations Number of warm-up iterations
--no_copy_data Disable copy of input/result data every iteration
--dot Generate Graphviz dot file
--double_buffer Double buffer inputs and outputs
.. _nbench_tf:
Use ``nbench`` to ease end-to-end debugging for TensorFlow\*
------------------------------------------------------------
Rather than run a TensorFlow\* model "end-to-end" all the time,
developers who notice a problem with performance or memory usage
can generate a unique serialized model for debugging by using
``NGRAPH_ENABLE_SERIALIZE=1``. This serialized model can then be
run and re-run with ``nbench`` to efficiently experiment with any
changes in ``ngraph`` space; developers can make changes and test
changes without the overhead of a complete end-to-end compilation
for each change.
Find or display version
-----------------------
If you're working with the :doc:`../../python_api/index`, the following command
may be useful:
.. code-block:: console
python3 -c "import ngraph as ng; print('nGraph version: ',ng.__version__)";
To manually build a newer version than is available from the latest `PyPI`_
(:abbr:`Python Package Index (PyPI)`), see our nGraph Python API `BUILDING.md`_
documentation.
.. _PyPI: https://pypi.org/project/ngraph-core/
.. _BUILDING.md: https://github.com/NervanaSystems/ngraph/blob/master/python/BUILDING.md
.. inspection/viz_tools.rst:
.. _viz_tools:
General Visualization Tools
###########################
nGraph provides serialization and deserialization facilities, along with the
ability to create image formats or a PDF.
``NGRAPH_ENABLE_VISUALIZE_TRACING=1`` enables visualization and generates graph
visualization files.
.. note:: Using ``NGRAPH_ENABLE_VISUALIZE_TRACING=1`` will affect performance.
When visualization is enabled, ``svg`` files for your graph get generated. The
default format can be adjusted by setting the ``NGRAPH_VISUALIZE_TRACING_FORMAT``
flag to another format, like PNG or PDF.
.. note:: Large graphs are usually not legible with formats like PDF.
Large graphs may require additional work to get into a human-readable format.
On the back end, very long edges will need to be cut to make (for example) a
hard-to-render training graph tractable. This can be a tedious process, so
incorporating the help of a rendering engine or third-party tool like one
listed below may be useful.
#. `Gephi`_
#. `Cytoscape`_
#. `Netron`_
.. Additional scripts
.. ==================
.. We have provided a script to convert the `most common default output`_, nGraph
.. ``JSON``, to an output that is better able to handle detailed graphs; however,
.. we do not offer user support for this script. The script will produce a
.. ``.graphml`` file that can be imported and inspected with third-party tools
.. like those listed above.
.. _most common default output: https://github.com/NervanaSystems/ngraph/contrib/tools/graphml/ngraph_json_to_graphml.py
.. _Netron: https://github.com/lutzroeder/netron/blob/master/README.md
.. _Gephi: https://gephi.org
.. _Cytoscape: https://cytoscape.org
......@@ -21,22 +21,22 @@ matrix transposition, and also more general cases on higher-rank tensors.
Inputs
------
+-----------------+-------------------------+---------------------------------------------+
| Name | Element Type | Shape |
+=================+=========================+=============================================+
| ``arg`` | Any | Any |
+-----------------+-------------------------+---------------------------------------------+
| ``input_order`` | ``element::i64`` | ``[n]``, where `n`` is the rank of ``arg``. |
+-----------------+-------------------------+---------------------------------------------+
+-----------------+-------------------------+----------------------------------------------+
| Name | Element Type | Shape |
+=================+=========================+==============================================+
| ``arg`` | Any | Any |
+-----------------+-------------------------+----------------------------------------------+
| ``input_order`` | ``element::i64`` | ``[n]``, where ``n`` is the rank of ``arg``. |
+-----------------+-------------------------+----------------------------------------------+
Outputs
-------
+-----------------+-------------------------+-------------------------------------------------------------------------------+
| Name | Element Type | Shape |
+=================+=========================+===============================================================================+
| ``output`` | Same as ``arg`` | ``P(ShapeOf(arg))``, where `P` is the permutation supplied for `input_order`. |
+-----------------+-------------------------+-------------------------------------------------------------------------------+
+-----------------+-------------------------+---------------------------------------------------------------------------------+
| Name | Element Type | Shape |
+=================+=========================+=================================================================================+
| ``output`` | Same as ``arg`` | ``P(ShapeOf(arg))``, where *P* is the permutation supplied for ``input_order``. |
+-----------------+-------------------------+---------------------------------------------------------------------------------+
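The output-shape rule in the table above can be sketched in a few lines (``transpose_shape`` is a hypothetical helper for illustration, not part of the nGraph API):

```python
def transpose_shape(arg_shape, input_order):
    """Apply the permutation input_order to arg_shape.

    input_order must contain every integer in [0, n-1] exactly once,
    where n is the rank of arg.
    """
    assert sorted(input_order) == list(range(len(arg_shape)))
    return tuple(arg_shape[axis] for axis in input_order)
```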
The input ``input_order`` must be a vector of shape ``[n]``, where ``n`` is the
rank of ``arg``, and must contain every integer in the range ``[0,n-1]``. This
......@@ -69,6 +69,6 @@ Not yet implemented.
C++ Interface
=============
.. doxygenclass:: ngraph::op::v0::Transpose
.. doxygenclass:: ngraph::op::v1::Transpose
:project: ngraph
:members:
......@@ -26,6 +26,7 @@ Core updates for |version|
Latest documentation updates
----------------------------
+ Better debugging documentation
+ Dynamic Shapes and APIs
+ Provenance
+ Add linkages and overview for quantization APIs
......
......@@ -6,7 +6,6 @@
:maxdepth: 1
introduction
tutorials/index.rst
* :ref:`Framework Support <framework_support>`
......@@ -63,12 +62,18 @@
frameworks/validated/list.rst
* :ref:`Debugging Graphs <inspection>`
* :ref:`Diagnostics <inspection>`
.. toctree::
:maxdepth: 1
inspection/index.rst
inspection/debug_core.rst
inspection/debug_tf.rst
inspection/debug_onnx.rst
inspection/debug_paddle.rst
inspection/viz_tools.rst
inspection/profiling.rst
* :ref:`Contribution <contribution_guide>`
......