.. _framework/index:

#############################
Integrate Generic Frameworks   
#############################

This section is written for framework architects or engineers who want to 
optimize brand-new, generic, or less widely-supported frameworks. In it, we 
share some of what we have learned from our "framework Direct Optimization 
(framework DO)" work and from writing custom bridge code, such as that for 
our `ngraph tensorflow bridge`_.



.. important:: This section contains articles for framework owners or developers
   who want to incorporate the nGraph library directly into their framework and 
   optimize for some specific compute-time characteristic. 


.. toctree::
   :maxdepth: 1 

   generic.rst



When using a framework to run a model or deploy an algorithm on nGraph 
devices, there are some additional configuration options that can be 
applied -- manually on the command line or via scripting -- to improve 
performance. Fine-tuning an nGraph-enabled device is as much an art as it 
is a science; there are virtually limitless ways to do so. 
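
For example, on CPU backends that use OpenMP (such as those built on Intel 
MKL-DNN), threading and affinity are commonly controlled through environment 
variables. The snippet below is a minimal sketch rather than a recommendation: 
the variables shown (``OMP_NUM_THREADS``, ``KMP_AFFINITY``, ``KMP_BLOCKTIME``) 
are standard knobs honored by OpenMP-based runtimes, but the values are 
illustrative placeholders that should be measured and tuned for your own 
hardware and workload.

.. code-block:: python

   import os

   # Threading and affinity knobs commonly honored by OpenMP-based CPU
   # backends. The values below are placeholders; the right settings
   # depend on your hardware and workload.
   os.environ.setdefault("OMP_NUM_THREADS", "16")   # number of worker threads
   os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")
   os.environ.setdefault("KMP_BLOCKTIME", "1")      # ms a thread spins before sleeping

   # Import the framework only *after* setting the variables, since many
   # runtimes read them once at load time.
   # import tensorflow as tf   # or another nGraph-enabled framework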

Since a framework is typically designed around some feature, such as fast 
training on image data, inference on a mobile device, or support for voice 
and speech pattern recognition, it cannot optimize for all possibilities at 
the same time.

In general, the larger and more complex a framework is, the harder it becomes 
to navigate it and to extract the best performance from it; configuration 
options that are enabled by "default" on the framework side can sometimes slow 
down compilation without the developer being any the wiser. Sometimes only 
`a few small`_ adjustments are needed to increase performance. Likewise, a 
minimalistic framework that is designed around one specific kind of model can 
sometimes offer significant performance-improvement opportunities by lowering 
overhead. 

Right now the preferred way for a data scientist to get better performance is 
to shop around and select a framework that is already designed or optimized 
for some characteristic or trait of the model they want to build, test, tweak, 
or run. One challenge for the framework developer, then, is to differentiate 
from the pack by providing a means for the data scientist to obtain 
reproducible results. The other is to provide sufficient documentation, or at 
least sufficient hints, for how to do any "fine-tuning" for specific use cases. 

Here is how this has worked in creating the 
:doc:`direct optimizations <../framework-integration-guides>` we've shared 
with the developer community: our engineering teams carefully 
`tune the workload to extract best performance`_ 
from a specific :abbr:`DL (Deep Learning)` model embedded in a specific framework 
that is training a specific dataset. Our forks of those frameworks adjust the 
code and/or explain how to set the parameters needed to achieve reproducible 
results. 

Some of the ways we attempt to improve performance include: 

* Testing and recording the results of various system-level configuration 
  options and enabled or disabled flags,
* Compiling with a mix of custom environment variables (a sketch of such a 
  sweep follows this list),
* Finding semi-related comparisons for benchmarking [#1]_,
* Tuning lower levels of the system so that the machine-learning algorithm can 
  learn faster or more accurately than it did on previous runs, and 
* Incorporating various :doc:`../ops/index` to build graphs more efficiently. 
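
As a rough illustration of the first two items above, the following sketch 
times the same (hypothetical) training command under a few combinations of 
environment variables and records the results. The command, variable names, 
and values are placeholders; the point is simply to change one setting at a 
time and keep a record of what was measured.

.. code-block:: python

   import itertools
   import json
   import os
   import subprocess
   import time

   # Hypothetical workload; substitute your actual training or inference command.
   COMMAND = ["python", "train_model.py", "--epochs", "1"]

   # Candidate settings to sweep; the values here are illustrative only.
   SWEEP = {
       "OMP_NUM_THREADS": ["8", "16"],
       "KMP_BLOCKTIME": ["0", "1"],
   }

   results = []
   keys = list(SWEEP)
   for values in itertools.product(*(SWEEP[k] for k in keys)):
       env = dict(os.environ, **dict(zip(keys, values)))
       start = time.time()
       subprocess.run(COMMAND, env=env, check=True)
       results.append({"settings": dict(zip(keys, values)),
                       "seconds": round(time.time() - start, 2)})

   # Keep a record so that runs stay reproducible and comparable.
   with open("sweep_results.json", "w") as f:
       json.dump(results, f, indent=2)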

This hand-tuning approach, however, is obviously not a scalable solution for 
developers on the framework side who are trying to support multiple use cases. 
Nor is it ideal for teams looking to pivot toward or innovate multi-layer 
solutions based on something **other than training speed**, such as accuracy 
or precision. Chasing performance improvements does eventually yield a 
diminishing :abbr:`Return on Investment (ROI)`, though it is up to the 
framework developer to decide when that point is reached for each of their 
customers.    

For these reasons, we're providing some of the more commonly-used options for 
fine-tuning various code deployments to the nGraph-enabled devices we 
currently support. Watch this section as we enable new devices and post new 
updates. 

.. rubric:: Footnotes

.. [#1] Benchmarking performance of DL systems is a young discipline; it is a
   good idea to be vigilant for results based on atypical distortions in the
   configuration parameters. Every topology is different, and performance
   changes can be attributed to multiple causes. Also watch out for the word
   "theoretical" in comparisons; actual performance should not be compared
   to theoretical performance.


.. _ngraph tensorflow bridge: http://ngraph.nervanasys.com/docs/latest/framework-integration-guides.html#tensorflow
.. _tune the workload to extract best performance: https://ai.intel.com/accelerating-deep-learning-training-inference-system-level-optimizations
.. _a few small: https://software.intel.com/en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi
.. _Movidius: https://www.movidius.com/