.. batch_norm_training.rst:

#################
BatchNormTraining
#################

.. code-block:: cpp

   BatchNormTraining  // Compute mean and variance from the input.


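Description
===========

Normalizes ``input`` per channel using the mean and variance computed from
``input`` itself, then scales the result by ``gamma`` and shifts it by ``beta``.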


Inputs
------

+---------------------+-------------------------+------------------------------+
| Name                | Element Type            | Shape                        |
+=====================+=========================+==============================+
| ``input``           | real                    | :math:`(\bullet, C, \ldots)` |
+---------------------+-------------------------+------------------------------+
| ``gamma``           | same as ``input``       | :math:`(C)`                  |
+---------------------+-------------------------+------------------------------+
| ``beta``            | same as ``input``       | :math:`(C)`                  |
+---------------------+-------------------------+------------------------------+


Attributes
----------

+------------------+--------------------+--------------------------------------------------------+
| Name             | Type               | Notes                                                  |
+==================+====================+========================================================+
| ``epsilon``      | ``double``         | Small bias added to variance to avoid division by 0.   |
+------------------+--------------------+--------------------------------------------------------+

Outputs
-------

+---------------------+-------------------------+-----------------------------+
| Name                | Element Type            | Shape                       |
+=====================+=========================+=============================+
| ``normalized``      | same as ``gamma``       | Same as ``input``           |
+---------------------+-------------------------+-----------------------------+
| ``batch_mean``      | same as ``gamma``       | :math:`(C)`                 |
+---------------------+-------------------------+-----------------------------+
| ``batch_variance``  | same as ``gamma``       | :math:`(C)`                 |
+---------------------+-------------------------+-----------------------------+

The ``batch_mean`` and ``batch_variance`` outputs are computed per-channel from 
``input``.


Mathematical Definition
=======================

The axes of the input fall into two categories: positional and channel, with 
channel being axis 1. For each position, there are :math:`C` channel values, 
each normalized independently.

Normalization of a channel sample is controlled by two values:

*  the ``batch_mean`` :math:`\mu`, and

*  the ``batch_variance`` :math:`\sigma^2`;

and by two per-channel inputs: the scale :math:`\gamma` (``gamma``) and the
shift :math:`\beta` (``beta``).

The values for :math:`\mu` and :math:`\sigma^2` are the mean and variance of
``input``, computed separately for each channel over all positional axes
(every axis except axis 1).

.. math::

   \mu_c &= \mathop{\mathbb{E}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\
   \sigma^2_c &= \mathop{\mathtt{Var}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\
   \mathtt{normalized}_{\bullet, c, \ldots} &= \frac{\mathtt{input}_{\bullet, c, \ldots}-\mu_c}{\sqrt{\sigma^2_c+\epsilon}}\gamma_c+\beta_c
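
The definition above can be sketched in plain C++ as follows. This is an
illustrative sketch only, not the nGraph implementation: the names
``BatchNormResult`` and ``batch_norm_training_sketch`` are hypothetical, the
input is assumed to be a row-major buffer of shape :math:`(N, C, S)` with all
trailing positional axes flattened into :math:`S`, and the variance is assumed
to be the population variance (divided by :math:`N \cdot S`).

.. code-block:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <vector>

   // Hypothetical result type; the field names mirror the Outputs table.
   struct BatchNormResult
   {
       std::vector<double> normalized;     // same shape as input
       std::vector<double> batch_mean;     // shape (C)
       std::vector<double> batch_variance; // shape (C)
   };

   // Assumes `input` is row-major with shape (n, c, s), where s is the
   // product of all positional axes after the channel axis, and that the
   // variance is the population variance (divide by n * s).
   BatchNormResult batch_norm_training_sketch(const std::vector<double>& input,
                                              const std::vector<double>& gamma,
                                              const std::vector<double>& beta,
                                              std::size_t n,
                                              std::size_t c,
                                              std::size_t s,
                                              double epsilon)
   {
       assert(input.size() == n * c * s);
       assert(gamma.size() == c && beta.size() == c);

       BatchNormResult r{std::vector<double>(input.size()),
                         std::vector<double>(c),
                         std::vector<double>(c)};
       const double count = static_cast<double>(n * s);

       for (std::size_t ch = 0; ch < c; ++ch)
       {
           // Mean over every position that shares this channel.
           double sum = 0.0;
           for (std::size_t i = 0; i < n; ++i)
               for (std::size_t j = 0; j < s; ++j)
                   sum += input[(i * c + ch) * s + j];
           const double mean = sum / count;

           // Population variance over the same positions.
           double sq = 0.0;
           for (std::size_t i = 0; i < n; ++i)
               for (std::size_t j = 0; j < s; ++j)
               {
                   const double d = input[(i * c + ch) * s + j] - mean;
                   sq += d * d;
               }
           const double variance = sq / count;

           // Normalize, then scale by gamma and shift by beta.
           const double scale = gamma[ch] / std::sqrt(variance + epsilon);
           for (std::size_t i = 0; i < n; ++i)
               for (std::size_t j = 0; j < s; ++j)
               {
                   const std::size_t idx = (i * c + ch) * s + j;
                   r.normalized[idx] = (input[idx] - mean) * scale + beta[ch];
               }

           r.batch_mean[ch] = mean;
           r.batch_variance[ch] = variance;
       }
       return r;
   }

For example, with an input of shape :math:`(2, 2, 1)` holding ``{1, 0, 3, 4}``,
``gamma = {2, 1}``, ``beta = {0.5, -1}``, and ``epsilon = 0``, this sketch
yields ``batch_mean = {2, 2}``, ``batch_variance = {1, 4}``, and
``normalized = {-1.5, -2, 2.5, 0}``.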

Backprop
========

.. math::

   [\overline{\texttt{input}}, \overline{\texttt{gamma}}, \overline{\texttt{beta}}]=\\
   \mathop{\texttt{BatchNormTrainingBackprop}}(\texttt{input},\texttt{gamma},\texttt{beta},\texttt{mean},\texttt{variance},\overline{\texttt{normalized}}).
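
For orientation, the standard batch-normalization gradients are sketched
below. They assume that :math:`\sigma^2_c` is the population variance over the
:math:`m` positions of channel :math:`c`, and that no adjoint flows in through
``batch_mean`` or ``batch_variance``; whether ``BatchNormTrainingBackprop``
matches this form term for term is not stated on this page. Here :math:`i`
ranges over the :math:`m` positions of a channel, :math:`\overline{y}` denotes
the adjoint of ``normalized``, and :math:`\hat{x}` is the normalized value
before scaling and shifting.

.. math::

   \hat{x}_{i,c} &= \frac{\mathtt{input}_{i,c}-\mu_c}{\sqrt{\sigma^2_c+\epsilon}}\\
   \overline{\beta}_c &= \sum_{i} \overline{y}_{i,c}\\
   \overline{\gamma}_c &= \sum_{i} \overline{y}_{i,c}\,\hat{x}_{i,c}\\
   \overline{\mathtt{input}}_{i,c} &= \frac{\gamma_c}{\sqrt{\sigma^2_c+\epsilon}}\left(\overline{y}_{i,c}-\frac{\overline{\beta}_c}{m}-\hat{x}_{i,c}\frac{\overline{\gamma}_c}{m}\right)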



C++ Interface
==============

.. doxygenclass:: ngraph::op::BatchNormTraining
   :project: ngraph
   :members: