.. avg_pool.rst:


.. code-block:: cpp

   AvgPool  // Average Pooling operation


Average pooling windows its input and produces an average for each window.


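+-----------------+----------------+--------------------------------+--------------------+
| Name            | Element Type   | Shape                          | Notes              |
+=================+================+================================+====================+
| ``data``        | Any            | :math:`(N,C,d_1,\ldots,d_n)`   | :math:`n>0, d_i>0` |
+-----------------+----------------+--------------------------------+--------------------+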


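+----------------------+-----------------+----------------------------------+
| Name                 | Type            | Notes                            |
+======================+=================+==================================+
| ``w``                | ``Shape[n]``    | Window shape. :math:`w_i\le d_i` |
+----------------------+-----------------+----------------------------------+
| ``s``                | ``Strides[n]``  | Window strides.                  |
+----------------------+-----------------+----------------------------------+
| ``p``                | ``Shape[n]``    | Padding below.                   |
+----------------------+-----------------+----------------------------------+
| ``q``                | ``Shape[n]``    | Padding above.                   |
+----------------------+-----------------+----------------------------------+
| ``i``                | ``Boolean``     | Include padding in average.      |
+----------------------+-----------------+----------------------------------+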


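+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | Any                     | :math:`(N,C,d'_1,\ldots,d'_n)` |
+-----------------+-------------------------+--------------------------------+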

Average pooling takes as its input a batch tensor ``data`` of shape
:math:`(N,C,d_1,\ldots,d_n)`, where :math:`N` is the batch size and
:math:`C > 0` is the number of channels (sometimes called features).
The dimensions :math:`(d_1,\ldots,d_n)` correspond to the shape of an
:math:`n`-dimensional data item in a batch. For example, where
:math:`n=2`, the data may represent a two-dimensional image. It also
takes five attributes:

1. *window shape*,
2. *window movement strides* (optional),
3. *padding below* (optional),
4. *padding above* (optional),
5. *include padding in average*.

The shape of ``output`` is :math:`(N,C,d'_1,\ldots,d'_n)`, where
:math:`d'_i = \lceil \frac{p_i + d_i + q_i - w_i + 1}{s_i} \rceil`.
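
The output-shape formula can be sketched in a few lines of C++. This is
only an illustration under stated assumptions:
``avg_pool_output_shape`` is a hypothetical helper, not part of the
nGraph API, and it assumes all five vectors have the same length
:math:`n`.

.. code-block:: cpp

   #include <cstddef>
   #include <stdexcept>
   #include <vector>

   // Hypothetical helper (not part of nGraph): spatial output shape
   // (d'_1, ..., d'_n) of an average pooling operation.
   // d: spatial data dims, w: window shape, s: strides,
   // p: padding below, q: padding above. All have length n.
   std::vector<std::size_t> avg_pool_output_shape(const std::vector<std::size_t>& d,
                                                  const std::vector<std::size_t>& w,
                                                  const std::vector<std::size_t>& s,
                                                  const std::vector<std::size_t>& p,
                                                  const std::vector<std::size_t>& q)
   {
       std::vector<std::size_t> out(d.size());
       for (std::size_t i = 0; i < d.size(); ++i)
       {
           const std::size_t padded = p[i] + d[i] + q[i];
           if (s[i] == 0 || w[i] == 0 || w[i] > padded)
           {
               throw std::invalid_argument("invalid window or stride");
           }
           // ceil((padded - w + 1) / s) in integer arithmetic:
           // ceil(x / s) == (x + s - 1) / s for positive s.
           out[i] = (padded - w[i] + s[i]) / s[i];
       }
       return out;
   }

For instance, with :math:`d = (3,3)`, :math:`w = (2,2)`,
:math:`s = (1,1)`, :math:`p = (1,1)` and :math:`q = (0,0)`, the helper
returns :math:`(3,3)`.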

**Informal definition:**
If :math:`\textit{i}` is :math:`\textit{true}`, then averages are computed as though the
padding region contained regular elements of value zero.
If :math:`\textit{i}` is :math:`\textit{false}`, then averages are computed using only the non-padding
tensor elements that are present in each window.

*Example:* Consider two instances of this operator with the following attributes:
:math:`\textit{w} = (2,2)`,
:math:`\textit{s} = (1,1)`,
:math:`\textit{p} = (1,1)`,
and (in one instance) :math:`\textit{i} = false` or (in the other instance) :math:`\textit{i} = true`.

Consider how those two operator instances would handle this input tensor:

.. math::

  T_\textit{in} = \begin{bmatrix}
     1     &  3     &  5     & \ldots \\
     7     & 11     & 13     & \ldots \\
    17     & 19     & 23     & \ldots \\
    \vdots & \vdots & \vdots & \ddots
    \end{bmatrix}

Applying the padding indicated by the value of :math:`\textit{p}`, we have the padded image of :math:`T_\textit{in}`
as follows:

.. math::

  T_\textit{in,padded} = \begin{bmatrix}
   (0) & (0)     & (0)    & (0)      & \ldots \\
   (0) &   1     &   3    &   5      & \ldots \\
   (0) &   7     &  11    &  13      & \ldots \\
   (0) &  17     &  19    &  23      & \ldots \\
   (0) &  \vdots & \vdots &  \vdots  & \ddots
   \end{bmatrix}

Now consider how the two variations of this example's *AvgPool* operator will compute the "average" value
of the top-left window, which contains exactly the elements:

.. math::

   \begin{bmatrix}
   (0) & (0)   \\
   (0) &   1
   \end{bmatrix}

If :math:`\textit{i} = false`, then the operator simply ignores the padding elements.  It therefore computes the
average of the single-element set :math:`\{ 1 \}`, yielding :math:`1.0`.

If :math:`\textit{i} = true`, then the operator computes the average of the set :math:`\{ 0, 0, 0, 1\}`,
yielding :math:`0.25`.
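
The two behaviours in this example can be reproduced with a small
sketch. ``window_average`` is a hypothetical helper written for this
page, not an nGraph function. It assumes a non-empty two-dimensional
input, the same padding below ``p`` on both axes, and a window origin
given in padded coordinates.

.. code-block:: cpp

   #include <cstddef>
   #include <vector>

   // Hypothetical helper (not part of nGraph): average of one w_h x w_w
   // window whose top-left corner is at (row, col) in padded coordinates.
   // Cells outside the original tensor are padding and contribute zero.
   double window_average(const std::vector<std::vector<double>>& t_in,
                         std::size_t row, std::size_t col,
                         std::size_t w_h, std::size_t w_w,
                         std::size_t p, bool include_padding)
   {
       double sum = 0.0;
       std::size_t valid = 0; // non-padding elements inside the window
       for (std::size_t r = row; r < row + w_h; ++r)
       {
           for (std::size_t c = col; c < col + w_w; ++c)
           {
               const bool in_bounds = r >= p && r < p + t_in.size() &&
                                      c >= p && c < p + t_in[0].size();
               if (in_bounds)
               {
                   sum += t_in[r - p][c - p];
                   ++valid;
               }
           }
       }
       // With include_padding the divisor is the full window size;
       // otherwise only the non-padding elements are counted.
       const std::size_t divisor = include_padding ? w_h * w_w : valid;
       return sum / static_cast<double>(divisor);
   }

Called on the top-left :math:`2 \times 2` window of the example tensor
with ``p = 1``, the helper returns :math:`1.0` when ``include_padding``
is ``false`` and :math:`0.25` when it is ``true``.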

*Note:* This operator is ill-defined when *both* of the following conditions hold:
(1) :math:`\textit{i} = false`, and (2) the operator's other attribute values indicate
that at least one window will contain only padding elements.

**Formal definition:**
*In the absence of padding*, given an input data batch tensor
:math:`T_\textit{in}`, the output tensor is defined by the equation

.. math::

   T_\textit{out}[a,c,i_1,\ldots,i_n] =
   \frac{\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
   T_\textit{in}[a,c,j_1,\ldots,j_n]}{\prod_{k=1}^n w_k}

*In the presence of padding*, we do not always want to divide by the
number of elements in the window, since some of the output points are
determined by a window that partly hangs beyond the edge of the
tensor. In this case we can define the output via a few intermediate
steps.

First define the *sum tensor* :math:`T_\textit{sum}`, with shape
:math:`(N,C,d'_1,\ldots,d'_n)`, as follows.

.. math::

   T_\textit{sum}[a,c,i_1,\ldots,i_n] =
   \sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
   \textit{val}[a,c,j_1,\ldots,j_n]

where :math:`\textit{val}` is defined by

.. math::

   \textit{val}[a,c,j_1,\ldots,j_n] =
   \begin{cases}
   T_\textit{in}[a,c,j_1 - p_1,\ldots,j_n - p_n]&\text{if for all } k, p_k \le j_k < p_k + d_k\\
   0&\text{otherwise}.
   \end{cases}

Second, define the *divisor tensor* :math:`T_\textit{div}`, with shape :math:`(N,C,d'_1,\ldots,d'_n)`, as follows.

.. math::

   T_\textit{div}[a,c,i_1,\ldots,i_n] =
   \sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
   \textit{val}[a,c,j_1,\ldots,j_n]

where :math:`\textit{val}` is now defined by

.. math::

   \textit{val}[a,c,j_1,\ldots,j_n] =
   \begin{cases}
   1&\text{if for all }k, p_k \le j_k < p_k + d_k\\
   0&\text{otherwise}.
   \end{cases}

Finally, define :math:`T_\textit{out}` as the result of elementwise
dividing :math:`T_\textit{sum}` by :math:`T_\textit{div}`.  Note that
at positions where :math:`T_\textit{div}` is zero, values may be
infinity or NaN.  (This corresponds to a condition where the pooling
window is completely out of bounds, encompassing no valid values.)
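
The sum-then-divide construction can be illustrated for the
one-dimensional case (:math:`n = 1`) and a single batch item and
channel. This is a minimal sketch, not nGraph code:
``avg_pool_1d_exclude_padding`` is a hypothetical name, and the function
assumes :math:`s > 0`, :math:`0 < w \le p + d + q`, and IEEE 754
floating-point arithmetic.

.. code-block:: cpp

   #include <cstddef>
   #include <vector>

   // Hypothetical 1-D sketch (not nGraph code) of T_out = T_sum / T_div
   // for one batch item and one channel, with padding excluded from
   // the average.
   std::vector<double> avg_pool_1d_exclude_padding(const std::vector<double>& data,
                                                   std::size_t w, std::size_t s,
                                                   std::size_t p, std::size_t q)
   {
       const std::size_t d = data.size();
       // d' = ceil((p + d + q - w + 1) / s), in integer arithmetic.
       const std::size_t out_len = (p + d + q - w + s) / s;
       std::vector<double> out(out_len);
       for (std::size_t i = 0; i < out_len; ++i)
       {
           double t_sum = 0.0; // sum of in-bounds elements in the window
           double t_div = 0.0; // count of in-bounds elements in the window
           for (std::size_t j = s * i; j < s * i + w; ++j)
           {
               if (j >= p && j < p + d) // j is a padded coordinate
               {
                   t_sum += data[j - p];
                   t_div += 1.0;
               }
           }
           // A window made only of padding gives 0.0 / 0.0, which is
           // NaN under IEEE 754 arithmetic.
           out[i] = t_sum / t_div;
       }
       return out;
   }

For ``data = {1, 3, 5}`` with ``w = 2``, ``s = 1``, ``p = 1`` and
``q = 0``, the windows cover ``{pad, 1}``, ``{1, 3}`` and ``{3, 5}``,
so the sketch returns ``{1, 2, 4}``.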


C++ Interface
=============

.. doxygenclass:: ngraph::op::v0::AvgPool
   :project: ngraph