fixed gpu docs (broken links, missing description, etc)

Commit d888b810 authored Mar 23, 2011 by Vladislav Vinogradov
13 changed files
--- a/modules/core/doc/basic_structures.rst
+++ b/modules/core/doc/basic_structures.rst
@@ -343,6 +343,8 @@ The class ``RotatedRect`` replaces the old ``CvBox2D`` and fully compatible with
 TermCriteria
 ------------
+.. c:type:: TermCriteria
 Termination criteria for iterative algorithms ::
    class TermCriteria
@@ -634,6 +636,8 @@ However, if the object is deallocated in a different way, then the specialized m
 Mat
 ---
+.. c:type:: Mat
 OpenCV C++ n-dimensional dense array class. ::
    class CV_EXPORTS Mat

--- a/modules/gpu/doc/camera_calibration_and_3d_reconstruction.rst
+++ b/modules/gpu/doc/camera_calibration_and_3d_reconstruction.rst
@@ -3,13 +3,13 @@ Camera Calibration and 3d Reconstruction
 .. highlight:: cpp
-.. index:: gpu::StereoBM_GPU
-.. _gpu::StereoBM_GPU:
+.. index:: gpu::StereoBM_GPU
 gpu::StereoBM_GPU
 -----------------
-.. c:type:: gpu::StereoBM_GPU
+.. cpp:class:: gpu::StereoBM_GPU
 The class for computing stereo correspondence using block matching algorithm. ::
@@ -40,22 +40,24 @@ The class for computing stereo correspondence using block matching algorithm. ::
        ...
    };
+This class computes the disparity map using the block matching algorithm. The class also performs pre- and post-filtering steps: Sobel prefiltering (if the ``PREFILTER_XSOBEL`` flag is set) and low-texture filtering (if ``avergeTexThreshold`` :math:`>` 0). If ``avergeTexThreshold = 0``, low-texture filtering is disabled; otherwise the disparity is set to 0 at each point ``(x, y)`` where, for the left image,
-This class computes the disparity map using block matching algorithm. The class also performs pre- and post- filtering steps: sobel prefiltering (if PREFILTER_XSOBEL flag is set) and low textureness filtering (if averageTexThreshols
+.. math::
-:math:`>` 0). If ``avergeTexThreshold = 0`` low textureness filtering is disabled, otherwise disparity is set to 0 in each point ``(x, y)`` where for left image
+    \sum HorizontalGradiensInWindow(x, y, winSize) < (winSize \cdot winSize) \cdot avergeTexThreshold 
-:math:`\sum HorizontalGradiensInWindow(x, y, winSize) < (winSize \cdot winSize) \cdot avergeTexThreshold` i.e. input left image is low textured.
+i.e. input left image is low textured.
-.. index:: gpu::StereoBM_GPU::StereoBM_GPU
-.. _gpu::StereoBM_GPU::StereoBM_GPU:
+.. index:: gpu::StereoBM_GPU::StereoBM_GPU
 gpu::StereoBM_GPU::StereoBM_GPU
-----------------------------------_
+-----------------------------------
-.. c:function:: StereoBM_GPU::StereoBM_GPU()
+.. cpp:function:: gpu::StereoBM_GPU::StereoBM_GPU()
-.. c:function:: StereoBM_GPU::StereoBM_GPU(int preset,  int ndisparities = DEFAULT_NDISP,  int winSize = DEFAULT_WINSZ)
+.. cpp:function:: gpu::StereoBM_GPU::StereoBM_GPU(int preset, int ndisparities = DEFAULT_NDISP, int winSize = DEFAULT_WINSZ)
-    StereoBMGPU constructors.
+    ``StereoBM_GPU`` constructors.
    :param preset: Preset:
@@ -67,15 +69,15 @@ gpu::StereoBM_GPU::StereoBM_GPU
    :param winSize: Block size.
-.. index:: gpu::StereoBM_GPU::operator ()
-.. _gpu::StereoBM_GPU::operator ():
+.. index:: gpu::StereoBM_GPU::operator ()
 gpu::StereoBM_GPU::operator ()
 ----------------------------------
-.. c:function:: void StereoBM_GPU::operator() (const GpuMat\& left, const GpuMat\& right,  GpuMat\& disparity)
+.. cpp:function:: void gpu::StereoBM_GPU::operator() (const GpuMat& left, const GpuMat& right, GpuMat& disparity)
-.. c:function:: void StereoBM_GPU::operator() (const GpuMat\& left, const GpuMat\& right,  GpuMat\& disparity, const Stream\& stream)
+.. cpp:function:: void gpu::StereoBM_GPU::operator() (const GpuMat& left, const GpuMat& right, GpuMat& disparity, const Stream& stream)
    The stereo correspondence operator. Finds the disparity for the specified rectified stereo pair.
@@ -87,23 +89,23 @@ gpu::StereoBM_GPU::operator ()
    :param stream: Stream for the asynchronous version.
-.. index:: gpu::StereoBM_GPU::checkIfGpuCallReasonable
-.. _gpu::StereoBM_GPU::checkIfGpuCallReasonable:
+.. index:: gpu::StereoBM_GPU::checkIfGpuCallReasonable
 gpu::StereoBM_GPU::checkIfGpuCallReasonable
 -----------------------------------------------
-.. c:function:: bool StereoBM_GPU::checkIfGpuCallReasonable()
+.. cpp:function:: bool gpu::StereoBM_GPU::checkIfGpuCallReasonable()
    Some heuristics that tries to estmate if the current GPU will be faster then CPU in this algorithm. It queries current active device.
-.. index:: gpu::StereoBeliefPropagation
-.. _gpu::StereoBeliefPropagation:
+.. index:: gpu::StereoBeliefPropagation
 gpu::StereoBeliefPropagation
 ----------------------------
-.. c:type:: gpu::StereoBeliefPropagation
+.. cpp:class:: gpu::StereoBeliefPropagation
 The class for computing stereo correspondence using belief propagation algorithm. ::
@@ -148,34 +150,33 @@ The class for computing stereo correspondence using belief propagation algorithm
        ...
    };
+The class implements Pedro F. Felzenszwalb algorithm [Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1), October 2006.]. It can compute own data cost (using truncated linear model) or use user-provided data cost.
-The class implements Pedro F. Felzenszwalb algorithm
-felzenszwalb_bp
-. It can compute own data cost (using truncated linear model) or use user-provided data cost.
 **Please note:** ``StereoBeliefPropagation`` requires a lot of memory:
 .. math::
-    width \_ step  \cdot height  \cdot ndisp  \cdot 4  \cdot (1 + 0.25)
+    width\_step \cdot height \cdot ndisp \cdot 4 \cdot (1 + 0.25)
 for message storage and
 .. math::
-    width \_ step  \cdot height  \cdot ndisp  \cdot (1 + 0.25 + 0.0625 +  \dotsm +  \frac{1}{4^{levels}}
+    width\_step \cdot height \cdot ndisp \cdot (1 + 0.25 + 0.0625 + \dotsm + \frac{1}{4^{levels}})
 for data cost storage. ``width_step`` is the number of bytes in a line including the padding.
 .. index:: gpu::StereoBeliefPropagation::StereoBeliefPropagation
 gpu::StereoBeliefPropagation::StereoBeliefPropagation
 ---------------------------------------------------------
-.. c:function:: StereoBeliefPropagation::StereoBeliefPropagation( int ndisp = DEFAULT_NDISP, int iters = DEFAULT_ITERS,  int levels = DEFAULT_LEVELS, int msg_type = CV_32F)
+.. cpp:function:: gpu::StereoBeliefPropagation::StereoBeliefPropagation(int ndisp = DEFAULT_NDISP, int iters = DEFAULT_ITERS, int levels = DEFAULT_LEVELS, int msg_type = CV_32F)
-.. c:function:: StereoBeliefPropagation::StereoBeliefPropagation( int ndisp, int iters, int levels,  float max_data_term, float data_weight,  float max_disc_term, float disc_single_jump,  int msg_type = CV_32F)
+.. cpp:function:: gpu::StereoBeliefPropagation::StereoBeliefPropagation(int ndisp, int iters, int levels, float max_data_term, float data_weight, float max_disc_term, float disc_single_jump, int msg_type = CV_32F)
-    StereoBeliefPropagation constructors.
+    ``StereoBeliefPropagation`` constructors.
    :param ndisp: Number of disparities.
@@ -193,70 +194,72 @@ gpu::StereoBeliefPropagation::StereoBeliefPropagation
    :param msg_type: Type for messages. Supports ``CV_16SC1`` and ``CV_32FC1``.
-``StereoBeliefPropagation`` uses truncated linear model for the data cost and discontinuity term:
+:cpp:class:`StereoBeliefPropagation` uses truncated linear model for the data cost and discontinuity term:
 .. math::
-    DataCost = data \_ weight  \cdot \min ( \lvert I_2-I_1  \rvert , max \_ data \_ term)
+    DataCost = data\_weight \cdot \min(\lvert I_2-I_1 \rvert, max\_data\_term)
 .. math::
-    DiscTerm =  \min (disc \_ single \_ jump  \cdot \lvert f_1-f_2  \rvert , max \_ disc \_ term)
+    DiscTerm =  \min(disc\_single\_jump \cdot \lvert f_1-f_2 \rvert, max\_disc\_term)
-For more details please see
+For more details please see [Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1), October 2006.].
-felzenszwalb_bp
-.
-By default ``StereoBeliefPropagation`` uses floating-point arithmetics and ``CV_32FC1`` type for messages. But also it can use fixed-point arithmetics and ``CV_16SC1`` type for messages for better perfomance. To avoid overflow in this case, the parameters must satisfy
+By default :cpp:class:`StereoBeliefPropagation` uses floating-point arithmetic and the ``CV_32FC1`` type for messages. It can also use fixed-point arithmetic and the ``CV_16SC1`` type for messages for better performance. To avoid overflow in this case, the parameters must satisfy
 .. math::
-    10  \cdot 2^{levels-1}  \cdot max \_ data \_ term < SHRT \_ MAX
+    10 \cdot 2^{levels-1} \cdot max\_data\_term < SHRT\_MAX
 .. index:: gpu::StereoBeliefPropagation::estimateRecommendedParams
 gpu::StereoBeliefPropagation::estimateRecommendedParams
 -----------------------------------------------------------
-.. c:function:: void StereoBeliefPropagation::estimateRecommendedParams( int width, int height, int\& ndisp, int\& iters, int\& levels)
+.. cpp:function:: void gpu::StereoBeliefPropagation::estimateRecommendedParams(int width, int height, int& ndisp, int& iters, int& levels)
+    Heuristics that try to compute recommended parameters (``ndisp``, ``iters`` and ``levels``) for the specified image size (``width`` and ``height``).
-    Some heuristics that tries to compute recommended parameters (ndisp, itersand levels) for specified image size (widthand height).
 .. index:: gpu::StereoBeliefPropagation::operator ()
 gpu::StereoBeliefPropagation::operator ()
 ---------------------------------------------
-.. c:function:: void StereoBeliefPropagation::operator()( const GpuMat\& left, const GpuMat\& right,  GpuMat\& disparity)
+.. cpp:function:: void gpu::StereoBeliefPropagation::operator()(const GpuMat& left, const GpuMat& right, GpuMat& disparity)
-.. c:function:: void StereoBeliefPropagation::operator()( const GpuMat\& left, const GpuMat\& right,  GpuMat\& disparity, Stream\& stream)
+.. cpp:function:: void gpu::StereoBeliefPropagation::operator()(const GpuMat& left, const GpuMat& right, GpuMat& disparity, Stream& stream)
    The stereo correspondence operator. Finds the disparity for the specified rectified stereo pair or data cost.
-    :param left: Left image; supports  ``CV_8UC1`` ,  ``CV_8UC3``  and  ``CV_8UC4``  types.
+    :param left: Left image; supports ``CV_8UC1``, ``CV_8UC3`` and ``CV_8UC4`` types.
    :param right: Right image with the same size and the same type as the left one.
-    :param disparity: Output disparity map. If  ``disparity``  is empty output type will be  ``CV_16SC1`` , otherwise output type will be  ``disparity.type()`` .
+    :param disparity: Output disparity map. If ``disparity`` is empty output type will be ``CV_16SC1``, otherwise output type will be ``disparity.type()``.
    :param stream: Stream for the asynchronous version.
-.. c:function:: void StereoBeliefPropagation::operator()( const GpuMat\& data, GpuMat\& disparity)
+.. cpp:function:: void StereoBeliefPropagation::operator()(const GpuMat& data, GpuMat& disparity)
-.. c:function:: void StereoBeliefPropagation::operator()( const GpuMat\& data, GpuMat\& disparity, Stream\& stream)
+.. cpp:function:: void StereoBeliefPropagation::operator()(const GpuMat& data, GpuMat& disparity, Stream& stream)
-    * **data** The user specified data cost. It must have  ``msg_type``  type and  :math:`\texttt{imgRows} \cdot \texttt{ndisp} \times \texttt{imgCols}`  size.
+    :param data: The user specified data cost. It must have ``msg_type`` type and :math:`\texttt{imgRows} \cdot \texttt{ndisp} \times \texttt{imgCols}` size.
-    * **disparity** Output disparity map. If  ``disparity``  is empty output type will be  ``CV_16SC1`` , otherwise output type will be  ``disparity.type()`` .
+    :param disparity: Output disparity map. If ``disparity`` is empty output type will be ``CV_16SC1``, otherwise output type will be ``disparity.type()``.
+    :param stream: Stream for the asynchronous version.
-    * **stream** Stream for the asynchronous version.
-.. index:: gpu::StereoConstantSpaceBP
-.. _gpu::StereoConstantSpaceBP:
+.. index:: gpu::StereoConstantSpaceBP
 gpu::StereoConstantSpaceBP
 --------------------------
-.. c:type:: gpu::StereoConstantSpaceBP
+.. cpp:class:: gpu::StereoConstantSpaceBP
 The class for computing stereo correspondence using constant space belief propagation algorithm. ::
@@ -309,19 +312,19 @@ The class for computing stereo correspondence using constant space belief propag
    };
-The class implements Q. Yang algorithm
+The class implements the Q. Yang algorithm [Q. Yang, L. Wang, and N. Ahuja. A constant-space belief propagation algorithm for stereo matching. In CVPR, 2010]. ``StereoConstantSpaceBP`` supports both local minimum and global minimum data cost initialization algorithms. For more details, see the paper. By default the local algorithm is used; to enable the global algorithm, set ``use_local_init_data_cost`` to false.
-qx_csbp
-. ``StereoConstantSpaceBP`` supports both local minimum and global minimum data cost initialization algortihms. For more details please see the paper. By default local algorithm is used, and to enable global algorithm set ``use_local_init_data_cost`` to false.
 .. index:: gpu::StereoConstantSpaceBP::StereoConstantSpaceBP
 gpu::StereoConstantSpaceBP::StereoConstantSpaceBP
 -----------------------------------------------------
-.. c:function:: StereoConstantSpaceBP::StereoConstantSpaceBP(int ndisp = DEFAULT_NDISP,  int iters = DEFAULT_ITERS, int levels = DEFAULT_LEVELS,  int nr_plane = DEFAULT_NR_PLANE, int msg_type = CV_32F)
+.. cpp:function:: gpu::StereoConstantSpaceBP::StereoConstantSpaceBP(int ndisp = DEFAULT_NDISP, int iters = DEFAULT_ITERS, int levels = DEFAULT_LEVELS, int nr_plane = DEFAULT_NR_PLANE, int msg_type = CV_32F)
-.. c:function:: StereoConstantSpaceBP::StereoConstantSpaceBP(int ndisp, int iters,  int levels, int nr_plane,  float max_data_term, float data_weight,  float max_disc_term, float disc_single_jump,  int min_disp_th = 0, int msg_type = CV_32F)
+.. cpp:function:: gpu::StereoConstantSpaceBP::StereoConstantSpaceBP(int ndisp, int iters, int levels, int nr_plane, float max_data_term, float data_weight, float max_disc_term, float disc_single_jump, int min_disp_th = 0, int msg_type = CV_32F)
-    StereoConstantSpaceBP constructors.
+    ``StereoConstantSpaceBP`` constructors.
    :param ndisp: Number of disparities.
@@ -341,66 +344,67 @@ gpu::StereoConstantSpaceBP::StereoConstantSpaceBP
    :param min_disp_th: Minimal disparity threshold.
-    :param msg_type: Type for messages. Supports  ``CV_16SC1``  and  ``CV_32FC1`` .
+    :param msg_type: Type for messages. Supports ``CV_16SC1`` and ``CV_32FC1``.
-``StereoConstantSpaceBP`` uses truncated linear model for the data cost and discontinuity term:
+:cpp:class:`StereoConstantSpaceBP` uses truncated linear model for the data cost and discontinuity term:
 .. math::
-    DataCost = data \_ weight  \cdot \min ( \lvert I_2-I_1  \rvert , max \_ data \_ term)
+    DataCost = data\_weight \cdot \min(\lvert I_2-I_1 \rvert, max\_data\_term)
 .. math::
-    DiscTerm =  \min (disc \_ single \_ jump  \cdot \lvert f_1-f_2  \rvert , max \_ disc \_ term)
+    DiscTerm =  \min(disc\_single\_jump \cdot \lvert f_1-f_2 \rvert, max\_disc\_term)
-For more details please see
+For more details please see [Q. Yang, L. Wang, and N. Ahuja. A constant-space belief propagation algorithm for stereo matching. In CVPR, 2010].
-qx_csbp
-.
-By default ``StereoConstantSpaceBP`` uses floating-point arithmetics and ``CV_32FC1`` type for messages. But also it can use fixed-point arithmetics and ``CV_16SC1`` type for messages for better perfomance. To avoid overflow in this case, the parameters must satisfy
+By default :cpp:class:`StereoConstantSpaceBP` uses floating-point arithmetic and the ``CV_32FC1`` type for messages. It can also use fixed-point arithmetic and the ``CV_16SC1`` type for messages for better performance. To avoid overflow in this case, the parameters must satisfy
 .. math::
-    10  \cdot 2^{levels-1}  \cdot max \_ data \_ term < SHRT \_ MAX
+    10 \cdot 2^{levels-1} \cdot max\_data\_term < SHRT\_MAX
 .. index:: gpu::StereoConstantSpaceBP::estimateRecommendedParams
 gpu::StereoConstantSpaceBP::estimateRecommendedParams
 ---------------------------------------------------------
-.. c:function:: void StereoConstantSpaceBP::estimateRecommendedParams( int width, int height,  int\& ndisp, int\& iters, int\& levels, int\& nr_plane)
+.. cpp:function:: void gpu::StereoConstantSpaceBP::estimateRecommendedParams( int width, int height, int& ndisp, int& iters, int& levels, int& nr_plane)
+    Heuristics that try to compute parameters (``ndisp``, ``iters``, ``levels`` and ``nr_plane``) for the specified image size (``width`` and ``height``).
-    Some heuristics that tries to compute parameters (ndisp, iters, levelsand nrplane) for specified image size (widthand height).
 .. index:: gpu::StereoConstantSpaceBP::operator ()
 gpu::StereoConstantSpaceBP::operator ()
 -------------------------------------------
-.. c:function:: void StereoConstantSpaceBP::operator()( const GpuMat\& left, const GpuMat\& right,  GpuMat\& disparity)
+.. cpp:function:: void gpu::StereoConstantSpaceBP::operator()(const GpuMat& left, const GpuMat& right, GpuMat& disparity)
-.. c:function:: void StereoConstantSpaceBP::operator()( const GpuMat\& left, const GpuMat\& right,  GpuMat\& disparity, Stream\& stream)
+.. cpp:function:: void gpu::StereoConstantSpaceBP::operator()(const GpuMat& left, const GpuMat& right, GpuMat& disparity, Stream& stream)
    The stereo correspondence operator. Finds the disparity for the specified rectified stereo pair.
-    :param left: Left image; supports  ``CV_8UC1`` ,  ``CV_8UC3``  and  ``CV_8UC4``  types.
+    :param left: Left image; supports ``CV_8UC1``, ``CV_8UC3`` and ``CV_8UC4`` types.
    :param right: Right image with the same size and the same type as the left one.
-    :param disparity: Output disparity map. If  ``disparity``  is empty output type will be  ``CV_16SC1`` , otherwise output type will be  ``disparity.type()`` .
+    :param disparity: Output disparity map. If ``disparity`` is empty output type will be ``CV_16SC1``, otherwise output type will be ``disparity.type()``.
    :param stream: Stream for the asynchronous version.
-.. index:: gpu::DisparityBilateralFilter
-.. _gpu::DisparityBilateralFilter:
+.. index:: gpu::DisparityBilateralFilter
 gpu::DisparityBilateralFilter
 -----------------------------
-.. c:type:: gpu::DisparityBilateralFilter
+.. cpp:class:: gpu::DisparityBilateralFilter
 The class for disparity map refinement using joint bilateral filtering. ::
-    class CV_EXPORTS DisparityBilateralFilter
+    class DisparityBilateralFilter
    {
    public:
        enum { DEFAULT_NDISP  = 64 };
@@ -423,19 +427,19 @@ The class for disparity map refinement using joint bilateral filtering. ::
    };
-The class implements Q. Yang algorithm
+The class implements Q. Yang algorithm [Q. Yang, L. Wang, and N. Ahuja. A constant-space belief propagation algorithm for stereo matching. In CVPR, 2010].
-qx_csbp
-.
 .. index:: gpu::DisparityBilateralFilter::DisparityBilateralFilter
 gpu::DisparityBilateralFilter::DisparityBilateralFilter
 -----------------------------------------------------------
-.. c:function:: DisparityBilateralFilter::DisparityBilateralFilter( int ndisp = DEFAULT_NDISP, int radius = DEFAULT_RADIUS,  int iters = DEFAULT_ITERS)
+.. cpp:function:: gpu::DisparityBilateralFilter::DisparityBilateralFilter(int ndisp = DEFAULT_NDISP, int radius = DEFAULT_RADIUS, int iters = DEFAULT_ITERS)
-.. c:function:: DisparityBilateralFilter::DisparityBilateralFilter( int ndisp, int radius, int iters,  float edge_threshold, float max_disc_threshold,  float sigma_range)
+.. cpp:function:: gpu::DisparityBilateralFilter::DisparityBilateralFilter(int ndisp, int radius, int iters, float edge_threshold, float max_disc_threshold, float sigma_range)
-    DisparityBilateralFilter constructors.
+    ``DisparityBilateralFilter`` constructors.
    :param ndisp: Number of disparities.
@@ -449,13 +453,15 @@ gpu::DisparityBilateralFilter::DisparityBilateralFilter
    :param sigma_range: Filter range.
 .. index:: gpu::DisparityBilateralFilter::operator ()
 gpu::DisparityBilateralFilter::operator ()
 ----------------------------------------------
-.. c:function:: void DisparityBilateralFilter::operator()( const GpuMat\& disparity, const GpuMat\& image, GpuMat\& dst)
+.. cpp:function:: void gpu::DisparityBilateralFilter::operator()(const GpuMat& disparity, const GpuMat& image, GpuMat& dst)
-.. c:function:: void DisparityBilateralFilter::operator()( const GpuMat\& disparity, const GpuMat\& image, GpuMat\& dst,  Stream\& stream)
+.. cpp:function:: void gpu::DisparityBilateralFilter::operator()(const GpuMat& disparity, const GpuMat& image, GpuMat& dst, Stream& stream)
    Refines disparity map using joint bilateral filtering.
@@ -463,17 +469,19 @@ gpu::DisparityBilateralFilter::operator ()
    :param image: Input image; supports ``CV_8UC1`` and ``CV_8UC3`` types.
-    :param dst: Destination disparity map; will have the same size and type as  ``disparity`` .
+    :param dst: Destination disparity map; will have the same size and type as ``disparity``.
    :param stream: Stream for the asynchronous version.
 .. index:: gpu::drawColorDisp
 gpu::drawColorDisp
 ----------------------
-.. c:function:: void gpu::drawColorDisp(const GpuMat\& src_disp, GpuMat\& dst_disp, int ndisp)
+.. cpp:function:: void gpu::drawColorDisp(const GpuMat& src_disp, GpuMat& dst_disp, int ndisp)
-.. c:function:: void gpu::drawColorDisp(const GpuMat\& src_disp, GpuMat\& dst_disp, int ndisp,  const Stream\& stream)
+.. cpp:function:: void gpu::drawColorDisp(const GpuMat& src_disp, GpuMat& dst_disp, int ndisp, const Stream& stream)
    Does coloring of disparity image.
@@ -485,37 +493,38 @@ gpu::drawColorDisp
    :param stream: Stream for the asynchronous version.
-This function converts
+This function converts the :math:`[0..ndisp)` interval to :math:`[0..240, 1, 1]` in the ``HSV`` color space, then converts ``HSV`` to ``RGB``.
-:math:`[0..ndisp)` interval to
-:math:`[0..240, 1, 1]` in ``HSV`` color space, than convert ``HSV`` color space to ``RGB`` .
 .. index:: gpu::reprojectImageTo3D
 gpu::reprojectImageTo3D
 ---------------------------
-.. c:function:: void gpu::reprojectImageTo3D(const GpuMat\& disp, GpuMat\& xyzw,  const Mat\& Q)
+.. cpp:function:: void gpu::reprojectImageTo3D(const GpuMat& disp, GpuMat& xyzw, const Mat& Q)
-.. c:function:: void gpu::reprojectImageTo3D(const GpuMat\& disp, GpuMat\& xyzw,  const Mat\& Q, const Stream\& stream)
+.. cpp:function:: void gpu::reprojectImageTo3D(const GpuMat& disp, GpuMat& xyzw, const Mat& Q, const Stream& stream)
    Reprojects disparity image to 3D space.
    :param disp: Input disparity image; supports ``CV_8U`` and ``CV_16S`` types.
-    :param xyzw: Output 4-channel floating-point image of the same size as  ``disp`` . Each element of  ``xyzw(x,y)``  will contain the 3D coordinates  ``(x,y,z,1)``  of the point  ``(x,y)`` , computed from the disparity map.
+    :param xyzw: Output 4-channel floating-point image of the same size as ``disp``. Each element of ``xyzw(x,y)`` will contain the 3D coordinates ``(x,y,z,1)`` of the point ``(x,y)``, computed from the disparity map.
-    :param Q: :math:`4 \times 4`  perspective transformation matrix that can be obtained via  :ref:`StereoRectify` .
+    :param Q: :math:`4 \times 4` perspective transformation matrix that can be obtained via :c:func:`stereoRectify`.
    :param stream: Stream for the asynchronous version.
-See also:
+See also: :c:func:`reprojectImageTo3D`.
-:func:`reprojectImageTo3D` .
 .. index:: gpu::solvePnPRansac
 gpu::solvePnPRansac
 -------------------
-.. c:function:: void gpu::solvePnPRansac(const Mat& object, const Mat& image, const Mat& camera_mat, const Mat& dist_coef, Mat& rvec, Mat& tvec, bool use_extrinsic_guess=false, int num_iters=100, float max_dist=8.0, int min_inlier_count=100, vector<int>* inliers=NULL)
+.. cpp:function:: void gpu::solvePnPRansac(const Mat& object, const Mat& image, const Mat& camera_mat, const Mat& dist_coef, Mat& rvec, Mat& tvec, bool use_extrinsic_guess=false, int num_iters=100, float max_dist=8.0, int min_inlier_count=100, vector<int>* inliers=NULL)
    Finds the object pose from the 3D-2D point correspondences.
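
Note on the ``StereoBeliefPropagation`` formulas documented in the file above: the two storage expressions and the fixed-point overflow condition are easy to evaluate before constructing the object. The following is a minimal sketch in plain C++, not part of this commit and not an OpenCV API. The function names ``bpMessageStorage``, ``bpDataCostStorage`` and ``fitsFixedPoint`` are hypothetical, and the code evaluates the expressions exactly as the documentation writes them (with the geometric series closed at ``1/4^levels``).

```cpp
#include <climits>
#include <cmath>
#include <cstddef>

// Hypothetical helper: message storage as documented,
// width_step * height * ndisp * 4 * (1 + 0.25).
inline double bpMessageStorage(std::size_t widthStep, int height, int ndisp)
{
    return static_cast<double>(widthStep) * height * ndisp * 4.0 * (1.0 + 0.25);
}

// Hypothetical helper: data cost storage as documented,
// width_step * height * ndisp * (1 + 1/4 + 1/16 + ... + 1/4^levels).
inline double bpDataCostStorage(std::size_t widthStep, int height, int ndisp, int levels)
{
    double series = 0.0;
    double term = 1.0;
    for (int i = 0; i <= levels; ++i) {
        series += term;   // adds 1/4^i
        term *= 0.25;
    }
    return static_cast<double>(widthStep) * height * ndisp * series;
}

// Hypothetical helper: documented condition for using CV_16SC1 messages,
// 10 * 2^(levels - 1) * max_data_term < SHRT_MAX.
inline bool fitsFixedPoint(int levels, float maxDataTerm)
{
    return 10.0 * std::ldexp(1.0, levels - 1) * maxDataTerm < SHRT_MAX;
}
```

For example, with ``width_step = 640``, ``height = 480`` and ``ndisp = 64``, the message storage expression evaluates to 98,304,000, which illustrates why the documentation warns about memory use.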

--- a/modules/gpu/doc/data_structures.rst
+++ b/modules/gpu/doc/data_structures.rst
@@ -3,13 +3,13 @@ Data Structures
 .. highlight:: cpp
-.. index:: gpu::DevMem2D\_
-.. _gpu::DevMem2D_:
+.. index:: gpu::DevMem2D_
 gpu::DevMem2D\_ 
 ---------------
-.. c:type:: gpu::DevMem2D\_
+.. cpp:class:: gpu::DevMem2D_
 This is a simple lightweight class that encapsulate pitched memory on GPU. It is intended to pass to nvcc-compiled code, i.e. CUDA kernels. So it is used internally by OpenCV and by users writes own device code. Its members can be called both from host and from device code. ::
@@ -36,16 +36,19 @@ This is a simple lightweight class that encapsulate pitched memory on GPU. It is
        __CV_GPU_HOST_DEVICE__ const T* ptr(int y = 0) const;
    };
+    typedef DevMem2D_<unsigned char> DevMem2D;
+    typedef DevMem2D_<float> DevMem2Df;
+    typedef DevMem2D_<int> DevMem2Di;
-.. index:: gpu::PtrStep\_
-.. gpu::PtrStep\_:
+.. index:: gpu::PtrStep_
 gpu::PtrStep\_
 --------------
-.. c:type:: gpu::PtrStep\_
+.. cpp:class:: gpu::PtrStep_
-This is structure is similar to DevMem2D\_ but contains only pointer and row step. Width and height fields are excluded due to performance reasons. The structure is for internal use or for users who write own device code. ::
+This structure is similar to :cpp:class:`gpu::DevMem2D_` but contains only a pointer and a row step. The width and height fields are excluded for performance reasons. The structure is for internal use or for users who write their own device code. ::
    template<typename T> struct PtrStep_
    {
@@ -63,16 +66,19 @@ This is structure is similar to DevMem2D\_ but contains only pointer and row ste
        __CV_GPU_HOST_DEVICE__ const T* ptr(int y = 0) const;
    };
+    typedef PtrStep_<unsigned char> PtrStep;
+    typedef PtrStep_<float> PtrStepf;
+    typedef PtrStep_<int> PtrStepi;
-.. index:: gpu::PtrElemStrp\_
-.. gpu::PtrElemStrp\_:
-gpu::PtrElemStrp\_
+.. index:: gpu::PtrElemStep_
+gpu::PtrElemStep\_
 ------------------
-.. c:type:: gpu::PtrElemStrp\_
+.. cpp:class:: gpu::PtrElemStep_
-This is structure is similar to DevMem2D_but contains only pointer and row step in elements. Width and height fields are excluded due to performance reasons. This class is can only be constructed if sizeof(T) is a multiple of 256. The structure is for internal use or for users who write own device code. ::
+This structure is similar to :cpp:class:`gpu::DevMem2D_` but contains only a pointer and a row step in elements. The width and height fields are excluded for performance reasons. This class can only be constructed if ``sizeof(T)`` is a multiple of 256. The structure is for internal use or for users who write their own device code. ::
    template<typename T> struct PtrElemStep_ : public PtrStep_<T>
    {
@@ -81,23 +87,23 @@ This is structure is similar to DevMem2D_but contains only pointer and row step
        __CV_GPU_HOST_DEVICE__ const T* ptr(int y = 0) const;
    };
+    typedef PtrElemStep_<unsigned char> PtrElemStep;
+    typedef PtrElemStep_<float> PtrElemStepf;
+    typedef PtrElemStep_<int> PtrElemStepi;
 .. index:: gpu::GpuMat
 gpu::GpuMat
 -----------
-.. c:type:: gpu::GpuMat
+.. cpp:class:: gpu::GpuMat
-The base storage class for GPU memory with reference counting. Its interface is almost
+The base storage class for GPU memory with reference counting. Its interface is almost the same as the :c:type:`Mat` interface with some limitations, so using it won't be a problem. The limitations are: no support for arbitrary dimensions (only 2D), no functions that return references to its data (because references on the GPU are not valid on the CPU), and no support for the expression templates technique. Because of the last limitation, take care with overloaded matrix operators, as they cause memory allocations. The ``GpuMat`` class is convertible to :cpp:class:`gpu::DevMem2D_` and :cpp:class:`gpu::PtrStep_`, so it can be passed directly to a kernel.
-:func:`Mat` interface with some limitations, so using it won't be a problem. The limitations are no arbitrary dimensions support (only 2D), no functions that returns references to its data (because references on GPU are not valid for CPU), no expression templates technique support. Because of last limitation please take care with overloaded matrix operators - they cause memory allocations. The GpuMat class is convertible to
-and
-so it can be passed to directly to kernel.
-**Please note:**
+**Please note:** In contrast with :c:type:`Mat`, in most cases ``GpuMat::isContinuous() == false``, i.e. rows are aligned to a size that depends on the hardware. Also, a single-row ``GpuMat`` is always a continuous matrix. ::
-In contrast with
-:func:`Mat` , In most cases ``GpuMat::isContinuous() == false`` , i.e. rows are aligned to size depending on hardware. Also single row GpuMat is always a continuous matrix. ::
-    class CV_EXPORTS GpuMat
+    class GpuMat
    {
    public:
        //! default constructor
@@ -129,20 +135,19 @@ In contrast with
    };
-**Please note:**
+**Please note:** It is bad practice to leave static or global ``GpuMat`` variables allocated, i.e. to rely on their destructors. The destruction order of such variables and the CUDA context is undefined, and the GPU memory release function returns an error if the CUDA context has already been destroyed.
-Is it a bad practice to leave static or global GpuMat variables allocated, i.e. to rely on its destructor. That is because destruction order of such variables and CUDA context is undefined and GPU memory release function returns error if CUDA context has been destroyed before.
+See also: :c:type:`Mat`.
-See also:
-:func:`Mat`
 .. index:: gpu::CudaMem
 gpu::CudaMem
 ------------
-.. c:type:: gpu::CudaMem
+.. cpp:class:: gpu::CudaMem
-This is a class with reference counting that wraps special memory type allocation functions from CUDA. Its interface is also
+This is a class with reference counting that wraps special memory type allocation functions from CUDA. Its interface is also :c:type:`Mat`-like but with additional memory type parameter:
-:func:`Mat` -like but with additional memory type parameter:
 * ``ALLOC_PAGE_LOCKED``     Set page locked memory type, used commonly for fast and asynchronous upload/download data from/to GPU.
@@ -150,9 +155,9 @@ This is a class with reference counting that wraps special memory type allocatio
 * ``ALLOC_WRITE_COMBINED``  Sets write combined buffer which is not cached by CPU. Such buffers are used to supply GPU with data when GPU only reads it. The advantage is better CPU cache utilization.
-Please note that allocation size of such memory types is usually limited. For more details please see "CUDA 2.2 Pinned Memory APIs" document or "CUDA_C Programming Guide". ::
+**Please note:** The allocation size of such memory types is usually limited. For more details, see the "CUDA 2.2 Pinned Memory APIs" document or the "CUDA C Programming Guide". ::
-    class CV_EXPORTS CudaMem
+    class CudaMem
    {
    public:
        enum  { ALLOC_PAGE_LOCKED = 1, ALLOC_ZEROCOPY = 2,
@@ -182,53 +187,54 @@ Please note that allocation size of such memory types is usually limited. For mo
    };
 .. index:: gpu::CudaMem::createMatHeader
 gpu::CudaMem::createMatHeader
 ---------------------------------
-.. cpp:function:: Mat CudaMem::createMatHeader() const
+.. cpp:function:: Mat gpu::CudaMem::createMatHeader() const
+.. cpp:function:: gpu::CudaMem::operator Mat() const
+    Creates header without reference counting to :cpp:class:`gpu::CudaMem` data.
-.. cpp:function:: CudaMem::operator Mat() const
-    Creates header without reference counting to CudaMem data.
 .. index:: gpu::CudaMem::createGpuMatHeader
 gpu::CudaMem::createGpuMatHeader
 ------------------------------------
-:func:`gpu::GpuMat` ``_``
-.. c:function:: GpuMat CudaMem::createGpuMatHeader() const
-.. c:function:: CudaMem::operator GpuMat() const
+.. cpp:function:: GpuMat gpu::CudaMem::createGpuMatHeader() const
+.. cpp:function:: gpu::CudaMem::operator GpuMat() const
+    Maps CPU memory to GPU address space and creates :cpp:class:`gpu::GpuMat` header without reference counting for it. This can be done only if memory was allocated with ``ALLOC_ZEROCOPY`` flag and if it is supported by hardware (laptops often share video and CPU memory, so address spaces can be mapped, and that eliminates extra copy).
-    Maps CPU memory to GPU address space and creates header without reference counting for it. This can be done only if memory was allocated with ALLOCZEROCOPYflag and if it is supported by hardware (laptops often share video and CPU memory, so address spaces can be mapped, and that eliminates extra copy).
 .. index:: gpu::CudaMem::canMapHostMemory
 gpu::CudaMem::canMapHostMemory
 ----------------------------------
-.. c:function:: static bool CudaMem::canMapHostMemory()
+.. cpp:function:: static bool gpu::CudaMem::canMapHostMemory()
+    Returns true if the current hardware supports address space mapping and ``ALLOC_ZEROCOPY`` memory allocation.
-    Returns true if the current hardware supports address space mapping and ALLOCZEROCOPYmemory allocation
 .. index:: gpu::Stream
 gpu::Stream
 -----------
-.. c:type:: gpu::Stream
+.. cpp:class:: gpu::Stream
-This class encapsulated queue of the asynchronous calls. Some functions have overloads with additional
+This class encapsulates a queue of asynchronous calls. Some functions have overloads with an additional ``gpu::Stream`` parameter. The overloads do initialization work (allocate output buffers, upload constants, etc.), start the GPU kernel, and return before results are ready. Whether all operations are complete can be checked via :cpp:func:`gpu::Stream::queryIfComplete`. Asynchronous upload/download has to be performed from/to page-locked buffers, i.e. using :cpp:class:`gpu::CudaMem` or a :c:type:`Mat` header that points to a region of :cpp:class:`gpu::CudaMem`.
-:func:`gpu::Stream` parameter. The overloads do initialization work (allocate output buffers, upload constants, etc.), start GPU kernel and return before results are ready. A check if all operation are complete can be performed via
-:func:`gpu::Stream::queryIfComplete()` .  Asynchronous upload/download have to be performed from/to page-locked buffers, i.e. using
-:func:`gpu::CudaMem` or
-:func:`Mat` header that points to a region of
-:func:`gpu::CudaMem` .
-**Please note the limitation**
+**Please note the limitation**: currently it is not guaranteed that everything will work properly if one operation is enqueued twice with different data. Some functions use constant GPU memory, and the next call may update that memory before the previous call has finished. But calling different operations asynchronously is safe because each operation has its own constant buffer. Memory copy/upload/download/set operations to buffers held by the user are also safe. ::
-: currently it is not guaranteed that all will work properly if one operation will be enqueued twice with different data. Some functions use constant GPU memory and next call may update the memory before previous has been finished. But calling asynchronously different operations is safe because each operation has own constant buffer. Memory copy/upload/download/set operations to buffers hold by user are also safe. ::
-    class CV_EXPORTS Stream
+    class Stream
    {
    public:
        Stream();
@@ -263,44 +269,47 @@ This class encapsulated queue of the asynchronous calls. Some functions have ove
    };
 .. index:: gpu::Stream::queryIfComplete
 gpu::Stream::queryIfComplete
 --------------------------------
-.. c:function:: bool Stream::queryIfComplete()
+.. cpp:function:: bool gpu::Stream::queryIfComplete()
    Returns true if the current stream queue is finished, otherwise false.
 .. index:: gpu::Stream::waitForCompletion
 gpu::Stream::waitForCompletion
 ----------------------------------
-.. c:function:: void Stream::waitForCompletion()
+.. cpp:function:: void gpu::Stream::waitForCompletion()
    Blocks until all operations in the stream are complete.
-.. index:: gpu::StreamAccessor
-.. _gpu::StreamAccessor:
+.. index:: gpu::StreamAccessor
 gpu::StreamAccessor
 -------------------
 .. c:type:: gpu::StreamAccessor
-This class provides possibility to get ``cudaStream_t`` from
+This class provides the possibility to get ``cudaStream_t`` from :cpp:class:`gpu::Stream`. This class is declared in ``stream_accessor.hpp`` because that is the only public header that depends on the CUDA Runtime API. Including it will bring that dependency into your code. ::
-:func:`gpu::Stream` . This class is declared in ``stream_accessor.hpp`` because that is only public header that depend on Cuda Runtime API. Including it will bring the dependency to your code. ::
    struct StreamAccessor
    {
-            CV_EXPORTS static cudaStream_t getStream(const Stream& stream);
+        static cudaStream_t getStream(const Stream& stream);
    };
 .. index:: gpu::createContinuous
 gpu::createContinuous
 -------------------------
-.. c:function:: void createContinuous(int rows, int cols, int type, GpuMat\& m)
+.. cpp:function:: void gpu::createContinuous(int rows, int cols, int type, GpuMat& m)
    Creates continuous matrix in GPU memory.
@@ -310,23 +319,25 @@ gpu::createContinuous
    :param type: Type of the matrix.
-    :param m: Destination matrix. Will be only reshaped if it has proper type and area ( ``rows``   :math:`\times`   ``cols`` ).
+    :param m: Destination matrix. Will be only reshaped if it has proper type and area (``rows`` :math:`\times` ``cols``).
 Also the following wrappers are available:
-.. c:function:: GpuMat createContinuous(int rows, int cols, int type)
+.. cpp:function:: GpuMat gpu::createContinuous(int rows, int cols, int type)
-.. c:function:: void createContinuous(Size size, int type, GpuMat\& m)
+.. cpp:function:: void gpu::createContinuous(Size size, int type, GpuMat& m)
-.. c:function:: GpuMat createContinuous(Size size, int type)
+.. cpp:function:: GpuMat gpu::createContinuous(Size size, int type)
 A matrix is called continuous if its elements are stored continuously, i.e. without gaps at the end of each row.
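The continuity condition can be illustrated without any GPU code. The sketch below is not the OpenCV implementation: ``MatrixLayout`` and ``isContinuous`` are hypothetical names, and it assumes a matrix is described by its row count, column count, element size in bytes, and row stride in bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2D matrix header; this is NOT the OpenCV GpuMat type.
struct MatrixLayout {
    int rows;
    int cols;
    std::size_t elemSize; // bytes per element
    std::size_t step;     // bytes between the starts of two consecutive rows
};

// A matrix is continuous when every row ends exactly where the next one begins,
// i.e. the row stride equals the number of payload bytes in one row.
// A matrix with at most one row has no inter-row gap to consider.
bool isContinuous(const MatrixLayout& m) {
    return m.rows <= 1 ||
           m.step == static_cast<std::size_t>(m.cols) * m.elemSize;
}
```

For example, a 480 x 640 matrix of 4-byte elements is continuous under this definition when its stride is 2560 bytes, and not continuous when each row is padded to, say, 2816 bytes.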
 .. index:: gpu::ensureSizeIsEnough
 gpu::ensureSizeIsEnough
 ---------------------------
-.. c:function:: void ensureSizeIsEnough(int rows, int cols, int type, GpuMat\& m)
+.. cpp:function:: void gpu::ensureSizeIsEnough(int rows, int cols, int type, GpuMat& m)
    Ensures that size of matrix is big enough and matrix has proper type. The function doesn't reallocate memory if the matrix has proper attributes already.
@@ -340,5 +351,4 @@ gpu::ensureSizeIsEnough
 Also the following wrapper is available:
-.. c:function:: void ensureSizeIsEnough(Size size, int type, GpuMat\& m)
+.. cpp:function:: void gpu::ensureSizeIsEnough(Size size, int type, GpuMat& m)
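The "no reallocation if the attributes already fit" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the library code: ``Buffer2D`` and ``ensureSizeSketch`` are hypothetical, host memory stands in for GPU memory, and "big enough" is assumed to mean that the type matches and both dimensions are at least as large as requested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a GPU matrix; NOT the OpenCV GpuMat type.
struct Buffer2D {
    int rows = 0;
    int cols = 0;
    int type = -1;                   // opaque type code, -1 means "not allocated yet"
    std::vector<unsigned char> data; // backing storage
    int allocations = 0;             // counts how often storage was (re)allocated
};

// Reallocates only when the existing buffer cannot hold the request.
// Assumption: the buffer fits when the type matches and it has at least
// the requested number of rows and columns.
void ensureSizeSketch(int rows, int cols, int type, std::size_t elemSize, Buffer2D& m) {
    if (m.type == type && m.rows >= rows && m.cols >= cols) {
        return; // attributes already fit, keep the existing allocation
    }
    m.rows = rows;
    m.cols = cols;
    m.type = type;
    m.data.assign(static_cast<std::size_t>(rows) * cols * elemSize, 0);
    ++m.allocations;
}
```

Calling the helper repeatedly with shrinking sizes leaves the allocation count unchanged, which is the reason to prefer such a function over an unconditional re-create inside a processing loop.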
--- a/modules/gpu/doc/feature_detection_and_description.rst
+++ b/modules/gpu/doc/feature_detection_and_description.rst
@@ -3,13 +3,13 @@ Feature Detection and Description
 .. highlight:: cpp
-.. index:: gpu::SURF_GPU
-.. gpu::SURF_GPU:
+.. index:: gpu::SURF_GPU
 gpu::SURF_GPU
 -------------
-.. c:type:: gpu::SURF_GPU
+.. cpp:class:: gpu::SURF_GPU
 Class for extracting Speeded Up Robust Features from an image. ::
@@ -20,7 +20,8 @@ Class for extracting Speeded Up Robust Features from an image. ::
        SURF_GPU();
        //! the full constructor taking all the necessary parameters
        explicit SURF_GPU(double _hessianThreshold, int _nOctaves=4,
-             int _nOctaveLayers=2, bool _extended=false, float _keypointsRatio=0.01f);
+             int _nOctaveLayers=2, bool _extended=false, float _keypointsRatio=0.01f, 
+             bool _upright = false);
        //! returns the descriptor size in float's (64 or 128)
        int descriptorSize() const;
@@ -61,6 +62,8 @@ Class for extracting Speeded Up Robust Features from an image. ::
        //! max keypoints = keypointsRatio * img.size().area()
        float keypointsRatio;
+        bool upright;
        GpuMat sum, mask1, maskSum, intBuffer;
        GpuMat det, trace;
@@ -70,25 +73,21 @@ Class for extracting Speeded Up Robust Features from an image. ::
        GpuMat keypointsBuffer;
    };
 The class ``SURF_GPU`` implements Speeded Up Robust Features descriptor. There is fast multi-scale Hessian keypoint detector that can be used to find the keypoints (which is the default option), but the descriptors can be also computed for the user-specified keypoints. Supports only 8 bit grayscale images.
-The class ``SURF_GPU`` can store results to GPU and CPU memory and provides functions to convert results between CPU and GPU version ( ``uploadKeypoints``,``downloadKeypoints``,``downloadDescriptors`` ). CPU results has the same format as ``SURF``
+The class ``SURF_GPU`` can store results to GPU and CPU memory and provides functions to convert results between CPU and GPU version (``uploadKeypoints``, ``downloadKeypoints``, ``downloadDescriptors``). CPU results have the same format as :c:type:`SURF` results. GPU results are stored to :cpp:class:`gpu::GpuMat`. The ``keypoints`` matrix is a one-row matrix with ``CV_32FC6`` type. It contains 6 float values per feature: ``x, y, laplacian, size, dir, hessian``. The ``descriptors`` matrix is an ``nFeatures`` :math:`\times` ``descriptorSize`` matrix with ``CV_32FC1`` type.
-results. GPU results are stored to ``GpuMat`` . ``keypoints`` matrix is one row matrix with ``CV_32FC6`` type. It contains 6 float values per feature: ``x, y, laplacian, size, dir, hessian`` . ``descriptors`` matrix is
-:math:`\texttt{nFeatures} \times \texttt{descriptorSize}` matrix with ``CV_32FC1`` type.
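The six-floats-per-feature layout of the ``keypoints`` row can be read back as shown in this sketch. It assumes the row has already been downloaded into a flat ``std::vector<float>``; ``SurfKeypointFields`` and ``unpackKeypoints`` are hypothetical names used only for illustration, and the field order follows the list given above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain struct mirroring the documented field order
// x, y, laplacian, size, dir, hessian.
struct SurfKeypointFields {
    float x, y, laplacian, size, dir, hessian;
};

// Splits a flat row of floats into groups of six values, one group per feature.
// Trailing values that do not form a complete group are ignored.
std::vector<SurfKeypointFields> unpackKeypoints(const std::vector<float>& row) {
    const std::size_t kFields = 6;
    std::vector<SurfKeypointFields> out;
    out.reserve(row.size() / kFields);
    for (std::size_t i = 0; i + kFields <= row.size(); i += kFields) {
        SurfKeypointFields kp = { row[i],     row[i + 1], row[i + 2],
                                  row[i + 3], row[i + 4], row[i + 5] };
        out.push_back(kp);
    }
    return out;
}
```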
 The class ``SURF_GPU`` uses some buffers and provides access to it. All buffers can be safely released between function calls.
-See also:
+See also: :c:type:`SURF`.
-.
-.. index:: gpu::BruteForceMatcher_GPU
-.. gpu::BruteForceMatcher_GPU:
+.. index:: gpu::BruteForceMatcher_GPU
 gpu::BruteForceMatcher_GPU
 --------------------------
-.. c:type:: gpu::BruteForceMatcher_GPU
+.. cpp:class:: gpu::BruteForceMatcher_GPU
 Brute-force descriptor matcher. For each descriptor in the first set, this matcher finds the closest descriptor in the second set by trying each one. This descriptor matcher supports masking permissible matches between descriptor sets. ::
@@ -174,182 +173,171 @@ Brute-force descriptor matcher. For each descriptor in the first set, this match
        std::vector<GpuMat> trainDescCollection;
    };
+The class ``BruteForceMatcher_GPU`` has an interface similar to that of the class :c:type:`DescriptorMatcher`. It has two groups of match methods: for matching descriptors of one image with another image or with an image set. All functions also come in two variants: one saves results to GPU memory, the other to CPU memory.
-The class ``BruteForceMatcher_GPU`` has the similar interface to class. It has two groups of match methods: for matching descriptors of one image with other image or with image set. Also all functions have alternative: save results to GPU memory or to CPU memory. ``Distance`` template parameter is kept for CPU/GPU interfaces similarity. ``BruteForceMatcher_GPU`` supports only ``L1<float>`` and ``L2<float>`` distance types.
+``Distance`` template parameter is kept for CPU/GPU interfaces similarity. ``BruteForceMatcher_GPU`` supports only ``L1<float>`` and ``L2<float>`` distance types.
-.. index:: gpu::BruteForceMatcher_GPU::match
+See also: :c:type:`DescriptorMatcher`, :c:type:`BruteForceMatcher`.
-.. gpu::BruteForceMatcher_GPU::match:
+.. index:: gpu::BruteForceMatcher_GPU::match
 gpu::BruteForceMatcher_GPU::match
 -------------------------------------
-.. c:function:: void match(const GpuMat&queryDescs,  const GpuMat&trainDescs,  std::vector<DMatch>&matches,  const GpuMat&mask = GpuMat())
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::match(const GpuMat& queryDescs, const GpuMat& trainDescs, vector<DMatch>& matches, const GpuMat& mask = GpuMat())
-.. c:function:: void match(const GpuMat&queryDescs,  std::vector<DMatch>&matches,  const std::vector<GpuMat>&masks = std::vector<GpuMat>())
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::match(const GpuMat& queryDescs, vector<DMatch>& matches, const vector<GpuMat>& masks = vector<GpuMat>())
    Finds the best match for each descriptor from a query set with train descriptors.
-See also:
+See also: :c:func:`DescriptorMatcher::match`.
-:func:`DescriptorMatcher::match` .
-.. index:: gpu::BruteForceMatcher_GPU::matchSingle
-.. gpu::BruteForceMatcher_GPU::matchSingle:
+.. index:: gpu::BruteForceMatcher_GPU::matchSingle
 gpu::BruteForceMatcher_GPU::matchSingle
 -------------------------------------------
-.. c:function:: void matchSingle(const GpuMat&queryDescs,  const GpuMat&trainDescs,  GpuMat&trainIdx,  GpuMat&distance,  const GpuMat&mask = GpuMat())
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::matchSingle(const GpuMat& queryDescs, const GpuMat& trainDescs, GpuMat& trainIdx, GpuMat& distance, const GpuMat& mask = GpuMat())
    Finds the best match for each query descriptor. Results will be stored to GPU memory.
-    {Query set of descriptors.}
+    :param queryDescs: Query set of descriptors.
-    {Train set of descriptors. This will not be added to train descriptors collection stored in class object.}
-    {One row ``CV_32SC1``     matrix. Will contain the best train index for each query. If some query descriptors are masked out in ``mask``     it will contain -1.}
+    :param trainDescs: Train set of descriptors. This will not be added to train descriptors collection stored in class object.
-    {One row ``CV_32FC1``     matrix. Will contain the best distance for each query. If some query descriptors are masked out in ``mask``     it will contain ``FLT_MAX``     .}
+    :param trainIdx: One row ``CV_32SC1`` matrix. Will contain the best train index for each query. If some query descriptors are masked out in ``mask`` it will contain -1.
+    :param distance: One row ``CV_32FC1`` matrix. Will contain the best distance for each query. If some query descriptors are masked out in ``mask`` it will contain ``FLT_MAX``.
    :param mask: Mask specifying permissible matches between input query and train matrices of descriptors.
-.. index:: gpu::BruteForceMatcher_GPU::matchCollection
-.. gpu::BruteForceMatcher_GPU::matchCollection:
+.. index:: gpu::BruteForceMatcher_GPU::matchCollection
 gpu::BruteForceMatcher_GPU::matchCollection
 -----------------------------------------------
-.. c:function:: void matchCollection(const GpuMat&queryDescs,  const GpuMat&trainCollection,  GpuMat&trainIdx,  GpuMat&imgIdx,  GpuMat&distance,  const GpuMat&maskCollection)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::matchCollection(const GpuMat& queryDescs, const GpuMat& trainCollection, GpuMat& trainIdx, GpuMat& imgIdx, GpuMat& distance, const GpuMat& maskCollection)
    Finds the best match for each query descriptor from train collection. Results will be stored to GPU memory.
-    {Query set of descriptors.}
+    :param queryDescs: Query set of descriptors.
-    { ``GpuMat``     containing train collection. It can be obtained from train descriptors collection that was set using ``add``     method by
-    . Or it can contain user defined collection. It must be one row matrix, each element is a ``DevMem2D``     that points to one train descriptors matrix.}
-    {One row ``CV_32SC1``     matrix. Will contain the best train index for each query. If some query descriptors are masked out in ``maskCollection``     it will contain -1.}
-    {One row ``CV_32SC1``     matrix. Will contain image train index for each query. If some query descriptors are masked out in ``maskCollection``     it will contain -1.}
-    {One row ``CV_32FC1``     matrix. Will contain the best distance for each query. If some query descriptors are masked out in ``maskCollection``     it will contain ``FLT_MAX``     .}
-    :param maskCollection: ``GpuMat``  containing set of masks. It can be obtained from  ``std::vector<GpuMat>``  by  . Or it can contain user defined mask set. It must be empty matrix or one row matrix, each element is a  ``PtrStep``  that points to one mask.
+    :param trainCollection: :cpp:class:`gpu::GpuMat` containing train collection. It can be obtained from train descriptors collection that was set using ``add`` method by :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection`. Or it can contain user defined collection. It must be one row matrix, each element is a :cpp:class:`gpu::DevMem2D_` that points to one train descriptors matrix.
-.. index:: gpu::BruteForceMatcher_GPU::makeGpuCollection
+    :param trainIdx: One row ``CV_32SC1`` matrix. Will contain the best train index for each query. If some query descriptors are masked out in ``maskCollection`` it will contain -1.
+    :param imgIdx: One row ``CV_32SC1`` matrix. Will contain image train index for each query. If some query descriptors are masked out in ``maskCollection`` it will contain -1.
+    :param distance: One row ``CV_32FC1`` matrix. Will contain the best distance for each query. If some query descriptors are masked out in ``maskCollection`` it will contain ``FLT_MAX``.
+    :param maskCollection: :cpp:class:`gpu::GpuMat` containing set of masks. It can be obtained from ``vector<GpuMat>`` by :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection`. Or it can contain user defined mask set. It must be empty matrix or one row matrix, each element is a :cpp:class:`gpu::PtrStep_` that points to one mask.
-.. gpu::BruteForceMatcher_GPU::makeGpuCollection:
+.. index:: gpu::BruteForceMatcher_GPU::makeGpuCollection
 gpu::BruteForceMatcher_GPU::makeGpuCollection
 -------------------------------------------------
-.. c:function:: void makeGpuCollection(GpuMat&trainCollection,  GpuMat&maskCollection,  const vector<GpuMat>&masks = std::vector<GpuMat>())
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::makeGpuCollection(GpuMat& trainCollection, GpuMat& maskCollection, const vector<GpuMat>& masks = vector<GpuMat>())
+    Makes gpu collection of train descriptors and masks in suitable format for :cpp:func:`gpu::BruteForceMatcher_GPU::matchCollection` function.
-    Makes gpu collection of train descriptors and masks in suitable format for function.
-.. index:: gpu::BruteForceMatcher_GPU::matchDownload
-.. gpu::BruteForceMatcher_GPU::matchDownload:
+.. index:: gpu::BruteForceMatcher_GPU::matchDownload
 gpu::BruteForceMatcher_GPU::matchDownload
 ---------------------------------------------
-.. c:function:: void matchDownload(const GpuMat&trainIdx,  const GpuMat&distance,  std::vector<DMatch>&matches)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::matchDownload(const GpuMat& trainIdx, const GpuMat& distance, vector<DMatch>& matches)
-.. c:function:: void matchDownload(const GpuMat&trainIdx,  GpuMat&imgIdx,  const GpuMat&distance,  std::vector<DMatch>&matches)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::matchDownload(const GpuMat& trainIdx, GpuMat& imgIdx, const GpuMat& distance, vector<DMatch>& matches)
-    Downloads trainIdx, imgIdxand distancematrices obtained via or to CPU vector with .
+    Downloads ``trainIdx``, ``imgIdx`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::matchSingle` or :cpp:func:`gpu::BruteForceMatcher_GPU::matchCollection` to CPU vector with :c:type:`DMatch`.
-.. index:: gpu::BruteForceMatcher_GPU::knnMatch
-.. gpu::BruteForceMatcher_GPU::knnMatch:
+.. index:: gpu::BruteForceMatcher_GPU::knnMatch
 gpu::BruteForceMatcher_GPU::knnMatch
 ----------------------------------------
-.. c:function:: void knnMatch(const GpuMat&queryDescs,  const GpuMat&trainDescs,  std::vector< std::vector<DMatch> >&matches,  int k,  const GpuMat&mask = GpuMat(),  bool compactResult = false)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::knnMatch(const GpuMat& queryDescs, const GpuMat& trainDescs, vector< vector<DMatch> >& matches, int k, const GpuMat& mask = GpuMat(), bool compactResult = false)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::knnMatch(const GpuMat& queryDescs, vector< vector<DMatch> >& matches, int k, const vector<GpuMat>& masks = vector<GpuMat>(), bool compactResult = false)
    Finds the k best matches for each descriptor from a query set with train descriptors. Found k (or less if not possible) matches are returned in distance increasing order.
-.. c:function:: void knnMatch(const GpuMat&queryDescs,  std::vector< std::vector<DMatch> >&matches,  int k,  const std::vector<GpuMat>&masks = std::vector<GpuMat>(),  bool compactResult = false )
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::knnMatch(const GpuMat& queryDescs, const GpuMat& trainDescs, GpuMat& trainIdx, GpuMat& distance, GpuMat& allDist, int k, const GpuMat& mask = GpuMat())
-See also:
+    Finds the k best matches for each descriptor from a query set with train descriptors. Found k (or less if not possible) matches are returned in distance increasing order. Results will be stored to GPU memory.
-:func:`DescriptorMatcher::knnMatch` .
-.. index:: gpu::BruteForceMatcher_GPU::knnMatch
+    :param queryDescs: Query set of descriptors.
-.. gpu::BruteForceMatcher_GPU::knnMatch:
+    :param trainDescs: Train set of descriptors. This will not be added to train descriptors collection stored in class object.
-gpu::BruteForceMatcher_GPU::knnMatch
+    :param trainIdx: Matrix with ``nQuery`` :math:`\times` ``k`` size and ``CV_32SC1`` type. ``trainIdx.at<int>(queryIdx, i)`` will contain the index of the i-th best train descriptor. If some query descriptors are masked out in ``mask`` it will contain -1.
----------------------------------------
-.. c:function:: void knnMatch(const GpuMat&queryDescs,  const GpuMat&trainDescs,  GpuMat&trainIdx,  GpuMat&distance,  GpuMat&allDist,  int k,  const GpuMat&mask = GpuMat())
-    Finds the k best matches for each descriptor from a query set with train descriptors. Found k (or less if not possible) matches are returned in distance increasing order. Results will be stored to GPU memory.
+    :param distance: Matrix with ``nQuery`` :math:`\times` ``k`` size and ``CV_32FC1`` type. Will contain the distance for each query and the i-th best train descriptor. If some query descriptors are masked out in ``mask`` it will contain ``FLT_MAX``.
-    {Query set of descriptors.}
+    :param allDist: Buffer to store all distances between query descriptors and train descriptors. It will have ``nQuery`` :math:`\times` ``nTrain`` size and ``CV_32FC1`` type. ``allDist.at<float>(queryIdx, trainIdx)`` will contain ``FLT_MAX``, if ``trainIdx`` is one from k best, otherwise it will contain distance between ``queryIdx`` and ``trainIdx`` descriptors.
-    {Train set of descriptors. This will not be added to train descriptors collection stored in class object.}
-    {Matrix with
-    :math:`\texttt{nQueries} \times \texttt{k}`     size and ``CV_32SC1``     type. ``trainIdx.at<int>(queryIdx, i)``     will contain index of the i'th best trains. If some query descriptors are masked out in ``mask``     it will contain -1.}
-    {Matrix with
-    :math:`\texttt{nQuery} \times \texttt{k}`     and ``CV_32FC1``     type. Will contain distance for each query and the i'th best trains. If some query descriptors are masked out in ``mask``     it will contain ``FLT_MAX``     .}
-    {Buffer to store all distances between query descriptors and train descriptors. It will have
-    :math:`\texttt{nQuery} \times \texttt{nTrain}`     size and ``CV_32FC1``     type. ``allDist.at<float>(queryIdx, trainIdx)``     will contain ``FLT_MAX``     , if ``trainIdx``     is one from k best, otherwise it will contain distance between ``queryIdx``     and ``trainIdx``     descriptors.}
    :param k: Number of best matches to find for each query descriptor (or fewer if that is not possible).
    :param mask: Mask specifying permissible matches between input query and train matrices of descriptors.
-.. index:: gpu::BruteForceMatcher_GPU::knnMatchDownload
+See also: :c:func:`DescriptorMatcher::knnMatch`.
-.. gpu::BruteForceMatcher_GPU::knnMatchDownload:
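The relationship between ``trainIdx``, ``distance`` and ``allDist`` for a single query can be sketched on the CPU. This is only an illustration of the documented buffer contents, not the GPU kernel: ``KnnRow`` and ``selectKBest`` are hypothetical names, and the selection strategy (repeatedly taking the minimum and overwriting it with ``FLT_MAX``) is an assumption chosen because it leaves ``allDist`` in the state described above.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical per-query result: one row of trainIdx and one row of distance.
struct KnnRow {
    std::vector<int> trainIdx;   // -1 where no further match exists
    std::vector<float> distance; // FLT_MAX where no further match exists
};

// allDistRow holds the distances from one query to every train descriptor
// (FLT_MAX for masked-out pairs). k must be non-negative. After the call,
// the entries chosen as one of the k best are overwritten with FLT_MAX.
KnnRow selectKBest(std::vector<float>& allDistRow, int k) {
    KnnRow out;
    out.trainIdx.assign(static_cast<std::size_t>(k), -1);
    out.distance.assign(static_cast<std::size_t>(k), FLT_MAX);
    for (int i = 0; i < k; ++i) {
        int best = -1;
        float bestDist = FLT_MAX;
        for (std::size_t t = 0; t < allDistRow.size(); ++t) {
            if (allDistRow[t] < bestDist) {
                bestDist = allDistRow[t];
                best = static_cast<int>(t);
            }
        }
        if (best < 0) {
            break; // fewer than k usable candidates remain
        }
        out.trainIdx[i] = best;
        out.distance[i] = bestDist;
        allDistRow[best] = FLT_MAX; // mark this train descriptor as consumed
    }
    return out;
}
```

Because each pass picks the smallest remaining distance, the results come out in increasing distance order, matching the ordering stated for ``knnMatch``.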
+.. index:: gpu::BruteForceMatcher_GPU::knnMatchDownload
 gpu::BruteForceMatcher_GPU::knnMatchDownload
 ------------------------------------------------
-.. c:function:: void knnMatchDownload(const GpuMat&trainIdx,  const GpuMat&distance,  std::vector< std::vector<DMatch> >&matches,  bool compactResult = false)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::knnMatchDownload(const GpuMat& trainIdx, const GpuMat& distance, vector< vector<DMatch> >& matches, bool compactResult = false)
-    Downloads trainIdxand distancematrices obtained via to CPU vector with . If compactResultis true matchesvector will not contain matches for fully masked out query descriptors.
+    Downloads ``trainIdx`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::knnMatch` to CPU vector with :c:type:`DMatch`. If ``compactResult`` is true ``matches`` vector will not contain matches for fully masked out query descriptors.
-.. index:: gpu::BruteForceMatcher_GPU::radiusMatch
-.. gpu::BruteForceMatcher_GPU::radiusMatch:
+.. index:: gpu::BruteForceMatcher_GPU::radiusMatch
 gpu::BruteForceMatcher_GPU::radiusMatch
 -------------------------------------------
-.. c:function:: void radiusMatch(const GpuMat&queryDescs,  const GpuMat&trainDescs,  std::vector< std::vector<DMatch> >&matches,  float maxDistance,  const GpuMat&mask = GpuMat(),  bool compactResult = false)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::radiusMatch(const GpuMat& queryDescs, const GpuMat& trainDescs, vector< vector<DMatch> >& matches, float maxDistance, const GpuMat& mask = GpuMat(), bool compactResult = false)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::radiusMatch(const GpuMat& queryDescs, vector< vector<DMatch> >& matches, float maxDistance, const vector<GpuMat>& masks = vector<GpuMat>(), bool compactResult = false)
    Finds the best matches for each query descriptor which have distance less than given threshold. Found matches are returned in distance increasing order.
-.. c:function:: void radiusMatch(const GpuMat&queryDescs,  std::vector< std::vector<DMatch> >&matches,  float maxDistance,  const std::vector<GpuMat>&masks = std::vector<GpuMat>(),  bool compactResult = false)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::radiusMatch(const GpuMat& queryDescs, const GpuMat& trainDescs, GpuMat& trainIdx, GpuMat& nMatches, GpuMat& distance, float maxDistance, const GpuMat& mask = GpuMat())
-This function works only on devices with Compute Capability
+    Finds the best matches for each query descriptor which have distance less than given threshold. Results will be stored to GPU memory. Results are not sorted by distance increasing order.
-:math:`>=` 1.1.
-See also:
+    :param queryDescs: Query set of descriptors.
-:func:`DescriptorMatcher::radiusMatch` .
-.. index:: gpu::BruteForceMatcher_GPU::radiusMatch
+    :param trainDescs: Train set of descriptors. This will not be added to train descriptors collection stored in class object.
-.. gpu::BruteForceMatcher_GPU::radiusMatch:
+    :param trainIdx: ``trainIdx.at<int>(queryIdx, i)`` will contain the i-th train index ``(i < min(nMatches.at<unsigned int>(0, queryIdx), trainIdx.cols))``. If ``trainIdx`` is empty, it will be created with size ``nQuery`` :math:`\times` ``nTrain``. Or it can be allocated by the user (it must have ``nQuery`` rows and ``CV_32SC1`` type). The number of columns can be less than ``nTrain``, but then the matcher may not find all matches, because it does not have enough memory to store the results.
-gpu::BruteForceMatcher_GPU::radiusMatch
+    :param nMatches: ``nMatches.at<unsigned int>(0, queryIdx)`` will contain the match count for ``queryIdx``. Note that ``nMatches`` can be greater than ``trainIdx.cols``, which means that the matcher didn't find all matches because it didn't have enough memory.
-------------------------------------------
-.. c:function:: void radiusMatch(const GpuMat&queryDescs,  const GpuMat&trainDescs,  GpuMat&trainIdx,  GpuMat&nMatches,  GpuMat&distance,  float maxDistance,  const GpuMat&mask = GpuMat())
-    Finds the best matches for each query descriptor which have distance less than given threshold. Results will be stored to GPU memory.
-    {Query set of descriptors.}
+    :param distance: ``distance.at<float>(queryIdx, i)`` will contain the i-th distance ``(i < min(nMatches.at<unsigned int>(0, queryIdx), trainIdx.cols))``. If ``trainIdx`` is empty, ``distance`` will be created with size ``nQuery`` :math:`\times` ``nTrain``. Otherwise it must also be allocated by the user (it must have the same size as ``trainIdx`` and ``CV_32FC1`` type).
-    {Train set of descriptors. This will not be added to train descriptors collection stored in class object.}
-    { ``trainIdx.at<int>(queryIdx, i)``     will contain i'th train index ``(i < min(nMatches.at<unsigned int>(0, queryIdx), trainIdx.cols)``     . If ``trainIdx``     is empty, it will be created with size
-    :math:`\texttt{nQuery} \times \texttt{nTrain}`     . Or it can be allocated by user (it must have ``nQuery``     rows and ``CV_32SC1``     type). Cols can be less than ``nTrain``     , but it can be that matcher won't find all matches, because it haven't enough memory to store results.}
-    { ``nMatches.at<unsigned int>(0, queryIdx)``     will contain matches count for ``queryIdx``     . Carefully, ``nMatches``     can be greater than ``trainIdx.cols``     - it means that matcher didn't find all matches, because it didn't have enough memory.}
-    { ``distance.at<int>(queryIdx, i)``     will contain i'th distance ``(i < min(nMatches.at<unsigned int>(0, queryIdx), trainIdx.cols)``     . If ``trainIdx``     is empty, it will be created with size
-    :math:`\texttt{nQuery} \times \texttt{nTrain}`     . Otherwise it must be also allocated by user (it must have the same size as ``trainIdx``     and ``CV_32FC1``     type).}
    :param maxDistance: Distance threshold.
    :param mask: Mask specifying permissible matches between input query and train matrices of descriptors.
-In contrast to results are not sorted by distance increasing order.
+**Please note:** This function works only on devices with Compute Capability :math:`>=` 1.1.
-This function works only on devices with Compute Capability
+See also: :c:func:`DescriptorMatcher::radiusMatch`.
-:math:`>=` 1.1.
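Since ``nMatches`` may exceed ``trainIdx.cols``, code that reads the GPU results back has to clamp the count per query. The sketch below shows only that arithmetic; ``storedMatchCount`` and ``resultsTruncated`` are hypothetical helpers, not part of the ``BruteForceMatcher_GPU`` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of entries that are actually valid in row queryIdx of trainIdx/distance:
// min(nMatches for that query, trainIdx.cols), as stated in the parameter docs.
std::size_t storedMatchCount(unsigned int nMatchesForQuery, std::size_t trainIdxCols) {
    return std::min<std::size_t>(nMatchesForQuery, trainIdxCols);
}

// True when the matcher counted more matches than it could store,
// i.e. some matches within maxDistance were dropped for lack of columns.
bool resultsTruncated(unsigned int nMatchesForQuery, std::size_t trainIdxCols) {
    return nMatchesForQuery > trainIdxCols;
}
```

A caller that sees ``resultsTruncated`` return true for some query could, for instance, rerun the match with a wider user-allocated ``trainIdx`` if complete results are required.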
-.. index:: gpu::BruteForceMatcher_GPU::radiusMatchDownload
-.. gpu::BruteForceMatcher_GPU::radiusMatchDownload:
+.. index:: gpu::BruteForceMatcher_GPU::radiusMatchDownload
 gpu::BruteForceMatcher_GPU::radiusMatchDownload
 ---------------------------------------------------
-.. c:function:: void radiusMatchDownload(const GpuMat&trainIdx,  const GpuMat&nMatches,  const GpuMat&distance,  std::vector< std::vector<DMatch> >&matches,  bool compactResult = false)
+.. cpp:function:: void gpu::BruteForceMatcher_GPU::radiusMatchDownload(const GpuMat& trainIdx, const GpuMat& nMatches, const GpuMat& distance, vector< vector<DMatch> >& matches, bool compactResult = false)
-    Downloads trainIdx, nMatchesand distancematrices obtained via to CPU vector with . If compactResultis true matchesvector will not contain matches for fully masked out query descriptors.
+    Downloads ``trainIdx``, ``nMatches`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::radiusMatch` to CPU vector with :c:type:`DMatch`. If ``compactResult`` is true ``matches`` vector will not contain matches for fully masked out query descriptors.
--- a/modules/gpu/doc/image_filtering.rst
+++ b/modules/gpu/doc/image_filtering.rst
@@ -3,15 +3,19 @@ Image Filtering
 .. highlight:: cpp
 Functions and classes described in this section are used to perform various linear or non-linear filtering operations on 2D images.
-See also:
+See also: :ref:`ImageFiltering`.
 .. index:: gpu::BaseRowFilter_GPU
 gpu::BaseRowFilter_GPU
 ----------------------
-.. c:type:: gpu::BaseRowFilter_GPU
+.. cpp:class:: gpu::BaseRowFilter_GPU
 The base class for linear or non-linear filters that process rows of 2D arrays. Such filters are used for the "horizontal" filtering passes in separable filters. ::
@@ -24,16 +28,15 @@ The base class for linear or non-linear filters that processes rows of 2D arrays
        int ksize, anchor;
    };
+**Please note:** This class doesn't allocate memory for the destination image. It is usually used inside :cpp:class:`gpu::FilterEngine_GPU`.
-**Please note:**
-This class doesn't allocate memory for destination image. Usually this class is used inside
-.
 .. index:: gpu::BaseColumnFilter_GPU
 gpu::BaseColumnFilter_GPU
 -------------------------
-.. c:type:: gpu::BaseColumnFilter_GPU
+.. cpp:class:: gpu::BaseColumnFilter_GPU
 The base class for linear or non-linear filters that process columns of 2D arrays. Such filters are used for the "vertical" filtering passes in separable filters. ::
@@ -46,16 +49,15 @@ The base class for linear or non-linear filters that processes columns of 2D arr
        int ksize, anchor;
    };
+**Please note:** This class doesn't allocate memory for the destination image. It is usually used inside :cpp:class:`gpu::FilterEngine_GPU`.
-**Please note:**
-This class doesn't allocate memory for destination image. Usually this class is used inside
-.
 .. index:: gpu::BaseFilter_GPU
 gpu::BaseFilter_GPU
 -------------------
-.. c:type:: gpu::BaseFilter_GPU
+.. cpp:class:: gpu::BaseFilter_GPU
 The base class for non-separable 2D filters. ::
@@ -70,15 +72,15 @@ The base class for non-separable 2D filters. ::
    };
-**Please note:**
+**Please note:** This class doesn't allocate memory for the destination image. It is usually used inside :cpp:class:`gpu::FilterEngine_GPU`.
-This class doesn't allocate memory for destination image. Usually this class is used inside
-.
 .. index:: gpu::FilterEngine_GPU
 gpu::FilterEngine_GPU
 ---------------------
-.. c:type:: gpu::FilterEngine_GPU
+.. cpp:class:: gpu::FilterEngine_GPU
 The base class for Filter Engine. ::
@@ -91,9 +93,7 @@ The base class for Filter Engine. ::
                           Rect roi = Rect(0,0,-1,-1)) = 0;
    };
+The class can be used to apply an arbitrary filtering operation to an image. It contains all the necessary intermediate buffers. Pointers to initialized ``FilterEngine_GPU`` instances are returned by the various ``create*Filter_GPU`` functions (see below) and are used inside high-level functions such as :cpp:func:`gpu::filter2D`, :cpp:func:`gpu::erode`, and :cpp:func:`gpu::Sobel`.
-The class can be used to apply an arbitrary filtering operation to an image. It contains all the necessary intermediate buffers. Pointers to the initialized ``FilterEngine_GPU`` instances are returned by various ``create*Filter_GPU`` functions, see below, and they are used inside high-level functions such as
-:func:`gpu::filter2D`,:func:`gpu::erode`,:func:`gpu::Sobel` etc.
 By using ``FilterEngine_GPU`` instead of functions you can avoid unnecessary memory allocation for intermediate buffers and get much better performance: ::
@@ -117,52 +117,59 @@ By using ``FilterEngine_GPU`` instead of functions you can avoid unnecessary mem
    // Release buffers only once
    filter.release();
- ``FilterEngine_GPU`` can process a rectangular sub-region of an image. By default, if ``roi == Rect(0,0,-1,-1)``,``FilterEngine_GPU`` processes inner region of image ( ``Rect(anchor.x, anchor.y, src_size.width - ksize.width, src_size.height - ksize.height)`` ), because some filters doesn't check if indices are outside the image for better perfomace. See below which filters supports processing the whole image and which not and image type limitations.
+``FilterEngine_GPU`` can process a rectangular sub-region of an image. By default, if ``roi == Rect(0,0,-1,-1)``, ``FilterEngine_GPU`` processes the inner region of the image (``Rect(anchor.x, anchor.y, src_size.width - ksize.width, src_size.height - ksize.height)``), because, for better performance, some filters don't check whether indices are outside the image. See below for which filters support processing the whole image and for image type limitations.
+**Please note:** The GPU filters don't support the in-place mode.
+See also: :cpp:class:`gpu::BaseRowFilter_GPU`, :cpp:class:`gpu::BaseColumnFilter_GPU`, :cpp:class:`gpu::BaseFilter_GPU`, :cpp:func:`gpu::createFilter2D_GPU`, :cpp:func:`gpu::createSeparableFilter_GPU`, :cpp:func:`gpu::createBoxFilter_GPU`, :cpp:func:`gpu::createMorphologyFilter_GPU`, :cpp:func:`gpu::createLinearFilter_GPU`, :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :cpp:func:`gpu::createDerivFilter_GPU`, :cpp:func:`gpu::createGaussianFilter_GPU`.
-**Please note:**
-The GPU filters doesn't support the in-place mode.
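The default inner region quoted above can be sketched in plain C++. This only illustrates the formula from the text and is not the library's implementation; ``SimpleRect`` and ``defaultInnerRoi`` are hypothetical names introduced here, and no OpenCV headers are used.

```cpp
#include <cassert>

// Hypothetical stand-in for cv::Rect, so the sketch needs no OpenCV headers.
struct SimpleRect {
    int x, y, width, height;
};

// Inner region quoted in the text:
// Rect(anchor.x, anchor.y, src_size.width - ksize.width, src_size.height - ksize.height).
SimpleRect defaultInnerRoi(int srcWidth, int srcHeight,
                           int ksizeWidth, int ksizeHeight,
                           int anchorX, int anchorY)
{
    // Assumed preconditions: the kernel fits in the image, the anchor lies in the kernel.
    assert(ksizeWidth > 0 && ksizeWidth <= srcWidth);
    assert(ksizeHeight > 0 && ksizeHeight <= srcHeight);
    assert(anchorX >= 0 && anchorX < ksizeWidth);
    assert(anchorY >= 0 && anchorY < ksizeHeight);

    SimpleRect roi = { anchorX, anchorY,
                       srcWidth - ksizeWidth, srcHeight - ksizeHeight };
    return roi;
}
```

For a 640x480 image, a 5x5 kernel and an anchor at (2, 2), ``defaultInnerRoi(640, 480, 5, 5, 2, 2)`` yields the region ``(2, 2, 635, 475)``.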
 .. index:: gpu::createFilter2D_GPU
 gpu::createFilter2D_GPU
 ---------------------------
-.. c:function:: Ptr<FilterEngine_GPU> createFilter2D_GPU( const Ptr<BaseFilter_GPU>& filter2D,  int srcType, int dstType)
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createFilter2D_GPU(const Ptr<BaseFilter_GPU>& filter2D, int srcType, int dstType)
    Creates non-separable filter engine with the specified filter.
-    {Non-separable 2D filter.}
+    :param filter2D: Non-separable 2D filter.
+    :param srcType: Input image type. It must be supported by ``filter2D``.
+    :param dstType: Output image type. It must be supported by ``filter2D``.
-    :param srcType: Input image type. It must be supported by  ``filter2D`` .
+Usually this function is used inside high-level functions, such as :cpp:func:`gpu::createLinearFilter_GPU` and :cpp:func:`gpu::createBoxFilter_GPU`.
-    :param dstType: Output image type. It must be supported by  ``filter2D`` .
-Usually this function is used inside high-level functions, like,.
 .. index:: gpu::createSeparableFilter_GPU
 gpu::createSeparableFilter_GPU
 ----------------------------------
-.. c:function:: Ptr<FilterEngine_GPU> createSeparableFilter_GPU( const Ptr<BaseRowFilter_GPU>& rowFilter,  const Ptr<BaseColumnFilter_GPU>& columnFilter,  int srcType, int bufType, int dstType)
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createSeparableFilter_GPU(const Ptr<BaseRowFilter_GPU>& rowFilter, const Ptr<BaseColumnFilter_GPU>& columnFilter, int srcType, int bufType, int dstType)
    Creates separable filter engine with the specified filters.
-    {"Horizontal" 1D filter.}
+    :param rowFilter: "Horizontal" 1D filter.
-    {"Vertical" 1D filter.}
-    :param srcType: Input image type. It must be supported by  ``rowFilter`` .
+    :param columnFilter: "Vertical" 1D filter.
-    :param bufType: Buffer image type. It must be supported by  ``rowFilter``  and  ``columnFilter`` .
+    :param srcType: Input image type. It must be supported by ``rowFilter``.
+    :param bufType: Buffer image type. It must be supported by ``rowFilter`` and ``columnFilter``.
+    :param dstType: Output image type. It must be supported by ``columnFilter``.
+Usually this function is used inside high-level functions, like :cpp:func:`gpu::createSeparableLinearFilter_GPU`.
-    :param dstType: Output image type. It must be supported by  ``columnFilter`` .
-Usually this function is used inside high-level functions, like
-.
 .. index:: gpu::getRowSumFilter_GPU
 gpu::getRowSumFilter_GPU
 ----------------------------
-.. c:function:: Ptr<BaseRowFilter_GPU> getRowSumFilter_GPU(int srcType, int sumType,  int ksize, int anchor = -1)
+.. cpp:function:: Ptr<BaseRowFilter_GPU> gpu::getRowSumFilter_GPU(int srcType, int sumType, int ksize, int anchor = -1)
    Creates horizontal 1D box filter.
@@ -174,14 +181,15 @@ gpu::getRowSumFilter_GPU
    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
 .. index:: gpu::getColumnSumFilter_GPU
 gpu::getColumnSumFilter_GPU
 -------------------------------
-.. c:function:: Ptr<BaseColumnFilter_GPU> getColumnSumFilter_GPU(int sumType,  int dstType, int ksize, int anchor = -1)
+.. cpp:function:: Ptr<BaseColumnFilter_GPU> gpu::getColumnSumFilter_GPU(int sumType, int dstType, int ksize, int anchor = -1)
    Creates vertical 1D box filter.
@@ -193,20 +201,21 @@ gpu::getColumnSumFilter_GPU
    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
 .. index:: gpu::createBoxFilter_GPU
 gpu::createBoxFilter_GPU
 ----------------------------
-.. c:function:: Ptr<FilterEngine_GPU> createBoxFilter_GPU(int srcType, int dstType,  const Size& ksize,  const Point& anchor = Point(-1,-1))
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createBoxFilter_GPU(int srcType, int dstType, const Size& ksize, const Point& anchor = Point(-1,-1))
-    Creates normalized 2D box filter.
+.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getBoxFilter_GPU(int srcType, int dstType, const Size& ksize, Point anchor = Point(-1, -1))
-.. c:function:: Ptr<BaseFilter_GPU> getBoxFilter_GPU(int srcType, int dstType,  const Size& ksize,  Point anchor = Point(-1, -1))
+    Creates normalized 2D box filter.
-    :param srcType: Input image type. Supports  ``CV_8UC1``  and  ``CV_8UC4`` .
+    :param srcType: Input image type. Supports ``CV_8UC1`` and ``CV_8UC4``.
    :param dstType: Output image type. Supports only the same as source type.
@@ -214,68 +223,69 @@ gpu::createBoxFilter_GPU
    :param anchor: Anchor point. The default value Point(-1, -1) means that the anchor is at the kernel center.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
+See also: :c:func:`boxFilter`.
-See also: :func:`boxFilter` .
 .. index:: gpu::boxFilter
 gpu::boxFilter
 ------------------
-.. c:function:: void gpu::boxFilter(const GpuMat& src, GpuMat& dst, int ddepth, Size ksize,  Point anchor = Point(-1,-1))
+.. cpp:function:: void gpu::boxFilter(const GpuMat& src, GpuMat& dst, int ddepth, Size ksize, Point anchor = Point(-1,-1))
    Smooths the image using the normalized box filter.
    :param src: Input image. Supports ``CV_8UC1`` and ``CV_8UC4`` source types.
-    :param dst: Output image type. Will have the same size and the same type as  ``src`` .
+    :param dst: Output image. It will have the same size and the same type as ``src``.
-    :param ddepth: Output image depth. Support only the same as source depth ( ``CV_8U`` ) or -1 what means use source depth.
+    :param ddepth: Output image depth. Supports only the source depth (``CV_8U``) or -1, which means that the source depth is used.
    :param ksize: Kernel size.
    :param anchor: Anchor point. The default value Point(-1, -1) means that the anchor is at the kernel center.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
+See also: :c:func:`boxFilter`, :cpp:func:`gpu::createBoxFilter_GPU`.
-See also:
-:func:`boxFilter`,.
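As a rough CPU reference for what a normalized box filter computes, the sketch below averages each kernel window of a single-channel 8-bit image. It is not the GPU implementation: ``boxFilterInner`` is a hypothetical helper, the anchor is assumed to be at the kernel center, rounding to nearest is an assumption, and only pixels whose whole window lies inside the image are written, mirroring the note that border accesses are not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for a normalized box filter over a single-channel 8-bit image
// stored row by row. Pixels whose kernel window would leave the image are
// skipped and stay 0 in the output.
std::vector<unsigned char> boxFilterInner(const std::vector<unsigned char>& src,
                                          int width, int height,
                                          int kw, int kh)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    assert(kw > 0 && kh > 0 && kw <= width && kh <= height);

    const int ax = kw / 2;  // anchor assumed at the kernel center
    const int ay = kh / 2;
    const int area = kw * kh;
    std::vector<unsigned char> dst(src.size(), 0);

    for (int y = ay; y + (kh - ay) <= height; ++y) {
        for (int x = ax; x + (kw - ax) <= width; ++x) {
            int sum = 0;
            for (int dy = -ay; dy < kh - ay; ++dy)
                for (int dx = -ax; dx < kw - ax; ++dx)
                    sum += src[(y + dy) * width + (x + dx)];
            // Normalize by the kernel area, rounding to nearest (an assumption).
            dst[y * width + x] = static_cast<unsigned char>((sum + area / 2) / area);
        }
    }
    return dst;
}
```

With a constant input image, every written pixel keeps the input value, which is a quick sanity check for the normalization.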
 .. index:: gpu::blur
 gpu::blur
 -------------
-.. c:function:: void gpu::blur(const GpuMat& src, GpuMat& dst, Size ksize,  Point anchor = Point(-1,-1))
+.. cpp:function:: void gpu::blur(const GpuMat& src, GpuMat& dst, Size ksize, Point anchor = Point(-1,-1))
    A synonym for normalized box filter.
    :param src: Input image. Supports ``CV_8UC1`` and ``CV_8UC4`` source type.
-    :param dst: Output image type. Will have the same size and the same type as  ``src`` .
+    :param dst: Output image. It will have the same size and the same type as ``src``.
    :param ksize: Kernel size.
    :param anchor: Anchor point. The default value Point(-1, -1) means that the anchor is at the kernel center.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
+See also: :c:func:`blur`, :cpp:func:`gpu::boxFilter`.
-See also:
-:func:`blur`,:func:`gpu::boxFilter` .
 .. index:: gpu::createMorphologyFilter_GPU
 gpu::createMorphologyFilter_GPU
 -----------------------------------
-.. c:function:: Ptr<FilterEngine_GPU> createMorphologyFilter_GPU(int op, int type,  const Mat& kernel,  const Point& anchor = Point(-1,-1),  int iterations = 1)
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createMorphologyFilter_GPU(int op, int type, const Mat& kernel, const Point& anchor = Point(-1,-1), int iterations = 1)
-    Creates 2D morphological filter.
+.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getMorphologyFilter_GPU(int op, int type, const Mat& kernel, const Size& ksize, Point anchor=Point(-1,-1))
-.. c:function:: Ptr<BaseFilter_GPU> getMorphologyFilter_GPU(int op, int type,  const Mat& kernel, const Size& ksize,  Point anchor=Point(-1,-1))
+    Creates 2D morphological filter.
-    {Morphology operation id. Only ``MORPH_ERODE``     and ``MORPH_DILATE``     are supported.}
+    :param op: Morphology operation id. Only ``MORPH_ERODE`` and ``MORPH_DILATE`` are supported.
    :param type: Input/output image type. Only ``CV_8UC1`` and ``CV_8UC4`` are supported.
@@ -285,71 +295,72 @@ gpu::createMorphologyFilter_GPU
    :param anchor: Anchor position within the structuring element; negative values mean that the anchor is at the center.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
+See also: :c:func:`createMorphologyFilter`.
-See also:
-:func:`createMorphologyFilter` .
 .. index:: gpu::erode
 gpu::erode
 --------------
-.. c:function:: void gpu::erode(const GpuMat& src, GpuMat& dst, const Mat& kernel,  Point anchor = Point(-1, -1),  int iterations = 1)
+.. cpp:function:: void gpu::erode(const GpuMat& src, GpuMat& dst, const Mat& kernel, Point anchor = Point(-1, -1), int iterations = 1)
    Erodes an image by using a specific structuring element.
    :param src: Source image. Only ``CV_8UC1`` and ``CV_8UC4`` types are supported.
-    :param dst: Destination image. It will have the same size and the same type as  ``src`` .
+    :param dst: Destination image. It will have the same size and the same type as ``src``.
-    :param kernel: Structuring element used for dilation. If  ``kernel=Mat()`` , a  :math:`3 \times 3`  rectangular structuring element is used.
+    :param kernel: Structuring element used for erosion. If ``kernel=Mat()``, a :math:`3 \times 3` rectangular structuring element is used.
-    :param anchor: Position of the anchor within the element. The default value  :math:`(-1, -1)`  means that the anchor is at the element center.
+    :param anchor: Position of the anchor within the element. The default value ``(-1, -1)`` means that the anchor is at the element center.
    :param iterations: Number of times erosion is applied.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
+See also: :c:func:`erode`, :cpp:func:`gpu::createMorphologyFilter_GPU`.
-See also:
-:func:`erode`,.
 .. index:: gpu::dilate
 gpu::dilate
 ---------------
-.. c:function:: void gpu::dilate(const GpuMat& src, GpuMat& dst, const Mat& kernel,  Point anchor = Point(-1, -1),  int iterations = 1)
+.. cpp:function:: void gpu::dilate(const GpuMat& src, GpuMat& dst, const Mat& kernel, Point anchor = Point(-1, -1), int iterations = 1)
    Dilates an image by using a specific structuring element.
    :param src: Source image. Supports ``CV_8UC1`` and ``CV_8UC4`` source types.
-    :param dst: Destination image. It will have the same size and the same type as  ``src`` .
+    :param dst: Destination image. It will have the same size and the same type as ``src``.
-    :param kernel: Structuring element used for dilation. If  ``kernel=Mat()`` , a  :math:`3 \times 3`  rectangular structuring element is used.
+    :param kernel: Structuring element used for dilation. If ``kernel=Mat()``, a :math:`3 \times 3` rectangular structuring element is used.
-    :param anchor: Position of the anchor within the element. The default value  :math:`(-1, -1)`  means that the anchor is at the element center.
+    :param anchor: Position of the anchor within the element. The default value ``(-1, -1)`` means that the anchor is at the element center.
    :param iterations: Number of times dilation is applied.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
+See also: :c:func:`dilate`, :cpp:func:`gpu::createMorphologyFilter_GPU`.
-See also:
-:func:`dilate`,.
 .. index:: gpu::morphologyEx
 gpu::morphologyEx
 ---------------------
-.. c:function:: void gpu::morphologyEx(const GpuMat& src, GpuMat& dst, int op,  const Mat& kernel,  Point anchor = Point(-1, -1),  int iterations = 1)
+.. cpp:function:: void gpu::morphologyEx(const GpuMat& src, GpuMat& dst, int op, const Mat& kernel, Point anchor = Point(-1, -1), int iterations = 1)
    Applies an advanced morphological operation to the image.
    :param src: Source image. Supports ``CV_8UC1`` and ``CV_8UC4`` source type.
-    :param dst: Destination image. It will have the same size and the same type as  ``src``
+    :param dst: Destination image. It will have the same size and the same type as ``src``.
    :param op: Type of morphological operation, one of the following:
            * **MORPH_OPEN** opening
@@ -362,30 +373,29 @@ gpu::morphologyEx
            * **MORPH_BLACKHAT** "black hat"
    :param kernel: Structuring element.
-    :param anchor: Position of the anchor within the element. The default value Point(-1, -1) means that the anchor is at the element center.
+    :param anchor: Position of the anchor within the element. The default value ``(-1, -1)`` means that the anchor is at the element center.
    :param iterations: Number of times erosion and dilation are applied.
-**Please note:**
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
+See also: :c:func:`morphologyEx`.
-See also:
-:func:`morphologyEx` .
 .. index:: gpu::createLinearFilter_GPU
 gpu::createLinearFilter_GPU
 -------------------------------
-.. c:function:: Ptr<FilterEngine_GPU> gpu::createLinearFilter_GPU(int srcType, int dstType,  const Mat& kernel,  const Point& anchor = Point(-1,-1))
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createLinearFilter_GPU(int srcType, int dstType, const Mat& kernel, const Point& anchor = Point(-1,-1))
-    Creates the non-separable linear filter.
+.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getLinearFilter_GPU(int srcType, int dstType, const Mat& kernel, const Size& ksize, Point anchor = Point(-1, -1))
-.. c:function:: Ptr<BaseFilter_GPU> getLinearFilter_GPU(int srcType, int dstType,  const Mat& kernel, const Size& ksize,  Point anchor = Point(-1, -1))
+    Creates the non-separable linear filter.
-    :param srcType: Input image type. Supports  ``CV_8UC1``  and  ``CV_8UC4`` .
+    :param srcType: Input image type. Supports ``CV_8UC1`` and ``CV_8UC4``.
    :param dstType: Output image type. Supports only the same as source type.
@@ -393,183 +403,193 @@ gpu::createLinearFilter_GPU
    :param ksize: Kernel size.
-    :param anchor: Anchor point. The default value Point(-1, -1) means that the anchor is at the kernel center.
+    :param anchor: Anchor point. The default value ``(-1, -1)`` means that the anchor is at the kernel center.
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
+See also: :c:func:`createLinearFilter`.
-**Please note:**
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
-See also:
-:func:`createLinearFilter` .
 .. index:: gpu::filter2D
 gpu::filter2D
 -----------------
-.. c:function:: void gpu::filter2D(const GpuMat& src, GpuMat& dst, int ddepth,  const Mat& kernel,  Point anchor=Point(-1,-1))
+.. cpp:function:: void gpu::filter2D(const GpuMat& src, GpuMat& dst, int ddepth, const Mat& kernel, Point anchor=Point(-1,-1))
    Applies a non-separable 2D linear filter to the image.
    :param src: Source image. Supports ``CV_8UC1`` and ``CV_8UC4`` source types.
-    :param dst: Destination image. It will have the same size and the same number of channels as  ``src`` .
+    :param dst: Destination image. It will have the same size and the same number of channels as ``src``.
-    :param ddepth: The desired depth of the destination image. If it is negative, it will be the same as  ``src.depth()`` . Supports only the same depth as source image.
+    :param ddepth: The desired depth of the destination image. If it is negative, it will be the same as ``src.depth()``. Supports only the same depth as source image.
    :param kernel: 2D array of filter coefficients. This filter works with integer kernels; if ``kernel`` has ``float`` or ``double`` type, fixed-point arithmetic is used.
-    :param anchor: Anchor of the kernel that indicates the relative position of a filtered point within the kernel. The anchor should lie within the kernel. The special default value (-1,-1) means that the anchor is at the kernel center.
+    :param anchor: Anchor of the kernel that indicates the relative position of a filtered point within the kernel. The anchor should lie within the kernel. The special default value ``(-1,-1)`` means that the anchor is at the kernel center.
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
+See also: :c:func:`filter2D`, :cpp:func:`gpu::createLinearFilter_GPU`.
-**Please note:**
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
-See also:
-:func:`filter2D`,.
 .. index:: gpu::Laplacian
 gpu::Laplacian
 ------------------
-.. c:function:: void gpu::Laplacian(const GpuMat& src, GpuMat& dst, int ddepth,  int ksize = 1, double scale = 1)
+.. cpp:function:: void gpu::Laplacian(const GpuMat& src, GpuMat& dst, int ddepth, int ksize = 1, double scale = 1)
    Applies Laplacian operator to image.
    :param src: Source image. Supports ``CV_8UC1`` and ``CV_8UC4`` source types.
-    :param dst: Destination image; will have the same size and the same number of channels as  ``src`` .
+    :param dst: Destination image; will have the same size and the same number of channels as ``src``.
    :param ddepth: Desired depth of the destination image. Supports only the same depth as the source image.
-    :param ksize: Aperture size used to compute the second-derivative filters, see  :func:`getDerivKernels` . It must be positive and odd. Supports only  ``ksize``  = 1 and  ``ksize``  = 3.
+    :param ksize: Aperture size used to compute the second-derivative filters, see :c:func:`getDerivKernels`. It must be positive and odd. Supports only ``ksize`` = 1 and ``ksize`` = 3.
+    :param scale: Optional scale factor for the computed Laplacian values (by default, no scaling is applied, see :c:func:`getDerivKernels`).
+**Please note:** This filter doesn't check out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-    :param scale: Optional scale factor for the computed Laplacian values (by default, no scaling is applied, see  :func:`getDerivKernels` ).
+See also: :c:func:`Laplacian`, :cpp:func:`gpu::filter2D`.
-**Please note:**
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
-See also:
-:func:`Laplacian`,:func:`gpu::filter2D` .
 .. index:: gpu::getLinearRowFilter_GPU
 gpu::getLinearRowFilter_GPU
 -------------------------------
-.. c:function:: Ptr<BaseRowFilter_GPU> getLinearRowFilter_GPU(int srcType,  int bufType, const Mat& rowKernel, int anchor = -1,  int borderType = BORDER_CONSTANT)
+.. cpp:function:: Ptr<BaseRowFilter_GPU> gpu::getLinearRowFilter_GPU(int srcType, int bufType, const Mat& rowKernel, int anchor = -1, int borderType = BORDER_CONSTANT)
    Creates primitive row filter with the specified kernel.
-    :param srcType: Source array type. Supports only  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  source types.
+    :param srcType: Source array type. Supports only ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types.
-    :param bufType: Inermediate buffer type; must have as many channels as  ``srcType`` .
+    :param bufType: Intermediate buffer type; must have as many channels as ``srcType``.
    :param rowKernel: Filter coefficients.
    :param anchor: Anchor position within the kernel; negative values mean that anchor is positioned at the aperture center.
-    :param borderType: Pixel extrapolation method; see  :func:`borderInterpolate` . About limitation see below.
+    :param borderType: Pixel extrapolation method; see :c:func:`borderInterpolate`. See below for limitations.
+There are two versions of the algorithm: NPP and OpenCV. The NPP version is called when ``srcType == CV_8UC1`` or ``srcType == CV_8UC4`` and ``bufType == srcType``; otherwise the OpenCV version is called. The NPP version supports only the ``BORDER_CONSTANT`` border type and doesn't check indices outside the image. The OpenCV version supports only the ``CV_32F`` buffer depth and the ``BORDER_REFLECT101``, ``BORDER_REPLICATE`` and ``BORDER_CONSTANT`` border types, and it checks indices outside the image.
+See also: :cpp:func:`gpu::getLinearColumnFilter_GPU`, :c:func:`createSeparableLinearFilter`.
-There are two version of algorithm: NPP and OpenCV. NPP calls when ``srcType == CV_8UC1`` or ``srcType == CV_8UC4`` and ``bufType == srcType`` , otherwise calls OpenCV version. NPP supports only ``BORDER_CONSTANT`` border type and doesn't check indices outside image. OpenCV version supports only ``CV_32F`` buffer depth and ``BORDER_REFLECT101``,``BORDER_REPLICATE`` and ``BORDER_CONSTANT`` border types and checks indices outside image.
-See also:,:func:`createSeparableLinearFilter` .
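The selection rule for the row filter can be written down as a small predicate. This is a sketch of one reading of the text, namely "(``srcType`` is ``CV_8UC1`` or ``CV_8UC4``) and ``bufType == srcType``"; that grouping is an assumption. ``usesNppRowFilter`` and the ``TYPE_*`` constants are hypothetical names defined locally so that the sketch compiles without OpenCV headers.

```cpp
// Type codes defined locally for illustration; the values follow OpenCV's
// CV_MAKETYPE(depth, channels) encoding.
enum {
    TYPE_8UC1  = 0,   // CV_8UC1
    TYPE_8UC4  = 24,  // CV_8UC4
    TYPE_32FC1 = 5    // CV_32FC1
};

// Returns true when, under the reading stated above, the NPP row filter
// would be selected; false means the OpenCV version would be used.
bool usesNppRowFilter(int srcType, int bufType)
{
    const bool nppSource = (srcType == TYPE_8UC1 || srcType == TYPE_8UC4);
    return nppSource && bufType == srcType;
}
```

Under this reading, an 8-bit source with a ``CV_32FC1`` buffer goes to the OpenCV version, which is also the only one described as checking indices outside the image.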
 .. index:: gpu::getLinearColumnFilter_GPU
 gpu::getLinearColumnFilter_GPU
 ----------------------------------
-.. c:function:: Ptr<BaseColumnFilter_GPU> getLinearColumnFilter_GPU(int bufType,  int dstType, const Mat& columnKernel, int anchor = -1,  int borderType = BORDER_CONSTANT)
+.. cpp:function:: Ptr<BaseColumnFilter_GPU> gpu::getLinearColumnFilter_GPU(int bufType, int dstType, const Mat& columnKernel, int anchor = -1, int borderType = BORDER_CONSTANT)
    Creates the primitive column filter with the specified kernel.
-    :param bufType: Inermediate buffer type; must have as many channels as  ``dstType`` .
+    :param bufType: Intermediate buffer type; must have as many channels as ``dstType``.
-    :param dstType: Destination array type. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  destination types.
+    :param dstType: Destination array type. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` destination types.
    :param columnKernel: Filter coefficients.
    :param anchor: Anchor position within the kernel; negative values mean that anchor is positioned at the aperture center.
-    :param borderType: Pixel extrapolation method; see  :func:`borderInterpolate` . About limitation see below.
+    :param borderType: Pixel extrapolation method; see :c:func:`borderInterpolate`. See below for limitations.
+There are two versions of the algorithm: NPP and OpenCV. The NPP version is called when ``dstType == CV_8UC1`` or ``dstType == CV_8UC4`` and ``bufType == dstType``; otherwise the OpenCV version is called. The NPP version supports only the ``BORDER_CONSTANT`` border type and doesn't check indices outside the image. The OpenCV version supports only the ``CV_32F`` buffer depth and the ``BORDER_REFLECT101``, ``BORDER_REPLICATE`` and ``BORDER_CONSTANT`` border types, and it checks indices outside the image.
+See also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :c:func:`createSeparableLinearFilter`.
-There are two version of algorithm: NPP and OpenCV. NPP calls when ``dstType == CV_8UC1`` or ``dstType == CV_8UC4`` and ``bufType == dstType`` , otherwise calls OpenCV version. NPP supports only ``BORDER_CONSTANT`` border type and doesn't check indices outside image. OpenCV version supports only ``CV_32F`` buffer depth and ``BORDER_REFLECT101``,``BORDER_REPLICATE`` and ``BORDER_CONSTANT`` border types and checks indices outside image.
-See also:,:func:`createSeparableLinearFilter` .
 .. index:: gpu::createSeparableLinearFilter_GPU
 gpu::createSeparableLinearFilter_GPU
 ----------------------------------------
-.. c:function:: Ptr<FilterEngine_GPU> createSeparableLinearFilter_GPU(int srcType,  int dstType, const Mat& rowKernel, const Mat& columnKernel,  const Point& anchor = Point(-1,-1),  int rowBorderType = BORDER_DEFAULT,  int columnBorderType = -1)
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createSeparableLinearFilter_GPU(int srcType, int dstType, const Mat& rowKernel, const Mat& columnKernel, const Point& anchor = Point(-1,-1), int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
    Creates the separable linear filter engine.
-    :param srcType: Source array type. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  source types.
+    :param srcType: Source array type. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types.
-    :param dstType: Destination array type. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  destination types.
+    :param dstType: Destination array type. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` destination types.
    :param rowKernel, columnKernel: Filter coefficients.
    :param anchor: Anchor position within the kernel; negative values mean that anchor is positioned at the aperture center.
-    :param rowBorderType, columnBorderType: Pixel extrapolation method in the horizontal and the vertical directions; see  :func:`borderInterpolate` . About limitation see  ,  .
+    :param rowBorderType, columnBorderType: Pixel extrapolation method in the horizontal and the vertical directions; see :c:func:`borderInterpolate`. For limitations, see :cpp:func:`gpu::getLinearRowFilter_GPU` and :cpp:func:`gpu::getLinearColumnFilter_GPU`.
+See also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :cpp:func:`gpu::getLinearColumnFilter_GPU`, :c:func:`createSeparableLinearFilter`.
-See also:,,
-:func:`createSeparableLinearFilter` .
 .. index:: gpu::sepFilter2D
 gpu::sepFilter2D
 --------------------
-.. c:function:: void gpu::sepFilter2D(const GpuMat& src, GpuMat& dst, int ddepth,  const Mat& kernelX, const Mat& kernelY,  Point anchor = Point(-1,-1),  int rowBorderType = BORDER_DEFAULT,  int columnBorderType = -1)
+.. cpp:function:: void gpu::sepFilter2D(const GpuMat& src, GpuMat& dst, int ddepth, const Mat& kernelX, const Mat& kernelY, Point anchor = Point(-1,-1), int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
    Applies separable 2D linear filter to the image.
-    :param src: Source image. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  source types.
+    :param src: Source image. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types.
-    :param dst: Destination image; will have the same size and the same number of channels as  ``src`` .
+    :param dst: Destination image; will have the same size and the same number of channels as ``src``.
-    :param ddepth: Destination image depth. Supports  ``CV_8U`` ,  ``CV_16S`` ,  ``CV_32S``  and  ``CV_32F`` .
+    :param ddepth: Destination image depth. Supports ``CV_8U``, ``CV_16S``, ``CV_32S`` and ``CV_32F``.
    :param kernelX, kernelY: Filter coefficients.
-    :param anchor: Anchor position within the kernel; The default value  :math:`(-1, 1)`  means that the anchor is at the kernel center.
+    :param anchor: Anchor position within the kernel; the default value ``(-1, -1)`` means that the anchor is at the kernel center.
+    :param rowBorderType, columnBorderType: Pixel extrapolation method; see :c:func:`borderInterpolate`.
+See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`sepFilter2D`.
-    :param rowBorderType, columnBorderType: Pixel extrapolation method; see  :func:`borderInterpolate` .
-See also:,:func:`sepFilter2D` .
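As an illustration of what "separable" means for the ``kernelX``/``kernelY`` parameters above, here is a minimal CPU sketch (not the GPU implementation): a horizontal pass with ``kernelX`` followed by a vertical pass with ``kernelY``. The ``Image`` type, ``clampIndex`` and ``sepFilter2DReference`` are hypothetical names introduced for this example, and the replicated border is an assumption rather than the library's default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major, single-channel float image used only for this sketch.
struct Image {
    int rows, cols;
    std::vector<float> data;
    Image(int r, int c, float v = 0.0f)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, v) {}
    float& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
    float at(int y, int x) const { return data[static_cast<std::size_t>(y) * cols + x]; }
};

// Replicated border (an assumption for this sketch): clamp an index into [0, n - 1].
inline int clampIndex(int i, int n) { return std::max(0, std::min(i, n - 1)); }

// Horizontal pass with kernelX, then vertical pass with kernelY.
// Both anchors sit at the kernel centers, as with anchor = Point(-1,-1).
inline Image sepFilter2DReference(const Image& src,
                                  const std::vector<float>& kernelX,
                                  const std::vector<float>& kernelY)
{
    assert(!kernelX.empty() && !kernelY.empty());
    const int nx = static_cast<int>(kernelX.size());
    const int ny = static_cast<int>(kernelY.size());
    const int ax = nx / 2;
    const int ay = ny / 2;

    Image tmp(src.rows, src.cols);
    Image dst(src.rows, src.cols);

    // Row pass: filter every row with kernelX.
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x) {
            float s = 0.0f;
            for (int k = 0; k < nx; ++k)
                s += kernelX[k] * src.at(y, clampIndex(x + k - ax, src.cols));
            tmp.at(y, x) = s;
        }

    // Column pass: filter every column of the intermediate image with kernelY.
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x) {
            float s = 0.0f;
            for (int k = 0; k < ny; ++k)
                s += kernelY[k] * tmp.at(clampIndex(y + k - ay, src.rows), x);
            dst.at(y, x) = s;
        }
    return dst;
}
```

Filtering a single bright pixel with ``{1, 2, 1}`` in both directions reproduces the outer product of the two kernels around that pixel, which is the property that lets a 2D filter be split into two 1D passes.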
 .. index:: gpu::createDerivFilter_GPU
 gpu::createDerivFilter_GPU
 ------------------------------
-.. c:function:: Ptr<FilterEngine_GPU> createDerivFilter_GPU(int srcType, int dstType,  int dx, int dy, int ksize,  int rowBorderType = BORDER_DEFAULT,  int columnBorderType = -1)
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createDerivFilter_GPU(int srcType, int dstType, int dx, int dy, int ksize, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
    Creates filter engine for the generalized Sobel operator.
-    :param srcType: Source image type. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  source types.
+    :param srcType: Source image type. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types.
-    :param dstType: Destination image type; must have as many channels as  ``srcType`` . Supports  ``CV_8U`` ,  ``CV_16S`` ,  ``CV_32S``  and  ``CV_32F``  depths.
+    :param dstType: Destination image type; must have as many channels as ``srcType``. Supports ``CV_8U``, ``CV_16S``, ``CV_32S`` and ``CV_32F`` depths.
    :param dx: Derivative order with respect to x.
    :param dy: Derivative order with respect to y.
-    :param ksize: Aperture size; see  :func:`getDerivKernels` .
+    :param ksize: Aperture size; see :c:func:`getDerivKernels`.
+    :param rowBorderType, columnBorderType: Pixel extrapolation method; see :c:func:`borderInterpolate`.
+See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`createDerivFilter`.
-    :param rowBorderType, columnBorderType: Pixel extrapolation method; see  :func:`borderInterpolate` .
-See also:,:func:`createDerivFilter` .
 .. index:: gpu::Sobel
 gpu::Sobel
 --------------
-.. c:function:: void gpu::Sobel(const GpuMat& src, GpuMat& dst, int ddepth, int dx, int dy,  int ksize = 3, double scale = 1,  int rowBorderType = BORDER_DEFAULT,  int columnBorderType = -1)
+.. cpp:function:: void gpu::Sobel(const GpuMat& src, GpuMat& dst, int ddepth, int dx, int dy, int ksize = 3, double scale = 1, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
    Applies generalized Sobel operator to the image.
-    :param src: Source image. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  source types.
+    :param src: Source image. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types.
    :param dst: Destination image. Will have the same size and number of channels as source image.
-    :param ddepth: Destination image depth. Supports  ``CV_8U`` ,  ``CV_16S`` ,  ``CV_32S``  and  ``CV_32F`` .
+    :param ddepth: Destination image depth. Supports ``CV_8U``, ``CV_16S``, ``CV_32S`` and ``CV_32F``.
    :param dx: Derivative order with respect to x.
@@ -577,83 +597,93 @@ gpu::Sobel
    :param ksize: Size of the extended Sobel kernel, must be 1, 3, 5 or 7.
-    :param scale: Optional scale factor for the computed derivative values (by default, no scaling is applied, see  :func:`getDerivKernels` ).
+    :param scale: Optional scale factor for the computed derivative values (by default, no scaling is applied, see :c:func:`getDerivKernels`).
+    :param rowBorderType, columnBorderType: Pixel extrapolation method; see :c:func:`borderInterpolate`.
+See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`Sobel`.
-    :param rowBorderType, columnBorderType: Pixel extrapolation method; see  :func:`borderInterpolate` .
-See also:,:func:`Sobel` .
 .. index:: gpu::Scharr
 gpu::Scharr
 ---------------
-.. c:function:: void gpu::Scharr(const GpuMat& src, GpuMat& dst, int ddepth,  int dx, int dy, double scale = 1,  int rowBorderType = BORDER_DEFAULT,  int columnBorderType = -1)
+.. cpp:function:: void gpu::Scharr(const GpuMat& src, GpuMat& dst, int ddepth, int dx, int dy, double scale = 1, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
    Calculates the first x- or y- image derivative using Scharr operator.
-    :param src: Source image. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  source types.
+    :param src: Source image. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types.
-    :param dst: Destination image; will have the same size and the same number of channels as  ``src`` .
+    :param dst: Destination image; will have the same size and the same number of channels as ``src``.
-    :param ddepth: Destination image depth. Supports  ``CV_8U`` ,  ``CV_16S`` ,  ``CV_32S``  and  ``CV_32F`` .
+    :param ddepth: Destination image depth. Supports ``CV_8U``, ``CV_16S``, ``CV_32S`` and ``CV_32F``.
    :param dx: Order of the derivative x.
    :param dy: Order of the derivative y.
-    :param scale: Optional scale factor for the computed derivative values (by default, no scaling is applied, see  :func:`getDerivKernels` ).
+    :param scale: Optional scale factor for the computed derivative values (by default, no scaling is applied, see :c:func:`getDerivKernels`).
+    :param rowBorderType, columnBorderType: Pixel extrapolation method, see :c:func:`borderInterpolate`.
+See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`Scharr`.
-    :param rowBorderType, columnBorderType: Pixel extrapolation method, see  :func:`borderInterpolate` See also:,:func:`Scharr` .
 .. index:: gpu::createGaussianFilter_GPU
 gpu::createGaussianFilter_GPU
 ---------------------------------
-.. c:function:: Ptr<FilterEngine_GPU> createGaussianFilter_GPU(int type, Size ksize,  double sigmaX, double sigmaY = 0,  int rowBorderType = BORDER_DEFAULT,  int columnBorderType = -1)
+.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createGaussianFilter_GPU(int type, Size ksize, double sigmaX, double sigmaY = 0, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
    Creates Gaussian filter engine.
-    :param type: Source and the destination image type. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1`` .
+    :param type: Source and the destination image type. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1``.
+    :param ksize: Aperture size; see :c:func:`getGaussianKernel`.
+    :param sigmaX: Gaussian sigma in the horizontal direction; see :c:func:`getGaussianKernel`.
-    :param ksize: Aperture size; see  :func:`getGaussianKernel` .
+    :param sigmaY: Gaussian sigma in the vertical direction; if 0, then :math:`\texttt{sigmaY}\leftarrow\texttt{sigmaX}`.
-    :param sigmaX: Gaussian sigma in the horizontal direction; see  :func:`getGaussianKernel` .
+    :param rowBorderType, columnBorderType: Which border type to use; see :c:func:`borderInterpolate`.
-    :param sigmaY: Gaussian sigma in the vertical direction; if 0, then  :math:`\texttt{sigmaY}\leftarrow\texttt{sigmaX}` .
+See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`createGaussianFilter`.
-    :param rowBorderType, columnBorderType: Which border type to use; see  :func:`borderInterpolate` .
-See also:,:func:`createGaussianFilter` .
 .. index:: gpu::GaussianBlur
 gpu::GaussianBlur
 ---------------------
-.. c:function:: void gpu::GaussianBlur(const GpuMat& src, GpuMat& dst, Size ksize,  double sigmaX, double sigmaY = 0,  int rowBorderType = BORDER_DEFAULT,  int columnBorderType = -1)
+.. cpp:function:: void gpu::GaussianBlur(const GpuMat& src, GpuMat& dst, Size ksize, double sigmaX, double sigmaY = 0, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
    Smooths the image using Gaussian filter.
-    :param src: Source image. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_16SC1`` ,  ``CV_16SC2`` ,  ``CV_32SC1`` ,  ``CV_32FC1``  source types.
+    :param src: Source image. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types.
-    :param dst: Destination image; will have the same size and the same type as  ``src`` .
+    :param dst: Destination image; will have the same size and the same type as ``src``.
-    :param ksize: Gaussian kernel size;  ``ksize.width``  and  ``ksize.height``  can differ, but they both must be positive and odd. Or, they can be zero's, then they are computed from  ``sigmaX``  amd  ``sigmaY`` .
+    :param ksize: Gaussian kernel size; ``ksize.width`` and ``ksize.height`` can differ, but they both must be positive and odd. Alternatively, they can be zero, in which case they are computed from ``sigmaX`` and ``sigmaY``.
-    :param sigmaX, sigmaY: Gaussian kernel standard deviations in X and Y direction. If  ``sigmaY``  is zero, it is set to be equal to  ``sigmaX`` . If they are both zeros, they are computed from  ``ksize.width``  and  ``ksize.height`` , respectively, see  :func:`getGaussianKernel` . To fully control the result regardless of possible future modification of all this semantics, it is recommended to specify all of  ``ksize`` ,  ``sigmaX``  and  ``sigmaY`` .
+    :param sigmaX, sigmaY: Gaussian kernel standard deviations in X and Y direction. If ``sigmaY`` is zero, it is set to be equal to ``sigmaX``. If they are both zero, they are computed from ``ksize.width`` and ``ksize.height``, respectively; see :c:func:`getGaussianKernel`. To fully control the result regardless of possible future modifications of these semantics, it is recommended to specify all of ``ksize``, ``sigmaX`` and ``sigmaY``.
+    :param rowBorderType, columnBorderType: Pixel extrapolation method; see :c:func:`borderInterpolate`.
+See also: :cpp:func:`gpu::createGaussianFilter_GPU`, :c:func:`GaussianBlur`.
-    :param rowBorderType, columnBorderType: Pixel extrapolation method; see  :func:`borderInterpolate` .
-See also:,:func:`GaussianBlur` .
 .. index:: gpu::getMaxFilter_GPU
 gpu::getMaxFilter_GPU
 -------------------------
-.. c:function:: Ptr<BaseFilter_GPU> getMaxFilter_GPU(int srcType, int dstType,  const Size& ksize, Point anchor = Point(-1,-1))
+.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getMaxFilter_GPU(int srcType, int dstType, const Size& ksize, Point anchor = Point(-1,-1))
    Creates maximum filter.
-    :param srcType: Input image type. Supports only  ``CV_8UC1``  and  ``CV_8UC4`` .
+    :param srcType: Input image type. Supports only ``CV_8UC1`` and ``CV_8UC4``.
    :param dstType: Output image type. Supports only the same type as source.
@@ -661,18 +691,19 @@ gpu::getMaxFilter_GPU
    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.
-**Please note:**
+**Please note:** This filter does not check for out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
 .. index:: gpu::getMinFilter_GPU
 gpu::getMinFilter_GPU
 -------------------------
-.. c:function:: Ptr<BaseFilter_GPU> getMinFilter_GPU(int srcType, int dstType,  const Size& ksize, Point anchor = Point(-1,-1))
+.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getMinFilter_GPU(int srcType, int dstType,  const Size& ksize, Point anchor = Point(-1,-1))
    Creates minimum filter.
-    :param srcType: Input image type. Supports only  ``CV_8UC1``  and  ``CV_8UC4`` .
+    :param srcType: Input image type. Supports only ``CV_8UC1`` and ``CV_8UC4``.
    :param dstType: Output image type. Supports only the same type as source.
@@ -680,5 +711,4 @@ gpu::getMinFilter_GPU
    :param anchor: Anchor point. The default value (-1) means that the anchor is at the kernel center.
-**Please note:**
+**Please note:** This filter does not check for out-of-border accesses, so only a proper submatrix of a bigger matrix should be passed to it.
-This filter doesn't check out of border accesses, so only proper submatrix of bigger matrix have to be passed to it.
--- a/modules/gpu/doc/image_processing.rst
+++ b/modules/gpu/doc/image_processing.rst
@@ -3,41 +3,39 @@ Image Processing
 .. highlight:: cpp
 .. index:: gpu::meanShiftFiltering
-cv::gpu::meanShiftFiltering
+gpu::meanShiftFiltering
 ---------------------------
-.. c:function:: void gpu::meanShiftFiltering(const GpuMat\& src, GpuMat\& dst,
+.. cpp:function:: void gpu::meanShiftFiltering(const GpuMat& src, GpuMat& dst, int sp, int sr, TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 5, 1))
-   int sp, int sr,
-   TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER
-   + TermCriteria::EPS, 5, 1))
    Performs mean-shift filtering for each point of the source image. It maps each point of the source image into another point, and as the result we have new color and new position of each point.
    :param src: Source image. Only ``CV_8UC4`` images are supported for now.
-    :param dst: Destination image, containing color of mapped points. Will have the same size and type as  ``src`` .
+    :param dst: Destination image, containing color of mapped points. Will have the same size and type as ``src``.
    :param sp: Spatial window radius.
    :param sr: Color window radius.
-    :param criteria: Termination criteria. See  .
+    :param criteria: Termination criteria. See :c:type:`TermCriteria`.
 .. index:: gpu::meanShiftProc
-cv::gpu::meanShiftProc
+gpu::meanShiftProc
 ----------------------
-.. c:function:: void gpu::meanShiftProc(const GpuMat\& src, GpuMat\& dstr, GpuMat\& dstsp,
+.. cpp:function:: void gpu::meanShiftProc(const GpuMat& src, GpuMat& dstr, GpuMat& dstsp, int sp, int sr, TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 5, 1))
-   int sp, int sr,
-   TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER
-   + TermCriteria::EPS, 5, 1))
    Performs mean-shift procedure and stores information about processed points (i.e. their colors and positions) into two images.
    :param src: Source image. Only ``CV_8UC4`` images are supported for now.
-    :param dstr: Destination image, containing color of mapped points. Will have the same size and type as  ``src`` .
+    :param dstr: Destination image, containing color of mapped points. Will have the same size and type as ``src``.
    :param dstsp: Destination image, containing position of mapped points. Will have the same size as ``src`` and ``CV_16SC2`` type.
@@ -45,25 +43,23 @@ cv::gpu::meanShiftProc
    :param sr: Color window radius.
-    :param criteria: Termination criteria. See  .
+    :param criteria: Termination criteria. See :c:type:`TermCriteria`.
+See also: :cpp:func:`gpu::meanShiftFiltering`.
-See also:
-:func:`gpu::meanShiftFiltering` .
 .. index:: gpu::meanShiftSegmentation
-cv::gpu::meanShiftSegmentation
+gpu::meanShiftSegmentation
 ------------------------------
-.. c:function:: void gpu::meanShiftSegmentation(const GpuMat\& src, Mat\& dst,
+.. cpp:function:: void gpu::meanShiftSegmentation(const GpuMat& src, Mat& dst, int sp, int sr, int minsize, TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 5, 1))
-   int sp, int sr, int minsize,
-   TermCriteria criteria = TermCriteria(TermCriteria::MAX_ITER
-   + TermCriteria::EPS, 5, 1))
    Performs mean-shift segmentation of the source image and eliminates small segments.
    :param src: Source image. Only ``CV_8UC4`` images are supported for now.
-    :param dst: Segmented image. Will have the same size and type as  ``src`` .
+    :param dst: Segmented image. Will have the same size and type as ``src``.
    :param sp: Spatial window radius.
@@ -71,44 +67,49 @@ cv::gpu::meanShiftSegmentation
    :param minsize: Minimum segment size. Smaller segments will be merged.
-    :param criteria: Termination criteria. See  .
+    :param criteria: Termination criteria. See :c:type:`TermCriteria`.
 .. index:: gpu::integral
-cv::gpu::integral
+gpu::integral
 -----------------
-.. c:function:: void gpu::integral(const GpuMat\& src, GpuMat\& sum)
+.. cpp:function:: void gpu::integral(const GpuMat& src, GpuMat& sum)
-.. c:function:: void gpu::integral(const GpuMat\& src, GpuMat\& sum, GpuMat\& sqsum)
+.. cpp:function:: void gpu::integral(const GpuMat& src, GpuMat& sum, GpuMat& sqsum)
    Computes integral image and squared integral image.
    :param src: Source image. Only ``CV_8UC1`` images are supported for now.
-    :param sum: Integral image. Will contain 32-bit unsigned integer values packed into  ``CV_32SC1`` .
+    :param sum: Integral image. Will contain 32-bit unsigned integer values packed into ``CV_32SC1``.
    :param sqsum: Squared integral image. Will have ``CV_32FC1`` type.
-See also:
+See also: :c:func:`integral`.
-:func:`integral` .
 .. index:: gpu::sqrIntegral
-cv::gpu::sqrIntegral
+gpu::sqrIntegral
 --------------------
-.. c:function:: void gpu::sqrIntegral(const GpuMat\& src, GpuMat\& sqsum)
+.. cpp:function:: void gpu::sqrIntegral(const GpuMat& src, GpuMat& sqsum)
    Computes squared integral image.
    :param src: Source image. Only ``CV_8UC1`` images are supported for now.
-    :param sqsum: Squared integral image. Will contain 64-bit unsigned integer values packed into  ``CV_64FC1`` .
+    :param sqsum: Squared integral image. Will contain 64-bit unsigned integer values packed into ``CV_64FC1``.
 .. index:: gpu::columnSum
-cv::gpu::columnSum
+gpu::columnSum
 ------------------
-.. c:function:: void gpu::columnSum(const GpuMat\& src, GpuMat\& sum)
+.. cpp:function:: void gpu::columnSum(const GpuMat& src, GpuMat& sum)
    Computes vertical (column) sum.
@@ -116,13 +117,13 @@ cv::gpu::columnSum
    :param sum: Destination image. Will have ``CV_32FC1`` type.
 .. index:: gpu::cornerHarris
-cv::gpu::cornerHarris
+gpu::cornerHarris
 ---------------------
-.. c:function:: void gpu::cornerHarris(const GpuMat\& src, GpuMat\& dst,
+.. cpp:function:: void gpu::cornerHarris(const GpuMat& src, GpuMat& dst, int blockSize, int ksize, double k, int borderType=BORDER_REFLECT101)
-   int blockSize, int ksize, double k,
-   int borderType=BORDER_REFLECT101)
    Computes Harris cornerness criteria at each image pixel.
@@ -138,16 +139,15 @@ cv::gpu::cornerHarris
    :param borderType: Pixel extrapolation method. Only ``BORDER_REFLECT101`` and ``BORDER_REPLICATE`` are supported for now.
-See also:
+See also: :c:func:`cornerHarris`.
-:func:`cornerHarris` .
 .. index:: gpu::cornerMinEigenVal
-cv::gpu::cornerMinEigenVal
+gpu::cornerMinEigenVal
 --------------------------
-.. c:function:: void gpu::cornerMinEigenVal(const GpuMat\& src, GpuMat\& dst,
+.. cpp:function:: void gpu::cornerMinEigenVal(const GpuMat& src, GpuMat& dst, int blockSize, int ksize, int borderType=BORDER_REFLECT101)
-   int blockSize, int ksize,
-   int borderType=BORDER_REFLECT101)
    Computes minimum eigen value of 2x2 derivative covariation matrix at each pixel - the cornerness criteria.
@@ -163,21 +163,21 @@ cv::gpu::cornerMinEigenVal
    :param borderType: Pixel extrapolation method. Only ``BORDER_REFLECT101`` and ``BORDER_REPLICATE`` are supported for now.
-See also:
+See also: :c:func:`cornerMinEigenVal`.
-:func:`cornerMinEigenValue` .
 .. index:: gpu::mulSpectrums
-cv::gpu::mulSpectrums
+gpu::mulSpectrums
 ---------------------
-.. c:function:: void gpu::mulSpectrums(const GpuMat\& a, const GpuMat\& b,
+.. cpp:function:: void gpu::mulSpectrums(const GpuMat& a, const GpuMat& b, GpuMat& c, int flags, bool conjB=false)
-   GpuMat\& c, int flags, bool conjB=false)
    Performs per-element multiplication of two Fourier spectrums.
    :param a: First spectrum.
-    :param b: Second spectrum. Must have the same size and type as  ``a`` .
+    :param b: Second spectrum. Must have the same size and type as ``a``.
    :param c: Destination spectrum.
@@ -187,21 +187,21 @@ cv::gpu::mulSpectrums
 Only full (i.e. not packed) ``CV_32FC2`` complex spectrums in the interleaved format are supported for now.
-See also:
+See also: :c:func:`mulSpectrums`.
-:func:`mulSpectrums` .
 .. index:: gpu::mulAndScaleSpectrums
-cv::gpu::mulAndScaleSpectrums
+gpu::mulAndScaleSpectrums
 -----------------------------
-.. c:function:: void gpu::mulAndScaleSpectrums(const GpuMat\& a, const GpuMat\& b,
+.. cpp:function:: void gpu::mulAndScaleSpectrums(const GpuMat& a, const GpuMat& b, GpuMat& c, int flags, float scale, bool conjB=false)
-   GpuMat\& c, int flags, float scale, bool conjB=false)
    Performs per-element multiplication of two Fourier spectrums and scales the result.
    :param a: First spectrum.
-    :param b: Second spectrum. Must have the same size and type as  ``a`` .
+    :param b: Second spectrum. Must have the same size and type as ``a``.
    :param c: Destination spectrum.
@@ -213,16 +213,17 @@ cv::gpu::mulAndScaleSpectrums
 Only full (i.e. not packed) ``CV_32FC2`` complex spectrums in the interleaved format are supported for now.
-See also:
+See also: :c:func:`mulSpectrums`.
-:func:`mulSpectrums` .
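To make the ``conjB`` and ``scale`` parameters of the two spectrum functions concrete, the following is a small CPU sketch of per-element complex multiplication on interleaved (real, imaginary) data. It is not the GPU code; ``Spectrum`` and ``mulSpectrumsReference`` are hypothetical names, and the ``flags`` parameter is left out because its values are not described in this section.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Full (not packed) spectrum: one complex value per element, as in CV_32FC2.
using Spectrum = std::vector<std::complex<float>>;

// c[i] = scale * a[i] * b[i], or scale * a[i] * conj(b[i]) when conjB is true.
// With scale = 1 this corresponds to plain per-element multiplication.
inline Spectrum mulSpectrumsReference(const Spectrum& a, const Spectrum& b,
                                      float scale = 1.0f, bool conjB = false)
{
    assert(a.size() == b.size()); // b must have the same size as a
    Spectrum c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::complex<float> bi = conjB ? std::conj(b[i]) : b[i];
        c[i] = a[i] * bi * scale;
    }
    return c;
}
```

Setting ``conjB`` to true is what turns a convolution computed in the frequency domain into a correlation, which is why both functions expose it.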
 .. index:: gpu::dft
-cv::gpu::dft
+gpu::dft
 ------------
-.. c:function:: void gpu::dft(const GpuMat\& src, GpuMat\& dst, Size dft_size, int flags=0)
+.. cpp:function:: void gpu::dft(const GpuMat& src, GpuMat& dst, Size dft_size, int flags=0)
-    Performs a forward or inverse discrete Fourier transform (1D or 2D) of floating point matrix. Can handle real matrices (CV32FC1) and complex matrices in the interleaved format (CV32FC2).
+    Performs a forward or inverse discrete Fourier transform (1D or 2D) of a floating point matrix. Can handle real matrices (``CV_32FC1``) and complex matrices in the interleaved format (``CV_32FC2``).
    :param src: Source matrix (real or complex).
@@ -234,61 +235,55 @@ cv::gpu::dft
            * **DFT_ROWS** Transform each individual row of the source matrix.
-            * **DFT_SCALE** Scale the result: divide it by the number of elements in the transform (it's obtained from  ``dft_size`` ).
+            * **DFT_SCALE** Scale the result: divide it by the number of elements in the transform (it's obtained from ``dft_size``).
            * **DFT_INVERSE** Inverse DFT must be performed for complex-complex case (real-complex and complex-real cases are respectively forward and inverse always).
            * **DFT_REAL_OUTPUT** The source matrix is the result of real-complex transform, so the destination matrix must be real.
 The source matrix should be continuous, otherwise reallocation and data copying will be performed. Function chooses the operation mode depending on the flags, size and channel count of the source matrix:
-*
+* If the source matrix is complex and the output isn't specified as real then the destination matrix will be complex, will have ``dft_size`` size and ``CV_32FC2`` type. It will contain full result of the DFT (forward or inverse).
-    If the source matrix is complex and the output isn't specified as real then the destination matrix will be complex, will have ``dft_size``     size and ``CV_32FC2``     type. It will contain full result of the DFT (forward or inverse).
+* If the source matrix is complex and the output is specified as real then the function assumes that its input is the result of the forward transform (see next item). The destination matrix will have ``dft_size`` size and ``CV_32FC1`` type. It will contain the result of the inverse DFT.
-*
+* If the source matrix is real (i.e. its type is ``CV_32FC1``) then forward DFT will be performed. The result of the DFT will be packed into complex (``CV_32FC2``) matrix so its width will be ``dft_size.width / 2 + 1``, but if the source is a single column then height will be reduced instead of width.
-    If the source matrix is complex and the output is specified as real then function assumes that its input is the result of the forward transform (see next item). The destionation matrix will have ``dft_size``     size and ``CV_32FC1``     type. It will contain result of the inverse DFT.
+See also: :c:func:`dft`.
-*
-    If the source matrix is real (i.e. its type is ``CV_32FC1``     ) then forward DFT will be performed. The result of the DFT will be packed into complex ( ``CV_32FC2``     ) matrix so its width will be ``dft_size.width / 2 + 1``     , but if the source is a single column then height will be reduced instead of width.
-See also:
-:func:`dft` .
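The packed layout in the real-to-complex case can be summarized by a small helper that computes the size of the complex result. This is only a sketch: ``Size2i`` and ``packedDftSize`` are hypothetical names, and the ``height / 2 + 1`` formula for a single-column source is an assumption made by symmetry with the width rule, since the text only says that the height is reduced instead.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's Size type.
struct Size2i {
    int width;
    int height;
};

// Size of the packed CV_32FC2 result for a real (CV_32FC1) forward DFT.
inline Size2i packedDftSize(Size2i dft_size)
{
    assert(dft_size.width > 0 && dft_size.height > 0);
    if (dft_size.width == 1) {
        // Single column: the height is reduced instead of the width.
        // The exact formula is assumed here, mirroring the width case.
        return Size2i{1, dft_size.height / 2 + 1};
    }
    // General case: only the non-redundant half of each row is stored.
    return Size2i{dft_size.width / 2 + 1, dft_size.height};
}
```

For example, an 8 x 4 real source would yield a 5 x 4 complex result under this rule.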
 .. index:: gpu::convolve
-cv::gpu::convolve
+gpu::convolve
 -----------------
-.. c:function:: void gpu::convolve(const GpuMat\& image, const GpuMat\& templ, GpuMat\& result,
+.. cpp:function:: void gpu::convolve(const GpuMat& image, const GpuMat& templ, GpuMat& result, bool ccorr=false)
-   bool ccorr=false)
-.. c:function:: void gpu::convolve(const GpuMat\& image, const GpuMat\& templ, GpuMat\& result,
+.. cpp:function:: void gpu::convolve(const GpuMat& image, const GpuMat& templ, GpuMat& result, bool ccorr, ConvolveBuf& buf)
-   bool ccorr, ConvolveBuf\& buf)
    Computes convolution (or cross-correlation) of two images.
    :param image: Source image. Only ``CV_32FC1`` images are supported for now.
-    :param templ: Template image. Must have size not greater then  ``image``  size and be the same type as  ``image`` .
+    :param templ: Template image. Must have size not greater than ``image`` size and be the same type as ``image``.
-    :param result: Result image. Will have the same size and type as  ``image`` .
+    :param result: Result image. Will have the same size and type as ``image``.
    :param ccorr: Flag indicating that cross-correlation must be evaluated instead of convolution.
    :param buf: Optional buffer to avoid extra memory allocations (for many calls with the same sizes).
-.. index:: gpu::ConvolveBuf
-.. _gpu::ConvolveBuf:
+.. index:: gpu::ConvolveBuf
 gpu::ConvolveBuf
 ----------------
 .. c:type:: gpu::ConvolveBuf
-Memory buffer for the
+Memory buffer for the :cpp:func:`gpu::convolve` function. ::
-:func:`gpu::convolve` function. ::
-    struct CV_EXPORTS ConvolveBuf
+    struct ConvolveBuf
    {
        ConvolveBuf() {}
        ConvolveBuf(Size image_size, Size templ_size)
@@ -300,32 +295,34 @@ Memory buffer for the
    };
 .. index:: gpu::ConvolveBuf::ConvolveBuf
-cv::gpu::ConvolveBuf::ConvolveBuf
+gpu::ConvolveBuf::ConvolveBuf
 ---------------------------------
-.. c:function:: ConvolveBuf::ConvolveBuf()
+.. cpp:function:: gpu::ConvolveBuf::ConvolveBuf()
 Constructs an empty buffer which will be properly resized after first call of the convolve function.
-.. c:function:: ConvolveBuf::ConvolveBuf(Size image_size, Size templ_size)
+.. cpp:function:: gpu::ConvolveBuf::ConvolveBuf(Size image_size, Size templ_size)
 Constructs a buffer for the convolve function with the respective arguments.
 .. index:: gpu::matchTemplate
-cv::gpu::matchTemplate
+gpu::matchTemplate
 ----------------------
-.. c:function:: void gpu::matchTemplate(const GpuMat\& image, const GpuMat\& templ,
+.. cpp:function:: void gpu::matchTemplate(const GpuMat& image, const GpuMat& templ, GpuMat& result, int method)
-   GpuMat\& result, int method)
    Computes a proximity map for a raster template and an image where the template is searched for.
    :param image: Source image. ``CV_32F`` and ``CV_8U`` depth images (1..4 channels) are supported for now.
-    :param templ: Template image. Must have the same size and type as  ``image`` .
+    :param templ: Template image. Must have the same size and type as ``image``.
-    :param result: Map containing comparison results ( ``CV_32FC1`` ). If  ``image``  is  :math:`W \times H`  and ``templ``  is  :math:`w \times h`  then  ``result``  must be  :math:`(W-w+1) \times (H-h+1)` .
+    :param result: Map containing comparison results (``CV_32FC1``). If ``image`` is ``W`` :math:`\times` ``H`` and ``templ`` is ``w`` :math:`\times` ``h`` then ``result`` must be ``(W-w+1)`` :math:`\times` ``(H-h+1)``.
    :param method: Specifies the way in which the template must be compared with the image.
@@ -343,20 +340,21 @@ Following methods are supported for the ``CV_32F`` images for now:
 * CV_TM_SQDIFF
 * CV_TM_CCORR
-See also:
+See also: :c:func:`matchTemplate`.
-:func:`matchTemplate` .
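The result size rule ``(W-w+1)`` x ``(H-h+1)`` follows from sliding the template over every position where it fits entirely inside the image. The CPU sketch below illustrates this for the squared-difference measure (the idea behind ``CV_TM_SQDIFF``); it is not the GPU implementation, and ``Image`` and ``matchTemplateSqDiff`` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major, single-channel float image used only for this sketch.
struct Image {
    int rows, cols;
    std::vector<float> data;
    Image(int r, int c, float v = 0.0f)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, v) {}
    float& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
    float at(int y, int x) const { return data[static_cast<std::size_t>(y) * cols + x]; }
};

// Sum of squared differences between the template and every image patch
// of the same size. The result has (H - h + 1) rows and (W - w + 1) columns.
inline Image matchTemplateSqDiff(const Image& image, const Image& templ)
{
    assert(templ.rows <= image.rows && templ.cols <= image.cols);
    Image result(image.rows - templ.rows + 1, image.cols - templ.cols + 1);
    for (int y = 0; y < result.rows; ++y)
        for (int x = 0; x < result.cols; ++x) {
            float s = 0.0f;
            for (int ty = 0; ty < templ.rows; ++ty)
                for (int tx = 0; tx < templ.cols; ++tx) {
                    const float d = image.at(y + ty, x + tx) - templ.at(ty, tx);
                    s += d * d;
                }
            result.at(y, x) = s;
        }
    return result;
}
```

With this measure the best match is the minimum of the result map, and a template cut directly from the image scores exactly zero at its original position.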
 .. index:: gpu::remap
-cv::gpu::remap
+gpu::remap
 --------------
-.. c:function:: void gpu::remap(const GpuMat\& src, GpuMat\& dst,  const GpuMat\& xmap, const GpuMat\& ymap)
+.. cpp:function:: void gpu::remap(const GpuMat& src, GpuMat& dst,  const GpuMat& xmap, const GpuMat& ymap)
    Applies a generic geometrical transformation to an image.
    :param src: Source image. Only ``CV_8UC1`` and ``CV_8UC3`` source types are supported.
-    :param dst: Destination image. It will have the same size as  ``xmap``  and the same type as  ``src`` .
+    :param dst: Destination image. It will have the same size as ``xmap`` and the same type as ``src``.
    :param xmap: X values. Only ``CV_32FC1`` type is supported.
@@ -366,81 +364,83 @@ The function transforms the source image using the specified map:
 .. math::
-    \texttt{dst} (x,y) =  \texttt{src} (xmap(x,y), ymap(x,y))
+    dst(x,y) = src(xmap(x,y), ymap(x,y))
 Values of pixels with non-integer coordinates are computed using bilinear interpolation.
-See also:
+See also: :c:func:`remap`.
-:func:`remap` .
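The mapping formula and the bilinear interpolation step can be spelled out with a CPU reference sketch. It is not the GPU kernel: ``Image``, ``sampleBilinear`` and ``remapReference`` are hypothetical names, and clamping coordinates to the image border is an assumption, since the handling of out-of-range map values is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major, single-channel float image used only for this sketch.
struct Image {
    int rows, cols;
    std::vector<float> data;
    Image(int r, int c, float v = 0.0f)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, v) {}
    float& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
    float at(int y, int x) const { return data[static_cast<std::size_t>(y) * cols + x]; }
};

inline int clampIndex(int i, int n) { return std::max(0, std::min(i, n - 1)); }

// Bilinear sample of src at the non-integer position (fx, fy).
// Coordinates outside the image are clamped (an assumption of this sketch).
inline float sampleBilinear(const Image& src, float fx, float fy)
{
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float wx = fx - static_cast<float>(x0);
    const float wy = fy - static_cast<float>(y0);
    const int xa = clampIndex(x0, src.cols), xb = clampIndex(x0 + 1, src.cols);
    const int ya = clampIndex(y0, src.rows), yb = clampIndex(y0 + 1, src.rows);
    const float top = (1.0f - wx) * src.at(ya, xa) + wx * src.at(ya, xb);
    const float bottom = (1.0f - wx) * src.at(yb, xa) + wx * src.at(yb, xb);
    return (1.0f - wy) * top + wy * bottom;
}

// dst(x, y) = src(xmap(x, y), ymap(x, y)); dst takes the size of xmap.
inline Image remapReference(const Image& src, const Image& xmap, const Image& ymap)
{
    assert(xmap.rows == ymap.rows && xmap.cols == ymap.cols);
    Image dst(xmap.rows, xmap.cols);
    for (int y = 0; y < dst.rows; ++y)
        for (int x = 0; x < dst.cols; ++x)
            dst.at(y, x) = sampleBilinear(src, xmap.at(y, x), ymap.at(y, x));
    return dst;
}
```

Note that the destination size comes from the maps, not from the source, so the same source can be resampled to any resolution by changing ``xmap`` and ``ymap``.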
 .. index:: gpu::cvtColor
-cv::gpu::cvtColor
+gpu::cvtColor
 -----------------
-.. c:function:: void gpu::cvtColor(const GpuMat\& src, GpuMat\& dst, int code, int dcn = 0)
+.. cpp:function:: void gpu::cvtColor(const GpuMat& src, GpuMat& dst, int code, int dcn = 0)
-.. c:function:: void gpu::cvtColor(const GpuMat\& src, GpuMat\& dst, int code, int dcn,  const Stream\& stream)
+.. cpp:function:: void gpu::cvtColor(const GpuMat& src, GpuMat& dst, int code, int dcn,  const Stream& stream)
    Converts image from one color space to another.
-    :param src: Source image with  ``CV_8U`` ,  ``CV_16U``  or  ``CV_32F``  depth and 1, 3 or 4 channels.
+    :param src: Source image with ``CV_8U``, ``CV_16U`` or ``CV_32F`` depth and 1, 3 or 4 channels.
-    :param dst: Destination image; will have the same size and the same depth as  ``src`` .
+    :param dst: Destination image; will have the same size and the same depth as ``src``.
-    :param code: Color space conversion code. For details see  :func:`cvtColor` . Conversion to/from Luv and Bayer color spaces doesn't supported.
+    :param code: Color space conversion code. For details see :c:func:`cvtColor`. Conversion to/from Luv and Bayer color spaces is not supported.
-    :param dcn: Number of channels in the destination image; if the parameter is 0, the number of the channels will be derived automatically from  ``src``  and the  ``code`` .
+    :param dcn: Number of channels in the destination image; if the parameter is 0, the number of the channels will be derived automatically from ``src`` and the ``code``.
    :param stream: Stream for the asynchronous version.
-3-channel color spaces (like ``HSV``,``XYZ`` , etc) can be stored to 4-channel image for better perfomance.
+3-channel color spaces (like ``HSV``, ``XYZ``, etc.) can be stored in a 4-channel image for better performance.
+See also: :c:func:`cvtColor`.
-See also:
-:func:`cvtColor` .
 .. index:: gpu::threshold
-cv::gpu::threshold
+gpu::threshold
 ------------------
-.. c:function:: double gpu::threshold(const GpuMat\& src, GpuMat\& dst, double thresh,  double maxval, int type)
+.. cpp:function:: double gpu::threshold(const GpuMat& src, GpuMat& dst, double thresh, double maxval, int type)
-.. c:function:: double gpu::threshold(const GpuMat\& src, GpuMat\& dst, double thresh,  double maxval, int type, const Stream\& stream)
+.. cpp:function:: double gpu::threshold(const GpuMat& src, GpuMat& dst, double thresh, double maxval, int type, const Stream& stream)
    Applies a fixed-level threshold to each array element.
    :param src: Source array (single-channel, ``CV_64F`` depth isn't supported).
-    :param dst: Destination array; will have the same size and the same type as  ``src`` .
+    :param dst: Destination array; will have the same size and the same type as ``src``.
    :param thresh: Threshold value.
    :param maxval: Maximum value to use with ``THRESH_BINARY`` and ``THRESH_BINARY_INV`` thresholding types.
-    :param thresholdType: Thresholding type. For details see  :func:`threshold` .  ``THRESH_OTSU``  thresholding type doesn't supported.
+    :param type: Thresholding type. For details see :c:func:`threshold`. The ``THRESH_OTSU`` thresholding type is not supported.
    :param stream: Stream for the asynchronous version.
-See also:
+See also: :c:func:`threshold`.
-:func:`threshold` .
 .. index:: gpu::resize
-cv::gpu::resize
+gpu::resize
 ---------------
-.. c:function:: void gpu::resize(const GpuMat\& src, GpuMat\& dst, Size dsize,  double fx=0, double fy=0,  int interpolation = INTER_LINEAR)
+.. cpp:function:: void gpu::resize(const GpuMat& src, GpuMat& dst, Size dsize, double fx=0, double fy=0, int interpolation = INTER_LINEAR)
    Resizes an image.
    :param src: Source image. Supports ``CV_8UC1`` and ``CV_8UC4`` types.
-    :param dst: Destination image. It will have size  ``dsize``  (when it is non-zero) or the size computed from  ``src.size()``  and  ``fx``  and  ``fy`` . The type of  ``dst``  will be the same as of  ``src`` .
+    :param dst: Destination image. It will have size ``dsize`` (when it is non-zero) or the size computed from ``src.size()`` and ``fx`` and ``fy``. The type of ``dst`` will be the same as of ``src``.
    :param dsize: Destination image size. If it is zero, then it is computed as: 
        .. math::
+            dsize = Size(round(fx*src.cols), round(fy*src.rows))
- \texttt{dsize = Size(round(fx*src.cols), round(fy*src.rows))} 
        Either ``dsize`` or both ``fx`` and ``fy`` must be non-zero.
@@ -448,75 +448,75 @@ cv::gpu::resize
        .. math::
+            (double)dsize.width/src.cols
- \texttt{(double)dsize.width/src.cols} 
    :param fy: Scale factor along the vertical axis. When 0, it is computed as 
        .. math::
+            (double)dsize.height/src.rows
- \texttt{(double)dsize.height/src.rows} 
+    :param interpolation: Interpolation method. Supports only ``INTER_NEAREST`` and ``INTER_LINEAR``.
+See also: :c:func:`resize`.
-    :param interpolation: Interpolation method. Supports only  ``INTER_NEAREST``  and  ``INTER_LINEAR`` .
-See also:
-:func:`resize` .
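The ``dsize`` rule documented for ``gpu::resize`` can be sketched in plain C++. This is a minimal illustration, not the module's implementation: ``SizeSketch`` and ``resolveDsize`` are hypothetical names, and ``std::lround`` stands in for the ``round`` in the formula (the library's exact rounding mode is an assumption here).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for cv::Size, used only to keep the sketch self-contained.
struct SizeSketch {
    int width;
    int height;
};

// Returns dsize itself when it is non-zero; otherwise derives the size
// from the source dimensions and the scale factors fx and fy.
SizeSketch resolveDsize(SizeSketch dsize, int srcCols, int srcRows, double fx, double fy) {
    if (dsize.width != 0 && dsize.height != 0) {
        return dsize;
    }
    // Either dsize or both fx and fy must be non-zero.
    assert(fx > 0.0 && fy > 0.0);
    return SizeSketch{static_cast<int>(std::lround(fx * srcCols)),
                      static_cast<int>(std::lround(fy * srcRows))};
}
```

For a 640x480 source with a zero ``dsize``, ``fx = 0.5`` and ``fy = 0.25``, the sketch yields a 320x120 destination.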
 .. index:: gpu::warpAffine
-cv::gpu::warpAffine
+gpu::warpAffine
 -------------------
-.. c:function:: void gpu::warpAffine(const GpuMat\& src, GpuMat\& dst, const Mat\& M,  Size dsize, int flags = INTER_LINEAR)
+.. cpp:function:: void gpu::warpAffine(const GpuMat& src, GpuMat& dst, const Mat& M, Size dsize, int flags = INTER_LINEAR)
    Applies an affine transformation to an image.
-    :param src: Source image. Supports  ``CV_8U`` ,  ``CV_16U`` ,  ``CV_32S``  or  ``CV_32F``  depth and 1, 3 or 4 channels.
+    :param src: Source image. Supports ``CV_8U``, ``CV_16U``, ``CV_32S`` or ``CV_32F`` depth and 1, 3 or 4 channels.
-    :param dst: Destination image; will have size  ``dsize``  and the same type as  ``src`` .
+    :param dst: Destination image; will have size ``dsize`` and the same type as ``src``.
-    :param M: :math:`2\times 3`  transformation matrix.
+    :param M: :math:`2 \times 3`  transformation matrix.
    :param dsize: Size of the destination image.
-    :param flags: Combination of interpolation methods, see  :func:`resize` , and the optional flag  ``WARP_INVERSE_MAP``  that means that  ``M``  is the inverse transformation ( :math:`\texttt{dst}\rightarrow\texttt{src}` ). Supports only  ``INTER_NEAREST`` ,  ``INTER_LINEAR``  and  ``INTER_CUBIC``  interpolation methods.
+    :param flags: Combination of interpolation methods, see :c:func:`resize`, and the optional flag ``WARP_INVERSE_MAP`` that means that ``M`` is the inverse transformation (:math:`dst \rightarrow src`). Supports only ``INTER_NEAREST``, ``INTER_LINEAR`` and ``INTER_CUBIC`` interpolation methods.
+See also: :c:func:`warpAffine`.
-See also:
-:func:`warpAffine` .
 .. index:: gpu::warpPerspective
-cv::gpu::warpPerspective
+gpu::warpPerspective
 ------------------------
-.. c:function:: void gpu::warpPerspective(const GpuMat\& src, GpuMat\& dst, const Mat\& M,  Size dsize, int flags = INTER_LINEAR)
+.. cpp:function:: void gpu::warpPerspective(const GpuMat& src, GpuMat& dst, const Mat& M, Size dsize, int flags = INTER_LINEAR)
    Applies a perspective transformation to an image.
-    :param src: Source image. Supports  ``CV_8U`` ,  ``CV_16U`` ,  ``CV_32S``  or  ``CV_32F``  depth and 1, 3 or 4 channels.
+    :param src: Source image. Supports ``CV_8U``, ``CV_16U``, ``CV_32S`` or ``CV_32F`` depth and 1, 3 or 4 channels.
-    :param dst: Destination image; will have size  ``dsize``  and the same type as  ``src`` .
+    :param dst: Destination image; will have size ``dsize`` and the same type as ``src``.
-    :param M: :math:`2
+    :param M: :math:`3 \times 3` transformation matrix.
-         3`  transformation matrix.
    :param dsize: Size of the destination image.
-    :param flags: Combination of interpolation methods, see  :func:`resize` , and the optional flag  ``WARP_INVERSE_MAP``  that means that  ``M``  is the inverse transformation ( :math:`\texttt{dst}\rightarrow\texttt{src}` ). Supports only  ``INTER_NEAREST`` ,  ``INTER_LINEAR``  and  ``INTER_CUBIC``  interpolation methods.
+    :param flags: Combination of interpolation methods, see :c:func:`resize`, and the optional flag ``WARP_INVERSE_MAP`` that means that ``M`` is the inverse transformation (:math:`dst \rightarrow src`). Supports only ``INTER_NEAREST``, ``INTER_LINEAR`` and ``INTER_CUBIC`` interpolation methods.
+See also: :c:func:`warpPerspective`.
-See also:
-:func:`warpPerspective` .
 .. index:: gpu::rotate
-cv::gpu::rotate
+gpu::rotate
 ---------------
-.. c:function:: void gpu::rotate(const GpuMat\& src, GpuMat\& dst, Size dsize,  double angle, double xShift = 0, double yShift = 0,  int interpolation = INTER_LINEAR)
+.. cpp:function:: void gpu::rotate(const GpuMat& src, GpuMat& dst, Size dsize, double angle, double xShift = 0, double yShift = 0, int interpolation = INTER_LINEAR)
    Rotates an image around the origin (0,0) and then shifts it.
    :param src: Source image. Supports ``CV_8UC1`` and ``CV_8UC4`` types.
-    :param dst: Destination image; will have size  ``dsize``  and the same type as  ``src`` .
+    :param dst: Destination image; will have size ``dsize`` and the same type as ``src``.
    :param dsize: Size of the destination image.
@@ -526,34 +526,37 @@ cv::gpu::rotate
    :param yShift: Shift along vertical axis.
-    :param interpolation: Interpolation method. Supports only  ``INTER_NEAREST`` ,  ``INTER_LINEAR``  and  ``INTER_CUBIC`` .
+    :param interpolation: Interpolation method. Supports only ``INTER_NEAREST``, ``INTER_LINEAR`` and ``INTER_CUBIC``.
+See also: :cpp:func:`gpu::warpAffine`.
-See also:
-:func:`gpu::warpAffine` .
 .. index:: gpu::copyMakeBorder
-cv::gpu::copyMakeBorder
+gpu::copyMakeBorder
 -----------------------
-.. c:function:: void gpu::copyMakeBorder(const GpuMat\& src, GpuMat\& dst,  int top, int bottom, int left, int right,  const Scalar\& value = Scalar())
+.. cpp:function:: void gpu::copyMakeBorder(const GpuMat& src, GpuMat& dst, int top, int bottom, int left, int right, const Scalar& value = Scalar())
    Copies 2D array to a larger destination array and pads borders with the given constant.
-    :param src: Source image. Supports  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_32SC1``  and  ``CV_32FC1``  types.
+    :param src: Source image. Supports ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` types.
-    :param dst: The destination image; will have the same type as  ``src``  and the size  ``Size(src.cols+left+right, src.rows+top+bottom)`` .
+    :param dst: The destination image; will have the same type as ``src`` and the size ``Size(src.cols+left+right, src.rows+top+bottom)``.
    :param top, bottom, left, right: Specify how many pixels in each direction from the source image rectangle need to be extrapolated, e.g. ``top=1, bottom=1, left=1, right=1`` means that a 1 pixel-wide border needs to be built.
    :param value: Border value.
-See also:
+See also: :c:func:`copyMakeBorder`.
-:func:`copyMakeBorder`
 .. index:: gpu::rectStdDev
-cv::gpu::rectStdDev
+gpu::rectStdDev
 -------------------
-.. c:function:: void gpu::rectStdDev(const GpuMat\& src, const GpuMat\& sqr, GpuMat\& dst,  const Rect\& rect)
+.. cpp:function:: void gpu::rectStdDev(const GpuMat& src, const GpuMat& sqr, GpuMat& dst, const Rect& rect)
    Computes standard deviation of integral images.
@@ -561,15 +564,17 @@ cv::gpu::rectStdDev
    :param sqr: Squared source image. Supports only ``CV_32FC1`` type.
-    :param dst: Destination image; will have the same type and the same size as  ``src`` .
+    :param dst: Destination image; will have the same type and the same size as ``src``.
    :param rect: Rectangular window.
 .. index:: gpu::evenLevels
-cv::gpu::evenLevels
+gpu::evenLevels
 -------------------
-.. c:function:: void gpu::evenLevels(GpuMat\& levels, int nLevels,  int lowerLevel, int upperLevel)
+.. cpp:function:: void gpu::evenLevels(GpuMat& levels, int nLevels, int lowerLevel, int upperLevel)
    Computes levels with even distribution.
@@ -581,17 +586,19 @@ cv::gpu::evenLevels
    :param upperLevel: Upper boundary value of the greatest level.
 .. index:: gpu::histEven
-cv::gpu::histEven
+gpu::histEven
 -----------------
-.. c:function:: void gpu::histEven(const GpuMat\& src, GpuMat\& hist,  int histSize, int lowerLevel, int upperLevel)
+.. cpp:function:: void gpu::histEven(const GpuMat& src, GpuMat& hist, int histSize, int lowerLevel, int upperLevel)
-.. c:function:: void gpu::histEven(const GpuMat\& src, GpuMat hist[4],  int histSize[4], int lowerLevel[4], int upperLevel[4])
+.. cpp:function:: void gpu::histEven(const GpuMat& src, GpuMat hist[4], int histSize[4], int lowerLevel[4], int upperLevel[4])
    Calculates histogram with evenly distributed bins.
-    :param src: Source image. Supports  ``CV_8U`` ,  ``CV_16U``  or  ``CV_16S``  depth and 1 or 4 channels. For four-channel image all channels are processed separately.
+    :param src: Source image. Supports ``CV_8U``, ``CV_16U`` or ``CV_16S`` depth and 1 or 4 channels. For four-channel image all channels are processed separately.
    :param hist: Destination histogram. Will have one row, ``histSize`` cols and ``CV_32S`` type.
@@ -601,19 +608,20 @@ cv::gpu::histEven
    :param upperLevel: Upper boundary of highest level bin.
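The idea behind a histogram with evenly distributed bins can be sketched on the CPU as follows. This is a hedged illustration only: ``histEvenSketch`` is a hypothetical helper, and it assumes bins of equal width over the half-open range ``[lowerLevel, upperLevel)`` with out-of-range values ignored; the exact boundary handling of ``gpu::histEven`` is not confirmed by this documentation.

```cpp
#include <cstddef>
#include <vector>

// Counts values into histSize equal-width bins covering [lowerLevel, upperLevel).
// Assumption: values outside that range are skipped.
std::vector<int> histEvenSketch(const std::vector<int>& values,
                                int histSize, int lowerLevel, int upperLevel) {
    if (histSize <= 0 || upperLevel <= lowerLevel) {
        return {};
    }
    std::vector<int> hist(static_cast<std::size_t>(histSize), 0);
    const long long range = static_cast<long long>(upperLevel) - lowerLevel;
    for (int v : values) {
        if (v < lowerLevel || v >= upperLevel) {
            continue;
        }
        // Integer arithmetic maps the value to its bin index without rounding issues.
        const long long bin = (static_cast<long long>(v) - lowerLevel) * histSize / range;
        ++hist[static_cast<std::size_t>(bin)];
    }
    return hist;
}
```

With four bins over ``[0, 256)``, the values 0 and 63 fall into bin 0, 64 into bin 1, 128 into bin 2 and 255 into bin 3.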
 .. index:: gpu::histRange
-cv::gpu::histRange
+gpu::histRange
 ------------------
-.. c:function:: void gpu::histRange(const GpuMat\& src, GpuMat\& hist, const GpuMat\& levels)
+.. cpp:function:: void gpu::histRange(const GpuMat& src, GpuMat& hist, const GpuMat& levels)
-.. c:function:: void gpu::histRange(const GpuMat\& src, GpuMat hist[4],  const GpuMat levels[4])
+.. cpp:function:: void gpu::histRange(const GpuMat& src, GpuMat hist[4],  const GpuMat levels[4])
    Calculates histogram with bins determined by levels array.
-    :param src: Source image. Supports  ``CV_8U`` ,  ``CV_16U``  or  ``CV_16S``  depth and 1 or 4 channels. For four-channel image all channels are processed separately.
+    :param src: Source image. Supports ``CV_8U``, ``CV_16U`` or ``CV_16S`` depth and 1 or 4 channels. For four-channel image all channels are processed separately.
    :param hist: Destination histogram. Will have one row, ``(levels.cols-1)`` cols and ``CV_32SC1`` type.
    :param levels: Number of levels in histogram.
--- a/modules/gpu/doc/initalization_and_information.rst
+++ b/modules/gpu/doc/initalization_and_information.rst
@@ -3,35 +3,41 @@ Initalization and Information
 .. highlight:: cpp
 .. index:: gpu::getCudaEnabledDeviceCount
-cv::gpu::getCudaEnabledDeviceCount
+gpu::getCudaEnabledDeviceCount
 ----------------------------------
-.. c:function:: int getCudaEnabledDeviceCount()
+.. cpp:function:: int gpu::getCudaEnabledDeviceCount()
    Returns the number of CUDA-enabled devices installed. Call it before any other GPU function. If OpenCV is compiled without GPU support, this function returns 0.
 .. index:: gpu::setDevice
-cv::gpu::setDevice
+gpu::setDevice
 ------------------
-.. c:function:: void setDevice(int device)
+.. cpp:function:: void gpu::setDevice(int device)
    Sets the device and initializes it for the current thread. The call can be omitted, but in that case a default device is initialized on first GPU usage.
    :param device: index of GPU device in system starting with 0.
 .. index:: gpu::getDevice
-cv::gpu::getDevice
+gpu::getDevice
 ------------------
-.. c:function:: int getDevice()
+.. cpp:function:: int gpu::getDevice()
-    Returns the current device index, which was set by {gpu::getDevice} or initialized by default.
+    Returns the current device index, which was set by :cpp:func:`gpu::setDevice` or initialized by default.
-.. index:: gpu::GpuFeature
-.. _gpu::GpuFeature:
+.. index:: gpu::GpuFeature
 gpu::GpuFeature
 ---------------
@@ -48,17 +54,16 @@ GPU compute features. ::
    };
-.. index:: gpu::DeviceInfo
-.. _gpu::DeviceInfo:
+.. index:: gpu::DeviceInfo
 gpu::DeviceInfo
 ---------------
-.. c:type:: gpu::DeviceInfo
+.. cpp:class:: gpu::DeviceInfo
 This class provides functionality for querying the specified GPU properties. ::
-    class CV_EXPORTS DeviceInfo
+    class DeviceInfo
    {
    public:
        DeviceInfo();
@@ -79,87 +84,104 @@ This class provides functionality for querying the specified GPU properties. ::
    };
 .. index:: gpu::DeviceInfo::DeviceInfo
-cv::gpu::DeviceInfo::DeviceInfo
+gpu::DeviceInfo::DeviceInfo
------------------------------- ``_``
+-------------------------------
-.. c:function:: DeviceInfo::DeviceInfo()
+.. cpp:function:: gpu::DeviceInfo::DeviceInfo()
-.. c:function:: DeviceInfo::DeviceInfo(int device_id)
+.. cpp:function:: gpu::DeviceInfo::DeviceInfo(int device_id)
-    Constructs DeviceInfo object for the specified device. If deviceidparameter is missed it constructs object for the current device.
+    Constructs a :cpp:class:`gpu::DeviceInfo` object for the specified device. If the ``device_id`` parameter is omitted, it constructs an object for the current device.
    :param device_id: Index of the GPU device in system starting with 0.
 .. index:: gpu::DeviceInfo::name
-cv::gpu::DeviceInfo::name
+gpu::DeviceInfo::name
 -------------------------
-.. c:function:: string DeviceInfo::name()
+.. cpp:function:: string gpu::DeviceInfo::name()
    Returns the device name.
 .. index:: gpu::DeviceInfo::majorVersion
-cv::gpu::DeviceInfo::majorVersion
+gpu::DeviceInfo::majorVersion
 ---------------------------------
-.. c:function:: int DeviceInfo::majorVersion()
+.. cpp:function:: int gpu::DeviceInfo::majorVersion()
    Returns the major compute capability version.
 .. index:: gpu::DeviceInfo::minorVersion
-cv::gpu::DeviceInfo::minorVersion
+gpu::DeviceInfo::minorVersion
 ---------------------------------
-.. c:function:: int DeviceInfo::minorVersion()
+.. cpp:function:: int gpu::DeviceInfo::minorVersion()
    Returns the minor compute capability version.
 .. index:: gpu::DeviceInfo::multiProcessorCount
-cv::gpu::DeviceInfo::multiProcessorCount
+gpu::DeviceInfo::multiProcessorCount
 ----------------------------------------
-.. c:function:: int DeviceInfo::multiProcessorCount()
+.. cpp:function:: int gpu::DeviceInfo::multiProcessorCount()
    Returns the number of streaming multiprocessors.
 .. index:: gpu::DeviceInfo::freeMemory
-cv::gpu::DeviceInfo::freeMemory
+gpu::DeviceInfo::freeMemory
 -------------------------------
-.. c:function:: size_t DeviceInfo::freeMemory()
+.. cpp:function:: size_t gpu::DeviceInfo::freeMemory()
    Returns the amount of free memory in bytes.
 .. index:: gpu::DeviceInfo::totalMemory
-cv::gpu::DeviceInfo::totalMemory
+gpu::DeviceInfo::totalMemory
 --------------------------------
-.. c:function:: size_t DeviceInfo::totalMemory()
+.. cpp:function:: size_t gpu::DeviceInfo::totalMemory()
    Returns the amount of total memory in bytes.
 .. index:: gpu::DeviceInfo::supports
-cv::gpu::DeviceInfo::supports
+gpu::DeviceInfo::supports
 -----------------------------
-.. c:function:: bool DeviceInfo::supports(GpuFeature feature)
+.. cpp:function:: bool gpu::DeviceInfo::supports(GpuFeature feature)
    Returns true if the device has the given GPU feature, otherwise false.
-    :param feature: Feature to be checked. See  .
+    :param feature: Feature to be checked. See :c:type:`gpu::GpuFeature`.
 .. index:: gpu::DeviceInfo::isCompatible
-cv::gpu::DeviceInfo::isCompatible
+gpu::DeviceInfo::isCompatible
 ---------------------------------
-.. c:function:: bool DeviceInfo::isCompatible()
+.. cpp:function:: bool gpu::DeviceInfo::isCompatible()
    Returns true if the GPU module can be run on the specified device, otherwise false.
-.. index:: gpu::TargetArchs
-.. _gpu::TargetArchs:
+.. index:: gpu::TargetArchs
 gpu::TargetArchs
 ----------------
@@ -167,32 +189,110 @@ gpu::TargetArchs
 This class provides functionality (as set of static methods) for checking which NVIDIA card architectures the GPU module was built for.
-bigskip
 The following method checks whether the module was built with the support of the given feature:
-.. c:function:: static bool builtWith(GpuFeature feature)
+.. cpp:function:: static bool gpu::TargetArchs::builtWith(GpuFeature feature) 
-    :param feature: Feature to be checked. See  .
+    :param feature: Feature to be checked. See :c:type:`gpu::GpuFeature`.
 There are a set of methods for checking whether the module contains intermediate (PTX) or binary GPU code for the given architecture(s):
-.. c:function:: static bool has(int major, int minor)
+.. cpp:function:: static bool gpu::TargetArchs::has(int major, int minor)
-.. c:function:: static bool hasPtx(int major, int minor)
+.. cpp:function:: static bool gpu::TargetArchs::hasPtx(int major, int minor)
-.. c:function:: static bool hasBin(int major, int minor)
+.. cpp:function:: static bool gpu::TargetArchs::hasBin(int major, int minor)
-.. c:function:: static bool hasEqualOrLessPtx(int major, int minor)
+.. cpp:function:: static bool gpu::TargetArchs::hasEqualOrLessPtx(int major, int minor)
-.. c:function:: static bool hasEqualOrGreater(int major, int minor)
+.. cpp:function:: static bool gpu::TargetArchs::hasEqualOrGreater(int major, int minor)
-.. c:function:: static bool hasEqualOrGreaterPtx(int major, int minor)
+.. cpp:function:: static bool gpu::TargetArchs::hasEqualOrGreaterPtx(int major, int minor)
-.. c:function:: static bool hasEqualOrGreaterBin(int major, int minor)
+.. cpp:function:: static bool gpu::TargetArchs::hasEqualOrGreaterBin(int major, int minor)
-    * **major** Major compute capability version.
+    :param major: Major compute capability version.
-    * **minor** Minor compute capability version.
+    :param minor: Minor compute capability version.
 According to the CUDA C Programming Guide Version 3.2: "PTX code produced for some specific compute capability can always be compiled to binary code of greater or equal compute capability".
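The quoted PTX rule suggests how a compatibility check over the built architectures could work. The sketch below is a simplified illustration, not the module's code: ``BuiltTargets`` and ``canRunOn`` are hypothetical, ``hasBin`` and ``hasEqualOrLessPtx`` only mirror the names of the static methods above, and binary code is assumed usable only on an exact compute capability match.

```cpp
#include <utility>
#include <vector>

// (major, minor) compute capability; std::pair compares lexicographically.
using Arch = std::pair<int, int>;

// Hypothetical description of what a build contains.
struct BuiltTargets {
    std::vector<Arch> bin;  // architectures with binary code
    std::vector<Arch> ptx;  // architectures with PTX code
};

bool hasBin(const BuiltTargets& t, int major, int minor) {
    for (const Arch& a : t.bin) {
        if (a == Arch(major, minor)) return true;
    }
    return false;
}

bool hasEqualOrLessPtx(const BuiltTargets& t, int major, int minor) {
    for (const Arch& a : t.ptx) {
        if (a <= Arch(major, minor)) return true;
    }
    return false;
}

// A device can run the module if a matching binary exists, or if some PTX
// of equal or lower capability can be JIT-compiled for it.
bool canRunOn(const BuiltTargets& t, int major, int minor) {
    return hasBin(t, major, minor) || hasEqualOrLessPtx(t, major, minor);
}
```

Under these assumptions, a build with binaries for 1.1, 1.2, 1.3 and 2.0 and PTX for 1.1 and 1.3 runs on a 2.1 device through JIT compilation of the 1.3 PTX, but not on a 1.0 device.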
+.. index:: gpu::MultiGpuManager
+gpu::MultiGpuManager
+--------------------
+.. c:type:: gpu::MultiGpuManager
+Provides functionality for working with many GPUs. ::
+    class MultiGpuManager
+    {
+    public:
+        MultiGpuManager();
+        ~MultiGpuManager();
+        // Must be called before any other GPU calls
+        void init();
+        // Makes the given GPU active
+        void gpuOn(int gpu_id);
+        // Finishes the piece of work on the current GPU
+        void gpuOff();
+        static const int BAD_GPU_ID;
+    };
+.. index:: gpu::MultiGpuManager::MultiGpuManager
+gpu::MultiGpuManager::MultiGpuManager
+----------------------------------------
+.. cpp:function:: gpu::MultiGpuManager::MultiGpuManager()
+    Creates multi GPU manager, but doesn't initialize it.
+.. index:: gpu::MultiGpuManager::~MultiGpuManager
+gpu::MultiGpuManager::~MultiGpuManager
+----------------------------------------
+.. cpp:function:: gpu::MultiGpuManager::~MultiGpuManager()
+    Releases multi GPU manager.
+.. index:: gpu::MultiGpuManager::init
+gpu::MultiGpuManager::init
+----------------------------------------
+.. cpp:function:: void gpu::MultiGpuManager::init()
+    Initializes multi GPU manager.
+.. index:: gpu::MultiGpuManager::gpuOn
+gpu::MultiGpuManager::gpuOn
+----------------------------------------
+.. cpp:function:: void gpu::MultiGpuManager::gpuOn(int gpu_id)
+    Makes the given GPU active.
+    :param gpu_id: Index of the GPU device in system starting with 0.
+.. index:: gpu::MultiGpuManager::gpuOff
+gpu::MultiGpuManager::gpuOff
+----------------------------------------
+.. cpp:function:: void gpu::MultiGpuManager::gpuOff()
+    Finishes the piece of work on the current GPU.
--- a/modules/gpu/doc/introduction.rst
+++ b/modules/gpu/doc/introduction.rst
@@ -14,10 +14,7 @@ The GPU module depends on the Cuda Toolkit and NVidia Performance Primitives lib
 OpenCV GPU module is designed for ease of use and does not require any knowledge of Cuda. However, such knowledge will certainly be useful in non-trivial cases, or when you want to get the highest performance. It is helpful to understand the costs of various operations, what the GPU does, what the preferred data formats are, etc. The GPU module is an effective instrument for quick implementation of GPU-accelerated computer vision algorithms. However, if your algorithm involves many simple operations, then for the best possible performance you may still need to write your own kernels, to avoid extra write and read operations on the intermediate results.
-To enable CUDA support, configure OpenCV using CMake with ``WITH_CUDA=ON`` . When the flag is set and if CUDA is installed, the full-featured OpenCV GPU module will be built. Otherwise, the module will still be built, but at runtime all functions from the module will throw
+To enable CUDA support, configure OpenCV using CMake with ``WITH_CUDA=ON`` . When the flag is set and if CUDA is installed, the full-featured OpenCV GPU module will be built. Otherwise, the module will still be built, but at runtime all functions from the module will throw :c:type:`Exception` with ``CV_GpuNotSupported`` error code, except for :cpp:func:`gpu::getCudaEnabledDeviceCount`. The latter function will return zero GPU count in this case. Building OpenCV without CUDA support does not perform device code compilation, so it does not require Cuda Toolkit installed. Therefore, using :cpp:func:`gpu::getCudaEnabledDeviceCount` function it is possible to implement a high-level algorithm that will detect GPU presence at runtime and choose the appropriate implementation (CPU or GPU) accordingly.
-:func:`Exception` with ``CV_GpuNotSupported`` error code, except for
-:func:`gpu::getCudaEnabledDeviceCount()` . The latter function will return zero GPU count in this case. Building OpenCV without CUDA support does not perform device code compilation, so it does not require Cuda Toolkit installed. Therefore, using
-:func:`gpu::getCudaEnabledDeviceCount()` function it is possible to implement a high-level algorithm that will detect GPU presence at runtime and choose the appropriate implementation (CPU or GPU) accordingly.
 Compilation for different NVidia platforms.
 -------------------------------------------
@@ -28,19 +25,16 @@ On first call, the PTX code is compiled to binary code for the particular GPU us
 By default, the OpenCV GPU module includes:
-*
+* Binaries for compute capabilities 1.1, 1.2, 1.3 and 2.0 (controlled by ``CUDA_ARCH_BIN`` in CMake)
-    Binaries for compute capabilities 1.3 and 2.0 (controlled by ``CUDA_ARCH_BIN``     in CMake)
-*
+* PTX code for compute capabilities 1.1 and 1.3 (controlled by ``CUDA_ARCH_PTX`` in CMake)
-    PTX code for compute capabilities 1.1 and 1.3 (controlled by ``CUDA_ARCH_PTX``     in CMake)
-That means for devices with CC 1.3 and 2.0 binary images are ready to run. For all newer platforms the PTX code for 1.3 is JIT'ed to a binary image. For devices with 1.1 and 1.2 the PTX for 1.1 is JIT'ed. For devices with CC 1.0 no code is available and the functions will throw
+That means for devices with CC 1.1, 1.2, 1.3 and 2.0 binary images are ready to run. For all newer platforms the PTX code for 1.3 is JIT'ed to a binary image. For devices with CC 1.0 no code is available and the functions will throw
-:func:`Exception` . For platforms where JIT compilation is performed first run will be slow.
+:c:type:`Exception`. On platforms where JIT compilation is performed, the first run will be slow.
-If you happen to have GPU with CC 1.0, the GPU module can still be compiled on it and most of the functions will run just fine on such card. Simply add "1.0" to the list of binaries, for example, ``CUDA_ARCH_BIN="1.0 1.3 2.0"`` . The functions that can not be run on CC 1.0 GPUs will throw an exception.
+If you happen to have GPU with CC 1.0, the GPU module can still be compiled on it and most of the functions will run just fine on such card. Simply add "1.0" to the list of binaries, for example, ``CUDA_ARCH_BIN="1.0 1.3 2.0"``. The functions that can not be run on CC 1.0 GPUs will throw an exception.
-You can always determine at runtime whether OpenCV GPU built binaries (or PTX code) are compatible with your GPU. The function
+You can always determine at runtime whether OpenCV GPU built binaries (or PTX code) are compatible with your GPU. The function :cpp:func:`gpu::DeviceInfo::isCompatible` returns the compatibility status (true/false).
-:func:`gpu::DeviceInfo::isCompatible` return the compatibility status (true/false).
 Threading and multi-threading.
 ------------------------------
@@ -56,25 +50,14 @@ Multi-GPU
 In the current version each of the OpenCV GPU algorithms can use only a single GPU. So, to utilize multiple GPUs, user has to manually distribute the work between the GPUs. Here are the two ways of utilizing multiple GPUs:
-*
+* If you only use synchronous functions, first, create several CPU threads (one per each GPU) and from within each thread create CUDA context for the corresponding GPU using :cpp:func:`gpu::setDevice` or Driver API. That's it. Now each of the threads will use the associated GPU.
-    If you only use synchronous functions, first, create several CPU threads (one per each GPU) and from within each thread create CUDA context for the corresponding GPU using
-    :func:`gpu::setDevice()`     or Driver API. That's it. Now each of the threads will use the associated GPU.
-*
+* In case of asynchronous functions, it is possible to create several Cuda contexts associated with different GPUs but attached to one CPU thread. This can be done only by Driver API. Within the thread you can switch from one GPU to another by making the corresponding context "current". With non-blocking GPU calls managing algorithm is clear.
-    In case of asynchronous functions, it is possible to create several Cuda contexts associated with different GPUs but attached to one CPU thread. This can be done only by Driver API. Within the thread you can switch from one GPU to another by making the corresponding context "current". With non-blocking GPU calls managing algorithm is clear.
 While developing algorithms for multiple GPUs, the data passing overhead has to be taken into consideration. For primitive functions and for small images it can be significant and eliminate all the advantages of having multiple GPUs. But for high level algorithms Multi-GPU acceleration may be suitable. For example, the Stereo Block Matching algorithm has been successfully parallelized using the following algorithm:
-*
+* Each image of the stereo pair is split into two horizontal overlapping stripes.
-    Each image of the stereo pair is split into two horizontal overlapping stripes.
+* Each pair of stripes (from the left and the right images) is processed on a separate Fermi GPU.
+* The results are merged into the single disparity map.
-*
+With this scheme a dual GPU setup gave a 180% performance increase compared to a single Fermi GPU. The source code of the example is available at https://code.ros.org/svn/opencv/trunk/opencv/examples/gpu/.
-    Each pair of stripes (from the left and the right images) has been processed on a separate Fermi GPU
-*
-    The results are merged into the single disparity map.
-With this scheme dual GPU gave 180
-%
-performance increase comparing to the single Fermi GPU. The source code of the example is available at
-https://code.ros.org/svn/opencv/trunk/opencv/examples/gpu/
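The stripe-splitting step of the stereo example can be sketched as a row-range computation. This is an illustrative sketch under stated assumptions, not the code from the linked example: ``RowRange`` and ``splitIntoStripes`` are hypothetical names, and the overlap (in rows) is a free parameter that a real application would derive from the matching window and disparity range.

```cpp
#include <algorithm>
#include <vector>

// Half-open range of image rows [begin, end).
struct RowRange {
    int begin;
    int end;
};

// Splits `rows` image rows into `stripes` horizontal stripes and widens each
// stripe by `overlap` rows on both sides, clamped to the image bounds.
std::vector<RowRange> splitIntoStripes(int rows, int stripes, int overlap) {
    std::vector<RowRange> out;
    if (rows <= 0 || stripes <= 0) {
        return out;
    }
    overlap = std::max(overlap, 0);
    for (int i = 0; i < stripes; ++i) {
        const int begin = static_cast<int>(static_cast<long long>(rows) * i / stripes);
        const int end = static_cast<int>(static_cast<long long>(rows) * (i + 1) / stripes);
        out.push_back(RowRange{std::max(0, begin - overlap), std::min(rows, end + overlap)});
    }
    return out;
}
```

For a 480-row image, two stripes and a 16-row overlap, the ranges are ``[0, 256)`` and ``[224, 480)``. Each stripe pair would then go to its own GPU, and the overlapping rows are discarded when the partial disparity maps are merged.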
--- a/modules/gpu/doc/matrix_reductions.rst
+++ b/modules/gpu/doc/matrix_reductions.rst
@@ -3,11 +3,13 @@ Matrix Reductions
 .. highlight:: cpp
 .. index:: gpu::meanStdDev
 gpu::meanStdDev
 -------------------
-.. c:function:: void gpu::meanStdDev(const GpuMat\& mtx, Scalar\& mean, Scalar\& stddev)
+.. cpp:function:: void gpu::meanStdDev(const GpuMat& mtx, Scalar& mean, Scalar& stddev)
    Computes mean value and standard deviation of matrix elements.
@@ -17,93 +19,99 @@ gpu::meanStdDev
    :param stddev: Standard deviation value.
-See also:
+See also: :c:func:`meanStdDev`.
-:func:`meanStdDev` .
 .. index:: gpu::norm
 gpu::norm
 -------------
-.. c:function:: double gpu::norm(const GpuMat\& src, int normType=NORM_L2)
+.. cpp:function:: double gpu::norm(const GpuMat& src, int normType=NORM_L2)
    Returns norm of matrix (or of two matrices difference).
    :param src: Source matrix. Any matrices except 64F are supported.
-    :param normType: Norm type.  ``NORM_L1`` ,  ``NORM_L2``  and  ``NORM_INF``  are supported for now.
+    :param normType: Norm type. ``NORM_L1``, ``NORM_L2`` and ``NORM_INF`` are supported for now.
-.. c:function:: double norm(const GpuMat\& src, int normType, GpuMat\& buf)
+.. cpp:function:: double gpu::norm(const GpuMat& src, int normType, GpuMat& buf)
-    * **src** Source matrix. Any matrices except 64F are supported.
+    :param src: Source matrix. Any matrices except 64F are supported.
-    * **normType** Norm type.  ``NORM_L1`` ,  ``NORM_L2``  and  ``NORM_INF``  are supported for now.
+    :param normType: Norm type. ``NORM_L1``, ``NORM_L2`` and ``NORM_INF`` are supported for now.
-    * **buf** Optional buffer to avoid extra memory allocations. It's resized automatically.
+    :param buf: Optional buffer to avoid extra memory allocations. It's resized automatically.
-.. c:function:: double norm(const GpuMat\& src1, const GpuMat\& src2,
+.. cpp:function:: double gpu::norm(const GpuMat& src1, const GpuMat& src2, int normType=NORM_L2)
-   int normType=NORM_L2)
-    * **src1** First source matrix.  ``CV_8UC1``  matrices are supported for now.
+    :param src1: First source matrix. ``CV_8UC1`` matrices are supported for now.
-    * **src2** Second source matrix. Must have the same size and type as  ``src1``.
+    :param src2: Second source matrix. Must have the same size and type as ``src1``.
+    :param normType: Norm type. ``NORM_L1``, ``NORM_L2`` and ``NORM_INF`` are supported for now.
+See also: :c:func:`norm`.
-    * **normType** Norm type.  ``NORM_L1`` ,  ``NORM_L2``  and  ``NORM_INF``  are supported for now.
-See also:
-:func:`norm` .
 .. index:: gpu::sum
 gpu::sum
 ------------
-.. c:function:: Scalar gpu::sum(const GpuMat\& src)
+.. cpp:function:: Scalar gpu::sum(const GpuMat& src)
-.. c:function:: Scalar gpu::sum(const GpuMat\& src, GpuMat\& buf)
+.. cpp:function:: Scalar gpu::sum(const GpuMat& src, GpuMat& buf)
    Returns sum of matrix elements.
-    :param src: Source image of any depth except  ``CV_64F`` .
+    :param src: Source image of any depth except ``CV_64F``.
    :param buf: Optional buffer to avoid extra memory allocations. It's resized automatically.
-See also:
+See also: :c:func:`sum`.
-:func:`sum` .
 .. index:: gpu::absSum
 gpu::absSum
 ---------------
-.. c:function:: Scalar gpu::absSum(const GpuMat\& src)
+.. cpp:function:: Scalar gpu::absSum(const GpuMat& src)
-.. c:function:: Scalar gpu::absSum(const GpuMat\& src, GpuMat\& buf)
+.. cpp:function:: Scalar gpu::absSum(const GpuMat& src, GpuMat& buf)
    Returns the sum of the absolute values of matrix elements.
-    :param src: Source image of any depth except  ``CV_64F`` .
+    :param src: Source image of any depth except ``CV_64F``.
    :param buf: Optional buffer to avoid extra memory allocations. It's resized automatically.
 .. index:: gpu::sqrSum
 gpu::sqrSum
 ---------------
-.. c:function:: Scalar gpu::sqrSum(const GpuMat\& src)
+.. cpp:function:: Scalar gpu::sqrSum(const GpuMat& src)
-.. c:function:: Scalar gpu::sqrSum(const GpuMat\& src, GpuMat\& buf)
+.. cpp:function:: Scalar gpu::sqrSum(const GpuMat& src, GpuMat& buf)
    Returns the sum of squared matrix elements.
-    :param src: Source image of any depth except  ``CV_64F`` .
+    :param src: Source image of any depth except ``CV_64F``.
    :param buf: Optional buffer to avoid extra memory allocations. It's resized automatically.
 .. index:: gpu::minMax
 gpu::minMax
 ---------------
-.. c:function:: void gpu::minMax(const GpuMat\& src, double* minVal, double* maxVal=0, const GpuMat\& mask=GpuMat())
+.. cpp:function:: void gpu::minMax(const GpuMat& src, double* minVal, double* maxVal=0, const GpuMat& mask=GpuMat())
-.. c:function:: void gpu::minMax(const GpuMat\& src, double* minVal, double* maxVal, const GpuMat\& mask, GpuMat\& buf)
+.. cpp:function:: void gpu::minMax(const GpuMat& src, double* minVal, double* maxVal, const GpuMat& mask, GpuMat& buf)
    Finds global minimum and maximum matrix elements and returns their values.
@@ -117,18 +125,19 @@ gpu::minMax
    :param buf: Optional buffer to avoid extra memory allocations. It's resized automatically.
-Function doesn't work with ``CV_64F`` images on GPU with compute capability
+The function does not work with ``CV_64F`` images on GPUs with compute capability :math:`<` 1.3.
-:math:`<` 1.3.
-See also:
+See also: :c:func:`minMaxLoc`.
-:func:`minMaxLoc` .
 .. index:: gpu::minMaxLoc
 gpu::minMaxLoc
 ------------------
-.. c:function:: void gpu::minMaxLoc(const GpuMat& src, double* minVal, double* maxVal=0, Point* minLoc=0, Point* maxLoc=0, const GpuMat& mask=GpuMat())
+.. cpp:function:: void gpu::minMaxLoc(const GpuMat& src, double* minVal, double* maxVal=0, Point* minLoc=0, Point* maxLoc=0, const GpuMat& mask=GpuMat())
-.. c:function:: void gpu::minMaxLoc(const GpuMat& src, double* minVal, double* maxVal, Point* minLoc, Point* maxLoc, const GpuMat& mask, GpuMat& valbuf, GpuMat& locbuf)
+.. cpp:function:: void gpu::minMaxLoc(const GpuMat& src, double* minVal, double* maxVal, Point* minLoc, Point* maxLoc, const GpuMat& mask, GpuMat& valbuf, GpuMat& locbuf)
    Finds global minimum and maximum matrix elements and returns their values with locations.
@@ -148,18 +157,19 @@ gpu::minMaxLoc
    :param locbuf: Optional locations buffer to avoid extra memory allocations. It's resized automatically.
-Function doesn't work with ``CV_64F`` images on GPU with compute capability
+The function does not work with ``CV_64F`` images on GPUs with compute capability :math:`<` 1.3.
-:math:`<` 1.3.
-See also:
+See also: :c:func:`minMaxLoc`.
-:func:`minMaxLoc` .
 .. index:: gpu::countNonZero
 gpu::countNonZero
 ---------------------
-.. c:function:: int gpu::countNonZero(const GpuMat\& src)
+.. cpp:function:: int gpu::countNonZero(const GpuMat& src)
-.. c:function:: int gpu::countNonZero(const GpuMat\& src, GpuMat\& buf)
+.. cpp:function:: int gpu::countNonZero(const GpuMat& src, GpuMat& buf)
    Counts non-zero matrix elements.
@@ -167,7 +177,6 @@ gpu::countNonZero
    :param buf: Optional buffer to avoid extra memory allocations. It's resized automatically.
-Function doesn't work with ``CV_64F`` images on GPU with compute capability
+The function does not work with ``CV_64F`` images on GPUs with compute capability :math:`<` 1.3.
-:math:`<` 1.3.
-See also:
+See also: :c:func:`countNonZero`.
-:func:`countNonZero` .
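The reductions documented in ``matrix_reductions.rst`` above can be read against a plain CPU reference. The following is a minimal sketch in standard C++, not the GPU implementation: the helper names (``refSum``, ``refAbsSum``, ``refSqrSum``, ``refNorm``, ``refNormDiff``) and the ``RefNormType`` enum are hypothetical and are not part of the OpenCV API. It assumes a single-channel matrix flattened into a ``std::vector<int>``.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference helpers; none of these names exist in OpenCV.
// Each one takes the elements of a single-channel matrix as a flat vector.

// Counterpart of gpu::sum: plain sum of the elements.
double refSum(const std::vector<int>& m) {
    double s = 0.0;
    for (int v : m) s += v;
    return s;
}

// Counterpart of gpu::absSum: sum of absolute values.
double refAbsSum(const std::vector<int>& m) {
    double s = 0.0;
    for (int v : m) s += std::abs(static_cast<double>(v));
    return s;
}

// Counterpart of gpu::sqrSum: sum of squares (not the square of the sum).
double refSqrSum(const std::vector<int>& m) {
    double s = 0.0;
    for (int v : m) s += static_cast<double>(v) * v;
    return s;
}

// Stand-in for the NORM_INF / NORM_L1 / NORM_L2 constants.
enum RefNormType { REF_NORM_INF, REF_NORM_L1, REF_NORM_L2 };

// Counterpart of the single-matrix gpu::norm overloads.
double refNorm(const std::vector<int>& m, RefNormType type) {
    switch (type) {
    case REF_NORM_INF: {
        double mx = 0.0;
        for (int v : m) mx = std::max(mx, std::abs(static_cast<double>(v)));
        return mx;
    }
    case REF_NORM_L1:
        return refAbsSum(m);
    case REF_NORM_L2:
        return std::sqrt(refSqrSum(m));
    }
    return 0.0;
}

// Counterpart of the two-matrix overload: norm of the element-wise difference.
double refNormDiff(const std::vector<int>& a, const std::vector<int>& b,
                   RefNormType type) {
    assert(a.size() == b.size());
    std::vector<int> diff(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) diff[i] = a[i] - b[i];
    return refNorm(diff, type);
}
```

For the elements ``{3, -4, 0}`` the sketch gives a sum of -1, an absolute sum of 7, a squared sum of 25, and an L2 norm of 5.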
--- a/modules/gpu/doc/object_detection.rst
+++ b/modules/gpu/doc/object_detection.rst
@@ -3,19 +3,17 @@ Object Detection
 .. highlight:: cpp
-.. index:: gpu::HOGDescriptor
-.. _gpu::HOGDescriptor:
+.. index:: gpu::HOGDescriptor
 gpu::HOGDescriptor
 ------------------
-.. c:type:: gpu::HOGDescriptor
+.. cpp:class:: gpu::HOGDescriptor
-Histogram of Oriented Gradients
+Histogram of Oriented Gradients [Navneet Dalal and Bill Triggs. Histogram of oriented gradients for human detection. 2005.] descriptor and detector. ::
-dalal_hog
-descriptor and detector. ::
-    struct CV_EXPORTS HOGDescriptor
+    struct HOGDescriptor
    {
        enum { DEFAULT_WIN_SIGMA = -1 };
        enum { DEFAULT_NLEVELS = 64 };
@@ -66,16 +64,13 @@ descriptor and detector. ::
 Interfaces of all methods are kept similar to CPU HOG descriptor and detector analogues as much as possible.
 .. index:: gpu::HOGDescriptor::HOGDescriptor
-cv::gpu::HOGDescriptor::HOGDescriptor
+gpu::HOGDescriptor::HOGDescriptor
 -------------------------------------
-.. c:function:: HOGDescriptor::HOGDescriptor(Size win_size=Size(64, 128),
+.. cpp:function:: gpu::HOGDescriptor::HOGDescriptor(Size win_size=Size(64, 128), Size block_size=Size(16, 16), Size block_stride=Size(8, 8), Size cell_size=Size(8, 8), int nbins=9, double win_sigma=DEFAULT_WIN_SIGMA, double threshold_L2hys=0.2, bool gamma_correction=true, int nlevels=DEFAULT_NLEVELS)
-   Size block_size=Size(16, 16), Size block_stride=Size(8, 8),
-   Size cell_size=Size(8, 8), int nbins=9,
-   double win_sigma=DEFAULT_WIN_SIGMA,
-   double threshold_L2hys=0.2, bool gamma_correction=true,
-   int nlevels=DEFAULT_NLEVELS)
    Creates HOG descriptor and detector.
@@ -97,61 +92,73 @@ cv::gpu::HOGDescriptor::HOGDescriptor
    :param nlevels: Maximum number of detection window increases.
 .. index:: gpu::HOGDescriptor::getDescriptorSize
-cv::gpu::HOGDescriptor::getDescriptorSize
+gpu::HOGDescriptor::getDescriptorSize
 -----------------------------------------
-.. c:function:: size_t HOGDescriptor::getDescriptorSize() const
+.. cpp:function:: size_t gpu::HOGDescriptor::getDescriptorSize() const
    Returns number of coefficients required for the classification.
 .. index:: gpu::HOGDescriptor::getBlockHistogramSize
-cv::gpu::HOGDescriptor::getBlockHistogramSize
+gpu::HOGDescriptor::getBlockHistogramSize
 ---------------------------------------------
-.. c:function:: size_t HOGDescriptor::getBlockHistogramSize() const
+.. cpp:function:: size_t gpu::HOGDescriptor::getBlockHistogramSize() const
    Returns block histogram size.
 .. index:: gpu::HOGDescriptor::setSVMDetector
-cv::gpu::HOGDescriptor::setSVMDetector
+gpu::HOGDescriptor::setSVMDetector
 --------------------------------------
-.. c:function:: void HOGDescriptor::setSVMDetector(const vector<float>\& detector)
+.. cpp:function:: void gpu::HOGDescriptor::setSVMDetector(const vector<float>& detector)
    Sets coefficients for the linear SVM classifier.
 .. index:: gpu::HOGDescriptor::getDefaultPeopleDetector
-cv::gpu::HOGDescriptor::getDefaultPeopleDetector
+gpu::HOGDescriptor::getDefaultPeopleDetector
 ------------------------------------------------
-.. c:function:: static vector<float> HOGDescriptor::getDefaultPeopleDetector()
+.. cpp:function:: static vector<float> gpu::HOGDescriptor::getDefaultPeopleDetector()
    Returns coefficients of the classifier trained for people detection (for default window size).
 .. index:: gpu::HOGDescriptor::getPeopleDetector48x96
-cv::gpu::HOGDescriptor::getPeopleDetector48x96
+gpu::HOGDescriptor::getPeopleDetector48x96
 ----------------------------------------------
-.. c:function:: static vector<float> HOGDescriptor::getPeopleDetector48x96()
+.. cpp:function:: static vector<float> gpu::HOGDescriptor::getPeopleDetector48x96()
    Returns coefficients of the classifier trained for people detection (for 48x96 windows).
 .. index:: gpu::HOGDescriptor::getPeopleDetector64x128
-cv::gpu::HOGDescriptor::getPeopleDetector64x128
+gpu::HOGDescriptor::getPeopleDetector64x128
 -----------------------------------------------
-.. c:function:: static vector<float> HOGDescriptor::getPeopleDetector64x128()
+.. cpp:function:: static vector<float> gpu::HOGDescriptor::getPeopleDetector64x128()
    Returns coefficients of the classifier trained for people detection (for 64x128 windows).
 .. index:: gpu::HOGDescriptor::detect
-cv::gpu::HOGDescriptor::detect
+gpu::HOGDescriptor::detect
 ------------------------------
-.. c:function:: void HOGDescriptor::detect(const GpuMat\& img,
+.. cpp:function:: void gpu::HOGDescriptor::detect(const GpuMat& img, vector<Point>& found_locations, double hit_threshold=0, Size win_stride=Size(), Size padding=Size())
-   vector<Point>\& found_locations, double hit_threshold=0,
-   Size win_stride=Size(), Size padding=Size())
    Performs object detection without a multiscale window.
@@ -165,22 +172,21 @@ cv::gpu::HOGDescriptor::detect
    :param padding: Mock parameter to keep CPU interface compatibility. Must be (0,0).
 .. index:: gpu::HOGDescriptor::detectMultiScale
-cv::gpu::HOGDescriptor::detectMultiScale
+gpu::HOGDescriptor::detectMultiScale
 ----------------------------------------
-.. c:function:: void HOGDescriptor::detectMultiScale(const GpuMat\& img,
+.. cpp:function:: void gpu::HOGDescriptor::detectMultiScale(const GpuMat& img, vector<Rect>& found_locations, double hit_threshold=0, Size win_stride=Size(), Size padding=Size(), double scale0=1.05, int group_threshold=2)
-   vector<Rect>\& found_locations, double hit_threshold=0,
-   Size win_stride=Size(), Size padding=Size(),
-   double scale0=1.05, int group_threshold=2)
    Performs object detection with a multiscale window.
-    :param img: Source image. See  :func:`gpu::HOGDescriptor::detect`  for type limitations.
+    :param img: Source image. See :cpp:func:`gpu::HOGDescriptor::detect` for type limitations.
    :param found_locations: Will contain detected objects boundaries.
-    :param hit_threshold: The threshold for the distance between features and SVM classifying plane. See  :func:`gpu::HOGDescriptor::detect`  for details.
+    :param hit_threshold: The threshold for the distance between features and SVM classifying plane. See :cpp:func:`gpu::HOGDescriptor::detect` for details.
    :param win_stride: Window stride. Must be a multiple of block stride.
@@ -188,20 +194,19 @@ cv::gpu::HOGDescriptor::detectMultiScale
    :param scale0: Coefficient of the detection window increase.
-    :param group_threshold: After detection some objects could be covered by many rectangles. This coefficient regulates similarity threshold. 0 means don't perform grouping.
+    :param group_threshold: After detection some objects could be covered by many rectangles. This coefficient regulates similarity threshold. 0 means don't perform grouping. See :c:func:`groupRectangles`.
-        See  :func:`groupRectangles` .
 .. index:: gpu::HOGDescriptor::getDescriptors
-cv::gpu::HOGDescriptor::getDescriptors
+gpu::HOGDescriptor::getDescriptors
 --------------------------------------
-.. c:function:: void HOGDescriptor::getDescriptors(const GpuMat\& img,
+.. cpp:function:: void gpu::HOGDescriptor::getDescriptors(const GpuMat& img, Size win_stride, GpuMat& descriptors, int descr_format=DESCR_FORMAT_COL_BY_COL)
-   Size win_stride, GpuMat\& descriptors,
-   int descr_format=DESCR_FORMAT_COL_BY_COL)
    Returns block descriptors computed for the whole image. It's mainly used for classifier learning purposes.
-    :param img: Source image. See  :func:`gpu::HOGDescriptor::detect`  for type limitations.
+    :param img: Source image. See :cpp:func:`gpu::HOGDescriptor::detect` for type limitations.
    :param win_stride: Window stride. Must be a multiple of block stride.
@@ -214,17 +219,16 @@ cv::gpu::HOGDescriptor::getDescriptors
            * **DESCR_FORMAT_COL_BY_COL** Column-major order.
-.. index:: gpu::CascadeClassifier_GPU
-.. _gpu::CascadeClassifier_GPU:
+.. index:: gpu::CascadeClassifier_GPU
 gpu::CascadeClassifier_GPU
 --------------------------
-.. c:type:: gpu::CascadeClassifier_GPU
+.. cpp:class:: gpu::CascadeClassifier_GPU
 The cascade classifier class for object detection. ::
-    class CV_EXPORTS CascadeClassifier_GPU
+    class CascadeClassifier_GPU
    {
    public:
        CascadeClassifier_GPU();
@@ -248,63 +252,62 @@ The cascade classifier class for object detection. ::
    };
-.. index:: cv::gpu::CascadeClassifier_GPU::CascadeClassifier_GPU
-.. _cv::gpu::CascadeClassifier_GPU::CascadeClassifier_GPU:
+.. index:: gpu::CascadeClassifier_GPU::CascadeClassifier_GPU
-cv::gpu::CascadeClassifier_GPU::CascadeClassifier_GPU
+gpu::CascadeClassifier_GPU::CascadeClassifier_GPU
 -----------------------------------------------------
-.. c:function:: cv::CascadeClassifier_GPU(const string\& filename)
+.. cpp:function:: gpu::CascadeClassifier_GPU::CascadeClassifier_GPU(const string& filename)
    Loads the classifier from file.
    :param filename: Name of the file from which the classifier is loaded. Only the old Haar classifier format (trained by the haartraining application) and NVIDIA's nvbin format are supported.
-.. index:: cv::gpu::CascadeClassifier_GPU::empty
-.. _cv::gpu::CascadeClassifier_GPU::empty:
-cv::gpu::CascadeClassifier_GPU::empty
+.. index:: gpu::CascadeClassifier_GPU::empty
+gpu::CascadeClassifier_GPU::empty
 -------------------------------------
-.. c:function:: bool CascadeClassifier_GPU::empty() const
+.. cpp:function:: bool gpu::CascadeClassifier_GPU::empty() const
    Checks if the classifier has been loaded or not.
-.. index:: cv::gpu::CascadeClassifier_GPU::load
-.. _cv::gpu::CascadeClassifier_GPU::load:
-cv::gpu::CascadeClassifier_GPU::load
+.. index:: gpu::CascadeClassifier_GPU::load
+gpu::CascadeClassifier_GPU::load
 ------------------------------------
-.. c:function:: bool CascadeClassifier_GPU::load(const string\& filename)
+.. cpp:function:: bool gpu::CascadeClassifier_GPU::load(const string& filename)
    Loads the classifier from file. The previous content is destroyed.
    :param filename: Name of the file from which the classifier is loaded. Only the old Haar classifier format (trained by the haartraining application) and NVIDIA's nvbin format are supported.
-.. index:: cv::gpu::CascadeClassifier_GPU::release
-.. _cv::gpu::CascadeClassifier_GPU::release:
-cv::gpu::CascadeClassifier_GPU::release
+.. index:: gpu::CascadeClassifier_GPU::release
+gpu::CascadeClassifier_GPU::release
 ---------------------------------------
-.. c:function:: void CascadeClassifier_GPU::release()
+.. cpp:function:: void gpu::CascadeClassifier_GPU::release()
    Destroys loaded classifier.
-.. index:: cv::gpu::CascadeClassifier_GPU::detectMultiScale
-.. _cv::gpu::CascadeClassifier_GPU::detectMultiScale:
-cv::gpu::CascadeClassifier_GPU::detectMultiScale
+.. index:: gpu::CascadeClassifier_GPU::detectMultiScale
+gpu::CascadeClassifier_GPU::detectMultiScale
 ------------------------------------------------
-.. c:function:: int CascadeClassifier_GPU::detectMultiScale(const GpuMat\& image, GpuMat\& objectsBuf, double scaleFactor=1.2, int minNeighbors=4, Size minSize=Size())
+.. cpp:function:: int gpu::CascadeClassifier_GPU::detectMultiScale(const GpuMat& image, GpuMat& objectsBuf, double scaleFactor=1.2, int minNeighbors=4, Size minSize=Size())
    Detects objects of different sizes in the input image. The detected objects are returned as a list of rectangles.
    :param image: Matrix of type ``CV_8U`` containing the image in which to detect objects.
-    :param objects: Buffer to store detected objects (rectangles). If it is empty, it will be allocated with default size. If not empty, function will search not more than N objects, where N = sizeof(objectsBufer's data)/sizeof(cv::Rect).
+    :param objectsBuf: Buffer to store detected objects (rectangles). If it is empty, it is allocated with the default size. If it is not empty, the function searches for no more than N objects, where ``N = sizeof(objectsBuf's data)/sizeof(cv::Rect)``.
    :param scaleFactor: Specifies how much the image size is reduced at each image scale.
@@ -333,7 +336,5 @@ The function returns number of detected objects, so you can retrieve them as in
    imshow("Faces", image_cpu);
+See also: :c:func:`CascadeClassifier::detectMultiScale`.
-See also:
-:func:`CascadeClassifier::detectMultiScale` .
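The diff above documents ``getBlockHistogramSize`` and ``getDescriptorSize`` without giving the arithmetic behind them. The sketch below shows how those sizes are usually derived, assuming the standard Dalal-Triggs layout (blocks slide over the window by ``block_stride``, and each block holds one ``nbins`` histogram per cell). The ``SizeWH`` struct and both helper names are hypothetical stand-ins, not OpenCV API, and the formula is an assumption that the documentation in this commit does not spell out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cv::Size; not part of the OpenCV API.
struct SizeWH {
    int width;
    int height;
};

// Assumed layout: one nbins-bin histogram per cell inside a block.
std::size_t refBlockHistogramSize(SizeWH block, SizeWH cell, int nbins) {
    int cellsX = block.width / cell.width;
    int cellsY = block.height / cell.height;
    return static_cast<std::size_t>(cellsX * cellsY * nbins);
}

// Assumed layout: blocks are placed every blockStride pixels inside the window,
// and the descriptor concatenates the histograms of all such blocks.
std::size_t refDescriptorSize(SizeWH win, SizeWH block, SizeWH blockStride,
                              SizeWH cell, int nbins) {
    int blocksX = (win.width - block.width) / blockStride.width + 1;
    int blocksY = (win.height - block.height) / blockStride.height + 1;
    return static_cast<std::size_t>(blocksX * blocksY) *
           refBlockHistogramSize(block, cell, nbins);
}
```

With the constructor defaults listed above (64x128 window, 16x16 blocks, 8x8 stride, 8x8 cells, 9 bins) this gives 7 x 15 = 105 blocks of 36 values each, that is 3780 values per window. A 48x96 window gives 5 x 11 = 55 blocks, that is 1980 values.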
--- a/modules/gpu/doc/operations_on_matrices.rst
+++ b/modules/gpu/doc/operations_on_matrices.rst
@@ -3,11 +3,13 @@ Operations on Matrices
 .. highlight:: cpp
 .. index:: gpu::transpose
 gpu::transpose
 ------------------
-.. c:function:: void gpu::transpose(const GpuMat\& src, GpuMat\& dst)
+.. cpp:function:: void gpu::transpose(const GpuMat& src, GpuMat& dst)
    Transposes a matrix.
@@ -15,14 +17,15 @@ gpu::transpose
    :param dst: Destination matrix.
-See also:
+See also: :c:func:`transpose`.
-:func:`transpose` .
 .. index:: gpu::flip
 gpu::flip
 -------------
-.. c:function:: void gpu::flip(const GpuMat\& a, GpuMat\& b, int flipCode)
+.. cpp:function:: void gpu::flip(const GpuMat& a, GpuMat& b, int flipCode)
    Flips a 2D matrix around vertical, horizontal or both axes.
@@ -38,39 +41,40 @@ gpu::flip
            * **:math:`<`0** Flip around both axes.
+See also: :c:func:`flip`.
-See also:
-:func:`flip` .
 .. index:: gpu::LUT
 gpu::LUT
 ------------
-.. math::
-    dst(I) = lut(src(I))
-.. c:function:: void gpu::LUT(const GpuMat\& src, const Mat\& lut, GpuMat\& dst)
+.. cpp:function:: void gpu::LUT(const GpuMat& src, const Mat& lut, GpuMat& dst)
    Transforms the source matrix into the destination matrix using given look-up table:
+    .. math::
+        dst(I) = lut(src(I))
    :param src: Source matrix. ``CV_8UC1`` and ``CV_8UC3`` matrices are supported for now.
    :param lut: Look-up table. Must be a continuous matrix of ``CV_8U`` depth. Its area must satisfy the condition ``lut.rows`` :math:`\times` ``lut.cols`` = 256.
-    :param dst: Destination matrix. Will have the same depth as  ``lut``  and the same number of channels as  ``src`` .
+    :param dst: Destination matrix. Will have the same depth as ``lut`` and the same number of channels as ``src``.
+See also: :c:func:`LUT`.
-See also:
-:func:`LUT` .
 .. index:: gpu::merge
 gpu::merge
 --------------
-.. c:function:: void gpu::merge(const GpuMat* src, size_t n, GpuMat\& dst)
+.. cpp:function:: void gpu::merge(const GpuMat* src, size_t n, GpuMat& dst)
-.. c:function:: void gpu::merge(const GpuMat* src, size_t n, GpuMat\& dst,
+.. cpp:function:: void gpu::merge(const GpuMat* src, size_t n, GpuMat& dst, const Stream& stream)
-   const Stream\& stream)
    Makes a multi-channel matrix out of several single-channel matrices.
@@ -82,27 +86,27 @@ gpu::merge
    :param stream: Stream for the asynchronous version.
-.. c:function:: void merge(const vector$<$GpuMat$>$\& src, GpuMat\& dst)
+.. cpp:function:: void gpu::merge(const vector<GpuMat>& src, GpuMat& dst)
-.. c:function:: void merge(const vector$<$GpuMat$>$\& src, GpuMat\& dst,
+.. cpp:function:: void gpu::merge(const vector<GpuMat>& src, GpuMat& dst, const Stream& stream)
-   const Stream\& stream)
+    :param src: Vector of the source matrices.
+    :param dst: Destination matrix.
-    * **src** Vector of the source matrices.
+    :param stream: Stream for the asynchronous version.
-    * **dst** Destination matrix.
+See also: :c:func:`merge`.
-    * **stream** Stream for the asynchronous version.
-See also:
-:func:`merge` .
 .. index:: gpu::split
 gpu::split
 --------------
-.. c:function:: void gpu::split(const GpuMat\& src, GpuMat* dst)
+.. cpp:function:: void gpu::split(const GpuMat& src, GpuMat* dst)
-.. c:function:: void gpu::split(const GpuMat\& src, GpuMat* dst, const Stream\& stream)
+.. cpp:function:: void gpu::split(const GpuMat& src, GpuMat* dst, const Stream& stream)
    Copies each plane of a multi-channel matrix into an array.
@@ -112,149 +116,144 @@ gpu::split
    :param stream: Stream for the asynchronous version.
-.. c:function:: void gpu::split(const GpuMat\& src, vector$<$GpuMat$>$\& dst)
+.. cpp:function:: void gpu::split(const GpuMat& src, vector<GpuMat>& dst)
+.. cpp:function:: void gpu::split(const GpuMat& src, vector<GpuMat>& dst, const Stream& stream)
+    :param src: Source matrix.
-.. c:function:: void gpu::split(const GpuMat\& src, vector$<$GpuMat$>$\& dst,
+    :param dst: Destination vector of single-channel matrices.
-   const Stream\& stream)
-    * **src** Source matrix.
+    :param stream: Stream for the asynchronous version.
-    * **dst** Destination vector of single-channel matrices.
+See also: :c:func:`split`.
-    * **stream** Stream for the asynchronous version.
-See also:
-:func:`split` .
 .. index:: gpu::magnitude
 gpu::magnitude
 ------------------
-.. c:function:: void gpu::magnitude(const GpuMat\& x, GpuMat\& magnitude)
+.. cpp:function:: void gpu::magnitude(const GpuMat& x, GpuMat& magnitude)
    Computes magnitudes of complex matrix elements.
-    :param x: Source complex matrix in the interleaved format ( ``CV_32FC2`` ).
+    :param x: Source complex matrix in the interleaved format (``CV_32FC2``).
+    :param magnitude: Destination matrix of float magnitudes (``CV_32FC1``).
-    :param magnitude: Destination matrix of float magnitudes ( ``CV_32FC1`` ).
+.. cpp:function:: void gpu::magnitude(const GpuMat& x, const GpuMat& y, GpuMat& magnitude)
-.. c:function:: void magnitude(const GpuMat\& x, const GpuMat\& y, GpuMat\& magnitude)
+.. cpp:function:: void gpu::magnitude(const GpuMat& x, const GpuMat& y, GpuMat& magnitude, const Stream& stream)
-.. c:function:: void magnitude(const GpuMat\& x, const GpuMat\& y, GpuMat\& magnitude,
+    :param x: Source matrix, containing real components (``CV_32FC1``).
-   const Stream\& stream)
-    * **x** Source matrix, containing real components ( ``CV_32FC1`` ).
+    :param y: Source matrix, containing imaginary components (``CV_32FC1``).
-    * **y** Source matrix, containing imaginary components ( ``CV_32FC1`` ).
+    :param magnitude: Destination matrix of float magnitudes (``CV_32FC1``).
+    :param stream: Stream for the asynchronous version.
-    * **magnitude** Destination matrix of float magnitudes ( ``CV_32FC1`` ).
+See also: :c:func:`magnitude`.
-    * **stream** Stream for the asynchronous version.
-See also:
-:func:`magnitude` .
 .. index:: gpu::magnitudeSqr
 gpu::magnitudeSqr
 ---------------------
-.. c:function:: void gpu::magnitudeSqr(const GpuMat\& x, GpuMat\& magnitude)
+.. cpp:function:: void gpu::magnitudeSqr(const GpuMat& x, GpuMat& magnitude)
    Computes squared magnitudes of complex matrix elements.
-    :param x: Source complex matrix in the interleaved format ( ``CV_32FC2`` ).
+    :param x: Source complex matrix in the interleaved format (``CV_32FC2``).
-    :param magnitude: Destination matrix of float magnitude squares ( ``CV_32FC1`` ).
+    :param magnitude: Destination matrix of float magnitude squares (``CV_32FC1``).
-.. c:function:: void magnitudeSqr(const GpuMat\& x, const GpuMat\& y, GpuMat\& magnitude)
+.. cpp:function:: void gpu::magnitudeSqr(const GpuMat& x, const GpuMat& y, GpuMat& magnitude)
-.. c:function:: void magnitudeSqr(const GpuMat\& x, const GpuMat\& y, GpuMat\& magnitude,
+.. cpp:function:: void gpu::magnitudeSqr(const GpuMat& x, const GpuMat& y, GpuMat& magnitude, const Stream& stream)
-   const Stream\& stream)
-    * **x** Source matrix, containing real components ( ``CV_32FC1`` ).
+    :param x: Source matrix, containing real components (``CV_32FC1``).
-    * **y** Source matrix, containing imaginary components ( ``CV_32FC1`` ).
+    :param y: Source matrix, containing imaginary components (``CV_32FC1``).
+    :param magnitude: Destination matrix of float magnitude squares (``CV_32FC1``).
+    :param stream: Stream for the asynchronous version.
-    * **magnitude** Destination matrix of float magnitude squares ( ``CV_32FC1`` ).
-    * **stream** Stream for the asynchronous version.
 .. index:: gpu::phase
 gpu::phase
 --------------
-.. c:function:: void gpu::phase(const GpuMat\& x, const GpuMat\& y, GpuMat\& angle,
+.. cpp:function:: void gpu::phase(const GpuMat& x, const GpuMat& y, GpuMat& angle, bool angleInDegrees=false)
-   bool angleInDegrees=false)
-.. c:function:: void phase(const GpuMat\& x, const GpuMat\& y, GpuMat\& angle,
+.. cpp:function:: void gpu::phase(const GpuMat& x, const GpuMat& y, GpuMat& angle, bool angleInDegrees, const Stream& stream)
-   bool angleInDegrees, const Stream\& stream)
    Computes polar angles of complex matrix elements.
-    :param x: Source matrix, containing real components ( ``CV_32FC1`` ).
+    :param x: Source matrix, containing real components (``CV_32FC1``).
-    :param y: Source matrix, containing imaginary components ( ``CV_32FC1`` ).
+    :param y: Source matrix, containing imaginary components (``CV_32FC1``).
-    :param angle: Destionation matrix of angles ( ``CV_32FC1`` ).
+    :param angle: Destination matrix of angles (``CV_32FC1``).
    :param angleInDegrees: Flag indicating whether the angles are computed in degrees.
    :param stream: Stream for the asynchronous version.
-See also:
+See also: :c:func:`phase`.
-:func:`phase` .
 .. index:: gpu::cartToPolar
 gpu::cartToPolar
 --------------------
-.. c:function:: void gpu::cartToPolar(const GpuMat\& x, const GpuMat\& y, GpuMat\& magnitude,
+.. cpp:function:: void gpu::cartToPolar(const GpuMat& x, const GpuMat& y, GpuMat& magnitude, GpuMat& angle, bool angleInDegrees=false)
-   GpuMat\& angle, bool angleInDegrees=false)
-.. c:function:: void cartToPolar(const GpuMat\& x, const GpuMat\& y, GpuMat\& magnitude,
+.. cpp:function:: void gpu::cartToPolar(const GpuMat& x, const GpuMat& y, GpuMat& magnitude, GpuMat& angle, bool angleInDegrees, const Stream& stream)
-   GpuMat\& angle, bool angleInDegrees, const Stream\& stream)
    Converts Cartesian coordinates into polar.
-    :param x: Source matrix, containing real components ( ``CV_32FC1`` ).
+    :param x: Source matrix, containing real components (``CV_32FC1``).
-    :param y: Source matrix, containing imaginary components ( ``CV_32FC1`` ).
+    :param y: Source matrix, containing imaginary components (``CV_32FC1``).
-    :param magnitude: Destination matrix of float magnituds ( ``CV_32FC1`` ).
+    :param magnitude: Destination matrix of float magnitudes (``CV_32FC1``).
-    :param angle: Destionation matrix of angles ( ``CV_32FC1`` ).
+    :param angle: Destination matrix of angles (``CV_32FC1``).
    :param angleInDegrees: Flag indicating whether the angles are computed in degrees.
    :param stream: Stream for the asynchronous version.
-See also:
+See also: :c:func:`cartToPolar`.
-:func:`cartToPolar` .
 .. index:: gpu::polarToCart
 gpu::polarToCart
 --------------------
-.. c:function:: void gpu::polarToCart(const GpuMat\& magnitude, const GpuMat\& angle,
+.. cpp:function:: void gpu::polarToCart(const GpuMat& magnitude, const GpuMat& angle, GpuMat& x, GpuMat& y, bool angleInDegrees=false)
-   GpuMat\& x, GpuMat\& y, bool angleInDegrees=false)
-.. c:function:: void gpu::polarToCart(const GpuMat\& magnitude, const GpuMat\& angle,
+.. cpp:function:: void gpu::polarToCart(const GpuMat& magnitude, const GpuMat& angle, GpuMat& x, GpuMat& y, bool angleInDegrees, const Stream& stream)
-   GpuMat\& x, GpuMat\& y, bool angleInDegrees,
-   const Stream\& stream)
    Converts polar coordinates into Cartesian.
-    :param magnitude: Source matrix, containing magnitudes ( ``CV_32FC1`` ).
+    :param magnitude: Source matrix, containing magnitudes (``CV_32FC1``).
-    :param angle: Source matrix, containing angles ( ``CV_32FC1`` ).
+    :param angle: Source matrix, containing angles (``CV_32FC1``).
-    :param x: Destination matrix of real components ( ``CV_32FC1`` ).
+    :param x: Destination matrix of real components (``CV_32FC1``).
-    :param y: Destination matrix of imaginary components ( ``CV_32FC1`` ).
+    :param y: Destination matrix of imaginary components (``CV_32FC1``).
    :param angleInDegrees: Flag indicating whether the angles are given in degrees.
    :param stream: Stream for the asynchronous version.
-See also:
+See also: :c:func:`polarToCart`.
-:func:`polarToCart` .
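The ``cartToPolar`` and ``polarToCart`` entries in ``operations_on_matrices.rst`` above describe inverse conversions. The sketch below is a plain CPU reference in standard C++, not the GPU code. The names ``refCartToPolar``, ``refPolarToCart``, ``PolarRef`` and ``CartRef`` are hypothetical, and mapping the angle into the range [0, 2π) is an assumption that the documentation in this commit does not state.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference types and helpers; not part of the OpenCV API.
struct PolarRef {
    std::vector<float> magnitude;
    std::vector<float> angle;
};

struct CartRef {
    std::vector<float> x;
    std::vector<float> y;
};

const float kPi = 3.14159265358979f;

// Element-wise conversion of (x, y) pairs to (magnitude, angle).
PolarRef refCartToPolar(const std::vector<float>& x, const std::vector<float>& y,
                        bool angleInDegrees = false) {
    assert(x.size() == y.size());
    PolarRef out;
    for (std::size_t i = 0; i < x.size(); ++i) {
        float mag = std::sqrt(x[i] * x[i] + y[i] * y[i]);
        float ang = std::atan2(y[i], x[i]);       // result lies in [-pi, pi]
        if (ang < 0.0f) ang += 2.0f * kPi;        // assumed output range [0, 2*pi)
        if (angleInDegrees) ang *= 180.0f / kPi;  // radians to degrees
        out.magnitude.push_back(mag);
        out.angle.push_back(ang);
    }
    return out;
}

// Element-wise conversion of (magnitude, angle) pairs back to (x, y).
CartRef refPolarToCart(const std::vector<float>& magnitude,
                       const std::vector<float>& angle,
                       bool angleInDegrees = false) {
    assert(magnitude.size() == angle.size());
    CartRef out;
    for (std::size_t i = 0; i < magnitude.size(); ++i) {
        float ang = angleInDegrees ? angle[i] * kPi / 180.0f : angle[i];
        out.x.push_back(magnitude[i] * std::cos(ang));
        out.y.push_back(magnitude[i] * std::sin(ang));
    }
    return out;
}
```

Passing the output of ``refCartToPolar`` to ``refPolarToCart`` with the same ``angleInDegrees`` flag should reproduce the input up to floating-point rounding, which is a quick sanity check when comparing against GPU results.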
--- a/modules/gpu/doc/per_element_operations.rst
+++ b/modules/gpu/doc/per_element_operations.rst
@@ -3,175 +3,183 @@ Per-element Operations.
 .. highlight:: cpp
 .. index:: gpu::add
 gpu::add
 ------------
-.. c:function:: void gpu::add(const GpuMat& a, const GpuMat& b, GpuMat& c)
+.. cpp:function:: void gpu::add(const GpuMat& a, const GpuMat& b, GpuMat& c)
    Computes matrix-matrix or matrix-scalar sum.
-    :param a: First source matrix.  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_32SC1``  and  ``CV_32FC1``  matrices are supported for now.
+    :param a: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
+    :param b: Second source matrix. Must have the same size and type as ``a``.
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-    :param b: Second source matrix. Must have the same size and type as  ``a`` .
+.. cpp:function:: void gpu::add(const GpuMat& a, const Scalar& sc, GpuMat& c)
-    :param c: Destination matrix. Will have the same size and type as  ``a`` .
+    :param a: Source matrix. ``CV_32FC1`` and ``CV_32FC2`` matrices are supported for now.
-.. c:function:: void gpu::add(const GpuMat& a, const Scalar& sc, GpuMat& c)
+    :param sc: Source scalar to be added to the source matrix.
-    * **a** Source matrix.  ``CV_32FC1``  and  ``CV_32FC2``  matrixes are supported for now.
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-    * **b** Source scalar to be added to the source matrix.
+See also: :c:func:`add`.
-    * **c** Destination matrix. Will have the same size and type as  ``a`` .
-See also:
-:func:`add` .
 .. index:: gpu::subtract
 gpu::subtract
 -----------------
-.. c:function:: void gpu::subtract(const GpuMat& a, const GpuMat& b, GpuMat& c)
+.. cpp:function:: void gpu::subtract(const GpuMat& a, const GpuMat& b, GpuMat& c)
    Subtracts matrix from another matrix (or scalar from matrix).
-    :param a: First source matrix.  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_32SC1``  and  ``CV_32FC1``  matrices are supported for now.
+    :param a: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
-    :param b: Second source matrix. Must have the same size and type as  ``a`` .
+    :param b: Second source matrix. Must have the same size and type as ``a``.
-    :param c: Destination matrix. Will have the same size and type as  ``a`` .
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-.. c:function:: void subtract(const GpuMat& a, const Scalar& sc, GpuMat& c)
+.. cpp:function:: void gpu::subtract(const GpuMat& a, const Scalar& sc, GpuMat& c)
-    * **a** Source matrix.   ``CV_32FC1``  and  ``CV_32FC2``  matrixes are supported for now.
+    :param a: Source matrix. ``CV_32FC1`` and ``CV_32FC2`` matrices are supported for now.
-    * **b** Scalar to be subtracted from the source matrix elements.
+    :param sc: Scalar to be subtracted from the source matrix elements.
+    :param c: Destination matrix. Will have the same size and type as ``a``.
+See also: :c:func:`subtract`.
-    * **c** Destination matrix. Will have the same size and type as  ``a`` .
-See also:
-:func:`subtract` .
 .. index:: gpu::multiply
 gpu::multiply
 -----------------
-.. c:function:: void gpu::multiply(const GpuMat& a, const GpuMat& b, GpuMat& c)
+.. cpp:function:: void gpu::multiply(const GpuMat& a, const GpuMat& b, GpuMat& c)
    Computes per-element product of two matrices (or of matrix and scalar).
-    :param a: First source matrix.  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_32SC1``  and  ``CV_32FC1``  matrices are supported for now.
+    :param a: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
+    :param b: Second source matrix. Must have the same size and type as ``a``.
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-    :param b: Second source matrix. Must have the same size and type as  ``a`` .
+.. cpp:function:: void gpu::multiply(const GpuMat& a, const Scalar& sc, GpuMat& c)
-    :param c: Destionation matrix. Will have the same size and type as  ``a`` .
+    :param a: Source matrix. ``CV_32FC1`` and ``CV_32FC2`` matrices are supported for now.
-.. c:function:: void multiply(const GpuMat& a, const Scalar& sc, GpuMat& c)
+    :param sc: Scalar to be multiplied by.
-    * **a** Source matrix.   ``CV_32FC1``  and  ``CV_32FC2``  matrixes are supported for now.
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-    * **b** Scalar to be multiplied by.
+See also: :c:func:`multiply`.
-    * **c** Destination matrix. Will have the same size and type as  ``a`` .
-See also:
-:func:`multiply` .
 .. index:: gpu::divide
 gpu::divide
 ---------------
-.. c:function:: void gpu::divide(const GpuMat& a, const GpuMat& b, GpuMat& c)
+.. cpp:function:: void gpu::divide(const GpuMat& a, const GpuMat& b, GpuMat& c)
    Performs per-element division of two matrices (or division of matrix by scalar).
-    :param a: First source matrix.  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_32SC1``  and  ``CV_32FC1``  matrices are supported for now.
+    :param a: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
-    :param b: Second source matrix. Must have the same size and type as  ``a`` .
+    :param b: Second source matrix. Must have the same size and type as ``a``.
-    :param c: Destionation matrix. Will have the same size and type as  ``a`` .
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-.. c:function:: void divide(const GpuMat& a, const Scalar& sc, GpuMat& c)
+.. cpp:function:: void gpu::divide(const GpuMat& a, const Scalar& sc, GpuMat& c)
-    * **a** Source matrix.   ``CV_32FC1``  and  ``CV_32FC2``  matrixes are supported for now.
+    :param a: Source matrix. ``CV_32FC1`` and ``CV_32FC2`` matrices are supported for now.
-    * **b** Scalar to be divided by.
+    :param sc: Scalar to be divided by.
-    * **c** Destination matrix. Will have the same size and type as  ``a`` .
+    :param c: Destination matrix. Will have the same size and type as ``a``.
+In contrast to :c:func:`divide`, this function uses the round-down rounding mode.
+See also: :c:func:`divide`.
-This function in contrast to
-:func:`divide` uses round-down rounding mode.
-See also:
-:func:`divide` .
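The rounding note for ``gpu::divide`` can be illustrated with a small CPU-only sketch. This is not the GPU implementation: ``divideRef`` is a hypothetical helper that works on plain ``std::vector`` data, the zero-divisor handling is an assumption, and the round-to-nearest branch uses a simple round-half-up rule that may differ in tie cases from what the CPU ``divide`` does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for per-element division of 8-bit values.
// roundDown == true  : the quotient is rounded down (floor).
// roundDown == false : the quotient is rounded to the nearest integer (half up).
// Assumption: a zero divisor yields 0; the documentation above does not say.
std::vector<std::uint8_t> divideRef(const std::vector<std::uint8_t>& a,
                                    const std::vector<std::uint8_t>& b,
                                    bool roundDown)
{
    assert(a.size() == b.size());  // both sources must have the same size
    std::vector<std::uint8_t> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (b[i] == 0) {
            c[i] = 0;
            continue;
        }
        const double q = static_cast<double>(a[i]) / b[i];
        const double r = roundDown ? std::floor(q) : std::floor(q + 0.5);
        c[i] = static_cast<std::uint8_t>(r);  // q <= 255, so r always fits
    }
    return c;
}
```

With inputs ``{7, 9, 255}`` and ``{2, 4, 2}``, the two modes differ only where the quotient has a fractional part of one half or more: 7/2 becomes 3 when rounded down and 4 when rounded to nearest.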
 .. index:: gpu::exp
 gpu::exp
 ------------
-.. c:function:: void gpu::exp(const GpuMat& a, GpuMat& b)
+.. cpp:function:: void gpu::exp(const GpuMat& a, GpuMat& b)
    Computes exponent of each matrix element.
    :param a: Source matrix. ``CV_32FC1`` matrices are supported for now.
-    :param b: Destination matrix. Will have the same size and type as  ``a`` .
+    :param b: Destination matrix. Will have the same size and type as ``a``.
+See also: :c:func:`exp`.
-See also:
-:func:`exp` .
 .. index:: gpu::log
 gpu::log
 ------------
-.. c:function:: void gpu::log(const GpuMat& a, GpuMat& b)
+.. cpp:function:: void gpu::log(const GpuMat& a, GpuMat& b)
    Computes natural logarithm of absolute value of each matrix element.
    :param a: Source matrix. ``CV_32FC1`` matrices are supported for now.
-    :param b: Destination matrix. Will have the same size and type as  ``a`` .
+    :param b: Destination matrix. Will have the same size and type as ``a``.
+See also: :c:func:`log`.
-See also:
-:func:`log` .
 .. index:: gpu::absdiff
 gpu::absdiff
 ----------------
-.. c:function:: void gpu::absdiff(const GpuMat& a, const GpuMat& b, GpuMat& c)
+.. cpp:function:: void gpu::absdiff(const GpuMat& a, const GpuMat& b, GpuMat& c)
    Computes per-element absolute difference of two matrices (or of matrix and scalar).
-    :param a: First source matrix.  ``CV_8UC1`` ,  ``CV_8UC4`` ,  ``CV_32SC1``  and  ``CV_32FC1``  matrices are supported for now.
+    :param a: First source matrix. ``CV_8UC1``, ``CV_8UC4``, ``CV_32SC1`` and ``CV_32FC1`` matrices are supported for now.
+    :param b: Second source matrix. Must have the same size and type as ``a``.
-    :param b: Second source matrix. Must have the same size and type as  ``a`` .
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-    :param c: Destionation matrix. Will have the same size and type as  ``a`` .
+.. cpp:function:: void gpu::absdiff(const GpuMat& a, const Scalar& s, GpuMat& c)
-.. c:function:: void absdiff(const GpuMat& a, const Scalar& s, GpuMat& c)
+    :param a: Source matrix. ``CV_32FC1`` matrices are supported for now.
+    :param s: Scalar to be subtracted from the source matrix elements.
-    * **a** Source matrix.  ``CV_32FC1``  matrixes are supported for now.
+    :param c: Destination matrix. Will have the same size and type as ``a``.
-    * **b** Scalar to be subtracted from the source matrix elements.
+See also: :c:func:`absdiff`.
-    * **c** Destination matrix. Will have the same size and type as  ``a`` .
-See also:
-:func:`absdiff` .
 .. index:: gpu::compare
 gpu::compare
 ----------------
-.. c:function:: void gpu::compare(const GpuMat& a, const GpuMat& b, GpuMat& c, int cmpop)
+.. cpp:function:: void gpu::compare(const GpuMat& a, const GpuMat& b, GpuMat& c, int cmpop)
    Compares elements of two matrices.
    :param a: First source matrix. ``CV_8UC4`` and ``CV_32FC1`` matrices are supported for now.
-    :param b: Second source matrix. Must have the same size and type as  ``a`` .
+    :param b: Second source matrix. Must have the same size and type as ``a``.
    :param c: Destination matrix. Will have the same size as ``a`` and be ``CV_8UC1`` type.
@@ -184,98 +192,107 @@ gpu::compare
            * **CMP_LE** :math:`\le`             
            * **CMP_NE** :math:`\ne`             
+See also: :c:func:`compare`.
-See also:
-:func:`compare` .
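The per-element semantics of ``gpu::compare`` can be sketched on the CPU as follows. This is an illustration only: ``compareRef`` and the ``CmpOp`` enum are hypothetical stand-ins for the real function and the ``CMP_*`` constants, and the sketch assumes the same convention as the CPU ``compare``, where the 8-bit result is 255 when the comparison holds and 0 otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the CMP_* comparison codes.
enum CmpOp { EQ, GT, GE, LT, LE, NE };

// Hypothetical CPU reference: compares a[i] with b[i] and writes an
// 8-bit mask (assumed 255 for true, 0 for false) of the same size as a.
std::vector<std::uint8_t> compareRef(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     CmpOp op)
{
    assert(a.size() == b.size());  // sources must have the same size
    std::vector<std::uint8_t> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        bool result = false;
        switch (op) {
            case EQ: result = a[i] == b[i]; break;
            case GT: result = a[i] >  b[i]; break;
            case GE: result = a[i] >= b[i]; break;
            case LT: result = a[i] <  b[i]; break;
            case LE: result = a[i] <= b[i]; break;
            case NE: result = a[i] != b[i]; break;
        }
        c[i] = result ? 255 : 0;
    }
    return c;
}
```

The destination is single-channel and 8-bit regardless of the source type, which is why the result can be used directly as an operation mask for functions such as ``gpu::bitwise_and``.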
 .. index:: gpu::bitwise_not
 gpu::bitwise_not
 --------------------
-.. c:function:: void gpu::bitwise_not(const GpuMat& src, GpuMat& dst, const GpuMat& mask=GpuMat())
+.. cpp:function:: void gpu::bitwise_not(const GpuMat& src, GpuMat& dst, const GpuMat& mask=GpuMat())
-.. c:function:: void gpu::bitwise_not(const GpuMat& src, GpuMat& dst, const GpuMat& mask, const Stream& stream)
+.. cpp:function:: void gpu::bitwise_not(const GpuMat& src, GpuMat& dst, const GpuMat& mask, const Stream& stream)
    Performs per-element bitwise inversion.
    :param src: Source matrix.
-    :param dst: Destination matrix. Will have the same size and type as  ``src`` .
+    :param dst: Destination matrix. Will have the same size and type as ``src``.
    :param mask: Optional operation mask. 8-bit single channel image.
    :param stream: Stream for the asynchronous version.
 .. index:: gpu::bitwise_or
 gpu::bitwise_or
 -------------------
-.. c:function:: void gpu::bitwise_or(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask=GpuMat())
+.. cpp:function:: void gpu::bitwise_or(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask=GpuMat())
-.. c:function:: void gpu::bitwise_or(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)
+.. cpp:function:: void gpu::bitwise_or(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)
    Performs per-element bitwise disjunction of two matrices.
    :param src1: First source matrix.
-    :param src2: Second source matrix. It must have the same size and type as  ``src1`` .
+    :param src2: Second source matrix. It must have the same size and type as ``src1``.
-    :param dst: Destination matrix. Will have the same size and type as  ``src1`` .
+    :param dst: Destination matrix. Will have the same size and type as ``src1``.
    :param mask: Optional operation mask. 8-bit single channel image.
    :param stream: Stream for the asynchronous version.
 .. index:: gpu::bitwise_and
 gpu::bitwise_and
 --------------------
-.. c:function:: void gpu::bitwise_and(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask=GpuMat())
+.. cpp:function:: void gpu::bitwise_and(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask=GpuMat())
-.. c:function:: void gpu::bitwise_and(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)
+.. cpp:function:: void gpu::bitwise_and(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)
    Performs per-element bitwise conjunction of two matrices.
    :param src1: First source matrix.
-    :param src2: Second source matrix. It must have the same size and type as  ``src1`` .
+    :param src2: Second source matrix. It must have the same size and type as ``src1``.
-    :param dst: Destination matrix. Will have the same size and type as  ``src1`` .
+    :param dst: Destination matrix. Will have the same size and type as ``src1``.
    :param mask: Optional operation mask. 8-bit single channel image.
    :param stream: Stream for the asynchronous version.
 .. index:: gpu::bitwise_xor
 gpu::bitwise_xor
 --------------------
-.. c:function:: void gpu::bitwise_xor(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask=GpuMat())
+.. cpp:function:: void gpu::bitwise_xor(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask=GpuMat())
-.. c:function:: void gpu::bitwise_xor(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)
+.. cpp:function:: void gpu::bitwise_xor(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const GpuMat& mask, const Stream& stream)
    Performs per-element bitwise "exclusive or" of two matrices.
    :param src1: First source matrix.
-    :param src2: Second source matrix. It must have the same size and type as  ``src1`` .
+    :param src2: Second source matrix. It must have the same size and type as ``src1``.
-    :param dst: Destination matrix. Will have the same size and type as  ``src1`` .
+    :param dst: Destination matrix. Will have the same size and type as ``src1``.
    :param mask: Optional operation mask. 8-bit single channel image.
    :param stream: Stream for the asynchronous version.
 .. index:: gpu::min
 gpu::min
 ------------
-.. c:function:: void gpu::min(const GpuMat& src1, const GpuMat& src2, GpuMat& dst)
+.. cpp:function:: void gpu::min(const GpuMat& src1, const GpuMat& src2, GpuMat& dst)
-.. c:function:: void gpu::min(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const Stream& stream)
+.. cpp:function:: void gpu::min(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const Stream& stream)
+.. cpp:function:: void gpu::min(const GpuMat& src1, double value, GpuMat& dst)
+.. cpp:function:: void gpu::min(const GpuMat& src1, double value, GpuMat& dst, const Stream& stream)
    Computes per-element minimum of two matrices (or a matrix and a scalar).
@@ -283,37 +300,27 @@ gpu::min
    :param src2: Second source matrix.
-    :param dst: Destination matrix. Will have the same size and type as  ``src1`` .
+    :param value: Scalar value to compare ``src1`` elements with.
-    :param stream: Stream for the asynchronous version.
-.. c:function:: void gpu::min(const GpuMat& src1, double src2, GpuMat& dst)
+    :param dst: Destination matrix. Will have the same size and type as ``src1``.
-.. c:function:: void gpu::min(const GpuMat& src1, double src2, GpuMat& dst,
+    :param stream: Stream for the asynchronous version.
-   const Stream& stream)
-    * **src1** Source matrix.
-    * **src2** Scalar to be compared with.
-    * **dst** Destination matrix. Will have the same size and type as  ``src1`` .
+See also: :c:func:`min`.
-    * **stream** Stream for the asynchronous version.
-See also:
-:func:`min` .
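The two forms of ``gpu::min`` (matrix against matrix, and matrix against scalar) can be sketched on the CPU as below. ``minRef`` is a hypothetical helper operating on ``std::vector<float>``; it only shows the per-element result and does not model streams or GPU memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: per-element minimum of two equally sized arrays.
std::vector<float> minRef(const std::vector<float>& src1,
                          const std::vector<float>& src2)
{
    assert(src1.size() == src2.size());
    std::vector<float> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i)
        dst[i] = std::min(src1[i], src2[i]);
    return dst;
}

// Hypothetical CPU reference: per-element minimum of an array and a scalar.
// The scalar is converted to the element type so dst keeps the type of src1.
std::vector<float> minRef(const std::vector<float>& src1, double value)
{
    const float v = static_cast<float>(value);
    std::vector<float> dst(src1.size());
    for (std::size_t i = 0; i < src1.size(); ++i)
        dst[i] = std::min(src1[i], v);
    return dst;
}
```

The scalar form effectively clamps every element of ``src1`` from above at ``value``; replacing ``std::min`` with ``std::max`` gives the corresponding sketch for ``gpu::max``.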
 .. index:: gpu::max
 gpu::max
 ------------
-.. c:function:: void gpu::max(const GpuMat& src1, const GpuMat& src2, GpuMat& dst)
+.. cpp:function:: void gpu::max(const GpuMat& src1, const GpuMat& src2, GpuMat& dst)
-.. c:function:: void gpu::max(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const Stream& stream)
+.. cpp:function:: void gpu::max(const GpuMat& src1, const GpuMat& src2, GpuMat& dst, const Stream& stream)
-.. c:function:: void gpu::max(const GpuMat& src1, double value, GpuMat& dst)
+.. cpp:function:: void gpu::max(const GpuMat& src1, double value, GpuMat& dst)
-.. c:function:: void gpu::max(const GpuMat& src1, double value, GpuMat& dst, const Stream& stream)
+.. cpp:function:: void gpu::max(const GpuMat& src1, double value, GpuMat& dst, const Stream& stream)
    Computes per-element maximum of two matrices (or a matrix and a scalar).
@@ -321,11 +328,10 @@ gpu::max
    :param src2: Second source matrix.
-    :param value: The scalar value to compare ``src1`` elements with
+    :param value: Scalar value to compare ``src1`` elements with.
-    :param dst: Destination matrix. Will have the same size and type as  ``src1`` .
+    :param dst: Destination matrix. Will have the same size and type as ``src1``.
    :param stream: Stream for the asynchronous version.
-See also:
+See also: :c:func:`max`.
-:func:`max` .
--- a/modules/imgproc/doc/filtering.rst
+++ b/modules/imgproc/doc/filtering.rst
+.. _ImageFiltering:
 Image Filtering
 ===============