Doxygen documentation: cuda

ceb6e8bd · Maksim Shabunin · 472c2106 · ceb6e8bd · ceb6e8bd · ceb6e8bd
Commit ceb6e8bd authored Nov 20, 2014 by Maksim Shabunin
80 changed files
--- a/doc/CMakeLists.txt
+++ b/doc/CMakeLists.txt
@@ -159,12 +159,18 @@ if(BUILD_DOCS AND HAVE_DOXYGEN)
  set(reflist) # modules reference
  foreach(m ${candidates})
    set(reflist "${reflist} \n- @subpage ${m}")
-    set(all_headers ${all_headers} "${OPENCV_MODULE_opencv_${m}_HEADERS}")
+    set(header_dir "${OPENCV_MODULE_opencv_${m}_LOCATION}/include")
+    if(EXISTS ${header_dir})
+        set(all_headers ${all_headers} ${header_dir})
+    endif()
    set(docs_dir "${OPENCV_MODULE_opencv_${m}_LOCATION}/doc")
    if(EXISTS ${docs_dir})
      set(all_images ${all_images} ${docs_dir})
      set(all_headers ${all_headers} ${docs_dir})
    endif()
  endforeach()
  # additional config

--- a/doc/Doxyfile.in
+++ b/doc/Doxyfile.in
@@ -99,7 +99,7 @@ FILE_PATTERNS          =
 RECURSIVE              = YES
 EXCLUDE                =
 EXCLUDE_SYMLINKS       = NO
-EXCLUDE_PATTERNS       =
+EXCLUDE_PATTERNS       = *.inl.hpp *.impl.hpp *_detail.hpp */cudev/**/detail/*.hpp
 EXCLUDE_SYMBOLS        = cv::DataType<*> int
 EXAMPLE_PATH           = @CMAKE_DOXYGEN_EXAMPLE_PATH@
 EXAMPLE_PATTERNS       = *

--- a/modules/core/include/opencv2/core/cuda.hpp
+++ b/modules/core/include/opencv2/core/cuda.hpp
--- a/modules/core/include/opencv2/core/cuda_stream_accessor.hpp
+++ b/modules/core/include/opencv2/core/cuda_stream_accessor.hpp
@@ -66,6 +66,11 @@ namespace cv
        class Stream;
        class Event;
+        /** @brief Class that enables getting cudaStream\_t from cuda::Stream
+        because it is the only public header that depends on the CUDA Runtime API. Including it
+        brings a dependency to your code.
+         */
        struct StreamAccessor
        {
            CV_EXPORTS static cudaStream_t getStream(const Stream& stream);

--- a/modules/core/include/opencv2/core/cuda_types.hpp
+++ b/modules/core/include/opencv2/core/cuda_types.hpp
@@ -89,6 +89,11 @@ namespace cv
            size_t size;
        };
+        /** @brief Structure similar to cuda::PtrStepSz but containing only a pointer and row step.
+        Width and height fields are excluded due to performance reasons. The structure is intended
+        for internal use or for users who write device code.
+         */
        template <typename T> struct PtrStep : public DevPtr<T>
        {
            __CV_CUDA_HOST_DEVICE__ PtrStep() : step(0) {}
@@ -104,6 +109,12 @@ namespace cv
            __CV_CUDA_HOST_DEVICE__ const T& operator ()(int y, int x) const { return ptr(y)[x]; }
        };
+        /** @brief Lightweight class encapsulating pitched memory on a GPU and passed to nvcc-compiled code (CUDA
+        kernels).
+        Typically, it is used internally by OpenCV and by users who write device code. You can call
+        its members from both host and device code.
+         */
        template <typename T> struct PtrStepSz : public PtrStep<T>
        {
            __CV_CUDA_HOST_DEVICE__ PtrStepSz() : cols(0), rows(0) {}

--- a/modules/cuda/doc/introduction.markdown
+++ b/modules/cuda/doc/introduction.markdown
+CUDA Module Introduction {#cuda_intro}
+========================
+General Information
+-------------------
+The OpenCV CUDA module is a set of classes and functions to utilize CUDA computational capabilities.
+It is implemented using NVIDIA\* CUDA\* Runtime API and supports only NVIDIA GPUs. The OpenCV CUDA
+module includes utility functions, low-level vision primitives, and high-level algorithms. The
+utility functions and low-level primitives provide a powerful infrastructure for developing fast
+vision algorithms taking advantage of CUDA whereas the high-level functionality includes some
+state-of-the-art algorithms (such as stereo correspondence, face and people detectors, and others)
+ready to be used by the application developers.
+The CUDA module is designed as a host-level API. This means that if you have pre-compiled OpenCV
+CUDA binaries, you are not required to have the CUDA Toolkit installed or write any extra code to
+make use of the CUDA.
+The OpenCV CUDA module is designed for ease of use and does not require any knowledge of CUDA.
+Though, such a knowledge will certainly be useful to handle non-trivial cases or achieve the highest
+performance. It is helpful to understand the cost of various operations, what the GPU does, what the
+preferred data formats are, and so on. The CUDA module is an effective instrument for quick
+implementation of CUDA-accelerated computer vision algorithms. However, if your algorithm involves
+many simple operations, then, for the best possible performance, you may still need to write your
+own kernels to avoid extra write and read operations on the intermediate results.
+To enable CUDA support, configure OpenCV using CMake with WITH\_CUDA=ON . When the flag is set and
+if CUDA is installed, the full-featured OpenCV CUDA module is built. Otherwise, the module is still
+built but at runtime all functions from the module throw Exception with CV\_GpuNotSupported error
+code, except for cuda::getCudaEnabledDeviceCount(). The latter function returns zero GPU count in
+this case. Building OpenCV without CUDA support does not perform device code compilation, so it does
+not require the CUDA Toolkit installed. Therefore, using the cuda::getCudaEnabledDeviceCount()
+function, you can implement a high-level algorithm that will detect GPU presence at runtime and
+choose an appropriate implementation (CPU or GPU) accordingly.
+Compilation for Different NVIDIA\* Platforms
+--------------------------------------------
+NVIDIA\* compiler enables generating binary code (cubin and fatbin) and intermediate code (PTX).
+Binary code often implies a specific GPU architecture and generation, so the compatibility with
+other GPUs is not guaranteed. PTX is targeted for a virtual platform that is defined entirely by the
+set of capabilities or features. Depending on the selected virtual platform, some of the
+instructions are emulated or disabled, even if the real hardware supports all the features.
+At the first call, the PTX code is compiled to binary code for the particular GPU using a JIT
+compiler. When the target GPU has a compute capability (CC) lower than the PTX code, JIT fails. By
+default, the OpenCV CUDA module includes:
+\*
+   Binaries for compute capabilities 1.3 and 2.0 (controlled by CUDA\_ARCH\_BIN in CMake)
+\*
+   PTX code for compute capabilities 1.1 and 1.3 (controlled by CUDA\_ARCH\_PTX in CMake)
+This means that for devices with CC 1.3 and 2.0 binary images are ready to run. For all newer
+platforms, the PTX code for 1.3 is JIT'ed to a binary image. For devices with CC 1.1 and 1.2, the
+PTX for 1.1 is JIT'ed. For devices with CC 1.0, no code is available and the functions throw
+Exception. For platforms where JIT compilation is performed first, the run is slow.
+On a GPU with CC 1.0, you can still compile the CUDA module and most of the functions will run
+flawlessly. To achieve this, add "1.0" to the list of binaries, for example,
+CUDA\_ARCH\_BIN="1.0 1.3 2.0" . The functions that cannot be run on CC 1.0 GPUs throw an exception.
+You can always determine at runtime whether the OpenCV GPU-built binaries (or PTX code) are
+compatible with your GPU. The function cuda::DeviceInfo::isCompatible returns the compatibility
+status (true/false).
+Utilizing Multiple GPUs
+-----------------------
+In the current version, each of the OpenCV CUDA algorithms can use only a single GPU. So, to utilize
+multiple GPUs, you have to manually distribute the work between GPUs. Switching active devie can be
+done using cuda::setDevice() function. For more details please read Cuda C Programming Guide.
+While developing algorithms for multiple GPUs, note a data passing overhead. For primitive functions
+and small images, it can be significant, which may eliminate all the advantages of having multiple
+GPUs. But for high-level algorithms, consider using multi-GPU acceleration. For example, the Stereo
+Block Matching algorithm has been successfully parallelized using the following algorithm:
+1.  Split each image of the stereo pair into two horizontal overlapping stripes.
+2.  Process each pair of stripes (from the left and right images) on a separate Fermi\* GPU.
+3.  Merge the results into a single disparity map.
+With this algorithm, a dual GPU gave a 180% performance increase comparing to the single Fermi GPU.
+For a source code example, see <https://github.com/Itseez/opencv/tree/master/samples/gpu/>.
--- a/modules/cuda/include/opencv2/cuda.hpp
+++ b/modules/cuda/include/opencv2/cuda.hpp
--- a/modules/cudaarithm/include/opencv2/cudaarithm.hpp
+++ b/modules/cudaarithm/include/opencv2/cudaarithm.hpp
--- a/modules/cudabgsegm/include/opencv2/cudabgsegm.hpp
+++ b/modules/cudabgsegm/include/opencv2/cudabgsegm.hpp
@@ -50,11 +50,33 @@
 #include "opencv2/core/cuda.hpp"
 #include "opencv2/video/background_segm.hpp"
+/**
+  @addtogroup cuda
+  @{
+    @defgroup cudabgsegm Background Segmentation
+  @}
+ */
 namespace cv { namespace cuda {
+//! @addtogroup cudabgsegm
+//! @{
 ////////////////////////////////////////////////////
 // MOG
+/** @brief Gaussian Mixture-based Background/Foreground Segmentation Algorithm.
+The class discriminates between foreground and background pixels by building and maintaining a model
+of the background. Any pixel which does not fit this model is then deemed to be foreground. The
+class implements algorithm described in @cite MOG2001.
+@sa BackgroundSubtractorMOG
+@note
+   -   An example on gaussian mixture based background/foreground segmantation can be found at
+        opencv\_source\_code/samples/gpu/bgfg\_segm.cpp
+ */
 class CV_EXPORTS BackgroundSubtractorMOG : public cv::BackgroundSubtractor
 {
 public:
@@ -78,6 +100,14 @@ public:
    virtual void setNoiseSigma(double noiseSigma) = 0;
 };
+/** @brief Creates mixture-of-gaussian background subtractor
+@param history Length of the history.
+@param nmixtures Number of Gaussian mixtures.
+@param backgroundRatio Background ratio.
+@param noiseSigma Noise strength (standard deviation of the brightness or each color channel). 0
+means some automatic value.
+ */
 CV_EXPORTS Ptr<cuda::BackgroundSubtractorMOG>
    createBackgroundSubtractorMOG(int history = 200, int nmixtures = 5,
                                  double backgroundRatio = 0.7, double noiseSigma = 0);
@@ -85,6 +115,14 @@ CV_EXPORTS Ptr<cuda::BackgroundSubtractorMOG>
 ////////////////////////////////////////////////////
 // MOG2
+/** @brief Gaussian Mixture-based Background/Foreground Segmentation Algorithm.
+The class discriminates between foreground and background pixels by building and maintaining a model
+of the background. Any pixel which does not fit this model is then deemed to be foreground. The
+class implements algorithm described in @cite MOG2004.
+@sa BackgroundSubtractorMOG2
+ */
 class CV_EXPORTS BackgroundSubtractorMOG2 : public cv::BackgroundSubtractorMOG2
 {
 public:
@@ -96,6 +134,15 @@ public:
    virtual void getBackgroundImage(OutputArray backgroundImage, Stream& stream) const = 0;
 };
+/** @brief Creates MOG2 Background Subtractor
+@param history Length of the history.
+@param varThreshold Threshold on the squared Mahalanobis distance between the pixel and the model
+to decide whether a pixel is well described by the background model. This parameter does not
+affect the background update.
+@param detectShadows If true, the algorithm will detect shadows and mark them. It decreases the
+speed a bit, so if you do not need this feature, set the parameter to false.
+ */
 CV_EXPORTS Ptr<cuda::BackgroundSubtractorMOG2>
    createBackgroundSubtractorMOG2(int history = 500, double varThreshold = 16,
                                   bool detectShadows = true);
@@ -103,6 +150,12 @@ CV_EXPORTS Ptr<cuda::BackgroundSubtractorMOG2>
 ////////////////////////////////////////////////////
 // GMG
+/** @brief Background/Foreground Segmentation Algorithm.
+The class discriminates between foreground and background pixels by building and maintaining a model
+of the background. Any pixel which does not fit this model is then deemed to be foreground. The
+class implements algorithm described in @cite GMG2012.
+ */
 class CV_EXPORTS BackgroundSubtractorGMG : public cv::BackgroundSubtractor
 {
 public:
@@ -140,54 +193,71 @@ public:
    virtual void setMaxVal(double val) = 0;
 };
+/** @brief Creates GMG Background Subtractor
+@param initializationFrames Number of frames of video to use to initialize histograms.
+@param decisionThreshold Value above which pixel is determined to be FG.
+ */
 CV_EXPORTS Ptr<cuda::BackgroundSubtractorGMG>
    createBackgroundSubtractorGMG(int initializationFrames = 120, double decisionThreshold = 0.8);
 ////////////////////////////////////////////////////
 // FGD
-/**
+/** @brief The class discriminates between foreground and background pixels by building and maintaining a model
- * Foreground Object Detection from Videos Containing Complex Background.
+of the background.
- * Liyuan Li, Weimin Huang, Irene Y.H. Gu, and Qi Tian.
- * ACM MM2003 9p
+Any pixel which does not fit this model is then deemed to be foreground. The class implements
+algorithm described in @cite FGD2003.
+@sa BackgroundSubtractor
 */
 class CV_EXPORTS BackgroundSubtractorFGD : public cv::BackgroundSubtractor
 {
 public:
+    /** @brief Returns the output foreground regions calculated by findContours.
+    @param foreground\_regions Output array (CPU memory).
+     */
    virtual void getForegroundRegions(OutputArrayOfArrays foreground_regions) = 0;
 };
 struct CV_EXPORTS FGDParams
 {
-    int Lc;  // Quantized levels per 'color' component. Power of two, typically 32, 64 or 128.
+    int Lc;  //!< Quantized levels per 'color' component. Power of two, typically 32, 64 or 128.
-    int N1c; // Number of color vectors used to model normal background color variation at a given pixel.
+    int N1c; //!< Number of color vectors used to model normal background color variation at a given pixel.
-    int N2c; // Number of color vectors retained at given pixel.  Must be > N1c, typically ~ 5/3 of N1c.
+    int N2c; //!< Number of color vectors retained at given pixel.  Must be > N1c, typically ~ 5/3 of N1c.
-    // Used to allow the first N1c vectors to adapt over time to changing background.
+    //!< Used to allow the first N1c vectors to adapt over time to changing background.
-    int Lcc;  // Quantized levels per 'color co-occurrence' component.  Power of two, typically 16, 32 or 64.
+    int Lcc;  //!< Quantized levels per 'color co-occurrence' component.  Power of two, typically 16, 32 or 64.
-    int N1cc; // Number of color co-occurrence vectors used to model normal background color variation at a given pixel.
+    int N1cc; //!< Number of color co-occurrence vectors used to model normal background color variation at a given pixel.
-    int N2cc; // Number of color co-occurrence vectors retained at given pixel.  Must be > N1cc, typically ~ 5/3 of N1cc.
+    int N2cc; //!< Number of color co-occurrence vectors retained at given pixel.  Must be > N1cc, typically ~ 5/3 of N1cc.
-    // Used to allow the first N1cc vectors to adapt over time to changing background.
+    //!< Used to allow the first N1cc vectors to adapt over time to changing background.
-    bool is_obj_without_holes; // If TRUE we ignore holes within foreground blobs. Defaults to TRUE.
+    bool is_obj_without_holes; //!< If TRUE we ignore holes within foreground blobs. Defaults to TRUE.
-    int perform_morphing;     // Number of erode-dilate-erode foreground-blob cleanup iterations.
+    int perform_morphing;     //!< Number of erode-dilate-erode foreground-blob cleanup iterations.
-    // These erase one-pixel junk blobs and merge almost-touching blobs. Default value is 1.
+    //!< These erase one-pixel junk blobs and merge almost-touching blobs. Default value is 1.
-    float alpha1; // How quickly we forget old background pixel values seen. Typically set to 0.1.
+    float alpha1; //!< How quickly we forget old background pixel values seen. Typically set to 0.1.
-    float alpha2; // "Controls speed of feature learning". Depends on T. Typical value circa 0.005.
+    float alpha2; //!< "Controls speed of feature learning". Depends on T. Typical value circa 0.005.
-    float alpha3; // Alternate to alpha2, used (e.g.) for quicker initial convergence. Typical value 0.1.
+    float alpha3; //!< Alternate to alpha2, used (e.g.) for quicker initial convergence. Typical value 0.1.
-    float delta;   // Affects color and color co-occurrence quantization, typically set to 2.
+    float delta;   //!< Affects color and color co-occurrence quantization, typically set to 2.
-    float T;       // A percentage value which determines when new features can be recognized as new background. (Typically 0.9).
+    float T;       //!< A percentage value which determines when new features can be recognized as new background. (Typically 0.9).
-    float minArea; // Discard foreground blobs whose bounding box is smaller than this threshold.
+    float minArea; //!< Discard foreground blobs whose bounding box is smaller than this threshold.
-    // default Params
+    //! default Params
    FGDParams();
 };
+/** @brief Creates FGD Background Subtractor
+@param params Algorithm's parameters. See @cite FGD2003 for explanation.
+ */
 CV_EXPORTS Ptr<cuda::BackgroundSubtractorFGD>
    createBackgroundSubtractorFGD(const FGDParams& params = FGDParams());
+//! @}
 }} // namespace cv { namespace cuda {
 #endif /* __OPENCV_CUDABGSEGM_HPP__ */
--- a/modules/cudacodec/include/opencv2/cudacodec.hpp
+++ b/modules/cudacodec/include/opencv2/cudacodec.hpp
--- a/modules/cudafeatures2d/include/opencv2/cudafeatures2d.hpp
+++ b/modules/cudafeatures2d/include/opencv2/cudafeatures2d.hpp
--- a/modules/cudafilters/include/opencv2/cudafilters.hpp
+++ b/modules/cudafilters/include/opencv2/cudafilters.hpp
--- a/modules/cudaimgproc/include/opencv2/cudaimgproc.hpp
+++ b/modules/cudaimgproc/include/opencv2/cudaimgproc.hpp
--- a/modules/cudalegacy/include/opencv2/cudalegacy.hpp
+++ b/modules/cudalegacy/include/opencv2/cudalegacy.hpp
@@ -49,4 +49,11 @@
 #include "opencv2/cudalegacy/NCVHaarObjectDetection.hpp"
 #include "opencv2/cudalegacy/NCVBroxOpticalFlow.hpp"
+/**
+  @addtogroup cuda
+  @{
+    @defgroup cudalegacy Legacy support
+  @}
+*/
 #endif /* __OPENCV_CUDALEGACY_HPP__ */
--- a/modules/cudalegacy/include/opencv2/cudalegacy/NCV.hpp
+++ b/modules/cudalegacy/include/opencv2/cudalegacy/NCV.hpp
@@ -60,6 +60,8 @@
 //
 //==============================================================================
+//! @addtogroup cudalegacy
+//! @{
 /**
 * Compile-time assert namespace
@@ -1023,6 +1025,6 @@ CV_EXPORTS NCVStatus ncvDrawRects_32u_device(Ncv32u *d_dst, Ncv32u dstStride, Nc
    NCVMatrixAlloc<type> name(alloc, width, height); \
    ncvAssertReturn(name.isMemAllocated(), err);
+//! @}
 #endif // _ncv_hpp_
--- a/modules/cudalegacy/include/opencv2/cudalegacy/NCVBroxOpticalFlow.hpp
+++ b/modules/cudalegacy/include/opencv2/cudalegacy/NCVBroxOpticalFlow.hpp
@@ -62,6 +62,9 @@
 #include "opencv2/cudalegacy/NCV.hpp"
+//! @addtogroup cudalegacy
+//! @{
 /// \brief Model and solver parameters
 struct NCVBroxOpticalFlowDescriptor
 {
@@ -89,6 +92,7 @@ struct NCVBroxOpticalFlowDescriptor
 /// \param [in]  frame1            frame to track
 /// \param [out] u                 flow horizontal component (along \b x axis)
 /// \param [out] v                 flow vertical component (along \b y axis)
+/// \param       stream
 /// \return                        computation status
 /////////////////////////////////////////////////////////////////////////////////////////
@@ -101,4 +105,6 @@ NCVStatus NCVBroxOpticalFlow(const NCVBroxOpticalFlowDescriptor desc,
                             NCVMatrix<Ncv32f> &v,
                             cudaStream_t stream);
+//! @}
 #endif
--- a/modules/cudalegacy/include/opencv2/cudalegacy/NCVHaarObjectDetection.hpp
+++ b/modules/cudalegacy/include/opencv2/cudalegacy/NCVHaarObjectDetection.hpp
@@ -61,6 +61,8 @@
 #include "opencv2/cudalegacy/NCV.hpp"
+//! @addtogroup cudalegacy
+//! @{
 //==============================================================================
 //
@@ -456,6 +458,6 @@ CV_EXPORTS NCVStatus ncvHaarStoreNVBIN_host(const cv::String &filename,
                                             NCVVector<HaarClassifierNode128> &h_HaarNodes,
                                             NCVVector<HaarFeature64> &h_HaarFeatures);
+//! @}
 #endif // _ncvhaarobjectdetection_hpp_
--- a/modules/cudalegacy/include/opencv2/cudalegacy/NCVPyramid.hpp
+++ b/modules/cudalegacy/include/opencv2/cudalegacy/NCVPyramid.hpp
@@ -48,6 +48,8 @@
 #include "opencv2/cudalegacy/NCV.hpp"
 #include "opencv2/core/cuda/common.hpp"
+//! @cond IGNORED
 namespace cv { namespace cuda { namespace device
 {
    namespace pyramid
@@ -106,4 +108,6 @@ private:
 #endif //_WIN32
+//! @endcond
 #endif //_ncvpyramid_hpp_
--- a/modules/cudalegacy/include/opencv2/cudalegacy/NPP_staging.hpp
+++ b/modules/cudalegacy/include/opencv2/cudalegacy/NPP_staging.hpp
@@ -45,19 +45,14 @@
 #include "opencv2/cudalegacy/NCV.hpp"
+//! @addtogroup cudalegacy
-/**
+//! @{
-* \file NPP_staging.hpp
-* NPP Staging Library
-*/
 /** \defgroup core_npp NPPST Core
 * Basic functions for CUDA streams management.
 * @{
 */
 /**
 * Gets an active CUDA stream used by NPPST
 * NOT THREAD SAFE
@@ -168,6 +163,7 @@ NCVStatus nppiStInterpolateFrames(const NppStInterpolationState *pState);
 * \param nSrcStep          [IN]  Source image line step
 * \param pDst              [OUT] Destination image pointer (CUDA device memory)
 * \param dstSize           [OUT] Destination image size
+ * \param nDstStep
 * \param oROI              [IN]  Region of interest in the source image
 * \param borderType        [IN]  Type of border
 * \param pKernel           [IN]  Pointer to row kernel values (CUDA device memory)
@@ -201,6 +197,7 @@ NCVStatus nppiStFilterRowBorder_32f_C1R(const Ncv32f *pSrc,
 * \param nSrcStep          [IN]  Source image line step
 * \param pDst              [OUT] Destination image pointer (CUDA device memory)
 * \param dstSize           [OUT] Destination image size
+ * \param nDstStep          [IN]
 * \param oROI              [IN]  Region of interest in the source image
 * \param borderType        [IN]  Type of border
 * \param pKernel           [IN]  Pointer to column kernel values (CUDA device memory)
@@ -228,7 +225,7 @@ NCVStatus nppiStFilterColumnBorder_32f_C1R(const Ncv32f *pSrc,
 /** Size of buffer required for vector image warping.
 *
 * \param srcSize           [IN]  Source image size
- * \param nStep             [IN]  Source image line step
+ * \param nSrcStep          [IN]  Source image line step
 * \param hpSize            [OUT] Where to store computed size (host memory)
 *
 * \return NCV status code
@@ -285,6 +282,7 @@ NCVStatus nppiStVectorWarp_PSF1x1_32f_C1(const Ncv32f *pSrc,
 * \param pU                [IN]  Pointer to horizontal displacement field (CUDA device memory)
 * \param pV                [IN]  Pointer to vertical displacement field (CUDA device memory)
 * \param nVFStep           [IN]  Displacement field line step
+ * \param pBuffer
 * \param timeScale         [IN]  Value by which displacement field will be scaled for warping
 * \param pDst              [OUT] Destination image pointer (CUDA device memory)
 *
@@ -903,5 +901,6 @@ NCVStatus nppsStCompact_32f_host(Ncv32f *h_src, Ncv32u srcLen,
 /*@}*/
+//! @}
 #endif // _npp_staging_hpp_
--- a/modules/cudalegacy/include/opencv2/cudalegacy/private.hpp
+++ b/modules/cudalegacy/include/opencv2/cudalegacy/private.hpp
@@ -56,6 +56,8 @@
 #include "opencv2/cudalegacy.hpp"
+//! @cond IGNORED
 namespace cv { namespace cuda
 {
    class NppStStreamHandler
@@ -89,4 +91,6 @@ namespace cv { namespace cuda
 #define ncvSafeCall(expr)  cv::cuda::checkNcvError(expr, __FILE__, __LINE__, CV_Func)
+//! @endcond
 #endif // __OPENCV_CORE_CUDALEGACY_PRIVATE_HPP__
--- a/modules/cudaoptflow/include/opencv2/cudaoptflow.hpp
+++ b/modules/cudaoptflow/include/opencv2/cudaoptflow.hpp
@@ -49,8 +49,21 @@
 #include "opencv2/core/cuda.hpp"
+/**
+  @addtogroup cuda
+  @{
+    @defgroup cudaoptflow Optical Flow
+  @}
+ */
 namespace cv { namespace cuda {
+//! @addtogroup cudaoptflow
+//! @{
+/** @brief Class computing the optical flow for two images using Brox et al Optical Flow algorithm
+(@cite Brox2004). :
+ */
 class CV_EXPORTS BroxOpticalFlow
 {
 public:
@@ -88,16 +101,58 @@ public:
    GpuMat buf;
 };
+/** @brief Class used for calculating an optical flow.
+The class can calculate an optical flow for a sparse feature set or dense optical flow using the
+iterative Lucas-Kanade method with pyramids.
+@sa calcOpticalFlowPyrLK
+@note
+   -   An example of the Lucas Kanade optical flow algorithm can be found at
+        opencv\_source\_code/samples/gpu/pyrlk\_optical\_flow.cpp
+ */
 class CV_EXPORTS PyrLKOpticalFlow
 {
 public:
    PyrLKOpticalFlow();
+    /** @brief Calculate an optical flow for a sparse feature set.
+    @param prevImg First 8-bit input image (supports both grayscale and color images).
+    @param nextImg Second input image of the same size and the same type as prevImg .
+    @param prevPts Vector of 2D points for which the flow needs to be found. It must be one row matrix
+    with CV\_32FC2 type.
+    @param nextPts Output vector of 2D points (with single-precision floating-point coordinates)
+    containing the calculated new positions of input features in the second image. When useInitialFlow
+    is true, the vector must have the same size as in the input.
+    @param status Output status vector (CV\_8UC1 type). Each element of the vector is set to 1 if the
+    flow for the corresponding features has been found. Otherwise, it is set to 0.
+    @param err Output vector (CV\_32FC1 type) that contains the difference between patches around the
+    original and moved points or min eigen value if getMinEigenVals is checked. It can be NULL, if not
+    needed.
+    @sa calcOpticalFlowPyrLK
+     */
    void sparse(const GpuMat& prevImg, const GpuMat& nextImg, const GpuMat& prevPts, GpuMat& nextPts,
        GpuMat& status, GpuMat* err = 0);
+    /** @brief Calculate dense optical flow.
+    @param prevImg First 8-bit grayscale input image.
+    @param nextImg Second input image of the same size and the same type as prevImg .
+    @param u Horizontal component of the optical flow of the same size as input images, 32-bit
+    floating-point, single-channel
+    @param v Vertical component of the optical flow of the same size as input images, 32-bit
+    floating-point, single-channel
+    @param err Output vector (CV\_32FC1 type) that contains the difference between patches around the
+    original and moved points or min eigen value if getMinEigenVals is checked. It can be NULL, if not
+    needed.
+     */
    void dense(const GpuMat& prevImg, const GpuMat& nextImg, GpuMat& u, GpuMat& v, GpuMat* err = 0);
+    /** @brief Releases inner buffers memory.
+    */
    void releaseMemory();
    Size winSize;
@@ -115,6 +170,8 @@ private:
    GpuMat vPyr_[2];
 };
+/** @brief Class computing a dense optical flow using the Gunnar Farneback’s algorithm. :
+ */
 class CV_EXPORTS FarnebackOpticalFlow
 {
 public:
@@ -139,8 +196,20 @@ public:
    double polySigma;
    int flags;
+    /** @brief Computes a dense optical flow using the Gunnar Farneback’s algorithm.
+    @param frame0 First 8-bit gray-scale input image
+    @param frame1 Second 8-bit gray-scale input image
+    @param flowx Flow horizontal component
+    @param flowy Flow vertical component
+    @param s Stream
+    @sa calcOpticalFlowFarneback
+     */
    void operator ()(const GpuMat &frame0, const GpuMat &frame1, GpuMat &flowx, GpuMat &flowy, Stream &s = Stream::Null());
+    /** @brief Releases unused auxiliary memory buffers.
+     */
    void releaseMemory()
    {
        frames_[0].release();
@@ -295,20 +364,22 @@ private:
    GpuMat extended_I1;
 };
-//! Interpolate frames (images) using provided optical flow (displacement field).
+/** @brief Interpolates frames (images) using provided optical flow (displacement field).
-//! frame0   - frame 0 (32-bit floating point images, single channel)
-//! frame1   - frame 1 (the same type and size)
+@param frame0 First frame (32-bit floating point images, single channel).
-//! fu       - forward horizontal displacement
+@param frame1 Second frame. Must have the same type and size as frame0 .
-//! fv       - forward vertical displacement
+@param fu Forward horizontal displacement.
-//! bu       - backward horizontal displacement
+@param fv Forward vertical displacement.
-//! bv       - backward vertical displacement
+@param bu Backward horizontal displacement.
-//! pos      - new frame position
+@param bv Backward vertical displacement.
-//! newFrame - new frame
+@param pos New frame position.
-//! buf      - temporary buffer, will have width x 6*height size, CV_32FC1 type and contain 6 GpuMat;
+@param newFrame Output image.
-//!            occlusion masks            0, occlusion masks            1,
+@param buf Temporary buffer, will have width x 6\*height size, CV\_32FC1 type and contain 6
-//!            interpolated forward flow  0, interpolated forward flow  1,
+GpuMat: occlusion masks for first frame, occlusion masks for second, interpolated forward
-//!            interpolated backward flow 0, interpolated backward flow 1
+horizontal flow, interpolated forward vertical flow, interpolated backward horizontal flow,
-//!
+interpolated backward vertical flow.
+@param stream Stream for the asynchronous version.
+ */
 CV_EXPORTS void interpolateFrames(const GpuMat& frame0, const GpuMat& frame1,
                                  const GpuMat& fu, const GpuMat& fv,
                                  const GpuMat& bu, const GpuMat& bv,
@@ -317,6 +388,8 @@ CV_EXPORTS void interpolateFrames(const GpuMat& frame0, const GpuMat& frame1,
 CV_EXPORTS void createOpticalFlowNeedleMap(const GpuMat& u, const GpuMat& v, GpuMat& vertex, GpuMat& colors);
+//! @}
 }} // namespace cv { namespace cuda {
 #endif /* __OPENCV_CUDAOPTFLOW_HPP__ */
--- a/modules/cudastereo/include/opencv2/cudastereo.hpp
+++ b/modules/cudastereo/include/opencv2/cudastereo.hpp
--- a/modules/cudawarping/include/opencv2/cudawarping.hpp
+++ b/modules/cudawarping/include/opencv2/cudawarping.hpp
@@ -50,54 +50,178 @@
 #include "opencv2/core/cuda.hpp"
 #include "opencv2/imgproc.hpp"
+/**
+  @addtogroup cuda
+  @{
+    @defgroup cudawarping Image Warping
+  @}
+ */
 namespace cv { namespace cuda {
-//! DST[x,y] = SRC[xmap[x,y],ymap[x,y]]
+//! @addtogroup cudawarping
-//! supports only CV_32FC1 map type
+//! @{
+/** @brief Applies a generic geometrical transformation to an image.
+@param src Source image.
+@param dst Destination image with the size the same as xmap and the type the same as src .
+@param xmap X values. Only CV\_32FC1 type is supported.
+@param ymap Y values. Only CV\_32FC1 type is supported.
+@param interpolation Interpolation method (see resize ). INTER\_NEAREST , INTER\_LINEAR and
+INTER\_CUBIC are supported for now.
+@param borderMode Pixel extrapolation method (see borderInterpolate ). BORDER\_REFLECT101 ,
+BORDER\_REPLICATE , BORDER\_CONSTANT , BORDER\_REFLECT and BORDER\_WRAP are supported for now.
+@param borderValue Value used in case of a constant border. By default, it is 0.
+@param stream Stream for the asynchronous version.
+The function transforms the source image using the specified map:
+\f[\texttt{dst} (x,y) =  \texttt{src} (xmap(x,y), ymap(x,y))\f]
+Values of pixels with non-integer coordinates are computed using the bilinear interpolation.
+@sa remap
+ */
 CV_EXPORTS void remap(InputArray src, OutputArray dst, InputArray xmap, InputArray ymap,
                      int interpolation, int borderMode = BORDER_CONSTANT, Scalar borderValue = Scalar(),
                      Stream& stream = Stream::Null());
-//! resizes the image
+/** @brief Resizes an image.
-//! Supports INTER_NEAREST, INTER_LINEAR, INTER_CUBIC, INTER_AREA
+@param src Source image.
+@param dst Destination image with the same type as src . The size is dsize (when it is non-zero)
+or the size is computed from src.size() , fx , and fy .
+@param dsize Destination image size. If it is zero, it is computed as:
+\f[\texttt{dsize = Size(round(fx*src.cols), round(fy*src.rows))}\f]
+Either dsize or both fx and fy must be non-zero.
+@param fx Scale factor along the horizontal axis. If it is zero, it is computed as:
+\f[\texttt{(double)dsize.width/src.cols}\f]
+@param fy Scale factor along the vertical axis. If it is zero, it is computed as:
+\f[\texttt{(double)dsize.height/src.rows}\f]
+@param interpolation Interpolation method. INTER\_NEAREST , INTER\_LINEAR and INTER\_CUBIC are
+supported for now.
+@param stream Stream for the asynchronous version.
+@sa resize
+ */
 CV_EXPORTS void resize(InputArray src, OutputArray dst, Size dsize, double fx=0, double fy=0, int interpolation = INTER_LINEAR, Stream& stream = Stream::Null());
-//! warps the image using affine transformation
+/** @brief Applies an affine transformation to an image.
-//! Supports INTER_NEAREST, INTER_LINEAR, INTER_CUBIC
+@param src Source image. CV\_8U , CV\_16U , CV\_32S , or CV\_32F depth and 1, 3, or 4 channels are
+supported.
+@param dst Destination image with the same type as src . The size is dsize .
+@param M *2x3* transformation matrix.
+@param dsize Size of the destination image.
+@param flags Combination of interpolation methods (see resize) and the optional flag
+WARP\_INVERSE\_MAP specifying that M is an inverse transformation ( dst=\>src ). Only
+INTER\_NEAREST , INTER\_LINEAR , and INTER\_CUBIC interpolation methods are supported.
+@param borderMode
+@param borderValue
+@param stream Stream for the asynchronous version.
+@sa warpAffine
+ */
 CV_EXPORTS void warpAffine(InputArray src, OutputArray dst, InputArray M, Size dsize, int flags = INTER_LINEAR,
    int borderMode = BORDER_CONSTANT, Scalar borderValue = Scalar(), Stream& stream = Stream::Null());
+/** @brief Builds transformation maps for affine transformation.
+@param M *2x3* transformation matrix.
+@param inverse Flag specifying that M is an inverse transformation ( dst=\>src ).
+@param dsize Size of the destination image.
+@param xmap X values with CV\_32FC1 type.
+@param ymap Y values with CV\_32FC1 type.
+@param stream Stream for the asynchronous version.
+@sa cuda::warpAffine , cuda::remap
+ */
 CV_EXPORTS void buildWarpAffineMaps(InputArray M, bool inverse, Size dsize, OutputArray xmap, OutputArray ymap, Stream& stream = Stream::Null());
-//! warps the image using perspective transformation
+/** @brief Applies a perspective transformation to an image.
-//! Supports INTER_NEAREST, INTER_LINEAR, INTER_CUBIC
+@param src Source image. CV\_8U , CV\_16U , CV\_32S , or CV\_32F depth and 1, 3, or 4 channels are
+supported.
+@param dst Destination image with the same type as src . The size is dsize .
+@param M *3x3* transformation matrix.
+@param dsize Size of the destination image.
+@param flags Combination of interpolation methods (see resize ) and the optional flag
+WARP\_INVERSE\_MAP specifying that M is the inverse transformation ( dst =\> src ). Only
+INTER\_NEAREST , INTER\_LINEAR , and INTER\_CUBIC interpolation methods are supported.
+@param borderMode
+@param borderValue
+@param stream Stream for the asynchronous version.
+@sa warpPerspective
+ */
 CV_EXPORTS void warpPerspective(InputArray src, OutputArray dst, InputArray M, Size dsize, int flags = INTER_LINEAR,
    int borderMode = BORDER_CONSTANT, Scalar borderValue = Scalar(), Stream& stream = Stream::Null());
+/** @brief Builds transformation maps for perspective transformation.
+@param M *3x3* transformation matrix.
+@param inverse Flag specifying that M is an inverse transformation ( dst=\>src ).
+@param dsize Size of the destination image.
+@param xmap X values with CV\_32FC1 type.
+@param ymap Y values with CV\_32FC1 type.
+@param stream Stream for the asynchronous version.
+@sa cuda::warpPerspective , cuda::remap
+ */
 CV_EXPORTS void buildWarpPerspectiveMaps(InputArray M, bool inverse, Size dsize, OutputArray xmap, OutputArray ymap, Stream& stream = Stream::Null());
-//! builds plane warping maps
+/** @brief Builds plane warping maps.
+ */
 CV_EXPORTS void buildWarpPlaneMaps(Size src_size, Rect dst_roi, InputArray K, InputArray R, InputArray T, float scale,
                                   OutputArray map_x, OutputArray map_y, Stream& stream = Stream::Null());
-//! builds cylindrical warping maps
+/** @brief Builds cylindrical warping maps.
+ */
 CV_EXPORTS void buildWarpCylindricalMaps(Size src_size, Rect dst_roi, InputArray K, InputArray R, float scale,
                                         OutputArray map_x, OutputArray map_y, Stream& stream = Stream::Null());
-//! builds spherical warping maps
+/** @brief Builds spherical warping maps.
+ */
 CV_EXPORTS void buildWarpSphericalMaps(Size src_size, Rect dst_roi, InputArray K, InputArray R, float scale,
                                       OutputArray map_x, OutputArray map_y, Stream& stream = Stream::Null());
-//! rotates an image around the origin (0,0) and then shifts it
+/** @brief Rotates an image around the origin (0,0) and then shifts it.
-//! supports INTER_NEAREST, INTER_LINEAR, INTER_CUBIC
-//! supports 1, 3 or 4 channels images with CV_8U, CV_16U or CV_32F depth
+@param src Source image. Supports 1, 3 or 4 channels images with CV\_8U , CV\_16U or CV\_32F
+depth.
+@param dst Destination image with the same type as src . The size is dsize .
+@param dsize Size of the destination image.
+@param angle Angle of rotation in degrees.
+@param xShift Shift along the horizontal axis.
+@param yShift Shift along the vertical axis.
+@param interpolation Interpolation method. Only INTER\_NEAREST , INTER\_LINEAR , and INTER\_CUBIC
+are supported.
+@param stream Stream for the asynchronous version.
+@sa cuda::warpAffine
+ */
 CV_EXPORTS void rotate(InputArray src, OutputArray dst, Size dsize, double angle, double xShift = 0, double yShift = 0,
                       int interpolation = INTER_LINEAR, Stream& stream = Stream::Null());
-//! smoothes the source image and downsamples it
+/** @brief Smoothes an image and downsamples it.
+@param src Source image.
+@param dst Destination image. Will have Size((src.cols+1)/2, (src.rows+1)/2) size and the same
+type as src .
+@param stream Stream for the asynchronous version.
+@sa pyrDown
+ */
 CV_EXPORTS void pyrDown(InputArray src, OutputArray dst, Stream& stream = Stream::Null());
-//! upsamples the source image and then smoothes it
+/** @brief Upsamples an image and then smoothes it.
+@param src Source image.
+@param dst Destination image. Will have Size(src.cols\*2, src.rows\*2) size and the same type as
+src .
+@param stream Stream for the asynchronous version.
+ */
 CV_EXPORTS void pyrUp(InputArray src, OutputArray dst, Stream& stream = Stream::Null());
 class CV_EXPORTS ImagePyramid : public Algorithm
@@ -108,6 +232,8 @@ public:
 CV_EXPORTS Ptr<ImagePyramid> createImagePyramid(InputArray img, int nLayers = -1, Stream& stream = Stream::Null());
+//! @}
 }} // namespace cv { namespace cuda {
 #endif /* __OPENCV_CUDAWARPING_HPP__ */
--- a/modules/cudev/include/opencv2/cudev.hpp
+++ b/modules/cudev/include/opencv2/cudev.hpp
@@ -109,4 +109,11 @@
 #include "cudev/expr/unary_op.hpp"
 #include "cudev/expr/warping.hpp"
+/**
+  @addtogroup cuda
+  @{
+    @defgroup cudev Device layer
+  @}
+*/
 #endif
--- a/modules/cudev/include/opencv2/cudev/block/block.hpp
+++ b/modules/cudev/include/opencv2/cudev/block/block.hpp
@@ -50,6 +50,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 struct Block
 {
    __device__ __forceinline__ static uint blockId()
@@ -122,6 +125,9 @@ __device__ __forceinline__ static void blockTransfrom(InIt1 beg1, InIt1 end1, In
    for(; t1 < end1; t1 += STRIDE, t2 += STRIDE, o += STRIDE)
        *o = op(*t1, *t2);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/block/dynamic_smem.hpp
+++ b/modules/cudev/include/opencv2/cudev/block/dynamic_smem.hpp
@@ -50,6 +50,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class T> struct DynamicSharedMem
 {
    __device__ __forceinline__ operator T*()
@@ -81,6 +84,8 @@ template <> struct DynamicSharedMem<double>
    }
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/block/reduce.hpp
+++ b/modules/cudev/include/opencv2/cudev/block/reduce.hpp
@@ -54,6 +54,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // blockReduce
 template <int N, typename T, class Op>
@@ -123,6 +126,8 @@ __device__ __forceinline__ void blockReduceKeyVal(const tuple<KP0, KP1, KP2, KP3
            >(skeys, key, svals, val, tid, cmp);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/block/scan.hpp
+++ b/modules/cudev/include/opencv2/cudev/block/scan.hpp
@@ -51,6 +51,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <int THREADS_NUM, typename T>
 __device__ T blockScanInclusive(T data, volatile T* smem, uint tid)
 {
@@ -96,6 +99,8 @@ __device__ __forceinline__ T blockScanExclusive(T data, volatile T* smem, uint t
    return blockScanInclusive<THREADS_NUM>(data, smem, tid) - data;
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/block/vec_distance.hpp
+++ b/modules/cudev/include/opencv2/cudev/block/vec_distance.hpp
@@ -53,6 +53,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // NormL1
 template <typename T> struct NormL1
@@ -179,6 +182,8 @@ struct NormHamming
    }
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/common.hpp
+++ b/modules/cudev/include/opencv2/cudev/common.hpp
@@ -52,6 +52,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 using namespace cv::cuda;
 // CV_CUDEV_ARCH
@@ -84,6 +87,8 @@ __host__ __device__ __forceinline__ int divUp(int total, int grain)
 #define CV_PI_F   ((float)CV_PI)
 #define CV_LOG2_F ((float)CV_LOG2)
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/binary_func.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/binary_func.hpp
@@ -55,6 +55,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 #define CV_CUDEV_EXPR_BINARY_FUNC(name) \
    template <class SrcPtr1, class SrcPtr2> \
    __host__ Expr<BinaryTransformPtrSz<typename PtrTraits<SrcPtr1>::ptr_type, typename PtrTraits<SrcPtr2>::ptr_type, name ## _func<typename LargerType<typename PtrTraits<SrcPtr1>::value_type, typename PtrTraits<SrcPtr2>::value_type>::type> > > \
@@ -70,6 +73,8 @@ CV_CUDEV_EXPR_BINARY_FUNC(absdiff)
 #undef CV_CUDEV_EXPR_BINARY_FUNC
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/binary_op.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/binary_op.hpp
@@ -58,6 +58,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // Binary Operations
 #define CV_CUDEV_EXPR_BINOP_INST(op, functor) \
@@ -230,6 +233,8 @@ CV_CUDEV_EXPR_BINOP_INST(>>, bit_rshift)
 #undef CV_CUDEV_EXPR_BINOP_INST
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/color.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/color.hpp
@@ -54,6 +54,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 #define CV_CUDEV_EXPR_CVTCOLOR_INST(name) \
    template <class SrcPtr> \
    __host__ Expr<UnaryTransformPtrSz<typename PtrTraits<SrcPtr>::ptr_type, name ## _func<typename VecTraits<typename PtrTraits<SrcPtr>::value_type>::elem_type> > > \
@@ -277,6 +280,8 @@ CV_CUDEV_EXPR_CVTCOLOR_INST(Luv4_to_LBGRA)
 #undef CV_CUDEV_EXPR_CVTCOLOR_INST
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/deriv.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/deriv.hpp
@@ -53,6 +53,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // derivX
 template <class SrcPtr>
@@ -116,6 +119,8 @@ laplacian_(const SrcPtr& src)
    return makeExpr(laplacianPtr<ksize>(src));
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/expr.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/expr.hpp
@@ -51,6 +51,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Body> struct Expr
 {
    Body body;
@@ -87,6 +90,8 @@ template <class Body> struct PtrTraits< Expr<Body> >
    }
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/per_element_func.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/per_element_func.hpp
@@ -56,6 +56,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // min/max
 template <class SrcPtr1, class SrcPtr2>
@@ -127,6 +130,8 @@ lut_(const SrcPtr& src, const TablePtr& tbl)
    return makeExpr(lutPtr(src, tbl));
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/reduction.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/reduction.hpp
@@ -56,6 +56,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // sum
 template <class SrcPtr> struct SumExprBody
@@ -254,6 +257,8 @@ integral_(const SrcPtr& src)
    return makeExpr(body);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/unary_func.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/unary_func.hpp
@@ -54,6 +54,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 #define CV_CUDEV_EXPR_UNARY_FUNC(name) \
    template <class SrcPtr> \
    __host__ Expr<UnaryTransformPtrSz<typename PtrTraits<SrcPtr>::ptr_type, name ## _func<typename PtrTraits<SrcPtr>::value_type> > > \
@@ -93,6 +96,8 @@ pow_(const SrcPtr& src, float power)
    return makeExpr(transformPtr(src, bind2nd(pow_func<typename PtrTraits<SrcPtr>::value_type>(), power)));
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/unary_op.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/unary_op.hpp
@@ -57,6 +57,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 #define CV_CUDEV_EXPR_UNOP_INST(op, functor) \
    template <typename T> \
    __host__ Expr<UnaryTransformPtrSz<typename PtrTraits<GpuMat_<T> >::ptr_type, functor<T> > > \
@@ -89,6 +92,8 @@ CV_CUDEV_EXPR_UNOP_INST(~, bit_not)
 #undef CV_CUDEV_EXPR_UNOP_INST
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/expr/warping.hpp
+++ b/modules/cudev/include/opencv2/cudev/expr/warping.hpp
@@ -57,6 +57,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // resize
 template <class SrcPtr>
@@ -166,6 +169,8 @@ transpose_(const SrcPtr& src)
    return makeExpr(body);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/functional/color_cvt.hpp
+++ b/modules/cudev/include/opencv2/cudev/functional/color_cvt.hpp
@@ -51,6 +51,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // Various 3/4-channel to 3/4-channel RGB transformations
 #define CV_CUDEV_RGB2RGB_INST(name, scn, dcn, bidx) \
@@ -469,6 +472,8 @@ CV_CUDEV_RGB5x52GRAY_INST(BGR565_to_GRAY, 6)
 #undef CV_CUDEV_RGB5x52GRAY_INST
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/functional/functional.hpp
+++ b/modules/cudev/include/opencv2/cudev/functional/functional.hpp
@@ -54,6 +54,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // Function Objects
 template <typename _Arg, typename _Result> struct unary_function
@@ -873,6 +876,8 @@ template <typename F> struct IsBinaryFunction
    enum { value = (sizeof(check(makeF())) == sizeof(Yes)) };
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/functional/tuple_adapter.hpp
+++ b/modules/cudev/include/opencv2/cudev/functional/tuple_adapter.hpp
@@ -51,6 +51,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Op, int n> struct UnaryTupleAdapter
 {
    typedef typename Op::result_type result_type;
@@ -93,6 +96,8 @@ __host__ __device__ BinaryTupleAdapter<Op, n0, n1> binaryTupleAdapter(const Op&
    return a;
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/copy.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/copy.hpp
@@ -57,6 +57,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Policy, class SrcPtr, typename DstType, class MaskPtr>
 __host__ void gridCopy_(const SrcPtr& src, GpuMat_<DstType>& dst, const MaskPtr& mask, Stream& stream = Stream::Null())
 {
@@ -447,6 +450,8 @@ __host__ void gridCopy_(const SrcPtrTuple& src, const tuple< GlobPtrSz<D0>, Glob
    gridCopy_<DefaultCopyPolicy>(src, dst, stream);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/histogram.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/histogram.hpp
@@ -54,6 +54,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <int BIN_COUNT, class Policy, class SrcPtr, typename ResType, class MaskPtr>
 __host__ void gridHistogram_(const SrcPtr& src, GpuMat_<ResType>& dst, const MaskPtr& mask, Stream& stream = Stream::Null())
 {
@@ -114,6 +117,8 @@ __host__ void gridHistogram(const SrcPtr& src, GpuMat_<ResType>& dst, Stream& st
    gridHistogram_<BIN_COUNT, DefaultHistogramPolicy>(src, dst, stream);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/integral.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/integral.hpp
@@ -53,6 +53,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class SrcPtr, typename DstType>
 __host__ void gridIntegral(const SrcPtr& src, GpuMat_<DstType>& dst, Stream& stream = Stream::Null())
 {
@@ -64,6 +67,8 @@ __host__ void gridIntegral(const SrcPtr& src, GpuMat_<DstType>& dst, Stream& str
    integral_detail::integral(shrinkPtr(src), shrinkPtr(dst), rows, cols, StreamAccessor::getStream(stream));
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/pyramids.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/pyramids.hpp
@@ -55,6 +55,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Brd, class SrcPtr, typename DstType>
 __host__ void gridPyrDown_(const SrcPtr& src, GpuMat_<DstType>& dst, Stream& stream = Stream::Null())
 {
@@ -83,6 +86,8 @@ __host__ void gridPyrUp(const SrcPtr& src, GpuMat_<DstType>& dst, Stream& stream
    pyramids_detail::pyrUp(shrinkPtr(src), shrinkPtr(dst), rows, cols, dst.rows, dst.cols, StreamAccessor::getStream(stream));
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/reduce.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/reduce.hpp
@@ -57,6 +57,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Policy, class SrcPtr, typename ResType, class MaskPtr>
 __host__ void gridCalcSum_(const SrcPtr& src, GpuMat_<ResType>& dst, const MaskPtr& mask, Stream& stream = Stream::Null())
 {
@@ -370,6 +373,8 @@ __host__ void gridCountNonZero(const SrcPtr& src, GpuMat_<ResType>& dst, Stream&
    gridCountNonZero_<DefaultGlobReducePolicy>(src, dst, stream);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/reduce_to_vec.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/reduce_to_vec.hpp
@@ -59,6 +59,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <typename T> struct Sum : plus<T>
 {
    typedef T work_type;
@@ -225,6 +228,8 @@ __host__ void gridReduceToColumn(const SrcPtr& src, GpuMat_<ResType>& dst, Strea
    gridReduceToColumn_<Reductor, DefaultReduceToVecPolicy>(src, dst, stream);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/split_merge.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/split_merge.hpp
@@ -57,6 +57,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Policy, class SrcPtrTuple, typename DstType, class MaskPtr>
 __host__ void gridMerge_(const SrcPtrTuple& src, GpuMat_<DstType>& dst, const MaskPtr& mask, Stream& stream = Stream::Null())
 {
@@ -579,6 +582,8 @@ __host__ void gridSplit(const SrcPtr& src, GlobPtrSz<DstType> (&dst)[COUNT], Str
    gridSplit_<DefaultSplitMergePolicy>(src, dst, stream);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/transform.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/transform.hpp
@@ -57,6 +57,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Policy, class SrcPtr, typename DstType, class UnOp, class MaskPtr>
 __host__ void gridTransformUnary_(const SrcPtr& src, GpuMat_<DstType>& dst, const UnOp& op, const MaskPtr& mask, Stream& stream = Stream::Null())
 {
@@ -536,6 +539,8 @@ __host__ void gridTransformTuple(const SrcPtr& src, const tuple< GlobPtrSz<D0>,
    gridTransformTuple_<DefaultTransformPolicy>(src, dst, op, stream);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/grid/transpose.hpp
+++ b/modules/cudev/include/opencv2/cudev/grid/transpose.hpp
@@ -54,6 +54,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class Policy, class SrcPtr, typename DstType>
 __host__ void gridTranspose_(const SrcPtr& src, GpuMat_<DstType>& dst, Stream& stream = Stream::Null())
 {
@@ -98,6 +101,8 @@ __host__ void gridTranspose(const SrcPtr& src, const GlobPtrSz<DstType>& dst, St
    gridTranspose_<DefaultTransposePolicy>(src, dst, stream);
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/ptr2d/constant.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/constant.hpp
@@ -51,6 +51,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <typename T> struct ConstantPtr
 {
    typedef T   value_type;
@@ -88,6 +91,8 @@ template <typename T> struct PtrTraits< ConstantPtrSz<T> > : PtrTraitsBase< Cons
 {
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/ptr2d/deriv.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/deriv.hpp
@@ -53,6 +53,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // derivX
 template <class SrcPtr> struct DerivXPtr
@@ -388,6 +391,8 @@ template <int ksize, class SrcPtr> struct PtrTraits< LaplacianPtrSz<ksize, SrcPt
 {
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/ptr2d/extrapolation.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/extrapolation.hpp
@@ -52,6 +52,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // BrdConstant
 template <class SrcPtr> struct BrdConstant
@@ -214,6 +217,8 @@ __host__ BrdBase<BrdWrap, typename PtrTraits<SrcPtr>::ptr_type> brdWrap(const Sr
    return b;
 }
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/ptr2d/glob.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/glob.hpp
@@ -51,6 +51,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <typename T> struct GlobPtr
 {
    typedef T   value_type;
@@ -106,6 +109,8 @@ template <typename T> struct PtrTraits< GlobPtrSz<T> > : PtrTraitsBase<GlobPtrSz
 {
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/ptr2d/gpumat.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/gpumat.hpp
@@ -53,6 +53,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <typename T>
 class GpuMat_ : public GpuMat
 {
@@ -154,6 +157,8 @@ template <typename T> struct PtrTraits< GpuMat_<T> > : PtrTraitsBase<GpuMat_<T>,
 {
 };
+//! @}
 }}
 #include "detail/gpumat.hpp"

--- a/modules/cudev/include/opencv2/cudev/ptr2d/interpolation.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/interpolation.hpp
@@ -55,6 +55,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 // Nearest
 template <class SrcPtr> struct NearestInterPtr
@@ -380,6 +383,8 @@ template <class SrcPtr> struct PtrTraits< CommonAreaInterPtrSz<SrcPtr> > : PtrTr
 {
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/ptr2d/lut.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/lut.hpp
@@ -54,6 +54,9 @@
 namespace cv { namespace cudev {
+//! @addtogroup cudev
+//! @{
 template <class SrcPtr, class TablePtr> struct LutPtr
 {
    typedef typename PtrTraits<TablePtr>::value_type value_type;
@@ -95,6 +98,8 @@ template <class SrcPtr, class TablePtr> struct PtrTraits< LutPtrSz<SrcPtr, Table
 {
 };
+//! @}
 }}
 #endif
--- a/modules/cudev/include/opencv2/cudev/ptr2d/mask.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/mask.hpp
--- a/modules/cudev/include/opencv2/cudev/ptr2d/remap.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/remap.hpp
--- a/modules/cudev/include/opencv2/cudev/ptr2d/resize.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/resize.hpp
--- a/modules/cudev/include/opencv2/cudev/ptr2d/texture.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/texture.hpp
--- a/modules/cudev/include/opencv2/cudev/ptr2d/traits.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/traits.hpp
--- a/modules/cudev/include/opencv2/cudev/ptr2d/transform.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/transform.hpp
--- a/modules/cudev/include/opencv2/cudev/ptr2d/warping.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/warping.hpp
--- a/modules/cudev/include/opencv2/cudev/ptr2d/zip.hpp
+++ b/modules/cudev/include/opencv2/cudev/ptr2d/zip.hpp
--- a/modules/cudev/include/opencv2/cudev/util/atomic.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/atomic.hpp
--- a/modules/cudev/include/opencv2/cudev/util/limits.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/limits.hpp
--- a/modules/cudev/include/opencv2/cudev/util/saturate_cast.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/saturate_cast.hpp
--- a/modules/cudev/include/opencv2/cudev/util/simd_functions.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/simd_functions.hpp
--- a/modules/cudev/include/opencv2/cudev/util/tuple.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/tuple.hpp
--- a/modules/cudev/include/opencv2/cudev/util/type_traits.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/type_traits.hpp
--- a/modules/cudev/include/opencv2/cudev/util/vec_math.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/vec_math.hpp
--- a/modules/cudev/include/opencv2/cudev/util/vec_traits.hpp
+++ b/modules/cudev/include/opencv2/cudev/util/vec_traits.hpp
--- a/modules/cudev/include/opencv2/cudev/warp/reduce.hpp
+++ b/modules/cudev/include/opencv2/cudev/warp/reduce.hpp
--- a/modules/cudev/include/opencv2/cudev/warp/scan.hpp
+++ b/modules/cudev/include/opencv2/cudev/warp/scan.hpp
--- a/modules/cudev/include/opencv2/cudev/warp/shuffle.hpp
+++ b/modules/cudev/include/opencv2/cudev/warp/shuffle.hpp
--- a/modules/cudev/include/opencv2/cudev/warp/warp.hpp
+++ b/modules/cudev/include/opencv2/cudev/warp/warp.hpp
--- a/modules/viz/include/opencv2/viz/types.hpp
+++ b/modules/viz/include/opencv2/viz/types.hpp