This lightweight class encapsulates pitched memory on a GPU and is passed to nvcc-compiled code (CUDA kernels). Typically, it is used internally by OpenCV and by users who write device code. You can call its members from both host and device code. ::
    template <typename T> struct DevMem2D_
    {
        ...
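For illustration, here is a minimal sketch of device code that consumes this structure (assuming nvcc compilation against the OpenCV 2.x ``gpu`` headers; the kernel name and indexing scheme are illustrative): ::

    // Scale every pixel of a pitched single-channel float image in place.
    __global__ void scaleKernel(cv::gpu::DevMem2D_<float> img, float factor)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < img.cols && y < img.rows)
            img.ptr(y)[x] *= factor; // ptr(y) accounts for the row pitch (step)
    }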
This is a base storage class for GPU memory with reference counting. Its interface matches the :c:type:`Mat` interface with several limitations, including:

*
    no expression templates technique support
Beware that the latter limitation may lead to overloaded matrix operators that cause memory allocations. The ``GpuMat`` class is convertible to :cpp:class:`gpu::DevMem2D_` and :cpp:class:`gpu::PtrStep_`, so it can be passed directly to a kernel.
**Note:**
In contrast with :c:type:`Mat`, in most cases ``GpuMat::isContinuous() == false``. This means that rows are aligned to a size depending on the hardware. A single-row ``GpuMat`` is always a continuous matrix. ::
    class CV_EXPORTS GpuMat
    {
        ...
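A minimal host-side usage sketch (assuming the module is built with CUDA support; the image file name is a placeholder): ::

    #include <opencv2/opencv.hpp>
    #include <opencv2/gpu/gpu.hpp>

    int main()
    {
        cv::Mat src_host = cv::imread("file.png", 0); // placeholder file name
        cv::gpu::GpuMat d_src, d_dst;
        d_src.upload(src_host);                       // host -> device copy
        cv::gpu::threshold(d_src, d_dst, 128.0, 255.0, cv::THRESH_BINARY);
        cv::Mat result_host;
        d_dst.download(result_host);                  // device -> host copy
        return 0;
    }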
**Note:**
It is not recommended to leave static or global ``GpuMat`` variables allocated, that is, to rely on their destructors. The destruction order of such variables and the CUDA context is undefined. The GPU memory release function returns an error if the CUDA context has been destroyed before.
See Also:
...

This class with reference counting wraps special memory type allocation functions from CUDA. Its interface is also :func:`Mat`-like but with additional memory type parameters:
*
    ``ALLOC_PAGE_LOCKED``: Sets a page-locked memory type, used commonly for fast and asynchronous uploading/downloading of data from/to a GPU.
*
    ``ALLOC_ZEROCOPY``: Specifies a zero-copy memory allocation that enables mapping the host memory to the GPU address space, if supported.
*
    ``ALLOC_WRITE_COMBINED``: Sets the write-combined buffer that is not cached by the CPU. Such buffers are used to supply the GPU with data when the GPU only reads it. The advantage is better CPU cache utilization.
**Note:**
Allocation size of such memory types is usually limited. For more details, see the *CUDA 2.2 Pinned Memory APIs* document or the *CUDA C Programming Guide*.
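For illustration, a minimal sketch of allocating a page-locked buffer and viewing it through a regular :c:type:`Mat` header (buffer dimensions are illustrative; no data is copied): ::

    cv::gpu::CudaMem pinned(480, 640, CV_8UC1, cv::gpu::CudaMem::ALLOC_PAGE_LOCKED);
    cv::Mat header = pinned.createMatHeader(); // Mat view of the pinned buffer
    header.setTo(cv::Scalar::all(0));          // fill it via the ordinary Mat API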
Maps CPU memory to the GPU address space and creates a :cpp:class:`gpu::GpuMat` header without reference counting for it. This can be done only if the memory was allocated with the ``ALLOC_ZEROCOPY`` flag and if it is supported by the hardware (laptops often share video and CPU memory, so address spaces can be mapped, which eliminates an extra copy).
Returns ``true`` if the current hardware supports address space mapping and ``ALLOC_ZEROCOPY`` memory allocation.
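A minimal sketch of the zero-copy path, guarded by the capability check described above: ::

    if (cv::gpu::CudaMem::canMapHostMemory())
    {
        cv::gpu::CudaMem zero_copy(480, 640, CV_8UC1, cv::gpu::CudaMem::ALLOC_ZEROCOPY);
        cv::gpu::GpuMat d_view = zero_copy.createGpuMatHeader(); // maps, no copy
        // d_view now addresses the same buffer from the GPU side
    }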
.. index:: gpu::Stream
gpu::Stream
-----------

.. cpp:class:: gpu::Stream
This class encapsulates a queue of asynchronous calls. Some functions have overloads with the additional ``gpu::Stream`` parameter. The overloads do initialization work (allocate output buffers, upload constants, and so on), start the GPU kernel, and return before results are ready. You can check whether all operations are complete via :cpp:func:`gpu::Stream::queryIfComplete`. You can asynchronously upload/download data from/to page-locked buffers using :cpp:class:`gpu::CudaMem` or a :c:type:`Mat` header that points to a region of :cpp:class:`gpu::CudaMem`.
**Note:**
Currently, you may face problems if an operation is enqueued twice with different data. Some functions use constant GPU memory, and the next call may update the memory before the previous one has finished. However, calling different operations asynchronously is safe because each operation has its own constant buffer. Memory copy/upload/download/set operations to the buffers you hold are also safe.
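For illustration, a minimal sketch of the asynchronous pattern (assuming a ``gpu`` function overload that accepts a stream; buffer sizes are illustrative): ::

    cv::gpu::CudaMem src(480, 640, CV_8UC1, cv::gpu::CudaMem::ALLOC_PAGE_LOCKED);
    cv::gpu::CudaMem dst(480, 640, CV_8UC1, cv::gpu::CudaMem::ALLOC_PAGE_LOCKED);
    cv::gpu::GpuMat d_src, d_dst;
    cv::gpu::Stream stream;

    stream.enqueueUpload(src, d_src);                 // returns immediately
    cv::gpu::threshold(d_src, d_dst, 128.0, 255.0, cv::THRESH_BINARY, stream);
    stream.enqueueDownload(d_dst, dst);

    while (!stream.queryIfComplete())
    {
        // overlap useful CPU work here; dst is valid once the queue completes
    }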
Sets a device and initializes it for the current thread. If this call is omitted, a default device is initialized at the first GPU usage.
:param device: System index of a GPU device starting with 0.
...

gpu::getDevice
------------------

.. cpp:function:: int getDevice()
Returns the current device index that was set by :cpp:func:`gpu::setDevice` or initialized by default.
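A minimal sketch (device 0 is chosen arbitrarily): ::

    if (cv::gpu::getCudaEnabledDeviceCount() > 0)
    {
        cv::gpu::setDevice(0);              // bind device 0 to this thread
        int current = cv::gpu::getDevice(); // current == 0 here
    }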
.. index:: gpu::GpuFeature
...

This class provides functionality for querying the specified GPU properties.

Checks the GPU module and device compatibility. This function returns ``true`` if the GPU module can be run on the specified device. Otherwise, it returns ``false``.
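For illustration, a minimal compatibility check (assuming the :cpp:class:`gpu::DeviceInfo` interface of the GPU module): ::

    cv::gpu::DeviceInfo info(cv::gpu::getDevice());
    if (!info.isCompatible())
    {
        // the GPU module cannot run on this device; fall back to the CPU path
    }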
.. index:: gpu::TargetArchs
gpu::TargetArchs
----------------

.. cpp:class:: gpu::TargetArchs
This class provides a set of static methods to check what NVIDIA* card architecture the GPU module was built for.
The following method checks whether the module was built with the support of the given feature:
:param feature: Feature to be checked. See :cpp:class:`gpu::GpuFeature`.
There is a set of methods to check whether the module contains intermediate (PTX) or binary GPU code for the given architecture(s):
...

:param minor: Minor compute capability version.
According to the CUDA C Programming Guide Version 3.2: "PTX code produced for some specific compute capability can always be compiled to binary code of greater or equal compute capability".
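For illustration, a minimal sketch of these queries (the feature constant is assumed to come from the :cpp:class:`gpu::GpuFeature` enumeration): ::

    // Was the module built with native double support?
    bool has_double = cv::gpu::TargetArchs::builtWith(cv::gpu::NATIVE_DOUBLE);
    // Does the module carry binary code for CC 1.3?
    bool bin_13 = cv::gpu::TargetArchs::hasBin(1, 3);
    // Is there PTX code for CC 1.1 or higher that can be JIT'ed on this device?
    bool ptx_11_up = cv::gpu::TargetArchs::hasEqualOrGreaterPtx(1, 1);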
.. index:: gpu::MultiGpuManager
gpu::MultiGpuManager
--------------------

.. cpp:class:: gpu::MultiGpuManager
This class provides functionality for working with many GPUs. ::

    ...
The OpenCV GPU module is a set of classes and functions to utilize GPU computational capabilities. It is implemented using the NVIDIA* CUDA Runtime API and supports only NVIDIA GPUs. The OpenCV GPU module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms that take advantage of the GPU, whereas the high-level functionality includes some state-of-the-art algorithms (such as stereo correspondence, face and people detectors, and others) ready to be used by application developers.
The GPU module is designed as a host-level API. This means that if you have pre-compiled OpenCV GPU binaries, you are not required to have the CUDA Toolkit installed or write any extra code to make use of the GPU.
The GPU module depends on the CUDA Toolkit and the NVIDIA Performance Primitives (NPP) library. Make sure you have the latest versions of these libraries installed. You can download both libraries for all supported platforms from the NVIDIA site. To compile the OpenCV GPU module, you need a compiler compatible with the CUDA Runtime Toolkit.
The OpenCV GPU module is designed for ease of use and does not require any knowledge of CUDA. However, such knowledge will certainly be useful for handling non-trivial cases and achieving the highest performance. It is helpful to understand the cost of various operations, what the GPU does, what the preferred data formats are, and so on. The GPU module is an effective instrument for quick implementation of GPU-accelerated computer vision algorithms. However, if your algorithm involves many simple operations, then, for the best possible performance, you may still need to write your own kernels to avoid extra write and read operations on the intermediate results.
To enable CUDA support, configure OpenCV using ``CMake`` with ``WITH_CUDA=ON``. When the flag is set and if CUDA is installed, the full-featured OpenCV GPU module is built. Otherwise, the module is still built, but at runtime all functions from the module throw :func:`Exception` with the ``CV_GpuNotSupported`` error code, except for :func:`gpu::getCudaEnabledDeviceCount()`. The latter function returns a zero GPU count in this case. Building OpenCV without CUDA support does not perform device code compilation, so it does not require the CUDA Toolkit installed. Therefore, using the :func:`gpu::getCudaEnabledDeviceCount()` function, you can implement a high-level algorithm that detects GPU presence at runtime and chooses an appropriate implementation (CPU or GPU) accordingly.
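A minimal dispatch sketch (``processOnGpu`` and ``processOnCpu`` are hypothetical helpers standing in for your pipeline): ::

    if (cv::gpu::getCudaEnabledDeviceCount() > 0)
        processOnGpu(frame); // hypothetical GPU implementation
    else
        processOnCpu(frame); // hypothetical CPU fallback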
Compilation for Different NVIDIA* Platforms
-------------------------------------------
The NVIDIA* compiler enables generating binary code (cubin and fatbin) and intermediate code (PTX). Binary code often implies a specific GPU architecture and generation, so the compatibility with other GPUs is not guaranteed. PTX is targeted for a virtual platform that is defined entirely by the set of capabilities or features. Depending on the selected virtual platform, some of the instructions are emulated or disabled, even if the real hardware supports all the features.
At the first call, the PTX code is compiled to binary code for the particular GPU using a JIT compiler. When the target GPU has a compute capability (CC) lower than the PTX code, JIT fails.
By default, the OpenCV GPU module includes:
*
    Binaries for compute capabilities 1.3 and 2.0 (controlled by ``CUDA_ARCH_BIN`` in ``CMake``)
*
    PTX code for compute capabilities 1.1 and 1.3 (controlled by ``CUDA_ARCH_PTX`` in ``CMake``)
This means that for devices with CC 1.3 and 2.0, binary images are ready to run. For all newer platforms, the PTX code for 1.3 is JIT'ed to a binary image. For devices with CC 1.1 and 1.2, the PTX for 1.1 is JIT'ed. For devices with CC 1.0, no code is available and the functions throw :func:`Exception`. For platforms where JIT compilation is performed first, the run is slow.
On a GPU with CC 1.0, you can still compile the GPU module, and most of the functions will run flawlessly. To achieve this, add "1.0" to the list of binaries, for example, ``CUDA_ARCH_BIN="1.0 1.3 2.0"``. The functions that cannot be run on CC 1.0 GPUs throw an exception.
...

You can always determine at runtime whether the OpenCV GPU-built binaries (or PTX code) are compatible with your GPU.

Threading and Multi-threading
------------------------------
The OpenCV GPU module follows the CUDA Runtime API conventions regarding multi-threaded programming. This means that for the first API call, a CUDA context is created implicitly, attached to the current CPU thread, and then used as the thread's "current" context. All further operations, such as memory allocation or GPU code compilation, are associated with the context and the thread. Because any other thread is not attached to the context, memory (and other resources) allocated in the first thread cannot be accessed by another thread. Instead, for this other thread, CUDA creates another context associated with it. In short, by default, different threads do not share resources.
But you can remove this limitation by using the CUDA Driver API (version 3.1 or later). You can retrieve the context reference for one thread, attach it to another thread, and make it "current" for that thread. As a result, the threads can share memory and other resources. It is also possible to create a context explicitly before calling any GPU code and attach it to all the threads you want to share the resources with.
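For illustration, a minimal sketch of handing a context from one thread to another with the Driver API (CUDA 3.1 or later; error handling omitted): ::

    CUcontext ctx;
    cuCtxPopCurrent(&ctx); // thread A: detach the context from this thread
    // ... pass ctx to thread B by any thread-safe means ...
    cuCtxPushCurrent(ctx); // thread B: attach the context and make it current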
...

Utilizing Multiple GPUs
-----------------------
In the current version, each of the OpenCV GPU algorithms can use only a single GPU. So, to utilize multiple GPUs, you have to manually distribute the work between GPUs. Here are the two ways of utilizing multiple GPUs:
*
    If you use only synchronous functions, create several CPU threads (one per GPU) and from within each thread create a CUDA context for the corresponding GPU using :func:`gpu::setDevice()` or the Driver API. Each of the threads will use the associated GPU, as in the sketch below.
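For the first approach, a minimal per-thread sketch (``gpuWorker`` is a hypothetical function to be launched once per GPU, each call from its own CPU thread): ::

    void gpuWorker(int device_id)
    {
        cv::gpu::setDevice(device_id); // creates and binds this thread's context
        // ... run this thread's share of the GPU pipeline on device_id ...
    }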
This class provides the histogram of oriented gradients (HOG) descriptor and detector [Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection. 2005.]. ::
    struct CV_EXPORTS HOGDescriptor
    {
        ...
    };
Interfaces of all methods are kept similar to their CPU HOG descriptor and detector analogues as much as possible.
Performs object detection without a multi-scale window.
:param img: Source image. ``CV_8UC1`` and ``CV_8UC4`` types are supported for now.
:param found_locations: Left-top corner points of detected object boundaries.
:param hit_threshold: Threshold for the distance between features and the SVM classifying plane. Usually it is 0 and should be specified in the detector coefficients (as the last free coefficient). But if the free coefficient is omitted (which is allowed), you can specify it manually here.
:param win_stride: Window stride. It must be a multiple of block stride.
:param padding: Mock parameter to keep the CPU interface compatibility. It must be (0,0).
:param win_stride: Window stride. It must be a multiple of block stride.
:param padding: Mock parameter to keep the CPU interface compatibility. It must be (0,0).
:param scale0: Coefficient of the detection window increase.
:param group_threshold: Coefficient to regulate the similarity threshold. When detected, some objects can be covered by many rectangles. 0 means not to perform grouping. See :func:`groupRectangles`.
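A minimal detection sketch using the people detector shipped with the module (``img`` is assumed to be a ``CV_8UC1`` host image): ::

    cv::gpu::HOGDescriptor hog;
    hog.setSVMDetector(cv::gpu::HOGDescriptor::getDefaultPeopleDetector());

    cv::gpu::GpuMat d_img(img);         // upload the CV_8UC1 image
    std::vector<cv::Rect> found;
    hog.detectMultiScale(d_img, found); // multi-scale people detection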
.. index:: gpu::HOGDescriptor::getDescriptors
...

gpu::CascadeClassifier_GPU
--------------------------

.. cpp:class:: gpu::CascadeClassifier_GPU
This cascade classifier class is used for object detection.
:param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the ``haartraining`` application) and NVIDIA's ``nvbin`` are supported.
Loads the classifier from a file. The previous content is destroyed.
:param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the ``haartraining`` application) and NVIDIA's ``nvbin`` are supported.
:param image: Matrix of type ``CV_8U`` containing an image where objects should be detected.
:param objects: Buffer to store detected objects (rectangles). If it is empty, it is allocated with the default size. If not empty, the function searches not more than N objects, where ``N = sizeof(objectsBuffer's data) / sizeof(cv::Rect)``.
:param scaleFactor: Value to specify how much the image size is reduced at each image scale.
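A minimal detection sketch (the cascade file name is a placeholder; ``img`` is assumed to be a ``CV_8U`` host image): ::

    cv::gpu::CascadeClassifier_GPU cascade("haarcascade_frontalface_alt.xml");

    cv::gpu::GpuMat d_img(img), d_objects;
    int n = cascade.detectMultiScale(d_img, d_objects, 1.2, 4);

    cv::Mat objects_host;
    d_objects.colRange(0, n).download(objects_host); // first n rectangles
    const cv::Rect* faces = objects_host.ptr<cv::Rect>();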