@@ -3,15 +3,14 @@ Feature Detection and Description
.. highlight:: cpp
.. index:: gpu::SURF_GPU
gpu::SURF_GPU
-------------
.. cpp:class:: gpu::SURF_GPU
Class used for extracting Speeded Up Robust Features (SURF) from an image.
::
class SURF_GPU : public CvSURFParams
{
...
...
@@ -20,8 +19,7 @@ Class for extracting Speeded Up Robust Features from an image. ::
SURF_GPU();
//! the full constructor taking all the necessary parameters
explicit SURF_GPU(double _hessianThreshold, int _nOctaves=4,
int _nOctaveLayers=2, bool _extended=false, float _keypointsRatio=0.01f,
bool _upright = false);
//! returns the descriptor size in float's (64 or 128)
int descriptorSize() const;
...
...
@@ -61,8 +59,6 @@ Class for extracting Speeded Up Robust Features from an image. ::
//! max keypoints = keypointsRatio * img.size().area()
float keypointsRatio;
bool upright;
GpuMat sum, mask1, maskSum, intBuffer;
...
...
@@ -73,15 +69,15 @@ Class for extracting Speeded Up Robust Features from an image. ::
GpuMat keypointsBuffer;
};
The class ``SURF_GPU`` implements the Speeded Up Robust Features descriptor. It includes a fast multi-scale Hessian keypoint detector that can be used to find the keypoints (which is the default option), but the descriptors can also be computed for user-specified keypoints. Only 8-bit grayscale images are supported.

The class ``SURF_GPU`` can store results in the GPU and CPU memory. It provides functions to convert results between the CPU and GPU versions ( ``uploadKeypoints``, ``downloadKeypoints``, ``downloadDescriptors`` ). The format of CPU results is the same as that of ``SURF`` results. GPU results are stored in ``GpuMat``. The ``keypoints`` matrix is a one-row matrix of the ``CV_32FC6`` type. It contains 6 float values per feature: ``x, y, laplacian, size, dir, hessian``. The ``descriptors`` matrix is an :math:`\texttt{nFeatures} \times \texttt{descriptorSize}` matrix with the ``CV_32FC1`` type.

The class ``SURF_GPU`` uses some buffers and provides access to them. All buffers can be safely released between function calls.

See Also: :c:type:`SURF`.
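Below is a minimal usage sketch based on the interface above; it assumes the GPU module headers are included and that ``img_cpu`` is an 8-bit grayscale ``cv::Mat``: ::

    cv::gpu::GpuMat img_gpu(img_cpu);                // upload the CV_8UC1 image
    cv::gpu::GpuMat keypoints_gpu, descriptors_gpu;  // results stay in GPU memory

    cv::gpu::SURF_GPU surf;
    // detect keypoints and compute descriptors; an empty mask means "use the whole image"
    surf(img_gpu, cv::gpu::GpuMat(), keypoints_gpu, descriptors_gpu);

    // convert the CV_32FC6 keypoints matrix to the usual CPU representation
    std::vector<cv::KeyPoint> keypoints;
    surf.downloadKeypoints(keypoints_gpu, keypoints);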
.. index:: gpu::BruteForceMatcher_GPU
...
...
@@ -104,7 +100,7 @@ Brute-force descriptor matcher. For each descriptor in the first set, this match
// Clear train descriptors collection.
void clear();
// Return true if there are no train descriptors in collection.
bool empty() const;
// Return true if the matcher supports mask in match methods.
...
...
@@ -173,27 +169,23 @@ Brute-force descriptor matcher. For each descriptor in the first set, this match
std::vector<GpuMat> trainDescCollection;
};
The class ``BruteForceMatcher_GPU`` has an interface similar to the class :c:type:`DescriptorMatcher`. It has two groups of ``match`` methods: for matching descriptors of one image with another image or with an image set. Also, all functions have an alternative to save results either to the GPU memory or to the CPU memory. The ``Distance`` template parameter is kept for CPU/GPU interface similarity. ``BruteForceMatcher_GPU`` supports only the ``L1<float>`` and ``L2<float>`` distance types.
See also: :c:type:`DescriptorMatcher`, :c:type:`BruteForceMatcher`.
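A minimal sketch of the CPU-result path, assuming ``queryDescs`` and ``trainDescs`` are ``CV_32FC1`` descriptor matrices already uploaded to the GPU (for example, computed by ``SURF_GPU``): ::

    cv::gpu::BruteForceMatcher_GPU< cv::L2<float> > matcher;

    std::vector<cv::DMatch> matches;
    matcher.match(queryDescs, trainDescs, matches);  // results are downloaded to CPU memory

    // matches[i].queryIdx / matches[i].trainIdx index rows of the descriptor matrices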
Finds the best match for each query descriptor. Results are stored in the GPU memory.

:param queryDescs: Query set of descriptors.

:param trainDescs: Training set of descriptors. It is not added to the train descriptors collection stored in the class object.

:param trainIdx: Output single-row ``CV_32SC1`` matrix that contains the best train index for each query. If some query descriptors are masked out in ``mask``, it contains -1.

:param distance: Output single-row ``CV_32FC1`` matrix that contains the best distance for each query. If some query descriptors are masked out in ``mask``, it contains ``FLT_MAX``.

:param mask: Mask specifying permissible matches between the input query and train matrices of descriptors.
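A sketch of the GPU-result path described above; the raw ``trainIdx`` and ``distance`` matrices can later be converted to ``DMatch`` objects with ``matchDownload`` (see below): ::

    cv::gpu::BruteForceMatcher_GPU< cv::L2<float> > matcher;

    cv::gpu::GpuMat trainIdx, distance;              // single-row CV_32SC1 / CV_32FC1 results
    matcher.matchSingle(queryDescs, trainDescs, trainIdx, distance);

    std::vector<cv::DMatch> matches;
    cv::gpu::BruteForceMatcher_GPU< cv::L2<float> >::matchDownload(trainIdx, distance, matches);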
Finds the best match for each query descriptor from the train collection. Results are stored in the GPU memory.

:param queryDescs: Query set of descriptors.

:param trainCollection: :cpp:class:`gpu::GpuMat` containing the train collection. It can be obtained from the train descriptors collection that was set using the ``add`` method, by means of :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection`, or it may contain a user-defined collection. It must be a one-row matrix where each element is a :cpp:class:`gpu::DevMem2D_` pointing to one matrix of train descriptors.

:param trainIdx: Output single-row ``CV_32SC1`` matrix that contains the best train index for each query. If some query descriptors are masked out in ``maskCollection``, it contains -1.

:param imgIdx: Output single-row ``CV_32SC1`` matrix that contains the image train index for each query. If some query descriptors are masked out in ``maskCollection``, it contains -1.

:param distance: Output single-row ``CV_32FC1`` matrix that contains the best distance for each query. If some query descriptors are masked out in ``maskCollection``, it contains ``FLT_MAX``.

:param maskCollection: :cpp:class:`gpu::GpuMat` containing a set of masks. It can be obtained from ``std::vector<GpuMat>`` by :cpp:func:`gpu::BruteForceMatcher_GPU::makeGpuCollection`, or it may contain a user-defined mask set. It must be an empty matrix or a one-row matrix where each element is a :cpp:class:`gpu::PtrStep_` that points to one mask.
Downloads the ``trainIdx``, ``imgIdx``, and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::matchSingle` or :cpp:func:`gpu::BruteForceMatcher_GPU::matchCollection` to a CPU vector of :c:type:`DMatch`.
Finds the k best matches for each descriptor from a query set with train descriptors. The found k (or less if not possible) matches are returned in the increasing order by distance.

See Also: :c:func:`DescriptorMatcher::knnMatch`.

.. index:: gpu::BruteForceMatcher_GPU::knnMatch

Finds the k best matches for each descriptor from a query set with train descriptors. The found k (or less if not possible) matches are returned in the increasing order by distance. Results are stored in the GPU memory.

:param queryDescs: Query set of descriptors.

:param trainDescs: Training set of descriptors. It is not added to the train descriptors collection stored in the class object.

:param trainIdx: Output matrix of ``queryDescs.rows x k`` size and ``CV_32SC1`` type. ``trainIdx.at<int>(i, j)`` contains the index of the j-th best match for the i-th query descriptor. If some query descriptors are masked out in ``mask``, it contains -1.

:param distance: Output matrix of ``queryDescs.rows x k`` size and ``CV_32FC1`` type. ``distance.at<float>(i, j)`` contains the distance from the j-th best match for the i-th query descriptor to the query descriptor. If some query descriptors are masked out in ``mask``, it contains ``FLT_MAX``.

:param allDist: Floating-point matrix of the ``queryDescs.rows x trainDescs.rows`` size. This is a buffer to store all distances between each query descriptor and each train descriptor. On output, ``allDist.at<float>(queryIdx, trainIdx)`` contains ``FLT_MAX`` if ``trainIdx`` is one of the k best; otherwise, it contains the distance between the ``queryIdx`` and ``trainIdx`` descriptors.

:param k: Number of the best matches found per each query descriptor (or less if it is not possible).

:param mask: Mask specifying permissible matches between the input query and train matrices of descriptors.
Downloads the ``trainIdx`` and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::knnMatch` to a CPU vector of :c:type:`DMatch`. If ``compactResult`` is true, the ``matches`` vector does not contain matches for fully masked-out query descriptors.
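A sketch of the simplest ``knnMatch`` overload, which hides the ``trainIdx``/``distance``/``allDist`` buffers and returns CPU results directly (the ratio-test threshold is only an illustration): ::

    cv::gpu::BruteForceMatcher_GPU< cv::L2<float> > matcher;

    // keep the 2 best candidates for each query descriptor
    std::vector< std::vector<cv::DMatch> > knnMatches;
    matcher.knnMatch(queryDescs, trainDescs, knnMatches, 2);

    std::vector<cv::DMatch> good;
    for (size_t i = 0; i < knnMatches.size(); ++i)
        if (knnMatches[i].size() == 2 &&
            knnMatches[i][0].distance < 0.8f * knnMatches[i][1].distance)
            good.push_back(knnMatches[i][0]);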
For each query descriptor, finds the best matches with a distance less than a given threshold. The function returns the detected matches in the increasing order by distance.

This function works only on devices with a compute capability :math:`>=` 1.1.

See Also: :c:func:`DescriptorMatcher::radiusMatch`.

For each query descriptor, finds the best matches with a distance less than a given threshold (``maxDistance``). The results are stored in the GPU memory.

:param queryDescs: Query set of descriptors.

:param trainDescs: Training set of descriptors. It is not added to the train descriptors collection stored in the class object.

:param trainIdx: ``trainIdx.at<int>(i, j)`` is the index of the j-th training descriptor that is close enough to the i-th query descriptor. If ``trainIdx`` is empty, it is created with the ``queryDescs.rows x trainDescs.rows`` size. When the matrix is pre-allocated, it can have less than ``trainDescs.rows`` columns; then the function returns as many matches for each query descriptor as fit into the matrix.

:param nMatches: ``nMatches.at<unsigned int>(0, i)`` contains the number of matching descriptors for the i-th query descriptor. The value can be larger than ``trainIdx.cols``, which means that the function could not store all the matches since it did not have enough memory.

:param distance: ``distance.at<float>(i, j)`` is the distance between the j-th match for the i-th query descriptor and this very query descriptor. The matrix has the ``CV_32FC1`` type and the same size as ``trainIdx``.

:param maxDistance: Distance threshold.

:param mask: Mask specifying permissible matches between the input query and train matrices of descriptors.

In contrast to :cpp:func:`gpu::BruteForceMatcher_GPU::knnMatch`, here the results are not sorted by the distance. This function works only on devices with a compute capability :math:`>=` 1.1.
Downloads the ``trainIdx``, ``nMatches``, and ``distance`` matrices obtained via :cpp:func:`gpu::BruteForceMatcher_GPU::radiusMatch` to a CPU vector of :c:type:`DMatch`. If ``compactResult`` is true, the ``matches`` vector does not contain matches for fully masked-out query descriptors.
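A sketch of the CPU-result ``radiusMatch`` overload; remember the compute capability :math:`>=` 1.1 requirement noted above and that the returned matches are not sorted by distance: ::

    cv::gpu::BruteForceMatcher_GPU< cv::L2<float> > matcher;

    const float maxDistance = 0.25f;                 // task-specific threshold
    std::vector< std::vector<cv::DMatch> > matches;  // one (unsorted) list per query descriptor
    matcher.radiusMatch(queryDescs, trainDescs, matches, maxDistance);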
Functions and classes described in this section are used to perform various linear or non-linear filtering operations on 2D images.
See also: :ref:`ImageFiltering`.
.. index:: gpu::BaseRowFilter_GPU
gpu::BaseRowFilter_GPU
...
...
@@ -28,9 +24,8 @@ The base class for linear or non-linear filters that processes rows of 2D arrays
int ksize, anchor;
};
**Note:** This class does not allocate memory for a destination image. Usually this class is used inside :cpp:class:`gpu::FilterEngine_GPU`.
.. index:: gpu::BaseColumnFilter_GPU
...
...
@@ -49,9 +44,9 @@ The base class for linear or non-linear filters that processes columns of 2D arr
int ksize, anchor;
};
**Note:** This class does not allocate memory for a destination image. Usually this class is used inside :cpp:class:`gpu::FilterEngine_GPU`.
.. index:: gpu::BaseFilter_GPU
...
...
@@ -72,9 +67,8 @@ The base class for non-separable 2D filters. ::
};
**Note:** This class does not allocate memory for a destination image. Usually this class is used inside :cpp:class:`gpu::FilterEngine_GPU`.
.. index:: gpu::FilterEngine_GPU
...
...
@@ -93,7 +87,9 @@ The base class for Filter Engine. ::
Rect roi = Rect(0,0,-1,-1)) = 0;
};
The class can be used to apply an arbitrary filtering operation to an image. It contains all the necessary intermediate buffers. Pointers to the initialized ``FilterEngine_GPU`` instances are returned by various ``create*Filter_GPU`` functions (see below), and they are used inside high-level functions such as :cpp:func:`gpu::filter2D`, :cpp:func:`gpu::erode`, :cpp:func:`gpu::Sobel`, and others.
By using ``FilterEngine_GPU`` instead of functions you can avoid unnecessary memory allocation for intermediate buffers and get much better performance: ::
...
...
@@ -117,31 +113,27 @@ By using ``FilterEngine_GPU`` instead of functions you can avoid unnecessary mem
// Release buffers only once
filter.release();
``FilterEngine_GPU`` can process a rectangular sub-region of an image. By default, if ``roi == Rect(0,0,-1,-1)``, ``FilterEngine_GPU`` processes the inner region of an image ( ``Rect(anchor.x, anchor.y, src_size.width - ksize.width, src_size.height - ksize.height)`` ) because, for better performance, some filters do not check whether indices are outside the image. See below to understand which filters support processing the whole image, which do not, and what the image type limitations are.

**Note:** The GPU filters do not support the in-place mode.
.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getBoxFilter_GPU(int srcType, int dstType, const Size& ksize, Point anchor = Point(-1, -1))
Creates a normalized 2D box filter.
:param srcType: Input image type. Supports ``CV_8UC1`` and ``CV_8UC4``.
...
...
@@ -221,13 +207,11 @@ gpu::createBoxFilter_GPU
:param ksize: Kernel size.
:param anchor: Anchor point. The default value ``Point(-1, -1)`` means that the anchor is at the kernel center.
**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
See Also: :c:func:`boxFilter`.
.. index:: gpu::boxFilter
...
...
@@ -237,43 +221,39 @@ gpu::boxFilter
Smooths the image using the normalized box filter.
:param src: Input image. ``CV_8UC1`` and ``CV_8UC4`` source types are supported.

:param dst: Output image. The size and type are the same as for ``src``.

:param ddepth: Output image depth. If -1, the output image has the same depth as the input one. The only values allowed here are ``CV_8U`` and -1.

:param ksize: Kernel size.

:param anchor: Anchor point. The default value ``Point(-1, -1)`` means that the anchor is at the kernel center.

**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

See Also: :c:func:`boxFilter`, :cpp:func:`gpu::createBoxFilter_GPU`.
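A minimal call sketch, assuming ``img_cpu`` is an 8-bit image on the CPU side: ::

    cv::gpu::GpuMat src(img_cpu);                    // CV_8UC1 or CV_8UC4
    cv::gpu::GpuMat dst;

    // -1 keeps the source depth (CV_8U); 5x5 kernel, anchor at the kernel center
    cv::gpu::boxFilter(src, dst, -1, cv::Size(5, 5));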
.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getLinearFilter_GPU(int srcType, int dstType, const Mat& kernel, const Size& ksize, Point anchor = Point(-1, -1))
Creates the non-separable linear filter.
:param srcType: Input image type. ``CV_8UC1`` and ``CV_8UC4`` types are supported.

:param dstType: Output image type. Only the same type as the source is supported.

:param kernel: 2D array of filter coefficients. This filter works with integer kernels; if ``kernel`` has a ``float`` or ``double`` type, it is converted to a fixed-point representation before the actual processing.

:param ksize: Kernel size.

:param anchor: Anchor point. The default value ``Point(-1, -1)`` means that the anchor is at the kernel center.

**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

See Also: :c:func:`createLinearFilter`.
.. index:: gpu::filter2D
...
...
@@ -417,23 +388,21 @@ gpu::filter2D
-----------------
.. cpp:function:: void gpu::filter2D(const GpuMat& src, GpuMat& dst, int ddepth, const Mat& kernel, Point anchor=Point(-1,-1))
Applies the non-separable 2D linear filter to an image.

:param src: Source image. ``CV_8UC1`` and ``CV_8UC4`` source types are supported.

:param dst: Destination image. The size and the number of channels are the same as for ``src``.

:param ddepth: Desired depth of the destination image. If it is negative, it is the same as ``src.depth()``. Only the same depth as the source image depth is supported.

:param kernel: 2D array of filter coefficients. This filter works with integer kernels; if ``kernel`` has a ``float`` or ``double`` type, fixed-point arithmetic is used.

:param anchor: Anchor of the kernel that indicates the relative position of a filtered point within the kernel. The anchor should lie within the kernel. The special default value ``(-1,-1)`` means that the anchor is at the kernel center.

**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

See Also: :c:func:`filter2D`, :cpp:func:`gpu::createLinearFilter_GPU`.
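A short sketch of ``gpu::filter2D`` with a simple sharpening kernel; the kernel values and ``img_cpu`` are only an illustration: ::

    // 3x3 sharpening kernel; float coefficients are processed with fixed-point arithmetic
    cv::Mat kernel = (cv::Mat_<float>(3, 3) <<  0, -1,  0,
                                               -1,  5, -1,
                                                0, -1,  0);

    cv::gpu::GpuMat src(img_cpu);                    // CV_8UC1 or CV_8UC4
    cv::gpu::GpuMat dst;
    cv::gpu::filter2D(src, dst, -1, kernel);         // -1: same depth as the source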
.. index:: gpu::Laplacian
...
...
@@ -441,23 +410,22 @@ gpu::Laplacian
------------------
.. cpp:function:: void gpu::Laplacian(const GpuMat& src, GpuMat& dst, int ddepth, int ksize = 1, double scale = 1)
Applies the Laplacian operator to an image.

:param src: Source image. ``CV_8UC1`` and ``CV_8UC4`` source types are supported.

:param dst: Destination image. The size and number of channels are the same as for ``src``.

:param ddepth: Desired depth of the destination image. Only the same depth as the source image depth is supported.

:param ksize: Aperture size used to compute the second-derivative filters (see :c:func:`getDerivKernels`). It must be positive and odd. Only ``ksize`` = 1 and ``ksize`` = 3 are supported.

:param scale: Optional scale factor for the computed Laplacian values. By default, no scaling is applied (see :c:func:`getDerivKernels`).

**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.

See Also: :c:func:`Laplacian`, :cpp:func:`gpu::filter2D`.
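A call sketch with the 3x3 aperture, assuming an image already available as ``img_cpu``: ::

    cv::gpu::GpuMat src(img_cpu);                    // CV_8UC1 or CV_8UC4
    cv::gpu::GpuMat dst;
    cv::gpu::Laplacian(src, dst, src.depth(), 3);    // same depth as the source, ksize = 3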
.. index:: gpu::getLinearRowFilter_GPU
...
...
@@ -465,23 +433,23 @@ gpu::getLinearRowFilter_GPU
-------------------------------
.. cpp:function:: Ptr<BaseRowFilter_GPU> gpu::getLinearRowFilter_GPU(int srcType, int bufType, const Mat& rowKernel, int anchor = -1, int borderType = BORDER_CONSTANT)
Creates a primitive row filter with the specified kernel.
:param srcType: Source array type. Only ``CV_8UC1``, ``CV_8UC4``, ``CV_16SC1``, ``CV_16SC2``, ``CV_32SC1``, ``CV_32FC1`` source types are supported.
:param bufType: Intermediate buffer type. It must have as many channels as ``srcType``.
:param rowKernel: Filter coefficients.
:param anchor: Anchor position within the kernel. Negative values mean that the anchor is positioned at the aperture center.

:param borderType: Pixel extrapolation method. For details, see :c:func:`borderInterpolate`. For details on limitations, see below.

There are two versions of the algorithm: NPP and OpenCV.

* The NPP version is called when ``srcType == CV_8UC1`` or ``srcType == CV_8UC4`` and ``bufType == srcType``. Otherwise, the OpenCV version is called. NPP supports only the ``BORDER_CONSTANT`` border type and does not check indices outside the image.

* The OpenCV version supports only the ``CV_32F`` buffer depth and the ``BORDER_REFLECT101``, ``BORDER_REPLICATE``, and ``BORDER_CONSTANT`` border types. It checks indices outside the image.

See Also: :cpp:func:`gpu::getLinearColumnFilter_GPU`, :c:func:`createSeparableLinearFilter`.
:param anchor: Anchor position within the kernel. Negative values mean that the anchor is positioned at the aperture center.

:param borderType: Pixel extrapolation method. For details, see :c:func:`borderInterpolate`. For details on limitations, see below.

There are two versions of the algorithm: NPP and OpenCV.

* The NPP version is called when ``dstType == CV_8UC1`` or ``dstType == CV_8UC4`` and ``bufType == dstType``. Otherwise, the OpenCV version is called. NPP supports only the ``BORDER_CONSTANT`` border type and does not check indices outside the image.

* The OpenCV version supports only the ``CV_32F`` buffer depth and the ``BORDER_REFLECT101``, ``BORDER_REPLICATE``, and ``BORDER_CONSTANT`` border types. It checks indices outside the image.

See Also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :c:func:`createSeparableLinearFilter`.
.. index:: gpu::createSeparableLinearFilter_GPU
gpu::createSeparableLinearFilter_GPU
----------------------------------------
.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createSeparableLinearFilter_GPU(int srcType, int dstType, const Mat& rowKernel, const Mat& columnKernel, const Point& anchor = Point(-1,-1), int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
:param anchor: Anchor position within the kernel. Negative values mean that the anchor is positioned at the aperture center.

:param rowBorderType, columnBorderType: Pixel extrapolation method in the horizontal and vertical directions. For details, see :c:func:`borderInterpolate`. For details on limitations, see :cpp:func:`gpu::getLinearRowFilter_GPU` and :cpp:func:`gpu::getLinearColumnFilter_GPU`.

See Also: :cpp:func:`gpu::getLinearRowFilter_GPU`, :cpp:func:`gpu::getLinearColumnFilter_GPU`, :c:func:`createSeparableLinearFilter`.
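A sketch of building a reusable separable filter engine from one-dimensional kernels; the Gaussian kernel, the ``CV_8UC4`` source type, and ``img_cpu`` are assumptions for illustration: ::

    cv::Mat rowKernel = cv::getGaussianKernel(7, -1, CV_32F);   // 7x1 CV_32F coefficients
    cv::Mat columnKernel = rowKernel;

    cv::Ptr<cv::gpu::FilterEngine_GPU> filter =
        cv::gpu::createSeparableLinearFilter_GPU(CV_8UC4, CV_8UC4, rowKernel, columnKernel);

    cv::gpu::GpuMat src(img_cpu);                    // CV_8UC4
    cv::gpu::GpuMat dst;
    filter->apply(src, dst, cv::Rect(0, 0, src.cols, src.rows));

    filter.release();                                // release the intermediate buffers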
.. index:: gpu::sepFilter2D
...
...
@@ -535,180 +502,166 @@ gpu::sepFilter2D
--------------------
.. cpp:function:: void gpu::sepFilter2D(const GpuMat& src, GpuMat& dst, int ddepth, const Mat& kernelX, const Mat& kernelY, Point anchor = Point(-1,-1), int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
:param ddepth: Destination image depth. ``CV_8U``, ``CV_16S``, ``CV_32S``, and ``CV_32F`` are supported.
:param kernelX, kernelY: Filter coefficients.
:param anchor: Anchor position within the kernel. The default value ``(-1, -1)`` means that the anchor is at the kernel center.

:param rowBorderType, columnBorderType: Pixel extrapolation method. For details, see :c:func:`borderInterpolate`.
See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`sepFilter2D`.
.. index:: gpu::createDerivFilter_GPU
gpu::createDerivFilter_GPU
------------------------------
.. cpp:function:: Ptr<FilterEngine_GPU> gpu::createDerivFilter_GPU(int srcType, int dstType, int dx, int dy, int ksize, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
Creates a filter engine for the generalized Sobel operator.

:param dstType: Destination image type with as many channels as ``srcType``. ``CV_8U``, ``CV_16S``, ``CV_32S``, and ``CV_32F`` depths are supported.

:param dx: Derivative order with respect to x.

:param dy: Derivative order with respect to y.

:param ksize: Aperture size. See :c:func:`getDerivKernels` for details.

:param rowBorderType, columnBorderType: Pixel extrapolation method. See :c:func:`borderInterpolate` for details.
See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`createDerivFilter`.
.. index:: gpu::Sobel
gpu::Sobel
--------------
.. cpp:function:: void gpu::Sobel(const GpuMat& src, GpuMat& dst, int ddepth, int dx, int dy, int ksize = 3, double scale = 1, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
Applies the generalized Sobel operator to an image.

:param ddepth: Destination image depth. ``CV_8U``, ``CV_16S``, ``CV_32S``, and ``CV_32F`` are supported.

:param dx: Derivative order with respect to x.

:param dy: Derivative order with respect to y.

:param ksize: Size of the extended Sobel kernel. Possible values are 1, 3, 5, or 7.

:param scale: Optional scale factor for the computed derivative values. By default, no scaling is applied. For details, see :c:func:`getDerivKernels`.

:param rowBorderType, columnBorderType: Pixel extrapolation method. See :c:func:`borderInterpolate` for details.
See also: :cpp:func:`gpu::createSeparableLinearFilter_GPU`, :c:func:`Sobel`.
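A call sketch computing the first x-derivative into a 16-bit destination to avoid overflow (``img_cpu`` is an assumed 8-bit input): ::

    cv::gpu::GpuMat src(img_cpu);                    // CV_8UC1
    cv::gpu::GpuMat dx;
    cv::gpu::Sobel(src, dx, CV_16S, 1, 0, 3);        // dx order 1, dy order 0, 3x3 kernel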
.. index:: gpu::Scharr
gpu::Scharr
---------------
.. cpp:function:: void gpu::Scharr(const GpuMat& src, GpuMat& dst, int ddepth, int dx, int dy, double scale = 1, int rowBorderType = BORDER_DEFAULT, int columnBorderType = -1)
Calculates the first x- or y- image derivative using the Scharr operator.

:param dst: Destination image. The size and type are the same as for ``src``.
:param ksize: Gaussian kernel size. ``ksize.width`` and ``ksize.height`` can differ, but they both must be positive and odd. If they are zeros, they are computed from ``sigmaX`` and ``sigmaY``.

:param sigmaX, sigmaY: Gaussian kernel standard deviations in the X and Y directions. If ``sigmaY`` is zero, it is set to be equal to ``sigmaX``. If they are both zeros, they are computed from ``ksize.width`` and ``ksize.height``, respectively (see :c:func:`getGaussianKernel` for details). To fully control the result regardless of possible future modification of all this semantics, it is recommended to specify all of ``ksize``, ``sigmaX``, and ``sigmaY``.

:param rowBorderType, columnBorderType: Pixel extrapolation method. See :c:func:`borderInterpolate` for details.
See also: :cpp:func:`gpu::createGaussianFilter_GPU`, :c:func:`GaussianBlur`.
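A call sketch for the Gaussian smoothing described by the parameters above; the kernel size, sigma, and ``img_cpu`` are only illustrative: ::

    cv::gpu::GpuMat src(img_cpu);                    // for example, CV_8UC1
    cv::gpu::GpuMat dst;
    cv::gpu::GaussianBlur(src, dst, cv::Size(5, 5), 1.5);  // sigmaY defaults to sigmaX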
.. index:: gpu::getMaxFilter_GPU
gpu::getMaxFilter_GPU
-------------------------
.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getMaxFilter_GPU(int srcType, int dstType, const Size& ksize, Point anchor = Point(-1,-1))
Creates the maximum filter.

:param srcType: Input image type. Only ``CV_8UC1`` and ``CV_8UC4`` are supported.

:param dstType: Output image type. Only the same type as the source is supported.

:param ksize: Kernel size.

:param anchor: Anchor point. The default value ``Point(-1, -1)`` means that the anchor is at the kernel center.

**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
.. index:: gpu::getMinFilter_GPU
gpu::getMinFilter_GPU
-------------------------
.. cpp:function:: Ptr<BaseFilter_GPU> gpu::getMinFilter_GPU(int srcType, int dstType, const Size& ksize, Point anchor = Point(-1,-1))
Creates the minimum filter.

:param srcType: Input image type. Only ``CV_8UC1`` and ``CV_8UC4`` are supported.

:param dstType: Output image type. Only the same type as the source is supported.

:param ksize: Kernel size.

:param anchor: Anchor point. The default value ``Point(-1, -1)`` means that the anchor is at the kernel center.

**Note:** This filter does not check out-of-border accesses, so only a proper sub-matrix of a bigger matrix has to be passed to it.
The OpenCV GPU module is a set of classes and functions to utilize GPU computational capabilities. It is implemented using the NVidia CUDA Runtime API and supports only NVidia GPUs. The OpenCV GPU module includes utility functions, low-level vision primitives, and high-level algorithms. The utility functions and low-level primitives provide a powerful infrastructure for developing fast vision algorithms taking advantage of the GPU, whereas the high-level functionality includes some state-of-the-art algorithms (such as stereo correspondence, face and people detectors, and others) ready to be used by application developers.
The GPU module is designed as a host-level API. This means that if you have pre-compiled OpenCV GPU binaries, you are not required to have the CUDA Toolkit installed or to write any extra code to make use of the GPU.
The GPU module depends on the CUDA Toolkit and the NVidia Performance Primitives (NPP) library. Make sure you have the latest versions of this software installed. Both libraries can be downloaded from the NVidia site for all supported platforms. To compile the OpenCV GPU module, you need a compiler compatible with the CUDA Runtime Toolkit.
The OpenCV GPU module is designed for ease of use and does not require any knowledge of CUDA. Such knowledge will certainly be useful, though, to handle non-trivial cases or to achieve the highest performance. It is helpful to understand the cost of various operations, what the GPU does, what the preferred data formats are, and so on. The GPU module is an effective instrument for quick implementation of GPU-accelerated computer vision algorithms. However, if your algorithm involves many simple operations, then, for the best possible performance, you may still need to write your own kernels to avoid extra write and read operations on the intermediate results.
To enable CUDA support, configure OpenCV using CMake with ``WITH_CUDA=ON``. When the flag is set and CUDA is installed, the full-featured OpenCV GPU module is built. Otherwise, the module is still built, but at runtime all functions from the module throw :c:type:`Exception` with the ``CV_GpuNotSupported`` error code, except for :cpp:func:`gpu::getCudaEnabledDeviceCount`. The latter function returns a zero GPU count in this case. Building OpenCV without CUDA support does not perform device code compilation, so it does not require the CUDA Toolkit installed. Therefore, using the :cpp:func:`gpu::getCudaEnabledDeviceCount` function, you can implement a high-level algorithm that detects GPU presence at runtime and chooses the appropriate implementation (CPU or GPU) accordingly.
Compilation for Different NVidia* Platforms
-------------------------------------------
The NVidia* compiler enables generating binary code (cubin and fatbin) and intermediate code (PTX). Binary code often implies a specific GPU architecture and generation, so the compatibility with other GPUs is not guaranteed. PTX is targeted for a virtual platform that is defined entirely by the set of capabilities or features. Depending on the selected virtual platform, some of the instructions are emulated or disabled, even if the real hardware supports all the features.

At the first call, the PTX code is compiled to binary code for the particular GPU using a JIT compiler. When the target GPU has a compute capability (CC) lower than the PTX code, JIT fails.
By default, the OpenCV GPU module includes:
* Binaries for compute capabilities 1.3 and 2.0 (controlled by ``CUDA_ARCH_BIN`` in CMake)

* PTX code for compute capabilities 1.1 and 1.3 (controlled by ``CUDA_ARCH_PTX`` in CMake)
This means that for devices with CC 1.3 and 2.0, binary images are ready to run. For all newer platforms, the PTX code for 1.3 is JIT'ed to a binary image. For devices with CC 1.1 and 1.2, the PTX for 1.1 is JIT'ed. For devices with CC 1.0, no code is available and the functions throw :c:type:`Exception`. For platforms where JIT compilation is performed, the first run is slow.

On a GPU with CC 1.0, you can still compile the GPU module and most of the functions will run flawlessly. To achieve this, add "1.0" to the list of binaries, for example, ``CUDA_ARCH_BIN="1.0 1.3 2.0"``. The functions that cannot be run on CC 1.0 GPUs throw an exception.
You can always determine at runtime whether the OpenCV GPU-built binaries (or PTX code) are compatible with your GPU. The function :cpp:func:`gpu::DeviceInfo::isCompatible` returns the compatibility status (true/false).
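A sketch of the runtime check suggested above, combining :cpp:func:`gpu::getCudaEnabledDeviceCount` and :cpp:func:`gpu::DeviceInfo::isCompatible` to choose between CPU and GPU code paths (``runGpuPipeline`` and ``runCpuPipeline`` are hypothetical application functions): ::

    bool useGpu = false;
    if (cv::gpu::getCudaEnabledDeviceCount() > 0)
    {
        cv::gpu::DeviceInfo info;                    // information about the current device
        useGpu = info.isCompatible();                // true if the built binaries/PTX match this GPU
    }

    if (useGpu)
        runGpuPipeline();                            // hypothetical GPU branch of the application
    else
        runCpuPipeline();                            // hypothetical CPU fallback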
Threading and Multi-threading
------------------------------
The OpenCV GPU module follows the CUDA Runtime API conventions regarding multi-threaded programming. This means that at the first API call, a CUDA context is created implicitly, attached to the current CPU thread, and then used as the thread's "current" context. All further operations, such as memory allocation and GPU code compilation, are associated with the context and the thread. Because any other thread is not attached to the context, memory (and other resources) allocated in the first thread cannot be accessed by the other thread. Instead, for this other thread, CUDA creates another context associated with it. In short, by default, different threads do not share resources.

But you can remove this limitation by using the CUDA Driver API (version 3.1 or later). You can retrieve the context reference for one thread, attach it to another thread, and make it "current" for that thread. As a result, the threads can share memory and other resources. It is also possible to create a context explicitly before calling any GPU code and attach it to all the threads you want to share the resources with.

It is also possible to create the context explicitly using the CUDA Driver API, then attach it and make it "current" for all necessary threads. The CUDA Runtime API (and OpenCV functions, respectively) picks it up.
Utilizing Multiple GPUs
-----------------------
In the current version, each of the OpenCV GPU algorithms can use only a single GPU. So, to utilize multiple GPUs, you have to manually distribute the work between GPUs. Here are the two ways of utilizing multiple GPUs (a minimal sketch of the first approach follows at the end of this section):

* If you only use synchronous functions, create several CPU threads (one per each GPU) and from within each thread create a CUDA context for the corresponding GPU using :cpp:func:`gpu::setDevice` or the Driver API. Each of the threads will then use the associated GPU.

* If you use asynchronous functions, you can use the Driver API to create several CUDA contexts associated with different GPUs but attached to one CPU thread. Within the thread, you can switch from one GPU to another by making the corresponding context "current". With non-blocking GPU calls, managing the algorithm is clear.

While developing algorithms for multiple GPUs, note the data passing overhead. For primitive functions and small images, it can be significant, which may eliminate all the advantages of having multiple GPUs. But for high-level algorithms, consider using multi-GPU acceleration. For example, the Stereo Block Matching algorithm has been successfully parallelized using the following algorithm:

1. Split each image of the stereo pair into two horizontal overlapping stripes.

2. Process each pair of stripes (from the left and right images) on a separate Fermi* GPU.

3. Merge the results into a single disparity map.

With this algorithm, a dual GPU gave a 180% performance increase compared to a single Fermi GPU. The source code of the example is available at https://code.ros.org/svn/opencv/trunk/opencv/examples/gpu/.
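Below is a minimal sketch of the synchronous, one-thread-per-GPU approach from the first item above. The worker function is meant to be called from a dedicated CPU thread for each device; the threading API itself and the per-device processing (a box filter here) are up to the application: ::

    // Call this function from its own CPU thread, once per GPU.
    void workerThread(int deviceId, const cv::Mat& input_cpu, cv::Mat& result_cpu)
    {
        cv::gpu::setDevice(deviceId);                // bind this thread's CUDA context to the GPU

        cv::gpu::GpuMat src(input_cpu);              // all allocations below belong to this device
        cv::gpu::GpuMat dst;
        cv::gpu::boxFilter(src, dst, -1, cv::Size(5, 5)); // any per-device processing

        dst.download(result_cpu);                    // download the partial result for merging
    }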
Histogram of Oriented Gradients [Navneet Dalal and Bill Triggs. Histogram of oriented gradients for human detection. 2005.] descriptor and detector.
::
struct CV_EXPORTS HOGDescriptor
{
enum { DEFAULT_WIN_SIGMA = -1 };
enum { DEFAULT_NLEVELS = 64 };
...
...
@@ -62,47 +61,46 @@ Histogram of Oriented Gradients [Navneet Dalal and Bill Triggs. Histogram of ori
}
Interfaces of all methods are kept similar to the CPU HOG descriptor and detector analogues as much as possible.

Performs object detection without a multi-scale window.

:param img: Source image. ``CV_8UC1`` and ``CV_8UC4`` types are supported for now.

:param found_locations: Left-top corner points of detected objects boundaries.

:param hit_threshold: Threshold for the distance between features and SVM classifying plane. Usually it is 0 and should be specified in the detector coefficients (as the last free coefficient). But if the free coefficient is omitted (which is allowed), you can specify it manually here.

:param win_stride: Window stride. It must be a multiple of block stride.

:param padding: Mock parameter to keep the CPU interface compatibility. Must be (0,0).

:param hit_threshold: Threshold for the distance between features and SVM classifying plane. See :cpp:func:`gpu::HOGDescriptor::detect` for details.

:param win_stride: Window stride. It must be a multiple of block stride.

:param padding: Mock parameter to keep the CPU interface compatibility. Must be (0,0).

:param scale0: Coefficient of the detection window increase.

:param group_threshold: Coefficient to regulate the similarity threshold. When detected, some objects can be covered by many rectangles. 0 means not to perform grouping. See :c:func:`groupRectangles`.
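A usage sketch of the GPU HOG people detector; it assumes ``img_cpu`` is a CPU image and that the default people detector coefficients (``getDefaultPeopleDetector``) are available, as in the CPU version: ::

    cv::gpu::HOGDescriptor hog;
    hog.setSVMDetector(cv::gpu::HOGDescriptor::getDefaultPeopleDetector());

    cv::gpu::GpuMat img_gpu(img_cpu);                // CV_8UC1 or CV_8UC4
    std::vector<cv::Rect> found;                     // bounding rectangles of detected people
    hog.detectMultiScale(img_gpu, found);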
:param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the haartraining application) and NVidia's ``nvbin`` are supported.

Loads the classifier from a file. The previous content is destroyed.

:param filename: Name of the file from which the classifier is loaded. Only the old ``haar`` classifier (trained by the haartraining application) and NVidia's ``nvbin`` are supported.
.. cpp:function:: int gpu::CascadeClassifier_GPU::detectMultiScale(const GpuMat& image, GpuMat& objectsBuf, double scaleFactor=1.2, int minNeighbors=4, Size minSize=Size())
Detects objects of different sizes in the input image. The detected objects are returned as a list of rectangles.
:param image: Matrix of type ``CV_8U`` containing an image where objects should be detected.
:param objects: Buffer to store detected objects (rectangles). If it is empty, it is allocated with the default size. If not empty, the function searches not more than N objects, where ``N = sizeof(objectsBuf's data)/sizeof(cv::Rect)``.
:param scaleFactor: Value to specify how much the image size is reduced at each image scale.
:param minNeighbors: Value to specify how many neighbors each candidate rectangle should have to retain it.
:param minSize: Minimum possible object size. Objects smaller than that are ignored.
The function returns the number of detected objects, so you can retrieve them as in the following example: ::
gpu::CascadeClassifier_GPU cascade_gpu(...);
Mat image_cpu = imread(...)
GpuMat image_gpu(image_cpu);
...
...
@@ -336,5 +323,6 @@ The function returns number of detected objects, so you can retrieve them as in
imshow("Faces", image_cpu);
See Also: :c:func:`CascadeClassifier::detectMultiScale`.
Transforms the source matrix into the destination matrix using the given look-up table: ``dst(I) = lut(src(I))``

:param src: Source matrix. ``CV_8UC1`` and ``CV_8UC3`` matrices are supported for now.

:param lut: Look-up table of 256 elements. It must be a continuous ``CV_8U`` matrix whose area satisfies ``lut.rows`` :math:`\times` ``lut.cols`` = 256.

:param dst: Destination matrix with the same depth as ``lut`` and the same number of channels as ``src``.

See Also: :c:func:`LUT`.
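A call sketch for ``gpu::LUT`` that inverts an 8-bit image through a 256-element table (``img_cpu`` is an assumed input image): ::

    // build a 1x256 CV_8U look-up table on the CPU
    cv::Mat lut(1, 256, CV_8U);
    for (int i = 0; i < 256; ++i)
        lut.at<uchar>(i) = static_cast<uchar>(255 - i);  // simple inversion

    cv::gpu::GpuMat src(img_cpu);                    // CV_8UC1 or CV_8UC3
    cv::gpu::GpuMat dst;
    cv::gpu::LUT(src, lut, dst);                     // dst(I) = lut(src(I))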