OpenCV (Open Source Computer Vision Library: http://opencv.willowgarage.com/wiki/) is an open-source BSD-licensed library that includes several hundred computer vision algorithms. It is very popular in the computer vision community, and some people call it a “de facto standard” API. This document aims to specify the stable parts of the library, as well as some abstract interfaces for high-level interfaces, with the final goal of making it an official standard.
OpenCV (Open Source Computer Vision Library: http://opencv.willowgarage.com/wiki/) is an open-source BSD-licensed library that includes several hundred computer vision algorithms. This document describes the so-called OpenCV 2.x API, which is essentially a C++ API, as opposed to the C-based OpenCV 1.x API. The latter is described in opencv1x.pdf.
API specifications in the document use the standard C++ (http://www.open-std.org/jtc1/sc22/wg21/) and the standard C++ library.
The current OpenCV implementation has a modular structure (i.e. the binary package includes several shared or static libraries), where we have:
OpenCV has a modular structure (i.e. the package includes several shared or static libraries). The modules are:
* **core** - the compact module defining basic data structures, including the dense multi-dimensional array ``Mat``, and basic functions, used by all other modules.
* **imgproc** - image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remap), color space conversion, histograms etc.
...
...
@@ -18,9 +16,7 @@ The current OpenCV implementation has a modular structure (i.e. the binary packa
* **gpu** - GPU-accelerated algorithms from different OpenCV modules.
* ... some other helper modules, such as FLANN and Google test wrappers, Python bindings etc.
Although the alternative implementations of the proposed standard may be structured differently, the proposed standard draft is organized by the functionality groups that reflect the decomposition of the library by modules.
Below are the other main concepts of the OpenCV API, implied everywhere in the document.
The following chapters of the document describe the functionality of each module. But first, here is an overview of the common API concepts used throughout the library.
.. c:function:: Mat imread( const string\& filename, int flags=1 )
...
...
@@ -61,34 +67,19 @@ imread
The function ``imread`` loads an image from the specified file and returns it. If the image cannot be read (because of a missing file, improper permissions, or an unsupported or invalid format), the function returns an empty matrix ( ``Mat::data==NULL`` ). Currently, the following file formats are supported:
*
Windows bitmaps - ``*.bmp, *.dib`` (always supported)
* Windows bitmaps - ``*.bmp, *.dib`` (always supported)
*
JPEG files - ``*.jpeg, *.jpg, *.jpe`` (see
**Note2**
)
* JPEG files - ``*.jpeg, *.jpg, *.jpe`` (see **Note2**)
*
JPEG 2000 files - ``*.jp2`` (see
**Note2**
)
* JPEG 2000 files - ``*.jp2`` (see **Note2**)
*
Portable Network Graphics - ``*.png`` (see
**Note2**
)
* Portable Network Graphics - ``*.png`` (see **Note2**)
*
Portable image format - ``*.pbm, *.pgm, *.ppm`` (always supported)
* Portable image format - ``*.pbm, *.pgm, *.ppm`` (always supported)
*
Sun rasters - ``*.sr, *.ras`` (always supported)
* Sun rasters - ``*.sr, *.ras`` (always supported)
*
TIFF files - ``*.tiff, *.tif`` (see
**Note2**
)
* TIFF files - ``*.tiff, *.tif`` (see **Note2**)
**Note1**: The function determines the type of the image by its content, not by the file extension.
...
...
@@ -100,6 +91,8 @@ On Linux, BSD flavors and other Unix-like open-source operating systems OpenCV l
@@ -84,9 +88,11 @@ The function ``imshow`` displays the image in the specified window. If the windo
.. index:: namedWindow
.. _namedWindow:
namedWindow
---------------
.. c:function:: void namedWindow( const string\& winname, int flags )
.. c:function:: void namedWindow( const string& winname, int flags )
Creates a window.
...
...
@@ -104,27 +110,29 @@ qt-specific details:
* **flags** Flags of the window. Currently the supported flags are:
* **CV_WINDOW_NORMAL or CV_WINDOW_AUTOSIZE:** ``CV_WINDOW_NORMAL`` lets the user resize the window, whereas ``CV_WINDOW_AUTOSIZE`` automatically adjusts the window size to fit the displayed image (see :ref:`ShowImage` ), and the user cannot change the window size manually.
* **CV_WINDOW_NORMAL or CV_WINDOW_AUTOSIZE:** ``CV_WINDOW_NORMAL`` lets the user resize the window, whereas ``CV_WINDOW_AUTOSIZE`` automatically adjusts the window size to fit the displayed image (see :ref:`imshow` ), and the user cannot change the window size manually.
* **CV_WINDOW_FREERATIO or CV_WINDOW_KEEPRATIO:** ``CV_WINDOW_FREERATIO`` adjusts the image without respecting its aspect ratio, whereas ``CV_WINDOW_KEEPRATIO`` keeps the image's aspect ratio.
* **CV_GUI_NORMAL or CV_GUI_EXPANDED:** ``CV_GUI_NORMAL`` is the old way to draw the window, without a status bar and toolbar, whereas ``CV_GUI_EXPANDED`` is the new enhanced GUI.
This parameter is optional. The default flags set for a new window are ``CV_WINDOW_AUTOSIZE`` , ``CV_WINDOW_KEEPRATIO`` , and ``CV_GUI_EXPANDED`` .
However, if you want to modify the flags, you can combine them using the OR operator, i.e.:
Applies a generic geometrical transformation to an image.
:param src: Source image
:param dst: Destination image. It will have the same size as ``map1`` and the same type as ``src``
:param map1: The first map of either ``(x,y)`` points or just ``x`` values having type ``CV_16SC2`` , ``CV_32FC1`` or ``CV_32FC2`` . See :func:`convertMaps` for converting floating point representation to fixed-point for speed.
:param map2: The second map of ``y`` values having type ``CV_16UC1`` , ``CV_32FC1`` or none (empty map if map1 is ``(x,y)`` points), respectively
:param interpolation: The interpolation method, see :func:`resize` . The method ``INTER_AREA`` is not supported by this function
...
...
@@ -252,9 +275,12 @@ This function can not operate in-place.
@@ -7,7 +9,7 @@ A common machine learning task is supervised learning. In supervised learning, t
:math:`y` . Predicting the qualitative output is called classification, while predicting the quantitative output is called regression.
Boosting is a powerful learning concept that provides a solution to the supervised classification learning task. It combines the performance of many "weak" classifiers to produce a powerful 'committee'
:ref:`HTF01` . A weak classifier is only required to be better than chance, and thus can be very simple and computationally inexpensive. Many of them, smartly combined, however, result in a strong classifier, which often outperforms most 'monolithic' strong classifiers such as SVMs and Neural Networks.
:ref:`[HTF01] <HTF01>` . A weak classifier is only required to be better than chance, and thus can be very simple and computationally inexpensive. Many of them, smartly combined, however, result in a strong classifier, which often outperforms most 'monolithic' strong classifiers such as SVMs and Neural Networks.
Decision trees are the most popular weak classifiers used in boosting schemes. Often the simplest decision trees with only a single split node per tree (called stumps) are sufficient.
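As a rough illustration of the stumps mentioned above, the following C++ sketch trains a single-split classifier on weighted samples by exhaustive search. It is not the OpenCV implementation: the names ``Stump`` and ``trainStump`` are hypothetical, labels are assumed to be -1 or +1, and candidate thresholds are simply taken from the sample values.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <vector>

// Hypothetical decision stump: a tree with a single split node.
struct Stump {
    std::size_t feature = 0;  // index of the feature being tested
    double threshold = 0.0;   // split value
    int polarity = 1;         // +1: predict +1 when x[feature] > threshold

    int predict(const std::vector<double>& x) const {
        return (x[feature] > threshold) ? polarity : -polarity;
    }
};

// Exhaustive search for the stump with the lowest weighted training error.
// Assumptions: labels are -1 or +1, weights are non-negative, and all
// samples have the same number of features.
Stump trainStump(const std::vector<std::vector<double>>& samples,
                 const std::vector<int>& labels,
                 const std::vector<double>& weights,
                 double* bestError = nullptr)
{
    assert(samples.size() == labels.size());
    assert(samples.size() == weights.size());

    Stump best;
    double bestErr = std::numeric_limits<double>::max();
    const std::size_t n = samples.size();
    const std::size_t dims = n ? samples[0].size() : 0;

    for (std::size_t f = 0; f < dims; ++f) {
        for (std::size_t t = 0; t < n; ++t) {
            for (int polarity : {1, -1}) {
                Stump candidate;
                candidate.feature = f;
                candidate.threshold = samples[t][f];
                candidate.polarity = polarity;

                // Weighted error: total weight of the misclassified samples.
                double err = 0.0;
                for (std::size_t i = 0; i < n; ++i) {
                    if (candidate.predict(samples[i]) != labels[i]) {
                        err += weights[i];
                    }
                }
                if (err < bestErr) {
                    bestErr = err;
                    best = candidate;
                }
            }
        }
    }
    if (bestError) {
        *bestError = bestErr;
    }
    return best;
}
```

Because the error is weighted, the same routine can be called again after the sample weights change, which is what a boosting loop relies on.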
...
...
@@ -20,7 +22,7 @@ The boosted model is based on
:math:`K` -component vector. Each component encodes a feature relevant for the learning task at hand. The desired two-class output is encoded as -1 and +1.
Different variants of boosting are known such as Discrete AdaBoost, Real AdaBoost, LogitBoost, and Gentle AdaBoost
:ref:`FHT98` . All of them are very similar in their overall structure. Therefore, we will look only at the standard two-class Discrete AdaBoost algorithm as shown in the box below. Each sample is initially assigned the same weight (step 2). Next a weak classifier
:ref:`[FHT98] <FHT98>` . All of them are very similar in their overall structure. Therefore, we will look only at the standard two-class Discrete AdaBoost algorithm as shown in the box below. Each sample is initially assigned the same weight (step 2). Next a weak classifier
:math:`f_m(x)` is trained on the weighted training data (step 3a). Its weighted training error and scaling factor
:math:`c_m` are computed (step 3b). The weights are increased for training samples that have been misclassified (step 3c). All weights are then normalized, and the process of finding the next weak classifier continues for another
:math:`M` -1 times. The final classifier
...
...
@@ -65,15 +67,20 @@ As well as the classical boosting methods, the current implementation supports 2
:math:`>` 2 classes there is the
**AdaBoost.MH**
algorithm, described in
:ref:`FHT98` , that reduces the problem to the 2-class problem, yet with a much larger training set.
:ref:`[FHT98] <FHT98>` , that reduces the problem to the 2-class problem, yet with a much larger training set.
In order to reduce computation time for boosted models without substantially losing accuracy, the influence trimming technique may be employed. As the training algorithm proceeds and the number of trees in the ensemble increases, a larger number of the training samples are classified correctly and with increasing confidence, so those samples receive smaller weights on the subsequent iterations. Examples with very low relative weight have little impact on the training of the weak classifier. Thus such examples may be excluded during the weak classifier training without much effect on the induced classifier. This process is controlled by the weight_trim_rate parameter. Only the highest-weight examples that together account for the fraction weight_trim_rate of the total weight mass are used in the weak classifier training. Note that the weights for
**all**
training examples are recomputed at each training iteration. Examples deleted at a particular iteration may be used again later for learning some of the subsequent weak classifiers
:ref:`FHT98` .
:ref:`[FHT98] <FHT98>` .
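One possible reading of the trimming rule described above can be sketched in C++ as follows. This is an illustration under stated assumptions, not the library's code: the function name ``selectByWeightTrimRate`` is hypothetical, and the sketch assumes that the samples with the largest weights are kept until their cumulative weight reaches the fraction ``weight_trim_rate`` of the total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the indices (in ascending order) of the samples that would be used
// for training the next weak classifier. Hypothetical helper: keeps the
// heaviest samples until they cover weightTrimRate of the total weight mass.
std::vector<std::size_t> selectByWeightTrimRate(const std::vector<double>& weights,
                                                double weightTrimRate)
{
    assert(weightTrimRate > 0.0 && weightTrimRate <= 1.0);

    // Sort sample indices by decreasing weight.
    std::vector<std::size_t> order(weights.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&weights](std::size_t a, std::size_t b) {
                  return weights[a] > weights[b];
              });

    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);

    // Accumulate the heaviest samples until the requested mass is covered.
    std::vector<std::size_t> kept;
    double mass = 0.0;
    for (std::size_t idx : order) {
        if (mass >= weightTrimRate * total) {
            break;
        }
        kept.push_back(idx);
        mass += weights[idx];
    }
    std::sort(kept.begin(), kept.end());
    return kept;
}
```

The selection is recomputed from the full weight vector on every call, which matches the note that samples dropped at one iteration may return at a later one.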
.. _HTF01:
[HTF01] Hastie, T., Tibshirani, R., Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. 2001.
.. _FHT98:
**[HTF01] Hastie, T., Tibshirani, R., Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. 2001.**
**[FHT98] Friedman, J. H., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. Technical Report, Dept. of Statistics, Stanford University, 1998.**
[FHT98] Friedman, J. H., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. Technical Report, Dept. of Statistics, Stanford University, 1998.
@@ -57,7 +57,7 @@ Alternatively, the algorithm may start with the M-step when the initial values f
:math:`p_{i,k}` can be provided. Another alternative when
:math:`p_{i,k}` are unknown, is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
:math:`p_{i,k}` . Often (and in ML) the
:ref:`KMeans2` algorithm is used for that purpose.
:ref:`kmeans` algorithm is used for that purpose.
One of the main problems that the EM algorithm should deal with is the large number
of parameters to estimate. The majority of the parameters reside in
...
...
@@ -197,12 +197,12 @@ CvEM::train
Estimates the Gaussian mixture parameters from the sample set.
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
:ref:`MLE` of the Gaussian mixture parameters from the input sample set, stores all the parameters inside the structure:
*Maximum Likelihood Estimate* of the Gaussian mixture parameters from the input sample set, stores all the parameters inside the structure:
:math:`p_{i,k}` in ``probs``, :math:`a_k` in ``means``, :math:`S_k` in ``covs[k]``, :math:`\pi_k` in ``weights``, and optionally computes the output "class label" for each sample:
:math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (i.e. indices of the most probable mixture component for each sample).
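The label rule above is a per-sample argmax over the component probabilities. A minimal C++ sketch, assuming the probabilities are held in a plain vector of rows rather than in the library's matrix type (the function name ``labelsFromProbs`` is hypothetical):

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// probs[i][k] holds p_{i,k}, the probability of sample i under mixture k.
// Returns labels[i] = arg max_k p_{i,k}; an empty row yields -1.
std::vector<int> labelsFromProbs(const std::vector<std::vector<double>>& probs)
{
    std::vector<int> labels;
    labels.reserve(probs.size());
    for (const std::vector<double>& row : probs) {
        if (row.empty()) {
            labels.push_back(-1);
            continue;
        }
        // max_element returns the first maximum, so ties go to the lowest index.
        auto best = std::max_element(row.begin(), row.end());
        labels.push_back(static_cast<int>(std::distance(row.begin(), best)));
    }
    return labels;
}
```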
The trained model can be used further for prediction, just like any other classifier. The model trained is similar to the
:ref:`Bayes classifier`.
Example: Clustering random samples of multi-Gaussian distribution using EM ::
This is a simple classification model assuming that feature vectors from each class are normally distributed (though, not necessarily independently distributed), so the whole data distribution function is assumed to be a Gaussian mixture, one component per class. Using the training data the algorithm estimates mean vectors and covariance matrices for every class, and then it uses them for prediction.
**[Fukunaga90] K. Fukunaga. Introduction to Statistical Pattern Recognition. second ed., New York: Academic Press, 1990.**