A common machine learning task is supervised learning. In supervised learning, the goal is to learn the functional relationship :math:`F: y = F(x)` between the input :math:`x` and the output :math:`y`. Predicting the qualitative output is called *classification*, while predicting the quantitative output is called *regression*.
Boosting is a powerful learning concept that provides a solution to the supervised classification learning task. It combines the performance of many "weak" classifiers to produce a powerful committee :ref:`[HTF01] <HTF01>`. A weak classifier is only required to be better than chance, and thus can be very simple and computationally inexpensive. However, many of them, smartly combined, form a strong classifier that often outperforms most "monolithic" strong classifiers, such as SVMs and Neural Networks.

Decision trees are the most popular weak classifiers used in boosting schemes. Often the simplest decision trees with only a single split node per tree (called ``stumps``) are sufficient.
The boosted model is based on :math:`N` training examples :math:`(x_i, y_i)`, :math:`i = 1, \dots, N`, with :math:`x_i \in R^K` and :math:`y_i \in \{-1, +1\}`. :math:`x_i` is a :math:`K`-component vector. Each component encodes a feature relevant to the learning task at hand. The desired two-class output is encoded as -1 and +1.
Different variants of boosting are known as Discrete AdaBoost, Real AdaBoost, LogitBoost, and Gentle AdaBoost :ref:`[FHT98] <FHT98>`. All of them are very similar in their overall structure. Therefore, this chapter focuses only on the standard two-class Discrete AdaBoost algorithm, as shown in the box below. Each sample is initially assigned the same weight (step 2). Then, a weak classifier :math:`f_m(x)` is trained on the weighted training data (step 3a), and its weighted training error and scaling factor :math:`c_m` are computed (step 3b). The weights are increased for training samples that have been misclassified (step 3c). All weights are then normalized, and the process of finding the next weak classifier continues for another :math:`M-1` times. The final classifier is the sign of the weighted sum of the individual weak classifiers (step 4).
Two-class Discrete AdaBoost Algorithm: Training (steps 1 to 3) and Evaluation (step 4)
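The algorithm box itself is not reproduced in this extract. As a sketch, the standard two-class Discrete AdaBoost formulation of :ref:`[FHT98] <FHT98>`, with the step numbering used in the text above, is:

.. math::

    \begin{array}{l}
    \textbf{1.} \quad \textrm{Given } N \textrm{ examples } (x_i, y_i) \textrm{ with } y_i \in \{-1, +1\}. \\
    \textbf{2.} \quad \textrm{Start with weights } w_i = 1/N, \; i = 1, \dots, N. \\
    \textbf{3.} \quad \textrm{Repeat for } m = 1, \dots, M: \\
    \qquad \textrm{(a) Fit the weak classifier } f_m(x) \in \{-1, +1\} \textrm{ using the weights } w_i. \\
    \qquad \textrm{(b) Compute } err_m = E_w\left[ 1_{(y \neq f_m(x))} \right], \quad c_m = \log\left( (1 - err_m)/err_m \right). \\
    \qquad \textrm{(c) Set } w_i \leftarrow w_i \exp\left[ c_m \, 1_{(y_i \neq f_m(x_i))} \right], \; i = 1, \dots, N, \textrm{ and renormalize so that } \textstyle\sum_i w_i = 1. \\
    \textbf{4.} \quad \textrm{Output the classifier } F(x) = \operatorname{sign}\left[ \textstyle\sum_{m=1}^{M} c_m f_m(x) \right].
    \end{array}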
.. note:: Similar to the classical boosting methods, the current implementation supports two-class classifiers only. For :math:`M > 2` classes, there is the **AdaBoost.MH** algorithm (described in :ref:`[FHT98] <FHT98>`) that reduces the problem to the two-class problem, yet with a much larger training set.
To reduce computation time for boosted models without substantially losing accuracy, the influence trimming technique can be employed. As the training algorithm proceeds and the number of trees in the ensemble grows, more and more of the training samples are classified correctly and with increasing confidence, and thereby receive smaller weights on the subsequent iterations. Examples with a very low relative weight have a small impact on the weak classifier training. Thus, such examples may be excluded during the weak classifier training without much effect on the induced classifier. This process is controlled with the ``weight_trim_rate`` parameter: only the examples whose summary weight amounts to the fraction ``weight_trim_rate`` of the total weight mass are used in the weak classifier training. Note that the weights for **all** training examples are recomputed at each training iteration, so examples excluded at a particular iteration may be used again when learning some of the subsequent weak classifiers :ref:`[FHT98] <FHT98>`.
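For illustration only (this is a sketch of the selection rule, not the library's internal code), the trimmed training subset for a given ``weight_trim_rate`` can be computed from the current sample weights as follows: ::

    #include <algorithm>
    #include <numeric>
    #include <vector>

    // Return the indices of the samples kept for weak classifier training:
    // the largest weights whose cumulative sum reaches weight_trim_rate of
    // the total weight mass.
    std::vector<int> trimSamples(const std::vector<double>& weights, double weight_trim_rate)
    {
        std::vector<int> order(weights.size());
        std::iota(order.begin(), order.end(), 0);

        // Sort indices by descending weight so the most influential samples come first.
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return weights[a] > weights[b]; });

        const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
        double accumulated = 0.0;
        std::vector<int> kept;
        for (int idx : order)
        {
            kept.push_back(idx);
            accumulated += weights[idx];
            if (accumulated >= weight_trim_rate * total)
                break;  // the remaining samples carry negligible weight
        }
        return kept;
    }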
.. _HTF01:

[HTF01] Hastie, T., Tibshirani, R., Friedman, J. H. *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. Springer Series in Statistics. 2001.

.. _FHT98:

[FHT98] Friedman, J. H., Hastie, T. and Tibshirani, R. *Additive Logistic Regression: a Statistical View of Boosting*. Technical Report, Dept. of Statistics, Stanford University, 1998.

.. index:: CvBoostParams

.. _CvBoostParams:
CvBoostParams
-------------

.. ocv:class:: CvBoostParams

Boosting training parameters.
The structure is derived from :ref:`CvDTreeParams`, but not all of the decision tree parameters are supported.
All parameters are public. You can initialize them by a constructor and then override some of them directly if you want.
.. index:: CvBoostParams::CvBoostParams
.. _CvBoostParams::CvBoostParams:
CvBoostParams::CvBoostParams
----------------------------

.. ocv:function:: CvBoostParams::CvBoostParams()
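As a usage sketch (the exact argument list of the non-default constructor, ``boost_type``, ``weak_count``, ``weight_trim_rate``, ``max_depth``, ``use_surrogates``, ``priors``, is assumed here), the parameters can be constructed once and individual fields overridden afterwards: ::

    #include <opencv2/ml/ml.hpp>

    CvBoostParams makeBoostParams()
    {
        // Assumed constructor order: boost type, number of weak classifiers,
        // weight_trim_rate, maximum tree depth, use_surrogates, class priors.
        CvBoostParams params(CvBoost::REAL,  // boosting variant
                             100,            // number of weak classifiers (M)
                             0.95,           // weight_trim_rate (influence trimming)
                             1,              // max_depth = 1, i.e. stumps
                             false,          // no surrogate splits
                             0);             // no class priors

        // All fields are public, so they can be overridden after construction.
        params.boost_type = CvBoost::GENTLE;
        return params;
    }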
CvBoostTree
-----------

.. ocv:class:: CvBoostTree

Weak tree classifier. ::
    class CvBoostTree: public CvDTree
    {
        ...
    };
The weak classifier, a component of the boosted tree classifier :ocv:class:`CvBoost`, is a derivative of :ocv:class:`CvDTree`. Normally, there is no need to use the weak classifiers directly. However, they can be accessed as elements of the sequence ``CvBoost::weak``, retrieved by ``CvBoost::get_weak_predictors``.
.. note:: In case of LogitBoost and Gentle AdaBoost, each weak predictor is a regression tree, rather than a classification tree. Even in case of Discrete AdaBoost and Real AdaBoost, the ``CvBoostTree::predict`` return value (``CvDTreeNode::value``) is not an output class label. A negative value "votes" for class #0, a positive value for class #1. The votes are weighted. The weight of each individual tree may be increased or decreased using the method ``CvBoostTree::scale``.
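For illustration, a sketch of walking the ``CvBoost::weak`` sequence returned by ``CvBoost::get_weak_predictors`` and rescaling each tree's vote; it assumes an already trained ``CvBoost`` model: ::

    #include <opencv2/ml/ml.hpp>

    // Shrinks the vote weight of every weak tree in a trained booster.
    void downweightAllTrees(CvBoost& boost)
    {
        CvSeq* weak = boost.get_weak_predictors();  // sequence of CvBoostTree*
        CvSeqReader reader;
        cvStartReadSeq(weak, &reader);

        for (int i = 0; i < weak->total; i++)
        {
            CvBoostTree* wtree;
            CV_READ_SEQ_ELEM(wtree, reader);  // read the next weak tree pointer
            wtree->scale(0.9);                // shrink this tree's vote weight
        }
    }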
.. index:: CvBoost
CvBoost
-------

.. ocv:class:: CvBoost

Boosted tree classifier derived from :ocv:class:`CvStatModel`.
.. index:: CvBoost::train
.. _CvBoost::train:
CvBoost::train
--------------
The train method follows the common template. The last parameter, ``update``, specifies whether the classifier needs to be updated (new weak tree classifiers are added to the existing ensemble) or the classifier needs to be rebuilt from scratch. The responses must be categorical, which means that boosted trees cannot be built for regression, and there should be two classes.
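A minimal usage sketch follows; it assumes the ``cv::Mat``-based overload of ``CvBoost::train`` provided by this generation of the C++ API, and the data, labels, and parameter values are purely illustrative: ::

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    int main()
    {
        // 100 training samples with 2 numerical features each (illustrative data).
        cv::Mat trainData(100, 2, CV_32F);
        cv::randu(trainData, cv::Scalar(0.f), cv::Scalar(1.f));

        // Two-class categorical responses: label 0 or 1 per sample.
        cv::Mat responses(100, 1, CV_32S);
        for (int i = 0; i < responses.rows; i++)
            responses.at<int>(i) = trainData.at<float>(i, 0) > 0.5f ? 1 : 0;

        // Mark all features as ordered and the response as categorical,
        // since boosted trees require a categorical (two-class) response.
        cv::Mat varType(trainData.cols + 1, 1, CV_8U, cv::Scalar::all(CV_VAR_ORDERED));
        varType.at<uchar>(trainData.cols) = CV_VAR_CATEGORICAL;

        CvBoostParams params(CvBoost::DISCRETE, 50, 0.95, 1, false, 0);

        CvBoost boost;
        boost.train(trainData, CV_ROW_SAMPLE, responses,
                    cv::Mat(), cv::Mat(), varType, cv::Mat(),
                    params,
                    false /* update: rebuild from scratch */);

        // Predict the class of the first training sample.
        float label = boost.predict(trainData.row(0));
        return (int)label;
    }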
The ``CvBoost::get_weak_predictors`` method returns the sequence of weak classifiers. Each element of the sequence is a pointer to the ``CvBoostTree`` class or to some of its derivatives.
.. index:: CvBoost::get_params
.. _CvBoost::get_params:
CvBoost::get_params
-------------------
Returns current parameters of the boosted tree classifier.
Expectation Maximization
------------------------

.. highlight:: cpp

The Expectation Maximization (EM) algorithm estimates the parameters of the multivariate probability density function in the form of a Gaussian mixture distribution with a specified number of mixtures.
Consider the set of :math:`N` feature vectors :math:`\{ x_1, x_2, \dots, x_N \}` from a :math:`d`-dimensional Euclidean space drawn from a Gaussian mixture.
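The mixture density formula is elided from this extract. As a sketch, the standard Gaussian mixture form assumed here (consistent with the parameters :math:`a_k`, :math:`S_k`, and :math:`\pi_k` referenced below) is

.. math::

    p(x; a, S, \pi) = \sum_{k=1}^{M} \pi_k p_k(x), \quad \pi_k \ge 0, \quad \sum_{k=1}^{M} \pi_k = 1,

.. math::

    p_k(x) = \varphi(x; a_k, S_k) = \frac{1}{(2\pi)^{d/2} \, |S_k|^{1/2}} \exp\left( -\frac{1}{2} (x - a_k)^T S_k^{-1} (x - a_k) \right),

where :math:`M` is the number of mixtures, :math:`a_k` are the mixture means, :math:`S_k` the covariance matrices, and :math:`\pi_k` the mixing weights. The EM algorithm alternates an Expectation step (E-step), which computes the posterior probabilities :math:`p_{i,k}` that sample :math:`x_i` was generated by mixture component :math:`k` under the current parameter estimates, and a Maximization step (M-step), which re-estimates the mixture parameters from these probabilities.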
Alternatively, the algorithm may start with the M-step when the initial values for :math:`p_{i,k}` can be provided. Another alternative, when :math:`p_{i,k}` are unknown, is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial :math:`p_{i,k}`. Often (including this implementation) the :ref:`kmeans` algorithm is used for that purpose.
One of the main problems of the EM algorithm is the large number of parameters to estimate, the majority of which reside in the :math:`d \times d` covariance matrices.
.. index:: CvEMParams
.. _CvEMParams:
CvEMParams
----------

.. ocv:class:: CvEMParams

Parameters of the EM algorithm.

All parameters are public. You can initialize them by a constructor and then override some of them directly if you want.
.. index:: CvEMParams::CvEMParams
.. _CvEMParams::CvEMParams:
CvEMParams::CvEMParams
----------------------
With another constructor, it is possible to override a variety of parameters, from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.
.. index:: CvEM
.. _CvEM:
CvEM
----

.. ocv:class:: CvEM

The class implements the EM algorithm as described in the beginning of this section.
.. index:: CvEM::train

.. _CvEM::train:

CvEM::train
-----------

Estimates the Gaussian mixture parameters from a sample set.

:param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the *Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure: :math:`p_{i,k}` in ``probs``, :math:`a_k` in ``means``, :math:`S_k` in ``covs[k]``, :math:`\pi_k` in ``weights``, and optionally computes the output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).

The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the :ref:`Bayes classifier`.
For an example of clustering random samples of the multi-Gaussian distribution using EM, see the ``em.cpp`` sample in the OpenCV distribution.
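As a complement to that sample, the following is a minimal, hedged sketch of typical usage; it assumes the ``cv::Mat``-based overloads of ``CvEM::train`` and ``CvEM::predict``, and the data and parameter values are purely illustrative: ::

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    int main()
    {
        // 200 two-dimensional samples (illustrative data only).
        cv::Mat samples(200, 2, CV_32F);
        cv::randu(samples, cv::Scalar(0.f), cv::Scalar(10.f));

        // Number of mixtures is the only essential problem-dependent parameter.
        CvEMParams params;
        params.nclusters = 3;
        params.cov_mat_type = CvEM::COV_MAT_DIAGONAL;  // constrain covariances
        params.start_step = CvEM::START_AUTO_STEP;     // pre-cluster with k-means

        CvEM em;
        cv::Mat labels;                                // most probable mixture per sample
        em.train(samples, cv::Mat(), params, &labels);

        // Posterior probabilities of each mixture component for a sample.
        cv::Mat probs;
        float idx = em.predict(samples.row(0), &probs);
        return (int)idx;
    }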
.. index:: CvEM::predict
.. _CvEM::predict:
CvEM::predict
-------------
:param probs: If it is not null, the method writes the posterior probabilities of each mixture component, given the sample data, to this parameter.
.. index:: CvEM::getNClusters
.. _CvEM::getNClusters:
CvEM::getNClusters
------------------

.. ocv:function:: int CvEM::getNClusters() const
Returns the number of mixture components :math:`M` in the Gaussian mixture model.