Commit 718f56e6 authored by Elena Fedotova

Purpose: completed the ml chapter

parent 8acce4e3
Expectation Maximization
========================
The EM (Expectation Maximization) algorithm estimates the parameters of the multivariate probability density function in the form of a Gaussian mixture distribution with a specified number of mixtures.
Consider the set of the N feature vectors
:math:`x_1, x_2,...,x_{N}` from a d-dimensional Euclidean space drawn from a Gaussian mixture:
.. math::
    p(x) = \sum _{k=1}^{m}\pi_kp_k(x), \quad \pi _k \geq 0, \quad \sum _{k=1}^{m}\pi_k=1,

where
:math:`p_k` is the normal distribution
density with the mean
:math:`a_k` and covariance matrix
:math:`S_k`,
:math:`\pi_k` is the weight of the k-th mixture. Given the number of mixtures
:math:`M` and the samples
:math:`x_i`,
:math:`i=1..N` the algorithm finds the
maximum-likelihood estimates (MLE) of all the mixture parameters,
that is,
:math:`a_k`,
:math:`S_k` and
:math:`\pi_k` :
.. math::
    L(x_1,...,x_N;\Theta) = \sum _{i=1}^{N} \log \left ( \sum _{k=1}^{m} \pi _k p_k(x_i) \right ) \to \max _{\Theta \in \Theta} ,

.. math::
\Theta = \left \{ (a_k,S_k, \pi _k): a_k \in \mathbbm{R} ^d,S_k=S_k^T>0,S_k \in \mathbbm{R} ^{d \times d}, \pi _k \geq 0, \sum _{k=1}^{m} \pi _k=1 \right \} .
The EM algorithm is an iterative procedure. Each iteration includes
two steps. At the first step (Expectation step or E-step), you find a
probability
:math:`p_{i,k}` (denoted
:math:`\alpha_{i,k}` in the formula below) of
sample :math:`i` belonging to mixture :math:`k` using the currently
available mixture parameter estimates:

.. math::
\alpha _{ki} = \frac{\pi_k\varphi(x;a_k,S_k)}{\sum\limits_{j=1}^{m}\pi_j\varphi(x;a_j,S_j)} .
At the second step (Maximization step or M-step), the mixture parameter estimates are refined using the computed probabilities:
.. math::
\pi _k= \frac{1}{N} \sum _{i=1}^{N} \alpha _{ki}, \quad a_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}x_i}{\sum\limits_{i=1}^{N}\alpha_{ki}} , \quad S_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}(x_i-a_k)(x_i-a_k)^T}{\sum\limits_{i=1}^{N}\alpha_{ki}}
Alternatively, the algorithm may start with the M-step when the initial values for
:math:`p_{i,k}` can be provided. Another alternative when
:math:`p_{i,k}` are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
:math:`p_{i,k}` . Often (including ML) the
:ref:`kmeans` algorithm is used for that purpose.
One of the main problems the EM algorithm has to deal with is the large number
of parameters to estimate. The majority of the parameters reside in
covariance matrices, which have
:math:`d \times d` elements each,
where
:math:`d` is the feature space dimensionality. However, in
many practical problems, the covariance matrices are close to diagonal
or even to
:math:`\mu_k*I` , where
:math:`I` is an identity matrix and
:math:`\mu_k` is a mixture-dependent "scale" parameter. So, a robust computation
scheme could start with harder constraints on the covariance
matrices and then use the estimated parameters as an input for a less
constrained optimization problem (often a diagonal covariance matrix is
already a good enough approximation).
**References:**
*
Bilmes98 J. A. Bilmes. *A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models*. Technical Report TR-97-021, International Computer Science Institute and Computer Science Division, University of California at Berkeley, April 1998.
.. index:: CvEMParams
CvEMParams
----------
.. c:type:: CvEMParams
Parameters of the EM algorithm ::
struct CvEMParams
{
// ...
};
The structure has two constructors. The default one represents a rough rule of thumb. With the other one, it is possible to override a variety of parameters, from a single number of mixtures (the only essential problem-dependent parameter) to the initial values for the mixture parameters.
.. index:: CvEM
CvEM
----
.. c:type:: CvEM
EM model ::
class CV_EXPORTS CvEM : public CvStatModel
{
// Type of covariance matrices
enum { COV_MAT_SPHERICAL=0, COV_MAT_DIAGONAL=1, COV_MAT_GENERIC=2 };
// Initial step
enum { START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0 };
CvEM();
// ...
};

CvEM::train
-----------
.. c:function:: void CvEM::train( const CvMat* samples, const CvMat* sample_idx=0, CvEMParams params=CvEMParams(), CvMat* labels=0 )
Estimates the Gaussian mixture parameters from a sample set.
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
*Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure:
:math:`p_{i,k}` in ``probs``,
:math:`a_k` in ``means`` ,
:math:`S_k` in ``covs[k]``,
:math:`\pi_k` in ``weights`` , and optionally computes the output "class label" for each sample:
:math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture for each sample).
The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
:ref:`Bayes classifier`.
Example: Clustering random samples of a multi-Gaussian distribution using EM ::
// ...
}
cvReshape( samples, samples, 1, 0 );
// initialize model parameters
params.covs = NULL;
params.means = NULL;
params.weights = NULL;
// ...
// the piece of code shows how to repeatedly optimize the model
// with less-constrained parameters
//(COV_MAT_DIAGONAL instead of COV_MAT_SPHERICAL)
// when the output of the first stage is used as input for the second one.
CvEM em_model2;
params.cov_mat_type = CvEM::COV_MAT_DIAGONAL;
params.start_step = CvEM::START_E_STEP;
// ...
K Nearest Neighbors
===================
The algorithm caches all training samples and predicts the response for a new sample by analyzing a certain number (
**K**
) of the nearest neighbors of the sample (using voting, calculating weighted sum, and so on). The method is sometimes referred to as "learning by example" because for prediction it looks for the feature vector with a known response that is closest to the given vector.
.. index:: CvKNearest
CvKNearest
----------
.. c:type:: CvKNearest
K-Nearest Neighbors model ::
class CvKNearest : public CvStatModel
{
// ...
};

CvKNearest::train
-----------------
Trains the model.
The method trains the K-Nearest model. It follows the conventions of the generic ``train`` "method" with the following limitations:
* Only ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
* Output variables can be either categorical ( ``is_regression=false`` ) or ordered ( ``is_regression=true`` ).
* Variable subsets ( ``var_idx`` ) and missing measurements are not supported.
The parameter ``_max_k`` specifies the maximum number of neighbors that may be passed to the method ``find_nearest`` .
The parameter ``_update_base`` specifies whether the model is trained from scratch
( ``_update_base=false`` ), or it is updated using the new training data ( ``_update_base=true`` ). In the latter case, the parameter ``_max_k`` must not be larger than the original value.
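A minimal usage sketch follows. The data setup is illustrative, the exact order of the optional ``train`` arguments is an assumption, and ``find_nearest`` is described in the next section. ::

    const int N = 100;
    CvMat* train_data = cvCreateMat( N, 2, CV_32FC1 );    // one 2-D sample per row
    CvMat* responses  = cvCreateMat( N, 1, CV_32FC1 );    // a class label per sample
    // ... fill train_data and responses ...

    CvKNearest knn;
    knn.train( train_data, responses, 0, false, 10 );     // is_regression=false, _max_k=10

    float sample_data[] = { 1.f, 2.f };
    CvMat sample = cvMat( 1, 2, CV_32FC1, sample_data );
    float response = knn.find_nearest( &sample, 10 );     // predicted class of a single sample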
.. index:: CvKNearest::find_nearest
CvKNearest::find_nearest
------------------------
.. c:function:: float CvKNearest::find_nearest( const CvMat* _samples, int k, CvMat* results=0, const float** neighbors=0, CvMat* neighbor_responses=0, CvMat* dist=0 ) const
Finds the neighbors for input vectors.
For each input vector (a row of the matrix ``_samples`` ), the method finds the
:math:`\texttt{k} \le
\texttt{get\_max\_k()}` nearest neighbors. In case of regression,
the predicted result is a mean value of the particular vector's
neighbor responses. In case of classification, the class is determined
by voting.
For a custom classification/regression prediction, the method can optionally return pointers to the neighbor vectors themselves ( ``neighbors`` , an array of ``k*_samples->rows`` pointers), their corresponding output values ( ``neighbor_responses`` , a vector of ``k*_samples->rows`` elements), and the distances from the input vectors to the neighbors ( ``dist`` , also a vector of ``k*_samples->rows`` elements).
For each input vector, the neighbors are sorted by their distances to the vector.
If only a single input vector is passed, all output matrices are optional and the predicted value is returned by the method. ::
// ...
sample.data.fl[0] = (float)j;
sample.data.fl[1] = (float)i;
// estimate the response and get the neighbors' labels
response = knn.find_nearest(&sample,K,0,0,nearests,0);
// compute the number of neighbors representing the majority
// ...
ml. Machine Learning
********************
The Machine Learning Library (MLL) is a set of classes and functions for statistical classification, regression, and clustering of data.
Most of the classification and regression algorithms are implemented as C++ classes. As the algorithms have different sets of features (like an ability to handle missing measurements or categorical input variables), there is a little common ground between the classes. This common ground is defined by the class `CvStatModel` that all the other ML classes are derived from.
.. toctree::
:maxdepth: 2
Normal Bayes Classifier
=======================
This is a simple classification model assuming that feature vectors from each class are normally distributed (though not necessarily independently distributed). So, the whole data distribution function is assumed to be a Gaussian mixture, one component per class. Using the training data, the algorithm estimates mean vectors and covariance matrices for every class and then uses them for prediction.
[Fukunaga90] K. Fukunaga. *Introduction to Statistical Pattern Recognition*. 2nd ed., New York: Academic Press, 1990.
.. index:: CvNormalBayesClassifier
CvNormalBayesClassifier
-----------------------
.. c:type:: CvNormalBayesClassifier
Bayes classifier for normally distributed data ::
class CvNormalBayesClassifier : public CvStatModel
{
// ...
};

CvNormalBayesClassifier::train
------------------------------
Trains the model.
The method trains the Normal Bayes classifier. It follows the conventions of the generic ``train`` "method" with the following limitations: only CV_ROW_SAMPLE data layout is supported; the input variables are all ordered; the output variable is categorical (i.e. elements of ``_responses`` must be integer numbers, though the vector may have ``CV_32FC1`` type), and missing measurements are not supported.
The method trains the Normal Bayes classifier. It follows the conventions of the generic ``train`` "method" with the following limitations:
* Only ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
* Output variable is categorical , which means that elements of ``_responses`` must be integer numbers, though the vector may have the ``CV_32FC1`` type.
* Missing measurements are not supported.
In addition, there is an ``update`` flag that identifies whether the model should be trained from scratch ( ``update=false`` ) or should be updated using the new training data ( ``update=true`` ).
CvNormalBayesClassifier::predict
--------------------------------
.. c:function:: float CvNormalBayesClassifier::predict( const CvMat* samples, CvMat* results=0 ) const
Predicts the response for sample(s).
The method ``predict`` estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``samples`` . In case of multiple input vectors, there should be one output vector ``results`` . The predicted class for a single input vector is returned by the method.
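A minimal train/predict sketch (matrix names and sizes are illustrative, and the ``train`` argument order beyond the first two parameters is an assumption) ::

    const int n_samples = 100, n_features = 4, n_test = 10;
    CvMat* train_data = cvCreateMat( n_samples, n_features, CV_32FC1 );
    CvMat* responses  = cvCreateMat( n_samples, 1, CV_32FC1 );   // integer labels stored as floats
    // ... fill train_data and responses ...

    CvNormalBayesClassifier bayes;
    bayes.train( train_data, responses );          // estimate per-class means and covariances

    CvMat* test_data = cvCreateMat( n_test, n_features, CV_32FC1 );
    CvMat* results   = cvCreateMat( n_test, 1, CV_32FC1 );
    // ... fill test_data ...
    bayes.predict( test_data, results );           // one predicted class per input row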
Random Trees
============
Random trees have been introduced by Leo Breiman and Adele Cutler:
http://www.stat.berkeley.edu/users/breiman/RandomForests/
. The algorithm can deal with both classification and regression problems. Random trees is a collection (ensemble) of tree predictors that is called
*forest*
further in this section (the term has also been introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that received the majority of "votes". In case of regression, the classifier response is the average of the responses over all the trees in the forest.
All the trees are trained with the same parameters but on different training sets that are generated from the original training set using the bootstrap procedure: for each training set, you randomly select the same number of vectors as in the original set ( ``=N`` ). The vectors are chosen with replacement. That is, some vectors will occur more than once and some will be absent. At each node of each trained tree, not all the variables are used to find the best split but rather a random subset of them. With each node a new subset is generated. However, its size is fixed for all the nodes and all the trees. It is a training parameter set to
:math:`\sqrt{number\_of\_variables}` by default. None of the built trees are pruned.
In random trees there is no need for any accuracy estimation procedures, such as cross-validation or bootstrap, or a separate test set to get an estimate of the training error. The error is estimated internally during the training. When the training set for the current tree is drawn by sampling with replacement, some vectors are left out (so-called
*oob (out-of-bag) data*
). The size of oob data is about ``N/3`` . The classification error is estimated by using this oob-data as follows:
#.
Get a prediction for each vector that is oob relative to the i-th tree, using the i-th tree itself.

#.
After all the trees have been trained, for each vector that has ever been oob, find the class-"winner" for it (the class that has got the majority of votes in the trees where the vector was oob) and compare it to the ground-truth response.

#.
Compute the classification error estimate as the ratio of the number of misclassified oob vectors to all the vectors in the original data. In case of regression, the oob-error is computed as the squared difference between the predicted and the ground-truth responses of the oob vectors, divided by the total number of vectors.
**References:**
CvRTParams
----------
.. c:type:: CvRTParams
Training parameters of random trees ::
struct CvRTParams : public CvDTreeParams
{
// ...
};
The set of training parameters for the forest is a superset of the training parameters for a single tree. However, random trees do not need all the functionality/features of decision trees. Most noticeably, the trees are not pruned, so the cross-validation parameters are not used.
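A minimal sketch of filling in the training parameters is shown below. The field names follow the OpenCV ``CvDTreeParams``/``CvRTParams`` layout and should be treated as an assumption, since the full declaration is not listed above; the values are illustrative. ::

    CvRTParams params;                         // default-constructed training parameters
    params.max_depth           = 10;           // limit the depth of each tree
    params.min_sample_count    = 5;            // do not split nodes with fewer samples
    params.calc_var_importance = true;         // compute variable importance during training
    params.nactive_vars        = 4;            // size of the random variable subset per node
    params.term_crit = cvTermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
                                       100,    // grow at most 100 trees
                                       0.01 ); // or stop when the oob error is small enough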
.. index:: CvRTrees
CvRTrees
--------
.. c:type:: CvRTrees
Random trees ::
class CvRTrees : public CvStatModel
{
// ...
};

CvRTrees::train
---------------
.. c:function:: bool CvRTrees::train( const CvMat* train_data, int tflag, const CvMat* responses, const CvMat* comp_idx=0, const CvMat* sample_idx=0, const CvMat* var_type=0, const CvMat* missing_mask=0, CvRTParams params=CvRTParams() )
Trains the Random Trees model.
The method ``CvRTrees::train`` is very similar to the first form of ``CvDTree::train`` () and follows the generic method ``CvStatModel::train`` conventions. All the parameters specific to the algorithm training are passed as a
:ref:`CvRTParams` instance. The estimate of the training error ( ``oob-error`` ) is stored in the protected class member ``oob_error`` .
.. index:: CvRTrees::predict
CvRTrees::predict
-----------------
.. c:function:: double CvRTrees::predict( const CvMat* sample, const CvMat* missing=0 ) const
Predicts the output for an input sample.
The input parameters of the prediction method are the same as in ``CvDTree::predict`` but the return value type is different. This method returns the cumulative result from all the trees in the forest (the class that receives the majority of votes, or the mean of the regression function estimates).
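A brief train/predict sketch (assuming ``params`` is a ``CvRTParams`` instance such as the one shown earlier; data names and sizes are illustrative) ::

    const int n_samples = 100, n_vars = 8;
    CvMat* data      = cvCreateMat( n_samples, n_vars, CV_32FC1 );
    CvMat* responses = cvCreateMat( n_samples, 1, CV_32FC1 );
    // ... fill data and responses ...

    // mark all input variables as ordered and the response as categorical
    CvMat* var_type = cvCreateMat( n_vars + 1, 1, CV_8UC1 );
    cvSet( var_type, cvScalarAll(CV_VAR_ORDERED) );
    cvSetReal1D( var_type, n_vars, CV_VAR_CATEGORICAL );

    CvRTrees forest;
    forest.train( data, CV_ROW_SAMPLE, responses, 0, 0, var_type, 0, params );

    CvMat* sample = cvCreateMat( 1, n_vars, CV_32FC1 );   // a single sample to classify
    // ... fill sample ...
    double prediction = forest.predict( sample );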
.. index:: CvRTrees::get_var_importance
CvRTrees::get_var_importance
----------------------------
Retrieves the variable importance array.
The method returns the variable importance vector, computed at the training stage when ``CvRTParams::calc_var_importance`` is set. If the training flag is not set, the ``NULL`` pointer is returned. This differs from the decision trees, where variable importance can be computed anytime after the training.
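A short sketch follows; it assumes the method returns a pointer to a single-row floating-point matrix with one score per variable, which matches the ``NULL``-return behavior described above. ::

    const CvMat* importance = forest.get_var_importance();
    if( importance )   // non-NULL only if calc_var_importance was set before training
    {
        for( int i = 0; i < importance->cols; i++ )
            printf( "variable %d importance: %.4f\n", i, importance->data.fl[i] );
    }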
.. index:: CvRTrees::get_proximity
CvRTrees::get_proximity
-----------------------
Retrieves the proximity measure between two training samples.
The method returns proximity measure between any two samples, which is the ratio of those trees in the ensemble, in which the samples fall into the same leaf node, to the total number of the trees.
Example: Prediction of mushroom goodness using the random-tree classifier ::
#include <float.h>
#include <stdio.h>
// ...
Support Vector Machines
=======================
.. highlight:: cpp
Originally, support vector machines (SVM) was a technique for building an optimal binary (2-class) classifier. Later the technique was extended to regression and clustering problems. SVM is a particular case of kernel-based methods. It maps feature vectors into a higher-dimensional space using a kernel function and builds an optimal linear discriminating function in this space or an optimal hyper-plane that fits the training data. In case of SVM, the kernel is not defined explicitly. Instead, a distance between any 2 points in the hyper-space needs to be defined.
The solution is optimal, which means that the margin between the separating hyper-plane and the nearest feature vectors from both classes (in case of a 2-class classifier) is maximal. The feature vectors that are the closest to the hyper-plane are called "support vectors", which means that the position of other vectors does not affect the hyper-plane (the decision function).
There are a lot of good references on SVM. You may consider starting with the following:
*
[Burges98] C. Burges. *A tutorial on support vector machines for pattern recognition*, Knowledge Discovery and Data Mining 2(2), 1998.
(available online at
http://citeseer.ist.psu.edu/burges98tutorial.html
).
*
Chih-Chung Chang and Chih-Jen Lin. *LIBSVM - A Library for Support Vector Machines*
(
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
)
CvSVM
-----
.. c:type:: CvSVM
Support Vector Machines ::
class CvSVM : public CvStatModel
{
// ...
};

CvSVMParams
-----------
.. c:type:: CvSVMParams
SVM training parameters ::
struct CvSVMParams
{
// ...
};

CvSVM::train
------------
Trains SVM.
The method trains the SVM model. It follows the conventions of the generic ``train`` "method" with the following limitations: only the CV_ROW_SAMPLE data layout is supported, the input variables are all ordered, the output variables can be either categorical ( ``_params.svm_type=CvSVM::C_SVC`` or ``_params.svm_type=CvSVM::NU_SVC`` ), or ordered ( ``_params.svm_type=CvSVM::EPS_SVR`` or ``_params.svm_type=CvSVM::NU_SVR`` ), or not required at all ( ``_params.svm_type=CvSVM::ONE_CLASS`` ), missing measurements are not supported.
The method trains the SVM model. It follows the conventions of the generic ``train`` "method" with the following limitations:
All the other parameters are gathered in
* Only the ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
* Output variables can be either categorical ( ``_params.svm_type=CvSVM::C_SVC`` or ``_params.svm_type=CvSVM::NU_SVC`` ), or ordered ( ``_params.svm_type=CvSVM::EPS_SVR`` or ``_params.svm_type=CvSVM::NU_SVR`` ), or not required at all ( ``_params.svm_type=CvSVM::ONE_CLASS`` ).
* Missing measurements are not supported.
All the other parameters are gathered in the
:ref:`CvSVMParams` structure.
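A minimal training sketch (parameter values are illustrative; ``train_data`` and ``responses`` are ``CV_32FC1`` matrices with one sample per row, as in the earlier examples, and both the argument order of ``train`` and the single-sample ``predict`` call are assumptions) ::

    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;        // n-class classification with a penalty multiplier C
    params.kernel_type = CvSVM::RBF;          // radial basis function kernel
    params.gamma       = 0.5;
    params.C           = 1.0;
    params.term_crit   = cvTermCriteria( CV_TERMCRIT_ITER, 1000, 1e-6 );

    CvSVM svm;
    svm.train( train_data, responses, 0, 0, params );
    float label = svm.predict( sample );      // sample: a 1 x n_features CV_32FC1 row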
.. index:: CvSVM::train_auto
CvSVM::train_auto
-----------------
Trains SVM with optimal parameters.
:param k_fold: Cross-validation parameter. The training set is divided into ``k_fold`` subsets. One subset is used to train the model, the others form the test set. So, the SVM algorithm is executed ``k_fold`` times.
The method trains the SVM model automatically by choosing the optimal
parameters ``C`` , ``gamma`` , ``p`` , ``nu`` , ``coef0`` , ``degree`` from
:ref:`CvSVMParams`. Parameters are considered optimal
when the cross-validation estimate of the test set error
is minimal. The parameters are iterated by a logarithmic grid, for
example, the parameter ``gamma`` takes the values in the set
(
:math:`min`,
:math:`min*step`,
:math:`min*{step}^2` , ...
:math:`min*{step}^n` )
where
:math:`min` is ``gamma_grid.min_val`` ,
:math:`step` is ``gamma_grid.step`` , and
:math:`n` is the maximal index such that
.. math::
    min*{step}^n < \texttt{gamma\_grid.max\_val}
So ``step`` must always be greater than 1.
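For example, with ``gamma_grid.min_val = 0.01`` and ``gamma_grid.step = 10`` , the values tried for ``gamma`` are 0.01, 0.1, 1, 10, and so on, as long as they stay below ``gamma_grid.max_val`` .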
If there is no need to optimize a parameter, the corresponding grid step should be set to any value less or equal to 1. For example, to avoid optimization in ``gamma`` , set ``gamma_grid.step = 0`` , ``gamma_grid.min_val`` , ``gamma_grid.max_val`` as arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma`` .
And, finally, if the optimization in a parameter is required but
the corresponding grid is unknown, you may call the function ``CvSVM::get_default_grid`` . To generate a grid, for example, for ``gamma`` , call ``CvSVM::get_default_grid(CvSVM::GAMMA)`` .
This function works for the case of classification
( ``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC`` )
as well as for regression
( ``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR`` ). If ``params.svm_type=CvSVM::ONE_CLASS`` , no optimization is made and the usual SVM with parameters specified in ``params`` is executed.
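A brief sketch (the ``train_auto`` argument order beyond ``params`` and ``k_fold`` , as well as the ``get_params`` return type, are assumptions) ::

    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;

    CvSVM svm;
    svm.train_auto( train_data, responses, 0, 0, params, 10 );   // 10-fold cross-validation
    CvSVMParams best = svm.get_params();    // C and gamma selected by the grid search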
.. index:: CvSVM::get_default_grid
CvSVM::get_default_grid
-----------------------
.. c:function:: CvParamGrid CvSVM::get_default_grid( int param_id )
Generates a grid for SVM parameters.
:param param_id: SVM parameter ID that must be one of the following:
* **CvSVM::C**
CvSVM::get_params
-----------------
Returns the current SVM parameters.
This function may be used to get the optimal parameters obtained while automatically training ``CvSVM::train_auto`` .
.. index:: CvSVM::get_support_vector*
CvSVM::get_support_vector*
--------------------------
.. c:function:: const float* CvSVM::get_support_vector(int i) const
Retrieves the number of support vectors and the particular vector.
The methods can be used to retrieve a set of support vectors.
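For example ( ``get_support_vector_count()`` is assumed here from the ``get_support_vector*`` naming above) ::

    int sv_count = svm.get_support_vector_count();
    for( int i = 0; i < sv_count; i++ )
    {
        const float* sv = svm.get_support_vector(i);    // pointer to the i-th support vector
        printf( "support vector %d starts with %f\n", i, sv[0] );
    }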