The algorithm caches all training samples and predicts the response for a new sample by analyzing a certain number (**K**) of the nearest neighbors of the sample using voting, calculating a weighted sum, and so on. The method is sometimes referred to as "learning by example" because, for prediction, it looks for the feature vector with a known response that is closest to the given vector.
.. index:: CvKNearest
CvKNearest
----------
.. ocv:class:: CvKNearest
K-Nearest Neighbors model. ::
    class CvKNearest : public CvStatModel
    {
Trains the model.
The method trains the K-Nearest model. It follows the conventions of the generic ``train`` approach with the following limitations:
* Only ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
* Output variables can be either categorical ( ``is_regression=false`` ) or ordered ( ``is_regression=true`` ).
If only a single input vector is passed, all output matrices are optional and the predicted value is returned by the method.
The sample below is a minimal sketch (not the original sample, which used the obsolete ``CvMat`` structures) demonstrating the use of the k-nearest classifier for 2D point classification with the C++ ``Mat``-based interface: ::
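    // A minimal sketch rather than the original sample. Two Gaussian point
    // clouds serve as the two classes; data layout is one CV_32F sample per
    // row, matching the CV_ROW_SAMPLE convention described above.
    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>
    #include <cstdio>

    int main()
    {
        const int K = 10;
        cv::Mat trainData(200, 2, CV_32FC1), responses(200, 1, CV_32FC1);
        cv::RNG rng;

        // 100 training points per class, drawn from two normal distributions.
        rng.fill(trainData.rowRange(0, 100), cv::RNG::NORMAL,
                 cv::Scalar(100), cv::Scalar(30));
        rng.fill(trainData.rowRange(100, 200), cv::RNG::NORMAL,
                 cv::Scalar(300), cv::Scalar(30));
        responses.rowRange(0, 100)   = cv::Scalar(1);
        responses.rowRange(100, 200) = cv::Scalar(2);

        // The constructor trains the model; K bounds k in later queries.
        CvKNearest knn(trainData, responses, cv::Mat(), false, K);

        // Classify one query point by voting among its K nearest neighbors.
        cv::Mat sample = (cv::Mat_<float>(1, 2) << 150.f, 150.f);
        float response = knn.find_nearest(sample, K);
        std::printf("predicted class: %.0f\n", response);
        return 0;
    }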
.. highlight:: cpp

ML implements feed-forward artificial neural networks or, more particularly, multi-layer perceptrons (MLP), the most commonly used type of neural networks. MLP consists of the input layer, output layer, and one or more hidden layers. Each layer of MLP includes one or more neurons directionally linked with the neurons from the previous and the next layer. The example below represents a 3-layer perceptron with three inputs, two outputs, and the hidden layer including five neurons:
.. image:: pics/mlp.png
So, the whole trained network works as follows:
#. Take the feature vector as input. The vector size is equal to the size of the input layer.
#. Pass values as input to the first hidden layer.
#. Compute outputs of the hidden layer using the weights and the activation functions.
#. Pass outputs further downstream until you compute the output layer, as the sketch after this list shows.
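A minimal sketch (the toy data set and all parameter choices are illustrative assumptions, not from the original sample): it builds the 3-5-2 perceptron shown in the picture, trains it with the default parameters, and runs the feed-forward pass just described through ``CvANN_MLP::predict``: ::

    #include <opencv2/core/core.hpp>
    #include <opencv2/ml/ml.hpp>

    int main()
    {
        // Layer sizes: 3 inputs, 5 hidden neurons, 2 outputs, as in the picture.
        cv::Mat layers = (cv::Mat_<int>(1, 3) << 3, 5, 2);
        CvANN_MLP mlp;
        mlp.create(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);

        // A toy training set: 4 samples with 3 features and 2 outputs each.
        cv::Mat inputs = (cv::Mat_<float>(4, 3) <<
            0, 0, 1,   0, 1, 0,   1, 0, 0,   1, 1, 1);
        cv::Mat outputs = (cv::Mat_<float>(4, 2) <<
            1, 0,   1, 0,   0, 1,   0, 1);
        mlp.train(inputs, outputs, cv::Mat(), cv::Mat(),
                  CvANN_MLP_TrainParams(), 0);

        // Steps 1-4 happen inside predict(): the sample enters the input
        // layer, flows through the hidden layer, and fills the output vector.
        cv::Mat sample = (cv::Mat_<float>(1, 3) << 0, 1, 1);
        cv::Mat response(1, 2, CV_32FC1);
        mlp.predict(sample, response);
        return 0;
    }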
So, to compute the network, you need to know all the weights.
size reaches a limit. Besides, the larger networks are trained much
longer than the smaller ones, so it is reasonable to pre-process the data,
using :ocv:func:`PCA::operator ()` or a similar technique, and train a smaller network
on only essential features.
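For instance, a sketch of such pre-processing (``trainData`` and ``sample`` are assumed to be ``CV_32F`` matrices with one sample per row; the number of retained components is an arbitrary illustration): ::

    // Compress the feature vectors before training a smaller network.
    // Keeping 10 principal components is an illustrative choice.
    cv::PCA pca(trainData, cv::Mat(), CV_PCA_DATA_AS_ROW, 10);
    cv::Mat reducedTrainData = pca.project(trainData);
    // ... train the network on reducedTrainData instead of trainData ...

    // At prediction time, project every new sample the same way.
    cv::Mat reducedSample = pca.project(sample);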
Another MLP feature is an inability to handle categorical
data as is. However, there is a workaround. If a certain feature in the
input or output (in case of an ``n`` -class classifier for
:math:`n>2` ) layer is categorical and can take
CvANN_MLP_TrainParams
---------------------
.. ocv:class:: CvANN_MLP_TrainParams
Parameters of the MLP training algorithm. ::
    struct CvANN_MLP_TrainParams
    {
CvANN_MLP
---------
.. ocv:class:: CvANN_MLP
MLP model. ::
    class CvANN_MLP : public CvStatModel
    {
:param _flags: Various parameters to control the training algorithm. A combination of the following parameters is possible:
    * **UPDATE_WEIGHTS = 1** Algorithm updates the network weights, rather than computes them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
    * **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation equal to 1. If the network is assumed to be updated frequently, the new training data could be much different from the original one. In this case, you should take care of proper normalization.
    * **NO_OUTPUT_SCALE** Algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently, by transforming it to a certain range depending on the activation function used. A combined use of these flags is sketched after this list.
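A sketch of an incremental update that combines these flags (``mlp`` is assumed to be an already trained ``CvANN_MLP``, and ``newInputs``/``newOutputs`` properly scaled new data): ::

    // Update the existing weights on new data instead of retraining from
    // scratch, and skip the built-in normalization because the caller has
    // already scaled both inputs and outputs.
    mlp.train(newInputs, newOutputs, cv::Mat(), cv::Mat(),
              CvANN_MLP_TrainParams(),
              CvANN_MLP::UPDATE_WEIGHTS +
              CvANN_MLP::NO_INPUT_SCALE +
              CvANN_MLP::NO_OUTPUT_SCALE);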
.. highlight:: cpp

This simple classification model assumes that feature vectors from each class are normally distributed (though, not necessarily independently distributed). So, the whole data distribution function is assumed to be a Gaussian mixture, one component per class. Using the training data the algorithm estimates mean vectors and covariance matrices for every class, and then it uses them for prediction.
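A minimal sketch of the typical workflow (``trainData``, ``responses``, and ``sample`` are assumed to be prepared by the caller, with one ``CV_32F`` sample per row): ::

    // Estimate per-class means and covariance matrices from the training
    // data, then predict the most probable class of a new sample.
    CvNormalBayesClassifier bayes;
    bayes.train(trainData, responses);
    float predictedClass = bayes.predict(sample);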
[Fukunaga90] K. Fukunaga. *Introduction to Statistical Pattern Recognition*. second ed., New York: Academic Press, 1990.
CvNormalBayesClassifier
-----------------------
.. ocv:class:: CvNormalBayesClassifier
Bayes classifier for normally distributed data. ::
    class CvNormalBayesClassifier : public CvStatModel
    {
Trains the model.
The method trains the Normal Bayes classifier. It follows the conventions of the generic ``train`` approach with the following limitations:
* Only ``CV_ROW_SAMPLE`` data layout is supported.
Originally, support vector machines (SVM) were a technique for building an optimal binary (2-class) classifier. Later the technique was extended to regression and clustering problems. SVM is a special case of kernel-based methods. It maps feature vectors into a higher-dimensional space using a kernel function and builds an optimal linear discriminating function in this space or an optimal hyper-plane that fits into the training data. In case of SVM, the kernel is not defined explicitly. Instead, a distance between any 2 points in the hyper-space needs to be defined.
The solution is optimal, which means that the margin between the separating hyper-plane and the nearest feature vectors from both classes (in case of 2-class classifier) is maximal. The feature vectors that are the closest to the hyper-plane are called *support vectors*, which means that the position of other vectors does not affect the hyper-plane (the decision function).
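A minimal sketch of a 2-class classifier (``trainData``, ``responses``, and ``sample`` are assumed to be prepared by the caller; the parameter values are illustrative, not recommendations): ::

    // C-support vector classification with a radial basis function kernel.
    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;
    params.gamma       = 0.5;   // illustrative values; real data needs
    params.C           = 1.0;   // tuning, e.g. with CvSVM::train_auto below

    CvSVM svm;
    svm.train(trainData, responses, cv::Mat(), cv::Mat(), params);

    // Predict the class label of a new sample.
    float label = svm.predict(sample);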
There are a lot of good references on SVM. You may consider starting with the following:
CvSVM
-----
.. ocv:class:: CvSVM
Support Vector Machines. ::
    class CvSVM : public CvStatModel
    {
CvSVMParams
-----------
.. ocv:class:: CvSVMParams
SVM training parameters. ::
    struct CvSVMParams
    {
The structure must be initialized and passed to the training method of :ocv:class:`CvSVM`.
The method trains the SVM model. It follows the conventions of the generic ``train`` approach with the following limitations:
* Only the ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
* Output variables can be either categorical ( ``_params.svm_type=CvSVM::C_SVC`` or ``_params.svm_type=CvSVM::NU_SVC`` ), or ordered ( ``_params.svm_type=CvSVM::EPS_SVR`` or ``_params.svm_type=CvSVM::NU_SVR`` ), or not required at all ( ``_params.svm_type=CvSVM::ONE_CLASS`` ).
:param k_fold: Cross-validation parameter. The training set is divided into ``k_fold`` subsets. One subset is used to train the model, the others form the test set. So, the SVM algorithm is executed ``k_fold`` times.
The method trains the SVM model automatically by choosing the optimal
:ocv:class:`CvSVMParams`. Parameters are considered optimal
when the cross-validation estimate of the test set error
is minimal. The parameters are iterated by a logarithmic grid, for
example, the parameter ``gamma`` takes values in the set
( :math:`min`, :math:`min*step`, :math:`min*{step}^2`, ... :math:`min*{step}^n` )
where
:math:`min` is ``gamma_grid.min_val`` ,
:math:`step` is ``gamma_grid.step`` , and
:math:`n` is the maximal index such that
.. math::

    min*{step}^n < \texttt{gamma\_grid.max\_val}
So ``step`` must always be greater than 1.
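For example, with ``gamma_grid.min_val = 0.02``, ``gamma_grid.step = 5``, and ``gamma_grid.max_val = 4``, the values 0.02, 0.1, 0.5, and 2.5 would be tried for ``gamma`` (the next value, 12.5, already exceeds ``max_val``).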
If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma``, set ``gamma_grid.step = 0`` and set ``gamma_grid.min_val`` and ``gamma_grid.max_val`` to arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma``.
And, finally, if the optimization in a parameter is required but
the corresponding grid is unknown, you may call the function ``CvSVM::get_default_grid`` . To generate a grid, for example, for ``gamma`` , call ``CvSVM::get_default_grid(CvSVM::GAMMA)`` .
This function works for the classification
( ``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC`` )
as well as for the regression
( ``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR`` ). If ``params.svm_type=CvSVM::ONE_CLASS`` , no optimization is made and the usual SVM with parameters specified in ``params`` is executed.
* **CvSVM::DEGREE**
The grid is generated for the parameter with this ID.
The function generates a grid for the specified parameter of the SVM algorithm. The grid may be passed to the function ``CvSVM::train_auto`` .
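A sketch that ties ``CvSVM::get_default_grid`` and ``CvSVM::train_auto`` together (``trainData`` and ``responses`` are assumed to be prepared as for ``CvSVM::train``): ::

    // Optimize C by 10-fold cross-validation but keep gamma fixed: its grid
    // step is set below 1, which disables optimization for that parameter.
    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;
    params.gamma       = 0.1;   // used as is, since its grid is disabled

    CvParamGrid gammaGrid = CvSVM::get_default_grid(CvSVM::GAMMA);
    gammaGrid.step = 0;         // any value <= 1 turns the iteration off

    CvSVM svm;
    svm.train_auto(trainData, responses, cv::Mat(), cv::Mat(), params,
                   10,                                 // k_fold
                   CvSVM::get_default_grid(CvSVM::C),  // C is iterated
                   gammaGrid);                         // gamma is not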