Commit 501033db authored by Vadim Pisarevsky

integrated grammar fixes from tech writer (part I)

parent 84e4f597
-K Nearest Neighbors
+K-Nearest Neighbors
===================
+.. highlight:: cpp
The algorithm caches all training samples and predicts the response for a new sample by analyzing a certain number (
**K**
-) of the nearest neighbors of the sample (using voting, calculating weighted sum, and so on). The method is sometimes referred to as "learning by example" because for prediction it looks for the feature vector with a known response that is closest to the given vector.
+) of the nearest neighbors of the sample using voting, calculating weighted sum, and so on. The method is sometimes referred to as "learning by example" because for prediction it looks for the feature vector with a known response that is closest to the given vector.
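For illustration, a hedged sketch of this prediction step with the C++ interface (the data shapes and the value of **K** here are made up for the example): ::

    // training data: one sample per row (CV_32F), responses: one label per sample
    cv::Mat trainData(100, 2, CV_32F), responses(100, 1, CV_32F);
    // ... fill trainData and responses ...
    CvKNearest knn;
    knn.train(trainData, responses);
    cv::Mat sample = (cv::Mat_<float>(1, 2) << 10.f, 20.f);
    float predicted = knn.find_nearest(sample, 5);   // vote among the 5 nearest neighbors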
.. index:: CvKNearest
@@ -11,9 +13,9 @@ The algorithm caches all training samples and predicts the response for a new sa
CvKNearest
----------
-.. c:type:: CvKNearest
+.. ocv:class:: CvKNearest
-K-Nearest Neighbors model ::
+K-Nearest Neighbors model. ::
class CvKNearest : public CvStatModel
{
@@ -53,7 +55,8 @@ CvKNearest::train
Trains the model.
-The method trains the K-Nearest model. It follows the conventions of the generic ``train`` "method" with the following limitations:
+The method trains the K-Nearest model. It follows the conventions of the generic ``train`` approach with the following limitations:
* Only ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
* Output variables can be either categorical ( ``is_regression=false`` ) or ordered ( ``is_regression=true`` ).
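In code, these limitations simply mean that every training sample occupies one row and the responses form a single column (a sketch with illustrative sizes): ::

    const int n = 100, dims = 2;
    cv::Mat trainData(n, dims, CV_32F);   // CV_ROW_SAMPLE: rows are samples, all inputs ordered
    cv::Mat responses(n, 1, CV_32F);      // class labels or regression targets
    CvKNearest knn;
    knn.train(trainData, responses, cv::Mat(), false /* is_regression */, 32 /* max_k */);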
@@ -87,7 +90,7 @@ For each input vector, the neighbors are sorted by their distances to the vector
If only a single input vector is passed, all output matrices are optional and the predicted value is returned by the method.
-The sample below (currently using the obsolete ``CvMat`` structures) demonstrates the use of the k-nearest classifier for 2D point classification ::
+The sample below (currently using the obsolete ``CvMat`` structures) demonstrates the use of the k-nearest classifier for 2D point classification: ::
#include "ml.h"
#include "highgui.h"
......
Neural Networks
===============
-ML implements feed-forward artificial neural networks, more particularly, multi-layer perceptrons (MLP), the most commonly used type of neural networks. MLP consists of the input layer, output layer, and one or more hidden layers. Each layer of MLP includes one or more neurons that are directionally linked with the neurons from the previous and the next layer. The example below represents a 3-layer perceptron with three inputs, two outputs, and the hidden layer including five neurons:
+.. highlight:: cpp
+ML implements feed-forward artificial neural networks or, more particularly, multi-layer perceptrons (MLP), the most commonly used type of neural networks. MLP consists of the input layer, output layer, and one or more hidden layers. Each layer of MLP includes one or more neurons directionally linked with the neurons from the previous and the next layer. The example below represents a 3-layer perceptron with three inputs, two outputs, and the hidden layer including five neurons:
.. image:: pics/mlp.png
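For example, the depicted 3-5-2 topology could be created as follows (a sketch; the activation function and its parameters are just the defaults): ::

    // one row listing the number of neurons per layer: 3 inputs, 5 hidden, 2 outputs
    cv::Mat layerSizes = (cv::Mat_<int>(1, 3) << 3, 5, 2);
    CvANN_MLP mlp;
    mlp.create(layerSizes, CvANN_MLP::SIGMOID_SYM, 0, 0);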
@@ -45,10 +47,13 @@ In ML, all the neurons have the same activation functions, with the same free pa
So, the whole trained network works as follows:
-#. It takes the feature vector as input. The vector size is equal to the size of the input layer.
-#. Values are passed as input to the first hidden layer.
-#. Outputs of the hidden layer are computed using the weights and the activation functions.
-#. Outputs are passed further downstream until you compute the output layer.
+#. Take the feature vector as input. The vector size is equal to the size of the input layer.
+#. Pass values as input to the first hidden layer.
+#. Compute outputs of the hidden layer using the weights and the activation functions.
+#. Pass outputs further downstream until you compute the output layer.
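Schematically, one layer of this forward pass is a weighted sum followed by the activation function. A toy sketch (not the library's internal code; ``tanhf`` stands in for the symmetric sigmoid): ::

    #include <math.h>

    // one layer of the forward pass: out[j] = f( sum_i w[j][i]*in[i] + b[j] )
    void forward_layer(const float* in, int n_in, const float* const* w,
                       const float* b, float* out, int n_out)
    {
        for (int j = 0; j < n_out; j++)
        {
            float s = b[j];                 // bias term
            for (int i = 0; i < n_in; i++)
                s += w[j][i] * in[i];       // weighted sum of the previous layer
            out[j] = tanhf(s);              // activation function
        }
    }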
So, to compute the network, you need to know all the
weights
@@ -66,10 +71,10 @@ so the error on the test set usually starts increasing after the network
size reaches a limit. Besides, the larger networks are trained much
longer than the smaller ones, so it is reasonable to pre-process the data,
using
-:ref:`PCA::operator ()` or similar technique, and train a smaller network
+:ocv:func:`PCA::operator ()` or similar technique, and train a smaller network
on only essential features.
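A hedged sketch of such preprocessing (the sizes and the component count of 10 are arbitrary): ::

    // project 50-dimensional samples onto their 10 principal components
    cv::Mat trainData(100, 50, CV_32F);   // illustrative training data
    cv::PCA pca(trainData, cv::Mat(), CV_PCA_DATA_AS_ROW, 10);
    cv::Mat reduced = pca.project(trainData);   // 100 x 10, input for a smaller MLP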
-Another feature of MLP's is their inability to handle categorical
+Another MLP feature is an inability to handle categorical
data as is. However, there is a workaround. If a certain feature in the
input or output (in case of ``n`` -class classifier for
:math:`n>2` ) layer is categorical and can take
@@ -101,9 +106,9 @@ References:
CvANN_MLP_TrainParams
---------------------
-.. c:type:: CvANN_MLP_TrainParams
+.. ocv:class:: CvANN_MLP_TrainParams
-Parameters of the MLP training algorithm ::
+Parameters of the MLP training algorithm. ::
struct CvANN_MLP_TrainParams
{
@@ -134,9 +139,9 @@ The structure has a default constructor that initializes parameters for the ``RP
CvANN_MLP
---------
-.. c:type:: CvANN_MLP
+.. ocv:class:: CvANN_MLP
-MLP model ::
+MLP model. ::
class CvANN_MLP : public CvStatModel
{
@@ -259,9 +264,9 @@ CvANN_MLP::train
:param _flags: Various parameters to control the training algorithm. A combination of the following parameters is possible:
-* **UPDATE_WEIGHTS = 1** Algorithm updates the network weights, rather than computes them from scratch (in the latter case the weights are initialized using the Nguyen-Widrow algorithm).
+* **UPDATE_WEIGHTS = 1** Algorithm updates the network weights, rather than computes them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
-* **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation =1. If the network is assumed to be updated frequently, the new training data could be much different from original one. In this case, you should take care of proper normalization.
+* **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation equal to 1. If the network is assumed to be updated frequently, the new training data could be much different from the original one. In this case, you should take care of proper normalization.
* **NO_OUTPUT_SCALE** Algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently, by transforming it to a certain range depending on the activation function used.
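If ``NO_INPUT_SCALE`` is set, such normalization could be done manually, for example (a sketch; the data shape is illustrative): ::

    cv::Mat inputs(100, 4, CV_32F);   // training inputs, one sample per row
    for (int j = 0; j < inputs.cols; j++)
    {
        cv::Scalar mean, stddev;
        cv::meanStdDev(inputs.col(j), mean, stddev);
        if (stddev[0] > 0)   // shift to zero mean, scale to unit deviation
            inputs.col(j) = (inputs.col(j) - mean[0]) / stddev[0];
    }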
......
@@ -3,7 +3,9 @@
Normal Bayes Classifier
=======================
-This is a simple classification model assuming that feature vectors from each class are normally distributed (though, not necessarily independently distributed). So, the whole data distribution function is assumed to be a Gaussian mixture, one component per class. Using the training data the algorithm estimates mean vectors and covariance matrices for every class, and then it uses them for prediction.
+.. highlight:: cpp
+This simple classification model assumes that feature vectors from each class are normally distributed (though, not necessarily independently distributed). So, the whole data distribution function is assumed to be a Gaussian mixture, one component per class. Using the training data the algorithm estimates mean vectors and covariance matrices for every class, and then it uses them for prediction.
[Fukunaga90] K. Fukunaga. *Introduction to Statistical Pattern Recognition*. second ed., New York: Academic Press, 1990.
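A minimal usage sketch (the data shapes are illustrative): ::

    cv::Mat trainData(100, 5, CV_32F), responses(100, 1, CV_32F);
    // ... fill with row samples and integer class labels ...
    CvNormalBayesClassifier bayes;
    bayes.train(trainData, responses);
    cv::Mat sample = trainData.row(0).clone();
    float cls = bayes.predict(sample);   // the most probable class for the sample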
@@ -11,9 +13,9 @@ This is a simple classification model assuming that feature vectors from each cl
CvNormalBayesClassifier
-----------------------
-.. c:type:: CvNormalBayesClassifier
+.. ocv:class:: CvNormalBayesClassifier
-Bayes classifier for normally distributed data ::
+Bayes classifier for normally distributed data. ::
class CvNormalBayesClassifier : public CvStatModel
{
@@ -50,7 +52,7 @@ CvNormalBayesClassifier::train
Trains the model.
-The method trains the Normal Bayes classifier. It follows the conventions of the generic ``train`` "method" with the following limitations:
+The method trains the Normal Bayes classifier. It follows the conventions of the generic ``train`` approach with the following limitations:
* Only ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
......
@@ -3,9 +3,9 @@ Support Vector Machines
.. highlight:: cpp
-Originally, support vector machines (SVM) was a technique for building an optimal binary (2-class) classifier. Later the technique has been extended to regression and clustering problems. SVM is a partial case of kernel-based methods. It maps feature vectors into a higher-dimensional space using a kernel function and builds an optimal linear discriminating function in this space or an optimal hyper-plane that fits into the training data. In case of SVM, the kernel is not defined explicitly. Instead, a distance between any 2 points in the hyper-space needs to be defined.
+Originally, support vector machines (SVM) was a technique for building an optimal binary (2-class) classifier. Later the technique was extended to regression and clustering problems. SVM is a partial case of kernel-based methods. It maps feature vectors into a higher-dimensional space using a kernel function and builds an optimal linear discriminating function in this space or an optimal hyper-plane that fits into the training data. In case of SVM, the kernel is not defined explicitly. Instead, a distance between any 2 points in the hyper-space needs to be defined.
-The solution is optimal, which means that the margin between the separating hyper-plane and the nearest feature vectors from both classes (in case of 2-class classifier) is maximal. The feature vectors that are the closest to the hyper-plane are called "support vectors", which means that the position of other vectors does not affect the hyper-plane (the decision function).
+The solution is optimal, which means that the margin between the separating hyper-plane and the nearest feature vectors from both classes (in case of 2-class classifier) is maximal. The feature vectors that are the closest to the hyper-plane are called *support vectors*, which means that the position of other vectors does not affect the hyper-plane (the decision function).
There are a lot of good references on SVM. You may consider starting with the following:
@@ -27,9 +27,9 @@ There are a lot of good references on SVM. You may consider starting with the fo
CvSVM
-----
-.. c:type:: CvSVM
+.. ocv:class:: CvSVM
-Support Vector Machines ::
+Support Vector Machines. ::
class CvSVM : public CvStatModel
{
@@ -90,9 +90,9 @@ Support Vector Machines ::
CvSVMParams
-----------
-.. c:type:: CvSVMParams
+.. ocv:class:: CvSVMParams
-SVM training parameters ::
+SVM training parameters. ::
struct CvSVMParams
{
@@ -117,7 +117,7 @@ SVM training parameters ::
The structure must be initialized and passed to the training method of
-:ref:`CvSVM` .
+:ocv:class:`CvSVM` .
.. index:: CvSVM::train
@@ -127,17 +127,20 @@ CvSVM::train
------------
.. ocv:function:: bool CvSVM::train( const Mat& _train_data, const Mat& _responses, const Mat& _var_idx=Mat(), const Mat& _sample_idx=Mat(), CvSVMParams _params=CvSVMParams() )
-Trains SVM.
+Trains an SVM.
-The method trains the SVM model. It follows the conventions of the generic ``train`` "method" with the following limitations:
+The method trains the SVM model. It follows the conventions of the generic ``train`` approach with the following limitations:
* Only the ``CV_ROW_SAMPLE`` data layout is supported.
* Input variables are all ordered.
* Output variables can be either categorical ( ``_params.svm_type=CvSVM::C_SVC`` or ``_params.svm_type=CvSVM::NU_SVC`` ), or ordered ( ``_params.svm_type=CvSVM::EPS_SVR`` or ``_params.svm_type=CvSVM::NU_SVR`` ), or not required at all ( ``_params.svm_type=CvSVM::ONE_CLASS`` ).
* Missing measurements are not supported.
All the other parameters are gathered in the
-:ref:`CvSVMParams` structure.
+:ocv:class:`CvSVMParams` structure.
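A minimal training sketch under these conventions (the kernel and parameter values are illustrative): ::

    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;   // categorical output
    params.kernel_type = CvSVM::RBF;
    params.gamma       = 0.5;
    params.C           = 1.0;
    cv::Mat trainData(100, 2, CV_32F), responses(100, 1, CV_32F);
    // ... fill with row samples and class labels ...
    CvSVM svm;
    svm.train(trainData, responses, cv::Mat(), cv::Mat(), params);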
.. index:: CvSVM::train_auto
@@ -147,16 +150,16 @@ CvSVM::train_auto
-----------------
.. ocv:function:: bool CvSVM::train_auto( const Mat& _train_data, const Mat& _responses, const Mat& _var_idx, const Mat& _sample_idx, CvSVMParams params, int k_fold = 10, CvParamGrid C_grid = get_default_grid(CvSVM::C), CvParamGrid gamma_grid = get_default_grid(CvSVM::GAMMA), CvParamGrid p_grid = get_default_grid(CvSVM::P), CvParamGrid nu_grid = get_default_grid(CvSVM::NU), CvParamGrid coef_grid = get_default_grid(CvSVM::COEF), CvParamGrid degree_grid = get_default_grid(CvSVM::DEGREE) )
-Trains SVM with optimal parameters.
+Trains an SVM with optimal parameters.
:param k_fold: Cross-validation parameter. The training set is divided into ``k_fold`` subsets. One subset is used to test the model, the others form the train set. So, the SVM algorithm is executed ``k_fold`` times.
The method trains the SVM model automatically by choosing the optimal
parameters ``C`` , ``gamma`` , ``p`` , ``nu`` , ``coef0`` , ``degree`` from
-:ref:`CvSVMParams`. Parameters are considered optimal
+:ocv:class:`CvSVMParams`. Parameters are considered optimal
when the cross-validation estimate of the test set error
is minimal. The parameters are iterated by a logarithmic grid, for
-example, the parameter ``gamma`` takes the values in the set
+example, the parameter ``gamma`` takes values in the set
(
:math:`min`,
:math:`min*step`,
@@ -165,7 +168,7 @@ example, the parameter ``gamma`` takes the values in the set
where
:math:`min` is ``gamma_grid.min_val`` ,
:math:`step` is ``gamma_grid.step`` , and
-:math:`n` is the maximal index such that
+:math:`n` is the maximal index where
.. math::
@@ -173,12 +176,12 @@ where
So ``step`` must always be greater than 1.
-If there is no need to optimize a parameter, the corresponding grid step should be set to any value less or equal to 1. For example, to avoid optimization in ``gamma`` , set ``gamma_grid.step = 0`` , ``gamma_grid.min_val`` , ``gamma_grid.max_val`` as arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma`` .
+If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma`` , set ``gamma_grid.step = 0`` , and set ``gamma_grid.min_val`` and ``gamma_grid.max_val`` to arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma`` .
And, finally, if the optimization in a parameter is required but
the corresponding grid is unknown, you may call the function ``CvSVM::get_default_grid`` . To generate a grid, for example, for ``gamma`` , call ``CvSVM::get_default_grid(CvSVM::GAMMA)`` .
-This function works for the case of classification
+This function works for the classification
( ``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC`` )
as well as for the regression
( ``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR`` ). If ``params.svm_type=CvSVM::ONE_CLASS`` , no optimization is made and the usual SVM with parameters specified in ``params`` is executed.
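A hedged sketch of an automatic training call; the fixed grid for ``p`` shows how a parameter is kept at its ``params`` value (the data shapes are illustrative): ::

    cv::Mat trainData(100, 2, CV_32F), responses(100, 1, CV_32F);
    // ... fill with row samples and class labels ...
    CvSVMParams params;
    params.svm_type    = CvSVM::C_SVC;
    params.kernel_type = CvSVM::RBF;
    CvSVM svm;
    svm.train_auto(trainData, responses, cv::Mat(), cv::Mat(), params, 10,
                   CvSVM::get_default_grid(CvSVM::C),
                   CvSVM::get_default_grid(CvSVM::GAMMA),
                   CvParamGrid(1, 1, 0),   // step <= 1: p is not optimized
                   CvSVM::get_default_grid(CvSVM::NU),
                   CvSVM::get_default_grid(CvSVM::COEF),
                   CvSVM::get_default_grid(CvSVM::DEGREE));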
@@ -207,7 +210,7 @@ CvSVM::get_default_grid
* **CvSVM::DEGREE**
-The grid will be generated for the parameter with this ID.
+The grid is generated for the parameter with this ID.
The function generates a grid for the specified parameter of the SVM algorithm. The grid may be passed to the function ``CvSVM::train_auto`` .
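For instance, the default ``gamma`` grid could be inspected like this (a sketch): ::

    #include <stdio.h>

    // the grid enumerates min_val, min_val*step, ... while below max_val
    CvParamGrid g = CvSVM::get_default_grid(CvSVM::GAMMA);
    printf("gamma: [%g, %g], log step %g\n", g.min_val, g.max_val, g.step);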
......