@@ -66,7 +66,7 @@ Alternatively, the algorithm may start with the M-step when the initial values f
:math:`p_{i,k}` can be provided. Another alternative when
:math:`p_{i,k}` are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
:math:`p_{i,k}` . Often (including machine learning) the
:ocv:func:`kmeans` algorithm is used for that purpose.
``k-means`` algorithm is used for that purpose.
One of the main problems of the EM algorithm is a large number
of parameters to estimate. The majority of the parameters reside in
...
...
@@ -99,15 +99,17 @@ EM::Params
----------
.. ocv:class:: EM::Params
The class describes EM training parameters. It includes:
The class describes EM training parameters.
.. ocv:member:: int clusters
The number of mixture components in the Gaussian mixture model. Default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
EM::Params::Params
------------------
The constructor
.. ocv:member:: int covMatType
.. ocv:function:: EM::Params::Params( int nclusters=DEFAULT_NCLUSTERS, int covMatType=EM::COV_MAT_DIAGONAL,const TermCriteria& termCrit=TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, EM::DEFAULT_MAX_ITERS, 1e-6))
Constraint on covariance matrices which defines type of matrices. Possible values are:
:param nclusters: The number of mixture components in the Gaussian mixture model. Default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
:param covMatType: Constraint on covariance matrices which defines type of matrices. Possible values are:
* **EM::COV_MAT_SPHERICAL** A scaled identity matrix :math:`\mu_k * I`. There is the only parameter :math:`\mu_k` to be estimated for each matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example in case when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time with ``covMatType=EM::COV_MAT_DIAGONAL``.
...
...
@@ -115,9 +117,7 @@ The class describes EM training parameters. It includes:
* **EM::COV_MAT_GENERIC** A symmetric positively defined matrix. The number of free parameters in each matrix is about :math:`d^2/2`. It is not recommended to use this option, unless there is pretty accurate initial estimation of the parameters and/or a huge number of training samples.
.. ocv:member:: TermCriteria termCrit
The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when relative change of likelihood logarithm is less than ``termCrit.epsilon``. Default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.
:param termCrit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when relative change of likelihood logarithm is less than ``termCrit.epsilon``. Default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.
@@ -22,7 +22,7 @@ In machine learning algorithms there is notion of training data. Training data i
As you can see, training data can have rather complex structure; besides, it may be very big and/or not entirely available, so there is need to make abstraction for this concept. In OpenCV ml there is ``cv::ml::TrainData`` class for that.
TrainData
--------
---------
.. ocv:class:: TrainData
Class encapsulating training data. Please note that the class only specifies the interface of training data, but not implementation. All the statistical model classes in ml take Ptr<TrainData>. In other words, you can create your own class derived from ``TrainData`` and supply smart pointer to the instance of this class into ``StatModel::train``.
...
...
@@ -31,7 +31,7 @@ TrainData::loadFromCSV
----------------------
Reads the dataset from a .csv file and returns the ready-to-use training data.
.. ocv:function:: Ptr<TrainData> loadFromCSV(const String& filename, int headerLineCount, int responseStartIdx=-1, int responseEndIdx=-1, const String& varTypeSpec=String(), char delimiter=',', char missch='?');
.. ocv:function:: Ptr<TrainData> loadFromCSV(const String& filename, int headerLineCount, int responseStartIdx=-1, int responseEndIdx=-1, const String& varTypeSpec=String(), char delimiter=',', char missch='?')
:param termCrit: The termination criteria that specifies when the training algorithm stops - either when the specified number of trees is trained and added to the ensemble or when sufficient accuracy (measured as OOB error) is achieved. Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes pass a certain number of trees. Also to keep in mind, the number of tree increases the prediction time linearly.
The default constructor sets all parameters to default values which are different from default values of :ocv:class:`CvDTreeParams`:
The default constructor sets all parameters to default values which are different from default values of ``DTrees::Params``:
.. ocv:function:: Ptr<_Tp> StatModel::train(InputArray samples, int layout, InputArray responses, const _Tp::Params& p, int flags=0 )
CvStatModel::CvStatModel
------------------------
Thedefaultconstructor.
:param trainData: training data that can be loaded from file using ``TrainData::loadFromCSV`` or created with ``TrainData::create``.
:param samples: training samples
:param layout: ``ROW_SAMPLE`` (training samples are the matrix rows) or ``COL_SAMPLE`` (training samples are the matrix columns)
:param responses: vector of responses associated with the training samples.
:param p: the stat model parameters.
:param flags: optional flags, depending on the model. Some of the models can be updated with the new training samples, not completely overwritten (such as ``NormalBayesClassifier`` or ``ANN_MLP``).
..ocv:function::CvStatModel::CvStatModel()
There are 2 instance methods and 2 static (class) template methods. The first two train the already created model (the very first method must be overwritten in the derived classes). And the latter two variants are convenience methods that construct empty model and then call its train method.
:param samples: The input samples, floating-point matrix
:param results: The optional output matrix of results.
:param flags: The optional flags, model-dependent. Some models, such as ``Boost``, ``SVM`` recognize ``StatModel::RAW_OUTPUT`` flag, which makes the method return the raw results (the sum), not the class label.
:param test: if true, the error is computed over the test subset of the data, otherwise it's computed over the training subset of the data. Please note that if you loaded a completely different dataset to evaluate already trained classifier, you will probably want not to set the test subset at all with ``TrainData::setTrainTestSplitRatio`` and specify ``test=false``, so that the error is computed for the whole new set. Yes, this sounds a bit confusing.
The method uses ``StatModel::predict`` to compute the error. For regression models the error is computed as RMS, for classifiers - as a percent of missclassified samples (0%-100%).