This section describes obsolete ``C`` interface of EM algorithm. Details of the algorithm and its ``C++`` interface can be found in the other section :ref:`ML_Expectation Maximization`.
.. highlight:: cpp
CvEMParams
----------
.. ocv:class:: CvEMParams
Parameters of the EM algorithm. All parameters are public. You can initialize them by a constructor and then override some of them directly if you want.
CvEMParams::CvEMParams
----------------------
The constructors
.. ocv:function:: CvEMParams::CvEMParams()
.. ocv:function:: CvEMParams::CvEMParams( int nclusters, int cov_mat_type=CvEM::COV_MAT_DIAGONAL, int start_step=CvEM::START_AUTO_STEP, CvTermCriteria term_crit=cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, FLT_EPSILON), const CvMat* probs=0, const CvMat* weights=0, const CvMat* means=0, const CvMat** covs=0 )
:param nclusters: The number of mixture components in the Gaussian mixture model. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
:param cov_mat_type: Constraint on covariance matrices which defines type of matrices. Possible values are:
* **CvEM::COV_MAT_SPHERICAL** A scaled identity matrix :math:`\mu_k * I`. There is the only parameter :math:`\mu_k` to be estimated for each matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example in case when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time with ``cov_mat_type=CvEM::COV_MAT_DIAGONAL``.
* **CvEM::COV_MAT_DIAGONAL** A diagonal matrix with positive diagonal elements. The number of free parameters is ``d`` for each matrix. This is most commonly used option yielding good estimation results.
* **CvEM::COV_MAT_GENERIC** A symmetric positively defined matrix. The number of free parameters in each matrix is about :math:`d^2/2`. It is not recommended to use this option, unless there is pretty accurate initial estimation of the parameters and/or a huge number of training samples.
:param start_step: The start step of the EM algorithm:
* **CvEM::START_E_STEP** Start with Expectation step. You need to provide means :math:`a_k` of mixture components to use this option. Optionally you can pass weights :math:`\pi_k` and covariance matrices :math:`S_k` of mixture components.
* **CvEM::START_M_STEP** Start with Maximization step. You need to provide initial probabilities :math:`p_{i,k}` to use this option.
* **CvEM::START_AUTO_STEP** Start with Expectation step. You need not provide any parameters because they will be estimated by the kmeans algorithm.
:param term_crit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``term_crit.max_iter`` (number of M-steps) or when relative change of likelihood logarithm is less than ``term_crit.epsilon``.
:param probs: Initial probabilities :math:`p_{i,k}` of sample :math:`i` to belong to mixture component :math:`k`. It is a floating-point matrix of :math:`nsamples \times nclusters` size. It is used and must be not NULL only when ``start_step=CvEM::START_M_STEP``.
:param weights: Initial weights :math:`\pi_k` of mixture components. It is a floating-point vector with :math:`nclusters` elements. It is used (if not NULL) only when ``start_step=CvEM::START_E_STEP``.
:param means: Initial means :math:`a_k` of mixture components. It is a floating-point matrix of :math:`nclusters \times dims` size. It is used used and must be not NULL only when ``start_step=CvEM::START_E_STEP``.
:param covs: Initial covariance matrices :math:`S_k` of mixture components. Each of covariance matrices is a valid square floating-point matrix of :math:`dims \times dims` size. It is used (if not NULL) only when ``start_step=CvEM::START_E_STEP``.
The default constructor represents a rough rule-of-the-thumb:
With another constructor it is possible to override a variety of parameters from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.
CvEM
----
.. ocv:class:: CvEM
The class implements the EM algorithm as described in the beginning of the section :ref:`ML_Expectation Maximization`.
CvEM::train
-----------
Estimates the Gaussian mixture parameters from a sample set.
:param samples: Samples from which the Gaussian mixture model will be estimated.
:param sample_idx: Mask of samples to use. All samples are used by default.
:param params: Parameters of the EM algorithm.
:param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
*Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure:
:math:`p_{i,k}` in ``probs``,
:math:`a_k` in ``means`` ,
:math:`S_k` in ``covs[k]``,
:math:`\pi_k` in ``weights`` , and optionally computes the output "class label" for each sample:
:math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).
The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
:ocv:class:`CvNormalBayesClassifier`.
For an example of clustering random samples of the multi-Gaussian distribution using EM, see ``em.cpp`` sample in the OpenCV distribution.
For each training sample :math:`i` (that have been passed to the constructor or to :ocv:func:`CvEM::train`) returns probabilities :math:`p_{i,k}` to belong to a mixture component :math:`k`.
:param node: The parent map. If it is NULL, the function searches a node with parameters in all the top-level nodes (streams), starting with the first one.
@@ -85,87 +89,67 @@ already a good enough approximation).
*
Bilmes98 J. A. Bilmes. *A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models*. Technical Report TR-97-021, International Computer Science Institute and Computer Science Division, University of California at Berkeley, April 1998.
EM
--
.. ocv:class:: EM
CvEMParams
----------
.. ocv:class:: CvEMParams
Parameters of the EM algorithm. All parameters are public. You can initialize them by a constructor and then override some of them directly if you want.
The class implements the EM algorithm as described in the beginning of this section. It is inherited from :ocv:class:`Algorithm`.
.. ocv:function:: CvEMParams::CvEMParams( int nclusters, int cov_mat_type=CvEM::COV_MAT_DIAGONAL, int start_step=CvEM::START_AUTO_STEP, CvTermCriteria term_crit=cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, FLT_EPSILON), const CvMat* probs=0, const CvMat* weights=0, const CvMat* means=0, const CvMat** covs=0 )
:param nclusters: The number of mixture components in the Gaussian mixture model. Default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
:param nclusters: The number of mixture components in the gaussian mixture model. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
:param covMatType: Constraint on covariance matrices which defines type of matrices. Possible values are:
:param cov_mat_type: Constraint on covariance matrices which defines type of matrices. Possible values are:
* **EM::COV_MAT_SPHERICAL** A scaled identity matrix :math:`\mu_k * I`. There is the only parameter :math:`\mu_k` to be estimated for earch matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example in case when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time with ``covMatType=EM::COV_MAT_DIAGONAL``.
* **CvEM::COV_MAT_SPHERICAL** A scaled identity matrix :math:`\mu_k * I`. There is the only parameter :math:`\mu_k` to be estimated for earch matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example in case when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time with ``cov_mat_type=CvEM::COV_MAT_DIAGONAL``.
* **EM::COV_MAT_DIAGONAL** A diagonal matrix with positive diagonal elements. The number of free parameters is ``d`` for each matrix. This is most commonly used option yielding good estimation results.
* **CvEM::COV_MAT_DIAGONAL** A diagonal matrix with positive diagonal elements. The number of free parameters is ``d`` for each matrix. This is most commonly used option yielding good estimation results.
* **EM::COV_MAT_GENERIC** A symmetric positively defined matrix. The number of free parameters in each matrix is about :math:`d^2/2`. It is not recommended to use this option, unless there is pretty accurate initial estimation of the parameters and/or a huge number of training samples.
* **CvEM::COV_MAT_GENERIC** A symmetric positively defined matrix. The number of free parameters in each matrix is about :math:`d^2/2`. It is not recommended to use this option, unless there is pretty accurate initial estimation of the parameters and/or a huge number of training samples.
:param termCrit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when relative change of likelihood logarithm is less than ``termCrit.epsilon``. Default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.
:param start_step: The start step of the EM algorithm:
EM::train
---------
Estimates the Gaussian mixture parameters from a samples set.
* **CvEM::START_E_STEP** Start with Expectation step. You need to provide means :math:`a_k` of mixture components to use this option. Optionally you can pass weights :math:`\pi_k` and covariance matrices :math:`S_k` of mixture components.
* **CvEM::START_M_STEP** Start with Maximization step. You need to provide initial probabilities :math:`p_{i,k}` to use this option.
* **CvEM::START_AUTO_STEP** Start with Expectation step. You need not provide any parameters because they will be estimated by the k-means algorithm.
:param term_crit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``term_crit.max_iter`` (number of M-steps) or when relative change of likelihood logarithm is less than ``term_crit.epsilon``.
:param probs: Initial probabilities :math:`p_{i,k}` of sample :math:`i` to belong to mixture component :math:`k`. It is a floating-point matrix of :math:`nsamples \times nclusters` size. It is used and must be not NULL only when ``start_step=CvEM::START_M_STEP``.
:param weights: Initial weights :math:`\pi_k` of mixture components. It is a floating-point vector with :math:`nclusters` elements. It is used (if not NULL) only when ``start_step=CvEM::START_E_STEP``.
:param samples: Samples from which the Gaussian mixture model will be estimated. It should be a one-channel matrix, each row of which is a sample. If the matrix does not have ``CV_64F`` type it will be converted to the inner matrix of such type for the further computing.
:param means: Initial means :math:`a_k` of mixture components. It is a floating-point matrix of :math:`nclusters \times dims` size. It is used used and must be not NULL only when ``start_step=CvEM::START_E_STEP``.
:param means0: Initial means :math:`a_k` of mixture components. It is a one-channel matrix of :math:`nclusters \times dims` size. If the matrix does not have ``CV_64F`` type it will be converted to the inner matrix of such type for the further computing.
:param covs: Initial covariance matrices :math:`S_k` of mixture components. Each of covariance matrices is a valid square floating-point matrix of :math:`dims \times dims` size. It is used (if not NULL) only when ``start_step=CvEM::START_E_STEP``.
:param covs0: The vector of initial covariance matrices :math:`S_k` of mixture components. Each of covariance matrices is a one-channel matrix of :math:`dims \times dims` size. If the matrices do not have ``CV_64F`` type they will be converted to the inner matrices of such type for the further computing.
The default constructor represents a rough rule-of-the-thumb:
:param weights0: Initial weights :math:`\pi_k` of mixture components. It should be a one-channel floating-point matrix with :math:`1 \times nclusters` or :math:`nclusters \times 1` size.
::
:param probs0: Initial probabilities :math:`p_{i,k}` of sample :math:`i` to belong to mixture component :math:`k`. It is a one-channel floating-point matrix of :math:`nsamples \times nclusters` size.
:param logLikelihoods: The optional output matrix that contains a likelihood logarithm value for each sample. It has :math:`nsamples \times 1` size and ``CV_64FC1`` type.
:param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample). It has :math:`nsamples \times 1` size and ``CV_32SC1`` type.
With another constructor it is possible to override a variety of parameters from a single number of mixtures (the only essential problem-dependent parameter) to initial values for the mixture parameters.
:param probs: The optional output matrix that contains posterior probabilities of each Gaussian mixture component given the each sample. It has :math:`nsamples \times nclusters` size and ``CV_64FC1`` type.
Three versions of training method differ in the initialization of Gaussian mixture model parameters and start step:
CvEM
----
.. ocv:class:: CvEM
The class implements the EM algorithm as described in the beginning of this section.
CvEM::train
-----------
Estimates the Gaussian mixture parameters from a sample set.
* **train** - Starts with Expectation step. Initial values of the model parameters will be estimated by the k-means algorithm.
* **trainE** - Starts with Expectation step. You need to provide initial means :math:`a_k` of mixture components. Optionally you can pass initial weights :math:`\pi_k` and covariance matrices :math:`S_k` of mixture components.
:param samples: Samples from which the Gaussian mixture model will be estimated.
:param sample_idx: Mask of samples to use. All samples are used by default.
:param params: Parameters of the EM algorithm.
:param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).
The methods return ``true`` if the Gaussian mixture model was trained successfully, otherwise it returns ``false``.
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
*Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure:
...
...
@@ -178,121 +162,34 @@ Unlike many of the ML models, EM is an unsupervised learning algorithm and it do
The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
:ocv:class:`CvNormalBayesClassifier`.
For an example of clustering random samples of the multi-Gaussian distribution using EM, see ``em.cpp`` sample in the OpenCV distribution.
:param probs: Optional output matrix that contains posterior probabilities of each component given the sample. It has :math:`1 \times nclusters` size and ``CV_64FC1`` type.
.. ocv:pyfunction:: cv2.EM.getProbs() -> probs
The method returns a two-element ``double`` vector. Zero element is a likelihood logarithm value for the sample. First element is an index of the most probable mixture component for the given sample.
For each training sample :math:`i` (that have been passed to the constructor or to :ocv:func:`CvEM::train`) returns probabilities :math:`p_{i,k}` to belong to a mixture component :math:`k`.
CvEM::isTrained
---------------
Returns ``true`` if the Gaussian mixture model was trained.
:param fs: A file storage with parameters of the EM algorithm.
:param node: The parent map. If it is NULL, the function searches a node with parameters in all the top-level nodes (streams), starting with the first one.
The function reads EM parameters from the specified file storage node. For example of clustering random samples of multi-Gaussian distribution using EM see em.cpp sample in OpenCV distribution.
See :ocv:function:`Algorithm::read` and :ocv:function:`Algorithm::write`.
EM::get
-------
See :ocv:function:`Algorithm::get`. The following parameters are available for getting: