The structure is derived from :ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported. In particular, cross-validation is not supported.
All parameters are public. You can initialize them with a constructor and then override some of them directly, if you want.
...
@@ -175,18 +175,34 @@ Predicts a response for an input sample.
:param missing: Optional mask of missing measurements. To handle missing measurements, the weak classifiers must include surrogate splits (see ``CvDTreeParams::use_surrogates``).
:param weak_responses: Optional output parameter, a floating-point vector with responses of each individual weak classifier. The number of elements in the vector must be equal to the slice length.
:param slice: Continuous subset of the sequence of weak classifiers to be used for prediction. By default, all the weak classifiers are used.
:param raw_mode: Normally, it should be set to ``false``.
:param return_sum: If ``true`` then return sum of votes instead of the class label.
The method runs the sample through the trees in the ensemble and returns the output class label based on the weighted voting.
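For illustration, a minimal prediction sketch, assuming a trained ``CvBoost`` model named ``boost`` and a one-row ``CV_32FC1`` sample (both names are hypothetical)::

    // assumes #include <opencv2/ml/ml.hpp> and using namespace cv;
    Mat sample = (Mat_<float>(1, 3) << 0.2f, 1.5f, -0.7f);   // one sample per row

    // Class label from the weighted vote of all weak classifiers.
    float label = boost.predict( sample );

    // Sum of votes instead of the class label (the return_sum option).
    float votes = boost.predict( sample, Mat(), Range::all(), false, true );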
The method returns the sequence of weak classifiers. Each element of the sequence is a pointer to the :ocv:class:`CvBoostTree` class or to some of its derivatives.
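For example, a hedged sketch of iterating over the returned sequence with the C sequence API (``boost`` is a hypothetical trained ``CvBoost`` model)::

    CvSeq* weak = boost.get_weak_predictors();
    for( int i = 0; i < weak->total; i++ )
    {
        // Each sequence element stores a pointer to a weak tree classifier.
        CvBoostTree* tree = *(CvBoostTree**)cvGetSeqElem( weak, i );
        // ... inspect the tree, for example via tree->get_root() ...
    }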
CvBoost::get_params
-------------------
...
@@ -220,5 +236,5 @@ CvBoost::get_data
-----------------
Returns the training data used by the boosted tree classifier.
:param truncate_pruned_tree: If ``true``, pruned branches are physically removed from the tree. Otherwise, they are retained, and it is possible to get results from the original unpruned (or less aggressively pruned) tree by decreasing the ``CvDTree::pruned_tree_idx`` parameter.
:param priors: The array of a priori class probabilities, sorted by the class label value. The parameter can be used to tune the decision tree preferences toward a certain class. For example, if you want to detect some rare anomaly occurrence, the training base will likely contain many more normal cases than anomalies, so a very good classification performance can be achieved just by considering every case as normal. To avoid this, the priors can be specified, where the anomaly probability is artificially increased (up to 0.5 or even greater), so the weight of the misclassified anomalies becomes much bigger, and the tree is adjusted properly. You can also think of this parameter as weights of prediction categories that determine the relative cost you assign to misclassification. That is, if the weight of the first category is 1 and the weight of the second category is 10, then each mistake in predicting the second category is equivalent to making 10 mistakes in predicting the first category.
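For example, a hedged sketch of a two-class setup in which mistakes on the rare class are made ten times more costly (all numeric values below are illustrative only)::

    // Class 0 = normal, class 1 = anomaly; weight anomaly errors 10 times higher.
    static const float priors[] = { 1.f, 10.f };
    CvDTreeParams params( 8,       // max_depth
                          10,      // min_sample_count
                          0.01f,   // regression_accuracy
                          false,   // use_surrogates
                          15,      // max_categories
                          10,      // cv_folds
                          true,    // use_1se_rule
                          true,    // truncate_pruned_tree
                          priors );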
The default constructor initializes all the parameters with the default values tuned for the standalone classification tree:
There are four ``train`` methods in :ocv:class:`CvDTree`:
* The **first two** methods follow the generic :ocv:func:`CvStatModel::train` conventions. They are the most complete forms. Both data layouts (``tflag=CV_ROW_SAMPLE`` and ``tflag=CV_COL_SAMPLE``) are supported, as well as sample and variable subsets, missing measurements, arbitrary combinations of input and output variable types, and so on. The last parameter contains all of the necessary training parameters (see the :ocv:class:`CvDTreeParams` description). A minimal usage sketch of this form follows the list.
* The **third** method uses :ocv:class:`CvMLData` to pass training data to a decision tree.
* The **last** method ``train`` is mostly used for building tree ensembles. It takes the pre-constructed :ocv:class:`CvDTreeTrainData` instance and an optional subset of the training set. The indices in ``subsampleIdx`` are counted relative to ``_sample_idx``, passed to the ``CvDTreeTrainData`` constructor. For example, if ``_sample_idx=[1, 5, 7, 100]``, then ``subsampleIdx=[0,3]`` means that the samples ``[1, 100]`` of the original training set are used.
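A minimal usage sketch of the first form, assuming ``trainData`` is a ``CV_32FC1`` matrix with one sample per row and ``responses`` holds one label per sample (both hypothetical)::

    CvDTree dtree;
    CvDTreeParams params;
    params.max_depth = 8;            // override one of the defaults directly

    dtree.train( trainData,          // feature matrix
                 CV_ROW_SAMPLE,      // each row is a training sample
                 responses,          // class labels or function values
                 Mat(), Mat(),       // use all variables and all samples
                 Mat(), Mat(),       // default variable types, no missing data
                 params );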
...
@@ -251,19 +251,19 @@ CvDTree::predict
----------------
Returns the leaf node of a decision tree corresponding to the input vector.
:param preprocessedInput: This parameter is normally set to ``false``, implying a regular input. If it is ``true``, the method assumes that all the values of the discrete input variables have already been normalized to the :math:`0` to :math:`num\_of\_categories_i-1` ranges, since the decision tree uses such normalized representation internally. This is useful for faster prediction with tree ensembles. For ordered input variables, the flag is not used.
The method traverses the decision tree and returns the reached leaf node as output. The prediction result, either the class label or the estimated function value, may be retrieved as the ``value`` field of the :ocv:class:`CvDTreeNode` structure, for example: ``dtree->predict(sample,mask)->value``.
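A short hedged sketch, assuming a trained tree ``dtree`` and a one-row ``CV_32FC1`` sample (names are illustrative)::

    Mat sample = (Mat_<float>(1, 3) << 4.2f, 0.f, 1.f);
    Mat missing_mask;                         // empty: no missing measurements

    CvDTreeNode* node = dtree.predict( sample, missing_mask );
    double prediction = node->value;          // class label or regression value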
:param start_step: The start step of the EM algorithm:
* **CvEM::START_E_STEP** Start with Expectation step. You need to provide means :math:`a_k` of mixture components to use this option. Optionally you can pass weights :math:`\pi_k` and covariance matrices :math:`S_k` of mixture components.
* **CvEM::START_M_STEP** Start with Maximization step. You need to provide initial probabilities :math:`p_{i,k}` to use this option.
* **CvEM::START_AUTO_STEP** Start with Expectation step. You need not provide any parameters because they will be estimated by the k-means algorithm.
:param term_crit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``term_crit.max_iter`` (the number of M-steps) or when the relative change of the likelihood logarithm is less than ``term_crit.epsilon``.
:param probs: Initial probabilities :math:`p_{i,k}` of sample :math:`i` belonging to mixture component :math:`k`. It is a floating-point matrix of :math:`nsamples \times nclusters` size. It is used, and must not be NULL, only when ``start_step=CvEM::START_M_STEP``.
:param weights: Initial weights :math:`\pi_k` of mixture components. It is a floating-point vector with :math:`nclusters` elements. It is used (if not NULL) only when ``start_step=CvEM::START_E_STEP``.
:param means: Initial means :math:`a_k` of mixture components. It is a floating-point matrix of :math:`nclusters \times dims` size. It is used, and must not be NULL, only when ``start_step=CvEM::START_E_STEP``.
:param covs: Initial covariance matrices :math:`S_k` of mixture components. Each of the covariance matrices is a valid square floating-point matrix of :math:`dims \times dims` size. It is used (if not NULL) only when ``start_step=CvEM::START_E_STEP``.
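For example, a hedged sketch of starting EM from the Expectation step with user-supplied component means (``initial_means`` is a hypothetical ``nclusters x dims`` ``CvMat``)::

    CvEMParams params;
    params.nclusters  = 3;
    params.start_step = CvEM::START_E_STEP;
    params.means      = initial_means;    // required for START_E_STEP
    params.weights    = 0;                // optional; estimated if NULL
    params.covs       = 0;                // optional; estimated if NULL
    params.term_crit  = cvTermCriteria( CV_TERMCRIT_ITER | CV_TERMCRIT_EPS,
                                        100, 1e-6 );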
The default constructor represents a rough rule of thumb:
...
@@ -167,15 +167,13 @@ Estimates the Gaussian mixture parameters from a sample set.
:param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).
Estimates the Gaussian mixture parameters from a sample set.
Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
*Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure:
:math:`p_{i,k}` in ``probs``,
:math:`a_k` in ``means``,
:math:`S_k` in ``covs[k]``,
:math:`\pi_k` in ``weights``, and optionally computes the output "class label" for each sample:
:math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample).
The trained model can be used further for prediction, just like any other classifier. It is similar to the
:ref:`Bayes classifier`.
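A minimal training sketch, assuming ``samples`` is a hypothetical ``CV_32FC1`` matrix of 2-D points, one sample per row::

    CvEM em;
    CvEMParams params;
    params.nclusters = 2;                  // fit a two-component mixture

    em.train( samples, Mat(), params );    // unsupervised: no responses are passed

    // Most probable component for a new sample, with per-component probabilities.
    Mat sample = (Mat_<float>(1, 2) << 0.5f, -1.2f);
    Mat probs;
    float component = em.predict( sample, &probs );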
...
@@ -242,7 +240,7 @@ Returns vectors of probabilities for each training sample.
For each training sample :math:`i` (passed to the constructor or to :ocv:func:`CvEM::train`), the method returns the probabilities :math:`p_{i,k}` of it belonging to a mixture component :math:`k`.
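A brief sketch of reading one of the returned probabilities, assuming a trained ``CvEM`` model named ``em`` (hypothetical)::

    const CvMat* probs = em.get_probs();   // nsamples x nclusters matrix
    // Probability that training sample 0 belongs to component 1.
    double p01 = cvmGet( probs, 0, 1 );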
The method returns the variable importance vector, computed at the training stage when ``CvRTParams::calc_var_importance`` is set to ``true``. If this flag was set to ``false``, a ``NULL`` pointer is returned. This differs from the decision trees, where variable importance can be computed at any time after the training.
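A hedged sketch of reading the importance values, assuming a forest ``rtrees`` trained with ``CvRTParams::calc_var_importance`` set to ``true`` and assuming the returned matrix stores one value per variable in a single row::

    const CvMat* importance = rtrees.get_var_importance();
    if( importance )   // NULL if the flag was false at training time
    {
        for( int i = 0; i < importance->cols; i++ )
            printf( "var #%d importance: %.3f\n", i, cvmGet( importance, 0, i ) );
    }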
...
@@ -158,7 +167,7 @@ CvRTrees::get_proximity
-----------------------
Retrieves the proximity measure between two training samples.
If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma``, set ``gamma_grid.step = 0`` and set ``gamma_grid.min_val`` and ``gamma_grid.max_val`` to arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma``.
Finally, if the optimization in a parameter is required but
the corresponding grid is unknown, you may call the function :ocv:func:`CvSVM::get_default_grid`. To generate a grid, for example, for ``gamma``, call ``CvSVM::get_default_grid(CvSVM::GAMMA)``.
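A hedged sketch combining both cases for :ocv:func:`CvSVM::train_auto` (``trainData``, ``responses``, and ``params`` are hypothetical; ``gamma`` is kept fixed while the other parameters use their default grids)::

    CvSVM svm;
    CvParamGrid gamma_grid = CvSVM::get_default_grid( CvSVM::GAMMA );
    gamma_grid.step = 0;          // step <= 1: do not optimize over gamma,
                                  // the value params.gamma is used as is

    svm.train_auto( trainData, responses, Mat(), Mat(), params, 10,
                    CvSVM::get_default_grid( CvSVM::C ),
                    gamma_grid ); // the remaining grids keep their defaults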
This function works for the classification
(``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC``)