Commit 907240a8 authored by Maria Dimashova

completed doc on MLData

parent 3d74662f
@@ -6,7 +6,7 @@ MLData
Data sets for machine learning algorithms are often stored in files in a format like .csv. A supported file must contain a table of predictor and response values, where each row of the table corresponds to one sample. Missing values are supported. The well-known UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/) provides the machine learning community with many data sets stored in this format. The class MLData has been implemented to ease loading data for training one of the machine learning algorithms available in OpenCV. For floating-point values, only ``'.'`` is supported as the decimal separator.
CvMLData
--------
.. ocv:class:: CvMLData
Class for loading data from a .csv file.
@@ -55,31 +55,31 @@ The class to load the data from .csv file.
};
CvMLData::read_csv
------------------
.. ocv:function:: int CvMLData::read_csv(const char* filename);
This method reads the data set from a .csv-like file named ``filename`` and stores all read values in one matrix. While reading, the method tries to determine the type of each variable (predictors and response): ordered or categorical. If some value of a variable is not a number (e.g. it contains letters), apart from the label for a missing value, the type of the variable is set to ``CV_VAR_CATEGORICAL``. If all non-missing values of the variable are numbers, the type of the variable is set to ``CV_VAR_ORDERED``. This default type detection works correctly in all cases except for a categorical variable that has numerical class labels. In that case the type ``CV_VAR_ORDERED`` is set, and the user should change it to ``CV_VAR_CATEGORICAL`` using the method :ocv:func:`CvMLData::change_var_type`.

For categorical variables, a common map is built to convert string class labels to numerical class labels; this map can be obtained with :ocv:func:`CvMLData::get_class_labels_map`. While reading the data, the method also constructs the mask of missing values (e.g. values equal to ``'?'``).
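
The type detection rule described above can be sketched in plain C++. This is only an illustration of the rule, not the OpenCV implementation: the names ``VarType``, ``is_number`` and ``infer_var_type`` are hypothetical, and the sketch assumes that a missing value is a token consisting of the single missing-value character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-ins for CV_VAR_ORDERED / CV_VAR_CATEGORICAL.
enum VarType { VAR_ORDERED, VAR_CATEGORICAL };

// True if the whole token parses as a floating-point number.
// strtod follows the C locale; the default "C" locale uses '.' as separator.
bool is_number(const std::string& token) {
    if (token.empty()) return false;
    char* end = 0;
    std::strtod(token.c_str(), &end);
    return end == token.c_str() + token.size();
}

// Infer the type of one column of raw string tokens.
// The column is categorical as soon as one non-missing token is not a number.
VarType infer_var_type(const std::vector<std::string>& column, char miss_ch = '?') {
    assert(miss_ch != '.');  // '.' is reserved for the float separator
    for (std::size_t i = 0; i < column.size(); ++i) {
        const std::string& t = column[i];
        if (t.size() == 1 && t[0] == miss_ch) continue;  // missing value: ignored
        if (!is_number(t)) return VAR_CATEGORICAL;
    }
    return VAR_ORDERED;
}
```

Note that a column of numerical class labels such as ``0`` and ``1`` is reported as ordered by this rule, which is exactly the case where the type has to be changed by hand.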
CvMLData::get_values
--------------------
.. ocv:function:: const CvMat* CvMLData::get_values() const;
Returns a pointer to the ``values`` matrix of predictors and responses, or ``0`` if data has not been loaded from a file yet. The row count of this matrix equals the sample count, the column count equals the predictor count ``+ 1`` for the response, if it exists (i.e. each row holds the predictor and response values of one sample), and the type is ``CV_32FC1``.
CvMLData::get_responses
-----------------------
.. ocv:function:: const CvMat* CvMLData::get_responses();
Returns a pointer to the matrix of response values, or throws an exception if data has not been loaded from a file yet. This matrix has a row count equal to the sample count, one column, and type ``CV_32FC1``.
CvMLData::get_missing
---------------------
.. ocv:function:: const CvMat* CvMLData::get_missing() const;
Returns a pointer to the mask matrix of missing values, or throws an exception if data has not been loaded from a file yet. This matrix has the same size as the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) and type ``CV_8UC1``.
CvMLData::set_response_idx
--------------------------
.. ocv:function:: void CvMLData::set_response_idx( int idx );
Sets the index of the response column in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`), or throws an exception if data has not been loaded from a file yet. The old response column becomes a predictor. If ``idx < 0``, there is no response.
@@ -92,91 +92,96 @@ CvMLData::get_response_idx
CvMLData::set_train_test_split
------------------------------
.. ocv:function:: void CvMLData::set_train_test_split( const CvTrainTestSplit * spl );
For different purposes it can be useful to divide the read data set into two disjoint subsets: a training subset and a test subset. This method sets the parameters for such a split (using ``spl``, see :ocv:class:`CvTrainTestSplit`) and splits the data, or throws an exception if data has not been loaded from a file yet.
CvMLData::get_train_sample_idx
------------------------------
.. ocv:function:: const CvMat* CvMLData::get_train_sample_idx() const;
The read data set can be divided into training and test subsets by setting a split (see :ocv:func:`CvMLData::set_train_test_split`). This method returns the matrix of sample indices for the training subset (the matrix has one row and type ``CV_32SC1``). If the data split is not set, the method returns ``0``. If data has not been loaded from a file yet, an exception is thrown.
CvMLData::get_test_sample_idx
-----------------------------
.. ocv:function:: const CvMat* CvMLData::get_test_sample_idx() const;
Analogous to :ocv:func:`CvMLData::get_train_sample_idx`, but for the test subset.
CvMLData::mix_train_and_test_idx
--------------------------------
.. ocv:function:: void CvMLData::mix_train_and_test_idx();
Mixes the indices of training and test samples while preserving the sizes of the training and test subsets (if the data split is set by :ocv:func:`CvMLData::set_train_test_split`). If data has not been loaded from a file yet, an exception is thrown.
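
The idea of splitting sample indices and then mixing them without changing the subset sizes can be sketched as follows. This is a minimal standalone illustration, not the OpenCV code: ``SplitIdx``, ``split_indices`` and ``mix_indices`` are hypothetical names, and taking the first ``train_count`` samples as the training subset is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical holder for the two disjoint index subsets.
struct SplitIdx {
    std::vector<int> train;
    std::vector<int> test;
};

// Put the first train_count sample indices into the training subset
// and the remaining ones into the test subset.
SplitIdx split_indices(int sample_count, int train_count) {
    assert(0 <= train_count && train_count <= sample_count);
    SplitIdx s;
    for (int i = 0; i < sample_count; ++i)
        (i < train_count ? s.train : s.test).push_back(i);
    return s;
}

// Shuffle samples across both subsets; the subset sizes stay unchanged.
void mix_indices(SplitIdx& s, unsigned seed) {
    std::vector<int> all(s.train);
    all.insert(all.end(), s.test.begin(), s.test.end());
    std::mt19937 rng(seed);
    std::shuffle(all.begin(), all.end(), rng);
    const std::ptrdiff_t n_train = static_cast<std::ptrdiff_t>(s.train.size());
    s.train.assign(all.begin(), all.begin() + n_train);
    s.test.assign(all.begin() + n_train, all.end());
}
```

After mixing, every sample index still appears in exactly one of the two subsets, so the subsets remain disjoint.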
CvMLData::get_var_idx
---------------------
.. ocv:function:: const CvMat* CvMLData::get_var_idx();
Returns the indices of the used variables (columns) in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`), ``0`` if the used subset is not set, or throws an exception if data has not been loaded from a file yet. The returned matrix has one row, a column count equal to the size of the used variable subset, and type ``CV_32SC1``.
CvMLData::chahge_var_idx
------------------------
.. ocv:function:: void CvMLData::chahge_var_idx( int vi, bool state );
By default, after the data set is read, all variables in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) are used. The user may want to use only a subset of the variables and can include the variable with index ``vi`` in the used subset or exclude it from the subset, depending on the value of ``state``. If data has not been loaded from a file yet, an exception is thrown.
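
One simple way to picture the used-variable subset is a flag per column plus a function that lists the indices that are switched on. The sketch below is standalone C++ and not the OpenCV implementation; ``change_var`` and ``used_var_indices`` are hypothetical names, and treating ``state == true`` as "include" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Include (state == true) or exclude (state == false) the variable vi.
// That true means "include" is an assumption of this sketch.
void change_var(std::vector<bool>& used, int vi, bool state) {
    assert(vi >= 0 && static_cast<std::size_t>(vi) < used.size());
    used[static_cast<std::size_t>(vi)] = state;
}

// Collect the indices of the variables that are currently switched on.
std::vector<int> used_var_indices(const std::vector<bool>& used) {
    std::vector<int> idx;
    for (std::size_t i = 0; i < used.size(); ++i)
        if (used[i]) idx.push_back(static_cast<int>(i));
    return idx;
}
```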
CvMLData::get_var_types
-----------------------
.. ocv:function:: const CvMat* CvMLData::get_var_types();
Returns the matrix of types of the used variables. The matrix has one row, a column count equal to the number of used variables, and type ``CV_8UC1``. If data has not been loaded from a file yet, an exception is thrown.
CvMLData::set_var_types
-----------------------
.. ocv:function:: void CvMLData::set_var_types( const char* str );
Sets variable types according to the given string ``str``. The supported string format is best described by several examples: ``"ord[0-17],cat[18]"``, ``"ord[0,2,4,10-12], cat[1,3,5-9,13,14]"``, ``"cat"`` (all variables are categorical), ``"ord"`` (all variables are ordered). That is, each variable type is followed by a list of indices of the variables of that type.
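
To make the string format concrete, here is a small standalone parser for specifications of this shape. It is a sketch written from the examples above only, not the parser used by OpenCV: ``VarType`` and ``parse_var_types`` are hypothetical names, and the error handling (exceptions on malformed input) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for CV_VAR_ORDERED / CV_VAR_CATEGORICAL.
enum VarType { VAR_ORDERED, VAR_CATEGORICAL };

// Parse a spec such as "ord[0-17],cat[18]" into one type per variable.
// A bare "ord" or "cat" applies to all variables.
std::vector<VarType> parse_var_types(const std::string& spec, int var_count) {
    assert(var_count > 0);
    const std::size_t n = static_cast<std::size_t>(var_count);
    std::vector<VarType> types(n, VAR_ORDERED);
    const char* base = spec.c_str();
    std::size_t pos = 0;
    while (pos < spec.size()) {
        if (spec[pos] == ',' || spec[pos] == ' ') { ++pos; continue; }  // group separators
        VarType t;
        if (spec.compare(pos, 3, "ord") == 0) t = VAR_ORDERED;
        else if (spec.compare(pos, 3, "cat") == 0) t = VAR_CATEGORICAL;
        else throw std::invalid_argument("expected 'ord' or 'cat'");
        pos += 3;
        if (pos >= spec.size() || spec[pos] != '[') {
            types.assign(n, t);  // no index list: type applies to every variable
            continue;
        }
        ++pos;  // skip '['
        const std::size_t close = spec.find(']', pos);
        if (close == std::string::npos) throw std::invalid_argument("missing ']'");
        while (pos < close) {
            if (spec[pos] == ',' || spec[pos] == ' ') { ++pos; continue; }
            char* end = 0;
            long first = std::strtol(base + pos, &end, 10);
            if (end == base + pos) throw std::invalid_argument("expected an index");
            pos = static_cast<std::size_t>(end - base);
            long last = first;
            if (pos < close && spec[pos] == '-') {  // range "a-b"
                ++pos;
                last = std::strtol(base + pos, &end, 10);
                if (end == base + pos) throw std::invalid_argument("expected range end");
                pos = static_cast<std::size_t>(end - base);
            }
            if (first < 0 || first > last || last >= var_count)
                throw std::out_of_range("variable index out of range");
            for (long i = first; i <= last; ++i)
                types[static_cast<std::size_t>(i)] = t;
        }
        pos = close + 1;  // continue after ']'
    }
    return types;
}
```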
CvMLData::get_var_type
----------------------
.. ocv:function:: int CvMLData::get_var_type( int var_idx ) const;
Returns the type of the variable with index ``var_idx`` (``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``).
CvMLData::change_var_type
-------------------------
.. ocv:function:: void CvMLData::change_var_type( int var_idx, int type);
Changes the type of the variable with index ``var_idx`` from its existing type to ``type`` (``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``).
CvMLData::set_delimiter
-----------------------
.. ocv:function:: void CvMLData::set_delimiter( char ch );
Sets the delimiter for variable values in the file, e.g. ``','`` (default), ``';'``, ``' '`` (space), or another character (except the float separator ``'.'``).
CvMLData::get_delimiter
-----------------------
.. ocv:function:: char CvMLData::get_delimiter() const;
Gets the currently set delimiter character.
CvMLData::set_miss_ch
---------------------
.. ocv:function:: void CvMLData::set_miss_ch( char ch );
Sets the character that denotes a missing value, e.g. ``'?'`` (default), ``'-'``, etc. (except the float separator ``'.'``).
CvMLData::get_miss_ch
---------------------
.. ocv:function:: char CvMLData::get_miss_ch() const;
Gets the character that denotes a missing value.
CvMLData::get_class_labels_map
-------------------------------
.. ocv:function:: const std::map<std::string, int>& CvMLData::get_class_labels_map() const;
Returns the map that converts string class labels to numerical class labels. It can be used to recover the original class labels (as in the file).
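
Since the map goes from string label to number, recovering the original label for a numerical class requires a reverse lookup. The sketch below shows one way to build and invert such a ``std::map<std::string, int>``. It is not the OpenCV code: ``label_to_id`` and ``id_to_label`` are hypothetical helpers, and numbering the labels consecutively in order of first appearance is an assumption that may differ from the numbering OpenCV actually uses.

```cpp
#include <cassert>
#include <map>
#include <string>

// Return the numeric id of a label, assigning the next free id
// (0, 1, 2, ...) the first time the label is seen.
int label_to_id(std::map<std::string, int>& labels, const std::string& label) {
    std::map<std::string, int>::iterator it = labels.find(label);
    if (it != labels.end()) return it->second;
    const int id = static_cast<int>(labels.size());
    labels[label] = id;
    return id;
}

// Reverse lookup: find the original string label for a numeric id.
// Returns an empty string if no label has that id.
std::string id_to_label(const std::map<std::string, int>& labels, int id) {
    assert(id >= 0);
    for (std::map<std::string, int>::const_iterator it = labels.begin();
         it != labels.end(); ++it)
        if (it->second == id) return it->first;
    return std::string();
}
```

The reverse lookup is linear in the number of labels, which is usually acceptable because the number of distinct class labels is small.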
CvTrainTestSplit
----------------
.. ocv:class:: CvTrainTestSplit
Structure for setting the split of a data set read by :ocv:class:`CvMLData`.
......