Commit 61f36de5 authored by Maksim Shabunin

Doxygen tutorials support

parent 312c8fa7
......@@ -57,7 +57,7 @@ namespace bgsegm
/** @brief Gaussian Mixture-based Background/Foreground Segmentation Algorithm.
The class implements the algorithm described in @cite KB2001.
The class implements the algorithm described in @cite KB2001 .
*/
class CV_EXPORTS_W BackgroundSubtractorMOG : public BackgroundSubtractor
{
......@@ -88,7 +88,7 @@ CV_EXPORTS_W Ptr<BackgroundSubtractorMOG>
double backgroundRatio=0.7, double noiseSigma=0);
/** @brief Background Subtractor module based on the algorithm given in @cite Gold2012.
/** @brief Background Subtractor module based on the algorithm given in @cite Gold2012 .
Takes a series of images and returns a sequence of mask (8UC1)
images of the same size, where 255 indicates Foreground and 0 represents Background.
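A minimal usage sketch (not part of this header; `capture`, `frame` and `fgMask` are illustrative local variables) showing how such a background subtractor is typically driven frame by frame:
@code{.cpp}
cv::Ptr<cv::bgsegm::BackgroundSubtractorGMG> pGMG = cv::bgsegm::createBackgroundSubtractorGMG();
cv::Mat frame, fgMask;
while (capture.read(frame))       // capture is an opened cv::VideoCapture
{
    pGMG->apply(frame, fgMask);   // fgMask: 8UC1, 255 = Foreground, 0 = Background
    cv::imshow("foreground mask", fgMask);
    if (cv::waitKey(30) >= 0) break;
}
@endcode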
......
Retina : a Bio mimetic human retina model {#bioinspired_retina}
=========================================
Bioinspired Module Retina Introduction {#bioinspired_retina}
======================================
Retina
------
**Note** : do not forget that the retina model is included in the following namespace :
*cv::bioinspired*.
@note do not forget that the retina model is included in the following namespace : cv::bioinspired
### Introduction
......@@ -18,14 +17,13 @@ separable spatio-temporal filter modelling the two main retina information chann
From a general point of view, this filter whitens the image spectrum and corrects luminance thanks
to local adaptation. Another important property is its ability to filter out spatio-temporal noise
while enhancing details. This model originates from Jeanny Herault work @cite Herault2010. It has been
while enhancing details. This model originates from Jeanny Herault work @cite Herault2010 . It has been
involved in Alexandre Benoit's PhD and his current research @cite Benoit2010, @cite Strat2013 (he
currently maintains this module within OpenCV). It includes the work of other PhD students of
Jeanny, such as @cite Chaix2007, and the log-polar transformations of Barthelemy Durette described
in Jeanny's book.
**NOTES :**
@note
- For ease of use in computer vision applications, the two retina channels are applied
homogeneously on all the input images. This does not follow the real retina topology but it
can still be done using the log sampling capabilities provided within the class.
......@@ -71,7 +69,7 @@ described hereafter. XML parameters file samples are shown at the end of the pag
Here is an overview of the abstract Retina interface; allocate one instance with the *createRetina*
functions:
@code{.cpp}
namespace cv{namespace bioinspired{
class Retina : public Algorithm
......@@ -122,6 +120,7 @@ functions.:
cv::Ptr<Retina> createRetina (Size inputSize);
cv::Ptr<Retina> createRetina (Size inputSize, const bool colorMode, RETINA_COLORSAMPLINGMETHOD colorSamplingMethod=RETINA_COLOR_BAYER, const bool useRetinaLogSampling=false, const double reductionFactor=1.0, const double samplingStrenght=10.0);
}} // cv and bioinspired namespaces end
@endcode
### Description
......@@ -146,59 +145,47 @@ Use : this model can be used basically for spatio-temporal video effects but als
- performing motion analysis also taking benefit of the previously cited properties (check out the
magnocellular retina channel output, by using the provided **getMagno** methods; a minimal usage
sketch is given after this list)
- general image/video sequence description using either one or both channels. An example of the
use of Retina in a Bag of Words approach is given in @cite Strat2013.
use of Retina in a Bag of Words approach is given in @cite Strat2013 .
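A minimal allocation and processing sketch (the variable names are illustrative; the calls are the ones listed in the interface overview above and used in the retina tutorial):
@code{.cpp}
cv::Ptr<cv::bioinspired::Retina> retina = cv::bioinspired::createRetina(inputFrame.size());
cv::Mat parvo, magno;
retina->run(inputFrame);   // feed one frame (cv::Mat, gray or color)
retina->getParvo(parvo);   // details (foveal) channel
retina->getMagno(magno);   // transient/motion (peripheral) channel
@endcode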
Literature
----------
For more information, refer to the following papers :
- Model description :
[Benoit2010] Benoit A., Caplier A., Durette B., Herault, J., "Using Human Visual System Modeling For Bio-Inspired Low Level Image Processing", Elsevier, Computer Vision and Image Understanding 114 (2010), pp. 758-773. DOI <http://dx.doi.org/10.1016/j.cviu.2010.01.011>
- Model use in a Bag of Words approach :
- Model description : @cite Benoit2010
[Strat2013] Strat S., Benoit A., Lambert P., "Retina enhanced SIFT descriptors for video indexing", CBMI2013, Veszprém, Hungary, 2013.
- Model use in a Bag of Words approach : @cite Strat2013
- Please have a look at the reference work of Jeanny Herault that you can read in his book :
[Herault2010] Vision: Images, Signals and Neural Networks: Models of Neural Processing in Visual Perception (Progress in Neural Processing),By: Jeanny Herault, ISBN: 9814273686. WAPI (Tower ID): 113266891.
- Please have a look at the reference work of Jeanny Herault that you can read in his book : @cite Herault2010
This retina filter code includes the research contributions of PhD/research colleagues whose
code has been redrawn by the author:
- take a look at the *retinacolor.hpp* module to discover Brice Chaix de Lavarene's PhD color
mosaicing/demosaicing and his reference paper:
[Chaix2007] B. Chaix de Lavarene, D. Alleysson, B. Durette, J. Herault (2007). "Efficient demosaicing through recursive filtering", IEEE International Conference on Image Processing ICIP 2007
mosaicing/demosaicing and his reference paper: @cite Chaix2007
- take a look at *imagelogpolprojection.hpp* to discover retina spatial log sampling, which
originates from Barthelemy Durette's PhD with Jeanny Herault. A Retina / V1 cortex projection is
also proposed and originates from Jeanny's discussions. More information can be found in the above
cited Jeanny Herault's book.
- Meylan et al.'s work on HDR tone mapping, which is implemented as a specific method within the model :
[Meylan2007] L. Meylan , D. Alleysson, S. Susstrunk, "A Model of Retinal Local Adaptation for the Tone Mapping of Color Filter Array Images", Journal of Optical Society of America, A, Vol. 24, N 9, September, 1st, 2007, pp. 2807-2816
- Meylan&al work on HDR tone mapping that is implemented as a specific method within the model : @cite Meylan2007
Demos and experiments !
-----------------------
**NOTE : Complementary to the following examples, have a look at the Retina tutorial in the
@note Complementary to the following examples, have a look at the Retina tutorial in the
tutorial/contrib section for complementary explanations.
Take a look at the C++ examples provided with OpenCV:
- **samples/cpp/retinademo.cpp** shows how to use the retina module for details enhancement (Parvo channel output) and transient maps observation (Magno channel output). You can play with images, video sequences and webcam video.
Typical uses are (provided your OpenCV installation is situated in folder
*OpenCVReleaseFolder*)
Typical uses are (provided your OpenCV installation is situated in folder *OpenCVReleaseFolder*)
- image processing : **OpenCVReleaseFolder/bin/retinademo -image myPicture.jpg**
- video processing : **OpenCVReleaseFolder/bin/retinademo -video myMovie.avi**
- webcam processing: **OpenCVReleaseFolder/bin/retinademo -video**
**Note :** This demo generates the file *RetinaDefaultParameters.xml* which contains the
@note This demo generates the file *RetinaDefaultParameters.xml* which contains the
default parameters of the retina. Then, rename this as *RetinaSpecificParameters.xml*, adjust
the parameters the way you want and reload the program to check the effect.
......@@ -217,7 +204,7 @@ Take a look at the provided C++ examples provided with OpenCV :
Note that some sliders are made available to allow you to play with luminance compression.
If not using the 'fast' option, then, tone mapping is performed using the full retina model
@cite Benoit2010. It includes spectral whitening that allows luminance energy to be reduced.
@cite Benoit2010 . It includes spectral whitening that allows luminance energy to be reduced.
When using the 'fast' option, a simpler method is used; it is an adaptation of the
algorithm presented in @cite Meylan2007. This method gives also good results and is faster to
algorithm presented in @cite Meylan2007 . This method gives also good results and is faster to
process, but it sometimes requires some more parameter adjustment.
Discovering the human retina and its use for image processing {#tutorial_bioinspired_retina_model}
=============================================================
Goal
----
I present here a model of the human retina that shows some interesting properties for image
preprocessing and enhancement. In this tutorial you will learn how to:
- discover the two main channels coming out of your retina
- see the basics of using the retina model
- discover some parameter tweaks
General overview
----------------
The proposed model originates from Jeanny Herault's research @cite Herault2010 at
[Gipsa](http://www.gipsa-lab.inpg.fr). It is involved in image processing applications with
[Listic](http://www.listic.univ-savoie.fr) (code maintainer and user) lab. This is not a complete
model but it already presents interesting properties that can be used for an enhanced image
processing experience. The model allows the following human retina properties to be used :
- spectral whitening that has 3 important effects: canceling of high spatio-temporal frequency
signals (noise), enhancement of mid-frequency details and reduction of low-frequency luminance
energy. This *all in one* property directly cleans visual signals of the classical undesired
distortions introduced by image sensors and the input luminance range.
- local logarithmic luminance compression allows details to be enhanced even in low light
conditions.
- decorrelation of the details information (Parvocellular output channel) and the transient
information (events, motion), made available at the Magnocellular output channel.
The first two points are illustrated below :
In the figure below, the OpenEXR image sample *CrissyField.exr*, a High Dynamic Range image, is
shown. In order to make it visible on this web page, the original input image is linearly rescaled
to the classical image luminance range [0-255] and is converted to 8bit/channel format. Such a
strong conversion hides many details because of too strong local contrasts. Furthermore, noise
energy is also strong and pollutes visual information.
![image](images/retina_TreeHdr_small.jpg)
In the following image, applying the ideas proposed in @cite Benoit2010, as your retina does, local
luminance adaptation, spatial noise removal and spectral whitening work together and transmit
accurate information on lower range 8bit data channels. In this picture, noise is significantly
removed and local details hidden by strong luminance contrasts are enhanced. The output image keeps
its naturalness and its visual content is enhanced. Color processing is based on the color
multiplexing/demultiplexing method proposed in @cite Chaix2007 .
![image](images/retina_TreeHdr_retina.jpg)
*Note :* the image sample can be downloaded from the [OpenEXR website](http://www.openexr.com).
Regarding this demonstration, before retina processing, the input image has been linearly rescaled
within [0-255] while keeping its channels in float format. 5% of its histogram ends have been cut
(this mostly removes wrong HDR pixels). Check out the sample
*opencv/samples/cpp/OpenEXRimages_HighDynamicRange_Retina_toneMapping.cpp* for similar
processing. The following demonstration will only consider classical 8bit/channel images.
The retina model output channels
--------------------------------
The retina model presents two outputs that benefit from the above cited behaviors.
- The first one is called the Parvocellular channel. It is mainly active in the foveal retina area
(high resolution central vision with color sensitive photo-receptors); its aim is to provide
accurate color vision for visual details remaining static on the retina. On the other hand,
objects moving across the retina projection are blurred.
- The second well known channel is the Magnocellular channel. It is mainly active in the retina
peripheral vision and sends signals related to change events (motion, transient events, etc.).
These outgoing signals also help the visual system to focus/center the retina on 'transient'/moving
areas for more detailed analysis, thus improving visual scene context and object classification.
@note Regarding the proposed model, contrary to the real retina, we apply these two channels to
the entire input image at the same resolution. This allows enhanced visual details and motion
information to be extracted from all the considered images... but remember that these two channels
are complementary. For example, if the Magnocellular channel gives strong energy in an area, then
the Parvocellular channel is certainly blurred there since there is a transient event.
As an illustration, in the following we apply the retina model to a webcam video stream of a dark
visual scene. In this visual scene, captured in a university amphitheater, some students are moving
while talking to the teacher.
In this video sequence, because of the dark ambiance, the signal to noise ratio is low and color
artifacts are present on the edges of visual features because of the low quality image capture
tool-chain.
![image](images/studentsSample_input.jpg)
Below, the retina foveal vision is shown applied to the entire image. In the retina configuration
used, global luminance is preserved and local contrasts are enhanced. Also, the signal to noise
ratio is improved: since high frequency spatio-temporal noise is reduced, the enhanced details are
not corrupted by any enhanced noise.
![image](images/studentsSample_parvo.jpg)
Below is the Magnocellular output of the retina model. Its signals are strong where transient
events occur. Here, a student is moving at the bottom of the image, thus generating high energy.
The rest of the image is static; however, it is corrupted by a strong noise. Here, the retina
filters out most of the noise, thus generating few false motion area 'alarms'. This channel can be
used as a transient/moving areas detector: it would provide relevant information for a low cost
segmentation tool that would highlight areas in which an event is occurring.
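As a minimal sketch of that idea (not part of the retina module itself; the threshold value and the use of the imgproc functions below are illustrative assumptions), the Magno output can be turned into a crude motion mask:
@code{.cpp}
// magno is the 8-bit Magnocellular output retrieved with Retina::getMagno()
cv::Mat motionMask;
cv::threshold(magno, motionMask, 60, 255, cv::THRESH_BINARY);
// optional clean-up of isolated residual-noise pixels
cv::morphologyEx(motionMask, motionMask, cv::MORPH_OPEN,
                 cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3)));
@endcode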
![image](images/studentsSample_magno.jpg)
Retina use case
---------------
This model can be used basically for spatio-temporal video effects but also for:
- performing texture analysis with an enhanced signal to noise ratio and enhanced details that are
robust against input image luminance ranges (check out the Parvocellular retina channel output)
- performing motion analysis that also takes benefit of the previously cited properties.
Literature
----------
For more information, refer to the following papers : @cite Benoit2010
- Please have a look at the reference work of Jeanny Herault that you can read in his book @cite Herault2010
This retina filter code includes the research contributions of PhD/research colleagues whose
code has been redrawn by the author:
- take a look at the *retinacolor.hpp* module to discover Brice Chaix de Lavarene's PhD color
mosaicing/demosaicing and his reference paper @cite Chaix2007
- take a look at *imagelogpolprojection.hpp* to discover retina spatial log sampling, which
originates from Barthelemy Durette's PhD with Jeanny Herault. A Retina / V1 cortex projection is
also proposed and originates from Jeanny's discussions. More information can be found in the above
cited Jeanny Herault's book.
Code tutorial
-------------
Please refer to the original tutorial source code in file
*opencv_folder/samples/cpp/tutorial_code/bioinspired/retina_tutorial.cpp*.
@note do not forget that the retina model is included in the following namespace: cv::bioinspired
To compile it, assuming OpenCV is correctly installed, use the following command. It requires the
opencv_core *(cv::Mat and friends objects management)*, opencv_highgui *(display and image/video
read)* and opencv_bioinspired *(Retina description)* libraries to compile.
@code{.sh}
# compile
g++ retina_tutorial.cpp -o Retina_tuto -lopencv_core -lopencv_highgui -lopencv_bioinspired
# Run commands: add 'log' as a last parameter to apply a spatial log sampling (simulates retina sampling)
# run on webcam
./Retina_tuto -video
# run on video file
./Retina_tuto -video myVideo.avi
# run on an image
./Retina_tuto -image myPicture.jpg
# run on an image with log sampling
./Retina_tuto -image myPicture.jpg log
@endcode
Here is a code explanation:
The Retina definition is present in the bioinspired package and a simple include allows you to use
it. You can instead use the specific header *opencv2/bioinspired.hpp* if you prefer, but then
include the other required OpenCV modules: *opencv2/core.hpp* and *opencv2/highgui.hpp*
@code{.cpp}
#include "opencv2/opencv.hpp"
@endcode
Provide the user with some hints on how to run the program, using a help function
@code{.cpp}
// the help procedure
static void help(std::string errorMessage)
{
std::cout<<"Program init error : "<<errorMessage<<std::endl;
std::cout<<"\nProgram call procedure : retinaDemo [processing mode] [Optional : media target] [Optional LAST parameter: \"log\" to activate retina log sampling]"<<std::endl;
std::cout<<"\t[processing mode] :"<<std::endl;
std::cout<<"\t -image : for still image processing"<<std::endl;
std::cout<<"\t -video : for video stream processing"<<std::endl;
std::cout<<"\t[Optional : media target] :"<<std::endl;
std::cout<<"\t if processing an image or video file, then, specify the path and filename of the target to process"<<std::endl;
std::cout<<"\t leave empty if processing video stream coming from a connected video device"<<std::endl;
std::cout<<"\t[Optional : activate retina log sampling] : an optional last parameter can be specified for retina spatial log sampling"<<std::endl;
std::cout<<"\t set \"log\" without quotes to activate this sampling, output frame size will be divided by 4"<<std::endl;
std::cout<<"\nExamples:"<<std::endl;
std::cout<<"\t-Image processing : ./retinaDemo -image lena.jpg"<<std::endl;
std::cout<<"\t-Image processing with log sampling : ./retinaDemo -image lena.jpg log"<<std::endl;
std::cout<<"\t-Video processing : ./retinaDemo -video myMovie.mp4"<<std::endl;
std::cout<<"\t-Live video processing : ./retinaDemo -video"<<std::endl;
std::cout<<"\nPlease start again with new parameters"<<std::endl;
std::cout<<"****************************************************"<<std::endl;
std::cout<<" NOTE : this program generates the default retina parameters file 'RetinaDefaultParameters.xml'"<<std::endl;
std::cout<<" => you can use this to fine tune parameters and load them if you save to file 'RetinaSpecificParameters.xml'"<<std::endl;
}
@endcode
Then, start the main program and first declare a *cv::Mat* matrix in which input images will be
loaded. Also allocate a *cv::VideoCapture* object ready to load video streams (if necessary)
@code{.cpp}
int main(int argc, char* argv[]) {
// declare the retina input buffer... that will be fed differently in regard of the input media
cv::Mat inputFrame;
cv::VideoCapture videoCapture; // in case a video media is used, its manager is declared here
@endcode
In the main program, before processing, first check the input command parameters. Here it loads a
first input image coming from a single loaded image (if the user chose the command *-image*) or
from a video stream (if the user chose the command *-video*). Also, if the user added the *log*
command at the end of the program call, the spatial logarithmic image sampling performed by the
retina is taken into account via the Boolean flag *useLogSampling*.
@code{.cpp}
// welcome message
std::cout<<"****************************************************"<<std::endl;
std::cout<<"* Retina demonstration : demonstrates the use of is a wrapper class of the Gipsa/Listic Labs retina model."<<std::endl;
std::cout<<"* This demo will try to load the file 'RetinaSpecificParameters.xml' (if exists).\nTo create it, copy the autogenerated template 'RetinaDefaultParameters.xml'.\nThen twaek it with your own retina parameters."<<std::endl;
// basic input arguments checking
if (argc<2)
{
help("bad number of parameter");
return -1;
}
bool useLogSampling = !strcmp(argv[argc-1], "log"); // check if user wants retina log sampling processing
std::string inputMediaType=argv[1];
//////////////////////////////////////////////////////////////////////////////
// checking input media type (still image, video file, live video acquisition)
if (!strcmp(inputMediaType.c_str(), "-image") && argc >= 3)
{
std::cout<<"RetinaDemo: processing image "<<argv[2]<<std::endl;
// image processing case
inputFrame = cv::imread(std::string(argv[2]), 1); // load image in RGB mode
}else
if (!strcmp(inputMediaType.c_str(), "-video"))
{
if (argc == 2 || (argc == 3 && useLogSampling)) // attempt to grab images from a video capture device
{
videoCapture.open(0);
}else// attempt to grab images from a video filestream
{
std::cout<<"RetinaDemo: processing video stream "<<argv[2]<<std::endl;
videoCapture.open(argv[2]);
}
// grab a first frame to check if everything is ok
videoCapture>>inputFrame;
}else
{
// bad command parameter
help("bad command parameter");
return -1;
}
@endcode
Once all input parameters are processed, a first image should have been loaded; if not, display an
error and stop the program:
@code{.cpp}
if (inputFrame.empty())
{
help("Input media could not be loaded, aborting");
return -1;
}
@endcode
Now, everything is ready to run the retina model. I propose here to allocate a retina instance and
to manage the optional log sampling option. The Retina constructor expects at least a cv::Size
object that specifies the input data size that will have to be managed. One can activate other
options such as color and its related color multiplexing strategy (here Bayer multiplexing is
chosen using *enum cv::bioinspired::RETINA_COLOR_BAYER*). If using log sampling, the image
reduction factor (smaller output images) and the log sampling strength can be adjusted.
@code{.cpp}
// pointer to a retina object
cv::Ptr<cv::bioinspired::Retina> myRetina;
// if the last parameter is 'log', then activate log sampling (favour foveal vision and subsamples peripheral vision)
if (useLogSampling)
{
myRetina = cv::bioinspired::createRetina(inputFrame.size(), true, cv::bioinspired::RETINA_COLOR_BAYER, true, 2.0, 10.0);
}
else// -> else allocate "classical" retina :
myRetina = cv::bioinspired::createRetina(inputFrame.size());
@endcode
Once done, the proposed code writes a default xml file that contains the default parameters of the
retina. This is useful for making your own configuration from this template. Here, the generated
template xml file is called *RetinaDefaultParameters.xml*.
@code{.cpp}
// save default retina parameters file in order to let you see this and maybe modify it and reload using method "setup"
myRetina->write("RetinaDefaultParameters.xml");
@endcode
In the following line, the retina attempts to load another xml file called
*RetinaSpecificParameters.xml*. If you created it and introduced your own setup, it will be loaded;
otherwise, the default retina parameters are used.
@code{.cpp}
// load parameters if file exists
myRetina->setup("RetinaSpecificParameters.xml");
@endcode
It is not required here, but just to show that it is possible, you can reset the retina buffers to
zero to force it to forget past events.
@code{.cpp}
// reset all retina buffers (imagine you close your eyes for a long time)
myRetina->clearBuffers();
@endcode
Now, it is time to run the retina! First create some output buffers ready to receive the two
retina channel outputs
@code{.cpp}
// declare retina output buffers
cv::Mat retinaOutput_parvo;
cv::Mat retinaOutput_magno;
@endcode
Then, run the retina in a loop, loading new frames from the video sequence if necessary, and get
the retina outputs back into the dedicated buffers.
@code{.cpp}
// processing loop with no stop condition
while(true)
{
// if using video stream, then, grabbing a new frame, else, input remains the same
if (videoCapture.isOpened())
videoCapture>>inputFrame;
// run retina filter on the loaded input frame
myRetina->run(inputFrame);
// Retrieve and display retina output
myRetina->getParvo(retinaOutput_parvo);
myRetina->getMagno(retinaOutput_magno);
cv::imshow("retina input", inputFrame);
cv::imshow("Retina Parvo", retinaOutput_parvo);
cv::imshow("Retina Magno", retinaOutput_magno);
cv::waitKey(10);
}
@endcode
That's done! But if you want to make the program robust, take care to manage exceptions. The retina
can throw some when it encounters irrelevant data (no input frame, wrong setup, etc.). I therefore
recommend surrounding all the retina code with a try/catch block like this:
@code{.cpp}
try{
// pointer to a retina object
cv::Ptr<cv::bioinspired::Retina> myRetina;
[---]
// processing loop with no stop condition
while(true)
{
[---]
}
}catch(const cv::Exception &e)
{
std::cerr<<"Error using Retina : "<<e.what()<<std::endl;
}
@endcode
Retina parameters, what to do ?
-------------------------------
First, it is recommended to read the reference paper @cite Benoit2010 .
Once done, open the configuration file *RetinaDefaultParameters.xml* generated by the demo and let's
have a look at it.
@code{.xml}
<?xml version="1.0"?>
<opencv_storage>
<OPLandIPLparvo>
<colorMode>1</colorMode>
<normaliseOutput>1</normaliseOutput>
<photoreceptorsLocalAdaptationSensitivity>7.5e-01</photoreceptorsLocalAdaptationSensitivity>
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant>
<photoreceptorsSpatialConstant>5.7e-01</photoreceptorsSpatialConstant>
<horizontalCellsGain>0.01</horizontalCellsGain>
<hcellsTemporalConstant>0.5</hcellsTemporalConstant>
<hcellsSpatialConstant>7.</hcellsSpatialConstant>
<ganglionCellsSensitivity>7.5e-01</ganglionCellsSensitivity></OPLandIPLparvo>
<IPLmagno>
<normaliseOutput>1</normaliseOutput>
<parasolCells_beta>0.</parasolCells_beta>
<parasolCells_tau>0.</parasolCells_tau>
<parasolCells_k>7.</parasolCells_k>
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency>
<V0CompressionParameter>9.5e-01</V0CompressionParameter>
<localAdaptintegration_tau>0.</localAdaptintegration_tau>
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno>
</opencv_storage>
@endcode
Here are some hints, but actually the best parameter setup depends more on what you want to do with
the retina than on the input images that you give to the retina. Apart from the more specific case
of High Dynamic Range (HDR) images, which require a more specific setup for the targeted luminance
compression objective, the retina behavior should be rather stable from content to content. Note
that OpenCV is able to manage such HDR formats thanks to the OpenEXR images compatibility.
Then, if the application target requires detail enhancement prior to specific image processing, you
need to know whether mean luminance information is required or not. If not, then the retina can
cancel it or significantly reduce its energy, thus giving more visibility to higher spatial
frequency details.
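For instance (a sketch that simply reuses the write/setup calls shown earlier in this tutorial; cancelling mean luminance through *horizontalCellsGain* is only an illustration of the previous paragraph), the tuning workflow is: dump the defaults, edit the XML by hand, then reload it:
@code{.cpp}
myRetina->write("RetinaDefaultParameters.xml");   // dump the default setup once
// copy it to RetinaSpecificParameters.xml and, for example, set
// <horizontalCellsGain>0</horizontalCellsGain> to cancel mean luminance, then:
myRetina->setup("RetinaSpecificParameters.xml");  // reload the edited setup
myRetina->clearBuffers();                         // restart from a clean state
@endcode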
### Basic parameters
The simplest parameters are the following:
- **colorMode** : let the retina process color information (if 1) or gray scale images (if 0). In
this last case, only the first channel of the input will be processed.
- **normaliseOutput** : each channel has this parameter; if its value is 1, then the considered
channel output is rescaled between 0 and 255. Take care in this case of the Magnocellular output
level (motion/transient channel detection): residual noise will also be rescaled!
@note Using color requires color channel multiplexing/demultiplexing, which requires more
processing. You can expect much faster processing using gray levels: it would require around 30
products per pixel for all the retina processes, and it has recently been parallelized for
multicore architectures.
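As a sketch based on the *createRetina* overload already used in this tutorial (variable names are illustrative), a gray-level retina is obtained by simply passing colorMode=false:
@code{.cpp}
// gray-level retina: no color multiplexing/demultiplexing, hence faster processing
cv::Ptr<cv::bioinspired::Retina> grayRetina =
        cv::bioinspired::createRetina(inputFrame.size(), false);
@endcode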
### Photo-receptors parameters
The following parameters act on the entry point of the retina - photo-receptors - and impact all the
following processes. These sensors are low pass spatio-temporal filters that smooth temporal and
spatial data and also adjust their sensitivity to local luminance, thus improving detail extraction
and high frequency noise canceling.
- **photoreceptorsLocalAdaptationSensitivity** between 0 and 1. Values close to 1 allow high
luminance log compression effect at the photo-receptors level. Values closer to 0 give a more
linear sensitivity. Increased alone, it can burn the *Parvo (details channel)* output image. If
adjusted in collaboration with **ganglionCellsSensitivity** images can be very contrasted
whatever the local luminance there is... at the price of a naturalness decrease.
- **photoreceptorsTemporalConstant** this sets up the temporal constant of the low pass filter
effect at the entry of the retina. A high value leads to a strong temporal smoothing effect: moving
objects are blurred and can disappear while static objects are favored. But when starting the
retina processing, the stable state is reached later.
- **photoreceptorsSpatialConstant** specifies the spatial constant related to the photo-receptors'
low pass filter effect. This parameter specifies the minimum spatial signal period allowed in the
following processing. Typically, this filter should cut high frequency noise. A 0 value doesn't
cut any noise, while higher values start to cut high spatial frequencies and then more and more
lower frequencies... So do not go too high if you want to see some details of the input images!
A good compromise for color images is 0.53 since this won't affect the color spectrum too much.
Higher values would lead to gray and blurred output images.
### Horizontal cells parameters
This parameter set tunes the neural network connected to the photo-receptors, the horizontal cells.
It modulates photo-receptors sensitivity and completes the processing for final spectral whitening
(part of the spatial band pass effect thus favoring visual details enhancement).
- **horizontalCellsGain** here is a critical parameter! If you are not interested in the mean
luminance and want to focus on details enhancement, then set it to zero. But if you want to keep
some environment luminance data, let some low spatial frequencies pass into the system by setting
a higher value (\<1).
- **hcellsTemporalConstant** similar to photo-receptors, this acts on the temporal constant of a
low pass temporal filter that smooths input data. Here, a high value generates a high retina
after effect while a lower value makes the retina more reactive. This value should be lower than
**photoreceptorsTemporalConstant** to limit strong retina after effects.
- **hcellsSpatialConstant** is the spatial constant of these cells' low pass filter. It specifies
the lowest spatial frequency allowed in the following processing. Visually, a high value leads to
very low spatial frequency processing and to salient halo effects. Lower values reduce this effect,
but the limit is: do not go lower than the value of **photoreceptorsSpatialConstant**. Those 2
parameters actually specify the spatial band-pass of the retina.
@note After the processing managed by the previous parameters, the input data is cleaned from noise
and luminance is already partly enhanced. The following parameters act on the last processing stages
of the two outgoing retina signals.
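If you prefer to set these values from code rather than from the XML file, the Retina class also exposes per-channel setup methods. The sketch below assumes a *setupOPLandIPLParvoChannel* method whose parameters follow the same order as the XML fields shown above; please check *opencv2/bioinspired/retina.hpp* for the exact signature shipped with your version:
@code{.cpp}
// Illustrative values only: colorMode, normaliseOutput, then the OPLandIPLparvo
// fields in the same order as in RetinaDefaultParameters.xml.
myRetina->setupOPLandIPLParvoChannel(true,   // colorMode
                                     true,   // normaliseOutput
                                     0.75f,  // photoreceptorsLocalAdaptationSensitivity
                                     0.9f,   // photoreceptorsTemporalConstant
                                     0.53f,  // photoreceptorsSpatialConstant
                                     0.f,    // horizontalCellsGain (0: cancel mean luminance)
                                     0.5f,   // hcellsTemporalConstant
                                     7.f,    // hcellsSpatialConstant
                                     0.75f); // ganglionCellsSensitivity
@endcode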
### Parvo (details channel) dedicated parameter
- **ganglionCellsSensitivity** specifies the strength of the final local adaptation occurring at
the output of this details dedicated channel. Parameter values remain between 0 and 1. Low values
tend to give a linear response while higher values enforce the remaining low contrasted areas.
@note This parameter can correct possibly burned images by favoring the low energy details of the
visual scene, even in bright areas.
### IPL Magno (motion/transient channel) parameters
Once image information is cleaned, this channel acts as a high pass temporal filter that only
selects signals related to transient signals (events, motion, etc.). A low pass spatial filter
smooths extracted transient data and a final logarithmic compression enhances low transient events
thus enhancing event sensitivity.
- **parasolCells_beta** can be considered as an amplifier gain at the entry point of this
processing stage. Generally set to 0.
- **parasolCells_tau** the temporal smoothing effect that can be added
- **parasolCells_k** the spatial constant of the spatial filtering effect; set it at a high value
to favor low spatial frequency signals that are less subject to residual noise.
- **amacrinCellsTemporalCutFrequency** specifies the temporal constant of the high pass filter.
High values let slow transient events be selected.
- **V0CompressionParameter** specifies the strength of the log compression. Similar behaviors to
previous description but here it enforces sensitivity of transient events.
- **localAdaptintegration_tau** generally set to 0, no real use here actually
- **localAdaptintegration_k** specifies the size of the area on which local adaptation is
performed. Low values lead to short range local adaptation (higher sensitivity to noise), high
values secure log compression.
Interactive Visual Debugging of Computer Vision applications {#tutorial_cvv_introduction}
============================================================
What is the most common way to debug computer vision applications? Usually the answer is temporary,
hacked together, custom code that must be removed from the code for release compilation.
In this tutorial we will show how to use the visual debugging features of the **cvv** module
(*opencv2/cvv.hpp*) instead.
Goals
-----
In this tutorial you will learn how to:
- Add cvv debug calls to your application
- Use the visual debug GUI
- Enable and disable the visual debug features during compilation (with zero runtime overhead when
disabled)
Code
----
The example code
- captures images (*videoio*), e.g. from a webcam,
- applies some filters to each image (*imgproc*),
- detects image features and matches them to the previous image (*features2d*).
If the program is compiled without visual debugging (see CMakeLists.txt below) the only result is
some information printed to the command line. We want to demonstrate how much debugging or
development functionality is added by just a few lines of *cvv* commands.
@includelineno cvv/samples/cvv_demo.cpp
@code{.cmake}
cmake_minimum_required(VERSION 2.8)
project(cvvisual_test)
SET(CMAKE_PREFIX_PATH ~/software/opencv/install)
SET(CMAKE_CXX_COMPILER "g++-4.8")
SET(CMAKE_CXX_FLAGS "-std=c++11 -O2 -pthread -Wall -Werror")
# (un)set: cmake -DCVV_DEBUG_MODE=OFF ..
OPTION(CVV_DEBUG_MODE "cvvisual-debug-mode" ON)
if(CVV_DEBUG_MODE MATCHES ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DCVVISUAL_DEBUGMODE")
endif()
FIND_PACKAGE(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(cvvt main.cpp)
target_link_libraries(cvvt
opencv_core opencv_videoio opencv_imgproc opencv_features2d
opencv_cvv
)
@endcode
Explanation
-----------
-# We compile the program either using the above CmakeLists.txt with Option *CVV_DEBUG_MODE=ON*
(*cmake -DCVV_DEBUG_MODE=ON*) or by adding the corresponding define *CVVISUAL_DEBUGMODE* to
our compiler (e.g. *g++ -DCVVISUAL_DEBUGMODE*).
-# The first cvv call simply shows the image (similar to *imshow*) with the imgIdString as comment.
@code{.cpp}
cvv::showImage(imgRead, CVVISUAL_LOCATION, imgIdString.c_str());
@endcode
The image is added to the overview tab in the visual debug GUI and the cvv call blocks.
![image](images/01_overview_single.jpg)
The image can then be selected and viewed
![image](images/02_single_image_view.jpg)
Whenever you want to continue in the code, i.e. unblock the cvv call, you can either continue
until the next cvv call (*Step*), continue until the last cvv call (*\>\>*) or run the
application until it exits (*Close*).
We decide to press the green *Step* button.
-# The next cvv calls are used to debug all kinds of filter operations, i.e. operations that take a
picture as input and return a picture as output.
@code{.cpp}
cvv::debugFilter(imgRead, imgGray, CVVISUAL_LOCATION, "to gray");
@endcode
As with every cvv call, you first end up in the overview.
![image](images/03_overview_two.jpg)
We decide not to care about the conversion to gray scale and press *Step*.
@code{.cpp}
cvv::debugFilter(imgGray, imgGraySmooth, CVVISUAL_LOCATION, "smoothed");
@endcode
If you open the filter call, you will end up in the so called "DefaultFilterView". Both images
are shown next to each other and you can (synchronized) zoom into them.
![image](images/04_default_filter_view.jpg)
When you go to very high zoom levels, each pixel is annotated with its numeric values.
![image](images/05_default_filter_view_high_zoom.jpg)
We press *Step* twice and have a look at the dilated image.
@code{.cpp}
cvv::debugFilter(imgEdges, imgEdgesDilated, CVVISUAL_LOCATION, "dilated edges");
@endcode
The DefaultFilterView showing both images
![image](images/06_default_filter_view_edges.jpg)
Now we use the *View* selector in the top right and select the "DualFilterView". We select
"Changed Pixels" as filter and apply it (middle image).
![image](images/07_dual_filter_view_edges.jpg)
After we had a close look at these images, perhaps using different views, filters or other GUI
features, we decide to let the program run through. Therefore we press the yellow *\>\>* button.
The program will block at
@code{.cpp}
cvv::finalShow();
@endcode
and display the overview with everything that was passed to cvv in the meantime.
![image](images/08_overview_all.jpg)
-# The cvv debugDMatch call is used in a situation where there are two images each with a set of
descriptors that are matched to each other.
We pass both images, both sets of keypoints and their matching to the visual debug module.
@code{.cpp}
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, allMatchIdString.c_str());
@endcode
Since we want to have a look at matches, we use the filter capabilities (*\#type match*) in the
overview to only show match calls.
![image](images/09_overview_filtered_type_match.jpg)
We want to have a closer look at one of them, e.g. to tune our parameters that use the matching.
The view has various settings how to display keypoints and matches. Furthermore, there is a
mouseover tooltip.
![image](images/10_line_match_view.jpg)
We see (visual debugging!) that there are many bad matches. We decide that only 70% of the
matches should be shown - those 70% with the lowest match distance.
![image](images/11_line_match_view_portion_selector.jpg)
Having successfully reduced the visual distraction, we want to see more clearly what changed
between the two images. We select the "TranslationMatchView" that shows to where the keypoint
was matched in a different way.
![image](images/12_translation_match_view_portion_selector.jpg)
It is easy to see that the cup was moved to the left between the two images.
Although cvv is all about interactively *seeing* the computer vision bugs, this is complemented
by a "RawView" that allows you to have a look at the underlying numeric data.
![image](images/13_raw_view.jpg)
-# There are many more useful features contained in the cvv GUI. For instance, one can group the
overview tab.
![image](images/14_overview_group_by_line.jpg)
Result
------
- By adding a few expressive lines to our computer vision program we can interactively debug it
through different visualizations.
- Once we are done developing/debugging we do not have to remove those lines. We simply disable
cvv debugging (*cmake -DCVV_DEBUG_MODE=OFF* or g++ without *-DCVVISUAL_DEBUGMODE*) and our
program runs without any debug overhead.
Enjoy computer vision!
......@@ -45,7 +45,7 @@ the use of this software, even if advised of the possibility of such damage.
@defgroup face Face Recognition
- @ref face_changelog
- @ref face_tutorial
- @ref tutorial_face_main
*/
......
Face Recognition with OpenCV {#face_tutorial}
Face Recognition with OpenCV {#tutorial_face_main}
============================
[TOC]
Introduction {#face_tutorial_intro}
Introduction {#tutorial_face_intro}
============
[OpenCV (Open Source Computer Vision)](http://opencv.org) is a popular computer vision library
......@@ -36,7 +36,7 @@ users.
All code in this document is released under the [BSD
license](http://www.opensource.org/licenses/bsd-license), so feel free to use it for your projects.
Face Recognition {#face_tutorial_facerec}
Face Recognition {#tutorial_face_facerec}
----------------
Face recognition is an easy task for humans. Experiments in @cite Tu06 have shown that even one to
......@@ -59,7 +59,7 @@ to face recognition. One of the first automated face recognition systems was des
the euclidean distance between feature vectors of a probe and reference image. Such a method is
robust against changes in illumination by its nature, but has a huge drawback: the accurate
registration of the marker points is complicated, even with state of the art algorithms. Some of the
latest work on geometric face recognition was carried out in @cite Bru92. A 22-dimensional feature
latest work on geometric face recognition was carried out in @cite Bru92 . A 22-dimensional feature
vector was used and experiments on large datasets have shown that geometrical features alone may
not carry enough information for face recognition.
......@@ -71,7 +71,7 @@ transformation is optimal from a reconstruction standpoint, it doesn't take any
account. Imagine a situation where the variance is generated from external sources, let it be light.
The axes with maximum variance do not necessarily contain any discriminative information at all,
hence a classification becomes impossible. So a class-specific projection with a Linear Discriminant
Analysis was applied to face recognition in @cite BHK97. The basic idea is to minimize the variance
Analysis was applied to face recognition in @cite BHK97 . The basic idea is to minimize the variance
within a class, while maximizing the variance between the classes at the same time.
Recently, various methods for local feature extraction emerged. To avoid the high-dimensionality of
......@@ -82,7 +82,7 @@ Local Binary Patterns (@cite AHP04). It's still an open research question what's
preserve spatial information when applying a local feature extraction, because spatial information
is potentially useful information.
Face Database {#face_tutorial_facedb}
Face Database {#tutorial_face_facedb}
-------------
Let's get some data to experiment with first. I don't want to do a toy example here. We are doing
......@@ -123,7 +123,7 @@ Three interesting databases are (parts of the description are quoted from
same setup to take 16128 images of 28 people. The Extended Yale Facedatabase B is the merge of
the two databases, which is now known as Extended Yalefacedatabase B.
### Preparing the data {#face_tutorial_prepare}
### Preparing the data {#tutorial_face_prepare}
Once we have acquired some data, we'll need to read it in our program. In the demo applications I
have decided to read the images from a very simple CSV file. Why? Because it's the simplest
......@@ -131,9 +131,9 @@ platform-independent approach I can think of. However, if you know a simpler sol
about it. Basically all the CSV file needs to contain are lines composed of a filename followed by a
; followed by the label (as *integer number*), making up a line like this:
~~~
@code{.csv}
/path/to/image.ext;0
~~~
@endcode
Let's dissect the line. /path/to/image.ext is the path to an image, probably something like this if
you are in Windows: C:/faces/person0/image0.jpg. Then there is the separator ; and finally we assign
......@@ -143,7 +143,7 @@ same subjects (persons) should have the same label.
Download the AT&T Facedatabase from AT&T Facedatabase and the corresponding CSV file from at.txt,
which looks like this (file is without ... of course):
~~~
@code{.csv}
./at/s1/1.pgm;0
./at/s1/2.pgm;0
...
......@@ -152,20 +152,20 @@ which looks like this (file is without ... of course):
...
./at/s40/1.pgm;39
./at/s40/2.pgm;39
~~~
@endcode
Imagine I have extracted the files to D:/data/at and have downloaded the CSV file to D:/data/at.txt.
Then you would simply need to Search & Replace ./ with D:/data/. You can do that in an editor of
your choice, every sufficiently advanced editor can do this. Once you have a CSV file with valid
filenames and labels, you can run any of the demos by passing the path to the CSV file as parameter:
~~~
@code{.sh}
facerec_demo.exe D:/data/at.txt
~~~
@endcode
Please, see @ref face_tutorial_appendix_csv for details on creating CSV file.
Please, see @ref tutorial_face_appendix_csv for details on creating CSV file.
Eigenfaces {#face_tutorial_eigenfaces}
Eigenfaces {#tutorial_face_eigenfaces}
----------
The problem with the image representation we are given is its high dimensionality. Two-dimensional
......@@ -181,7 +181,7 @@ high-dimensional dataset is often described by correlated variables and therefor
meaningful dimensions account for most of the information. The PCA method finds the directions with
the greatest variance in the data, called principal components.
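As a minimal, stand-alone sketch of this idea with OpenCV's cv::PCA class (this is not the tutorial's facerec code; `data` is assumed to be a CV_32F matrix holding one flattened training face per row and `faceHeight` the original image height):
@code{.cpp}
cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW, 10);             // keep 10 principal components
cv::Mat meanFace  = pca.mean.reshape(1, faceHeight);                // the "average face"
cv::Mat eigenface = pca.eigenvectors.row(0).reshape(1, faceHeight); // first Eigenface
cv::Mat coeffs    = pca.project(data.row(0));                       // 10-dimensional representation
cv::Mat approx    = pca.backProject(coeffs);                        // reconstruction from 10 components
@endcode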
### Algorithmic Description of Eigenfaces method {#face_tutorial_eigenfaces_algo}
### Algorithmic Description of Eigenfaces method {#tutorial_face_eigenfaces_algo}
Let \f$X = \{ x_{1}, x_{2}, \ldots, x_{n} \}\f$ be a random vector with observations \f$x_i \in R^{d}\f$.
......@@ -237,7 +237,7 @@ The resulting eigenvectors are orthogonal, to get orthonormal eigenvectors they
normalized to unit length. I don't want to turn this into a publication, so please look into
@cite Duda01 for the derivation and proof of the equations.
### Eigenfaces in OpenCV {#face_tutorial_eigenfaces_use}
### Eigenfaces in OpenCV {#tutorial_face_eigenfaces_use}
For the first source code example, I'll go through it with you. I am first giving you the whole
source code listing, and after this we'll look at the most important lines in detail. Please note:
......@@ -258,7 +258,7 @@ We've already seen, that we can reconstruct a face from its lower dimensional ap
let's see how many Eigenfaces are needed for a good reconstruction. I'll do a subplot with
\f$10,30,\ldots,310\f$ Eigenfaces:
~~~{cpp}
@code{.cpp}
// Display or save the image reconstruction at some predefined steps:
for(int num_components = 10; num_components < 300; num_components+=15) {
// slice the eigenvectors from the model
......@@ -274,7 +274,7 @@ for(int num_components = 10; num_components < 300; num_components+=15) {
imwrite(format("%s/eigenface_reconstruction_%d.png", output_folder.c_str(), num_components), reconstruction);
}
}
~~~
@endcode
10 Eigenvectors are obviously not sufficient for a good image reconstruction, 50 Eigenvectors may
already be sufficient to encode important facial features. You'll get a good reconstruction with
......@@ -284,7 +284,7 @@ data. @cite Zhao03 is the perfect point to start researching for this:
![image](img/eigenface_reconstruction_opencv.png)
Fisherfaces {#face_tutorial_fisherfaces}
Fisherfaces {#tutorial_face_fisherfaces}
-----------
The Principal Component Analysis (PCA), which is the core of the Eigenfaces method, finds a linear
......@@ -300,16 +300,16 @@ for an example).
The Linear Discriminant Analysis performs a class-specific dimensionality reduction and was invented
by the great statistician [Sir R. A. Fisher](http://en.wikipedia.org/wiki/Ronald_Fisher). He
successfully used it for classifying flowers in his 1936 paper *The use of multiple measurements in
taxonomic problems* @cite Fisher36. In order to find the combination of features that separates best
taxonomic problems* @cite Fisher36 . In order to find the combination of features that separates best
between classes the Linear Discriminant Analysis maximizes the ratio of between-classes to
within-classes scatter, instead of maximizing the overall scatter. The idea is simple: same classes
should cluster tightly together, while different classes are as far away as possible from each other
in the lower-dimensional representation. This was also recognized by
[Belhumeur](http://www.cs.columbia.edu/~belhumeur/), [Hespanha](http://www.ece.ucsb.edu/~hespanha/)
and [Kriegman](http://cseweb.ucsd.edu/~kriegman/) and so they applied a Discriminant Analysis to
face recognition in @cite BHK97.
face recognition in @cite BHK97 .
### Algorithmic Description of Fisherfaces method {#face_tutorial_fisherfaces_algo}
### Algorithmic Description of Fisherfaces method {#tutorial_face_fisherfaces_algo}
Let \f$X\f$ be a random vector with samples drawn from \f$c\f$ classes:
......@@ -365,7 +365,7 @@ given by:
\f[W = W_{fld}^{T} W_{pca}^{T}\f]
### Fisherfaces in OpenCV {#face_tutorial_fisherfaces_use}
### Fisherfaces in OpenCV {#tutorial_face_fisherfaces_use}
The source code for this demo application is also available in the src folder coming with this
documentation:
......@@ -393,7 +393,7 @@ reconstruction of the original image. For the Fisherfaces method we'll project t
each of the Fisherfaces instead. So you'll have a nice visualization, which feature each of the
Fisherfaces describes:
~~~{cpp}
@code{.cpp}
// Display or save the image reconstruction at some predefined steps:
for(int num_component = 0; num_component < min(16, W.cols); num_component++) {
// Slice the Fisherface from the model:
......@@ -409,13 +409,13 @@ for(int num_component = 0; num_component < min(16, W.cols); num_component++) {
imwrite(format("%s/fisherface_reconstruction_%d.png", output_folder.c_str(), num_component), reconstruction);
}
}
~~~
@endcode
The differences may be subtle for the human eyes, but you should be able to see some differences:
![image](img/fisherface_reconstruction_opencv.png)
Local Binary Patterns Histograms {#face_tutorial_lbph}
Local Binary Patterns Histograms {#tutorial_face_lbph}
--------------------------------
Eigenfaces and Fisherfaces take a somewhat holistic approach to face recognition. You treat your
......@@ -461,7 +461,7 @@ literature actually used a fixed 3 x 3 neighborhood just like this:
![image](img/lbp/lbp.png)
### Algorithmic Description of LBPH method {#face_tutorial_lbph_algo}
### Algorithmic Description of LBPH method {#tutorial_face_lbph_algo}
A more formal description of the LBP operator can be given as:
......@@ -481,7 +481,7 @@ s(x) =
This description enables you to capture very fine grained details in images. In fact the authors
were able to compete with state of the art results for texture classification. Soon after the
operator was published it was noted that a fixed neighborhood fails to encode details differing in
scale. So the operator was extended to use a variable neighborhood in @cite AHP04. The idea is to
scale. So the operator was extended to use a variable neighborhood in @cite AHP04 . The idea is to
align an arbitrary number of neighbors on a circle with a variable radius, which enables capturing
the following neighborhoods:
......@@ -523,27 +523,27 @@ regions and extract a histogram from each. The spatially enhanced feature vector
concatenating the local histograms (**not merging them**). These histograms are called *Local Binary
Patterns Histograms*.
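As a plain C++ sketch of the basic operator described above (illustrative only; the spatially enhanced descriptor then concatenates one histogram of these codes per grid cell):
@code{.cpp}
// Basic 3x3 LBP codes for a CV_8UC1 image 'src' (border pixels are left at 0).
cv::Mat lbp = cv::Mat::zeros(src.size(), CV_8UC1);
for (int y = 1; y < src.rows - 1; y++) {
    for (int x = 1; x < src.cols - 1; x++) {
        const uchar c = src.at<uchar>(y, x);
        uchar code = 0;
        code |= (src.at<uchar>(y - 1, x - 1) >= c) << 7;
        code |= (src.at<uchar>(y - 1, x    ) >= c) << 6;
        code |= (src.at<uchar>(y - 1, x + 1) >= c) << 5;
        code |= (src.at<uchar>(y,     x + 1) >= c) << 4;
        code |= (src.at<uchar>(y + 1, x + 1) >= c) << 3;
        code |= (src.at<uchar>(y + 1, x    ) >= c) << 2;
        code |= (src.at<uchar>(y + 1, x - 1) >= c) << 1;
        code |= (src.at<uchar>(y,     x - 1) >= c) << 0;
        lbp.at<uchar>(y, x) = code;
    }
}
@endcode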
### Local Binary Patterns Histograms in OpenCV {#face_tutorial_lbph_use}
### Local Binary Patterns Histograms in OpenCV {#tutorial_face_lbph_use}
The source code for this demo application is also available in the src folder coming with this
documentation:
@include src/facerec_lbph.cpp
Conclusion {#face_tutorial_conclusion}
Conclusion {#tutorial_face_conclusion}
----------
You've learned how to use the new FaceRecognizer in real applications. After reading the document
you also know how the algorithms work, so now it's time for you to experiment with the available
algorithms. Use them, improve them and let the OpenCV community participate!
Credits {#face_tutorial_credits}
Credits {#tutorial_face_credits}
-------
This document wouldn't be possible without the kind permission to use the face images of the *AT&T
Database of Faces* and the *Yale Facedatabase A/B*.
### The Database of Faces {#face_tutorial_credits_db}
### The Database of Faces {#tutorial_face_credits_db}
__Important: when using these images, please give credit to "AT&T Laboratories, Cambridge."__
......@@ -567,7 +567,7 @@ image number for that subject (between 1 and 10).
A copy of the database can be retrieved from:
[<http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip>](http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip).
### Yale Facedatabase A {#face_tutorial_credits_yalea}
### Yale Facedatabase A {#tutorial_face_credits_yalea}
*With the permission of the authors I am allowed to show a small number of images (say subject 1 and
all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase
......@@ -579,7 +579,7 @@ w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, su
(Source:
[<http://cvc.yale.edu/projects/yalefaces/yalefaces.html>](http://cvc.yale.edu/projects/yalefaces/yalefaces.html))
### Yale Facedatabase B {#face_tutorial_credits_yaleb}
### Yale Facedatabase B {#tutorial_face_credits_yaleb}
*With the permission of the authors I am allowed to show a small number of images (say subject 1 and
all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase
......@@ -607,14 +607,14 @@ experimental results with the cropped images, please reference the PAMI2005 pape
Appendix {#face_appendix}
--------
### Creating the CSV File {#face_tutorial_appendix_csv}
### Creating the CSV File {#tutorial_face_appendix_csv}
You don't really want to create the CSV file by hand. I have prepared a little Python script
`create_csv.py` (you find it at `src/create_csv.py` coming with this tutorial) that automatically
creates a CSV file for you. If you have your images in a hierarchy like this
(`/basepath/<subject>/<image.ext>`):
~~~~~~
@code{.sh}
philipp@mango:~/facerec/data/at$ tree
.
|-- s1
......@@ -630,12 +630,12 @@ philipp@mango:~/facerec/data/at$ tree
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
~~~~~~
@endcode
Then simply call `create_csv.py` with the path to the folder, just like this and you could save the
output:
~~~~~~
@code{.sh}
philipp@mango:~/facerec/data$ python create_csv.py
at/s13/2.pgm;0
at/s13/7.pgm;0
......@@ -654,30 +654,30 @@ at/s17/9.pgm;1
at/s17/5.pgm;1
at/s17/3.pgm;1
[...]
~~~~~~
@endcode
Here is the script, if you can't find it:
@verbinclude src/create_csv.py
@verbinclude face/doc/src/create_csv.py
### Aligning Face Images {#face_tutorial_appendix_align}
### Aligning Face Images {#tutorial_face_appendix_align}
An accurate alignment of your image data is especially important in tasks like emotion detection,
where you need as much detail as possible. Believe me... You don't want to do this by hand. So I've
prepared you a tiny Python script. The code is really easy to use. To scale, rotate and crop the
face image you just need to call *CropFace(image, eye\_left, eye\_right, offset\_pct, dest\_sz)*,
face image you just need to call *CropFace(image, eye_left, eye_right, offset_pct, dest_sz)*,
where:
- *eye\_left* is the position of the left eye
- *eye\_right* is the position of the right eye
- *offset\_pct* is the percent of the image you want to keep next to the eyes (horizontal,
- *eye_left* is the position of the left eye
- *eye_right* is the position of the right eye
- *offset_pct* is the percent of the image you want to keep next to the eyes (horizontal,
vertical direction)
- *dest\_sz* is the size of the output image
- *dest_sz* is the size of the output image
If you are using the same *offset\_pct* and *dest\_sz* for your images, they are all aligned at the
If you are using the same *offset_pct* and *dest_sz* for your images, they are all aligned at the
eyes.
@verbinclude src/crop_face.py
@verbinclude face/doc/src/crop_face.py
Imagine we are given [this photo of Arnold
Schwarzenegger](http://en.wikipedia.org/wiki/File:Arnold_Schwarzenegger_edit%28ws%29.jpg), which is
......@@ -694,6 +694,6 @@ Configuration | Cropped, Scaled, Rotated Face
0.3 (30%), 0.3 (30%), (200,200) | ![](tutorial/gender_classification/arnie_30_30_200_200.jpg)
0.2 (20%), 0.2 (20%), (70,70) | ![](tutorial/gender_classification/arnie_20_20_70_70.jpg)
### CSV for the AT&T Facedatabase {#face_tutorial_appendix_attcsv}
### CSV for the AT&T Facedatabase {#tutorial_face_appendix_attcsv}
@verbinclude etc/at.txt
@verbinclude face/doc/etc/at.txt
......@@ -61,7 +61,7 @@ Discriminatively Trained Part Based Models for Object Detection
---------------------------------------------------------------
The object detector described below has been initially proposed by P.F. Felzenszwalb in
@cite Felzenszwalb2010a. It is based on a Dalal-Triggs detector that uses a single filter on histogram
@cite Felzenszwalb2010a . It is based on a Dalal-Triggs detector that uses a single filter on histogram
of oriented gradients (HOG) features to represent an object category. This detector uses a sliding
window approach, where a filter is applied at all positions and scales of an image. The first
innovation is enriching the Dalal-Triggs model using a star-structured part-based model defined by a
......@@ -77,7 +77,7 @@ and scale is the maximum over components, of the score of that component model a
location.
The detector was dramatically speeded-up with cascade algorithm proposed by P.F. Felzenszwalb in
@cite Felzenszwalb2010b. The algorithm prunes partial hypotheses using thresholds on their scores.The
@cite Felzenszwalb2010b . The algorithm prunes partial hypotheses using thresholds on their scores.The
basic idea of the algorithm is to use a hierarchy of models defined by an ordering of the original
model's parts. For a model with (n+1) parts, including the root, a sequence of (n+1) models is
obtained. The i-th model in this sequence is defined by the first i parts from the original model.
......
Line Features Tutorial {#line_descriptor_tutorial}
======================
In this tutorial it will be shown how to:
- use the *BinaryDescriptor* interface to extract lines and store them in *KeyLine* objects
- use the same interface to compute descriptors for every extracted line
- use the *BinaryDescriptorMatcher* to determine matches among descriptors obtained from different
images
Lines extraction and descriptors computation
--------------------------------------------
In the following snippet of code, it is shown how to detect lines from an image. The LSD extractor
is initialized with the *LSD\_REFINE\_ADV* option; the remaining parameters are left to their default
values. A mask of ones is used in order to accept all extracted lines, which are finally
displayed using random colors for octave 0.
~~~{cpp}
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
using namespace cv;
using namespace std;
static const char* keys =
{ "{@image_path | | Image path }" };
static void help()
{
cout << "\nThis example shows the functionalities of lines extraction " << "furnished by BinaryDescriptor class\n"
<< "Please, run this sample using a command in the form\n" << "./example_line_descriptor_lines_extraction <path_to_input_image>" << endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String image_path = parser.get<String>( 0 );
if( image_path.empty() )
{
help();
return -1;
}
/* load image */
cv::Mat imageMat = imread( image_path, 1 );
if( imageMat.data == NULL )
{
std::cout << "Error, image could not be loaded. Please, check its path" << std::endl;
return -1;
}
/* create a binary mask of ones (accept all extracted lines) */
cv::Mat mask = Mat::ones( imageMat.size(), CV_8UC1 );
/* create a pointer to a BinaryDescriptor object with default parameters */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* create a structure to store extracted lines */
vector<KeyLine> lines;
/* extract lines */
bd->detect( imageMat, lines, mask );
/* draw lines extracted from octave 0 */
cv::Mat output = imageMat.clone();
if( output.channels() == 1 )
cvtColor( output, output, COLOR_GRAY2BGR );
for ( size_t i = 0; i < lines.size(); i++ )
{
KeyLine kl = lines[i];
if( kl.octave == 0)
{
/* get a random color */
int R = ( rand() % (int) ( 255 + 1 ) );
int G = ( rand() % (int) ( 255 + 1 ) );
int B = ( rand() % (int) ( 255 + 1 ) );
/* get extremes of line */
Point pt1 = Point( kl.startPointX, kl.startPointY );
Point pt2 = Point( kl.endPointX, kl.endPointY );
/* draw line */
line( output, pt1, pt2, Scalar( B, G, R ), 5 );
}
}
/* show lines on image */
imshow( "Lines", output );
waitKey();
}
~~~
This is the result obtained for the famous cameraman image:
![alternate text](pics/lines_cameraman_edl.png)
Another way to extract lines is using the *LSDDetector* class, which uses the LSD extractor to
compute lines. To obtain this result, it is sufficient to take the snippet shown above and replace
the corresponding rows with
~~~{cpp}
/* create a pointer to an LSDDetector object */
Ptr<LSDDetector> lsd = LSDDetector::createLSDDetector();
/* compute lines */
std::vector<KeyLine> keylines;
lsd->detect( imageMat, keylines, mask );
~~~
Here's the result returned by the LSD detector, again on the cameraman picture:
![alternate text](pics/cameraman_lines2.png)
Once keylines have been detected, it is possible to compute their descriptors as shown in the
following:
~~~{cpp}
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
using namespace cv;
static const char* keys =
{ "{@image_path | | Image path }" };
static void help()
{
std::cout << "\nThis example shows the functionalities of lines extraction " << "and descriptors computation furnished by BinaryDescriptor class\n"
<< "Please, run this sample using a command in the form\n" << "./example_line_descriptor_compute_descriptors <path_to_input_image>"
<< std::endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String image_path = parser.get<String>( 0 );
if( image_path.empty() )
{
help();
return -1;
}
/* load image */
cv::Mat imageMat = imread( image_path, 1 );
if( imageMat.data == NULL )
{
std::cout << "Error, image could not be loaded. Please, check its path" << std::endl;
return -1;
}
/* create a binary mask */
cv::Mat mask = Mat::ones( imageMat.size(), CV_8UC1 );
/* create a pointer to a BinaryDescriptor object with default parameters */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* compute lines */
std::vector<KeyLine> keylines;
bd->detect( imageMat, keylines, mask );
/* compute descriptors */
cv::Mat descriptors;
bd->compute( imageMat, keylines, descriptors );
}
~~~
Matching among descriptors
--------------------------
If we have extracted descriptors from two different images, it is possible to search for matches
among them. One way of doing it is to match each input query descriptor exactly to one descriptor,
choosing the one at the closest distance:
~~~{cpp}
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
using namespace cv;
static const char* keys =
{ "{@image_path1 | | Image path 1 }"
"{@image_path2 | | Image path 2 }" };
static void help()
{
std::cout << "\nThis example shows the functionalities of lines extraction " << "and descriptors computation furnished by BinaryDescriptor class\n"
<< "Please, run this sample using a command in the form\n" << "./example_line_descriptor_compute_descriptors <path_to_input_image 1>"
<< "<path_to_input_image 2>" << std::endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String image_path1 = parser.get<String>( 0 );
String image_path2 = parser.get<String>( 1 );
if( image_path1.empty() || image_path2.empty() )
{
help();
return -1;
}
/* load image */
cv::Mat imageMat1 = imread( image_path1, 1 );
cv::Mat imageMat2 = imread( image_path2, 1 );
if( imageMat1.data == NULL || imageMat2.data == NULL )
{
std::cout << "Error, images could not be loaded. Please, check their path" << std::endl;
return -1;
}
/* create binary masks */
cv::Mat mask1 = Mat::ones( imageMat1.size(), CV_8UC1 );
cv::Mat mask2 = Mat::ones( imageMat2.size(), CV_8UC1 );
/* create a pointer to a BinaryDescriptor object with default parameters */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* compute lines */
std::vector<KeyLine> keylines1, keylines2;
bd->detect( imageMat1, keylines1, mask1 );
bd->detect( imageMat2, keylines2, mask2 );
/* compute descriptors */
cv::Mat descr1, descr2;
bd->compute( imageMat1, keylines1, descr1 );
bd->compute( imageMat2, keylines2, descr2 );
/* create a BinaryDescriptorMatcher object */
Ptr<BinaryDescriptorMatcher> bdm = BinaryDescriptorMatcher::createBinaryDescriptorMatcher();
/* require match */
std::vector<DMatch> matches;
bdm->match( descr1, descr2, matches );
/* plot matches */
cv::Mat outImg;
std::vector<char> mask( matches.size(), 1 );
drawLineMatches( imageMat1, keylines1, imageMat2, keylines2, matches, outImg, Scalar::all( -1 ), Scalar::all( -1 ), mask,
DrawLinesMatchesFlags::DEFAULT );
imshow( "Matches", outImg );
waitKey();
}
~~~
Sometimes, we could be interested in searching for the closest *k* descriptors, given an input one.
This requires slight modifications to the previous code:
~~~{cpp}
/* prepare a structure to host matches */
std::vector<std::vector<DMatch> > matches;
/* require knn match */
bdm->knnMatch( descr1, descr2, matches, 6 );
~~~
In the above example, the closest 6 descriptors are returned for every query. In some cases, we
could instead have a search radius and look for all descriptors distant at most *r* from the input
query. The previous code must be modified:
~~~{cpp}
/* prepare a structure to host matches */
std::vector<std::vector<DMatch> > matches;
/* compute matches */
bdm->radiusMatch( queries, matches, 30 );
~~~
Here's an example of matching among descriptors extracted from the original cameraman image and its
downsampled (and blurred) version:
![alternate text](pics/matching2.png)
Querying internal database
--------------------------
The *BinaryDescriptorMatcher* class owns an internal database that can be populated with
descriptors extracted from different images and queried using one of the modalities described in
the previous section. The internal dataset is populated using the *add* function; this function
doesn't directly add new data to the database, it just stores them locally. The real update
happens when the *train* function is invoked or when any querying function is executed, since each of
them invokes *train* before querying. When queried, the internal database not only returns the required
descriptors but, for every returned match, it is also able to tell which image the matched descriptor
was extracted from. An example of internal dataset usage is shown in the following code: after new
descriptors have been added locally, a radius search is invoked. This causes the local data to be
transferred to the dataset, which, in turn, is then queried.
~~~{cpp}
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
#include <vector>
using namespace cv;
static const std::string images[] =
{ "cameraman.jpg", "church.jpg", "church2.png", "einstein.jpg", "stuff.jpg" };
static const char* keys =
{ "{@image_path | | Image path }" };
static void help()
{
std::cout << "\nThis example shows the functionalities of radius matching " << "Please, run this sample using a command in the form\n"
<< "./example_line_descriptor_radius_matching <path_to_input_images>/" << std::endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String pathToImages = parser.get<String>( 0 );
/* create structures for hosting KeyLines and descriptors */
int num_elements = sizeof ( images ) / sizeof ( images[0] );
std::vector<Mat> descriptorsMat;
std::vector<std::vector<KeyLine> > linesMat;
/* create a pointer to a BinaryDescriptor object */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* compute lines and descriptors */
for ( int i = 0; i < num_elements; i++ )
{
/* get path to image */
std::stringstream image_path;
image_path << pathToImages << images[i];
/* load image */
Mat loadedImage = imread( image_path.str().c_str(), 1 );
if( loadedImage.data == NULL )
{
std::cout << "Could not load images." << std::endl;
help();
exit( -1 );
}
/* compute lines and descriptors */
std::vector<KeyLine> lines;
Mat computedDescr;
bd->detect( loadedImage, lines );
bd->compute( loadedImage, lines, computedDescr );
descriptorsMat.push_back( computedDescr );
linesMat.push_back( lines );
}
/* compose a queries matrix */
Mat queries;
for ( size_t j = 0; j < descriptorsMat.size(); j++ )
{
if( descriptorsMat[j].rows >= 5 )
queries.push_back( descriptorsMat[j].rowRange( 0, 5 ) );
else if( descriptorsMat[j].rows > 0 && descriptorsMat[j].rows < 5 )
queries.push_back( descriptorsMat[j] );
}
std::cout << "It has been generated a matrix of " << queries.rows << " descriptors" << std::endl;
/* create a BinaryDescriptorMatcher object */
Ptr<BinaryDescriptorMatcher> bdm = BinaryDescriptorMatcher::createBinaryDescriptorMatcher();
/* populate matcher */
bdm->add( descriptorsMat );
/* compute matches */
std::vector<std::vector<DMatch> > matches;
bdm->radiusMatch( queries, matches, 30 );
/* print matches */
for ( size_t q = 0; q < matches.size(); q++ )
{
for ( size_t m = 0; m < matches[q].size(); m++ )
{
DMatch dm = matches[q][m];
std::cout << "Descriptor: " << q << " Image: " << dm.imgIdx << " Distance: " << dm.distance << std::endl;
}
}
}
~~~
......@@ -63,8 +63,8 @@ Computation of binary descriptors
---------------------------------
To obtain a binary descriptor representing a certain line detected from a certain octave of an
image, we first compute a non-binary descriptor as described in @cite LBD. Such algorithm works on
lines extracted using EDLine detector, as explained in @cite EDL. Given a line, we consider a
image, we first compute a non-binary descriptor as described in @cite LBD . Such algorithm works on
lines extracted using EDLine detector, as explained in @cite EDL . Given a line, we consider a
rectangular region centered at it and called *line support region (LSR)*. Such region is divided
into a set of bands \f$\{B_1, B_2, ..., B_m\}\f$, whose length equals that of the line.
......
......@@ -854,7 +854,7 @@ std::vector<cv::Mat> octaveImages;
Lines extraction methodology
----------------------------
The lines extraction methodology described in the following is mainly based on @cite EDL. The
The lines extraction methodology described in the following is mainly based on @cite EDL . The
extraction starts with a Gaussian pyramid generated from the original image, downsampled N-1 times
and blurred N times, to obtain N layers (one for each octave), with layer 0 corresponding to the input
image. Then, from each layer (octave) in the pyramid, lines are extracted using the LSD algorithm.
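As a rough illustration of that pyramid (not the module's internal code), the octave images could be
built as follows, assuming `inputImage` holds the original image:
@code{.cpp}
// layer 0 is the input image; every further layer is a blurred, halved version
// of the previous one (the exact kernel sizes used internally may differ).
const int numOctaves = 4;                       // placeholder number of layers
std::vector<cv::Mat> octaveImages;
octaveImages.push_back(inputImage.clone());
for (int o = 1; o < numOctaves; o++)
{
    cv::Mat blurred, down;
    cv::GaussianBlur(octaveImages[o - 1], blurred, cv::Size(5, 5), 1.0);
    cv::resize(blurred, down, cv::Size(), 0.5, 0.5, cv::INTER_AREA);
    octaveImages.push_back(down);
}
// lines are then extracted independently from every octaveImages[o]
@endcode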
......@@ -931,7 +931,7 @@ based on *Multi-Index Hashing (MiHashing)* will be described.
Multi-Index Hashing
-------------------
The theory described in this section is based on @cite MIH. Given a dataset populated with binary
The theory described in this section is based on @cite MIH . Given a dataset populated with binary
codes, each code is indexed *m* times into *m* different hash tables, according to *m* substrings it
has been divided into. Thus, given a query code, all the entries close to it at least in one
substring are returned by search as *neighbor candidates*. Returned entries are then checked for
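To make the substring indexing concrete, here is a purely illustrative sketch (not the class'
actual implementation), assuming `descriptors` is a CV_8U matrix with one binary code per row:
@code{.cpp}
// Split each code into m byte-substrings and index it into m hash tables:
// table j maps the bytes of substring j to the index of the descriptor.
const int m = 8;                                  // number of substrings / tables
const int subLen = descriptors.cols / m;          // bytes per substring
std::vector<std::multimap<std::string, int> > tables(m);
for (int i = 0; i < descriptors.rows; i++)
{
    const uchar* row = descriptors.ptr<uchar>(i);
    for (int j = 0; j < m; j++)
    {
        std::string key((const char*)(row + j * subLen), subLen);
        tables[j].insert(std::make_pair(key, i));
    }
}
// At query time the same m substrings of the query code are looked up in the m
// tables; every index found in at least one of them is a neighbor candidate,
// which is then verified with the full Hamming distance.
@endcode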
......
Line Features Tutorial {#tutorial_line_descriptor_main}
======================
In this tutorial it will be shown how to:
- use the *BinaryDescriptor* interface to extract lines and store them in *KeyLine* objects
- use the same interface to compute descriptors for every extracted line
- use the *BinaryDescriptorMatcher* to determine matches among descriptors obtained from different
images
Lines extraction and descriptors computation
--------------------------------------------
In the following snippet of code, it is shown how to detect lines from an image. The LSD extractor
is initialized with the *LSD\_REFINE\_ADV* option; the remaining parameters are left to their default
values. A mask of ones is used in order to accept all extracted lines, which are finally
displayed using random colors for octave 0.
@includelineno line_descriptor/samples/lsd_lines_extraction.cpp
This is the result obtained for the famous cameraman image:
![alternate text](pics/lines_cameraman_edl.png)
Another way to extract lines is using the *LSDDetector* class, which uses the LSD extractor to
compute lines. To obtain this result, it is sufficient to take the snippet shown above and replace
the corresponding rows with
@code{.cpp}
// create a pointer to an LSDDetector object
Ptr<LSDDetector> lsd = LSDDetector::createLSDDetector();
// compute lines
std::vector<KeyLine> keylines;
lsd->detect( imageMat, keylines, mask );
@endcode
Here's the result returned by the LSD detector, again on the cameraman picture:
![alternate text](pics/cameraman_lines2.png)
Once keylines have been detected, it is possible to compute their descriptors as shown in the
following:
@includelineno line_descriptor/samples/compute_descriptors.cpp
Matching among descriptors
--------------------------
If we have extracted descriptors from two different images, it is possible to search for matches
among them. One way of doing it is to match each input query descriptor exactly to one descriptor,
choosing the one at the closest distance:
@includelineno line_descriptor/samples/matching.cpp
Sometimes, we could be interested in searching for the closest *k* descriptors, given an input one.
This requires slight modifications to the previous code:
@code{.cpp}
// prepare a structure to host matches
std::vector<std::vector<DMatch> > matches;
// require knn match
bdm->knnMatch( descr1, descr2, matches, 6 );
@endcode
In the above example, the closest 6 descriptors are returned for every query. In some cases, we
could instead have a search radius and look for all descriptors distant at most *r* from the input
query. The previous code must be modified:
@code{.cpp}
// prepare a structure to host matches
std::vector<std::vector<DMatch> > matches;
// compute matches
bdm->radiusMatch( queries, matches, 30 );
@endcode
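The returned structure can then be traversed as in the radius matching sample, printing, for every
query, which image each candidate match comes from:
@code{.cpp}
// iterate the result of radiusMatch: matches[q] holds all candidates for query q
for ( size_t q = 0; q < matches.size(); q++ )
{
    for ( size_t m = 0; m < matches[q].size(); m++ )
    {
        DMatch dm = matches[q][m];
        std::cout << "Descriptor: " << q << " Image: " << dm.imgIdx
                  << " Distance: " << dm.distance << std::endl;
    }
}
@endcode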
Here's an example of matching among descriptors extracted from the original cameraman image and its
downsampled (and blurred) version:
![alternate text](pics/matching2.png)
Querying internal database
--------------------------
The *BinaryDescriptorMatcher* class owns an internal database that can be populated with
descriptors extracted from different images and queried using one of the modalities described in
the previous section. The internal dataset is populated using the *add* function; this function
doesn't directly add new data to the database, it just stores them locally. The real update
happens when the *train* function is invoked or when any querying function is executed, since each of
them invokes *train* before querying. When queried, the internal database not only returns the required
descriptors but, for every returned match, it is also able to tell which image the matched descriptor
was extracted from. An example of internal dataset usage is shown in the following code: after new
descriptors have been added locally, a radius search is invoked. This causes the local data to be
transferred to the dataset, which, in turn, is then queried.
@includelineno line_descriptor/samples/radius_matching.cpp
......@@ -99,7 +99,7 @@ for pixel
@param speed_up_thr threshold to detect point with irregular flow - where flow should be
recalculated after upscale
See @cite Tao2012. And site of project - <http://graphics.berkeley.edu/papers/Tao-SAN-2012-05/>.
See @cite Tao2012 . And site of project - <http://graphics.berkeley.edu/papers/Tao-SAN-2012-05/>.
@note
- An example using the simpleFlow algorithm can be found at samples/simpleflow_demo.cpp
......
......@@ -66,7 +66,7 @@ That is, MHI pixels where the motion occurs are set to the current timestamp , w
where the motion happened last time a long time ago are cleared.
The function, together with calcMotionGradient and calcGlobalOrientation , implements a motion
templates technique described in @cite Davis97 and @cite Bradski00.
templates technique described in @cite Davis97 and @cite Bradski00 .
*/
CV_EXPORTS_W void updateMotionHistory( InputArray silhouette, InputOutputArray mhi,
double timestamp, double duration );
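/* Example (illustrative sketch, not part of the API; variable names and the silhouette
   computation below are placeholders):
@code{.cpp}
    // prevGray/currGray: consecutive grayscale frames; mhi: CV_32FC1, zero-initialized once
    Mat silhouette;
    absdiff(currGray, prevGray, silhouette);
    threshold(silhouette, silhouette, 30, 255, THRESH_BINARY);
    double timestamp = (double)getTickCount() / getTickFrequency(); // current time in seconds
    updateMotionHistory(silhouette, mhi, timestamp, 1.0);           // keep 1 second of history
@endcode
*/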
......
......@@ -45,7 +45,7 @@
The Registration module implements parametric image registration. The implemented method is direct
alignment, that is, it uses directly the pixel values for calculating the registration between a
pair of images, as opposed to feature-based registration. The implementation follows essentially the
corresponding part of @cite Szeliski06.
corresponding part of @cite Szeliski06 .
Feature based methods have some advantages over pixel based methods when we are trying to register
pictures that have been shoot under different lighting conditions or exposition times, or when the
......
......@@ -365,7 +365,7 @@ accurate representation. However, note that number of point pair features to be
quadratically increased as the complexity is O(N\^2). This is especially a concern for 32 bit
systems, where large models can easily overshoot the available memory. Typically, values in the
range of 0.025 - 0.05 seem adequate for most of the applications, where the default value is 0.03.
(Note that there is a difference in this paremeter with the one presented in @cite drost2010. In
(Note that there is a difference in this paremeter with the one presented in @cite drost2010 . In
@cite drost2010 a uniform cuboid is used for quantization and model diameter is used for reference of
sampling. In my implementation, the cuboid is a rectangular prism, and each dimension is quantized
independently. I do not take reference from the diameter but along the individual dimensions.
......
Tracking diagrams {#tracking_diagrams}
=================
General diagram
===============
@startuml{tracking_uml_general.png}
package "Tracker"
package "TrackerFeature"
package "TrackerSampler"
package "TrackerModel"
Tracker -> TrackerModel: create
Tracker -> TrackerSampler: create
Tracker -> TrackerFeature: create
@enduml
Tracker diagram
===============
@startuml{tracking_uml_tracking.png}
package "Tracker package" #DDDDDD {
class Algorithm
class Tracker{
Ptr<TrackerFeatureSet> featureSet;
Ptr<TrackerSampler> sampler;
Ptr<TrackerModel> model;
---
+static Ptr<Tracker> create(const string& trackerType);
+bool init(const Mat& image, const Rect& boundingBox);
+bool update(const Mat& image, Rect& boundingBox);
}
class Tracker
note right: Tracker is the general interface for each specialized trackers
class TrackerMIL{
+static Ptr<TrackerMIL> createTracker(const TrackerMIL::Params &parameters);
+virtual ~TrackerMIL();
}
class TrackerBoosting{
+static Ptr<TrackerBoosting> createTracker(const TrackerBoosting::Params &parameters);
+virtual ~TrackerBoosting();
}
Algorithm <|-- Tracker : virtual inheritance
Tracker <|-- TrackerMIL
Tracker <|-- TrackerBoosting
note "Single instance of the Tracker" as N1
TrackerBoosting .. N1
TrackerMIL .. N1
}
@enduml
TrackerFeatureSet diagram
=========================
@startuml{tracking_uml_feature.png}
package "TrackerFeature package" #DDDDDD {
class TrackerFeatureSet{
-vector<pair<string, Ptr<TrackerFeature> > > features
-vector<Mat> responses
...
TrackerFeatureSet();
~TrackerFeatureSet();
--
+extraction(const std::vector<Mat>& images);
+selection();
+removeOutliers();
+vector<Mat> response getResponses();
+vector<pair<string TrackerFeatureType, Ptr<TrackerFeature> > > getTrackerFeatures();
+bool addTrackerFeature(string trackerFeatureType);
+bool addTrackerFeature(Ptr<TrackerFeature>& feature);
-clearResponses();
}
class TrackerFeature <<virtual>>{
static Ptr<TrackerFeature> = create(const string& trackerFeatureType);
compute(const std::vector<Mat>& images, Mat& response);
selection(Mat& response, int npoints);
}
note bottom: Can be specialized as in table II\nA tracker can use more types of features
class TrackerFeatureFeature2D{
-vector<Keypoints> keypoints
---
TrackerFeatureFeature2D(string detectorType, string descriptorType);
~TrackerFeatureFeature2D();
---
compute(const std::vector<Mat>& images, Mat& response);
selection( Mat& response, int npoints);
}
class TrackerFeatureHOG{
TrackerFeatureHOG();
~TrackerFeatureHOG();
---
compute(const std::vector<Mat>& images, Mat& response);
selection(Mat& response, int npoints);
}
TrackerFeatureSet *-- TrackerFeature
TrackerFeature <|-- TrackerFeatureHOG
TrackerFeature <|-- TrackerFeatureFeature2D
note "Per readability and simplicity in this diagram\n there are only two TrackerFeature but you\n can considering the implementation of the other TrackerFeature" as N1
TrackerFeatureHOG .. N1
TrackerFeatureFeature2D .. N1
}
@enduml
TrackerModel diagram
====================
@startuml{tracking_uml_model.png}
package "TrackerModel package" #DDDDDD {
class Typedef << (T,#FF7700) >>{
ConfidenceMap
Trajectory
}
class TrackerModel{
-vector<ConfidenceMap> confidenceMaps;
-Trajectory trajectory;
-Ptr<TrackerStateEstimator> stateEstimator;
...
TrackerModel();
~TrackerModel();
+bool setTrackerStateEstimator(Ptr<TrackerStateEstimator> trackerStateEstimator);
+Ptr<TrackerStateEstimator> getTrackerStateEstimator();
+void modelEstimation(const vector<Mat>& responses);
+void modelUpdate();
+void setLastTargetState(const Ptr<TrackerTargetState> lastTargetState);
+void runStateEstimator();
+const vector<ConfidenceMap>& getConfidenceMaps();
+const ConfidenceMap& getLastConfidenceMap();
}
class TrackerTargetState <<virtual>>{
Point2f targetPosition;
---
Point2f getTargetPosition();
void setTargetPosition(Point2f position);
}
class TrackerTargetState
note bottom: Each tracker can create own state
class TrackerStateEstimator <<virtual>>{
~TrackerStateEstimator();
static Ptr<TrackerStateEstimator> create(const String& trackeStateEstimatorType);
Ptr<TrackerTargetState> estimate(const vector<ConfidenceMap>& confidenceMaps)
void update(vector<ConfidenceMap>& confidenceMaps)
}
class TrackerStateEstimatorSVM{
TrackerStateEstimatorSVM()
~TrackerStateEstimatorSVM()
Ptr<TrackerTargetState> estimate(const vector<ConfidenceMap>& confidenceMaps)
void update(vector<ConfidenceMap>& confidenceMaps)
}
class TrackerStateEstimatorMILBoosting{
TrackerStateEstimatorMILBoosting()
~TrackerStateEstimatorMILBoosting()
Ptr<TrackerTargetState> estimate(const vector<ConfidenceMap>& confidenceMaps)
void update(vector<ConfidenceMap>& confidenceMaps)
}
TrackerModel -> TrackerStateEstimator: create
TrackerModel *-- TrackerTargetState
TrackerStateEstimator <|-- TrackerStateEstimatorMILBoosting
TrackerStateEstimator <|-- TrackerStateEstimatorSVM
}
@enduml
TrackerSampler diagram
======================
@startuml{tracking_uml_sampler.png}
package "TrackerSampler package" #DDDDDD {
class TrackerSampler{
-vector<pair<String, Ptr<TrackerSamplerAlgorithm> > > samplers
-vector<Mat> samples;
...
TrackerSampler();
~TrackerSampler();
+sampling(const Mat& image, Rect boundingBox);
+const vector<pair<String, Ptr<TrackerSamplerAlgorithm> > >& getSamplers();
+const vector<Mat>& getSamples();
+bool addTrackerSamplerAlgorithm(String trackerSamplerAlgorithmType);
+bool addTrackerSamplerAlgorithm(Ptr<TrackerSamplerAlgorithm>& sampler);
---
-void clearSamples();
}
class TrackerSamplerAlgorithm{
~TrackerSamplerAlgorithm();
+static Ptr<TrackerSamplerAlgorithm> create(const String& trackerSamplerType);
+bool sampling(const Mat& image, Rect boundingBox, vector<Mat>& sample);
}
note bottom: A tracker could sample the target\nor it could sample the target and the background
class TrackerSamplerCS{
TrackerSamplerCS();
~TrackerSamplerCS();
+bool sampling(const Mat& image, Rect boundingBox, vector<Mat>& sample);
}
class TrackerSamplerCSC{
TrackerSamplerCSC();
~TrackerSamplerCSC();
+bool sampling(const Mat& image, Rect boundingBox, vector<Mat>& sample);
}
}
@enduml
......@@ -52,7 +52,7 @@ Long-term optical tracking API
Long-term optical tracking is one of the most important issues for many computer vision applications
in real-world scenarios. The development in this area is very fragmented and this API is a unique
interface useful for plugging in several algorithms and comparing them. This work is partially based on
@cite AAM and @cite AMVOT.
@cite AAM and @cite AMVOT .
These algorithms start from a bounding box of the target and, with their internal representation, they
avoid drift during the tracking. These long-term trackers are able to evaluate online the
......@@ -67,36 +67,15 @@ most likely target states). The class TrackerTargetState represents a possible s
The TrackerSampler and the TrackerFeatureSet are the visual representation of the target, whereas
the TrackerModel is the statistical model.
A recent benchmark between these algorithms can be found in @cite OOT.
A recent benchmark between these algorithms can be found in @cite OOT .
UML design:
-----------
**General diagram**
![General diagram](pics/package.png)
**Tracker diagram**
![Tracker diagram](pics/Tracker.png)
**TrackerSampler diagram**
![TrackerSampler diagram](pics/TrackerSampler.png)
**TrackerFeatureSet diagram**
![TrackerFeatureSet diagram](pics/TrackerFeature.png)
**TrackerModel diagram**
![TrackerModel diagram](pics/TrackerModel.png)
UML design: see @ref tracking_diagrams
To see how the API works, try the tracker demo:
<https://github.com/lenlen/opencv/blob/tracking_api/samples/cpp/tracker.cpp>
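Based on the interface sketched in the diagrams, a minimal usage loop could look like the following
(the video path, tracker type and initial bounding box are placeholders; depending on your OpenCV
version the concrete signatures may use Rect2d instead of Rect):
@code{.cpp}
cv::VideoCapture cap( "video.avi" );            // placeholder input video
cv::Mat frame;
cap >> frame;
cv::Rect boundingBox( 100, 100, 80, 80 );       // placeholder initial target location
cv::Ptr<cv::Tracker> tracker = cv::Tracker::create( "MIL" );
tracker->init( frame, boundingBox );
while ( cap.read( frame ) )
{
    if ( tracker->update( frame, boundingBox ) )
        cv::rectangle( frame, boundingBox, cv::Scalar( 255, 0, 0 ), 2 );
    cv::imshow( "tracker", frame );
    if ( cv::waitKey( 1 ) == 27 )
        break;                                  // ESC stops the demo
}
@endcode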
@note This Tracking API has been designed with PlantUML. If you modify this API please change UML
files under modules/tracking/misc/ The following reference was used in the API
in <em>modules/tracking/doc/tracking_diagrams.markdown</em>. The following reference was used in the API
Creating Own Tracker
--------------------
......
......@@ -1073,7 +1073,7 @@ class CV_EXPORTS_W TrackerFeatureLBP : public TrackerFeature
background.
Multiple Instance Learning avoids the drift problem for robust tracking. The implementation is
based on @cite MIL.
based on @cite MIL .
Original code can be found here <http://vision.ucsd.edu/~bbabenko/project_miltrack.shtml>
*/
......@@ -1105,7 +1105,7 @@ class CV_EXPORTS_W TrackerMIL : public Tracker
/** @brief This is a real-time object tracking based on a novel on-line version of the AdaBoost algorithm.
The classifier uses the surrounding background as negative examples in update step to avoid the
drifting problem. The implementation is based on @cite OLB.
drifting problem. The implementation is based on @cite OLB .
*/
class CV_EXPORTS_W TrackerBoosting : public Tracker
{
......@@ -1137,7 +1137,7 @@ class CV_EXPORTS_W TrackerBoosting : public Tracker
/** @brief Median Flow tracker implementation.
Implementation of a paper @cite MedianFlow.
Implementation of a paper @cite MedianFlow .
The tracker is suitable for very smooth and predictable movements when the object is visible throughout
the whole sequence. It's quite accurate for this type of problem (in particular, it was shown
......@@ -1168,7 +1168,7 @@ tracking, learning and detection.
The tracker follows the object from frame to frame. The detector localizes all appearances that
have been observed so far and corrects the tracker if necessary. The learning estimates detector’s
errors and updates it to avoid these errors in the future. The implementation is based on @cite TLD.
errors and updates it to avoid these errors in the future. The implementation is based on @cite TLD .
The Median Flow algorithm (see cv::TrackerMedianFlow) was chosen as a tracking component in this
implementation, following authors. Tracker is supposed to be able to handle rapid motions, partial
......
......@@ -64,7 +64,7 @@ namespace xfeatures2d
//! @addtogroup xfeatures2d_experiment
//! @{
/** @brief Class implementing the FREAK (*Fast Retina Keypoint*) keypoint descriptor, described in @cite AOV12.
/** @brief Class implementing the FREAK (*Fast Retina Keypoint*) keypoint descriptor, described in @cite AOV12 .
The algorithm proposes a novel keypoint descriptor inspired by the human visual system and more
precisely the retina, coined Fast Retina Keypoint (FREAK). A cascade of binary strings is
......@@ -116,7 +116,7 @@ public:
* BRIEF Descriptor
*/
/** @brief Class for computing BRIEF descriptors described in @cite calon2010
/** @brief Class for computing BRIEF descriptors described in @cite calon2010 .
@note
- A complete BRIEF extractor sample can be found at
......
......@@ -54,7 +54,7 @@ namespace xfeatures2d
//! @{
/** @brief Class for extracting keypoints and computing descriptors using the Scale Invariant Feature Transform
(SIFT) algorithm by D. Lowe @cite Lowe04.
(SIFT) algorithm by D. Lowe @cite Lowe04 .
*/
class CV_EXPORTS_W SIFT : public Feature2D
{
......@@ -84,7 +84,7 @@ public:
typedef SIFT SiftFeatureDetector;
typedef SIFT SiftDescriptorExtractor;
/** @brief Class for extracting Speeded Up Robust Features from an image @cite Bay06.
/** @brief Class for extracting Speeded Up Robust Features from an image @cite Bay06 .
The algorithm parameters:
- member int extended
......
......@@ -46,3 +46,12 @@
year={2010},
publisher={Springer}
}
@inproceedings{Lim2013,
title={Sketch tokens: A learned mid-level representation for contour and object detection},
author={Lim, Joseph J and Zitnick, C Lawrence and Doll{\'a}r, Piotr},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on},
pages={3158--3165},
year={2013},
organization={IEEE}
}
......@@ -61,7 +61,7 @@ enum EdgeAwareFiltersList
/** @brief Interface for realizations of Domain Transform filter.
For more details about this filter see @cite Gastal11.
For more details about this filter see @cite Gastal11 .
*/
class CV_EXPORTS_W DTFilter : public Algorithm
{
......@@ -125,7 +125,7 @@ void dtFilter(InputArray guide, InputArray src, OutputArray dst, double sigmaSpa
/** @brief Interface for realizations of Guided Filter.
For more details about this filter see @cite Kaiming10.
For more details about this filter see @cite Kaiming10 .
*/
class CV_EXPORTS_W GuidedFilter : public Algorithm
{
......@@ -153,7 +153,7 @@ channels then only first 3 channels will be used.
@param eps regularization term of Guided Filter. \f${eps}^2\f$ is similar to the sigma in the color
space of bilateralFilter.
For more details about Guided Filter parameters, see the original article @cite Kaiming10.
For more details about Guided Filter parameters, see the original article @cite Kaiming10 .
*/
CV_EXPORTS_W Ptr<GuidedFilter> createGuidedFilter(InputArray guide, int radius, double eps);
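/* Example (illustrative sketch, not part of the API; file names and parameter values below are
   placeholders):
@code{.cpp}
    Mat guide = imread("guide.png");                            // guidance image
    Mat src   = imread("input.png");                            // image to be filtered
    Mat dst;
    Ptr<GuidedFilter> gf = createGuidedFilter(guide, 8, 0.02 * 0.02);
    gf->filter(src, dst);                                       // can be reused on several images
@endcode
*/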
......@@ -228,7 +228,7 @@ bilateralFilter.
@param adjust_outliers optional, specifies whether to perform the outlier adjustment operation
(Eq. 9 in the original paper) or not.
For more details about Adaptive Manifold Filter parameters, see the original article @cite Gastal12.
For more details about Adaptive Manifold Filter parameters, see the original article @cite Gastal12 .
@note Joint images with CV_8U and CV_16U depth are converted to images with CV_32F depth and [0; 1]
color range before processing. Hence the color space sigma sigma_r must be in the [0; 1] range, unlike the same
......
......@@ -54,7 +54,7 @@ namespace ximgproc
//! @{
/** @brief Class implementing the SEEDS (Superpixels Extracted via Energy-Driven Sampling) superpixels
algorithm described in @cite VBRV14.
algorithm described in @cite VBRV14 .
The algorithm uses an efficient hill-climbing algorithm to optimize the superpixels' energy
function that is based on color histograms and a boundary term, which is optional. The energy
......
Structured forests for fast edge detection {#tutorial_ximgproc_prediction}
==========================================
Introduction
------------
In this tutorial you will learn how to use structured forests for the purpose of edge detection in
an image.
Examples
--------
![image](images/01.jpg)
![image](images/02.jpg)
![image](images/03.jpg)
![image](images/04.jpg)
![image](images/05.jpg)
![image](images/06.jpg)
![image](images/07.jpg)
![image](images/08.jpg)
![image](images/09.jpg)
![image](images/10.jpg)
![image](images/11.jpg)
![image](images/12.jpg)
@note Binarization techniques like the Canny edge detector are applicable to edges produced by both
algorithms (Sobel and StructuredEdgeDetection::detectEdges).
Source Code
-----------
@includelineno ximgproc/samples/structured_edge_detection.cpp
Explanation
-----------
-# **Load source color image**
@code{.cpp}
cv::Mat image = cv::imread(inFilename, 1);
if ( image.empty() )
{
printf("Cannot read image file: %s\n", inFilename.c_str());
return -1;
}
@endcode
-# **Convert source image to [0;1] range**
@code{.cpp}
image.convertTo(image, cv::DataType<float>::type, 1/255.0);
@endcode
-# **Run main algorithm**
@code{.cpp}
cv::Mat edges(image.size(), image.type());
cv::Ptr<StructuredEdgeDetection> pDollar =
cv::createStructuredEdgeDetection(modelFilename);
pDollar->detectEdges(image, edges);
@endcode
-# **Show results**
@code{.cpp}
if ( outFilename == "" )
{
cv::namedWindow("edges", 1);
cv::imshow("edges", edges);
cv::waitKey(0);
}
else
cv::imwrite(outFilename, 255*edges);
@endcode
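As mentioned in the note above, the floating point edge map can be binarized afterwards; a minimal
sketch (the threshold value is a placeholder) could be:
@code{.cpp}
cv::Mat edges8u, binaryEdges;
edges.convertTo(edges8u, CV_8U, 255);                             // [0;1] float -> [0;255] 8-bit
cv::threshold(edges8u, binaryEdges, 50, 255, cv::THRESH_BINARY);  // placeholder threshold
@endcode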
Literature
----------
For more information, refer to the following papers : @cite Dollar2013 @cite Lim2013
function modelConvert(model, outname)
%% script for converting Piotr's matlab model into YAML format
outfile = fopen(outname, 'w');
fprintf(outfile, '%%YAML:1.0\n\n');
fprintf(outfile, ['options:\n'...
' numberOfTrees: 8\n'...
' numberOfTreesToEvaluate: 4\n'...
' selfsimilarityGridSize: 5\n'...
' stride: 2\n'...
' shrinkNumber: 2\n'...
' patchSize: 32\n'...
' patchInnerSize: 16\n'...
' numberOfGradientOrientations: 4\n'...
' gradientSmoothingRadius: 0\n'...
' regFeatureSmoothingRadius: 2\n'...
' ssFeatureSmoothingRadius: 8\n'...
' gradientNormalizationRadius: 4\n\n']);
fprintf(outfile, 'childs:\n');
printToYML(outfile, model.child', 0);
fprintf(outfile, 'featureIds:\n');
printToYML(outfile, model.fids', 0);
fprintf(outfile, 'thresholds:\n');
printToYML(outfile, model.thrs', 0);
N = 1000;
fprintf(outfile, 'edgeBoundaries:\n');
printToYML(outfile, model.eBnds, N);
fprintf(outfile, 'edgeBins:\n');
printToYML(outfile, model.eBins, N);
fclose(outfile);
gzip(outname);
end
function printToYML(outfile, A, N)
%% append matrix A to outfile as
%% - [a11, a12, a13, a14, ..., a1n]
%% - [a21, a22, a23, a24, ..., a2n]
%% ...
%%
%% if size(A, 2) == 1, A is printed by N elements per row
if (length(size(A)) ~= 2)
error('printToYML: second-argument matrix should have two dimensions');
end
if (size(A,2) ~= 1)
for i=1:size(A,1)
fprintf(outfile, ' - [');
fprintf(outfile, '%d,', A(i, 1:end-1));
fprintf(outfile, '%d]\n', A(i, end));
end
else
len = length(A);
for i=1:ceil(len/N)
first = (i-1)*N + 1;
last = min(i*N, len) - 1;
fprintf(outfile, ' - [');
fprintf(outfile, '%d,', A(first:last));
fprintf(outfile, '%d]\n', A(last + 1));
end
end
fprintf(outfile, '\n');
end
\ No newline at end of file
Structured forest training {#tutorial_ximgproc_training}
==========================
Introduction
------------
In this tutorial we show how to train your own structured forest using the author's initial Matlab
implementation.
Training pipeline
-----------------
-# Download "Piotr's Toolbox" from [link](http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html)
and put it into separate directory, e.g. PToolbox
-# Download BSDS500 dataset from
link \<http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/BSR/\> and put it into
separate directory named exactly BSR
-# Add both directory and their subdirectories to Matlab path.
-# Download detector code from
link \<http://research.microsoft.com/en-us/downloads/389109f6-b4e8-404c-84bf-239f7cbf4e3d/\> and
put it into root directory. Now you should have :
@code
.
BSR
PToolbox
models
private
Contents.m
edgesChns.m
edgesDemo.m
edgesDemoRgbd.m
edgesDetect.m
edgesEval.m
edgesEvalDir.m
edgesEvalImg.m
edgesEvalPlot.m
edgesSweeps.m
edgesTrain.m
license.txt
readme.txt
@endcode
-# Rename models/forest/modelFinal.mat to models/forest/modelFinal.mat.backup
-# Open edgesChns.m and comment lines 26--41. Add the following after the commented lines:
@code{.cpp}
shrink=opts.shrink;
chns = single(getFeatures( im2double(I) ));
@endcode
-# Now it is time to compile the promised getFeatures. I do it with the following code:
@code{.cpp}
#include <cv.h>
#include <highgui.h>
#include <mat.h>
#include <mex.h>
#include "MxArray.hpp" // https://github.com/kyamagu/mexopencv
class NewRFFeatureGetter : public cv::RFFeatureGetter
{
public:
NewRFFeatureGetter() : name("NewRFFeatureGetter"){}
virtual void getFeatures(const cv::Mat &src, NChannelsMat &features,
const int gnrmRad, const int gsmthRad,
const int shrink, const int outNum, const int gradNum) const
{
// here your feature extraction code, the default one is:
// resulting features Mat should be n-channels, floating point matrix
}
protected:
cv::String name;
};
MEXFUNCTION_LINKAGE void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
if (nlhs != 1) mexErrMsgTxt("nlhs != 1");
if (nrhs != 1) mexErrMsgTxt("nrhs != 1");
cv::Mat src = MxArray(prhs[0]).toMat();
src.convertTo(src, cv::DataType<float>::type);
NewRFFeatureGetter *pDollar = createNewRFFeatureGetter();
cv::Mat edges;
pDollar->getFeatures(src, edges, 4, 0, 2, 13, 4);
// you can use other numbers here
edges.convertTo(edges, cv::DataType<double>::type);
plhs[0] = MxArray(edges);
}
@endcode
-# Place the compiled mex file into the root dir and run edgesDemo. You will need to wait a couple of hours,
after which the new model will appear inside models/forest/.
-# The final step is converting the trained model from Matlab binary format to YAML, which you can use
with our cv::StructuredEdgeDetection. For this purpose run
opencv_contrib/ximgproc/tutorials/scripts/modelConvert(model, "model.yml")
How to use your model
---------------------
Just use the expanded constructor with the above-defined class NewRFFeatureGetter:
@code{.cpp}
cv::Ptr<StructuredEdgeDetection> pDollar =
    cv::createStructuredEdgeDetection( modelName, makePtr<NewRFFeatureGetter>() );
@endcode
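Edges are then detected exactly as in the prediction tutorial; a minimal sketch (the image name is a
placeholder) could be:
@code{.cpp}
cv::Mat image = cv::imread("test.jpg", 1);                    // placeholder input image
image.convertTo(image, cv::DataType<float>::type, 1/255.0);   // same preprocessing as before
cv::Mat edges(image.size(), image.type());
pDollar->detectEdges(image, edges);
@endcode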
......@@ -131,7 +131,7 @@ struct CV_EXPORTS WaldBoostParams
{}
};
/** @brief WaldBoost object detector from @cite Sochman05
/** @brief WaldBoost object detector from @cite Sochman05 .
*/
class CV_EXPORTS WaldBoost : public Algorithm
{
......@@ -190,7 +190,7 @@ struct CV_EXPORTS ICFDetectorParams
{}
};
/** @brief Integral Channel Features from @cite Dollar09
/** @brief Integral Channel Features from @cite Dollar09 .
*/
class CV_EXPORTS ICFDetector
{
......