The goal of this tutorial is to show you how to use the OpenCV `parallel_for_` framework to easily
parallelize your code. To illustrate the concept, we will write a program to draw a Mandelbrot set
exploiting almost all the CPU load available.
The full tutorial code is [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
If you want more information about multithreading, you will have to refer to a reference book or course as this tutorial is intended
to remain simple.
Precondition
----
The first precondition is to have OpenCV built with a parallel framework.
In OpenCV 3.2, the following parallel frameworks are available in that order:
1. Intel Threading Building Blocks (3rdparty library, should be explicitly enabled)
2. C= Parallel C/C++ Programming Language Extension (3rdparty library, should be explicitly enabled)
3. OpenMP (integrated to compiler, should be explicitly enabled)
4. APPLE GCD (system wide, used automatically (APPLE only))
5. Windows RT concurrency (system wide, used automatically (Windows RT only))
6. Windows concurrency (part of runtime, used automatically (Windows only - MSVC++ >= 10))
7. Pthreads (if available)
As you can see, several parallel frameworks can be used in the OpenCV library. Some parallel libraries
are third party libraries and have to be explictly built and enabled in CMake (e.g. TBB, C=), others are
automatically available with the platform (e.g. APPLE GCD) but chances are that you should be enable to
have access to a parallel framework either directly or by enabling the option in CMake and rebuild the library.
The second (weak) precondition is more related to the task you want to achieve as not all computations
are suitable / can be adatapted to be run in a parallel way. To remain simple, tasks that can be splitted
into multiple elementary operations with no memory dependency (no possible race condition) are easily
parallelizable. Computer vision processing are often easily parallelizable as most of the time the processing of
one pixel does not depend to the state of other pixels.
Simple example: drawing a Mandelbrot set
----
We will use the example of drawing a Mandelbrot set to show how from a regular sequential code you can easily adapt
the code to parallize the computation.
Theory
-----------
The Mandelbrot set definition has been named in tribute to the mathematician Benoit Mandelbrot by the mathematician
Adrien Douady. It has been famous outside of the mathematics field as the image representation is an example of a
class of fractals, a mathematical set that exhibits a repeating pattern displayed at every scale (even more, a
Mandelbrot set is self-similar as the whole shape can be repeatedly seen at different scale). For a more in-depth
introduction, you can look at the corresponding [Wikipedia article](https://en.wikipedia.org/wiki/Mandelbrot_set).
Here, we will just introduce the formula to draw the Mandelbrot set (from the mentioned Wikipedia article).
> The Mandelbrot set is the set of values of \f$ c \f$ in the complex plane for which the orbit of 0 under iteration
Here, the range represents the total number of operations to be executed, so the total number of pixels in the image.
To set the number of threads, you can use: @ref cv::setNumThreads. You can also specify the number of splitting using the
nstripes parameter in @ref cv::parallel_for_. For instance, if your processor has 4 threads, setting `cv::setNumThreads(2)`
or setting `nstripes=2` should be the same as by default it will use all the processor threads available but will split the
workload only on two threads.
Results
-----------
You can find the full tutorial code [here](https://github.com/opencv/opencv/blob/master/samples/cpp/tutorial_code/core/how_to_use_OpenCV_parallel_for_/how_to_use_OpenCV_parallel_for_.cpp).
The performance of the parallel implementation depends of the type of CPU you have. For instance, on 4 cores / 8 threads
CPU, you can expect a speed-up of around 6.9X. There are many factors to explain why we do not achieve a speed-up of almost 8X.
Main reasons should be mostly due to:
* the overhead to create and manage the threads,
* background processes running in parallel,
* the difference between 4 hardware cores with 2 logical threads for each core and 8 hardware cores.
The resulting image produced by the tutorial code (you can modify the code to use more iterations and assign a pixel color
depending on the escaped iteration and using a color palette to get more aesthetic images):
![Mandelbrot set with xMin=-2.1, xMax=0.6, yMin=-1.2, yMax=1.2, maxIterations=500](images/how_to_use_OpenCV_parallel_for_Mandelbrot.png)