Miscellaneous Image Transformations
AdaptiveThreshold
The function transforms a grayscale image to a binary image according to the formulas:
CV_THRESH_BINARY:
dst(x,y) = \fork{\texttt{maxValue}}{if $src(x,y) > T(x,y)$}{0}{otherwise}

CV_THRESH_BINARY_INV:
dst(x,y) = \fork{0}{if $src(x,y) > T(x,y)$}{\texttt{maxValue}}{otherwise}
where T(x,y) is a threshold calculated individually for each pixel.
For the method CV_ADAPTIVE_THRESH_MEAN_C, T(x,y) is the mean of the \texttt{blockSize} \times \texttt{blockSize} pixel neighborhood of (x,y), minus param1.

For the method CV_ADAPTIVE_THRESH_GAUSSIAN_C, T(x,y) is the gaussian-weighted sum over the \texttt{blockSize} \times \texttt{blockSize} pixel neighborhood of (x,y), minus param1.
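As an illustration of the CV_ADAPTIVE_THRESH_MEAN_C rule above, here is a minimal plain-Python sketch (not the OpenCV implementation; the function name and border handling by clamping are choices made for this example):

```python
def adaptive_threshold_mean(img, max_value=255, block_size=3, param1=5):
    """img: 2D list of grayscale values; returns the CV_THRESH_BINARY result
    with the per-pixel threshold T(x,y) = neighborhood mean - param1."""
    h, w = len(img), len(img[0])
    r = block_size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the blockSize x blockSize neighborhood, clamping at borders.
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            t = sum(vals) / len(vals) - param1   # T(x,y)
            out[y][x] = max_value if img[y][x] > t else 0
    return out
```

A bright pixel on a dark background exceeds its local mean and maps to maxValue, while the background maps to 0.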
CvtColor
The function converts the input image from one color space to another. The function ignores the colorModel and channelSeq fields of the IplImage header, so the source image color space should be specified correctly (including the order of the channels in the case of RGB space; for example, BGR means a 24-bit format with B_0, G_0, R_0, B_1, G_1, R_1, ... layout, whereas RGB means a 24-bit format with R_0, G_0, B_0, R_1, G_1, B_1, ... layout).
The conventional range for R, G and B channel values is:
- 0 to 255 for 8-bit images,
- 0 to 65535 for 16-bit images, and
- 0 to 1 for floating-point images.

Of course, in the case of linear transformations the range does not matter, but in order to get correct results in the case of non-linear transformations the input image should be scaled.
The function can do the following transformations:
- Transformations within RGB space like adding/removing the alpha channel, reversing the channel order, conversion to/from 16-bit RGB color (R5:G6:B5 or R5:G5:B5), as well as conversion to/from grayscale using:

\text{RGB[A] to Gray:} \quad Y \leftarrow 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B

and

\text{Gray to RGB[A]:} \quad R \leftarrow Y, G \leftarrow Y, B \leftarrow Y, A \leftarrow 0

The conversion from an RGB image to gray is done with:

cvCvtColor(src, bwsrc, CV_RGB2GRAY);
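The grayscale formula above can be checked with a one-line sketch (plain Python, no OpenCV; the function name is illustrative):

```python
def rgb_to_gray(r, g, b):
    # RGB[A] to Gray: Y <- 0.299 R + 0.587 G + 0.114 B
    return 0.299 * r + 0.587 * g + 0.114 * b
```

Since the weights sum to 1, white (255, 255, 255) maps to 255 and pure green maps to 0.587 of full scale.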
- RGB \leftrightarrow CIE XYZ.Rec 709 with D65 white point (CV_BGR2XYZ, CV_RGB2XYZ, CV_XYZ2BGR, CV_XYZ2RGB):

\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \leftarrow \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} \leftarrow \begin{bmatrix} 3.240479 & -1.53715 & -0.498535 \\ -0.969256 & 1.875991 & 0.041556 \\ 0.055648 & -0.204043 & 1.057311 \end{bmatrix} \cdot \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
X , Y and Z cover the whole value range (in the case of floating-point images Z may exceed 1).
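The two matrices above are (approximate) inverses of each other, which a short sketch can verify (plain Python; the helper names are illustrative):

```python
# The documented RGB -> XYZ and XYZ -> RGB coefficient matrices.
RGB2XYZ = [[0.412453, 0.357580, 0.180423],
           [0.212671, 0.715160, 0.072169],
           [0.019334, 0.119193, 0.950227]]
XYZ2RGB = [[3.240479, -1.53715, -0.498535],
           [-0.969256, 1.875991, 0.041556],
           [0.055648, -0.204043, 1.057311]]

def apply3x3(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

xyz = apply3x3(RGB2XYZ, [0.2, 0.5, 0.8])
rgb = apply3x3(XYZ2RGB, xyz)   # recovers [0.2, 0.5, 0.8] up to rounding
```

Note also that the middle row sums to 1, so white (1, 1, 1) gives Y = 1.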
- RGB \leftrightarrow YCrCb JPEG (a.k.a. YCC) (CV_BGR2YCrCb, CV_RGB2YCrCb, CV_YCrCb2BGR, CV_YCrCb2RGB):

Y \leftarrow 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B
Cr \leftarrow (R-Y) \cdot 0.713 + delta
Cb \leftarrow (B-Y) \cdot 0.564 + delta
R \leftarrow Y + 1.403 \cdot (Cr - delta)
G \leftarrow Y - 0.714 \cdot (Cr - delta) - 0.344 \cdot (Cb - delta)
B \leftarrow Y + 1.773 \cdot (Cb - delta)
where
delta = \left \{ \begin{array}{l l} 128 & \mbox{for 8-bit images} \\ 32768 & \mbox{for 16-bit images} \\ 0.5 & \mbox{for floating-point images} \end{array} \right .
Y, Cr and Cb cover the whole value range.
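A round-trip sketch of the YCrCb formulas (floating-point case, so delta = 0.5; plain Python, with the G coefficients 0.714 for Cr and 0.344 for Cb, consistent with 1.403 = 1/0.713 and 1.773 = 1/0.564):

```python
def rgb_to_ycrcb(r, g, b, delta=0.5):
    # Forward transform from the formulas above.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + delta
    cb = (b - y) * 0.564 + delta
    return y, cr, cb

def ycrcb_to_rgb(y, cr, cb, delta=0.5):
    # Inverse transform from the formulas above.
    r = y + 1.403 * (cr - delta)
    g = y - 0.714 * (cr - delta) - 0.344 * (cb - delta)
    b = y + 1.773 * (cb - delta)
    return r, g, b
```

Because the coefficients are rounded to three or four digits, the round trip recovers the input only approximately.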
- RGB \leftrightarrow HSV (CV_BGR2HSV, CV_RGB2HSV, CV_HSV2BGR, CV_HSV2RGB). In the case of 8-bit and 16-bit images R, G and B are converted to floating-point format and scaled to fit the 0 to 1 range.

V \leftarrow max(R,G,B)
S \leftarrow \fork{\frac{V-min(R,G,B)}{V}}{if $V \neq 0$}{0}{otherwise}
H \leftarrow \forkthree{{60(G - B)}/{S}}{if $V=R$}{{120+60(B - R)}/{S}}{if $V=G$}{{240+60(R - G)}/{S}}{if $V=B$}

If H<0 then H \leftarrow H+360. On output 0 \leq V \leq 1, 0 \leq S \leq 1, 0 \leq H \leq 360.
The values are then converted to the destination data type:

- 8-bit images:
V \leftarrow 255 V, S \leftarrow 255 S, H \leftarrow H/2 \text{ (to fit to 0 to 255)}
- 16-bit images (currently not supported):
V \leftarrow 65535 V, S \leftarrow 65535 S, H \leftarrow H
- 32-bit images:
H, S, V are left as is
- RGB \leftrightarrow HLS (CV_BGR2HLS, CV_RGB2HLS, CV_HLS2BGR, CV_HLS2RGB). In the case of 8-bit and 16-bit images R, G and B are converted to floating-point format and scaled to fit the 0 to 1 range.

V_{max} \leftarrow \max(R,G,B)
V_{min} \leftarrow \min(R,G,B)
L \leftarrow \frac{V_{max} + V_{min}}{2}
S \leftarrow \fork{\frac{V_{max} - V_{min}}{V_{max} + V_{min}}}{if $L < 0.5$}{\frac{V_{max} - V_{min}}{2 - (V_{max} + V_{min})}}{if $L \ge 0.5$}
H \leftarrow \forkthree{{60(G - B)}/{S}}{if $V_{max}=R$}{{120+60(B - R)}/{S}}{if $V_{max}=G$}{{240+60(R - G)}/{S}}{if $V_{max}=B$}

If H<0 then H \leftarrow H+360. On output 0 \leq L \leq 1, 0 \leq S \leq 1, 0 \leq H \leq 360.
The values are then converted to the destination data type:

- 8-bit images:
L \leftarrow 255 L, S \leftarrow 255 S, H \leftarrow H/2 \text{ (to fit to 0 to 255)}
- 16-bit images (currently not supported):
L \leftarrow 65535 L, S \leftarrow 65535 S, H \leftarrow H
- 32-bit images:
H, L, S are left as is
- RGB \leftrightarrow CIE L*a*b* (CV_BGR2Lab, CV_RGB2Lab, CV_Lab2BGR, CV_Lab2RGB). In the case of 8-bit and 16-bit images R, G and B are converted to floating-point format and scaled to fit the 0 to 1 range.

\vecthree{X}{Y}{Z} \leftarrow \vecthreethree{0.412453}{0.357580}{0.180423}{0.212671}{0.715160}{0.072169}{0.019334}{0.119193}{0.950227} \cdot \vecthree{R}{G}{B}

X \leftarrow X/X_n, \text{ where } X_n = 0.950456
Z \leftarrow Z/Z_n, \text{ where } Z_n = 1.088754
L \leftarrow \fork{116 \cdot Y^{1/3}-16}{for $Y>0.008856$}{903.3 \cdot Y}{for $Y \le 0.008856$}
a \leftarrow 500 (f(X)-f(Y)) + delta
b \leftarrow 200 (f(Y)-f(Z)) + delta

where

f(t) = \fork{t^{1/3}}{for $t>0.008856$}{7.787 t+16/116}{for $t \le 0.008856$}

and

delta = \fork{128}{for 8-bit images}{0}{for floating-point images}

On output 0 \leq L \leq 100, -127 \leq a \leq 127, -127 \leq b \leq 127. The values are then converted to the destination data type:
- 8-bit images:
L \leftarrow L \cdot 255/100, a \leftarrow a + 128, b \leftarrow b + 128
- 16-bit images:
currently not supported
- 32-bit images:
L, a, b are left as is
- RGB \leftrightarrow CIE L*u*v* (CV_BGR2Luv, CV_RGB2Luv, CV_Luv2BGR, CV_Luv2RGB). In the case of 8-bit and 16-bit images R, G and B are converted to floating-point format and scaled to fit the 0 to 1 range.

\vecthree{X}{Y}{Z} \leftarrow \vecthreethree{0.412453}{0.357580}{0.180423}{0.212671}{0.715160}{0.072169}{0.019334}{0.119193}{0.950227} \cdot \vecthree{R}{G}{B}

L \leftarrow \fork{116 Y^{1/3} - 16}{for $Y>0.008856$}{903.3 Y}{for $Y \le 0.008856$}
u' \leftarrow 4 X/(X + 15 Y + 3 Z)
v' \leftarrow 9 Y/(X + 15 Y + 3 Z)
u \leftarrow 13 L (u' - u_n), \quad \text{where} \quad u_n = 0.19793943
v \leftarrow 13 L (v' - v_n), \quad \text{where} \quad v_n = 0.46831096

On output 0 \leq L \leq 100, -134 \leq u \leq 220, -140 \leq v \leq 122.

The values are then converted to the destination data type:
- 8-bit images:
L \leftarrow 255/100 L, u \leftarrow 255/354 (u + 134), v \leftarrow 255/256 (v + 140)
- 16-bit images:
currently not supported
- 32-bit images:
L, u, v are left as is

The above formulas for converting RGB to/from various color spaces have been taken from multiple sources on the Web, primarily from Ford98 at the Charles Poynton site.
- Bayer \rightarrow RGB (CV_BayerBG2BGR, CV_BayerGB2BGR, CV_BayerRG2BGR, CV_BayerGR2BGR, CV_BayerBG2RGB, CV_BayerGB2RGB, CV_BayerRG2RGB, CV_BayerGR2RGB). The Bayer pattern is widely used in CCD and CMOS cameras. It allows one to get color pictures from a single plane where R, G and B pixels (sensors of a particular component) are interleaved like this:

\newcommand{\Rcell}{\color{red}R} \newcommand{\Gcell}{\color{green}G} \newcommand{\Bcell}{\color{blue}B} \definecolor{BackGray}{rgb}{0.8,0.8,0.8} \begin{array}{ c c c c c } \Rcell & \Gcell & \Rcell & \Gcell & \Rcell \\ \Gcell & \colorbox{BackGray}{\Bcell} & \colorbox{BackGray}{\Gcell} & \Bcell & \Gcell \\ \Rcell & \Gcell & \Rcell & \Gcell & \Rcell \\ \Gcell & \Bcell & \Gcell & \Bcell & \Gcell \\ \Rcell & \Gcell & \Rcell & \Gcell & \Rcell \end{array}
The output RGB components of a pixel are interpolated from 1, 2 or 4 neighbors of the pixel having the same color. There are several modifications of the above pattern that can be achieved by shifting the pattern one pixel left and/or one pixel up. The two letters C_1 and C_2 in the conversion constants CV_Bayer $C_1 C_2$ 2BGR and CV_Bayer $C_1 C_2$ 2RGB indicate the particular pattern type; these are components from the second row, second and third columns, respectively. For example, the above pattern has the very popular "BG" type.
DistTransform
The function calculates the approximate distance from every binary image pixel to the nearest zero pixel. For zero pixels the function sets the zero distance; for others it finds the shortest path consisting of basic shifts: horizontal, vertical, diagonal or knight's move (the latter is available for a 5\times 5 mask). The overall distance is calculated as a sum of these basic distances. Because the distance function should be symmetric, all of the horizontal and vertical shifts must have the same cost (denoted a), all the diagonal shifts must have the same cost (denoted b), and all knight's moves must have the same cost (denoted c). For the CV_DIST_C and CV_DIST_L1 types the distance is calculated precisely, whereas for CV_DIST_L2 (Euclidean distance) the distance can be calculated only with some relative error (a 5\times 5 mask gives more accurate results). OpenCV uses the values suggested in Borgefors86:
CV_DIST_C | (3\times 3) | a = 1, b = 1 |
---|---|---|
CV_DIST_L1 | (3\times 3) | a = 1, b = 2 |
CV_DIST_L2 | (3\times 3) | a = 0.955, b = 1.3693 |
CV_DIST_L2 | (5\times 5) | a = 1, b = 1.4, c = 2.1969 |
Below are samples of the distance field (the black (0) pixel is in the middle of the white square) in the case of a user-defined distance:
User-defined 3 \times 3 mask (a=1, b=1.5)
4.5 | 4 | 3.5 | 3 | 3.5 | 4 | 4.5 |
---|---|---|---|---|---|---|
4 | 3 | 2.5 | 2 | 2.5 | 3 | 4 |
3.5 | 2.5 | 1.5 | 1 | 1.5 | 2.5 | 3.5 |
3 | 2 | 1 | 0 | 1 | 2 | 3 |
3.5 | 2.5 | 1.5 | 1 | 1.5 | 2.5 | 3.5 |
4 | 3 | 2.5 | 2 | 2.5 | 3 | 4 |
4.5 | 4 | 3.5 | 3 | 3.5 | 4 | 4.5 |
User-defined 5 \times 5 mask (a=1, b=1.5, c=2)
4 | 3.5 | 3 | 3 | 3 | 3.5 | 4 |
---|---|---|---|---|---|---|
3.5 | 3 | 2 | 2 | 2 | 3 | 3.5 |
3 | 2 | 1.5 | 1 | 1.5 | 2 | 3 |
3 | 2 | 1 | 0 | 1 | 2 | 3 |
3 | 2 | 1.5 | 1 | 1.5 | 2 | 3 |
3.5 | 3 | 2 | 2 | 2 | 3 | 3.5 |
4 | 3.5 | 3 | 3 | 3 | 3.5 | 4 |
Typically, for a fast, coarse distance estimation (CV_DIST_L2) a 3\times 3 mask is used, and for a more accurate distance estimation (CV_DIST_L2) a 5\times 5 mask is used.

When the output parameter labels is not NULL, for every non-zero pixel the function also finds the nearest connected component consisting of zero pixels. The connected components themselves are found as contours in the beginning of the function. In this mode the processing time is still O(N), where N is the number of pixels. Thus, the function provides a very fast way to compute an approximate Voronoi diagram for a binary image.
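The sum-of-basic-shifts idea above is usually realized as a two-pass chamfer scan: a forward pass propagating distances from the top-left neighbors and a backward pass from the bottom-right neighbors. A plain-Python sketch for a 3\times 3 mask (illustrative names; not the OpenCV implementation):

```python
def dist_transform_3x3(binary, a=1.0, b=1.5):
    """binary: 2D list; zero pixels get distance 0, others the chamfer
    distance with horizontal/vertical cost a and diagonal cost b."""
    h, w = len(binary), len(binary[0])
    INF = float("inf")
    d = [[0.0 if binary[y][x] == 0 else INF for x in range(w)] for y in range(h)]
    # Forward pass: look at the upper and left neighbors.
    for y in range(h):
        for x in range(w):
            for dy, dx, cost in ((-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + cost)
    # Backward pass: look at the lower and right neighbors.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx, cost in ((1, 1, b), (1, 0, a), (1, -1, b), (0, 1, a)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + cost)
    return d
```

With a = 1, b = 1.5 and a single zero pixel in the middle, the result reproduces the inner ring of the first sample table above (1 for the 4-neighbors, 1.5 for the diagonals).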
CvConnectedComp
Connected component, represented as a tuple (area, value, rect), where area is the area of the component as a float, value is the average color as a :ref:`CvScalar` , and rect is the ROI of the component, as a :ref:`CvRect` .
FloodFill
The function fills a connected component starting from the seed point with the specified color. The connectivity is determined by the closeness of pixel values. The pixel at (x,y) is considered to belong to the repainted domain if:
- grayscale image, floating range:
src(x',y') - \texttt{lo\_diff} \leq src(x,y) \leq src(x',y') + \texttt{up\_diff}
- grayscale image, fixed range:
src(seed.x,seed.y) - \texttt{lo\_diff} \leq src(x,y) \leq src(seed.x,seed.y) + \texttt{up\_diff}
- color image, floating range:
src(x',y')_r - \texttt{lo\_diff}_r \leq src(x,y)_r \leq src(x',y')_r + \texttt{up\_diff}_r
src(x',y')_g - \texttt{lo\_diff}_g \leq src(x,y)_g \leq src(x',y')_g + \texttt{up\_diff}_g
src(x',y')_b - \texttt{lo\_diff}_b \leq src(x,y)_b \leq src(x',y')_b + \texttt{up\_diff}_b
- color image, fixed range:
src(seed.x,seed.y)_r - \texttt{lo\_diff}_r \leq src(x,y)_r \leq src(seed.x,seed.y)_r + \texttt{up\_diff}_r
src(seed.x,seed.y)_g - \texttt{lo\_diff}_g \leq src(x,y)_g \leq src(seed.x,seed.y)_g + \texttt{up\_diff}_g
src(seed.x,seed.y)_b - \texttt{lo\_diff}_b \leq src(x,y)_b \leq src(seed.x,seed.y)_b + \texttt{up\_diff}_b
where src(x',y') is the value of one of the pixel's neighbors. That is, to be added to the connected component, a pixel's color/brightness should be close enough to:
- the color/brightness of one of its neighbors that already belong to the connected component, in the case of a floating range;
- the color/brightness of the seed point, in the case of a fixed range.
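The fixed-range, grayscale case can be sketched as a breadth-first fill in plain Python (illustrative names; OpenCV itself uses a scanline algorithm, and this sketch uses 4-connectivity):

```python
from collections import deque

def flood_fill_fixed(img, seed, new_val, lo_diff, up_diff):
    """Repaint every pixel 4-connected to the seed whose value lies in
    [src(seed) - lo_diff, src(seed) + up_diff]."""
    h, w = len(img), len(img[0])
    sx, sy = seed
    lo, hi = img[sy][sx] - lo_diff, img[sy][sx] + up_diff
    seen = [[False] * w for _ in range(h)]
    q = deque([(sx, sy)])
    seen[sy][sx] = True
    while q:
        x, y = q.popleft()
        img[y][x] = new_val
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] \
                    and lo <= img[ny][nx] <= hi:
                seen[ny][nx] = True
                q.append((nx, ny))
    return img
```

For the floating-range case, the test `lo <= img[ny][nx] <= hi` would instead compare the neighbor against the current pixel's original value rather than the seed's.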
Inpaint
The function reconstructs the selected image area from the pixels near the area boundary. The function may be used to remove dust and scratches from a scanned photo, or to remove undesirable objects from still images or video.
Integral
The function calculates one or more integral images for the source image as follows:
\texttt{sum} (X,Y) = \sum _{x<X,y<Y} \texttt{image} (x,y)
\texttt{sqsum} (X,Y) = \sum _{x<X,y<Y} \texttt{image} (x,y)^2
\texttt{tiltedSum} (X,Y) = \sum _{y<Y,abs(x-X+1) \leq Y-y-1} \texttt{image} (x,y)
Using these integral images, one may calculate the sum, mean and standard deviation over a specific up-right or rotated rectangular region of the image in constant time, for example:

\sum _{x_1 \leq x < x_2, \, y_1 \leq y < y_2} \texttt{image}(x,y) = \texttt{sum} (x_2,y_2) - \texttt{sum} (x_1,y_2) - \texttt{sum} (x_2,y_1) + \texttt{sum} (x_1,y_1)

This makes it possible to do fast blurring or fast block correlation with a variable window size, for example. In the case of multi-channel images, sums for each channel are accumulated independently.
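The sum image and the constant-time rectangle query can be sketched in plain Python (illustrative names). Since sum(X,Y) accumulates over x < X, y < Y, the integral image has one extra row and column of zeros:

```python
def integral_image(img):
    """Build s with s[Y][X] = sum of img[y][x] over y < Y, x < X."""
    h, w = len(img), len(img[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            s[y + 1][x + 1] = img[y][x] + s[y][x + 1] + s[y + 1][x] - s[y][x]
    return s

def rect_sum(s, x1, y1, x2, y2):
    # Sum over x1 <= x < x2, y1 <= y < y2, in O(1) per query.
    return s[y2][x2] - s[y2][x1] - s[y1][x2] + s[y1][x1]
```

Each query touches four entries of the integral image regardless of the rectangle size, which is what makes variable-window blurring and block correlation fast.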
PyrMeanShiftFiltering
The function implements the filtering stage of meanshift segmentation, that is, the output of the function is the filtered "posterized" image with color gradients and fine-grain texture flattened. At every pixel (X,Y) of the input image (or down-sized input image, see below) the function executes meanshift iterations, that is, the pixel (X,Y) neighborhood in the joint space-color hyperspace is considered:
(x,y): X- \texttt{sp} \le x \le X+ \texttt{sp} , Y- \texttt{sp} \le y \le Y+ \texttt{sp} , ||(R,G,B)-(r,g,b)|| \le \texttt{sr}
where (R,G,B) and (r,g,b) are the vectors of color components at (X,Y) and (x,y), respectively (though, the algorithm does not depend on the color space used, so any 3-component color space can be used instead). Over the neighborhood the average spatial value (X',Y') and average color vector (R',G',B') are found and they act as the neighborhood center on the next iteration:

(X,Y) \sim (X',Y'), \quad (R,G,B) \sim (R',G',B')

After the iterations are over, the color components of the initial pixel (that is, the pixel from where the iterations started) are set to the final value (the average color at the last iteration):

I(X,Y) \leftarrow (R^*,G^*,B^*)

When \texttt{max\_level}>0, a gaussian pyramid of \texttt{max\_level}+1 levels is built, and the above procedure is run on the smallest layer. After that, the results are propagated to the larger layer and the iterations are run again only on those pixels where the layer colors differ by more than \texttt{sr} from the lower-resolution layer, that is, the boundaries of the color regions are clarified. Note that the results will actually differ from the ones obtained by running the meanshift procedure on the whole original image (i.e. when \texttt{max\_level}==0).
PyrSegmentation
The function implements image segmentation by pyramids. The pyramid builds up to the level level. The links between any pixel a on level i and its candidate father pixel b on the adjacent level are established if p(c(a),c(b)) < threshold1. After the connected components are defined, they are joined into several clusters. Any two segments A and B belong to the same cluster if p(c(A),c(B)) < threshold2.

If the input image has only one channel, then p(c^1,c^2) = |c^1-c^2|. If the input image has three channels (red, green and blue), then

p(c^1,c^2) = 0.30 (c^1_r - c^2_r) + 0.59 (c^1_g - c^2_g) + 0.11 (c^1_b - c^2_b).

There may be more than one connected component per cluster. The images src and dst should be 8-bit single-channel or 3-channel images of equal size.
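The three-channel linking measure p can be written out directly (plain Python, illustrative name). Note that, as documented, it is a signed weighted difference rather than a weighted sum of absolute differences, so it can be negative:

```python
def p_color(c1, c2):
    """p(c1, c2) for 3-channel (r, g, b) values, as given above."""
    return (0.30 * (c1[0] - c2[0])
            + 0.59 * (c1[1] - c2[1])
            + 0.11 * (c1[2] - c2[2]))
```

Since the weights sum to 1, a uniform brightness difference of d between two colors gives p = d.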
Threshold
The function applies fixed-level thresholding to a single-channel array. The function is typically used to get a bi-level (binary) image out of a grayscale image (:ref:`CmpS` could also be used for this purpose) or for removing noise, i.e. filtering out pixels with too small or too large values. There are several types of thresholding that the function supports, determined by thresholdType:

CV_THRESH_BINARY:
\texttt{dst} (x,y) = \fork{\texttt{maxValue}}{if $\texttt{src}(x,y) > \texttt{threshold}$}{0}{otherwise}

CV_THRESH_BINARY_INV:
\texttt{dst} (x,y) = \fork{0}{if $\texttt{src}(x,y) > \texttt{threshold}$}{\texttt{maxValue}}{otherwise}

CV_THRESH_TRUNC:
\texttt{dst} (x,y) = \fork{\texttt{threshold}}{if $\texttt{src}(x,y) > \texttt{threshold}$}{\texttt{src}(x,y)}{otherwise}

CV_THRESH_TOZERO:
\texttt{dst} (x,y) = \fork{\texttt{src}(x,y)}{if $\texttt{src}(x,y) > \texttt{threshold}$}{0}{otherwise}

CV_THRESH_TOZERO_INV:
\texttt{dst} (x,y) = \fork{0}{if $\texttt{src}(x,y) > \texttt{threshold}$}{\texttt{src}(x,y)}{otherwise}
Also, the special value CV_THRESH_OTSU may be combined with one of the above values. In this case the function determines the optimal threshold value using Otsu's algorithm and uses it instead of the specified thresh. The function returns the computed threshold value. Currently, Otsu's method is implemented only for 8-bit images.
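The five thresholding rules above act independently on each pixel, so they can be sketched for a single value (plain Python; the string type names are illustrative stand-ins for the CV_THRESH_* constants):

```python
def threshold_pixel(src, thresh, max_value, threshold_type):
    """Apply one of the five fixed thresholding rules to a single value."""
    if threshold_type == "binary":        # CV_THRESH_BINARY
        return max_value if src > thresh else 0
    if threshold_type == "binary_inv":    # CV_THRESH_BINARY_INV
        return 0 if src > thresh else max_value
    if threshold_type == "trunc":         # CV_THRESH_TRUNC
        return thresh if src > thresh else src
    if threshold_type == "tozero":        # CV_THRESH_TOZERO
        return src if src > thresh else 0
    if threshold_type == "tozero_inv":    # CV_THRESH_TOZERO_INV
        return 0 if src > thresh else src
    raise ValueError(threshold_type)
```

Note that maxValue only matters for the two binary types; trunc and the tozero variants pass through either the threshold or the source value.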
