Commit 875f9223 authored by Maksim Shabunin's avatar Maksim Shabunin

Doxygen tutorials: python basic

parent 36a04ef8
......@@ -197,8 +197,9 @@ if(BUILD_DOCS AND HAVE_DOXYGEN)
set(rootfile "${CMAKE_CURRENT_BINARY_DIR}/root.markdown")
set(bibfile "${CMAKE_CURRENT_SOURCE_DIR}/opencv.bib")
set(tutorial_path "${CMAKE_CURRENT_SOURCE_DIR}/tutorials")
string(REPLACE ";" " \\\n" CMAKE_DOXYGEN_INPUT_LIST "${rootfile} ; ${paths_include} ; ${paths_doc} ; ${tutorial_path}")
string(REPLACE ";" " \\\n" CMAKE_DOXYGEN_IMAGE_PATH "${paths_doc} ; ${tutorial_path}")
set(tutorial_py_path "${CMAKE_CURRENT_SOURCE_DIR}/py_tutorials")
string(REPLACE ";" " \\\n" CMAKE_DOXYGEN_INPUT_LIST "${rootfile} ; ${paths_include} ; ${paths_doc} ; ${tutorial_path} ; ${tutorial_py_path}")
string(REPLACE ";" " \\\n" CMAKE_DOXYGEN_IMAGE_PATH "${paths_doc} ; ${tutorial_path} ; ${tutorial_py_path}")
string(REPLACE ";" " \\\n" CMAKE_DOXYGEN_EXAMPLE_PATH "${CMAKE_SOURCE_DIR}/samples ; ${paths_doc}")
set(CMAKE_DOXYGEN_LAYOUT "${CMAKE_CURRENT_SOURCE_DIR}/DoxygenLayout.xml")
set(CMAKE_DOXYGEN_OUTPUT_PATH "doxygen")
......
How OpenCV-Python Bindings Work? {#tutorial_py_bindings_basics}
=================================
Goal
----
Learn:
- How OpenCV-Python bindings are generated
- How to extend new OpenCV modules to Python
How are OpenCV-Python bindings generated?
-----------------------------------------
In OpenCV, all algorithms are implemented in C++. But these algorithms can be used from different
languages like Python, Java etc. This is made possible by the binding generators. These generators
create a bridge between C++ and Python which enables users to call C++ functions from Python. To get
a complete picture of what is happening in the background, a good knowledge of the Python/C API is
required. A simple example on extending C++ functions to Python can be found in the official Python
documentation[1]. Extending all functions in OpenCV to Python by writing their wrapper functions
manually would be a time-consuming task, so OpenCV does it in a more intelligent way: it generates
these wrapper functions automatically from the C++ headers using some Python scripts which are
located in modules/python/src2. We will look into what they do.
First, modules/python/CMakeLists.txt is a CMake script which checks the modules to be extended to
Python. It automatically checks all the modules to be extended and grabs their header files. These
header files contain the list of all classes, functions, constants etc. for that particular module.
Second, these header files are passed to a Python script, modules/python/src2/gen2.py. This is the
Python bindings generator script. It calls another Python script, modules/python/src2/hdr_parser.py.
This is the header parser script. The header parser splits the complete header file into small
Python lists, which contain all the details about a particular function, class etc. For example, a
function will be parsed to get a list containing the function name, return type, input arguments,
argument types etc. The final list contains details of all the functions, structs, classes etc. in
that header file.
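For a rough idea of what such a list looks like, here is a hypothetical illustration of a single
parsed declaration (the exact structure and field order vary between OpenCV versions):
@code{.py}
# Hypothetical sketch of what the header parser might produce for one declaration;
# the exact structure and field order vary between OpenCV versions:
#
#   ['cv.equalizeHist',                  # function name
#    'void',                             # return type
#    [],                                 # modifier flags
#    [['InputArray',  'src', '', []],    # argument: type, name, default value, modifiers
#     ['OutputArray', 'dst', '', []]]]
@endcode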
But the header parser doesn't parse all the functions/classes in the header file. The developer has
to specify which functions should be exported to Python. For that, certain macros are added at the
beginning of these declarations, which enables the header parser to identify the functions to be
parsed. These macros are added by the developer who programs the particular function. In short, the
developer decides which functions should be extended to Python and which should not. Details of
those macros will be given in the next session.
So the header parser returns a final big list of parsed functions. Our generator script (gen2.py)
creates wrapper functions for all the functions/classes/enums/structs parsed by the header parser
(you can find these header files during compilation in the build/modules/python/ folder as
pyopencv_generated_\*.h files). But some basic OpenCV datatypes like Mat, Vec4i and Size need to be
extended manually. For example, a Mat should be converted to a Numpy array, a Size should be
converted to a tuple of two integers etc. Similarly, there may be some complex
structs/classes/functions etc. which need to be extended manually. All such manual wrapper functions
are placed in modules/python/src2/pycv2.hpp.
Now the only thing left is the compilation of these wrapper files, which gives us the **cv2**
module. So when you call a function, say res = equalizeHist(img1,img2) in Python, you pass two numpy
arrays and you expect another numpy array as the output. These numpy arrays are converted to
cv::Mat, and then the equalizeHist() function in C++ is called. The final result, res, is converted
back into a Numpy array. So in short, almost all operations are done in C++, which gives us almost
the same speed as that of C++.
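From the Python side, all of this machinery is invisible: the wrapped function is simply called with
NumPy arrays (a small illustration; the image file name is only a placeholder):
@code{.py}
import cv2
import numpy as np

img = cv2.imread('messi5.jpg', 0)   # grayscale image loaded as a numpy array
res = cv2.equalizeHist(img)         # the C++ cv::equalizeHist() runs underneath
print type(res), res.dtype          # the result comes back as a numpy array
@endcode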
So this is the basic version of how OpenCV-Python bindings are generated.
How to extend new modules to Python?
------------------------------------
The header parser parses the header files based on some wrapper macros added to the function
declarations. Enumeration constants don't need any wrapper macros; they are automatically wrapped.
But the remaining functions, classes etc. need wrapper macros.
Functions are extended using the CV_EXPORTS_W macro. An example is shown below.
@code{.cpp}
CV_EXPORTS_W void equalizeHist( InputArray src, OutputArray dst );
@endcode
The header parser can understand the input and output arguments from keywords like InputArray,
OutputArray etc. But sometimes we may need to mark the inputs and outputs explicitly. For that,
macros like CV_OUT, CV_IN_OUT etc. are used.
@code{.cpp}
CV_EXPORTS_W void minEnclosingCircle( InputArray points,
CV_OUT Point2f& center, CV_OUT float& radius );
@endcode
For large classes also, CV_EXPORTS_W is used. To extend class methods, CV_WRAP is used.
Similarly, CV_PROP is used for class fields.
@code{.cpp}
class CV_EXPORTS_W CLAHE : public Algorithm
{
public:
    CV_WRAP virtual void apply(InputArray src, OutputArray dst) = 0;
    CV_WRAP virtual void setClipLimit(double clipLimit) = 0;
    CV_WRAP virtual double getClipLimit() const = 0;
};
@endcode
Overloaded functions can be extended using CV_EXPORTS_AS. But we need to pass a new name so that
each function will be called by that name in Python. Take the case of the integral function below.
Three functions are available, so each one is named with a suffix in Python. Similarly, CV_WRAP_AS
can be used to wrap overloaded methods.
@code{.cpp}
//! computes the integral image
CV_EXPORTS_W void integral( InputArray src, OutputArray sum, int sdepth = -1 );
//! computes the integral image and integral for the squared image
CV_EXPORTS_AS(integral2) void integral( InputArray src, OutputArray sum,
OutputArray sqsum, int sdepth = -1, int sqdepth = -1 );
//! computes the integral image, integral for the squared image and the tilted integral image
CV_EXPORTS_AS(integral3) void integral( InputArray src, OutputArray sum,
OutputArray sqsum, OutputArray tilted,
int sdepth = -1, int sqdepth = -1 );
@endcode
Small classes/structs are extended using CV_EXPORTS_W_SIMPLE. These structs are passed by value
to C++ functions. Examples are KeyPoint, DMatch etc. Their methods are extended by CV_WRAP and
fields are extended by CV_PROP_RW.
@code{.cpp}
class CV_EXPORTS_W_SIMPLE DMatch
{
public:
CV_WRAP DMatch();
CV_WRAP DMatch(int _queryIdx, int _trainIdx, float _distance);
CV_WRAP DMatch(int _queryIdx, int _trainIdx, int _imgIdx, float _distance);
CV_PROP_RW int queryIdx; // query descriptor index
CV_PROP_RW int trainIdx; // train descriptor index
CV_PROP_RW int imgIdx; // train image index
CV_PROP_RW float distance;
};
@endcode
Some other small classes/structs can be exported using CV_EXPORTS_W_MAP, where they are exported to
a native Python dictionary. Moments() is an example.
@code{.cpp}
class CV_EXPORTS_W_MAP Moments
{
public:
//! spatial moments
CV_PROP_RW double m00, m10, m01, m20, m11, m02, m30, m21, m12, m03;
//! central moments
CV_PROP_RW double mu20, mu11, mu02, mu30, mu21, mu12, mu03;
//! central normalized moments
CV_PROP_RW double nu20, nu11, nu02, nu30, nu21, nu12, nu03;
};
@endcode
So these are the major extension macros available in OpenCV. Typically, a developer has to put the
proper macros in their appropriate positions. The rest is done by the generator scripts. Sometimes
there may be exceptional cases where the generator scripts cannot create the wrappers. Such
functions need to be handled manually. But most of the time, code written according to the OpenCV
coding guidelines will be automatically wrapped by the generator scripts.
OpenCV-Python Bindings {#tutorial_py_table_of_contents_bindings}
======================
Here, you will learn how OpenCV-Python bindings are generated.
- @subpage tutorial_py_bindings_basics
Learn how OpenCV-Python bindings are generated.
Depth Map from Stereo Images {#tutorial_py_depthmap}
============================
Goal
----
In this session,
- We will learn to create a depth map from stereo images.
Basics
------
In the last session, we saw basic concepts like the epipolar constraint and other related terms. We
also saw that if we have two images of the same scene, we can get depth information from them in an
intuitive way. Below is an image and some simple mathematical formulas which prove that intuition.
![image](images/stereo_depth.jpg)
The above diagram contains similar triangles. Writing their corresponding equations yields the
following result:
\f[disparity = x - x' = \frac{Bf}{Z}\f]
\f$x\f$ and \f$x'\f$ are the distances between points in the image plane corresponding to the 3D scene
point and their camera centers. \f$B\f$ is the distance between the two cameras (which we know) and
\f$f\f$ is the focal length of the camera (already known). So in short, the above equation says that
the depth of a point in a scene is inversely proportional to the difference in distance between the
corresponding image points and their camera centers. With this information, we can derive the depth
of all pixels in an image.
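As a toy worked example of the relation above (all numbers are made up for illustration):
@code{.py}
# Toy numbers, purely illustrative
B = 0.1            # baseline: 10 cm between the two cameras (metres)
f = 700.0          # focal length (pixels)
disparity = 35.0   # measured disparity x - x' (pixels)

Z = B * f / disparity   # depth of the scene point = 2.0 (metres)
print Z
@endcode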
So, to obtain the depth map, we need to find corresponding matches between the two images. We have
already seen how the epipolar constraint makes this operation faster and more accurate. Once the
matches are found, the disparity is computed. Let's see how we can do it with OpenCV.
Code
----
The code snippet below shows a simple procedure to create a disparity map.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('tsukuba_l.png',0)
imgR = cv2.imread('tsukuba_r.png',0)
stereo = cv2.createStereoBM(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL,imgR)
plt.imshow(disparity,'gray')
plt.show()
@endcode
The image below contains the original image (left) and its disparity map (right). As you can see,
the result is contaminated with a high degree of noise. By adjusting the values of numDisparities
and blockSize, you can get a better result.
![image](images/disparity_map.jpg)
@note More details to be added
Additional Resources
--------------------
Exercises
---------
-# OpenCV samples contain an example of generating a disparity map and its 3D reconstruction. Check
stereo_match.py in OpenCV-Python samples.
Epipolar Geometry {#tutorial_py_epipolar_geometry}
=================
Goal
----
In this section,
- We will learn about the basics of multiview geometry
- We will see what epipoles, epipolar lines, the epipolar constraint etc. are.
Basic Concepts
--------------
When we take an image using a pin-hole camera, we lose important information, i.e. the depth of the
image, that is, how far each point in the image is from the camera, because it is a 3D-to-2D
conversion. So it is an important question whether we can recover the depth information using these
cameras. And the answer is to use more than one camera. Our eyes work in a similar way, using two
cameras (two eyes), which is called stereo vision. So let's see what OpenCV provides in this field.
(*Learning OpenCV* by Gary Bradsky has a lot of information in this field.)
Before going to depth images, let's first understand some basic concepts in multiview geometry. In
this section we will deal with epipolar geometry. See the image below, which shows a basic setup
with two cameras taking images of the same scene.
![image](images/epipolar.jpg)
If we are using only the left camera, we can't find the 3D point corresponding to the point \f$x\f$ in
the image, because every point on the line \f$OX\f$ projects to the same point on the image plane. But
consider the right image also. Now different points on the line \f$OX\f$ project to different points
(\f$x'\f$) in the right plane. So with these two images, we can triangulate the correct 3D point. This
is the whole idea.
The projections of the different points on \f$OX\f$ form a line in the right plane (line \f$l'\f$). We
call it the **epiline** corresponding to the point \f$x\f$. It means that to find the point \f$x\f$ in the
right image, we only have to search along this epiline. It should be somewhere on this line (think
of it this way: to find the matching point in the other image, you need not search the whole image,
just search along the epiline, which gives better performance and accuracy). This is called the
**Epipolar Constraint**. Similarly, all points have their corresponding epilines in the other image.
The plane \f$XOO'\f$ is called the **Epipolar Plane**.
\f$O\f$ and \f$O'\f$ are the camera centers. From the setup given above, you can see that the projection
of the right camera \f$O'\f$ is seen on the left image at the point \f$e\f$. It is called the **epipole**.
The epipole is the point of intersection of the line through the camera centers with the image
plane. Similarly, \f$e'\f$ is the epipole of the left camera. In some cases, you won't be able to
locate the epipole in the image; it may lie outside the image (which means one camera doesn't see
the other).
All the epilines pass through the epipole. So to find the location of the epipole, we can find many
epilines and find their intersection point.
So in this session, we focus on finding epipolar lines and epipoles. But to find them, we need two
more ingredients, the **Fundamental Matrix (F)** and the **Essential Matrix (E)**. The Essential
Matrix contains the information about translation and rotation, which describe the location of the
second camera relative to the first in global coordinates. See the image below (image courtesy:
Learning OpenCV by Gary Bradsky):
![image](images/essential_matrix.jpg)
But we prefer measurements to be done in pixel coordinates, right? The Fundamental Matrix contains
the same information as the Essential Matrix, in addition to information about the intrinsics of
both cameras, so that we can relate the two cameras in pixel coordinates. (If we are using rectified
images and normalize the points by dividing by the focal lengths, \f$F=E\f$.) In simple words, the
Fundamental Matrix F maps a point in one image to a line (epiline) in the other image. It is
calculated from matching points between the two images. A minimum of 8 such points are required to
find the fundamental matrix (when using the 8-point algorithm). More points are preferred, and
RANSAC can be used to get a more robust result.
Code
----
So first we need to find as many matches as possible between the two images to find the fundamental
matrix. For this, we use SIFT descriptors with a FLANN based matcher and the ratio test.
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img1 = cv2.imread('myleft.jpg',0) #queryimage # left image
img2 = cv2.imread('myright.jpg',0) #trainimage # right image
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
good = []
pts1 = []
pts2 = []
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
    if m.distance < 0.8*n.distance:
        good.append(m)
        pts2.append(kp2[m.trainIdx].pt)
        pts1.append(kp1[m.queryIdx].pt)
@endcode
Now we have the list of best matches from both the images. Let's find the Fundamental Matrix.
@code{.py}
pts1 = np.int32(pts1)
pts2 = np.int32(pts2)
F, mask = cv2.findFundamentalMat(pts1,pts2,cv2.FM_LMEDS)
# We select only inlier points
pts1 = pts1[mask.ravel()==1]
pts2 = pts2[mask.ravel()==1]
@endcode
Next we find the epilines. Epilines corresponding to the points in the first image are drawn on the
second image, so passing the correct images is important here. We get an array of lines, so we
define a new function to draw these lines on the images.
@code{.py}
def drawlines(img1,img2,lines,pts1,pts2):
    ''' img1 - image on which we draw the epilines for the points in img2
        lines - corresponding epilines '''
    r,c = img1.shape
    img1 = cv2.cvtColor(img1,cv2.COLOR_GRAY2BGR)
    img2 = cv2.cvtColor(img2,cv2.COLOR_GRAY2BGR)
    for r,pt1,pt2 in zip(lines,pts1,pts2):
        color = tuple(np.random.randint(0,255,3).tolist())
        x0,y0 = map(int, [0, -r[2]/r[1] ])
        x1,y1 = map(int, [c, -(r[2]+r[0]*c)/r[1] ])
        img1 = cv2.line(img1, (x0,y0), (x1,y1), color,1)
        img1 = cv2.circle(img1,tuple(pt1),5,color,-1)
        img2 = cv2.circle(img2,tuple(pt2),5,color,-1)
    return img1,img2
@endcode
Now we find the epilines in both the images and draw them.
@code{.py}
# Find epilines corresponding to points in right image (second image) and
# drawing its lines on left image
lines1 = cv2.computeCorrespondEpilines(pts2.reshape(-1,1,2), 2,F)
lines1 = lines1.reshape(-1,3)
img5,img6 = drawlines(img1,img2,lines1,pts1,pts2)
# Find epilines corresponding to points in left image (first image) and
# drawing its lines on right image
lines2 = cv2.computeCorrespondEpilines(pts1.reshape(-1,1,2), 1,F)
lines2 = lines2.reshape(-1,3)
img3,img4 = drawlines(img2,img1,lines2,pts2,pts1)
plt.subplot(121),plt.imshow(img5)
plt.subplot(122),plt.imshow(img3)
plt.show()
@endcode
Below is the result we get:
![image](images/epiresult.jpg)
You can see in the left image that all the epilines are converging at a point outside the image on
the right side. That meeting point is the epipole.
For better results, images with good resolution and many non-planar points should be used.
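Instead of intersecting many epilines, the epipoles can also be read off directly from the
fundamental matrix itself: each epipole is the null vector of F (respectively of its transpose). A
minimal sketch, reusing F from the code above (expect small numerical errors):
@code{.py}
# F.dot(e) ~ 0 for the epipole e in the first image,
# F.T.dot(e_prime) ~ 0 for the epipole e_prime in the second image.
U, S, Vt = np.linalg.svd(F)
e = Vt[-1]
print e/e[2]               # epipole in the first image (homogeneous coords scaled to pixels)

U, S, Vt = np.linalg.svd(F.T)
e_prime = Vt[-1]
print e_prime/e_prime[2]   # epipole in the second image
@endcode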
Additional Resources
--------------------
Exercises
---------
-# One important topic is the forward movement of the camera. Then epipoles will be seen at the
same locations in both images, with epilines emerging from a fixed point. [See this
discussion](http://answers.opencv.org/question/17912/location-of-epipole/).
2. Fundamental Matrix estimation is sensitive to quality of matches, outliers etc. It becomes worse
when all selected matches lie on the same plane. [Check this
discussion](http://answers.opencv.org/question/18125/epilines-not-correct/).
Pose Estimation {#tutorial_py_pose}
===============
Goal
----
In this section,
- We will learn to exploit the calib3d module to create some 3D effects in images.
Basics
------
This is going to be a small section. During the last session on camera calibration, you found the
camera matrix, distortion coefficients etc. Given a pattern image, we can utilize the above
information to calculate its pose, or how the object is situated in space, e.g. how it is rotated,
how it is displaced etc. For a planar object, we can assume Z=0, so that the problem now becomes how
the camera is placed in space to see our pattern image. So, if we know how the object lies in space,
we can draw some 2D diagrams on it to simulate the 3D effect. Let's see how to do it.
Our problem is, we want to draw our 3D coordinate axes (X, Y, Z axes) on our chessboard's first
corner: the X axis in blue color, the Y axis in green color and the Z axis in red color. So, in
effect, the Z axis should feel like it is perpendicular to our chessboard plane.
First, let's load the camera matrix and distortion coefficients from the previous calibration
result.
@code{.py}
import cv2
import numpy as np
import glob
# Load previously saved data
with np.load('B.npz') as X:
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
@endcode
Now let's create a function, draw(), which takes the corners in the chessboard (obtained using
**cv2.findChessboardCorners()**) and **axis points** to draw a 3D axis.
@code{.py}
def draw(img, corners, imgpts):
    corner = tuple(corners[0].ravel())
    img = cv2.line(img, corner, tuple(imgpts[0].ravel()), (255,0,0), 5)
    img = cv2.line(img, corner, tuple(imgpts[1].ravel()), (0,255,0), 5)
    img = cv2.line(img, corner, tuple(imgpts[2].ravel()), (0,0,255), 5)
    return img
@endcode
Then, as in the previous case, we create the termination criteria, object points (3D points of the
corners in the chessboard) and axis points. Axis points are points in 3D space for drawing the axis.
We draw axes of length 3 (units will be in terms of the chess square size, since we calibrated based
on that size). So our X axis is drawn from (0,0,0) to (3,0,0), and likewise for the Y axis. The Z
axis is drawn from (0,0,0) to (0,0,-3). The negative sign denotes that it is drawn towards the
camera.
@code{.py}
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
axis = np.float32([[3,0,0], [0,3,0], [0,0,-3]]).reshape(-1,3)
@endcode
Now, as usual, we load each image, search for the 7x6 grid and, if found, refine the corners with
sub-pixel accuracy. Then, to calculate the rotation and translation, we use the function
**cv2.solvePnPRansac()**. Once we have those transformation matrices, we use them to project our
**axis points** onto the image plane. In simple words, we find the points on the image plane
corresponding to each of (3,0,0), (0,3,0), (0,0,-3) in 3D space. Once we get them, we draw lines
from the first corner to each of these points using our draw() function. Done !!!
@code{.py}
for fname in glob.glob('left*.jpg'):
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    ret, corners = cv2.findChessboardCorners(gray, (7,6),None)

    if ret == True:
        corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)

        # Find the rotation and translation vectors.
        rvecs, tvecs, inliers = cv2.solvePnPRansac(objp, corners2, mtx, dist)

        # project 3D points to image plane
        imgpts, jac = cv2.projectPoints(axis, rvecs, tvecs, mtx, dist)

        img = draw(img,corners2,imgpts)
        cv2.imshow('img',img)
        k = cv2.waitKey(0) & 0xff
        if k == ord('s'):
            cv2.imwrite(fname[:6]+'.png', img)

cv2.destroyAllWindows()
@endcode
See some results below. Notice that each axis is 3 squares long:
![image](images/pose_1.jpg)
### Render a Cube
If you want to draw a cube, modify the draw() function and axis points as follows.
Modified draw() function:
@code{.py}
def draw(img, corners, imgpts):
    imgpts = np.int32(imgpts).reshape(-1,2)
    # draw ground floor in green
    img = cv2.drawContours(img, [imgpts[:4]],-1,(0,255,0),-3)
    # draw pillars in blue color
    for i,j in zip(range(4),range(4,8)):
        img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]),(255),3)
    # draw top layer in red color
    img = cv2.drawContours(img, [imgpts[4:]],-1,(0,0,255),3)
    return img
@endcode
Modified axis points. They are the 8 corners of a cube in 3D space:
@code{.py}
axis = np.float32([[0,0,0], [0,3,0], [3,3,0], [3,0,0],
[0,0,-3],[0,3,-3],[3,3,-3],[3,0,-3] ])
@endcode
And look at the result below:
![image](images/pose_2.jpg)
If you are interested in graphics, augmented reality etc, you can use OpenGL to render more
complicated figures.
Additional Resources
--------------------
Exercises
---------
Camera Calibration and 3D Reconstruction {#tutorial_py_table_of_contents_calib3d}
========================================
- @subpage tutorial_py_calibration
Let's find out how good our camera is. Is there any distortion in images taken with it? If so, how
to correct it?
- @subpage tutorial_py_pose
This is a small section which will help you create some cool 3D effects with the calib3d module.
- @subpage tutorial_py_epipolar_geometry
Let's understand
epipolar geometry and epipolar constraint.
- @subpage tutorial_py_depthmap
Extract depth
information from 2D images.
Basic Operations on Images {#tutorial_py_basic_ops}
==========================
Goal
----
Learn to:
- Access pixel values and modify them
- Access image properties
- Set a Region of Interest (ROI)
- Split and merge images
Almost all the operations in this section are related to Numpy rather than OpenCV. A good knowledge
of Numpy is required to write better optimized code with OpenCV.
*( Examples will be shown in a Python terminal, since most of them are just single lines of code )*
Accessing and Modifying pixel values
------------------------------------
Let's load a color image first:
@code{.py}
import cv2
import numpy as np
img = cv2.imread('messi5.jpg')
@endcode
You can access a pixel value by its row and column coordinates. For a BGR image, it returns an
array of Blue, Green, Red values. For a grayscale image, just the corresponding intensity is
returned.
@code{.py}
px = img[100,100]
print px
[157 166 200]
# accessing only blue pixel
blue = img[100,100,0]
print blue
157
@endcode
You can modify the pixel values the same way.
@code{.py}
img[100,100] = [255,255,255]
print img[100,100]
[255 255 255]
@endcode
**warning**

Numpy is an optimized library for fast array calculations. Simply accessing each and every pixel
value and modifying it will be very slow and is discouraged.
@note The above method is normally used for selecting a region of an array, say the first 5 rows
and the last 3 columns. For individual pixel access, the Numpy array methods array.item() and
array.itemset() are considered better. But they always return a scalar, so if you want to access
all the B, G, R values, you will need to call array.item() separately for each. Better pixel
accessing and editing method:
@code{.py}
# accessing RED value
img.item(10,10,2)
59
# modifying RED value
img.itemset((10,10,2),100)
img.item(10,10,2)
100
@endcode
Accessing Image Properties
--------------------------
Image properties include the number of rows, columns and channels, the type of image data, the
number of pixels etc.
The shape of an image is accessed by img.shape. It returns a tuple of the number of rows, columns
and channels (if the image is color):
@code{.py}
print img.shape
(342, 548, 3)
@endcode
@note If the image is grayscale, the returned tuple contains only the number of rows and columns,
so it is a good method to check whether the loaded image is grayscale or color. The total number of
pixels is accessed by \`img.size\`:
@code{.py}
print img.size
562248
@endcode
Image datatype is obtained by \`img.dtype\`:
@code{.py}
print img.dtype
uint8
@endcode
@note img.dtype is very important while debugging, because a large number of errors in
OpenCV-Python code are caused by invalid datatypes.

Image ROI
---------

Sometimes, you will have to work with a certain region of an image. For eye detection in images,
first face detection is done over the whole image, and when a face is obtained, we select the face
region alone and search for eyes inside it instead of searching the whole image. It improves
accuracy (because eyes are always on faces :D ) and performance (because we search in a small area).
The ROI is again obtained using Numpy indexing. Here I am selecting the ball and copying it to
another region in the image:
@code{.py}
ball = img[280:340, 330:390]
img[273:333, 100:160] = ball
@endcode
Check the results below:
![image](images/roi.jpg)
Splitting and Merging Image Channels
------------------------------------
Sometimes you will need to work separately on the B, G, R channels of an image. In this case, you
need to split the BGR image into single channels. At other times, you may need to join these
individual channels into a BGR image. You can do it simply by:
@code{.py}
b,g,r = cv2.split(img)
img = cv2.merge((b,g,r))
@endcode
Or you can simply use Numpy slicing, e.g. \`b = img[:,:,0]\` for the blue channel.
Suppose you want to set all the red pixels to zero. You do not need to split the channels first;
Numpy indexing is faster:
@code{.py}
img[:,:,2] = 0
@endcode
**warning**
cv2.split() is a costly operation (in terms of time). So use it only if necessary. Otherwise go for
Numpy indexing.
Making Borders for Images (Padding)
-----------------------------------
If you want to create a border around an image, something like a photo frame, you can use the
**cv2.copyMakeBorder()** function. But it has more applications, e.g. for convolution operations,
zero padding etc. This function takes the following arguments:
- **src** - input image
- **top**, **bottom**, **left**, **right** - border width in number of pixels in corresponding
directions
- **borderType** - Flag defining what kind of border to be added. It can be of the following types:
- **cv2.BORDER_CONSTANT** - Adds a constant colored border. The value should be given
as next argument.
- **cv2.BORDER_REFLECT** - Border will be mirror reflection of the border elements,
like this : *fedcba|abcdefgh|hgfedcb*
- **cv2.BORDER_REFLECT_101** or **cv2.BORDER_DEFAULT** - Same as above, but with a
slight change, like this : *gfedcb|abcdefgh|gfedcba*
- **cv2.BORDER_REPLICATE** - Last element is replicated throughout, like this:
*aaaaaa|abcdefgh|hhhhhhh*
- **cv2.BORDER_WRAP** - Can't explain, it will look like this :
*cdefgh|abcdefgh|abcdefg*
- **value** - Color of border if border type is cv2.BORDER_CONSTANT
Below is a sample code demonstrating all these border types for better understanding:
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
BLUE = [255,0,0]
img1 = cv2.imread('opencv_logo.png')
replicate = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_REPLICATE)
reflect = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_REFLECT)
reflect101 = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_REFLECT_101)
wrap = cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_WRAP)
constant= cv2.copyMakeBorder(img1,10,10,10,10,cv2.BORDER_CONSTANT,value=BLUE)
plt.subplot(231),plt.imshow(img1,'gray'),plt.title('ORIGINAL')
plt.subplot(232),plt.imshow(replicate,'gray'),plt.title('REPLICATE')
plt.subplot(233),plt.imshow(reflect,'gray'),plt.title('REFLECT')
plt.subplot(234),plt.imshow(reflect101,'gray'),plt.title('REFLECT_101')
plt.subplot(235),plt.imshow(wrap,'gray'),plt.title('WRAP')
plt.subplot(236),plt.imshow(constant,'gray'),plt.title('CONSTANT')
plt.show()
@endcode
See the result below. (Image is displayed with matplotlib. So RED and BLUE planes will be
interchanged):
![image](images/border.jpg)
Additional Resources
--------------------
Exercises
---------
Arithmetic Operations on Images {#tutorial_py_image_arithmetics}
===============================
Goal
----
- Learn several arithmetic operations on images like addition, subtraction, bitwise operations
etc.
- You will learn these functions : **cv2.add()**, **cv2.addWeighted()** etc.
Image Addition
--------------
You can add two images with the OpenCV function cv2.add(), or simply by the numpy operation
res = img1 + img2. Both images should be of the same depth and type, or the second image can just be
a scalar value.
@note There is a difference between OpenCV addition and Numpy addition. OpenCV addition is a
saturated operation while Numpy addition is a modulo operation. For example, consider the sample
below:
@code{.py}
x = np.uint8([250])
y = np.uint8([10])
print cv2.add(x,y) # 250+10 = 260 => 255
[[255]]
print x+y # 250+10 = 260 % 256 = 4
[4]
@endcode
This will be more visible when you add two images. The OpenCV function will provide a better
result, so it is better to always stick to OpenCV functions.
Image Blending
--------------
This is also image addition, but different weights are given to images so that it gives a feeling of
blending or transparency. Images are added as per the equation below:
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
By varying \f$\alpha\f$ from \f$0 \rightarrow 1\f$, you can perform a cool transition between one image to
another.
Here I took two images to blend together. The first image is given a weight of 0.7 and the second
image is given 0.3. cv2.addWeighted() applies the following equation to the images:
\f[dst = \alpha \cdot img1 + \beta \cdot img2 + \gamma\f]
Here \f$\gamma\f$ is taken as zero.
@code{.py}
img1 = cv2.imread('ml.png')
img2 = cv2.imread('opencv_logo.jpg')
dst = cv2.addWeighted(img1,0.7,img2,0.3,0)
cv2.imshow('dst',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode
Check the result below:
![image](images/blending.jpg)
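Building on the same idea, \f$\alpha\f$ can be varied in a loop to produce a smooth cross-fade from
one image to the other (a small sketch; both images must have the same size and type):
@code{.py}
import cv2
import numpy as np

img1 = cv2.imread('ml.png')           # same images as above
img2 = cv2.imread('opencv_logo.jpg')  # must have the same size and type

for alpha in np.linspace(0, 1, 25):   # 25 intermediate frames
    dst = cv2.addWeighted(img1, 1 - alpha, img2, alpha, 0)
    cv2.imshow('dst', dst)
    cv2.waitKey(50)
cv2.destroyAllWindows()
@endcode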
Bitwise Operations
------------------
This includes the bitwise AND, OR, NOT and XOR operations. They are highly useful while extracting
any part of an image (as we will see in coming chapters), defining and working with non-rectangular
ROIs etc. Below we will see an example of how to change a particular region of an image.
I want to put the OpenCV logo above an image. If I add two images, the colors will change. If I
blend them, I get a transparent effect. But I want it to be opaque. If it were a rectangular region,
I could use the ROI as we did in the last chapter. But the OpenCV logo is not a rectangular shape.
So you can do it with bitwise operations as below:
@code{.py}
# Load two images
img1 = cv2.imread('messi5.jpg')
img2 = cv2.imread('opencv_logo.png')
# I want to put logo on top-left corner, So I create a ROI
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols ]
# Now create a mask of logo and create its inverse mask also
img2gray = cv2.cvtColor(img2,cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(img2gray, 10, 255, cv2.THRESH_BINARY)
mask_inv = cv2.bitwise_not(mask)
# Now black-out the area of logo in ROI
img1_bg = cv2.bitwise_and(roi,roi,mask = mask_inv)
# Take only region of logo from logo image.
img2_fg = cv2.bitwise_and(img2,img2,mask = mask)
# Put logo in ROI and modify the main image
dst = cv2.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst
cv2.imshow('res',img1)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode
See the result below. The left image shows the mask we created. The right image shows the final
result. For more understanding, display all the intermediate images in the above code, especially
img1_bg and img2_fg.
![image](images/overlay.jpg)
Additional Resources
--------------------
Exercises
---------
-# Create a slide show of images in a folder with smooth transitions between images using the
cv2.addWeighted function.
Performance Measurement and Improvement Techniques {#tutorial_py_optimization}
==================================================
Goal
----
In image processing, since you are dealing with a large number of operations per second, it is
mandatory that your code not only provides the correct solution, but does so in the fastest manner.
So in this chapter, you will learn
- To measure the performance of your code.
- Some tips to improve the performance of your code.
- You will see these functions : **cv2.getTickCount**, **cv2.getTickFrequency** etc.
Apart from OpenCV, Python also provides a module, **time**, which is helpful in measuring the time
of execution. Another module, **profile**, helps to get a detailed report on the code, like how much
time each function in the code took, how many times the function was called etc. But if you are
using IPython, all these features are integrated in a user-friendly manner. We will see some
important ones; for more details, check the links in the **Additional Resources** section.
Measuring Performance with OpenCV
---------------------------------
The **cv2.getTickCount** function returns the number of clock cycles from a reference event (like
the moment the machine was switched on) to the moment this function is called. So if you call it
before and after a function's execution, you get the number of clock cycles used to execute the
function.
The **cv2.getTickFrequency** function returns the frequency of clock cycles, or the number of clock
cycles per second. So to find the time of execution in seconds, you can do the following:
@code{.py}
e1 = cv2.getTickCount()
# your code execution
e2 = cv2.getTickCount()
time = (e2 - e1)/ cv2.getTickFrequency()
@endcode
We will demonstrate with the following example, which applies median filtering with kernels of odd
sizes ranging from 5 to 49. (Don't worry about what the result will look like, that is not our
goal):
@code{.py}
img1 = cv2.imread('messi5.jpg')
e1 = cv2.getTickCount()
for i in xrange(5,49,2):
    img1 = cv2.medianBlur(img1,i)
e2 = cv2.getTickCount()
t = (e2 - e1)/cv2.getTickFrequency()
print t
# Result I got is 0.521107655 seconds
@endcode
@note You can do the same with the time module. Instead of cv2.getTickCount, use the time.time()
function, then take the difference of the two times.
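For example, a direct equivalent of the snippet above using the time module:
@code{.py}
import time

e1 = time.time()
# your code execution
e2 = time.time()
t = e2 - e1   # elapsed time in seconds
@endcode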
Default Optimization in OpenCV
------------------------------
Many of the OpenCV functions are optimized using SSE2, AVX etc. OpenCV also contains unoptimized
code. So if our system supports these features, we should exploit them (almost all modern processors
support them). Optimization is enabled by default while compiling, so OpenCV runs the optimized code
if it is enabled, else it runs the unoptimized code. You can use **cv2.useOptimized()** to check if
it is enabled/disabled and **cv2.setUseOptimized()** to enable/disable it. Let's see a simple
example.
@code{.py}
# check if optimization is enabled
In [5]: cv2.useOptimized()
Out[5]: True
In [6]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop
# Disable it
In [7]: cv2.setUseOptimized(False)
In [8]: cv2.useOptimized()
Out[8]: False
In [9]: %timeit res = cv2.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop
@endcode
See, optimized median filtering is \~2x faster than the unoptimized version. If you check its
source, you can see that median filtering is SIMD optimized. So you can use this to enable
optimization at the top of your code (remember it is enabled by default).
Measuring Performance in IPython
--------------------------------
Sometimes you may need to compare the performance of two similar operations. IPython gives you a
magic command, %timeit, to perform this. It runs the code several times to get more accurate
results. Once again, it is suitable for measuring single lines of code.
For example, do you know which of the following addition operations is faster: x = 5; y = x\*\*2,
x = 5; y = x\*x, x = np.uint8([5]); y = x\*x or y = np.square(x)? We will find out with %timeit in
the IPython shell.
@code{.py}
In [10]: x = 5
In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop
In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop
In [15]: z = np.uint8([5])
In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop
In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop
@endcode
You can see that x = 5; y = x\*x is the fastest, and it is around 20x faster compared to Numpy. If
you consider the array creation also, it may reach up to 100x faster. Cool, right? *(Numpy devs are
working on this issue)*
@note Python scalar operations are faster than Numpy scalar operations. So for operations involving
only one or two elements, Python scalars are better than Numpy arrays. Numpy has the advantage when
the size of the array is a little bit bigger. We will try one more example. This time, we will
compare the performance of **cv2.countNonZero()** and **np.count_nonzero()** for the same image.
@code{.py}
In [35]: %timeit z = cv2.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop
In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 370 us per loop
@endcode
See, the OpenCV function is nearly 25x faster than the Numpy function.
@note Normally, OpenCV functions are faster than Numpy functions. So for the same operation, OpenCV
functions are preferred. But there can be exceptions, especially when Numpy works with views instead
of copies.
More IPython magic commands
---------------------------
There are several other magic commands for measuring performance, profiling, line profiling, memory
measurement etc. They are all well documented, so only links to those docs are provided here.
Interested readers are recommended to try them out.
Performance Optimization Techniques
-----------------------------------
There are several techniques and coding methods to exploit the maximum performance of Python and
Numpy. Only the relevant ones are noted here, with links to the important sources. The main thing to
be noted here is: first try to implement the algorithm in a simple manner. Once it is working,
profile it, find the bottlenecks and optimize them.
-# Avoid using loops in Python as far as possible, especially double/triple loops etc. They are
inherently slow.
2. Vectorize the algorithm/code to the maximum possible extent because Numpy and OpenCV are
optimized for vector operations.
3. Exploit the cache coherence.
4. Never make copies of an array unless it is needed. Try to use views instead. Array copying is a
costly operation.
Even after doing all these operations, if your code is still slow, or the use of large loops is
inevitable, use additional libraries like Cython to make it faster.
Additional Resources
--------------------
-# [Python Optimization Techniques](http://wiki.python.org/moin/PythonSpeed/PerformanceTips)
2. Scipy Lecture Notes - [Advanced
Numpy](http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy)
3. [Timing and Profiling in IPython](http://pynash.org/2013/03/06/timing-and-profiling.html)
Exercises
---------
Core Operations {#tutorial_py_table_of_contents_core}
===============
- @subpage tutorial_py_basic_ops
Learn to read and
edit pixel values, work with image ROIs and other basic operations.
- @subpage tutorial_py_image_arithmetics
Perform arithmetic
operations on images
- @subpage tutorial_py_optimization
Getting a solution is
important. But getting it in the fastest way is more important. Learn to check the speed of your
code, optimize the code etc.
BRIEF (Binary Robust Independent Elementary Features) {#tutorial_py_brief}
=====================================================
Goal
----
In this chapter
- We will see the basics of BRIEF algorithm
Theory
------
We know SIFT uses a 128-dim vector for descriptors. Since it uses floating point numbers, it takes
basically 512 bytes. Similarly, SURF also takes a minimum of 256 bytes (for 64-dim). Creating such a
vector for thousands of features takes a lot of memory, which is not feasible for
resource-constrained applications, especially for embedded systems. The larger the memory, the
longer the time it takes for matching.
But all these dimensions may not be needed for actual matching. We can compress them using several
methods like PCA, LDA etc. Even other methods, like hashing using LSH (Locality Sensitive Hashing),
are used to convert these SIFT descriptors from floating point numbers to binary strings. These
binary strings are used to match features using the Hamming distance. This provides a better
speed-up, because finding the Hamming distance is just applying XOR and a bit count, which are very
fast in modern CPUs with SSE instructions. But here we need to find the descriptors first, and only
then can we apply hashing, which doesn't solve our initial problem of memory.
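For instance, the Hamming distance between two binary descriptors stored as uint8 arrays can be
computed with an XOR followed by a bit count (an illustrative sketch with fake descriptors):
@code{.py}
import numpy as np

# two fake 256-bit binary descriptors stored as 32 uint8 values each
d1 = np.random.randint(0, 256, 32).astype(np.uint8)
d2 = np.random.randint(0, 256, 32).astype(np.uint8)

# XOR the descriptors and count the set bits -> Hamming distance
hamming = np.count_nonzero(np.unpackbits(np.bitwise_xor(d1, d2)))
print hamming
@endcode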
BRIEF comes into the picture at this moment. It provides a shortcut to find binary strings directly
without computing descriptors. It takes a smoothened image patch and selects a set of \f$n_d\f$ (x,y)
location pairs in a unique way (explained in the paper). Then some pixel intensity comparisons are
done on these location pairs. For example, let the first location pair be \f$p\f$ and \f$q\f$. If
\f$I(p) < I(q)\f$, then the result is 1, else it is 0. This is applied to all the \f$n_d\f$ location
pairs to get an \f$n_d\f$-dimensional bitstring.
This \f$n_d\f$ can be 128, 256 or 512. OpenCV supports all of these, but by default it is 256 (OpenCV
represents it in bytes, so the values will be 16, 32 and 64). Once you have this, you can use the
Hamming distance to match these descriptors.
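A toy sketch of the binary test described above (the location pairs and the keypoint are assumed to
be given; this only illustrates the idea and is not the OpenCV implementation):
@code{.py}
import cv2
import numpy as np

def brief_bits(img, keypoint_xy, pairs):
    """Compare intensities at n_d location pairs around a keypoint and build a bitstring.
    'pairs' is an assumed (n_d, 4) array of (dx1, dy1, dx2, dy2) offsets."""
    smoothed = cv2.GaussianBlur(img, (9, 9), 2)   # tests are done on a smoothed patch
    x, y = keypoint_xy
    bits = []
    for dx1, dy1, dx2, dy2 in pairs:
        p = smoothed[y + dy1, x + dx1]
        q = smoothed[y + dy2, x + dx2]
        bits.append(1 if p < q else 0)            # 1 if I(p) < I(q), else 0
    return np.array(bits, dtype=np.uint8)         # n_d-dimensional bitstring
@endcode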
One important point is that BRIEF is a feature descriptor; it doesn't provide any method to find
features. So you will have to use another feature detector like SIFT, SURF etc. The paper recommends
using CenSurE, which is a fast detector, and BRIEF even works slightly better for CenSurE points
than for SURF points.
In short, BRIEF is a faster method for feature descriptor calculation and matching. It also provides
a high recognition rate, unless there is large in-plane rotation.
BRIEF in OpenCV
---------------
The code below shows the computation of BRIEF descriptors with the help of the CenSurE detector.
(The CenSurE detector is called the STAR detector in OpenCV.)
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg',0)
# Initiate STAR detector
star = cv2.FeatureDetector_create("STAR")
# Initiate BRIEF extractor
brief = cv2.DescriptorExtractor_create("BRIEF")
# find the keypoints with STAR
kp = star.detect(img,None)
# compute the descriptors with BRIEF
kp, des = brief.compute(img, kp)
print brief.getInt('bytes')
print des.shape
@endcode
The function brief.getInt('bytes') gives the \f$n_d\f$ size used, in bytes. By default it is 32. Next
comes matching, which will be done in another chapter.
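For reference, such binary descriptors are usually matched with a brute-force matcher using the
Hamming distance; as a small preview of that chapter (des1 and des2 are assumed to be BRIEF
descriptors computed as above for two images):
@code{.py}
# des1, des2: BRIEF descriptors of two images, computed as shown above
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key = lambda m: m.distance)   # best matches first
@endcode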
Additional Resources
--------------------
-# Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua, "BRIEF: Binary Robust
Independent Elementary Features", 11th European Conference on Computer Vision (ECCV), Heraklion,
Crete. LNCS Springer, September 2010.
2. LSH (Locality Sensitive Hashing) at Wikipedia.
FAST Algorithm for Corner Detection {#tutorial_py_fast}
===================================
Goal
----
In this chapter,
- We will understand the basics of FAST algorithm
- We will find corners using OpenCV functionalities for FAST algorithm.
Theory
------
We saw several feature detectors, and many of them are really good. But when looking from a
real-time application point of view, they are not fast enough. One good example is a SLAM
(Simultaneous Localization and Mapping) mobile robot, which has limited computational resources.
As a solution to this, the FAST (Features from Accelerated Segment Test) algorithm was proposed by
Edward Rosten and Tom Drummond in their paper "Machine learning for high-speed corner detection" in
2006 (later revised in 2010). A basic summary of the algorithm is presented below. Refer to the
original paper for more details (all the images are taken from the original paper).
### Feature Detection using FAST
-# Select a pixel \f$p\f$ in the image which is to be identified as an interest point or not. Let its
intensity be \f$I_p\f$.
2. Select an appropriate threshold value \f$t\f$.
3. Consider a circle of 16 pixels around the pixel under test. (See the image below)
![image](images/fast_speedtest.jpg)
-# Now the pixel \f$p\f$ is a corner if there exists a set of \f$n\f$ contiguous pixels in the circle (of
16 pixels) which are all brighter than \f$I_p + t\f$, or all darker than \f$I_p − t\f$ (shown as white
dashed lines in the above image). \f$n\f$ was chosen to be 12.
5. A **high-speed test** was proposed to exclude a large number of non-corners. This test examines
only the four pixels at 1, 9, 5 and 13 (first, 1 and 9 are tested to check whether they are too
bright or too dark; if so, 5 and 13 are checked). If \f$p\f$ is a corner, then at least three of these
must all be brighter than \f$I_p + t\f$ or darker than \f$I_p − t\f$. If neither of these is the case, then
\f$p\f$ cannot be a corner. The full segment test criterion can then be applied to the passed
candidates by examining all pixels in the circle (a small sketch of the quick four-pixel test is
given after this list). This detector in itself exhibits high performance, but there are several
weaknesses:
- It does not reject as many candidates for n \< 12.
- The choice of pixels is not optimal, because its efficiency depends on the ordering of the
questions and the distribution of corner appearances.
- Results of high-speed tests are thrown away.
- Multiple features are detected adjacent to one another.
The first 3 points are addressed with a machine learning approach. The last one is addressed using
non-maximal suppression.
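Here is the small sketch of the high-speed test mentioned above (a toy version with an assumed
circle radius of 3; the real implementation is more involved):
@code{.py}
import numpy as np

def high_speed_test(img, x, y, t):
    """Quick rejection test on the 4 compass pixels (1, 5, 9 and 13 on the circle)."""
    Ip = int(img[y, x])
    ring = [int(img[y - 3, x]),   # pixel 1  (top)
            int(img[y, x + 3]),   # pixel 5  (right)
            int(img[y + 3, x]),   # pixel 9  (bottom)
            int(img[y, x - 3])]   # pixel 13 (left)
    brighter = sum(p > Ip + t for p in ring)
    darker   = sum(p < Ip - t for p in ring)
    # p can only be a corner if at least 3 of these 4 are all brighter or all darker
    return brighter >= 3 or darker >= 3
@endcode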
### Machine Learning a Corner Detector
-# Select a set of images for training (preferably from the target application domain)
2. Run the FAST algorithm on every image to find feature points.
3. For every feature point, store the 16 pixels around it as a vector. Do this for all the images to
get the feature vector \f$P\f$.
4. Each pixel (say \f$x\f$) in these 16 pixels can have one of the following three states:
![image](images/fast_eqns.jpg)
-# Depending on these states, the feature vector \f$P\f$ is subdivided into 3 subsets, \f$P_d\f$, \f$P_s\f$,
\f$P_b\f$.
6. Define a new boolean variable, \f$K_p\f$, which is true if \f$p\f$ is a corner and false otherwise.
7. Use the ID3 algorithm (decision tree classifier) to query each subset using the variable \f$K_p\f$
for the knowledge about the true class. It selects the \f$x\f$ which yields the most information
about whether the candidate pixel is a corner, measured by the entropy of \f$K_p\f$.
8. This is recursively applied to all the subsets until its entropy is zero.
9. The decision tree so created is used for fast detection in other images.
### Non-maximal Suppression
Detecting multiple interest points in adjacent locations is another problem. It is solved by using
Non-maximum Suppression.
-# Compute a score function, \f$V\f$, for all the detected feature points. \f$V\f$ is the sum of the
absolute differences between \f$p\f$ and the 16 surrounding pixel values.
2. Consider two adjacent keypoints and compute their \f$V\f$ values.
3. Discard the one with the lower \f$V\f$ value.
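A minimal sketch of this score, mirroring the definition above (the 16 circle offsets are assumed to
be precomputed; this is not the exact OpenCV code):
@code{.py}
def score_V(img, x, y, circle_offsets):
    """Sum of absolute differences between p and its 16 surrounding circle pixels."""
    Ip = int(img[y, x])
    return sum(abs(int(img[y + dy, x + dx]) - Ip) for (dx, dy) in circle_offsets)

# among adjacent candidate keypoints, keep the one with the larger V
@endcode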
### Summary
It is several times faster than other existing corner detectors.
But it is not robust to high levels of noise. It is dependent on a threshold.
FAST Feature Detector in OpenCV
-------------------------------
It is called like any other feature detector in OpenCV. If you want, you can specify the threshold,
whether non-maximum suppression is to be applied or not, the neighborhood to be used etc.
For the neighborhood, three flags are defined: cv2.FAST_FEATURE_DETECTOR_TYPE_5_8,
cv2.FAST_FEATURE_DETECTOR_TYPE_7_12 and cv2.FAST_FEATURE_DETECTOR_TYPE_9_16. Below is a simple code
on how to detect and draw the FAST feature points.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg',0)
# Initiate FAST object with default values
fast = cv2.FastFeatureDetector()
# find and draw the keypoints
kp = fast.detect(img,None)
img2 = cv2.drawKeypoints(img, kp, color=(255,0,0))
# Print all default params
print "Threshold: ", fast.getInt('threshold')
print "nonmaxSuppression: ", fast.getBool('nonmaxSuppression')
print "neighborhood: ", fast.getInt('type')
print "Total Keypoints with nonmaxSuppression: ", len(kp)
cv2.imwrite('fast_true.png',img2)
# Disable nonmaxSuppression
fast.setBool('nonmaxSuppression',0)
kp = fast.detect(img,None)
print "Total Keypoints without nonmaxSuppression: ", len(kp)
img3 = cv2.drawKeypoints(img, kp, color=(255,0,0))
cv2.imwrite('fast_false.png',img3)
@endcode
See the results. The first image shows FAST with nonmaxSuppression and the second one without
nonmaxSuppression:
![image](images/fast_kp.jpg)
Additional Resources
--------------------
-# Edward Rosten and Tom Drummond, “Machine learning for high speed corner detection” in 9th
European Conference on Computer Vision, vol. 1, 2006, pp. 430–443.
2. Edward Rosten, Reid Porter, and Tom Drummond, "Faster and better: a machine learning approach to
corner detection" in IEEE Trans. Pattern Analysis and Machine Intelligence, 2010, vol 32, pp.
105-119.
Exercises
---------
Feature Matching + Homography to find Objects {#tutorial_py_feature_homography}
=============================================
Goal
----
In this chapter,
- We will mix feature matching with findHomography from the calib3d module to find known objects
in a complex image.
Basics
------
So what did we do in the last session? We used a queryImage, found some feature points in it, then
we took another trainImage, found the features in that image too, and we found the best matches
among them. In short, we found the locations of some parts of an object in another cluttered image.
This information is sufficient to find the object exactly in the trainImage.
For that, we can use a function from the calib3d module, **cv2.findHomography()**. If we pass the
set of points from both images, it will find the perspective transformation of that object. Then we
can use **cv2.perspectiveTransform()** to find the object. It needs at least four correct points to
find the transformation.
We have seen that there can be some possible errors while matching which may affect the result. To
solve this problem, the algorithm uses RANSAC or LEAST_MEDIAN (which can be decided by the flags).
Good matches which provide a correct estimation are called inliers, and the remaining ones are
called outliers. **cv2.findHomography()** returns a mask which specifies the inlier and outlier
points.
So let's do it !!!
Code
----
First, as usual, let's find SIFT features in images and apply the ratio test to find the best
matches.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
MIN_MATCH_COUNT = 10
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1,des2,k=2)
# store all the good matches as per Lowe's ratio test.
good = []
for m,n in matches:
    if m.distance < 0.7*n.distance:
        good.append(m)
@endcode
Now we set a condition that at least 10 matches (defined by MIN_MATCH_COUNT) have to be there to
find the object. Otherwise, we simply show a message saying that not enough matches are present.
If enough matches are found, we extract the locations of the matched keypoints in both images. They
are passed to find the perspective transformation. Once we get this 3x3 transformation matrix, we
use it to transform the corners of the queryImage to the corresponding points in the trainImage.
Then we draw it.
@code{.py}
if len(good)>MIN_MATCH_COUNT:
    src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
    dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)

    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC,5.0)
    matchesMask = mask.ravel().tolist()

    h,w = img1.shape
    pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
    dst = cv2.perspectiveTransform(pts,M)

    img2 = cv2.polylines(img2,[np.int32(dst)],True,255,3, cv2.LINE_AA)

else:
    print "Not enough matches are found - %d/%d" % (len(good),MIN_MATCH_COUNT)
    matchesMask = None
@endcode
Finally we draw our inliers (if the object was found successfully) or the matching keypoints (if
not).
@code{.py}
draw_params = dict(matchColor = (0,255,0), # draw matches in green color
singlePointColor = None,
matchesMask = matchesMask, # draw only inliers
flags = 2)
img3 = cv2.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)
plt.imshow(img3, 'gray'),plt.show()
@endcode
See the result below. The object is marked in white in the cluttered image:
![image](images/homography_findobj.jpg)
Additional Resources
--------------------
Exercises
---------
Harris Corner Detection {#tutorial_py_features_harris}
=======================
Goal
----
In this chapter,
- We will understand the concepts behind Harris Corner Detection.
- We will see the functions: **cv2.cornerHarris()**, **cv2.cornerSubPix()**
Theory
------
In the last chapter, we saw that corners are regions in the image with large variation in intensity
in all directions. One early attempt to find these corners was made by **Chris Harris & Mike
Stephens** in their paper **A Combined Corner and Edge Detector** in 1988, so it is now called the
Harris Corner Detector. They took this simple idea to a mathematical form. It basically finds the
difference in intensity for a displacement of \f$(u,v)\f$ in all directions. This is expressed as
below:
\f[E(u,v) = \sum_{x,y} \underbrace{w(x,y)}_\text{window function} \, [\underbrace{I(x+u,y+v)}_\text{shifted intensity}-\underbrace{I(x,y)}_\text{intensity}]^2\f]
The window function is either a rectangular window or a Gaussian window which gives weights to the
pixels underneath.
We have to maximize this function \f$E(u,v)\f$ for corner detection. That means we have to maximize
the second term. Applying a Taylor expansion to the above equation and using some mathematical steps
(please refer to any standard textbook for the full derivation), we get the final equation:
\f[E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}\f]
where
\f[M = \sum_{x,y} w(x,y) \begin{bmatrix}I_x I_x & I_x I_y \\
I_x I_y & I_y I_y \end{bmatrix}\f]
Here, \f$I_x\f$ and \f$I_y\f$ are the image derivatives in the x and y directions respectively. (They
can easily be found using **cv2.Sobel()**.)
Then comes the main part. After this, they created a score, basically an equation, which determines
whether a window can contain a corner or not:
\f[R = det(M) - k(trace(M))^2\f]
where
- \f$det(M) = \lambda_1 \lambda_2\f$
- \f$trace(M) = \lambda_1 + \lambda_2\f$
- \f$\lambda_1\f$ and \f$\lambda_2\f$ are the eigenvalues of M
So the values of these eigenvalues decide whether a region is a corner, an edge or flat:
- When \f$|R|\f$ is small, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are small, the region is
flat.
- When \f$R<0\f$, which happens when \f$\lambda_1 >> \lambda_2\f$ or vice versa, the region is an edge.
- When \f$R\f$ is large, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are large and
\f$\lambda_1 \sim \lambda_2\f$, the region is a corner.
It can be represented in a nice picture as follows:
![image](images/harris_region.jpg)
So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding it with a
suitable value gives you the corners in the image. We will do it with a simple image.
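To make the formulas concrete, here is a rough, hedged sketch of computing the response \f$R\f$ per pixel
from Sobel derivatives with a simple box window (the file name, window size and k value are illustrative
assumptions); in practice you would simply call **cv2.cornerHarris()** as shown in the next section.
@code{.py}
import cv2
import numpy as np
img = np.float32(cv2.imread('chessboard.jpg', 0))
# image derivatives I_x and I_y (3x3 Sobel kernels, matching ksize=3 below)
Ix = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
# entries of M, weighted over a small window w(x,y) (a 2x2 box window here)
Sxx = cv2.boxFilter(Ix*Ix, -1, (2,2))
Sxy = cv2.boxFilter(Ix*Iy, -1, (2,2))
Syy = cv2.boxFilter(Iy*Iy, -1, (2,2))
# R = det(M) - k*(trace(M))^2 with the commonly used k = 0.04
k = 0.04
R = (Sxx*Syy - Sxy*Sxy) - k*(Sxx + Syy)**2
@endcode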
Harris Corner Detector in OpenCV
--------------------------------
OpenCV has the function **cv2.cornerHarris()** for this purpose. Its arguments are :
- **img** - Input image, it should be grayscale and float32 type.
- **blockSize** - It is the size of neighbourhood considered for corner detection
- **ksize** - Aperture parameter of Sobel derivative used.
- **k** - Harris detector free parameter in the equation.
See the example below:
@code{.py}
import cv2
import numpy as np
filename = 'chessboard.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray,2,3,0.04)
#result is dilated for marking the corners, not important
dst = cv2.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]
cv2.imshow('dst',img)
if cv2.waitKey(0) & 0xff == 27:
cv2.destroyAllWindows()
@endcode
Below are the three results:
![image](images/harris_result.jpg)
Corner with SubPixel Accuracy
-----------------------------
Sometimes, you may need to find the corners with maximum accuracy. OpenCV comes with a function
**cv2.cornerSubPix()** which further refines the detected corners with sub-pixel accuracy. Below is
an example. As usual, we need to find the Harris corners first. Then we pass the centroids of these
corners (there may be a bunch of pixels at a corner; we take their centroid) to refine them. Harris
corners are marked in red pixels and refined corners are marked in green pixels. For this function,
we have to define the criteria for when to stop the iteration. We stop it after a specified number of
iterations or once a certain accuracy is achieved, whichever occurs first. We also need to define the
size of the neighbourhood in which it searches for corners.
@code{.py}
import cv2
import numpy as np
filename = 'chessboard2.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# find Harris corners
gray = np.float32(gray)
dst = cv2.cornerHarris(gray,2,3,0.04)
dst = cv2.dilate(dst,None)
ret, dst = cv2.threshold(dst,0.01*dst.max(),255,0)
dst = np.uint8(dst)
# find centroids
ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)
# define the criteria to stop and refine the corners
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv2.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)
# Now draw them
res = np.hstack((centroids,corners))
res = np.int0(res)
img[res[:,1],res[:,0]]=[0,0,255]
img[res[:,3],res[:,2]] = [0,255,0]
cv2.imwrite('subpixel5.png',img)
@endcode
Below is the result, where some important locations are shown in a zoomed window for visualization:
![image](images/subpixel3.png)
Additional Resources
--------------------
Exercises
---------
Understanding Features {#tutorial_py_features_meaning}
======================
Goal
----
In this chapter, we will just try to understand what features are, why they are important, why
corners are important etc.
Explanation
-----------
Most of you will have played jigsaw puzzle games. You get a lot of small pieces of an image, which
you need to assemble correctly to form a big real image. **The question is, how do you do it?** What
about projecting the same theory onto a computer program so that the computer can play jigsaw
puzzles? If the computer can play jigsaw puzzles, why can't we give the computer a lot of real-life
images of good natural scenery and tell it to stitch all those images into a single big image? And if
the computer can stitch several natural images into one, what about giving it a lot of pictures of a
building or any structure and telling it to create a 3D model out of them?
Well, the questions and imaginings continue. But it all depends on the most basic question: How do
you play jigsaw puzzles? How do you arrange lots of scrambled image pieces into a single big image?
How can you stitch a lot of natural images into a single image?
The answer is, we are looking for specific patterns or specific features which are unique, which can
be easily tracked and which can be easily compared. If we go for a definition of such a feature, we may
find it difficult to express it in words, but we know what they are. If someone asks you to point
out one good feature which can be compared across several images, you can point one out. That is
why even small children can simply play these games. We search for these features in an image, we
find them, we find the same features in other images and we align them. That's it. (In a jigsaw puzzle,
we look more for continuity between different pieces.) All these abilities are present in us inherently.
So our one basic question expands into several, but each becomes more specific. **What are these
features?** *(The answer should be understandable to a computer also.)*
Well, it is difficult to say how humans find these features. It is already programmed into our brains.
But if we look deep into some pictures and search for different patterns, we will find something
interesting. For example, take the image below:
![image](images/feature_building.jpg)
The image is very simple. At the top of the image, six small image patches are given. The question for
you is to find the exact location of these patches in the original image. How many correct results can
you find?
A and B are flat surfaces, and they are spread over a lot of area. It is difficult to find the exact
location of these patches.
C and D are much simpler. They are edges of the building. You can find an approximate location,
but the exact location is still difficult, because along the edge it is the same everywhere; normal
to the edge, it is different. So an edge is a much better feature compared to a flat area, but not good
enough (in a jigsaw puzzle it is good for comparing the continuity of edges).
Finally, E and F are corners of the building, and they can be found easily, because at a corner,
wherever you move the patch, it will look different. So they can be considered good features. Now
let's move to a simpler (and widely used) image for better understanding.
![image](images/feature_simple.png)
Just like above, the blue patch is a flat area and is difficult to find and track. Wherever you move the
blue patch, it looks the same. The black patch lies on an edge. If you move it in the vertical direction
(i.e. along the gradient) it changes; moved along the edge (parallel to the edge), it looks the same. And
the red patch is a corner. Wherever you move the patch, it looks different, which means it is unique. So
basically, corners are considered to be good features in an image. (Not just corners; in some cases
blobs are considered good features.)
So now we have answered our question, "what are these features?". But the next question arises: how do
we find them? Or how do we find the corners? We also answered that in an intuitive way, i.e., look for
the regions in the image which have maximum variation when moved (by a small amount) in all directions
around them. This will be projected into computer language in the coming chapters. Finding these image
features is called **Feature Detection**.
So we found the features in the image (assume you did it). Once you have found them, you should find
the same ones in the other images. How do we do that? We take a region around the feature and describe
it in our own words, like "the upper part is blue sky, the lower part is the building, on that building
there is some glass etc", and then search for the same area in the other images. Basically, you are
describing the feature. In a similar way, a computer should also describe the region around the feature
so that it can find it in other images. This description is called **Feature Description**. Once you
have the features and their descriptions, you can find the same features in all the images and align
them, stitch them or do whatever you want.
So in this module, we are looking at different algorithms in OpenCV to find features, describe them,
match them etc.
Additional Resources
--------------------
Exercises
---------
Feature Matching {#tutorial_py_matcher}
================
Goal
----
In this chapter
- We will see how to match features in one image with others.
- We will use the Brute-Force matcher and FLANN Matcher in OpenCV
Basics of Brute-Force Matcher
-----------------------------
The Brute-Force matcher is simple. It takes the descriptor of one feature in the first set and matches
it with all other features in the second set using some distance calculation. The closest one is
returned.
For the BF matcher, first we have to create the BFMatcher object using **cv2.BFMatcher()**. It takes two
optional params. The first one is normType. It specifies the distance measurement to be used. By
default, it is cv2.NORM_L2. It is good for SIFT, SURF etc (cv2.NORM_L1 is also there). For binary
string based descriptors like ORB, BRIEF, BRISK etc, cv2.NORM_HAMMING should be used, which uses
Hamming distance as the measurement. If ORB is using WTA_K == 3 or 4, cv2.NORM_HAMMING2 should be
used.
The second param is a boolean variable, crossCheck, which is false by default. If it is true, the
matcher returns only those matches with value (i,j) such that the i-th descriptor in set A has the j-th
descriptor in set B as the best match and vice-versa. That is, the two features in both sets should
match each other. It provides a consistent result, and is a good alternative to the ratio test proposed
by D. Lowe in the SIFT paper.
Once it is created, two important methods are *BFMatcher.match()* and *BFMatcher.knnMatch()*. The first
one returns the best match. The second method returns the k best matches, where k is specified by the
user. It may be useful when we need to do additional work on them.
Like we used cv2.drawKeypoints() to draw keypoints, **cv2.drawMatches()** helps us to draw the
matches. It stacks the two images horizontally and draws lines from the first image to the second image
showing the best matches. There is also **cv2.drawMatchesKnn** which draws all the k best matches. If
k=2, it will draw two match-lines for each keypoint. So we have to pass a mask if we want to draw them
selectively.
Let's see one example for each of ORB and SIFT (they use different distance measurements).
### Brute-Force Matching with ORB Descriptors
Here, we will see a simple example of how to match features between two images. In this case, I have
a queryImage and a trainImage. We will try to find the queryImage in the trainImage using feature
matching. (The images are /samples/c/box.png and /samples/c/box_in_scene.png)
We are using ORB descriptors to match features. So let's start with loading the images, finding the
descriptors etc.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate ORB detector
orb = cv2.ORB()
# find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)
@endcode
Next we create a BFMatcher object with distance measurement cv2.NORM_HAMMING (since we are using
ORB) and crossCheck switched on for better results. Then we use the Matcher.match() method to get the
best matches between the two images. We sort them in ascending order of their distances so that the best
matches (with low distance) come to the front. Then we draw only the first 10 matches (just for the sake
of visibility; you can increase it as you like).
@code{.py}
# create BFMatcher object
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# Match descriptors.
matches = bf.match(des1,des2)
# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)
# Draw first 10 matches.
img3 = cv2.drawMatches(img1,kp1,img2,kp2,matches[:10],None,flags=2)
plt.imshow(img3),plt.show()
@endcode
Below is the result I got:
![image](images/matcher_result1.jpg)
### What is this Matcher Object?
The result of the matches = bf.match(des1,des2) line is a list of DMatch objects. Each DMatch object
has the following attributes:
- DMatch.distance - Distance between descriptors. The lower, the better it is.
- DMatch.trainIdx - Index of the descriptor in train descriptors
- DMatch.queryIdx - Index of the descriptor in query descriptors
- DMatch.imgIdx - Index of the train image.
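As a quick, hedged illustration (assuming matches from the ORB example above has already been sorted
by distance), you can inspect these attributes directly:
@code{.py}
best = matches[0]
# distance between the two descriptors (lower is better)
print best.distance
# index of the matched descriptor in the query set (img1) and in the train set (img2)
print best.queryIdx, best.trainIdx
# which train image it belongs to (only relevant when matching against several train images)
print best.imgIdx
@endcode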
### Brute-Force Matching with SIFT Descriptors and Ratio Test
This time, we will use BFMatcher.knnMatch() to get k best matches. In this example, we will take k=2
so that we can apply ratio test explained by D.Lowe in his paper.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# BFMatcher with default params
bf = cv2.BFMatcher()
matches = bf.knnMatch(des1,des2, k=2)
# Apply ratio test
good = []
for m,n in matches:
if m.distance < 0.75*n.distance:
good.append([m])
# cv2.drawMatchesKnn expects list of lists as matches.
img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,good,None,flags=2)
plt.imshow(img3),plt.show()
@endcode
See the result below:
![image](images/matcher_result2.jpg)
FLANN based Matcher
-------------------
FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of
algorithms optimized for fast nearest neighbor search in large datasets and for high dimensional
features. It works faster than BFMatcher for large datasets. We will see the second example with the
FLANN based matcher.
For the FLANN based matcher, we need to pass two dictionaries which specify the algorithm to be used,
its related parameters etc. The first one is IndexParams. For various algorithms, the information to be
passed is explained in the FLANN docs. As a summary, for algorithms like SIFT, SURF etc. you can pass
the following:
@code{.py}
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
@endcode
While using ORB, you can pass the following. The commented values are recommended as per the docs,
but in some cases they didn't provide the required results; other values worked fine:
@code{.py}
index_params= dict(algorithm = FLANN_INDEX_LSH,
table_number = 6, # 12
key_size = 12, # 20
multi_probe_level = 1) #2
@endcode
The second dictionary is the SearchParams. It specifies the number of times the trees in the index
should be recursively traversed. Higher values give better precision, but also take more time. If
you want to change the value, pass search_params = dict(checks=100).
With this information, we are good to go.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img1 = cv2.imread('box.png',0) # queryImage
img2 = cv2.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv2.SIFT()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in xrange(len(matches))]
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
if m.distance < 0.7*n.distance:
matchesMask[i]=[1,0]
draw_params = dict(matchColor = (0,255,0),
singlePointColor = (255,0,0),
matchesMask = matchesMask,
flags = 0)
img3 = cv2.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)
plt.imshow(img3,),plt.show()
@endcode
See the result below:
![image](images/matcher_flann.jpg)
Additional Resources
--------------------
Exercises
---------
ORB (Oriented FAST and Rotated BRIEF) {#tutorial_py_orb}
=====================================
Goal
----
In this chapter,
- We will see the basics of ORB
Theory
------
As an OpenCV enthusiast, the most important thing about ORB is that it came from "OpenCV Labs".
This algorithm was brought up by Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary R. Bradski in
their paper **ORB: An efficient alternative to SIFT or SURF** in 2011. As the title says, it is a
good alternative to SIFT and SURF in computation cost, matching performance and, mainly, the patents.
Yes, SIFT and SURF are patented and you are supposed to pay for their use. But ORB is not!
ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor with many modifications
to enhance the performance. First it uses FAST to find keypoints, then applies the Harris corner measure
to find the top N points among them. It also uses a pyramid to produce multiscale features. But one
problem is that FAST doesn't compute the orientation. So what about rotation invariance? The authors
came up with the following modification.
It computes the intensity weighted centroid of the patch with the located corner at its center. The
direction of the vector from this corner point to the centroid gives the orientation. To improve the
rotation invariance, the moments are computed with x and y restricted to a circular region of
radius \f$r\f$, where \f$r\f$ is the size of the patch.
Now for descriptors, ORB uses BRIEF descriptors. But we have already seen that BRIEF performs poorly
with rotation. So what ORB does is to "steer" BRIEF according to the orientation of the keypoints. For
any feature set of \f$n\f$ binary tests at location \f$(x_i, y_i)\f$, define a \f$2 \times n\f$ matrix \f$S\f$
which contains the coordinates of these pixels. Then, using the orientation of the patch, \f$\theta\f$, its
rotation matrix is found and used to rotate \f$S\f$ to get the steered (rotated) version \f$S_\theta\f$.
ORB discretizes the angle in increments of \f$2 \pi /30\f$ (12 degrees), and constructs a lookup table of
precomputed BRIEF patterns. As long as the keypoint orientation \f$\theta\f$ is consistent across views,
the correct set of points \f$S_\theta\f$ will be used to compute its descriptor.
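A minimal numpy sketch of this steering step, purely for illustration (the test coordinates and the
angle below are made-up values; inside OpenCV this is done in C++ using the precomputed lookup table):
@code{.py}
import numpy as np
# S: 2 x n matrix of (x,y) coordinates of the binary test points (illustrative values)
S = np.array([[ 3, -2,  5,  0],
              [ 1,  4, -3, -6]], dtype=np.float32)
theta = np.deg2rad(24)   # patch orientation, snapped to a multiple of 12 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=np.float32)
# steered (rotated) test coordinates S_theta = R_theta * S
S_theta = R.dot(S)
print S_theta
@endcode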
BRIEF has an important property that each bit feature has a large variance and a mean near 0.5. But
once it is oriented along the keypoint direction, it loses this property and becomes more distributed.
High variance makes a feature more discriminative, since it responds differentially to inputs.
Another desirable property is to have the tests uncorrelated, since then each test will contribute
to the result. To resolve all these, ORB runs a greedy search among all possible binary tests to
find the ones that have both high variance and means close to 0.5, as well as being uncorrelated.
The result is called **rBRIEF**.
For descriptor matching, multi-probe LSH, which improves on the traditional LSH, is used. The paper
says ORB is much faster than SURF and SIFT, and the ORB descriptor works better than SURF. ORB is a good
choice on low-power devices for panorama stitching etc.
ORB in OpenCV
-------------
As usual, we have to create an ORB object with the function **cv2.ORB()** or using the feature2d common
interface. It has a number of optional parameters. The most useful ones are nFeatures, which denotes the
maximum number of features to be retained (by default 500), and scoreType, which denotes whether the
Harris score or the FAST score is used to rank the features (by default, the Harris score). Another
parameter, WTA_K, decides the number of points that produce each element of the oriented BRIEF
descriptor. By default it is two, i.e. two points are selected at a time. In that case, NORM_HAMMING
distance is used for matching. If WTA_K is 3 or 4, which takes 3 or 4 points to produce each BRIEF
descriptor element, then the matching distance is defined by NORM_HAMMING2.
Below is a simple code which shows the use of ORB.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg',0)
# Initiate ORB detector
orb = cv2.ORB()
# find the keypoints with ORB
kp = orb.detect(img,None)
# compute the descriptors with ORB
kp, des = orb.compute(img, kp)
# draw only keypoints location,not size and orientation
img2 = cv2.drawKeypoints(img,kp,color=(0,255,0), flags=0)
plt.imshow(img2),plt.show()
@endcode
See the result below:
![image](images/orb_kp.jpg)
We will do ORB feature matching in another chapter.
Additional Resources
--------------------
-# Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to
SIFT or SURF. ICCV 2011: 2564-2571.
Exercises
---------
Shi-Tomasi Corner Detector & Good Features to Track {#tutorial_py_shi_tomasi}
===================================================
Goal
----
In this chapter,
- We will learn about another corner detector: the Shi-Tomasi Corner Detector
- We will see the function: **cv2.goodFeaturesToTrack()**
Theory
------
In the last chapter, we saw the Harris Corner Detector. Later, in 1994, J. Shi and C. Tomasi made a
small modification to it in their paper **Good Features to Track**, which shows better results compared
to the Harris Corner Detector. The scoring function in the Harris Corner Detector was given by:
\f[R = \lambda_1 \lambda_2 - k(\lambda_1+\lambda_2)^2\f]
Instead of this, Shi-Tomasi proposed:
\f[R = min(\lambda_1, \lambda_2)\f]
If it is greater than a threshold value, it is considered as a corner. If we plot it in
\f$\lambda_1 - \lambda_2\f$ space as we did for the Harris Corner Detector, we get an image as below:
![image](images/shitomasi_space.png)
From the figure, you can see that only when \f$\lambda_1\f$ and \f$\lambda_2\f$ are above a minimum value,
\f$\lambda_{min}\f$, is it considered as a corner (green region).
Code
----
OpenCV has a function, **cv2.goodFeaturesToTrack()**. It finds the N strongest corners in the image by
the Shi-Tomasi method (or Harris Corner Detection, if you specify it). As usual, the image should be a
grayscale image. Then you specify the number of corners you want to find. Then you specify the quality
level, which is a value between 0 and 1 denoting the minimum quality of a corner below which every
corner is rejected. Then we provide the minimum euclidean distance between the detected corners.
With all this information, the function finds corners in the image. All corners below the quality
level are rejected. Then it sorts the remaining corners by quality in descending order. The function
then takes the strongest corner, throws away all the nearby corners within the minimum distance, and
returns the N strongest corners.
In the example below, we will try to find the 25 best corners:
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('simple.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
corners = cv2.goodFeaturesToTrack(gray,25,0.01,10)
corners = np.int0(corners)
for i in corners:
x,y = i.ravel()
cv2.circle(img,(x,y),3,255,-1)
plt.imshow(img),plt.show()
@endcode
See the result below:
![image](images/shitomasi_block1.jpg)
This function is more appropriate for tracking. We will see that when the time comes.
Additional Resources
--------------------
Exercises
---------
Introduction to SIFT (Scale-Invariant Feature Transform) {#tutorial_py_sift_intro}
========================================================
Goal
----
In this chapter,
- We will learn about the concepts of SIFT algorithm
- We will learn to find SIFT Keypoints and Descriptors.
Theory
------
In the last couple of chapters, we saw some corner detectors like Harris etc. They are
rotation-invariant, which means that even if the image is rotated, we can find the same corners. That is
obvious because corners remain corners in the rotated image also. But what about scaling? A corner may
not be a corner if the image is scaled. For example, check the simple image below. A corner in a small
image within a small window looks flat when it is zoomed in within the same window. So the Harris corner
is not scale invariant.
![image](images/sift_scale_invariant.jpg)
So, in 2004, **D. Lowe** of the University of British Columbia came up with a new algorithm, Scale
Invariant Feature Transform (SIFT), in his paper **Distinctive Image Features from Scale-Invariant
Keypoints**, which extracts keypoints and computes their descriptors. *(This paper is easy to understand
and considered to be the best material available on SIFT, so this explanation is just a short summary of
that paper.)*
There are mainly four steps involved in SIFT algorithm. We will see them one-by-one.
### 1. Scale-space Extrema Detection
From the image above, it is obvious that we can't use the same window to detect keypoints at
different scales. It is OK for a small corner, but to detect larger corners we need larger windows.
For this, scale-space filtering is used. In it, the Laplacian of Gaussian (LoG) is found for the image
with various \f$\sigma\f$ values. LoG acts as a blob detector which detects blobs of various sizes due to
the change in \f$\sigma\f$. In short, \f$\sigma\f$ acts as a scaling parameter. For example, in the above
image, a Gaussian kernel with low \f$\sigma\f$ gives a high value for the small corner while a Gaussian
kernel with high \f$\sigma\f$ fits well for the larger corner. So, we can find the local maxima across
scale and space, which gives us a list of \f$(x,y,\sigma)\f$ values meaning there is a potential keypoint
at (x,y) at scale \f$\sigma\f$.
But this LoG is a little costly, so the SIFT algorithm uses the Difference of Gaussians, which is an
approximation of LoG. The Difference of Gaussian is obtained as the difference of the Gaussian blurring
of an image with two different \f$\sigma\f$, let them be \f$\sigma\f$ and \f$k\sigma\f$. This process is done
for different octaves of the image in a Gaussian Pyramid. It is represented in the image below:
![image](images/sift_dog.jpg)
Once these DoG images are found, they are searched for local extrema over scale and space. For example,
one pixel in an image is compared with its 8 neighbours as well as 9 pixels in the next scale and 9
pixels in the previous scale. If it is a local extremum, it is a potential keypoint. It basically means
that the keypoint is best represented at that scale. This is shown in the image below:
![image](images/sift_local_extrema.jpg)
Regarding the different parameters, the paper gives some empirical data which can be summarized as:
number of octaves = 4, number of scale levels = 5, initial \f$\sigma=1.6\f$, \f$k=\sqrt{2}\f$ etc as optimal
values.
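As a rough, hedged sketch of the idea (not how OpenCV builds its internal pyramid), one octave of DoG
images can be approximated with **cv2.GaussianBlur()** using the parameters above; the file name is an
illustrative assumption:
@code{.py}
import cv2
import numpy as np
img = np.float32(cv2.imread('home.jpg', 0))
sigma, k = 1.6, np.sqrt(2)
# Gaussian blurred versions with sigma, k*sigma, k^2*sigma, ... (kernel size derived from sigma)
blurred = [cv2.GaussianBlur(img, (0,0), sigma * k**i) for i in range(5)]
# DoG = difference of adjacent blur levels
dog = [blurred[i+1] - blurred[i] for i in range(4)]
@endcode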
### 2. Keypoint Localization
Once potential keypoint locations are found, they have to be refined to get more accurate results.
They used a Taylor series expansion of the scale space to get a more accurate location of the extrema,
and if the intensity at an extremum is less than a threshold value (0.03 as per the paper), it is
rejected. This threshold is called **contrastThreshold** in OpenCV.
DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to
the Harris corner detector is used. They used a 2x2 Hessian matrix (H) to compute the principal
curvatures. We know from the Harris corner detector that for edges, one eigenvalue is larger than the
other. So here they used a simple function that depends only on the ratio of the eigenvalues (it can be
computed from the trace and determinant of H). If this ratio is greater than a threshold, called
**edgeThreshold** in OpenCV, that keypoint is discarded. It is given as 10 in the paper.
So it eliminates any low-contrast keypoints and edge keypoints, and what remains are strong interest
points.
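Both thresholds can be tuned when constructing the SIFT object; a minimal sketch, assuming the keyword
names used by the cv2 binding match the parameter names above (verify against your OpenCV version):
@code{.py}
import cv2
# stricter contrast filtering, default edge rejection (keyword names assumed)
sift = cv2.SIFT(contrastThreshold=0.04, edgeThreshold=10)
@endcode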
### 3. Orientation Assignment
Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A
neighbourhood is taken around the keypoint location depending on the scale, and the gradient
magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering
360 degrees is created (it is weighted by the gradient magnitude and by a Gaussian-weighted circular
window with \f$\sigma\f$ equal to 1.5 times the scale of the keypoint). The highest peak in the histogram
is taken, and any peak above 80% of it is also considered to calculate the orientation. This creates
keypoints with the same location and scale, but different directions. It contributes to the stability
of matching.
### 4. Keypoint Descriptor
Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is
divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created.
So a total of 128 bin values are available. They are represented as a vector to form the keypoint
descriptor. In addition to this, several measures are taken to achieve robustness against
illumination changes, rotation etc.
### 5. Keypoint Matching
Keypoints between two images are matched by identifying their nearest neighbours. But in some cases,
the second closest match may be very near to the first. This may happen due to noise or some other
reason. In that case, the ratio of the closest distance to the second-closest distance is taken. If it
is greater than 0.8, the match is rejected. This eliminates around 90% of false matches while discarding
only 5% of correct matches, as per the paper.
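The check itself is tiny; a minimal sketch of just the arithmetic, independent of any matcher (a full
example using BFMatcher.knnMatch() appears in the Feature Matching chapter):
@code{.py}
def passes_ratio_test(d1, d2, ratio=0.8):
    # d1: distance to the closest match, d2: distance to the second-closest match
    return d1 < ratio * d2
@endcode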
So this is a summary of the SIFT algorithm. For more details and a deeper understanding, reading the
original paper is highly recommended. Remember one thing: this algorithm is patented, so it is
included in the non-free module in OpenCV.
SIFT in OpenCV
--------------
So now let's see the SIFT functionalities available in OpenCV. Let's start with keypoint detection and
drawing the keypoints. First we have to construct a SIFT object. We can pass different optional
parameters to it, and they are well explained in the docs.
@code{.py}
import cv2
import numpy as np
img = cv2.imread('home.jpg')
gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
sift = cv2.SIFT()
kp = sift.detect(gray,None)
img=cv2.drawKeypoints(gray,kp)
cv2.imwrite('sift_keypoints.jpg',img)
@endcode
The **sift.detect()** function finds the keypoints in the images. You can pass a mask if you want to
search only a part of the image. Each keypoint is a special structure which has many attributes like its
(x,y) coordinates, the size of the meaningful neighbourhood, the angle which specifies its orientation,
the response that specifies the strength of the keypoint etc.
OpenCV also provides the **cv2.drawKeypoints()** function which draws small circles at the locations
of the keypoints. If you pass the flag **cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS** to it, it will
draw a circle with the size of the keypoint and will even show its orientation. See the example below.
@code{.py}
img=cv2.drawKeypoints(gray,kp,flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite('sift_keypoints.jpg',img)
@endcode
See the two results below:
![image](images/sift_keypoints.jpg)
Now to calculate the descriptor, OpenCV provides two methods.
-# Since you already found keypoints, you can call **sift.compute()** which computes the
descriptors from the keypoints we have found. Eg: kp,des = sift.compute(gray,kp)
-# If you didn't find keypoints, directly find keypoints and descriptors in a single step with the
    function **sift.detectAndCompute()**.
We will see the second method:
@code{.py}
sift = cv2.SIFT()
kp, des = sift.detectAndCompute(gray,None)
@endcode
Here kp will be a list of keypoints and des is a numpy array of shape
\f$(\text{number of keypoints}) \times 128\f$.
So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images.
We will learn that in the coming chapters.
Additional Resources
--------------------
Exercises
---------
Introduction to SURF (Speeded-Up Robust Features) {#tutorial_py_surf_intro}
=================================================
Goal
----
In this chapter,
- We will see the basics of SURF
- We will see SURF functionalities in OpenCV
Theory
------
In the last chapter, we saw SIFT for keypoint detection and description. But it was comparatively slow
and people needed a more speeded-up version. In 2006, three people, Bay, H., Tuytelaars, T. and Van
Gool, L., published another paper, "SURF: Speeded Up Robust Features", which introduced a new
algorithm called SURF. As the name suggests, it is a speeded-up version of SIFT.
In SIFT, Lowe approximated the Laplacian of Gaussian with the Difference of Gaussian for finding the
scale-space. SURF goes a little further and approximates LoG with a Box Filter. The image below shows a
demonstration of such an approximation. One big advantage of this approximation is that convolution
with a box filter can be easily calculated with the help of integral images, and it can be done in
parallel for different scales. Also, SURF relies on the determinant of the Hessian matrix for both scale
and location.
![image](images/surf_boxfilter.jpg)
For orientation assignment, SURF uses wavelet responses in the horizontal and vertical directions for a
neighbourhood of size 6s. Adequate Gaussian weights are also applied to them. Then they are plotted in
a space as given in the image below. The dominant orientation is estimated by calculating the sum of all
responses within a sliding orientation window of angle 60 degrees. The interesting thing is that the
wavelet response can be found using integral images very easily at any scale. For many applications,
rotation invariance is not required, so there is no need to find this orientation, which speeds up the
process. SURF provides such a functionality, called Upright-SURF or U-SURF. It improves speed and is
robust up to \f$\pm 15^{\circ}\f$. OpenCV supports both, depending upon the flag **upright**. If it is 0,
the orientation is calculated. If it is 1, the orientation is not calculated and it is faster.
![image](images/surf_orientation.jpg)
For feature description, SURF uses wavelet responses in the horizontal and vertical directions (again,
the use of integral images makes things easier). A neighbourhood of size 20s x 20s is taken around the
keypoint, where s is the size. It is divided into 4x4 subregions. For each subregion, horizontal and
vertical wavelet responses are taken and a vector is formed like this:
\f$v=( \sum{d_x}, \sum{d_y}, \sum{|d_x|}, \sum{|d_y|})\f$. This, when represented as a vector, gives the
SURF feature descriptor with 64 dimensions in total. The lower the dimension, the higher the speed of
computation and matching, but it provides less distinctiveness of features.
For more distinctiveness, the SURF feature descriptor has an extended 128-dimension version. The sums of
\f$d_x\f$ and \f$|d_x|\f$ are computed separately for \f$d_y < 0\f$ and \f$d_y \geq 0\f$. Similarly, the sums of
\f$d_y\f$ and \f$|d_y|\f$ are split up according to the sign of \f$d_x\f$, thereby doubling the number of
features. It doesn't add much computational complexity. OpenCV supports both by setting the value of the
flag **extended** to 0 or 1 for 64-dim and 128-dim respectively.
Another important improvement is the use of sign of Laplacian (trace of Hessian Matrix) for
underlying interest point. It adds no computation cost since it is already computed during
detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse
situation. In the matching stage, we only compare features if they have the same type of contrast
(as shown in image below). This minimal information allows for faster matching, without reducing the
descriptor's performance.
![image](images/surf_matching.jpg)
In short, SURF adds a lot of features to improve the speed in every step. Analysis shows it is 3
times faster than SIFT while performance is comparable to SIFT. SURF is good at handling images with
blurring and rotation, but not good at handling viewpoint change and illumination change.
SURF in OpenCV
--------------
OpenCV provides SURF functionalities just like SIFT. You initiate a SURF object with some optional
conditions like 64/128-dim descriptors, Upright/Normal SURF etc. All the details are well explained
in the docs. Then, as we did with SIFT, we can use SURF.detect(), SURF.compute() etc for finding
keypoints and descriptors.
First we will see a simple demo of how to find SURF keypoints and descriptors and draw them. All the
examples are shown in a Python terminal since it is just the same as for SIFT.
@code{.py}
img = cv2.imread('fly.png',0)
# Create SURF object. You can specify params here or later.
# Here I set Hessian Threshold to 400
surf = cv2.SURF(400)
# Find keypoints and descriptors directly
kp, des = surf.detectAndCompute(img,None)
len(kp)
699
@endcode
699 keypoints are too many to show in a picture. We reduce that to around 50 to draw on the image.
While matching, we may need all those features, but not now. So we increase the Hessian Threshold.
@code{.py}
# Check present Hessian threshold
print surf.hessianThreshold
400.0
# We set it to some 50000. Remember, it is just for representing in picture.
# In actual cases, it is better to have a value 300-500
surf.hessianThreshold = 50000
# Again compute keypoints and check its number.
kp, des = surf.detectAndCompute(img,None)
print len(kp)
47
@endcode
It is less than 50. Let's draw it on the image.
@code{.py}
img2 = cv2.drawKeypoints(img,kp,None,(255,0,0),4)
plt.imshow(img2),plt.show()
@endcode
See the result below. You can see that SURF is more like a blob detector. It detects the white blobs
on wings of butterfly. You can test it with other images.
![image](images/surf_kp1.jpg)
Now I want to apply U-SURF, so that it won't find the orientation.
@code{.py}
# Check upright flag, if it is False, set it to True
print surf.upright
False
surf.upright = True
# Recompute the feature points and draw it
kp = surf.detect(img,None)
img2 = cv2.drawKeypoints(img,kp,None,(255,0,0),4)
plt.imshow(img2),plt.show()
@endcode
See the results below. All the orientations are shown in the same direction. It is faster than
before. If you are working on cases where orientation is not a problem (like panorama stitching),
this is better.
![image](images/surf_kp2.jpg)
Finally we check the descriptor size and change it to 128 if it is only 64-dim.
@code{.py}
# Find size of descriptor
print surf.descriptorSize()
64
# That means flag, "extended" is False.
surf.extended
False
# So we make it to True to get 128-dim descriptors.
surf.extended = True
kp, des = surf.detectAndCompute(img,None)
print surf.descriptorSize()
128
print des.shape
(47, 128)
@endcode
The remaining part is matching, which we will do in another chapter.
Additional Resources
--------------------
Exercises
---------
Feature Detection and Description {#tutorial_py_table_of_contents_feature2d}
=================================
- @subpage tutorial_py_features_meaning
What are the main
features in an image? How can finding those features be useful to us?
- @subpage tutorial_py_features_harris
Okay, Corners are good
features? But how do we find them?
- @subpage tutorial_py_shi_tomasi
We will look into
Shi-Tomasi corner detection
- @subpage tutorial_py_sift_intro
Harris corner detector
is not good enough when scale of image changes. Lowe developed a breakthrough method to find
scale-invariant features and it is called SIFT
- @subpage tutorial_py_surf_intro
SIFT is really good,
but not fast enough, so people came up with a speeded-up version called SURF.
- @subpage tutorial_py_fast
All the above feature
detection methods are good in some way. But they are not fast enough to work in real-time
applications like SLAM. There comes the FAST algorithm, which is really "FAST".
- @subpage tutorial_py_brief
SIFT uses a feature
descriptor with 128 floating point numbers. Consider thousands of such features. It takes lots of
memory and more time for matching. We can compress it to make it faster. But still we have to
calculate it first. There comes BRIEF which gives the shortcut to find binary descriptors with
less memory, faster matching, still higher recognition rate.
- @subpage tutorial_py_orb
SIFT and SURF are good in what they do, but what if you have to pay a few dollars every year to use them in your applications? Yeah, they are patented!!! To solve that problem, OpenCV devs came up with a new "FREE" alternative to SIFT & SURF, and that is ORB.
- @subpage tutorial_py_matcher
We know a great deal about feature detectors and descriptors. It is time to learn how to match different descriptors. OpenCV provides two techniques, Brute-Force matcher and FLANN based matcher.
- @subpage tutorial_py_feature_homography
Now we know about feature matching. Let's mix it up with calib3d module to find objects in a complex image.
Drawing Functions in OpenCV {#tutorial_py_drawing_functions}
===========================
Goal
----
- Learn to draw different geometric shapes with OpenCV
- You will learn these functions : **cv2.line()**, **cv2.circle()** , **cv2.rectangle()**,
**cv2.ellipse()**, **cv2.putText()** etc.
Code
----
In all the above functions, you will see some common arguments as given below:
- img : The image where you want to draw the shapes
- color : Color of the shape. For BGR, pass it as a tuple, e.g. (255,0,0) for blue. For
    grayscale, just pass the scalar value.
- thickness : Thickness of the line or circle etc. If **-1** is passed for closed figures like
    circles, it will fill the shape. *default thickness = 1*
- lineType : Type of line, whether 8-connected, anti-aliased line etc. *By default, it is
    8-connected.* cv2.LINE_AA gives an anti-aliased line which looks great for curves.
### Drawing Line
To draw a line, you need to pass starting and ending coordinates of line. We will create a black
image and draw a blue line on it from top-left to bottom-right corners.
@code{.py}
import numpy as np
import cv2
# Create a black image
img = np.zeros((512,512,3), np.uint8)
# Draw a diagonal blue line with thickness of 5 px
cv2.line(img,(0,0),(511,511),(255,0,0),5)
@endcode
### Drawing Rectangle
To draw a rectangle, you need top-left corner and bottom-right corner of rectangle. This time we
will draw a green rectangle at the top-right corner of image.
@code{.py}
cv2.rectangle(img,(384,0),(510,128),(0,255,0),3)
@endcode
### Drawing Circle
To draw a circle, you need its center coordinates and radius. We will draw a circle inside the
rectangle drawn above.
@code{.py}
cv2.circle(img,(447,63), 63, (0,0,255), -1)
@endcode
### Drawing Ellipse
To draw the ellipse, we need to pass several arguments. One argument is the center location (x,y).
The next argument is the axes lengths (major axis length, minor axis length). angle is the angle of
rotation of the ellipse in the anti-clockwise direction. startAngle and endAngle denote the start and
end of the ellipse arc measured in the clockwise direction from the major axis, i.e. giving values 0 and
360 gives the full ellipse. For more details, check the documentation of **cv2.ellipse()**. The example
below draws a half ellipse at the center of the image.
@code{.py}
cv2.ellipse(img,(256,256),(100,50),0,0,180,255,-1)
@endcode
### Drawing Polygon
To draw a polygon, first you need the coordinates of the vertices. Make those points into an array of
shape ROWSx1x2 where ROWS is the number of vertices, and it should be of type int32. Here we draw a
small polygon with four vertices in yellow color.
@code{.py}
pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
pts = pts.reshape((-1,1,2))
cv2.polylines(img,[pts],True,(0,255,255))
@endcode
@note If the third argument is False, you will get polylines joining all the points, not a closed shape.
@note cv2.polylines() can be used to draw multiple lines. Just create a list of all the lines you want
to draw and pass it to the function. All lines will be drawn individually. It is a much better and
faster way to draw a group of lines than calling cv2.line() for each line.
### Adding Text to Images
To put text in images, you need to specify the following things:
- Text data that you want to write
- Position coordinates of where you want to put it (i.e. the bottom-left corner where the text starts)
- Font type (check the **cv2.putText()** docs for supported fonts)
- Font scale (specifies the size of the font)
- Regular things like color, thickness, lineType etc. For a better look, lineType = cv2.LINE_AA
    is recommended.
We will write **OpenCV** on our image in white color.
@code{.py}
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img,'OpenCV',(10,500), font, 4,(255,255,255),2,cv2.LINE_AA)
@endcode
### Result
So it is time to see the final result of our drawing. As you studied in previous articles, display
the image to see it.
![image](images/drawing.jpg)
Additional Resources
--------------------
-# The angles used in ellipse function is not our circular angles. For more details, visit [this
discussion](http://answers.opencv.org/question/14541/angles-in-ellipse-function/).
Exercises
---------
-# Try to create the logo of OpenCV using drawing functions available in OpenCV.
Getting Started with Images {#tutorial_py_image_display}
===========================
Goals
-----
- Here, you will learn how to read an image, how to display it and how to save it back
- You will learn these functions : **cv2.imread()**, **cv2.imshow()** , **cv2.imwrite()**
- Optionally, you will learn how to display images with Matplotlib
Using OpenCV
------------
### Read an image
Use the function **cv2.imread()** to read an image. The image should be in the working directory or
a full path to the image should be given.
The second argument is a flag which specifies the way the image should be read.
- cv2.IMREAD_COLOR : Loads a color image. Any transparency of image will be neglected. It is the
default flag.
- cv2.IMREAD_GRAYSCALE : Loads image in grayscale mode
- cv2.IMREAD_UNCHANGED : Loads image as such including alpha channel
@note Instead of these three flags, you can simply pass integers 1, 0 or -1 respectively. See the
code below:
@code{.py}
import numpy as np
import cv2
# Load a color image in grayscale
img = cv2.imread('messi5.jpg',0)
@endcode
@warning Even if the image path is wrong, it won't throw any error, but print img will give you None.
### Display an image
Use the function **cv2.imshow()** to display an image in a window. The window automatically fits to
the image size.
The first argument is the window name, which is a string. The second argument is our image. You can
create as many windows as you wish, but with different window names.
@code{.py}
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode
A screenshot of the window will look like this (on a Fedora-Gnome machine):
![image](images/opencv_screenshot.jpg)
**cv2.waitKey()** is a keyboard binding function. Its argument is the time in milliseconds. The
function waits for the specified number of milliseconds for any keyboard event. If you press any key in
that time, the program continues. If **0** is passed, it waits indefinitely for a key stroke. It can also
be set to detect specific key strokes, e.g. whether key 'a' is pressed, which we will discuss below.
@note Besides binding keyboard events this function also processes many other GUI events, so you
MUST use it to actually display the image. **cv2.destroyAllWindows()** simply destroys all the
windows we created. If you want to destroy any specific window, use the function
**cv2.destroyWindow()** where you pass the exact window name as the argument.
@note There is a special case where you can already create a window and load image to it later. In
that case, you can specify whether window is resizable or not. It is done with the function
**cv2.namedWindow()**. By default, the flag is cv2.WINDOW_AUTOSIZE. But if you specify flag to be
cv2.WINDOW_NORMAL, you can resize window. It will be helpful when image is too large in dimension
and adding track bar to windows. See the code below:
@code{.py}
cv2.namedWindow('image', cv2.WINDOW_NORMAL)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode
### Write an image
Use the function **cv2.imwrite()** to save an image.
First argument is the file name, second argument is the image you want to save.
@code{.py}
cv2.imwrite('messigray.png',img)
@endcode
This will save the image in PNG format in the working directory.
### Sum it up
The program below loads an image in grayscale, displays it, saves the image if you press 's' and exits,
or simply exits without saving if you press the ESC key.
@code{.py}
import numpy as np
import cv2
img = cv2.imread('messi5.jpg',0)
cv2.imshow('image',img)
k = cv2.waitKey(0)
if k == 27: # wait for ESC key to exit
cv2.destroyAllWindows()
elif k == ord('s'): # wait for 's' key to save and exit
cv2.imwrite('messigray.png',img)
cv2.destroyAllWindows()
@endcode
@warning If you are using a 64-bit machine, you will have to modify the k = cv2.waitKey(0) line as
follows: k = cv2.waitKey(0) & 0xFF
Using Matplotlib
----------------
Matplotlib is a plotting library for Python which gives you a wide variety of plotting methods. You
will see them in the coming articles. Here, you will learn how to display an image with Matplotlib. You
can zoom images, save them etc using Matplotlib.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg',0)
plt.imshow(img, cmap = 'gray', interpolation = 'bicubic')
plt.xticks([]), plt.yticks([]) # to hide tick values on X and Y axis
plt.show()
@endcode
A screen-shot of the window will look like this :
![image](images/matplotlib_screenshot.jpg)
@sa Plenty of plotting options are available in Matplotlib. Please refer to the Matplotlib docs for more
details. Some we will see on the way.
@warning Color images loaded by OpenCV are in BGR mode, but Matplotlib displays in RGB mode. So color
images will not be displayed correctly in Matplotlib if the image is read with OpenCV. Please see the
exercises for more details.
Additional Resources
--------------------
-# [Matplotlib Plotting Styles and Features](http://matplotlib.org/api/pyplot_api.html)
Exercises
---------
-# There is some problem when you try to load color image in OpenCV and display it in Matplotlib.
Read [this discussion](http://stackoverflow.com/a/15074748/1134940) and understand it.
Mouse as a Paint-Brush {#tutorial_py_mouse_handling}
======================
Goal
----
- Learn to handle mouse events in OpenCV
- You will learn these functions : **cv2.setMouseCallback()**
Simple Demo
-----------
Here, we create a simple application which draws a circle on an image wherever we double-click on
it.
First we create a mouse callback function which is executed when a mouse event takes place. A mouse
event can be anything related to the mouse, like left-button down, left-button up, left-button
double-click etc. It gives us the coordinates (x,y) for every mouse event. With this event and
location, we can do whatever we like. To list all available events, run the following code in a
Python terminal:
@code{.py}
import cv2
events = [i for i in dir(cv2) if 'EVENT' in i]
print events
@endcode
Creating the mouse callback function has a specific format which is the same everywhere. It differs
only in what the function does. So our mouse callback function does one thing: it draws a circle where
we double-click. See the code below. The code is self-explanatory from the comments:
@code{.py}
import cv2
import numpy as np
# mouse callback function
def draw_circle(event,x,y,flags,param):
if event == cv2.EVENT_LBUTTONDBLCLK:
cv2.circle(img,(x,y),100,(255,0,0),-1)
# Create a black image, a window and bind the function to window
img = np.zeros((512,512,3), np.uint8)
cv2.namedWindow('image')
cv2.setMouseCallback('image',draw_circle)
while(1):
cv2.imshow('image',img)
if cv2.waitKey(20) & 0xFF == 27:
break
cv2.destroyAllWindows()
@endcode
More Advanced Demo
------------------
Now we go for a much better application. In this one, we draw either rectangles or circles (depending
on the mode we select) by dragging the mouse, like we do in a Paint application. So our mouse callback
function has two parts: one to draw a rectangle and the other to draw circles. This specific example
will be really helpful for creating and understanding some interactive applications like object
tracking, image segmentation etc.
@code{.py}
import cv2
import numpy as np
drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to curve
ix,iy = -1,-1
# mouse callback function
def draw_circle(event,x,y,flags,param):
global ix,iy,drawing,mode
if event == cv2.EVENT_LBUTTONDOWN:
drawing = True
ix,iy = x,y
elif event == cv2.EVENT_MOUSEMOVE:
if drawing == True:
if mode == True:
cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
else:
cv2.circle(img,(x,y),5,(0,0,255),-1)
elif event == cv2.EVENT_LBUTTONUP:
drawing = False
if mode == True:
cv2.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
else:
cv2.circle(img,(x,y),5,(0,0,255),-1)
@endcode
Next we have to bind this mouse callback function to OpenCV window. In the main loop, we should set
a keyboard binding for key 'm' to toggle between rectangle and circle.
@code{.py}
img = np.zeros((512,512,3), np.uint8)
cv2.namedWindow('image')
cv2.setMouseCallback('image',draw_circle)
while(1):
cv2.imshow('image',img)
k = cv2.waitKey(1) & 0xFF
if k == ord('m'):
mode = not mode
elif k == 27:
break
cv2.destroyAllWindows()
@endcode
Additional Resources
--------------------
Exercises
---------
-# In our last example, we drew a filled rectangle. Modify the code to draw an unfilled
    rectangle.
Gui Features in OpenCV {#tutorial_py_table_of_contents_gui}
======================
- @subpage tutorial_py_image_display
Learn to load an
image, display it and save it back
- @subpage tutorial_py_video_display
Learn to play videos,
capture videos from Camera and write it as a video
- @subpage tutorial_py_drawing_functions
Learn to draw lines,
rectangles, ellipses, circles etc with OpenCV
- @subpage tutorial_py_mouse_handling
Draw stuff with your
    mouse
- @subpage tutorial_py_trackbar
Create trackbar to
control certain parameters
Trackbar as the Color Palette {#tutorial_py_trackbar}
=============================
Goal
----
- Learn to bind trackbar to OpenCV windows
- You will learn these functions : **cv2.getTrackbarPos()**, **cv2.createTrackbar()** etc.
Code Demo
---------
Here we will create a simple application which shows the color you specify. You have a window which
shows the color, and three trackbars to specify each of the B, G, R colors. You slide the trackbars and
the window color changes correspondingly. By default, the initial color will be set to black.
For the cv2.createTrackbar() function, the first argument is the trackbar name, the second one is the
window name to which it is attached, the third argument is the default value, the fourth one is the
maximum value and the fifth one is the callback function which is executed every time the trackbar value
changes. The callback function always has a default argument which is the trackbar position. In our
case, the function does nothing, so we simply pass. The current position of each trackbar is then read
with cv2.getTrackbarPos().
Another important application of trackbars is to use one as a button or switch. OpenCV, by default,
doesn't have button functionality. So you can use a trackbar to get such functionality. In our
application, we have created one switch, and the application works only if the switch is ON; otherwise
the screen is always black.
@code{.py}
import cv2
import numpy as np
def nothing(x):
pass
# Create a black image, a window
img = np.zeros((300,512,3), np.uint8)
cv2.namedWindow('image')
# create trackbars for color change
cv2.createTrackbar('R','image',0,255,nothing)
cv2.createTrackbar('G','image',0,255,nothing)
cv2.createTrackbar('B','image',0,255,nothing)
# create switch for ON/OFF functionality
switch = '0 : OFF \n1 : ON'
cv2.createTrackbar(switch, 'image',0,1,nothing)
while(1):
cv2.imshow('image',img)
k = cv2.waitKey(1) & 0xFF
if k == 27:
break
# get current positions of four trackbars
r = cv2.getTrackbarPos('R','image')
g = cv2.getTrackbarPos('G','image')
b = cv2.getTrackbarPos('B','image')
s = cv2.getTrackbarPos(switch,'image')
if s == 0:
img[:] = 0
else:
img[:] = [b,g,r]
cv2.destroyAllWindows()
@endcode
The screenshot of the application looks like below :
![image](images/trackbar_screenshot.jpg)
Exercises
---------
-# Create a Paint application with adjustable colors and brush radius using trackbars. For drawing,
    refer to the previous tutorial on mouse handling.
Getting Started with Videos {#tutorial_py_video_display}
===========================
Goal
----
- Learn to read video, display video and save video.
- Learn to capture from Camera and display it.
- You will learn these functions : **cv2.VideoCapture()**, **cv2.VideoWriter()**
Capture Video from Camera
-------------------------
Often, we have to capture a live stream with a camera. OpenCV provides a very simple interface for
this. Let's capture a video from the camera (I am using the built-in webcam of my laptop), convert it
into a grayscale video and display it. Just a simple task to get started.
To capture a video, you need to create a **VideoCapture** object. Its argument can be either the
device index or the name of a video file. The device index is just the number that specifies which
camera to use. Normally one camera will be connected (as in my case), so I simply pass 0 (or -1). You
can select the second camera by passing 1 and so on. After that, you can capture frame-by-frame. But at
the end, don't forget to release the capture.
@code{.py}
import numpy as np
import cv2
cap = cv2.VideoCapture(0)
while(True):
# Capture frame-by-frame
ret, frame = cap.read()
# Our operations on the frame come here
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Display the resulting frame
cv2.imshow('frame',gray)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
@endcode
cap.read() returns a bool (True/False). If the frame is read correctly, it will be True, so you can
check for the end of the video by checking this return value.
Sometimes, cap may not have initialized the capture. In that case, this code shows an error. You can
check whether it is initialized or not with the method **cap.isOpened()**. If it is True, OK.
Otherwise open it using **cap.open()**.
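For instance, a minimal sketch of that check (cap is the capture object created above):
@code{.py}
# Sketch: make sure the capture is initialized before reading frames
if not cap.isOpened():
    cap.open(0)   # 0 is the device index used above
@endcode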
You can also access some of the features of this video using **cap.get(propId)** method where propId
is a number from 0 to 18. Each number denotes a property of the video (if it is applicable to that
video) and full details can be seen here: [Property
Identifier](http://docs.opencv.org/modules/highgui/doc/reading_and_writing_video.html#videocapture-get).
Some of these values can be modified using **cap.set(propId, value)**, where value is the new value you
want.
For example, I can check the frame width and height by cap.get(3) and cap.get(4). It gives me
640x480 by default. But I want to modify it to 320x240. Just use ret = cap.set(3,320) and
ret = cap.set(4,240).
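Putting that together, a small sketch (cap is the capture object created above; 3 and 4 are the width
and height property ids used above):
@code{.py}
print cap.get(3), cap.get(4)   # current frame width and height
ret = cap.set(3, 320)          # request a frame width of 320
ret = cap.set(4, 240)          # request a frame height of 240
@endcode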
@note If you are getting an error, make sure your camera is working fine with any other camera application
(like Cheese in Linux).
Playing Video from file
-----------------------
It is the same as capturing from a camera; just replace the camera index with a video file name. Also, while
displaying the frame, use an appropriate time for cv2.waitKey(). If it is too small, the video will be very
fast, and if it is too high, the video will be slow (well, that is how you can display videos in slow
motion). 25 milliseconds will be OK in normal cases.
@code{.py}
import numpy as np
import cv2
cap = cv2.VideoCapture('vtest.avi')
while(cap.isOpened()):
ret, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cv2.imshow('frame',gray)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
@endcode
@note Make sure the proper versions of ffmpeg or gstreamer are installed. Sometimes it is a headache to work
with video capture, mostly due to a wrong installation of ffmpeg/gstreamer.
Saving a Video
--------------
So we capture a video, process it frame-by-frame and we want to save that video. For images, it is
very simple, just use cv2.imwrite(). Here a little more work is required.
This time we create a **VideoWriter** object. We should specify the output file name (eg:
output.avi). Then we should specify the **FourCC** code (details in the next paragraph). Then the number of
frames per second (fps) and the frame size should be passed. And the last one is the **isColor** flag. If it is
True, the encoder expects color frames, otherwise it works with grayscale frames.
[FourCC](http://en.wikipedia.org/wiki/FourCC) is a 4-byte code used to specify the video codec. The
list of available codes can be found in [fourcc.org](http://www.fourcc.org/codecs.php). It is
platform dependent. The following codecs work fine for me.
- In Fedora: DIVX, XVID, MJPG, X264, WMV1, WMV2. (XVID is more preferable. MJPG results in high
size video. X264 gives very small size video)
- In Windows: DIVX (More to be tested and added)
- In OSX : *(I don't have access to OSX. Can some one fill this?)*
The FourCC code is passed as cv2.VideoWriter_fourcc('M','J','P','G') or
cv2.VideoWriter_fourcc(\*'MJPG') for MJPG.
The code below captures from a camera, flips every frame in the vertical direction and saves it.
@code{.py}
import numpy as np
import cv2
cap = cv2.VideoCapture(0)
# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('output.avi',fourcc, 20.0, (640,480))
while(cap.isOpened()):
ret, frame = cap.read()
if ret==True:
frame = cv2.flip(frame,0)
# write the flipped frame
out.write(frame)
cv2.imshow('frame',frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
break
# Release everything if job is finished
cap.release()
out.release()
cv2.destroyAllWindows()
@endcode
Additional Resources
--------------------
Exercises
---------
Canny Edge Detection {#tutorial_py_canny}
====================
Goal
----
In this chapter, we will learn about
- Concept of Canny edge detection
- OpenCV functions for that : **cv2.Canny()**
Theory
------
Canny Edge Detection is a popular edge detection algorithm. It was developed by John F. Canny in
1986. It is a multi-stage algorithm and we will go through each stage.
-# **Noise Reduction**
Since edge detection is susceptible to noise in the image, first step is to remove the noise in the
image with a 5x5 Gaussian filter. We have already seen this in previous chapters.
-# **Finding Intensity Gradient of the Image**
The smoothened image is then filtered with a Sobel kernel in both the horizontal and vertical directions to
get the first derivatives in the horizontal direction (\f$G_x\f$) and the vertical direction (\f$G_y\f$). From these two
images, we can find the edge gradient and direction for each pixel as follows:
\f[Edge\_Gradient \; (G) = \sqrt{G_x^2 + G_y^2}\f]\f[Angle \; (\theta) = \tan^{-1} \bigg(\frac{G_y}{G_x}\bigg)\f]
Gradient direction is always perpendicular to edges. It is rounded to one of four angles
representing vertical, horizontal and two diagonal directions.
-# **Non-maximum Suppression**
After getting the gradient magnitude and direction, a full scan of the image is done to remove any unwanted
pixels which may not constitute the edge. For this, every pixel is checked to see if it is a
local maximum in its neighborhood in the direction of the gradient. Check the image below:
![image](images/nms.jpg)
Point A is on the edge (in the vertical direction). The gradient direction is normal to the edge. Points B
and C are in the gradient direction. So point A is checked against points B and C to see if it forms a
local maximum. If so, it is considered for the next stage, otherwise it is suppressed (put to zero).
In short, the result you get is a binary image with "thin edges".
-# **Hysteresis Thresholding**
This stage decides which of the detected edges are really edges and which are not. For this, we need two
threshold values, minVal and maxVal. Any edges with an intensity gradient above maxVal are sure to
be edges and those below minVal are sure to be non-edges, so they are discarded. Those that lie between these
two thresholds are classified as edges or non-edges based on their connectivity. If they are connected
to "sure-edge" pixels, they are considered to be part of edges. Otherwise, they are also discarded.
See the image below:
![image](images/hysteresis.jpg)
The edge A is above maxVal, so it is considered a "sure-edge". Although edge C is below maxVal, it is
connected to edge A, so it is also considered a valid edge and we get that full curve. But edge B,
although it is above minVal and is in the same region as edge C, is not connected to any
"sure-edge", so it is discarded. So it is very important to select minVal and maxVal
accordingly to get the correct result.
This stage also removes small pixel noise on the assumption that edges are long lines.
So what we finally get is strong edges in the image.
Canny Edge Detection in OpenCV
------------------------------
OpenCV puts all the above in a single function, **cv2.Canny()**. We will see how to use it. The first
argument is our input image. The second and third arguments are our minVal and maxVal respectively. The
fourth argument is aperture_size, the size of the Sobel kernel used to find the image gradients. By
default it is 3. The last argument is L2gradient, which specifies the equation for finding the gradient
magnitude. If it is True, it uses the equation mentioned above, which is more accurate, otherwise it
uses this function: \f$Edge\_Gradient \; (G) = |G_x| + |G_y|\f$. By default, it is False.
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg',0)
edges = cv2.Canny(img,100,200)
plt.subplot(121),plt.imshow(img,cmap = 'gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])
plt.show()
@endcode
See the result below:
![image](images/canny1.jpg)
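The optional arguments can also be passed explicitly. A short sketch (the threshold values are just the ones
used above):
@code{.py}
edges = cv2.Canny(img, 100, 200, apertureSize = 3, L2gradient = True)
@endcode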
Additional Resources
--------------------
-# Canny edge detector at [Wikipedia](http://en.wikipedia.org/wiki/Canny_edge_detector)
-# [Canny Edge Detection
Tutorial](http://dasl.mem.drexel.edu/alumni/bGreen/www.pages.drexel.edu/_weg22/can_tut.html) by
Bill Green, 2002.
Exercises
---------
-# Write a small application that finds the Canny edges of an image and whose threshold values can be varied
using two trackbars. This way, you can understand the effect of the threshold values.
Changing Colorspaces {#tutorial_py_colorspaces}
====================
Goal
----
- In this tutorial, you will learn how to convert images from one color-space to another, like
BGR \f$\leftrightarrow\f$ Gray, BGR \f$\leftrightarrow\f$ HSV etc.
- In addition to that, we will create an application which extracts a colored object in a video
- You will learn following functions : **cv2.cvtColor()**, **cv2.inRange()** etc.
Changing Color-space
--------------------
There are more than 150 color-space conversion methods available in OpenCV. But we will look into
only the two which are most widely used, BGR \f$\leftrightarrow\f$ Gray and BGR \f$\leftrightarrow\f$ HSV.
For color conversion, we use the function cv2.cvtColor(input_image, flag) where flag determines the
type of conversion.
For BGR \f$\rightarrow\f$ Gray conversion we use the flags cv2.COLOR_BGR2GRAY. Similarly for BGR
\f$\rightarrow\f$ HSV, we use the flag cv2.COLOR_BGR2HSV. To get other flags, just run following
commands in your Python terminal :
@code{.py}
import cv2
flags = [i for i in dir(cv2) if i.startswith('COLOR_')]
print flags
@endcode
@note For HSV, the Hue range is [0,179], the Saturation range is [0,255] and the Value range is [0,255].
Different software uses different scales, so if you are comparing OpenCV values with them, you need
to normalize these ranges.

Object Tracking
---------------
Now that we know how to convert a BGR image to HSV, we can use this to extract a colored object. In HSV, it
is easier to represent a color than in the RGB color-space. In our application, we will try to extract
a blue colored object. So here is the method:
- Take each frame of the video
- Convert from BGR to HSV color-space
- Threshold the HSV image for a range of blue
- Now extract the blue object alone; then we can do whatever we want with that image
Below is the code, which is commented in detail:
@code{.py}
import cv2
import numpy as np
cap = cv2.VideoCapture(0)
while(1):
# Take each frame
_, frame = cap.read()
# Convert BGR to HSV
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# define range of blue color in HSV
lower_blue = np.array([110,50,50])
upper_blue = np.array([130,255,255])
# Threshold the HSV image to get only blue colors
    mask = cv2.inRange(hsv, lower_blue, upper_blue)
# Bitwise-AND mask and original image
res = cv2.bitwise_and(frame,frame, mask= mask)
cv2.imshow('frame',frame)
cv2.imshow('mask',mask)
cv2.imshow('res',res)
k = cv2.waitKey(5) & 0xFF
if k == 27:
break
cv2.destroyAllWindows()
@endcode
Below image shows tracking of the blue object:
![image](images/frame.jpg)
@note There is some noise in the image. We will see how to remove it in later chapters.

@note This is the simplest method in object tracking. Once you learn the functions of contours, you can
do plenty of things, like finding the centroid of this object and using it to track the object, drawing
diagrams just by moving your hand in front of the camera, and many other fun things.

How to find HSV values to track?
--------------------------------

This is a common question found on [stackoverflow.com](http://www.stackoverflow.com). It is very simple
and you can use the same function, cv2.cvtColor(). Instead of passing an image, you just pass the BGR
values you want. For example, to find the HSV value of Green, try the following commands in a Python
terminal:
@code{.py}
green = np.uint8([[[0,255,0 ]]])
hsv_green = cv2.cvtColor(green,cv2.COLOR_BGR2HSV)
print hsv_green
[[[ 60 255 255]]]
@endcode
Now you take [H-10, 100, 100] and [H+10, 255, 255] as the lower bound and upper bound respectively. Apart
from this method, you can use any image editing tool like GIMP or any online converter to find
these values, but don't forget to adjust the HSV ranges.
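For example, a small sketch deriving bounds for green from the value computed above (numpy is assumed to be
imported as np, as earlier):
@code{.py}
h = int(hsv_green[0][0][0])              # hue of pure green, 60 here
lower_green = np.array([h-10, 100, 100])
upper_green = np.array([h+10, 255, 255])
@endcode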
Additional Resources
--------------------
Exercises
---------
-# Try to find a way to extract more than one colored object, for eg, extract red, blue and green
objects simultaneously.
Contour Features {#tutorial_py_contour_features}
================
Goal
----
In this article, we will learn
- To find the different features of contours, like area, perimeter, centroid, bounding box etc
- You will see plenty of functions related to contours.
1. Moments
----------
Image moments help you to calculate some features like center of mass of the object, area of the
object etc. Check out the wikipedia page on [Image
Moments](http://en.wikipedia.org/wiki/Image_moment)
The function **cv2.moments()** gives a dictionary of all moment values calculated. See below:
@code{.py}
import cv2
import numpy as np
img = cv2.imread('star.jpg',0)
ret,thresh = cv2.threshold(img,127,255,0)
contours,hierarchy = cv2.findContours(thresh, 1, 2)
cnt = contours[0]
M = cv2.moments(cnt)
print M
@endcode
From these moments, you can extract useful data like area, centroid etc. The centroid is given by the
relations, \f$C_x = \frac{M_{10}}{M_{00}}\f$ and \f$C_y = \frac{M_{01}}{M_{00}}\f$. This can be done as
follows:
@code{.py}
cx = int(M['m10']/M['m00'])
cy = int(M['m01']/M['m00'])
@endcode
2. Contour Area
---------------
Contour area is given by the function **cv2.contourArea()** or from moments, **M['m00']**.
@code{.py}
area = cv2.contourArea(cnt)
@endcode
3. Contour Perimeter
--------------------
It is also called arc length. It can be found using the **cv2.arcLength()** function. The second
argument specifies whether the shape is a closed contour (if passed True) or just a curve.
@code{.py}
perimeter = cv2.arcLength(cnt,True)
@endcode
4. Contour Approximation
------------------------
It approximates a contour shape to another shape with a smaller number of vertices depending upon the
precision we specify. It is an implementation of [Douglas-Peucker
algorithm](http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm). Check the wikipedia page
for algorithm and demonstration.
To understand this, suppose you are trying to find a square in an image, but due to some problems in
the image, you didn't get a perfect square, but a "bad shape" (As shown in first image below). Now
you can use this function to approximate the shape. Here, the second argument is called epsilon,
which is the maximum distance from the contour to the approximated contour. It is an accuracy parameter.
A wise selection of epsilon is needed to get the correct output.
@code{.py}
epsilon = 0.1*cv2.arcLength(cnt,True)
approx = cv2.approxPolyDP(cnt,epsilon,True)
@endcode
Below, in second image, green line shows the approximated curve for epsilon = 10% of arc length.
Third image shows the same for epsilon = 1% of the arc length. Third argument specifies whether
curve is closed or not.
![image](images/approx.jpg)
5. Convex Hull
--------------
The convex hull will look similar to the contour approximation, but it is not (both may provide the same results
in some cases). Here, the **cv2.convexHull()** function checks a curve for convexity defects and
corrects it. Generally speaking, convex curves are curves which are always bulged out, or
at least flat. If a curve is bulged inside, it is called a convexity defect. For example, check the
below image of a hand. The red line shows the convex hull of the hand. The double-sided arrow marks show the
convexity defects, which are the local maximum deviations of the hull from the contour.
![image](images/convexitydefects.jpg)
There are a few things to discuss about its syntax:
@code{.py}
hull = cv2.convexHull(points[, hull[, clockwise[, returnPoints]]])
@endcode
Argument details:
- **points** are the contours we pass into.
- **hull** is the output, normally we avoid it.
- **clockwise** : Orientation flag. If it is True, the output convex hull is oriented clockwise.
Otherwise, it is oriented counter-clockwise.
- **returnPoints** : By default, True. Then it returns the coordinates of the hull points. If
False, it returns the indices of contour points corresponding to the hull points.
So to get a convex hull as in above image, following is sufficient:
@code{.py}
hull = cv2.convexHull(cnt)
@endcode
But if you want to find convexity defects, you need to pass returnPoints = False. To understand it,
we will take the rectangle image above. First I found its contour as cnt. Then I found its convex
hull with returnPoints = True and got the following values:
[[[234 202]], [[ 51 202]], [[ 51 79]], [[234 79]]], which are the four corner points of the rectangle.
Now if I do the same with returnPoints = False, I get the following result: [[129],[ 67],[ 0],[142]].
These are the indices of the corresponding points in the contour. For eg, check the first value:
cnt[129] = [[234, 202]], which is the same as the first result (and so on for the others).
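A small sketch contrasting the two modes (cnt being the contour mentioned above):
@code{.py}
hull_points  = cv2.convexHull(cnt)                        # coordinates of the hull corners
hull_indices = cv2.convexHull(cnt, returnPoints = False)  # indices of those corners in cnt
print hull_points
print hull_indices
@endcode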
You will see it again when we discuss about convexity defects.
6. Checking Convexity
---------------------
There is a function to check if a curve is convex or not, **cv2.isContourConvex()**. It just returns
True or False. Not a big deal.
@code{.py}
k = cv2.isContourConvex(cnt)
@endcode
7. Bounding Rectangle
---------------------
There are two types of bounding rectangles.
### 7.a. Straight Bounding Rectangle
It is a straight rectangle; it doesn't consider the rotation of the object, so the area of the bounding
rectangle won't be minimal. It is found by the function **cv2.boundingRect()**.
Let (x,y) be the top-left coordinate of the rectangle and (w,h) be its width and height.
@code{.py}
x,y,w,h = cv2.boundingRect(cnt)
cv2.rectangle(img,(x,y),(x+w,y+h),(0,255,0),2)
@endcode
### 7.b. Rotated Rectangle
Here, the bounding rectangle is drawn with minimum area, so it considers the rotation also. The function
used is **cv2.minAreaRect()**. It returns a Box2D structure which contains the following details:
( center (x,y), (width, height), angle of rotation ). But to draw this rectangle, we need the 4 corners of
the rectangle. They are obtained by the function **cv2.boxPoints()**.
@code{.py}
rect = cv2.minAreaRect(cnt)
box = cv2.boxPoints(rect)
box = np.int0(box)
cv2.drawContours(img,[box],0,(0,0,255),2)
@endcode
Both the rectangles are shown in a single image. Green rectangle shows the normal bounding rect. Red
rectangle is the rotated rect.
![image](images/boundingrect.png)
8. Minimum Enclosing Circle
---------------------------
Next we find the circumcircle of an object using the function **cv2.minEnclosingCircle()**. It is a
circle which completely covers the object with minimum area.
@code{.py}
(x,y),radius = cv2.minEnclosingCircle(cnt)
center = (int(x),int(y))
radius = int(radius)
cv2.circle(img,center,radius,(0,255,0),2)
@endcode
![image](images/circumcircle.png)
9. Fitting an Ellipse
---------------------
Next one is to fit an ellipse to an object. It returns the rotated rectangle in which the ellipse is
inscribed.
@code{.py}
ellipse = cv2.fitEllipse(cnt)
cv2.ellipse(img,ellipse,(0,255,0),2)
@endcode
![image](images/fitellipse.png)
10. Fitting a Line
------------------
Similarly we can fit a line to a set of points. Below image contains a set of white points. We can
approximate a straight line to it.
@code{.py}
rows,cols = img.shape[:2]
[vx,vy,x,y] = cv2.fitLine(cnt, cv2.DIST_L2,0,0.01,0.01)
lefty = int((-x*vy/vx) + y)
righty = int(((cols-x)*vy/vx)+y)
cv2.line(img,(cols-1,righty),(0,lefty),(0,255,0),2)
@endcode
![image](images/fitline.jpg)
Additional Resources
--------------------
Exercises
---------
Contour Properties {#tutorial_py_contour_properties}
==================
Here we will learn to extract some frequently used properties of objects like Solidity, Equivalent
Diameter, Mask image, Mean Intensity etc. More features can be found at [Matlab regionprops
documentation](http://www.mathworks.in/help/images/ref/regionprops.html).
*(NB : Centroid, Area, Perimeter etc. also belong to this category, but we have seen them in the last
chapter)*
1. Aspect Ratio
---------------
It is the ratio of width to height of bounding rect of the object.
\f[Aspect \; Ratio = \frac{Width}{Height}\f]
@code{.python}
x,y,w,h = cv2.boundingRect(cnt)
aspect_ratio = float(w)/h
@endcode
2. Extent
---------
Extent is the ratio of contour area to bounding rectangle area.
\f[Extent = \frac{Object \; Area}{Bounding \; Rectangle \; Area}\f]
@code{.python}
area = cv2.contourArea(cnt)
x,y,w,h = cv2.boundingRect(cnt)
rect_area = w*h
extent = float(area)/rect_area
@endcode
3. Solidity
-----------
Solidity is the ratio of contour area to its convex hull area.
\f[Solidity = \frac{Contour \; Area}{Convex \; Hull \; Area}\f]
@code{.python}
area = cv2.contourArea(cnt)
hull = cv2.convexHull(cnt)
hull_area = cv2.contourArea(hull)
solidity = float(area)/hull_area
@endcode
4. Equivalent Diameter
----------------------
Equivalent Diameter is the diameter of the circle whose area is same as the contour area.
\f[Equivalent \; Diameter = \sqrt{\frac{4 \times Contour \; Area}{\pi}}\f]
@code{.python}
area = cv2.contourArea(cnt)
equi_diameter = np.sqrt(4*area/np.pi)
@endcode
5. Orientation
--------------
Orientation is the angle at which the object is directed. The following method also gives the Major Axis and
Minor Axis lengths.
@code{.py}
(x,y),(MA,ma),angle = cv2.fitEllipse(cnt)
@endcode
6. Mask and Pixel Points
------------------------
In some cases, we may need all the points which comprise that object. It can be done as follows:
@code{.py}
mask = np.zeros(imgray.shape,np.uint8)
cv2.drawContours(mask,[cnt],0,255,-1)
pixelpoints = np.transpose(np.nonzero(mask))
#pixelpoints = cv2.findNonZero(mask)
@endcode
Here, two methods, one using Numpy functions and the other using an OpenCV function (the last commented line),
are given to do the same thing. The results are also the same, but with a slight difference: Numpy gives
coordinates in **(row, column)** format, while OpenCV gives coordinates in **(x,y)** format, so
basically the answers are interchanged. Note that **row = y** and **column = x**.
7. Maximum Value, Minimum Value and their locations
----------------------------------------------------
We can find these parameters using a mask image.
@code{.py}
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(imgray,mask = mask)
@endcode
8. Mean Color or Mean Intensity
-------------------------------
Here, we can find the average color of an object. Or it can be average intensity of the object in
grayscale mode. We again use the same mask to do it.
@code{.py}
mean_val = cv2.mean(im,mask = mask)
@endcode
9. Extreme Points
-----------------
Extreme Points means topmost, bottommost, rightmost and leftmost points of the object.
@code{.py}
leftmost = tuple(cnt[cnt[:,:,0].argmin()][0])
rightmost = tuple(cnt[cnt[:,:,0].argmax()][0])
topmost = tuple(cnt[cnt[:,:,1].argmin()][0])
bottommost = tuple(cnt[cnt[:,:,1].argmax()][0])
@endcode
For eg, if I apply it to a map of India, I get the following result:
![image](images/extremepoints.jpg)
Additional Resources
--------------------
Exercises
---------
-# There are still some features left in matlab regionprops doc. Try to implement them.
Contours : Getting Started {#tutorial_py_contours_begin}
==========================
Goal
----
- Understand what contours are.
- Learn to find contours, draw contours etc
- You will see these functions : **cv2.findContours()**, **cv2.drawContours()**
What are contours?
------------------
Contours can be explained simply as a curve joining all the continuous points (along the boundary),
having same color or intensity. The contours are a useful tool for shape analysis and object
detection and recognition.
- For better accuracy, use binary images. So before finding contours, apply threshold or canny
edge detection.
- The findContours function modifies the source image. So if you want the source image even after
finding contours, store it in some other variable beforehand.
- In OpenCV, finding contours is like finding a white object on a black background. So remember,
the object to be found should be white and the background should be black.
Let's see how to find contours of a binary image:
@code{.py}
import numpy as np
import cv2
im = cv2.imread('test.jpg')
imgray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
ret,thresh = cv2.threshold(imgray,127,255,0)
contours, hierarchy = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
@endcode
See, there are three arguments in the **cv2.findContours()** function: the first one is the source image, the
second is the contour retrieval mode, and the third is the contour approximation method. It outputs the
contours and hierarchy. contours is a Python list of all the contours in the image. Each individual contour is a
Numpy array of (x,y) coordinates of the boundary points of the object.
@note We will discuss second and third arguments and about hierarchy in details later. Until then,
the values given to them in code sample will work fine for all images.
How to draw the contours?
-------------------------
To draw the contours, the cv2.drawContours function is used. It can also be used to draw any shape
provided you have its boundary points. Its first argument is the source image, the second argument is the
contours, which should be passed as a Python list, the third argument is the index of the contour (useful when
drawing an individual contour; to draw all contours, pass -1), and the remaining arguments are color,
thickness etc.
To draw all the contours in an image:
@code{.py}
cv2.drawContours(img, contours, -1, (0,255,0), 3)
@endcode
To draw an individual contour, say 4th contour:
@code{.py}
cv2.drawContours(img, contours, 3, (0,255,0), 3)
@endcode
But most of the time, below method will be useful:
@code{.py}
cnt = contours[4]
cv2.drawContours(img, [cnt], 0, (0,255,0), 3)
@endcode
@note The last two methods are the same, but when you go forward, you will see that the last one is more useful.

Contour Approximation Method
----------------------------

This is the third argument in the cv2.findContours function. What does it actually denote?
Above, we told that contours are the boundaries of a shape with same intensity. It stores the (x,y)
coordinates of the boundary of a shape. But does it store all the coordinates ? That is specified by
this contour approximation method.
If you pass cv2.CHAIN_APPROX_NONE, all the boundary points are stored. But do we actually need all
the points? For eg, you found the contour of a straight line. Do you need all the points on the line
to represent that line? No, we need just the two end points of that line. This is what
cv2.CHAIN_APPROX_SIMPLE does. It removes all redundant points and compresses the contour, thereby
saving memory.
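A hedged sketch to see the difference yourself (reusing thresh from the code above; copies are passed because,
as noted earlier, findContours modifies its input):
@code{.py}
contours_none, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
contours_simple, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# number of points stored for the same first contour
print len(contours_none[0]), len(contours_simple[0])
@endcode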
The below image of a rectangle demonstrates this technique. Just draw a circle on all the coordinates in
the contour array (drawn in blue color). The first image shows the points I got with cv2.CHAIN_APPROX_NONE
(734 points) and the second image shows the ones with cv2.CHAIN_APPROX_SIMPLE (only 4 points). See how
much memory it saves!
![image](images/none.jpg)
Additional Resources
--------------------
Exercises
---------
Contours Hierarchy {#tutorial_py_contours_hierarchy}
==================
Goal
----
This time, we learn about the hierarchy of contours, i.e. the parent-child relationship in Contours.
Theory
------
In the last few articles on contours, we have worked with several functions related to contours
provided by OpenCV. But when we found the contours in image using **cv2.findContours()** function,
we have passed an argument, **Contour Retrieval Mode**. We usually passed **cv2.RETR_LIST** or
**cv2.RETR_TREE** and it worked nice. But what does it actually mean ?
Also, in the output, we got two outputs: the first is our contours, and the second is one more
output which we named **hierarchy** (please check out the code in the previous articles). But we
never used this hierarchy anywhere. Then what is this hierarchy and what is it for? What is its
relationship with the previously mentioned function argument?
That is what we are going to deal in this article.
### What is Hierarchy?
Normally we use the **cv2.findContours()** function to detect objects in an image, right ? Sometimes
objects are in different locations. But in some cases, some shapes are inside other shapes, just
like nested figures. In this case, we call the outer one the **parent** and the inner one the **child**. This
way, contours in an image have some relationship to each other. And we can specify how one contour is
connected to the others, e.g. is it a child of some other contour, or is it a parent, etc.
The representation of this relationship is called the **Hierarchy**.
Consider an example image below :
![image](images/hierarchy.png)
In this image, there are a few shapes which I have numbered from **0-5**. *2 and 2a* denote the
external and internal contours of the outermost box.
Here, contours 0,1,2 are **external or outermost**. We can say, they are in **hierarchy-0** or
simply they are in **same hierarchy level**.
Next comes **contour-2a**. It can be considered as a **child of contour-2** (or in opposite way,
contour-2 is parent of contour-2a). So let it be in **hierarchy-1**. Similarly contour-3 is child of
contour-2 and it comes in next hierarchy. Finally contours 4,5 are the children of contour-3a, and
they come in the last hierarchy level. From the way I numbered the boxes, I would say contour-4 is
the first child of contour-3a (It can be contour-5 also).
I mentioned these things to understand terms like **same hierarchy level**, **external contour**,
**child contour**, **parent contour**, **first child** etc. Now let's get into OpenCV.
### Hierarchy Representation in OpenCV
So each contour has its own information regarding what hierarchy it is in, who its child is, who its
parent is, etc. OpenCV represents it as an array of four values: **[Next, Previous, First_Child,
Parent]**
*Next denotes the next contour at the same hierarchical level.*
For eg, take contour-0 in our picture. Which is the next contour at its level? It is contour-1. So
simply put Next = 1. Similarly for contour-1, the next is contour-2, so Next = 2.
What about contour-2? There is no next contour at the same level, so simply put Next = -1. What
about contour-4? It is at the same level as contour-5, so its next contour is contour-5 and Next = 5.
*Previous denotes the previous contour at the same hierarchical level.*
It is the same as above. The previous contour of contour-1 is contour-0 at the same level. Similarly for
contour-2, it is contour-1. And for contour-0, there is no previous, so put it as -1.
*First_Child denotes the first child contour.*
There is no need of any explanation. For contour-2, the child is contour-2a, so it gets the
corresponding index value of contour-2a. What about contour-3a? It has two children, but we take
only the first child, and that is contour-4. So First_Child = 4 for contour-3a.
*Parent denotes the index of the parent contour.*
It is just the opposite of **First_Child**. For both contour-4 and contour-5, the parent contour is
contour-3a. For contour-3a, it is contour-3, and so on.
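To make this representation concrete, a minimal sketch (the file name is hypothetical) that prints the
[Next, Previous, First_Child, Parent] entry of contour-0:
@code{.py}
import cv2
import numpy as np

img = cv2.imread('hierarchy.png', 0)          # hypothetical test image, like the one above
ret, thresh = cv2.threshold(img, 127, 255, 0)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

print hierarchy[0][0]   # [Next, Previous, First_Child, Parent] of contour-0
@endcode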
@note If there is no child or parent, that field is taken as -1.

So now that we know about the hierarchy style used in OpenCV, we can check into the Contour Retrieval Modes
in OpenCV with the help of the same image given above, ie what do flags like cv2.RETR_LIST, cv2.RETR_TREE,
cv2.RETR_CCOMP, cv2.RETR_EXTERNAL etc. mean?
Contour Retrieval Mode
----------------------
### 1. RETR_LIST
This is the simplest of the four flags (from the explanation point of view). It simply retrieves all the
contours, but doesn't create any parent-child relationship. **Parents and kids are equal under this
rule, and they are just contours**, ie they all belong to the same hierarchy level.
So here, the 3rd and 4th terms in the hierarchy array are always -1. But obviously, the Next and Previous
terms will have their corresponding values. Just check it yourself and verify it.
Below is the result I got; each row gives the hierarchy details of the corresponding contour. For eg, the first
row corresponds to contour-0. The next contour is contour-1, so Next = 1. There is no previous contour,
so Previous = -1. And the remaining two, as told before, are -1.
@code{.py}
hierarchy
array([[[ 1, -1, -1, -1],
[ 2, 0, -1, -1],
[ 3, 1, -1, -1],
[ 4, 2, -1, -1],
[ 5, 3, -1, -1],
[ 6, 4, -1, -1],
[ 7, 5, -1, -1],
[-1, 6, -1, -1]]])
@endcode
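A one-line sketch of how such an array can be obtained (reusing thresh from the sketch in the previous
section):
@code{.py}
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
print hierarchy
@endcode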
This is a good choice to use in your code if you are not using any hierarchy features.
### 2. RETR_EXTERNAL
If you use this flag, it returns only the extreme outer contours. All child contours are left behind. **We
can say, under this law, only the eldest in every family is taken care of. It doesn't care about the
other members of the family :)**.
So, in our image, how many extreme outer contours are there, ie at the hierarchy-0 level? Only 3, ie
contours 0,1,2, right? Now try to find the contours using this flag. Here also, the value given to each
element is the same as above. Compare it with the above result. Below is what I got:
@code{.py}
hierarchy
array([[[ 1, -1, -1, -1],
[ 2, 0, -1, -1],
[-1, 1, -1, -1]]])
@endcode
You can use this flag if you want to extract only the outer contours. It might be useful in some
cases.
### 3. RETR_CCOMP
This flag retrieves all the contours and arranges them into a 2-level hierarchy, ie the external contours
of the object (ie its boundary) are placed in hierarchy-1, and the contours of the holes inside the object
(if any) are placed in hierarchy-2. If there is any object inside a hole, its contour is placed again in
hierarchy-1 only, and its hole in hierarchy-2, and so on.
Just consider the image of a "big white zero" on a black background. Outer circle of zero belongs to
first hierarchy, and inner circle of zero belongs to second hierarchy.
We can explain it with a simple image. Here I have labelled the order of the contours in red and
the hierarchy they belong to in green (either 1 or 2). The order is the same as the order in which OpenCV
detects contours.
![image](images/ccomp_hierarchy.png)
So consider the first contour, ie contour-0. It is in hierarchy-1. It has two holes, contours 1 and 2, and they
belong to hierarchy-2. So for contour-0, the next contour at the same hierarchy level is contour-3, and
there is no previous one. Its first child is contour-1 in hierarchy-2. It has no parent,
because it is in hierarchy-1. So its hierarchy array is [3,-1,1,-1].
Now take contour-1. It is in hierarchy-2. The next one in the same hierarchy (under the parenthood of
contour-0) is contour-2. There is no previous one and no child, but the parent is contour-0. So the array is
[2,-1,-1,0].
Similarly contour-2: it is in hierarchy-2. There is no next contour at the same hierarchy under
contour-0, so no Next. Previous is contour-1. There is no child, and the parent is contour-0. So the array is
[-1,1,-1,0].
Contour - 3 : Next in hierarchy-1 is contour-5. Previous is contour-0. Child is contour-4 and no
parent. So array is [5,0,4,-1].
Contour - 4 : It is in hierarchy 2 under contour-3 and it has no sibling. So no next, no previous,
no child, parent is contour-3. So array is [-1,-1,-1,3].
The remaining ones you can fill in yourself. This is the final answer I got:
@code{.py}
hierarchy
array([[[ 3, -1, 1, -1],
[ 2, -1, -1, 0],
[-1, 1, -1, 0],
[ 5, 0, 4, -1],
[-1, -1, -1, 3],
[ 7, 3, 6, -1],
[-1, -1, -1, 5],
[ 8, 5, -1, -1],
[-1, 7, -1, -1]]])
@endcode
### 4. RETR_TREE
And this is the final guy, Mr. Perfect. It retrieves all the contours and creates a full family
hierarchy list. **It even tells who is the grandpa, father, son, grandson and even beyond... :)**.
For example, I took the above image, rewrote the code for cv2.RETR_TREE, reordered the contours as per the
result given by OpenCV and analyzed it. Again, red letters give the contour number and green letters
give the hierarchy order.
![image](images/tree_hierarchy.png)
Take contour-0: it is in hierarchy-0. The next contour at the same hierarchy is contour-7. There are no
previous contours. Its child is contour-1 and it has no parent. So the array is [7,-1,1,-1].
Take contour-1: it is in hierarchy-1. There is no contour at the same level and no previous one. Its child is
contour-2 and its parent is contour-0. So the array is [-1,-1,2,0].
And remaining, try yourself. Below is the full answer:
@code{.py}
hierarchy
array([[[ 7, -1, 1, -1],
[-1, -1, 2, 0],
[-1, -1, 3, 1],
[-1, -1, 4, 2],
[-1, -1, 5, 3],
[ 6, -1, -1, 4],
[-1, 5, -1, 4],
[ 8, 0, -1, -1],
[-1, 7, -1, -1]]])
@endcode
Additional Resources
--------------------
Exercises
---------
Contours : More Functions {#tutorial_py_contours_more_functions}
=========================
Goal
----
In this chapter, we will learn about
- Convexity defects and how to find them.
- Finding shortest distance from a point to a polygon
- Matching different shapes
Theory and Code
---------------
### 1. Convexity Defects
We saw what is convex hull in second chapter about contours. Any deviation of the object from this
hull can be considered as convexity defect.
OpenCV comes with a ready-made function to find this, **cv2.convexityDefects()**. A basic function
call would look like below:
@code{.py}
hull = cv2.convexHull(cnt,returnPoints = False)
defects = cv2.convexityDefects(cnt,hull)
@endcode
@note Remember we have to pass returnPoints = False while finding the convex hull, in order to find
convexity defects.

cv2.convexityDefects() returns an array where each row contains these values: **[ start point, end
point, farthest point, approximate distance to farthest point ]**. We can visualize it using an
image. We draw a line joining the start point and the end point, then draw a circle at the farthest point.
Remember that the first three values returned are indices of cnt, so we have to bring those values from cnt.
@code{.py}
import cv2
import numpy as np
img = cv2.imread('star.jpg')
img_gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(img_gray, 127, 255,0)
contours,hierarchy = cv2.findContours(thresh,2,1)
cnt = contours[0]
hull = cv2.convexHull(cnt,returnPoints = False)
defects = cv2.convexityDefects(cnt,hull)
for i in range(defects.shape[0]):
s,e,f,d = defects[i,0]
start = tuple(cnt[s][0])
end = tuple(cnt[e][0])
far = tuple(cnt[f][0])
cv2.line(img,start,end,[0,255,0],2)
cv2.circle(img,far,5,[0,0,255],-1)
cv2.imshow('img',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode
And see the result:
![image](images/defects.jpg)
### 2. Point Polygon Test
This function finds the shortest distance between a point in the image and a contour. It returns the
distance which is negative when point is outside the contour, positive when point is inside and zero
if point is on the contour.
For example, we can check the point (50,50) as follows:
@code{.py}
dist = cv2.pointPolygonTest(cnt,(50,50),True)
@endcode
In the function, the third argument is measureDist. If it is True, it finds the signed distance. If
False, it finds whether the point is inside, outside or on the contour (it returns +1, -1 or 0
respectively).
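For instance, a quick sketch of the two modes side by side (the point is just illustrative):
@code{.py}
dist   = cv2.pointPolygonTest(cnt, (50,50), True)    # signed distance
inside = cv2.pointPolygonTest(cnt, (50,50), False)   # only +1, -1 or 0
@endcode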
@note If you don't want to find the distance, make sure the third argument is False, because it is a
time consuming process. Making it False gives about a 2-3X speedup.

### 3. Match Shapes
OpenCV comes with a function **cv2.matchShapes()** which enables us to compare two shapes, or two
contours and returns a metric showing the similarity. The lower the result, the better match it is.
It is calculated based on the hu-moment values. Different measurement methods are explained in the
docs.
@code{.py}
import cv2
import numpy as np
img1 = cv2.imread('star.jpg',0)
img2 = cv2.imread('star2.jpg',0)
ret, thresh = cv2.threshold(img1, 127, 255,0)
ret, thresh2 = cv2.threshold(img2, 127, 255,0)
contours,hierarchy = cv2.findContours(thresh,2,1)
cnt1 = contours[0]
contours,hierarchy = cv2.findContours(thresh2,2,1)
cnt2 = contours[0]
ret = cv2.matchShapes(cnt1,cnt2,1,0.0)
print ret
@endcode
I tried matching shapes with different shapes given below:
![image](images/matchshapes.jpg)
I got following results:
- Matching Image A with itself = 0.0
- Matching Image A with Image B = 0.001946
- Matching Image A with Image C = 0.326911
See, even image rotation doesn't affect this comparison much.
@sa [Hu-Moments](http://en.wikipedia.org/wiki/Image_moment#Rotation_invariant_moments) are seven
moments invariant to translation, rotation and scale. The seventh one is skew-invariant. Those values
can be found using the **cv2.HuMoments()** function.

Additional Resources
--------------------
Exercises
---------
-# Check the documentation for **cv2.pointPolygonTest()**; you can find a nice image in red and
blue colors. It represents the distance from all pixels to the white curve on it. All pixels
inside the curve are colored blue depending on the distance, and similarly, outside points are colored red.
The contour edges are marked with white. So the problem is simple: write a code to create such a
representation of distance.
-# Compare images of digits or letters using **cv2.matchShapes()**. (That would be a simple step
towards OCR.)
Contours in OpenCV {#tutorial_py_table_of_contents_contours}
==================
- @subpage tutorial_py_contours_begin
Learn to find and draw Contours
- @subpage tutorial_py_contour_features
Learn
to find different features of contours like area, perimeter, bounding rectangle etc.
- @subpage tutorial_py_contour_properties
Learn
to find different properties of contours like Solidity, Mean Intensity etc.
- @subpage tutorial_py_contours_more_functions
Learn
to find convexity defects, pointPolygonTest, match different shapes etc.
- @subpage tutorial_py_contours_hierarchy
Learn
about Contour Hierarchy
Smoothing Images {#tutorial_py_filtering}
================
Goals
-----
Learn to:
- Blur the images with various low pass filters
- Apply custom-made filters to images (2D convolution)
2D Convolution ( Image Filtering )
----------------------------------
As with one-dimensional signals, images can also be filtered with various low-pass filters (LPF),
high-pass filters (HPF) etc. An LPF helps in removing noise, blurring images etc. An HPF helps
in finding edges in images.
OpenCV provides a function **cv2.filter2D()** to convolve a kernel with an image. As an example, we
will try an averaging filter on an image. A 5x5 averaging filter kernel will look like below:
\f[K = \frac{1}{25} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}\f]
Operation is like this: keep this kernel above a pixel, add all the 25 pixels below this kernel,
take its average and replace the central pixel with the new average value. It continues this
operation for all the pixels in the image. Try this code and check the result:
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('opencv_logo.png')
kernel = np.ones((5,5),np.float32)/25
dst = cv2.filter2D(img,-1,kernel)
plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(dst),plt.title('Averaging')
plt.xticks([]), plt.yticks([])
plt.show()
@endcode
Result:
![image](images/filter.jpg)
Image Blurring (Image Smoothing)
--------------------------------
Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for
removing noise. It actually removes high frequency content (eg: noise, edges) from the image, so
edges are blurred a little in this operation. (Well, there are blurring techniques which don't
blur the edges.) OpenCV provides mainly four types of blurring techniques.
### 1. Averaging
This is done by convolving image with a normalized box filter. It simply takes the average of all
the pixels under kernel area and replace the central element. This is done by the function
**cv2.blur()** or **cv2.boxFilter()**. Check the docs for more details about the kernel. We should
specify the width and height of kernel. A 3x3 normalized box filter would look like below:
\f[K = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\f]
@note If you don't want to use a normalized box filter, use **cv2.boxFilter()** and pass the argument
normalize=False to the function.

Check a sample demo below with a kernel of 5x5 size:
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('opencv_logo.png')
blur = cv2.blur(img,(5,5))
plt.subplot(121),plt.imshow(img),plt.title('Original')
plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(blur),plt.title('Blurred')
plt.xticks([]), plt.yticks([])
plt.show()
@endcode
Result:
![image](images/blur.jpg)
### 2. Gaussian Blurring
In this method, instead of a box filter, a gaussian kernel is used. It is done with the function
**cv2.GaussianBlur()**. We should specify the width and height of the kernel, which should be positive
and odd. We should also specify the standard deviations in the X and Y directions, sigmaX and sigmaY
respectively. If only sigmaX is specified, sigmaY is taken to be the same as sigmaX. If both are given as
zeros, they are calculated from the kernel size. Gaussian blurring is highly effective in removing
gaussian noise from an image.
If you want, you can create a Gaussian kernel with the function, **cv2.getGaussianKernel()**.
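For instance (a sketch; this returns a 5x1 one-dimensional kernel, and passing 0 for sigma lets OpenCV
compute it from the kernel size):
@code{.py}
kernel = cv2.getGaussianKernel(5, 0)
@endcode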
The averaging demo above can be modified for Gaussian blurring:
@code{.py}
blur = cv2.GaussianBlur(img,(5,5),0)
@endcode
Result:
![image](images/gaussian.jpg)
### 3. Median Blurring
Here, the function **cv2.medianBlur()** takes the median of all the pixels under the kernel area and the
central element is replaced with this median value. This is highly effective against salt-and-pepper noise
in images. An interesting thing is that, in the above filters, the central element is a newly
calculated value which may be a pixel value in the image or a new value. But in median blurring,
the central element is always replaced by some pixel value in the image, so it reduces the noise
effectively. Its kernel size should be a positive odd integer.
In this demo, I added 50% noise to our original image and applied median blurring. Check the result:
@code{.py}
median = cv2.medianBlur(img,5)
@endcode
Result:
![image](images/median.jpg)
### 4. Bilateral Filtering
**cv2.bilateralFilter()** is highly effective at noise removal while keeping edges sharp. But the
operation is slower compared to the other filters. We already saw that a gaussian filter takes a
neighbourhood around the pixel and finds its gaussian weighted average. This gaussian filter is a
function of space alone, that is, only nearby pixels are considered while filtering. It doesn't consider
whether pixels have almost the same intensity, and it doesn't consider whether a pixel is an edge pixel or
not. So it blurs the edges also, which we don't want to do.
The bilateral filter also takes a gaussian filter in space, but adds one more gaussian filter which is a
function of pixel difference. The gaussian function of space makes sure only nearby pixels are considered
for blurring, while the gaussian function of intensity difference makes sure only those pixels with
similar intensity to the central pixel are considered for blurring. So it preserves the edges, since
pixels at edges will have large intensity variation.
The sample below shows the use of a bilateral filter (for details on the arguments, visit the docs).
@code{.py}
blur = cv2.bilateralFilter(img,9,75,75)
@endcode
Result:
![image](images/bilateral.jpg)
See, the texture on the surface is gone, but edges are still preserved.
Additional Resources
--------------------
-# Details about the [bilateral filtering](http://people.csail.mit.edu/sparis/bf_course/)
Exercises
---------
Geometric Transformations of Images {#tutorial_py_geometric_transformations}
===================================
Goals
-----
- Learn to apply different geometric transformation to images like translation, rotation, affine
transformation etc.
- You will see these functions: **cv2.getPerspectiveTransform**
Transformations
---------------
OpenCV provides two transformation functions, **cv2.warpAffine** and **cv2.warpPerspective**, with
which you can have all kinds of transformations. **cv2.warpAffine** takes a 2x3 transformation
matrix while **cv2.warpPerspective** takes a 3x3 transformation matrix as input.
### Scaling
Scaling is just resizing of the image. OpenCV comes with a function **cv2.resize()** for this
purpose. The size of the image can be specified manually, or you can specify the scaling factor.
Different interpolation methods are used. Preferable interpolation methods are **cv2.INTER_AREA**
for shrinking and **cv2.INTER_CUBIC** (slow) & **cv2.INTER_LINEAR** for zooming. By default,
interpolation method used is **cv2.INTER_LINEAR** for all resizing purposes. You can resize an
input image with either of the following methods:
@code{.py}
import cv2
import numpy as np
img = cv2.imread('messi5.jpg')
res = cv2.resize(img,None,fx=2, fy=2, interpolation = cv2.INTER_CUBIC)
#OR
height, width = img.shape[:2]
res = cv2.resize(img,(2*width, 2*height), interpolation = cv2.INTER_CUBIC)
@endcode
### Translation
Translation is the shifting of object's location. If you know the shift in (x,y) direction, let it
be \f$(t_x,t_y)\f$, you can create the transformation matrix \f$\textbf{M}\f$ as follows:
\f[M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}\f]
You can make it into a Numpy array of type np.float32 and pass it to the **cv2.warpAffine()**
function. See the below example for a shift of (100,50):
@code{.py}
import cv2
import numpy as np
img = cv2.imread('messi5.jpg',0)
rows,cols = img.shape
M = np.float32([[1,0,100],[0,1,50]])
dst = cv2.warpAffine(img,M,(cols,rows))
cv2.imshow('img',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode
@warning The third argument of the **cv2.warpAffine()** function is the size of the output image, which
should be in the form of **(width, height)**. Remember width = number of columns, and height = number of
rows.
See the result below:
![image](images/translation.jpg)
### Rotation
Rotation of an image by an angle \f$\theta\f$ is achieved by a transformation matrix of the form
\f[M = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\f]
But OpenCV provides scaled rotation with adjustable center of rotation so that you can rotate at any
location you prefer. Modified transformation matrix is given by
\f[\begin{bmatrix} \alpha & \beta & (1- \alpha ) \cdot center.x - \beta \cdot center.y \\ - \beta & \alpha & \beta \cdot center.x + (1- \alpha ) \cdot center.y \end{bmatrix}\f]
where:
\f[\begin{array}{l} \alpha = scale \cdot \cos \theta , \\ \beta = scale \cdot \sin \theta \end{array}\f]
To find this transformation matrix, OpenCV provides a function, **cv2.getRotationMatrix2D**. Check the
below example, which rotates the image by 90 degrees with respect to its center without any scaling.
@code{.py}
img = cv2.imread('messi5.jpg',0)
rows,cols = img.shape
M = cv2.getRotationMatrix2D((cols/2,rows/2),90,1)
dst = cv2.warpAffine(img,M,(cols,rows))
@endcode
See the result:
![image](images/rotation.jpg)
### Affine Transformation
In affine transformation, all parallel lines in the original image will still be parallel in the
output image. To find the transformation matrix, we need three points from input image and their
corresponding locations in output image. Then **cv2.getAffineTransform** will create a 2x3 matrix
which is to be passed to **cv2.warpAffine**.
Check below example, and also look at the points I selected (which are marked in Green color):
@code{.py}
img = cv2.imread('drawing.png')
rows,cols,ch = img.shape
pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[10,100],[200,50],[100,250]])
M = cv2.getAffineTransform(pts1,pts2)
dst = cv2.warpAffine(img,M,(cols,rows))
plt.subplot(121),plt.imshow(img),plt.title('Input')
plt.subplot(122),plt.imshow(dst),plt.title('Output')
plt.show()
@endcode
See the result:
![image](images/affine.jpg)
### Perspective Transformation
For perspective transformation, you need a 3x3 transformation matrix. Straight lines will remain
straight even after the transformation. To find this transformation matrix, you need 4 points on the
input image and corresponding points on the output image. Among these 4 points, 3 of them should not
be collinear. Then transformation matrix can be found by the function
**cv2.getPerspectiveTransform**. Then apply **cv2.warpPerspective** with this 3x3 transformation
matrix.
See the code below:
@code{.py}
img = cv2.imread('sudokusmall.png')
rows,cols,ch = img.shape
pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])
M = cv2.getPerspectiveTransform(pts1,pts2)
dst = cv2.warpPerspective(img,M,(300,300))
plt.subplot(121),plt.imshow(img),plt.title('Input')
plt.subplot(122),plt.imshow(dst),plt.title('Output')
plt.show()
@endcode
Result:
![image](images/perspective.jpg)
Additional Resources
--------------------
-# "Computer Vision: Algorithms and Applications", Richard Szeliski
Exercises
---------
Interactive Foreground Extraction using GrabCut Algorithm {#tutorial_py_grabcut}
=========================================================
Goal
----
In this chapter
- We will see GrabCut algorithm to extract foreground in images
- We will create an interactive application for this.
Theory
------
The GrabCut algorithm was designed by Carsten Rother, Vladimir Kolmogorov & Andrew Blake from Microsoft
Research Cambridge, UK, in their paper ["GrabCut": interactive foreground extraction using iterated
graph cuts](http://dl.acm.org/citation.cfm?id=1015720). An algorithm was needed for foreground
extraction with minimal user interaction, and the result was GrabCut.
How does it work from the user's point of view? Initially the user draws a rectangle around the foreground
region (the foreground region should be completely inside the rectangle). Then the algorithm segments it
iteratively to get the best result. Done. But in some cases, the segmentation won't be fine; for example,
it may have marked some foreground region as background and vice versa. In that case, the user needs to
do fine touch-ups. Just give some strokes on the image where the faulty results are. A stroke
basically says *"Hey, this region should be foreground, you marked it background, correct it in the next
iteration"*, or the opposite for background. Then in the next iteration, you get better results.
See the image below. First, the player and football are enclosed in a blue rectangle. Then some final
touchups with white strokes (denoting foreground) and black strokes (denoting background) are made.
And we get a nice result.
![image](images/grabcut_output1.jpg)
So what happens in background ?
- User inputs the rectangle. Everything outside this rectangle will be taken as sure background
(That is the reason it is mentioned before that your rectangle should include all the
objects). Everything inside rectangle is unknown. Similarly any user input specifying
foreground and background are considered as hard-labelling which means they won't change in
the process.
- The computer does an initial labelling depending on the data we gave. It labels the foreground and
background pixels (or it hard-labels them).
- Now a Gaussian Mixture Model(GMM) is used to model the foreground and background.
- Depending on the data we gave, the GMM learns and creates a new pixel distribution. That is, the
unknown pixels are labelled either probable foreground or probable background depending on their
relation to the other hard-labelled pixels in terms of color statistics (it is just like
clustering).
- A graph is built from this pixel distribution. Nodes in the graphs are pixels. Additional two
nodes are added, **Source node** and **Sink node**. Every foreground pixel is connected to
Source node and every background pixel is connected to Sink node.
- The weights of edges connecting pixels to source node/end node are defined by the probability
of a pixel being foreground/background. The weights between the pixels are defined by the edge
information or pixel similarity. If there is a large difference in pixel color, the edge
between them will get a low weight.
- Then a mincut algorithm is used to segment the graph. It cuts the graph into two parts, separating the
source node and the sink node, with a minimum cost function. The cost function is the sum of all
weights of the edges that are cut. After the cut, all the pixels connected to the Source node
become foreground and those connected to the Sink node become background.
- The process is continued until the classification converges.
It is illustrated in below image (Image Courtesy: <http://www.cs.ru.ac.za/research/g02m1682/>)
![image](images/grabcut.jpg)
Demo
----
Now we move on to the grabcut algorithm with OpenCV. OpenCV has the function **cv2.grabCut()** for this. We
will see its arguments first:
- *img* - Input image
- *mask* - A mask image where we specify which areas are background, foreground or
    probable background/foreground etc. It is done with the following flags, **cv2.GC_BGD,
    cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD**, or simply pass 0, 1, 2, 3 to the image.
- *rect* - The coordinates of a rectangle which includes the foreground object, in the
    format (x,y,w,h)
- *bgdModel*, *fgdModel* - Arrays used by the algorithm internally. You just create
    two np.float64 type zero arrays of size (1,65).
- *iterCount* - Number of iterations the algorithm should run.
- *mode* - It should be **cv2.GC_INIT_WITH_RECT**, **cv2.GC_INIT_WITH_MASK**, or a combination of the two,
    which decides whether we are drawing the rectangle or the final touch-up strokes.
First let's see the rectangular mode. We load the image and create a similar mask image. We create the
*fgdModel* and *bgdModel*. We give the rectangle parameters. It's all straightforward. Let the
algorithm run for 5 iterations. The mode should be *cv2.GC_INIT_WITH_RECT* since we are using the
rectangle. Then run grabcut. It modifies the mask image. In the new mask image, pixels will be
marked with the four flags denoting background/foreground as specified above. So we modify the mask such
that all 0-pixels and 2-pixels are set to 0 (i.e. background) and all 1-pixels and 3-pixels are set to
1 (i.e. foreground pixels). Now our final mask is ready. Just multiply it with the input image to get the
segmented image.
@code{.py}
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg')
mask = np.zeros(img.shape[:2],np.uint8)
bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)
rect = (50,50,450,290)
cv2.grabCut(img,mask,rect,bgdModel,fgdModel,5,cv2.GC_INIT_WITH_RECT)
mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask2[:,:,np.newaxis]
plt.imshow(img),plt.colorbar(),plt.show()
@endcode
See the results below:
![image](images/grabcut_rect.jpg)
Oops, Messi's hair is gone. *Who likes Messi without his hair?* We need to bring it back. So we will
give it a fine touch-up there with 1-pixels (sure foreground). At the same time, some part of the ground has
come into the picture, which we don't want, and also part of the logo. We need to remove them. There we give some
0-pixel touch-ups (sure background). So we modify the resulting mask from the previous case as just described.
*What I actually did was open the input image in a paint application and add another layer to
the image. Using the brush tool, I marked the missed foreground (hair, shoes, ball etc.) with
white and the unwanted background (like the logo and the ground) with black on this new layer. Then I filled the
remaining background with gray, loaded that mask image in OpenCV, and edited the original mask image we
got with the corresponding values from the newly added mask image. Check the code below:*
@code{.py}
# newmask is the mask image I manually labelled
newmask = cv2.imread('newmask.png',0)
# wherever it is marked white (sure foreground), change mask=1
# wherever it is marked black (sure background), change mask=0
mask[newmask == 0] = 0
mask[newmask == 255] = 1
mask, bgdModel, fgdModel = cv2.grabCut(img,mask,None,bgdModel,fgdModel,5,cv2.GC_INIT_WITH_MASK)
mask = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask[:,:,np.newaxis]
plt.imshow(img),plt.colorbar(),plt.show()
@endcode
See the result below:
![image](images/grabcut_mask.jpg)
So that's it. Instead of initializing in rect mode, you can directly go into mask mode. Just
mark the rectangle area in the mask image with 2-pixels or 3-pixels (probable background/foreground). Then
mark our sure foreground with 1-pixels as we did in the second example, and directly apply the grabCut
function in mask mode, as sketched below.
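A minimal sketch of that mask-only workflow could look like the following; the file name and the region
coordinates here are just placeholders.
@code{.py}
import numpy as np
import cv2

img = cv2.imread('messi5.jpg')

# start everything as probable background (2), then mark a probable foreground region (3)
mask = np.full(img.shape[:2], cv2.GC_PR_BGD, np.uint8)
mask[50:340, 50:500] = cv2.GC_PR_FGD
# mask[y1:y2, x1:x2] = cv2.GC_FGD   # optionally hard-label pixels you are sure about (1)

bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)
cv2.grabCut(img, mask, None, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)

mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
result = img*mask2[:,:,np.newaxis]
@endcode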
Additional Resources
--------------------
Exercises
---------
-# OpenCV samples contain a sample grabcut.py, which is an interactive tool using grabcut. Check it out,
    and also watch this [youtube video](http://www.youtube.com/watch?v=kAwxLTDDAwU) on how to use it.
-# Here, you can turn this into an interactive sample by drawing the rectangle and strokes with the mouse,
    creating a trackbar to adjust the stroke width, etc.
Image Gradients {#tutorial_py_gradients}
===============
Goal
----
In this chapter, we will learn to:
- Find image gradients, edges, etc.
- We will see the following functions: **cv2.Sobel()**, **cv2.Scharr()**, **cv2.Laplacian()** etc.
Theory
------
OpenCV provides three types of gradient filters or high-pass filters: Sobel, Scharr and Laplacian.
We will look at each one of them.
### 1. Sobel and Scharr Derivatives
The Sobel operator is a joint Gaussian smoothing plus differentiation operation, so it is more
resistant to noise. You can specify the direction of the derivative to be taken, vertical or horizontal
(by the arguments yorder and xorder respectively). You can also specify the size of the kernel with the
argument ksize. If ksize = -1, a 3x3 Scharr filter is used, which gives better results than a 3x3 Sobel
filter. Please see the docs for the kernels used.
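For instance, a horizontal Scharr derivative can be taken either with **cv2.Scharr()** directly or by
passing ksize = -1 to **cv2.Sobel()**; the file name below is just a placeholder.
@code{.py}
import cv2

img = cv2.imread('dave.jpg',0)
# the two calls below both use the 3x3 Scharr kernel for the x-derivative
scharrx = cv2.Scharr(img,cv2.CV_64F,1,0)
sobelx_scharr = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=-1)
@endcode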
### 2. Laplacian Derivatives
It calculates the Laplacian of the image given by the relation
\f$\Delta src = \frac{\partial ^2{src}}{\partial x^2} + \frac{\partial ^2{src}}{\partial y^2}\f$, where
each derivative is found using Sobel derivatives. If ksize = 1, then the following kernel is used for
filtering:
\f[kernel = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}\f]
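As a quick sanity check of that kernel, you can apply it yourself with **cv2.filter2D()** and compare
against **cv2.Laplacian()** with ksize = 1; the two results are expected to agree (the file name is just
a placeholder).
@code{.py}
import cv2
import numpy as np

img = cv2.imread('dave.jpg',0)
laplacian = cv2.Laplacian(img,cv2.CV_64F,ksize=1)

kernel = np.array([[0,  1, 0],
                   [1, -4, 1],
                   [0,  1, 0]], np.float64)
laplacian_manual = cv2.filter2D(img,cv2.CV_64F,kernel)
print(np.allclose(laplacian, laplacian_manual))   # expected True with the default border mode
@endcode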
Code
----
The code below shows all the operators in a single diagram. All kernels are of size 5x5. The output
image depth is set to cv2.CV_64F here; passing -1 instead would give a result of np.uint8 type (but see
the note below about losing negative slopes).
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('dave.jpg',0)
laplacian = cv2.Laplacian(img,cv2.CV_64F)
sobelx = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=5)
sobely = cv2.Sobel(img,cv2.CV_64F,0,1,ksize=5)
plt.subplot(2,2,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,2),plt.imshow(laplacian,cmap = 'gray')
plt.title('Laplacian'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,3),plt.imshow(sobelx,cmap = 'gray')
plt.title('Sobel X'), plt.xticks([]), plt.yticks([])
plt.subplot(2,2,4),plt.imshow(sobely,cmap = 'gray')
plt.title('Sobel Y'), plt.xticks([]), plt.yticks([])
plt.show()
@endcode
Result:
![image](images/gradients.jpg)
One Important Matter!
---------------------
In our last example, the output datatype was cv2.CV_64F. Suppose we instead keep the output datatype as
cv2.CV_8U or np.uint8. There is a slight problem with that: a black-to-white transition is taken as a
positive slope (it has a positive value), while a white-to-black transition is taken as a negative slope
(it has a negative value). So when you convert the data to np.uint8, all negative slopes are made zero.
In simple words, you miss that edge.
If you want to detect both edges, the better option is to keep the output datatype in a higher form,
like cv2.CV_16S or cv2.CV_64F, take its absolute value and then convert back to cv2.CV_8U.
The code below demonstrates this procedure for a horizontal Sobel filter and the difference in results.
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('box.png',0)
# Output dtype = cv2.CV_8U
sobelx8u = cv2.Sobel(img,cv2.CV_8U,1,0,ksize=5)
# Output dtype = cv2.CV_64F. Then take its absolute and convert to cv2.CV_8U
sobelx64f = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=5)
abs_sobel64f = np.absolute(sobelx64f)
sobel_8u = np.uint8(abs_sobel64f)
plt.subplot(1,3,1),plt.imshow(img,cmap = 'gray')
plt.title('Original'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,2),plt.imshow(sobelx8u,cmap = 'gray')
plt.title('Sobel CV_8U'), plt.xticks([]), plt.yticks([])
plt.subplot(1,3,3),plt.imshow(sobel_8u,cmap = 'gray')
plt.title('Sobel abs(CV_64F)'), plt.xticks([]), plt.yticks([])
plt.show()
@endcode
Check the result below:
![image](images/double_edge.jpg)
Additional Resources
--------------------
Exercises
---------
Histograms - 3 : 2D Histograms {#tutorial_py_2d_histogram}
==============================
Goal
----
In this chapter, we will learn to find and plot 2D histograms. It will be helpful in the coming
chapters.
Introduction
------------
In the first article, we calculated and plotted a one-dimensional histogram. It is called
one-dimensional because we take only one feature into consideration, i.e. the grayscale
intensity value of the pixel. In two-dimensional histograms, you consider two features. Normally
it is used for finding color histograms, where the two features are the Hue and Saturation values of
every pixel.
There is already a [python sample in the official
samples](https://github.com/Itseez/opencv/blob/master/samples/python2/color_histogram.py)
for finding color histograms. We will try to understand how to create such a color histogram; it
will be useful in understanding further topics like Histogram Back-Projection.
2D Histogram in OpenCV
----------------------
It is quite simple and is calculated using the same function, **cv2.calcHist()**. For color histograms,
we need to convert the image from BGR to HSV. (Remember, for the 1D histogram we converted from BGR to
grayscale.) For 2D histograms, its parameters are modified as follows:
- **channels = [0,1]** *because we need to process both the H and S planes.*
- **bins = [180,256]** *180 for the H plane and 256 for the S plane.*
- **range = [0,180,0,256]** *Hue values lie between 0 and 180, and Saturation lies between 0 and
    256.*
Now check the code below:
@code{.py}
import cv2
import numpy as np
img = cv2.imread('home.jpg')
hsv = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv], [0, 1], None, [180, 256], [0, 180, 0, 256])
@endcode
That's it.
2D Histogram in Numpy
---------------------
Numpy also provides a specific function for this: **np.histogram2d()**. (Remember, for the 1D histogram
we used **np.histogram()**.)
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('home.jpg')
hsv = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)   # we need the H and S planes separately for np.histogram2d
hist, xbins, ybins = np.histogram2d(h.ravel(),s.ravel(),[180,256],[[0,180],[0,256]])
@endcode
The first argument is the H plane, the second one is the S plane, the third is the number of bins for
each and the fourth is their range.
Now we can check how to plot this color histogram.
Plotting 2D Histograms
----------------------
### Method - 1 : Using cv2.imshow()
The result we get is a two-dimensional array of size 180x256. So we can show it as we normally do,
using the cv2.imshow() function. It will be a grayscale image and it won't give much idea of what colors
are there, unless you know the Hue values of different colors.
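A minimal sketch of that could be as follows; the histogram counts are floats, so they are scaled to
0-255 before display (the file name is just a placeholder).
@code{.py}
import cv2
import numpy as np

img = cv2.imread('home.jpg')
hsv = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv], [0, 1], None, [180, 256], [0, 180, 0, 256])

# scale the counts to 0-255 so they can be shown as an ordinary grayscale image
hist_img = (255*hist/hist.max()).astype(np.uint8)
cv2.imshow('2D histogram (H vs S)', hist_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode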
### Method - 2 : Using Matplotlib
We can use the **matplotlib.pyplot.imshow()** function to plot the 2D histogram with different color maps.
It gives us a much better idea about the different pixel densities. But this also doesn't give us an
idea of what color is there at first glance, unless you know the Hue values of different colors. Still,
I prefer this method. It is simple and better.
@note While using this function, remember that the interpolation flag should be nearest for better results.
Consider the code:
@code{.py}
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('home.jpg')
hsv = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
hist = cv2.calcHist( [hsv], [0, 1], None, [180, 256], [0, 180, 0, 256] )
plt.imshow(hist,interpolation = 'nearest')
plt.show()
@endcode
Below is the input image and its color histogram plot. The X axis shows S values and the Y axis shows Hue.
![image](images/2dhist_matplotlib.jpg)
In the histogram, you can see some high values near H = 100 and S = 200. This corresponds to the blue of the sky.
Similarly, another peak can be seen near H = 25 and S = 100. This corresponds to the yellow of the palace.
You can verify it with any image editing tool like GIMP.
### Method - 3 : OpenCV sample style !!
There is a [sample code for color histograms in the OpenCV-Python2
samples](https://github.com/Itseez/opencv/blob/master/samples/python2/color_histogram.py). If you
run the code, you can see that the histogram shows the corresponding colors as well; put simply, it
outputs a color-coded histogram. Its result is very good (although you need to add an extra bunch of lines).
In that code, the author created a color map in HSV and then converted it into BGR. The resulting
histogram image is multiplied with this color map. He also uses some preprocessing steps to remove
small isolated pixels, resulting in a good histogram.
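A rough sketch of that idea (not the actual sample code, and without its preprocessing of small isolated
pixels) might look like this:
@code{.py}
import cv2
import numpy as np

img = cv2.imread('home.jpg')
hsv = cv2.cvtColor(img,cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv], [0, 1], None, [180, 256], [0, 180, 0, 256])

# build an HSV colour map: hue along the rows, saturation along the columns, full value
h, s = np.indices((180, 256), dtype=np.uint8)
hsv_map = np.dstack((h, s, np.full_like(h, 255)))
colormap = cv2.cvtColor(hsv_map, cv2.COLOR_HSV2BGR)

# tint the colour map with the normalized histogram counts
weight = (hist/hist.max())[:,:,np.newaxis]
vis = (colormap*weight).astype(np.uint8)
cv2.imshow('color-coded 2D histogram', vis)
cv2.waitKey(0)
cv2.destroyAllWindows()
@endcode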
I leave it to the readers to run the code, analyze it and come up with their own hacks. Below is the
output of that code for the same image as above:
![image](images/2dhist_opencv.jpg)
You can clearly see in the histogram what colors are present: blue is there, yellow is there, and
some white due to the chessboard is there. Nice !!!
Additional Resources
--------------------
Exercises
---------
Histograms in OpenCV {#tutorial_py_table_of_contents_histograms}
====================
- @subpage tutorial_py_histogram_begins
    Learn to find, plot and analyze histograms
- @subpage tutorial_py_histogram_equalization
Learn to Equalize Histograms to get better contrast for images
- @subpage tutorial_py_2d_histogram
Learn to find and plot 2D Histograms
- @subpage tutorial_py_histogram_backprojection
Learn histogram backprojection to segment colored objects
Image Transforms in OpenCV {#tutorial_py_table_of_contents_transforms}
==========================
- @subpage tutorial_py_fourier_transform
Learn to find the Fourier Transform of images
K-Means Clustering {#tutorial_py_kmeans_index}
==================
- @subpage tutorial_py_kmeans_understanding
Read to get an intuitive understanding of K-Means Clustering
- @subpage tutorial_py_kmeans_opencv
Now let's try K-Means functions in OpenCV
K-Nearest Neighbour {#tutorial_py_knn_index}
===================
- @subpage tutorial_py_knn_understanding
Get a basic understanding of what kNN is
- @subpage tutorial_py_knn_opencv
Now let's use kNN in OpenCV for digit recognition OCR
Support Vector Machines (SVM) {#tutorial_py_svm_index}
=============================
- @subpage tutorial_py_svm_basics
Get a basic understanding of what SVM is
- @subpage tutorial_py_svm_opencv
Let's use SVM functionalities in OpenCV
Object Detection {#tutorial_py_table_of_contents_objdetect}
================
- @subpage tutorial_py_face_detection
    Face detection using haar-cascades