Commit 41a5a5ea authored by Kv Manohar, committed by Alexander Alekhin

Merge pull request #1253 from kvmanohar22:GSoC17_dnn_objdetect

GSoC'17 Learning compact models for object detection (#1253)

* Final solver and model for SqueezeNet model

* update README

* update dependencies and CMakeLists

* add global pooling

* Add training scripts

* fix typo

* fix dependency of caffe

* fix whitespace

* Add squeezedet architecture

* Pascal pre process script

* Adding pre process scripts

* Generate the graph of the model

* more readable

* fix some bugs in the graph

* Post process class implementation

* Complete minimal post processing and standalone running

* Complete the base class

* remove c++11 features and fix bugs

* Complete example

* fix bugs

* Adding final scripts

* Classification scripts

* Update README.md

* Add example code and results

* Update README.md

* Re-order and fix some bugs

* fix build failure

* Document classes and functions

* Add instructions on how to use samples

* update instructionos

* fix docs failure

* fix conversion types

* fix type conversion warning

* Change examples to sample directoryu

* restructure directories

* add more references

* fix whitespace

* retain aspect ratio

* Add more examples

* fix docs warnings

* update with links to trained weights

* threshold update

* png -> jpg

* fix tutorial

* model files

* precomp.hpp , fix readme links, module dependencies

* copyrights

- no copyright in samples
- use new style OpenCV copyright header
- precomp.hpp
parent c0b298c5
......@@ -22,6 +22,8 @@ $ cmake -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D BUILD_opencv_<r
- **datasets**: Datasets Reader -- Code for reading existing computer vision databases and samples of using the readers to train, test and run using that dataset's data.
- **dnn_objdetect**: Object Detection using CNNs -- Implements compact CNN Model for object detection. Trained using Caffe but uses opencv_dnn module.
- **dnns_easily_fooled**: Subvert DNNs -- This code can use the activations in a network to fool the networks into recognizing something else.
- **dpm**: Deformable Part Model -- Felzenszwalb's Cascade with deformable parts object recognition code.
......
set(the_description "Object Detection using CNNs")
ocv_define_module(dnn_objdetect opencv_core opencv_imgproc opencv_dnn
OPTIONAL opencv_highgui opencv_imgcodecs # samples
)
# Object Detection using Convolutional Neural Networks
This module uses Convolutional Neural Networks to detect objects in an image.
## Dependencies
- opencv dnn module
- Google Protobuf
## Building this module
Run the following command to build this module:
```bash
cmake -DOPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -DBUILD_opencv_dnn_objdetect=ON <opencv_source_dir>
```
## Models
Two trained models are provided.
#### SqueezeNet model trained for Image Classification.
- This model was trained for 1500000 iterations with a batch size of 16
- Size of Model: 4.9MB
- Top-1 Accuracy on ImageNet 2012 DataSet: 56.10%
- Top-5 Accuracy on ImageNet 2012 DataSet: 79.54%
- Link to trained weights: [here](https://github.com/kvmanohar22/caffe/blob/obj_detect_loss/proto/SqueezeNet.caffemodel) ([copy](https://github.com/opencv/opencv_3rdparty/tree/dnn_objdetect_20170827))
#### SqueezeDet model trained for Object Detection
- This model was trained for 180000 iterations with a batch size of 16
- Size of the Model: 14.2MB
- Link to the trained weights: [here](https://github.com/kvmanohar22/caffe/blob/obj_detect_loss/proto/SqueezeDet.caffemodel) ([copy](https://github.com/opencv/opencv_3rdparty/tree/dnn_objdetect_20170827))
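The solver files added in this commit use Caffe's `step` learning-rate policy (for SqueezeNet: `base_lr: 0.03`, `gamma: 0.5`, `stepsize: 100000`). As a rough sketch of what that schedule implies over the training run, the rate at a given iteration is `base_lr * gamma^floor(iter / stepsize)`. The function name `step_lr` is hypothetical and only illustrates the formula; it is not part of Caffe or OpenCV.

```c++
#include <cassert>
#include <cmath>

// Hypothetical helper: learning rate under a "step" policy,
// lr = base_lr * gamma^floor(iter / stepsize).
double step_lr(double base_lr, double gamma, long stepsize, long iter)
{
    assert(stepsize > 0 && iter >= 0);
    const long steps = iter / stepsize;  // integer division acts as floor for non-negative values
    return base_lr * std::pow(gamma, static_cast<double>(steps));
}
```

With the SqueezeNet solver values, `step_lr(0.03, 0.5, 100000, 250000)` evaluates `0.03 * 0.5^2`, i.e. the rate has been halved twice by iteration 250000.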
## Usage
#### With Caffe
For details on using the model with Caffe, have a look at [this repository](https://github.com/kvmanohar22/caffe).
You can in fact train your own object detection models with the loss function which is implemented there.
#### Without Caffe, using `opencv's dnn module`
`tutorials/core_detect.cpp` gives an example of how to use the model to predict the bounding boxes.
`tutorials/image_classification.cpp` gives an example of how to use the model to classify an image.
Here is a brief summary of the examples. For detailed usage and testing, refer to the `tutorials` directory.
## Examples:
### Image Classification
```c++
// Read the net along with its trained weights
cv::dnn::Net net = cv::dnn::readNetFromCaffe(model_defn, model_weights);
// Read an image
cv::Mat image = cv::imread(image_file);
// Convert the image into a blob
cv::Mat image_blob = cv::dnn::blobFromImage(image);
// Feed the blob to the network
net.setInput(image_blob);
// Get the output of the "predictions" layer
cv::Mat probs = net.forward("predictions");
```
`probs` is a 4-d tensor of shape `[1, 1000, 1, 1]` which is obtained after the application of `softmax` activation.
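To turn `probs` into a prediction, pick the index with the highest probability. A minimal sketch, assuming the 1000 values have already been copied into a `std::vector<float>`; the helper name `best_class` is illustrative and not part of the module:

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative helper: returns (class index, probability) of the most likely class.
std::pair<std::size_t, float> best_class(const std::vector<float> &probs)
{
    assert(!probs.empty());
    std::vector<float>::const_iterator it = std::max_element(probs.begin(), probs.end());
    return std::make_pair(static_cast<std::size_t>(it - probs.begin()), *it);
}
```

The returned index can then be looked up in the ImageNet class list to obtain a human-readable label.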
### Object Detection
```c++
// Reading the network and weights and converting the image to a blob is the same as in the Image Classification example.
// Forward through the network and collect blob data
std::vector<cv::Mat> slice_outputs;
net.forward(slice_outputs, "slice");
cv::Mat delta_bbox = slice_outputs[2];
cv::Mat class_scores = net.forward("softmax");
cv::Mat conf_scores = net.forward("sigmoid");
```
The three blobs, `delta_bbox`, `class_scores` and `conf_scores`, are post-processed by the `cv::dnn_objdetect::InferBbox` class to predict the bounding boxes.
```c++
InferBbox infer(delta_bbox, class_scores, conf_scores);
infer.filter();
```
After `infer.filter()` runs, the predictions are available in `infer.detections`, a vector of `cv::dnn_objdetect::object`. Here `cv::dnn_objdetect::object` is a structure containing the following elements.
```c++
typedef struct {
    int xmin, xmax;
    int ymin, ymax;
    size_t class_idx;
    std::string label_name;
    double class_prob;
} object;
```
For further details on post-processing, refer to this detailed [blog-post](https://kvmanohar22.github.io/GSoC/).
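Post-processing relies on intersection over union (IoU) and non-maximal suppression. The following is a simplified sketch of both steps for boxes of a single class in `[xmin, ymin, xmax, ymax]` form. It is not the module's implementation, and the names `Box`, `iou`, `ByProbDesc` and `nms` are assumptions made for illustration.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Box { double xmin, ymin, xmax, ymax; };

// Intersection over union of two axis-aligned boxes.
double iou(const Box &a, const Box &b)
{
    const double iw = std::max(0.0, std::min(a.xmax, b.xmax) - std::max(a.xmin, b.xmin));
    const double ih = std::max(0.0, std::min(a.ymax, b.ymax) - std::max(a.ymin, b.ymin));
    const double inter = iw * ih;
    const double area_a = (a.xmax - a.xmin) * (a.ymax - a.ymin);
    const double area_b = (b.xmax - b.xmin) * (b.ymax - b.ymin);
    const double uni = area_a + area_b - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// Orders box indices by decreasing probability.
struct ByProbDesc
{
    const std::vector<double> &probs;
    explicit ByProbDesc(const std::vector<double> &p) : probs(p) {}
    bool operator()(std::size_t l, std::size_t r) const { return probs[l] > probs[r]; }
};

// Greedy NMS: keep[i] is true when box i survives suppression.
std::vector<bool> nms(const std::vector<Box> &boxes,
                      const std::vector<double> &probs, double thresh)
{
    assert(boxes.size() == probs.size());
    std::vector<std::size_t> order(boxes.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), ByProbDesc(probs));

    std::vector<bool> keep(boxes.size(), true);
    for (std::size_t i = 0; i < order.size(); ++i)
    {
        if (!keep[order[i]]) continue;
        // Suppress lower-scoring boxes that overlap the current one too much
        for (std::size_t j = i + 1; j < order.size(); ++j)
            if (keep[order[j]] && iou(boxes[order[i]], boxes[order[j]]) > thresh)
                keep[order[j]] = false;
    }
    return keep;
}
```

Visiting boxes from the highest to the lowest probability means that, among heavily overlapping boxes, only the most confident one is kept.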
## Results from Object Detection
Refer to the `tutorials` directory for results.
@article{SqueezeNet,
Author = {Forrest N. Iandola and Song Han and Matthew W. Moskewicz and Khalid Ashraf and William J. Dally and Kurt Keutzer},
Title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$0.5MB model size},
Journal = {arXiv:1602.07360},
Year = {2016}
}
@inproceedings{squeezedet,
Author = {Bichen Wu and Forrest Iandola and Peter H. Jin and Kurt Keutzer},
Title = {SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving},
Journal = {arXiv:1612.01051},
Year = {2016}
}
@inproceedings{imagenet_cvpr09,
AUTHOR = {Deng, J. and Dong, W. and Socher, R. and Li, L.-J. and Li, K. and Fei-Fei, L.},
TITLE = {{ImageNet: A Large-Scale Hierarchical Image Database}},
BOOKTITLE = {CVPR09},
YEAR = {2009},
BIBSOURCE = "http://www.image-net.org/papers/imagenet_cvpr09.bib"}
@Article{Everingham10,
author = "Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.",
title = "The Pascal Visual Object Classes (VOC) Challenge",
journal = "International Journal of Computer Vision",
volume = "88",
year = "2010",
number = "2",
month = jun,
pages = "303--338",
}
\ No newline at end of file
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef _OPENCV_DNN_OBJDETECT_CORE_DETECT_HPP_
#define _OPENCV_DNN_OBJDETECT_CORE_DETECT_HPP_
#include <vector>
#include <memory>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
/** @defgroup dnn_objdetect DNN used for object detection
*/
namespace cv
{
namespace dnn_objdetect
{
//! @addtogroup dnn_objdetect
//! @{
/** @brief Structure to hold the details pertaining to a single bounding box
*/
typedef struct
{
int xmin, xmax;
int ymin, ymax;
size_t class_idx;
std::string label_name;
double class_prob;
} object;
/** @brief A class to post process model predictions
*/
class CV_EXPORTS InferBbox
{
public:
/** @brief Default constructor
@param _delta_bbox Blob containing relative coordinates of bounding boxes
@param _class_scores Blob containing the probability values of each class
@param _conf_scores Blob containing the confidence scores
*/
InferBbox(Mat _delta_bbox, Mat _class_scores, Mat _conf_scores);
/** @brief Filters the bounding boxes.
*/
void filter(double thresh = 0.8);
/** @brief Vector which holds the final detections of the model
*/
std::vector<object> detections;
protected:
/** @brief Transform relative coordinates from ConvDet to bounding box coordinates
@param bboxes Vector to hold the predicted bounding boxes
*/
void transform_bboxes(std::vector<std::vector<double> > *bboxes);
/** @brief Computes final probability values of each bounding box
@param final_probs Vector to hold the probability values
*/
void final_probability_dist(std::vector<std::vector<double> > *final_probs);
/** @brief Transform bounding boxes from [x, y, h, w] to [xmin, ymin, xmax, ymax]
@param pre Vector containing initial co-ordinates
@param post Vector containing the transformed co-ordinates
*/
void transform_bboxes_inv(std::vector<std::vector<double> > *pre,
std::vector<std::vector<double> > *post);
/** @brief Ensures that the bounding box values are within image boundaries
@param min_max_boxes Vector containing bounding boxes of the form [xmin, ymin, xmax, ymax]
*/
void assert_predictions(std::vector<std::vector<double> > *min_max_boxes);
/** @brief Filter top `n` predictions
@param probs Final probability values of bounding boxes
@param boxes Predicted bounding box co-ordinates
@param top_n_boxes Contains bounding box co-ordinates of top `n` boxes
@param top_n_idxs Contains class indices of top `n` bounding boxes
@param top_n_probs Contains probability values of top `n` bounding boxes
*/
void filter_top_n(std::vector<std::vector<double> > *probs,
std::vector<std::vector<double> > *boxes,
std::vector<std::vector<double> > &top_n_boxes,
std::vector<size_t> &top_n_idxs,
std::vector<double> &top_n_probs);
/** @brief Wrapper to apply Non-Maximal Suppression
@param top_n_boxes Contains bounding box co-ordinates of top `n` boxes
@param top_n_idxs Contains class indices of top `n` bounding boxes
@param top_n_probs Contains probability values of top `n` bounding boxes
*/
void nms_wrapper(std::vector<std::vector<double> > &top_n_boxes,
std::vector<size_t> &top_n_idxs,
std::vector<double> &top_n_probs);
/** @brief Applies Non-Maximal Suppression
@param boxes Bounding box co-ordinates belonging to one class
@param probs Probability values of boxes belonging to one class
*/
std::vector<bool> non_maximal_suppression(std::vector<std::vector<double> >
*boxes, std::vector<double> *probs);
/** @brief Computes intersection over union of bounding boxes
@param boxes Vector of bounding box co-ordinates
@param base_box Base box wrt which IOU is calculated
@param iou Vector to store IOU values
*/
void intersection_over_union(std::vector<std::vector<double> > *boxes,
std::vector<double> *base_box, std::vector<double> *iou);
static inline bool comparator (std::pair<double, size_t> l1,
std::pair<double, size_t> l2)
{
return l1.first > l2.first;
}
private:
Mat delta_bbox;
Mat class_scores;
Mat conf_scores;
unsigned int image_width;
unsigned int image_height;
unsigned int W, H;
std::vector<std::vector<double> > anchors_values;
std::vector<std::pair<double, double> > anchor_center;
std::vector<std::pair<double, double> > anchor_shapes;
std::vector<std::string> label_map;
unsigned int num_classes;
unsigned int anchors_per_grid;
size_t anchors;
double intersection_thresh;
double nms_intersection_thresh;
size_t n_top_detections;
double epsilon;
};
//! @}
} // namespace dnn_objdetect
} // namespace cv
#endif
# Object Detection using Convolutional Neural Networks
- These files include model weights, model definition files, model deploy files for two trained networks.
### Network 1
- SqueezeNet model trained on ImageNet 2012 Dataset
### Network 2
- SqueezeDet model trained on PASCAL VOC Dataset
# Training and Testing protocol for Object Detection
base_lr: 0.000001
display: 1
max_iter: 100000
lr_policy: "step"
gamma: 0.5
stepsize: 100000
momentum: 0.9
weight_decay: 0.0002
snapshot: 1000
snapshot_prefix: "snapshot"
solver_mode: GPU
net: "SqueezeDet_train_test.prototxt"
# Solver for SqueezeNet Model
test_iter: 1000
test_interval: 1000
base_lr: 0.03
display: 1
max_iter: 1500000
lr_policy: "step"
gamma: 0.5
stepsize: 100000
momentum: 0.9
weight_decay: 0.0002
snapshot: 1000
snapshot_prefix: "snapshot"
solver_mode: GPU
net: "SqueezeNet_train_test.prototxt"
random_seed: 42
average_loss: 80
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/dnn.hpp>
#include <iostream>
#include <cstdlib>
int main(int argc, char **argv)
{
if (argc < 4)
{
std::cerr << "Usage " << argv[0] << ": "
<< "<model-definition-file> " << " "
<< "<model-weights-file> " << " "
<< "<test-image>\n";
return -1;
}
cv::String model_prototxt = argv[1];
cv::String model_binary = argv[2];
cv::String test_image = argv[3];
cv::dnn::Net net = cv::dnn::readNetFromCaffe(model_prototxt, model_binary);
if (net.empty())
{
std::cerr << "Couldn't load the model !\n";
return -2;
}
cv::Mat img = cv::imread(test_image);
if (img.empty())
{
std::cerr << "Couldn't load image: " << test_image << "\n";
return -3;
}
cv::Mat input_blob = cv::dnn::blobFromImage(
img, 1.0, cv::Size(416, 416), cv::Scalar(104, 117, 123), false);
cv::Mat prob;
cv::TickMeter t;
net.setInput(input_blob);
t.start();
prob = net.forward("predictions");
t.stop();
int prob_size[3] = {1000, 1, 1};
cv::Mat prob_data(3, prob_size, CV_32F, prob.ptr<float>(0));
double max_prob = -1.0;
int class_idx = -1;
for (int idx = 0; idx < prob.size[1]; ++idx)
{
double current_prob = prob_data.at<float>(idx, 0, 0);
if (current_prob > max_prob)
{
max_prob = current_prob;
class_idx = idx;
}
}
std::cout << "Best class Index: " << class_idx << "\n";
std::cout << "Time taken: " << t.getTimeSec() << "\n";
std::cout << "Probability: " << max_prob * 100.0<< "\n";
return 0;
}
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <fstream>
#include <iostream>
#include <cstdlib>
#include <opencv2/core_detect.hpp>
using namespace cv;
using namespace std;
using namespace cv::dnn;
using namespace cv::dnn_objdetect;
int main(int argc, char **argv)
{
if (argc < 4)
{
std::cerr << "Usage " << argv[0] << ": "
<< "<model-definition-file> "
<< "<model-weights-file> "
<< "<test-image> "
<< "<threshold>(optional)\n";
return -1;
}
std::string model_prototxt = argv[1];
std::string model_binary = argv[2];
std::string test_input_image = argv[3];
double threshold = 0.7;
if (argc == 5)
{
threshold = atof(argv[4]);
if (threshold > 1.0 || threshold < 0.0)
{
std::cerr << "Threshold should belong to [0, 1]\n";
return -1;
}
}
// Load the network
std::cout << "Loading the network...\n";
Net net = dnn::readNetFromCaffe(model_prototxt, model_binary);
if (net.empty())
{
std::cerr << "Couldn't load the model !\n";
return -2;
}
else
{
std::cout << "Done loading the network !\n\n";
}
// Load the test image
Mat img = cv::imread(test_input_image);
Mat original_img(img);
if (img.empty())
{
std::cerr << "Couldn't load image: " << test_input_image << "\n";
return -3;
}
cv::namedWindow("Initial Image", WINDOW_AUTOSIZE);
cv::imshow("Initial Image", img);
cv::resize(img, img, cv::Size(416, 416));
Mat img_copy(img);
img.convertTo(img, CV_32FC3);
Mat input_blob = blobFromImage(img, 1.0, Size(), cv::Scalar(104, 117, 123), false);
// Set the input blob
// Set the output layers
std::cout << "Getting the output of all the three blobs...\n";
std::vector<Mat> outblobs(3);
std::vector<cv::String> out_layers;
out_layers.push_back("slice");
out_layers.push_back("softmax");
out_layers.push_back("sigmoid");
// Bbox delta blob
std::vector<Mat> temp_blob;
net.setInput(input_blob);
cv::TickMeter t;
t.start();
net.forward(temp_blob, out_layers[0]);
t.stop();
outblobs[0] = temp_blob[2];
// class_scores blob
net.setInput(input_blob);
t.start();
outblobs[1] = net.forward(out_layers[1]);
t.stop();
// conf_scores blob
net.setInput(input_blob);
t.start();
outblobs[2] = net.forward(out_layers[2]);
t.stop();
// Check that the blobs are valid
for (size_t i = 0; i < outblobs.size(); ++i)
{
if (outblobs[i].empty())
{
std::cerr << "Blob: " << i << " is empty !\n";
}
}
int delta_bbox_size[3] = {23, 23, 36};
Mat delta_bbox(3, delta_bbox_size, CV_32F, outblobs[0].ptr<float>());
int class_scores_size[2] = {4761, 20};
Mat class_scores(2, class_scores_size, CV_32F, outblobs[1].ptr<float>());
int conf_scores_size[3] = {23, 23, 9};
Mat conf_scores(3, conf_scores_size, CV_32F, outblobs[2].ptr<float>());
InferBbox inf(delta_bbox, class_scores, conf_scores);
inf.filter(threshold);
double average_time = t.getTimeSec() / t.getCounter();
std::cout << "\nTotal objects detected: " << inf.detections.size()
<< " in " << average_time << " seconds\n";
std::cout << "------\n";
float x_ratio = (float)original_img.cols / img_copy.cols;
float y_ratio = (float)original_img.rows / img_copy.rows;
for (size_t i = 0; i < inf.detections.size(); ++i)
{
int xmin = inf.detections[i].xmin;
int ymin = inf.detections[i].ymin;
int xmax = inf.detections[i].xmax;
int ymax = inf.detections[i].ymax;
cv::String class_name = inf.detections[i].label_name;
std::cout << "Class: " << class_name << "\n"
<< "Probability: " << inf.detections[i].class_prob << "\n"
<< "Co-ordinates: " << inf.detections[i].xmin << " "
<< inf.detections[i].ymin << " "
<< inf.detections[i].xmax << " "
<< inf.detections[i].ymax << "\n";
std::cout << "------\n";
// Draw the corresponding bounding box(s)
cv::rectangle(original_img, cv::Point((int)(xmin * x_ratio), (int)(ymin * y_ratio)),
cv::Point((int)(xmax * x_ratio), (int)(ymax * y_ratio)), cv::Scalar(255, 0, 0), 2);
cv::putText(original_img, class_name, cv::Point((int)(xmin * x_ratio), (int)(ymin * y_ratio)),
cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255, 0, 0), 1);
}
try
{
cv::namedWindow("Final Detections", WINDOW_AUTOSIZE);
cv::imshow("Final Detections", original_img);
cv::imwrite("image.png", original_img);
cv::waitKey(0);
}
catch (const char* msg)
{
std::cerr << msg << "\n";
return -4;
}
return 0;
}
import argparse
import sys
import os
import time
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

def k_means(K, data, max_iter, n_jobs, image_file):
    X = np.array(data)
    np.random.shuffle(X)
    begin = time.time()
    print 'Running kmeans'
    kmeans = KMeans(n_clusters=K, max_iter=max_iter, n_jobs=n_jobs, verbose=1).fit(X)
    print 'K-Means took {} seconds to complete'.format(time.time()-begin)

    step_size = 0.2
    xmin, xmax = X[:, 0].min()-1, X[:, 0].max()+1
    ymin, ymax = X[:, 1].min()-1, X[:, 1].max()+1
    xx, yy = np.meshgrid(np.arange(xmin, xmax, step_size), np.arange(ymin, ymax, step_size))
    preds = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
    preds = preds.reshape(xx.shape)

    plt.figure()
    plt.clf()
    plt.imshow(preds, interpolation='nearest', extent=(xx.min(), xx.max(), yy.min(), yy.max()), cmap=plt.cm.Paired, aspect='auto', origin='lower')
    plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)
    centroids = kmeans.cluster_centers_
    plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=169, linewidths=5, color='r', zorder=10)
    plt.title("Anchor shapes generated using K-Means")
    plt.xlim(xmin, xmax)
    plt.ylim(ymin, ymax)
    print 'Mean centroids are:'
    for i, center in enumerate(centroids):
        print '{}: {}, {}'.format(i, center[0], center[1])
    # plt.xticks(())
    # plt.yticks(())
    plt.show()

def pre_process(directory, data_list):
    if not os.path.exists(directory):
        print "Path {} doesn't exist".format(directory)
        return

    files = os.listdir(directory)
    print 'Loading data...'
    for i, f in enumerate(files):
        # Progress bar
        sys.stdout.write('\r')
        percentage = (i+1.0) / len(files)
        progress = int(percentage * 30)
        bar = [progress*'=', ' '*(29-progress), percentage*100]
        sys.stdout.write('[{}>{}] {:.0f}%'.format(*bar))
        sys.stdout.flush()

        with open(directory+"/"+f, 'r') as ann:
            l = ann.readline()
            l = l.rstrip()
            l = l.split(' ')
            l = [float(i) for i in l]
            if len(l) % 5 != 0:
                sys.stderr.write('File {} contains incorrect number of annotations'.format(f))
                return
            num_objs = len(l) / 5
            for obj in range(num_objs):
                xmin = l[obj * 5 + 0]
                ymin = l[obj * 5 + 1]
                xmax = l[obj * 5 + 2]
                ymax = l[obj * 5 + 3]
                w = xmax - xmin
                h = ymax - ymin
                data_list.append([w, h])
                if w > 1000 or h > 1000:
                    sys.stdout.write("[{}, {}]".format(w, h))
    sys.stdout.write('\nProcessed {} files containing {} objects'.format(len(files), len(data_list)))
    return data_list

def main():
    parser = argparse.ArgumentParser("Parse hyperparameters")
    parser.add_argument("clusters", help="Number of clusters", type=int)
    parser.add_argument("dir", help="Directory containing annotations")
    parser.add_argument("image_file", help="File to generate the final cluster of image")
    parser.add_argument('-jobs', help="Number of jobs for parallel computation", default=1)
    parser.add_argument('-iter', help="Max Iterations to run algorithm for", default=1000)
    p = parser.parse_args(sys.argv[1:])

    K = p.clusters
    directory = p.dir
    data_list = []
    pre_process(directory, data_list )
    sys.stdout.write('\nDone collecting data\n')
    k_means(K, data_list, int(p.iter), int(p.jobs), p.image_file)
    print 'Done !'

if __name__=='__main__':
    try:
        main()
    except Exception as E:
        print E
from skimage import io, transform
from multiprocessing.dummy import Pool as ThreadPool

def rescale(root_new, root_old, img_path, ann_path, out_shape):
    try:
        img = io.imread(root_old+"/"+img_path)
    except Exception as E:
        print E
        # Skip this image if it could not be read
        return
    h, w, _ = img.shape
    f_h, f_w = float(out_shape)/h, float(out_shape)/w
    trans_img = transform.rescale(img, (f_h, f_w))
    num_objs = 0
    with open(root_old+"/"+ann_path, 'r') as f:
        ann = f.readline()
        ann = ann.rstrip()
        ann = ann.split(' ')
        ann = [float(i) for i in ann]
        num_objs = len(ann) / 5
        for idx in xrange(num_objs):
            ann[idx * 5 + 0] = int(f_w * ann[idx * 5 + 0])
            ann[idx * 5 + 1] = int(f_h * ann[idx * 5 + 1])
            ann[idx * 5 + 2] = int(f_w * ann[idx * 5 + 2])
            ann[idx * 5 + 3] = int(f_h * ann[idx * 5 + 3])
        # Write the new annotations to file
        with open(root_new+"/"+ann_path, 'w') as f_new:
            for val in ann:
                f_new.write(str(val)+' ')
    # Save the new image (skimage.io provides imsave, not imwrite)
    io.imsave(root_new+"/"+img_path, trans_img)

def preprocess():
    source = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012_Resize/source.txt'
    root_old = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012'
    root_new = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012_Resize'
    out_shape = 416
    with open(source, 'r') as src:
        lines = src.readlines()
        print 'Processing {} images and annotations'.format(len(lines))
        for line in lines:
            line = line.rstrip()
            line = line.split(' ')
            img_path = line[0]
            ann_path = line[1]
            rescale(root_new, root_old, img_path, ann_path, out_shape)

if __name__ == '__main__':
    preprocess()
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
#define __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
#include <iostream>
#include <vector>
#include <memory>
#include <string>
#include <map>
#include <numeric>
#include <algorithm>
#include "opencv2/core.hpp"
#include "opencv2/dnn.hpp"
#endif // __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
Object Detection using CNNs {#tutorial_dnn_objdetect}
===========================
# Building
Build the samples of the "dnn_objdetect" module. Refer to the OpenCV build tutorials for details.
Enable the `BUILD_EXAMPLES=ON` CMake option and build these targets (Linux):
- example_dnn_objdetect_image_classification
- example_dnn_objdetect_obj_detect
Download the weights file and model definition file from `opencv_extra/dnn_objdetect`
# Object Detection
```bash
example_dnn_objdetect_obj_detect <model-definition-file> <model-weights-file> <test-image>
```
All the following examples were run on a laptop with `Intel(R) Core(TM)2 i3-4005U CPU @ 1.70GHz` (without GPU).
The model is fast, taking about `0.172091` seconds on average to predict multiple bounding boxes.
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/aeroplane.jpg
Total objects detected: 1 in 0.168792 seconds
------
Class: aeroplane
Probability: 0.845181
Co-ordinates: 41 116 415 254
------
```
![Train_Dets](images/aero_det.jpg)
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/bus.jpg
Total objects detected: 1 in 0.201276 seconds
------
Class: bus
Probability: 0.701829
Co-ordinates: 0 32 415 244
------
```
![Train_Dets](images/bus_det.jpg)
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/cat.jpg
Total objects detected: 1 in 0.190335 seconds
------
Class: cat
Probability: 0.703465
Co-ordinates: 34 0 381 282
------
```
![Train_Dets](images/cat_det.jpg)
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/persons_mutli.jpg
Total objects detected: 2 in 0.169152 seconds
------
Class: person
Probability: 0.737349
Co-ordinates: 160 67 313 363
------
Class: person
Probability: 0.720328
Co-ordinates: 187 198 222 323
------
```
![Train_Dets](images/person_multi_det.jpg)
Go ahead and run the model with other images !
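The co-ordinates printed above are expressed in the 416x416 frame that the sample resizes the image to before inference. The sample scales them by the width and height ratios before drawing on the original image. A minimal sketch of that mapping, using a hypothetical `Rect4` type and `to_original` helper that are not part of the module:

```c++
#include <cassert>

struct Rect4 { int xmin, ymin, xmax, ymax; };

// Maps a box from the resized network input back to the original image size.
Rect4 to_original(const Rect4 &box, int orig_w, int orig_h,
                  int net_w = 416, int net_h = 416)
{
    assert(net_w > 0 && net_h > 0);
    const float x_ratio = static_cast<float>(orig_w) / net_w;
    const float y_ratio = static_cast<float>(orig_h) / net_h;
    Rect4 out;
    out.xmin = static_cast<int>(box.xmin * x_ratio);
    out.ymin = static_cast<int>(box.ymin * y_ratio);
    out.xmax = static_cast<int>(box.xmax * x_ratio);
    out.ymax = static_cast<int>(box.ymax * y_ratio);
    return out;
}
```

Because the width and height are scaled independently, the mapping also works when the original image is not square.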
## Changing threshold
By default, the sample thresholds the detections at a confidence of `0.7` (the default value of `threshold` in the sample source). Filtering starts from a large number of predicted bounding boxes, and you can control what gets thresholded by passing the optional `threshold` argument:
```bash
<bin_path>/example_dnn_objdetect_obj_detect <model-definition-file> <model-weights-file> <test-image> <threshold>
```
Changing the threshold to, say, `0.0` produces the following:
![Train_Dets](images/aero_thresh_det.jpg)
That doesn't seem to be that helpful !
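Conceptually, the threshold discards low-probability candidates, which is why a value of `0.0` keeps everything. A simplified sketch of that idea follows. The `Candidate` type and `keep_above` helper are assumptions for illustration, and the exact comparison used inside `InferBbox::filter` may differ.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Candidate { std::string label; double prob; };

// Keeps only the candidates whose probability is at least `thresh`.
std::vector<Candidate> keep_above(const std::vector<Candidate> &all, double thresh)
{
    assert(thresh >= 0.0 && thresh <= 1.0);
    std::vector<Candidate> kept;
    for (std::size_t i = 0; i < all.size(); ++i)
        if (all[i].prob >= thresh)
            kept.push_back(all[i]);
    return kept;
}
```

A higher threshold trades recall for precision: fewer boxes survive, but those that do are more likely to be correct.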
# Image Classification
```bash
example_dnn_objdetect_image_classification <model-definition-file> <model-weights-file> <test-image>
```
At just **4.9MB**, the model takes about **0.136401** seconds to classify an image.
Running the model on examples produces the following results:
```bash
<bin_path>/example_dnn_objdetect_image_classification SqueezeNet_deploy.prototxt SqueezeNet.caffemodel tutorials/images/aeroplane.jpg
Best class Index: 404
Time taken: 0.137722
Probability: 77.1757
```
Looking at [synset_words.txt](https://raw.githubusercontent.com/opencv/opencv/3.4.0/samples/data/dnn/synset_words.txt), the predicted class is `airliner`.
```bash
<bin_path>/example_dnn_objdetect_image_classification SqueezeNet_deploy.prototxt SqueezeNet.caffemodel tutorials/images/cat.jpg
Best class Index: 285
Time taken: 0.136401
Probability: 40.7111
```
This belongs to the class: `Egyptian cat`
```bash
<bin_path>/example_dnn_objdetect_image_classification SqueezeNet_deploy.prototxt SqueezeNet.caffemodel tutorials/images/space_shuttle.jpg
Best class Index: 812
Time taken: 0.137792
Probability: 15.8467
```
This belongs to the class: `space shuttle`
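The class index can also be mapped to a label programmatically. A minimal sketch, assuming a `synset_words.txt`-style file with one class per line in the form `<wordnet-id> <label>`, where the zero-based line number equals the class index; the helper `label_for` is hypothetical and not part of the samples:

```c++
#include <cassert>
#include <fstream>
#include <string>

// Returns the label found on the zero-based line `class_idx`,
// or an empty string if the file or the line is unavailable.
std::string label_for(const std::string &synset_path, int class_idx)
{
    assert(class_idx >= 0);
    std::ifstream in(synset_path.c_str());
    std::string line;
    for (int i = 0; std::getline(in, line); ++i)
    {
        if (i != class_idx) continue;
        const std::string::size_type space = line.find(' ');
        // Drop the leading WordNet id if one is present
        return space == std::string::npos ? line : line.substr(space + 1);
    }
    return "";
}
```

Passing the "Best class Index" printed by the sample as `class_idx` would return the corresponding label text.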