Merge pull request #1253 from kvmanohar22:GSoC17_dnn_objdetect

GSoC'17 Learning compact models for object detection (#1253)

* Final solver and model for SqueezeNet model
* update README
* update dependencies and CMakeLists
* add global pooling
* Add training scripts
* fix typo
* fix dependency of caffe
* fix whitespace
* Add squeezedet architecture
* Pascal pre process script
* Adding pre process scripts
* Generate the graph of the model
* more readable
* fix some bugs in the graph
* Post process class implementation
* Complete minimal post processing and standalone running
* Complete the base class
* remove c++11 features and fix bugs
* Complete example
* fix bugs
* Adding final scripts
* Classification scripts
* Update README.md
* Add example code and results
* Update README.md
* Re-order and fix some bugs
* fix build failure
* Document classes and functions
* Add instructions on how to use samples
* update instructionos
* fix docs failure
* fix conversion types
* fix type conversion warning
* Change examples to sample directoryu
* restructure directories
* add more references
* fix whitespace
* retain aspect ratio
* Add more examples
* fix docs warnings
* update with links to trained weights
* threshold update
* png -> jpg
* fix tutorial
* model files
* precomp.hpp , fix readme links, module dependencies
* copyrights
  - no copyright in samples
  - use new style OpenCV copyright header
  - precomp.hpp

Commit 41a5a5ea authored Jan 29, 2018 by Kv Manohar Committed by Alexander Alekhin Jan 29, 2018
33 changed files
--- a/modules/README.md
+++ b/modules/README.md
@@ -22,6 +22,8 @@ $ cmake -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D BUILD_opencv_<r

 - **datasets**: Datasets Reader -- Code for reading existing computer vision databases and samples of using the readers to train, test and run using that dataset's data.

+- **dnn_objdetect**: Object Detection using CNNs -- Implements a compact CNN model for object detection. Trained using Caffe but uses the opencv_dnn module.
+
 - **dnns_easily_fooled**: Subvert DNNs -- This code can use the activations in a network to fool the networks into recognizing something else.

 - **dpm**: Deformable Part Model -- Felzenszwalb's Cascade with deformable parts object recognition code.

--- a/modules/dnn_objdetect/CMakeLists.txt
+++ b/modules/dnn_objdetect/CMakeLists.txt
+set(the_description "Object Detection using CNNs")
+
+ocv_define_module(dnn_objdetect opencv_core opencv_imgproc opencv_dnn
+    OPTIONAL opencv_highgui opencv_imgcodecs  # samples
+)
--- a/modules/dnn_objdetect/README.md
+++ b/modules/dnn_objdetect/README.md
+# Object Detection using Convolutional Neural Networks
+
+This module uses Convolutional Neural Networks to detect objects in an image.
+
+## Dependencies
+- opencv dnn module
+- Google Protobuf
+
+## Building this module
+
+Run the following command to build this module:
+
+```bash
+cmake -DOPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -DBUILD_opencv_dnn_objdetect=ON <opencv_source_dir>
+```
+
+## Models
+
+Two trained models are provided.
+
+#### SqueezeNet model trained for Image Classification.
+
+- This model was trained for 1500000 iterations with a batch size of 16
+- Size of Model: 4.9MB
+- Top-1 Accuracy on ImageNet 2012 DataSet: 56.10%
+- Top-5 Accuracy on ImageNet 2012 DataSet: 79.54%
+- Link to trained weights: [here](https://github.com/kvmanohar22/caffe/blob/obj_detect_loss/proto/SqueezeNet.caffemodel) ([copy](https://github.com/opencv/opencv_3rdparty/tree/dnn_objdetect_20170827))
+
+#### SqueezeDet model trained for Object Detection
+
+- This model was trained for 180000 iterations with a batch size of 16
+- Size of the Model: 14.2MB
+- Link to the trained weights: [here](https://github.com/kvmanohar22/caffe/blob/obj_detect_loss/proto/SqueezeDet.caffemodel) ([copy](https://github.com/opencv/opencv_3rdparty/tree/dnn_objdetect_20170827))
+
+## Usage
+
+#### With Caffe
+
+For details pertaining to the usage of the model, have a look at [this repository](https://github.com/kvmanohar22/caffe)
+
+You can in fact train your own object detection models with the loss function implemented there.
+
+#### Without Caffe, using OpenCV's `dnn` module
+`samples/obj_detect.cpp` gives an example of how to use the model to predict the bounding boxes.
+`samples/image_classification.cpp` gives an example of how to use the model to classify an image.
+
+Here's a brief summary of the examples. For detailed usage and testing, refer to the `tutorials` directory.
+
+## Examples:
+
+### Image Classification
+
+```c++
+// Read the net along with its trained weights
+cv::dnn::Net net = cv::dnn::readNetFromCaffe(model_defn, model_weights);
+
+// Read an image
+cv::Mat image = cv::imread(image_file);
+
+// Convert the image into a blob and set it as the network input
+cv::Mat image_blob = cv::dnn::blobFromImage(image);
+net.setInput(image_blob);
+
+// Get the output of the "predictions" layer
+cv::Mat probs = net.forward("predictions");
+
+```
+`probs` is a 4-d tensor of shape `[1, 1000, 1, 1]` which is obtained after the application of `softmax` activation.
+
+### Object Detection
+
+```c++
+// Reading the network and weights and converting the image to a blob are the same as in the Image Classification example.
+
+// Forward through the network and collect blob data (layer-to-blob mapping as in samples/obj_detect.cpp)
+std::vector<cv::Mat> slice_outputs;
+net.forward(slice_outputs, "slice");
+cv::Mat delta_bbox = slice_outputs[2];
+cv::Mat class_scores = net.forward("softmax");
+cv::Mat conf_scores = net.forward("sigmoid");
+```
+The three blobs, namely `delta_bbox`, `class_scores` and `conf_scores`, are post-processed by the `cv::dnn_objdetect::InferBbox` class to predict the bounding boxes.
+
+```c++
+InferBbox infer(delta_bbox, class_scores, conf_scores);
+infer.filter();
+```
+
+`infer.filter()` stores the predictions in `infer.detections`, a vector of `cv::dnn_objdetect::object`. Here `cv::dnn_objdetect::object` is a structure containing the following elements.
+
+```c++
+typedef struct {
+  int xmin, xmax;
+  int ymin, ymax;
+  size_t class_idx;
+  std::string label_name;
+  double class_prob;
+} object;
+
+```
+For further details on post-processing, refer to this detailed [blog-post](https://kvmanohar22.github.io/GSoC/).
+
+## Results from Object Detection
+
+Refer to the `tutorials` directory for results.
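The README leaves the post-processing details to `InferBbox`. As a rough illustration of two of the steps documented in the header (converting `[x, y, h, w]` boxes to corner form and keeping them inside the image), here is a minimal Python sketch. It assumes that `(x, y)` is the box center and that coordinates are clamped to `[0, size - 1]`; both function names are hypothetical and this is not the module's implementation.

```python
def center_to_corners(box):
    """Convert [x, y, h, w] to [xmin, ymin, xmax, ymax].

    Assumption: (x, y) is the center of the box.
    """
    x, y, h, w = box
    return [x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0]


def clamp_to_image(box, width, height):
    """Clamp [xmin, ymin, xmax, ymax] to the valid pixel range of the image."""
    xmin, ymin, xmax, ymax = box
    return [
        min(max(xmin, 0.0), width - 1.0),
        min(max(ymin, 0.0), height - 1.0),
        min(max(xmax, 0.0), width - 1.0),
        min(max(ymax, 0.0), height - 1.0),
    ]
```

Clamping to `size - 1` is consistent with the tutorial output, where boxes on the 416 x 416 input end at coordinate `415`.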
--- a/modules/dnn_objdetect/doc/dnn_objdetect.bib
+++ b/modules/dnn_objdetect/doc/dnn_objdetect.bib
+@article{SqueezeNet,
+    Author = {Forrest N. Iandola and Song Han and Matthew W. Moskewicz and Khalid Ashraf and William J. Dally and Kurt Keutzer},
+    Title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$0.5MB model size},
+    Journal = {arXiv:1602.07360},
+    Year = {2016}
+}
+
+@article{squeezedet,
+    Author = {Bichen Wu and Forrest Iandola and Peter H. Jin and Kurt Keutzer},
+    Title = {SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving},
+    Journal = {arXiv:1612.01051},
+    Year = {2016}
+}
+
+@inproceedings{imagenet_cvpr09,
+    AUTHOR = {Deng, J. and Dong, W. and Socher, R. and Li, L.-J. and Li, K. and Fei-Fei, L.},
+    TITLE = {{ImageNet: A Large-Scale Hierarchical Image Database}},
+    BOOKTITLE = {CVPR09},
+    YEAR = {2009},
+    BIBSOURCE = "http://www.image-net.org/papers/imagenet_cvpr09.bib"}
+
+@Article{Everingham10,
+    author = "Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.",
+    title = "The Pascal Visual Object Classes (VOC) Challenge",
+    journal = "International Journal of Computer Vision",
+    volume = "88",
+    year = "2010",
+    number = "2",
+    month = jun,
+    pages = "303--338",
+}
\ No newline at end of file
--- a/modules/dnn_objdetect/include/opencv2/core_detect.hpp
+++ b/modules/dnn_objdetect/include/opencv2/core_detect.hpp
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+#ifndef _OPENCV_DNN_OBJDETECT_CORE_DETECT_HPP_
+#define _OPENCV_DNN_OBJDETECT_CORE_DETECT_HPP_
+
+#include <string>
+#include <vector>
+#include <memory>
+
+#include <opencv2/highgui.hpp>
+#include <opencv2/imgproc.hpp>
+
+/** @defgroup dnn_objdetect DNN used for object detection
+*/
+
+namespace cv
+{
+namespace dnn_objdetect
+{
+
+    //! @addtogroup dnn_objdetect
+    //! @{
+
+    /** @brief Structure to hold the details pertaining to a single bounding box
+     */
+    typedef struct
+    {
+      int xmin, xmax;
+      int ymin, ymax;
+      size_t class_idx;
+      std::string label_name;
+      double class_prob;
+    } object;
+
+
+    /** @brief A class to post process model predictions
+     */
+    class CV_EXPORTS InferBbox
+    {
+      public:
+        /** @brief Constructor
+        @param _delta_bbox Blob containing relative coordinates of bounding boxes
+        @param _class_scores Blob containing the probability values of each class
+        @param _conf_scores Blob containing the confidence scores
+         */
+        InferBbox(Mat _delta_bbox, Mat _class_scores, Mat _conf_scores);
+
+        /** @brief Filters the bounding boxes.
+         */
+        void filter(double thresh =  0.8);
+
+        /** @brief Vector which holds the final detections of the model
+         */
+        std::vector<object> detections;
+
+      protected:
+        /** @brief Transform relative coordinates from ConvDet to bounding box coordinates
+        @param bboxes Vector to hold the predicted bounding boxes
+         */
+        void transform_bboxes(std::vector<std::vector<double> > *bboxes);
+
+        /** @brief Computes final probability values of each bounding box
+        @param final_probs Vector to hold the probability values
+         */
+        void final_probability_dist(std::vector<std::vector<double> > *final_probs);
+
+        /** @brief Transform bounding boxes from [x, y, h, w] to [xmin, ymin, xmax, ymax]
+        @param pre Vector containing initial co-ordinates
+        @param post Vector containing the transformed co-ordinates
+         */
+        void transform_bboxes_inv(std::vector<std::vector<double> > *pre,
+                                  std::vector<std::vector<double> > *post);
+
+        /** @brief Ensures that the bounding box values are within image boundaries
+        @param min_max_boxes Vector containing bounding boxes of the form [xmin, ymin, xmax, ymax]
+         */
+        void assert_predictions(std::vector<std::vector<double> > *min_max_boxes);
+
+        /** @brief Filter top `n` predictions
+        @param probs Final probability values of bounding boxes
+        @param boxes Predicted bounding box co-ordinates
+        @param top_n_boxes Contains bounding box co-ordinates of top `n` boxes
+        @param top_n_idxs Contains class indices of top `n` bounding boxes
+        @param top_n_probs Contains probability values of top `n` bounding boxes
+         */
+        void filter_top_n(std::vector<std::vector<double> > *probs,
+                          std::vector<std::vector<double> > *boxes,
+                          std::vector<std::vector<double> > &top_n_boxes,
+                          std::vector<size_t> &top_n_idxs,
+                          std::vector<double> &top_n_probs);
+
+        /** @brief Wrapper to apply Non-Maximal Suppression
+        @param top_n_boxes Contains bounding box co-ordinates of top `n` boxes
+        @param top_n_idxs Contains class indices of top `n` bounding boxes
+        @param top_n_probs Contains probability values of top `n` bounding boxes
+         */
+        void nms_wrapper(std::vector<std::vector<double> > &top_n_boxes,
+                         std::vector<size_t> &top_n_idxs,
+                         std::vector<double> &top_n_probs);
+
+       /** @brief Applies Non-Maximal Suppression
+       @param boxes Bounding box co-ordinates belonging to one class
+       @param probs Probability values of boxes belonging to one class
+        */
+        std::vector<bool> non_maximal_suppression(std::vector<std::vector<double> >
+                                         *boxes, std::vector<double> *probs);
+
+       /** @brief Computes intersection over union of bounding boxes
+       @param boxes Vector of bounding box co-ordinates
+       @param base_box Base box wrt which IOU is calculated
+       @param iou Vector to store IOU values
+        */
+        void intersection_over_union(std::vector<std::vector<double> > *boxes,
+                          std::vector<double> *base_box, std::vector<double> *iou);
+
+        static inline bool comparator (std::pair<double, size_t> l1,
+            std::pair<double, size_t> l2)
+        {
+          return l1.first > l2.first;
+        }
+
+      private:
+        Mat delta_bbox;
+        Mat class_scores;
+        Mat conf_scores;
+
+        unsigned int image_width;
+        unsigned int image_height;
+
+        unsigned int W, H;
+        std::vector<std::vector<double> > anchors_values;
+        std::vector<std::pair<double, double> > anchor_center;
+        std::vector<std::pair<double, double> > anchor_shapes;
+
+        std::vector<std::string> label_map;
+
+        unsigned int num_classes;
+        unsigned int anchors_per_grid;
+        size_t anchors;
+        double intersection_thresh;
+        double nms_intersection_thresh;
+        size_t n_top_detections;
+        double epsilon;
+    };
+
+    //! @}
+} // namespace dnn_objdetect
+} // namespace cv
+#endif
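The header declares `intersection_over_union` and `non_maximal_suppression` without showing their bodies (the diff of `core_detect.cpp` is not included here). The following is a minimal Python sketch of the standard greedy technique those names refer to, assuming boxes in `[xmin, ymin, xmax, ymax]` form. The function names and the exact suppression rule are illustrative assumptions, not the module's code.

```python
def iou(a, b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / float(union) if union > 0 else 0.0


def nms(boxes, probs, iou_thresh):
    """Greedy NMS: returns one keep flag per box, in the input order."""
    # Visit boxes from the most to the least probable.
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    keep = [True] * len(boxes)
    for pos, i in enumerate(order):
        if not keep[i]:
            continue
        # Suppress every lower-ranked box that overlaps box i too much.
        for j in order[pos + 1:]:
            if keep[j] and iou(boxes[i], boxes[j]) > iou_thresh:
                keep[j] = False
    return keep
```

Returning a list of flags mirrors the `std::vector<bool>` return type of `non_maximal_suppression`; per the header comments, the real method is applied to the boxes of one class at a time.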
--- a/modules/dnn_objdetect/samples/data/README.md
+++ b/modules/dnn_objdetect/samples/data/README.md
+# Object Detection using Convolutional Neural Networks
+
+- These files include model weights, model definition files, model deploy files for two trained networks.
+
+### Network 1
+- SqueezeNet model trained on ImageNet 2012 Dataset
+
+### Network 2
+- SqueezeDet model trained on PASCAL VOC Dataset
--- a/modules/dnn_objdetect/samples/data/SqueezeDet_deploy.prototxt
+++ b/modules/dnn_objdetect/samples/data/SqueezeDet_deploy.prototxt
--- a/modules/dnn_objdetect/samples/data/SqueezeDet_solver.prototxt
+++ b/modules/dnn_objdetect/samples/data/SqueezeDet_solver.prototxt
+# Training and Testing protocol for Object Detection
+base_lr: 0.000001
+display: 1
+max_iter: 100000
+lr_policy: "step"
+gamma: 0.5
+stepsize: 100000
+momentum: 0.9
+weight_decay: 0.0002
+snapshot: 1000
+snapshot_prefix: "snapshot"
+solver_mode: GPU
+net: "SqueezeDet_train_test.prototxt"
--- a/modules/dnn_objdetect/samples/data/SqueezeDet_train_test.prototxt
+++ b/modules/dnn_objdetect/samples/data/SqueezeDet_train_test.prototxt
--- a/modules/dnn_objdetect/samples/data/SqueezeNet_deploy.prototxt
+++ b/modules/dnn_objdetect/samples/data/SqueezeNet_deploy.prototxt
--- a/modules/dnn_objdetect/samples/data/SqueezeNet_solver.prototxt
+++ b/modules/dnn_objdetect/samples/data/SqueezeNet_solver.prototxt
+# Solver for SqueezeNet Model
+test_iter: 1000
+test_interval: 1000
+base_lr: 0.03
+display: 1
+max_iter: 1500000
+lr_policy: "step"
+gamma: 0.5
+stepsize: 100000
+momentum: 0.9
+weight_decay: 0.0002
+snapshot: 1000
+snapshot_prefix: "snapshot"
+solver_mode: GPU
+net: "SqueezeNet_train_test.prototxt"
+random_seed: 42
+average_loss: 80
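Both solver files use `lr_policy: "step"`. In Caffe this policy multiplies the base learning rate by `gamma` once every `stepsize` iterations. A small Python sketch of that schedule follows; the function name `step_lr` is hypothetical, and the values in the usage note are taken from the solver files above.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under Caffe's "step" policy:
    base_lr * gamma ^ floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)
```

For the SqueezeNet solver (`base_lr: 0.03`, `gamma: 0.5`, `stepsize: 100000`), the rate at iteration 250000 has been halved twice. In the SqueezeDet solver `stepsize` equals `max_iter`, so under this rule the rate stays at `base_lr` for the whole run.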
--- a/modules/dnn_objdetect/samples/data/SqueezeNet_train_test.prototxt
+++ b/modules/dnn_objdetect/samples/data/SqueezeNet_train_test.prototxt
--- a/modules/dnn_objdetect/samples/image_classification.cpp
+++ b/modules/dnn_objdetect/samples/image_classification.cpp
+#include <opencv2/imgproc.hpp>
+#include <opencv2/highgui.hpp>
+#include <opencv2/dnn.hpp>
+
+#include <iostream>
+#include <cstdlib>
+
+int main(int argc, char **argv)
+{
+
+    if (argc < 4)
+    {
+      std::cerr << "Usage " << argv[0] << ": "
+                << "<model-definition-file> " << " "
+                << "<model-weights-file> " << " "
+                << "<test-image>\n";
+      return -1;
+
+    }
+    cv::String model_prototxt = argv[1];
+    cv::String model_binary = argv[2];
+    cv::String test_image = argv[3];
+    cv::dnn::Net net = cv::dnn::readNetFromCaffe(model_prototxt, model_binary);
+
+    if (net.empty())
+    {
+        std::cerr << "Couldn't load the model !\n";
+        return -2;
+    }
+    cv::Mat img = cv::imread(test_image);
+    if (img.empty())
+    {
+        std::cerr << "Couldn't load image: " << test_image << "\n";
+        return -3;
+    }
+
+    cv::Mat input_blob = cv::dnn::blobFromImage(
+      img, 1.0, cv::Size(416, 416), cv::Scalar(104, 117, 123), false);
+
+    cv::Mat prob;
+    cv::TickMeter t;
+
+    net.setInput(input_blob);
+    t.start();
+    prob = net.forward("predictions");
+    t.stop();
+
+    int prob_size[3] = {1000, 1, 1};
+    cv::Mat prob_data(3, prob_size, CV_32F, prob.ptr<float>(0));
+
+    double max_prob = -1.0;
+    int class_idx = -1;
+    for (int idx = 0; idx < prob.size[1]; ++idx)
+    {
+        double current_prob = prob_data.at<float>(idx, 0, 0);
+        if (current_prob > max_prob)
+        {
+          max_prob = current_prob;
+          class_idx = idx;
+        }
+    }
+    std::cout << "Best class Index: " << class_idx << "\n";
+    std::cout << "Time taken: " << t.getTimeSec() << "\n";
+    std::cout << "Probability: " << max_prob * 100.0 << "\n";
+
+    return 0;
+}
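The sample above reports only the single best class, while the README quotes both top-1 and top-5 accuracy. As an illustration of how the same probability vector could be ranked for a top-`k` readout, here is a short Python sketch; `top_k` is a hypothetical helper and is not part of the sample.

```python
def top_k(probs, k=5):
    """Return the k most probable (class_index, probability) pairs,
    sorted from most to least probable."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

With `k=1` this reduces to the argmax loop in the sample.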
--- a/modules/dnn_objdetect/samples/obj_detect.cpp
+++ b/modules/dnn_objdetect/samples/obj_detect.cpp
+#include <opencv2/dnn.hpp>
+#include <opencv2/imgproc.hpp>
+#include <opencv2/highgui.hpp>
+
+#include <fstream>
+#include <iostream>
+#include <cstdlib>
+
+#include <opencv2/core_detect.hpp>
+
+using namespace cv;
+using namespace std;
+using namespace cv::dnn;
+using namespace cv::dnn_objdetect;
+
+int main(int argc, char **argv)
+{
+    if (argc < 4)
+    {
+        std::cerr << "Usage " << argv[0] << ": "
+                  << "<model-definition-file> "
+                  << "<model-weights-file> "
+                  << "<test-image> "
+                  << "<threshold>(optional)\n";
+        return -1;
+    }
+
+    std::string model_prototxt = argv[1];
+    std::string model_binary = argv[2];
+    std::string test_input_image = argv[3];
+    double threshold = 0.7;
+
+    if (argc == 5)
+    {
+      threshold = atof(argv[4]);
+      if (threshold > 1.0 || threshold < 0.0)
+      {
+        std::cerr << "Threshold should belong to [0, 1]\n";
+        return -1;
+      }
+    }
+
+    // Load the network
+    std::cout << "Loading the network...\n";
+    Net net = dnn::readNetFromCaffe(model_prototxt, model_binary);
+    if (net.empty())
+    {
+       std::cerr << "Couldn't load the model !\n";
+       return -2;
+    }
+    else
+    {
+      std::cout << "Done loading the network !\n\n";
+    }
+
+    // Load the test image
+    Mat img = cv::imread(test_input_image);
+    Mat original_img(img);
+    if (img.empty())
+    {
+        std::cerr << "Couldn't load image: " << test_input_image << "\n";
+        return -3;
+    }
+
+    cv::namedWindow("Initial Image", WINDOW_AUTOSIZE);
+    cv::imshow("Initial Image", img);
+
+    cv::resize(img, img, cv::Size(416, 416));
+    Mat img_copy(img);
+    img.convertTo(img, CV_32FC3);
+    Mat input_blob = blobFromImage(img, 1.0, Size(), cv::Scalar(104, 117, 123), false);
+
+    // Names of the three output layers (the input blob is set before each forward pass below)
+    std::cout << "Getting the output of all the three blobs...\n";
+    std::vector<Mat> outblobs(3);
+    std::vector<cv::String> out_layers;
+    out_layers.push_back("slice");
+    out_layers.push_back("softmax");
+    out_layers.push_back("sigmoid");
+
+    // Bbox delta blob
+    std::vector<Mat> temp_blob;
+    net.setInput(input_blob);
+    cv::TickMeter t;
+
+    t.start();
+    net.forward(temp_blob, out_layers[0]);
+    t.stop();
+    outblobs[0] = temp_blob[2];
+
+    // class_scores blob
+    net.setInput(input_blob);
+    t.start();
+    outblobs[1] = net.forward(out_layers[1]);
+    t.stop();
+
+    // conf_scores blob
+    net.setInput(input_blob);
+    t.start();
+    outblobs[2] = net.forward(out_layers[2]);
+    t.stop();
+
+    // Check that the blobs are valid
+    for (size_t i = 0; i < outblobs.size(); ++i)
+    {
+        if (outblobs[i].empty())
+        {
+          std::cerr << "Blob: " << i << " is empty !\n";
+        }
+    }
+
+    int delta_bbox_size[3] = {23, 23, 36};
+    Mat delta_bbox(3, delta_bbox_size, CV_32F, outblobs[0].ptr<float>());
+
+    int class_scores_size[2] = {4761, 20};
+    Mat class_scores(2, class_scores_size, CV_32F, outblobs[1].ptr<float>());
+
+    int conf_scores_size[3] = {23, 23, 9};
+    Mat conf_scores(3, conf_scores_size, CV_32F, outblobs[2].ptr<float>());
+
+    InferBbox inf(delta_bbox, class_scores, conf_scores);
+    inf.filter(threshold);
+
+
+    double average_time = t.getTimeSec() / t.getCounter();
+    std::cout << "\nTotal objects detected: " << inf.detections.size()
+              << " in " << average_time << " seconds\n";
+    std::cout << "------\n";
+    float x_ratio = (float)original_img.cols / img_copy.cols;
+    float y_ratio = (float)original_img.rows / img_copy.rows;
+    for (size_t i = 0; i < inf.detections.size(); ++i)
+    {
+
+      int xmin = inf.detections[i].xmin;
+      int ymin = inf.detections[i].ymin;
+      int xmax = inf.detections[i].xmax;
+      int ymax = inf.detections[i].ymax;
+      cv::String class_name = inf.detections[i].label_name;
+      std::cout << "Class: " << class_name << "\n"
+                << "Probability: " << inf.detections[i].class_prob << "\n"
+                << "Co-ordinates: " << inf.detections[i].xmin << " "
+                << inf.detections[i].ymin << " "
+                << inf.detections[i].xmax << " "
+                << inf.detections[i].ymax << "\n";
+      std::cout << "------\n";
+      // Draw the corresponding bounding box(s)
+      cv::rectangle(original_img, cv::Point((int)(xmin * x_ratio), (int)(ymin * y_ratio)),
+          cv::Point((int)(xmax * x_ratio), (int)(ymax * y_ratio)), cv::Scalar(255, 0, 0), 2);
+      cv::putText(original_img, class_name, cv::Point((int)(xmin * x_ratio), (int)(ymin * y_ratio)),
+        cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255, 0, 0), 1);
+    }
+
+    try
+    {
+      cv::namedWindow("Final Detections", WINDOW_AUTOSIZE);
+      cv::imshow("Final Detections", original_img);
+      cv::imwrite("image.png", original_img);
+      cv::waitKey(0);
+    }
+    catch (const cv::Exception& e)
+    {
+      std::cerr << e.what() << "\n";
+      return -4;
+    }
+
+    return 0;
+}
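The hard-coded blob sizes in the sample (`{23, 23, 36}`, `{4761, 20}`, `{23, 23, 9}`) are mutually consistent if one reads them as a 23 x 23 grid with 9 anchors per cell, 4 box deltas per anchor and 20 classes. That reading is an assumption based on the numbers and on the `anchors_per_grid` and `num_classes` members in the header. The Python sketch below spells out the arithmetic and the scaling of a box from the resized input back to the original image; both helper names are hypothetical.

```python
def squeezedet_blob_shapes(grid=23, anchors_per_cell=9, num_classes=20):
    """Shapes of the three output blobs under the assumed layout."""
    total_anchors = grid * grid * anchors_per_cell
    return {
        "delta_bbox": (grid, grid, anchors_per_cell * 4),
        "class_scores": (total_anchors, num_classes),
        "conf_scores": (grid, grid, anchors_per_cell),
    }


def rescale_box(box, x_ratio, y_ratio):
    """Map (xmin, ymin, xmax, ymax) from the resized input back to the
    original image, truncating to integers as the sample's casts do."""
    xmin, ymin, xmax, ymax = box
    return (int(xmin * x_ratio), int(ymin * y_ratio),
            int(xmax * x_ratio), int(ymax * y_ratio))
```

Here `x_ratio` and `y_ratio` play the same role as in the sample: original size divided by resized size along each axis.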
--- a/modules/dnn_objdetect/scripts/k_means.py
+++ b/modules/dnn_objdetect/scripts/k_means.py
+import argparse
+import sys
+import os
+import time
+import numpy as np
+from sklearn.cluster import KMeans
+import matplotlib.pyplot as plt
+
+
+def k_means(K, data, max_iter, n_jobs, image_file):
+  X = np.array(data)
+  np.random.shuffle(X)
+  begin = time.time()
+  print 'Running kmeans'
+  kmeans = KMeans(n_clusters=K, max_iter=max_iter, n_jobs=n_jobs, verbose=1).fit(X)
+  print 'K-Means took {} seconds to complete'.format(time.time()-begin)
+  step_size = 0.2
+  xmin, xmax = X[:, 0].min()-1, X[:, 0].max()+1
+  ymin, ymax = X[:, 1].min()-1, X[:, 1].max()+1
+  xx, yy = np.meshgrid(np.arange(xmin, xmax, step_size), np.arange(ymin, ymax, step_size))
+  preds = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
+  preds = preds.reshape(xx.shape)
+
+  plt.figure()
+  plt.clf()
+  plt.imshow(preds, interpolation='nearest', extent=(xx.min(), xx.max(), yy.min(), yy.max()), cmap=plt.cm.Paired, aspect='auto', origin='lower')
+  plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)
+  centroids = kmeans.cluster_centers_
+  plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=169, linewidths=5, color='r', zorder=10)
+  plt.title("Anchor shapes generated using K-Means")
+  plt.xlim(xmin, xmax)
+  plt.ylim(ymin, ymax)
+  print 'Mean centroids are:'
+  for i, center in enumerate(centroids):
+    print '{}: {}, {}'.format(i, center[0], center[1])
+  # plt.xticks(())
+  # plt.yticks(())
+  plt.savefig(image_file)
+  plt.show()
+
+def pre_process(directory, data_list):
+  if not os.path.exists(directory):
+    print "Path {} doesn't exist".format(directory)
+    return
+  files = os.listdir(directory)
+  print 'Loading data...'
+  for i, f in enumerate(files):
+    # Progress bar
+    sys.stdout.write('\r')
+    percentage = (i+1.0) / len(files)
+    progress = int(percentage * 30)
+    bar = [progress*'=', ' '*(29-progress), percentage*100]
+    sys.stdout.write('[{}>{}]  {:.0f}%'.format(*bar))
+    sys.stdout.flush()
+
+    with open(directory+"/"+f, 'r') as ann:
+      l = ann.readline()
+      l = l.rstrip()
+      l = l.split(' ')
+      l = [float(i) for i in l]
+      if len(l) % 5 != 0:
+        sys.stderr.write('File {} contains incorrect number of annotations'.format(f))
+        return
+      num_objs = len(l) / 5
+      for obj in range(num_objs):
+        xmin = l[obj * 5 + 0]
+        ymin = l[obj * 5 + 1]
+        xmax = l[obj * 5 + 2]
+        ymax = l[obj * 5 + 3]
+        w = xmax - xmin
+        h = ymax - ymin
+        data_list.append([w, h])
+        if w > 1000 or h > 1000:
+          sys.stdout.write("[{}, {}]".format(w, h))
+  sys.stdout.write('\nProcessed {} files containing {} objects'.format(len(files), len(data_list)))
+  return data_list
+
+def main():
+  parser = argparse.ArgumentParser("Parse hyperparameters")
+  parser.add_argument("clusters", help="Number of clusters", type=int)
+  parser.add_argument("dir", help="Directory containing annotations")
+  parser.add_argument("image_file", help="File to generate the final cluster of image")
+  parser.add_argument('-jobs', help="Number of jobs for parallel computation", default=1)
+  parser.add_argument('-iter', help="Max Iterations to run algorithm for", default=1000)
+
+  p = parser.parse_args(sys.argv[1:])
+  K = p.clusters
+  directory = p.dir
+  data_list = []
+  pre_process(directory, data_list)
+  sys.stdout.write('\nDone collecting data\n')
+  k_means(K, data_list, int(p.iter), int(p.jobs), p.image_file)
+  print 'Done !'
+
+if __name__=='__main__':
+  try:
+    main()
+  except Exception as E:
+    print E
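The script delegates clustering to scikit-learn. To make the underlying idea concrete, here is a dependency-free Python sketch of the two steps: turning an annotation line (five values per object, the first four being `xmin ymin xmax ymax`) into `(w, h)` pairs, and running Lloyd's algorithm on them. Using the first `k` points as initial centroids is an assumption made to keep the sketch deterministic; scikit-learn's `KMeans` initializes differently, so results on real data will not match exactly.

```python
def box_sizes(annotation_line):
    """Parse 'xmin ymin xmax ymax label ...' into a list of (w, h) pairs."""
    values = [float(v) for v in annotation_line.split()]
    if len(values) % 5 != 0:
        raise ValueError("expected 5 values per object")
    return [(values[i + 2] - values[i], values[i + 3] - values[i + 1])
            for i in range(0, len(values), 5)]


def k_means(points, k, max_iter=100):
    """Lloyd's algorithm on 2-D points; returns k centroids."""
    # Assumption: deterministic init from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for w, h in points:
            dists = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centroids]
            clusters[dists.index(min(dists))].append((w, h))
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                n = float(len(members))
                new_centroids.append((sum(m[0] for m in members) / n,
                                      sum(m[1] for m in members) / n))
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

The resulting centroids are the anchor widths and heights, matching the plot title "Anchor shapes generated using K-Means" in the script.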
--- a/modules/dnn_objdetect/scripts/pascal_preprocess.py
+++ b/modules/dnn_objdetect/scripts/pascal_preprocess.py
+from skimage import io, transform
+from multiprocessing.dummy import Pool as ThreadPool
+
+def rescale(root_new, root_old, img_path, ann_path, out_shape):
+  try:
+    img = io.imread(root_old+"/"+img_path)
+  except Exception as E:
+    print E
+    return
+  h, w, _ = img.shape
+  f_h, f_w = float(out_shape)/h, float(out_shape)/w
+  trans_img = transform.rescale(img, (f_h, f_w))
+  num_objs = 0
+  with open(root_old+"/"+ann_path, 'r') as f:
+    ann = f.readline()
+    ann = ann.rstrip()
+    ann = ann.split(' ')
+    ann = [float(i) for i in ann]
+    num_objs = len(ann) / 5
+    for idx in xrange(num_objs):
+      ann[idx * 5 + 0] = int(f_w * ann[idx * 5 + 0])
+      ann[idx * 5 + 1] = int(f_h * ann[idx * 5 + 1])
+      ann[idx * 5 + 2] = int(f_w * ann[idx * 5 + 2])
+      ann[idx * 5 + 3] = int(f_h * ann[idx * 5 + 3])
+    # Write the new annotations to file
+    with open(root_new+"/"+ann_path, 'w') as f_new:
+      for val in ann:
+        f_new.write(str(val)+' ')
+  # Save the new image
+  io.imsave(root_new+"/"+img_path, trans_img)
+
+def preprocess():
+  source = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012_Resize/source.txt'
+  root_old = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012'
+  root_new = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012_Resize'
+  out_shape = 416
+  with open(source, 'r') as src:
+    lines = src.readlines()
+    print 'Processing {} images and annotations'.format(len(lines))
+    for line in lines:
+      line = line.rstrip()
+      line = line.split(' ')
+      img_path = line[0]
+      ann_path = line[1]
+      rescale(root_new, root_old, img_path, ann_path, out_shape)
+
+if __name__ == '__main__':
+  preprocess()
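The annotation update in `rescale` can be isolated from the image I/O. The Python sketch below applies the same per-axis scale factors to a flat annotation list and leaves every fifth value untouched (presumably the class label, since the script never modifies it). The function name `rescale_annotation` is hypothetical.

```python
def rescale_annotation(ann, width, height, out_size=416):
    """Scale [xmin, ymin, xmax, ymax, label, ...] from a width x height
    image to an out_size x out_size image."""
    f_w = float(out_size) / width
    f_h = float(out_size) / height
    scaled = list(ann)
    for idx in range(len(ann) // 5):
        scaled[idx * 5 + 0] = int(f_w * ann[idx * 5 + 0])
        scaled[idx * 5 + 1] = int(f_h * ann[idx * 5 + 1])
        scaled[idx * 5 + 2] = int(f_w * ann[idx * 5 + 2])
        scaled[idx * 5 + 3] = int(f_h * ann[idx * 5 + 3])
    return scaled
```

Because the two factors differ whenever the source image is not square, boxes are stretched along with the image, just as in the script.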
--- a/modules/dnn_objdetect/src/core_detect.cpp
+++ b/modules/dnn_objdetect/src/core_detect.cpp
--- a/modules/dnn_objdetect/src/precomp.hpp
+++ b/modules/dnn_objdetect/src/precomp.hpp
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+#ifndef __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
+#define __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
+
+#include <iostream>
+#include <vector>
+#include <memory>
+#include <string>
+#include <map>
+#include <numeric>
+#include <algorithm>
+
+#include "opencv2/core.hpp"
+#include "opencv2/dnn.hpp"
+
+#endif // __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
--- a/modules/dnn_objdetect/tutorials/dnn_objdetect_tutorial.markdown
+++ b/modules/dnn_objdetect/tutorials/dnn_objdetect_tutorial.markdown
+Object Detection using CNNs {#tutorial_dnn_objdetect}
+===========================
+
+# Building
+
+Build samples of the "dnn_objdetect" module. Refer to OpenCV build tutorials for details.
+Enable `BUILD_EXAMPLES=ON` CMake option and build these targets (Linux):
+- example_dnn_objdetect_image_classification
+- example_dnn_objdetect_obj_detect
+
+Download the weights file and model definition file from `opencv_extra/dnn_objdetect`
+
+
+# Object Detection
+
+```bash
+example_dnn_objdetect_obj_detect  <model-definition-file>  <model-weights-file>  <test-image>
+```
+
+All the following examples were run on a laptop with `Intel(R) Core(TM)2 i3-4005U CPU @ 1.70GHz` (without GPU).
+
+The model is fast: on this CPU it takes `0.172091` seconds on average to predict multiple bounding boxes.
+
+```bash
+<bin_path>/example_dnn_objdetect_obj_detect  SqueezeDet_deploy.prototxt  SqueezeDet.caffemodel  tutorials/images/aeroplane.jpg
+
+Total objects detected: 1 in 0.168792 seconds
+------
+Class: aeroplane
+Probability: 0.845181
+Co-ordinates: 41 116 415 254
+------
+```
+
+![Train_Dets](images/aero_det.jpg)
+
+
+```bash
+<bin_path>/example_dnn_objdetect_obj_detect  SqueezeDet_deploy.prototxt  SqueezeDet.caffemodel  tutorials/images/bus.jpg
+
+Total objects detected: 1 in 0.201276 seconds
+------
+Class: bus
+Probability: 0.701829
+Co-ordinates: 0 32 415 244
+------
+```
+
+![Train_Dets](images/bus_det.jpg)
+
+```bash
+<bin_path>/example_dnn_objdetect_obj_detect  SqueezeDet_deploy.prototxt  SqueezeDet.caffemodel  tutorials/images/cat.jpg
+
+Total objects detected: 1 in 0.190335 seconds
+------
+Class: cat
+Probability: 0.703465
+Co-ordinates: 34 0 381 282
+------
+```
+
+![Train_Dets](images/cat_det.jpg)
+
+```bash
+<bin_path>/example_dnn_objdetect_obj_detect  SqueezeDet_deploy.prototxt  SqueezeDet.caffemodel  tutorials/images/persons_multi.jpg
+
+Total objects detected: 2 in 0.169152 seconds
+------
+Class: person
+Probability: 0.737349
+Co-ordinates: 160 67 313 363
+------
+Class: person
+Probability: 0.720328
+Co-ordinates: 187 198 222 323
+------
+```
+
+![Train_Dets](images/person_multi_det.jpg)
+
+Go ahead and run the model with other images !
+
+
+## Changing threshold
+
+By default the `obj_detect` sample thresholds the detections at a confidence of `0.7`. The model predicts many bounding boxes before filtering; you can control which of them are kept by passing the optional argument `threshold`:
+
+```bash
+<bin_path>/example_dnn_objdetect_obj_detect  <model-definition-file>  <model-weights-file>  <test-image> <threshold>
+```
+
+Changing the threshold to say `0.0`, produces the following:
+
+![Train_Dets](images/aero_thresh_det.jpg)
+
+That doesn't seem to be that helpful !
+
+# Image Classification
+
+```bash
+example_dnn_objdetect_image_classification  <model-definition-file>  <model-weights-file>  <test-image>
+```
+
+The model is only **4.9MB** in size and takes just **0.136401** seconds to classify the image.
+
+Running the model on examples produces the following results:
+
+```bash
+<bin_path>/example_dnn_objdetect_image_classification  SqueezeNet_deploy.prototxt  SqueezeNet.caffemodel  tutorials/images/aeroplane.jpg
+Best class Index: 404
+Time taken: 0.137722
+Probability: 77.1757
+```
+
+Looking at [synset_words.txt](https://raw.githubusercontent.com/opencv/opencv/3.4.0/samples/data/dnn/synset_words.txt), the predicted class is `airliner`.
+
+
+```bash
+<bin_path>/example_dnn_objdetect_image_classification  SqueezeNet_deploy.prototxt  SqueezeNet.caffemodel  tutorials/images/cat.jpg
+Best class Index: 285
+Time taken: 0.136401
+Probability: 40.7111
+```
+
+This belongs to the class: `Egyptian cat`
+
+```bash
+<bin_path>/example_dnn_objdetect_image_classification  SqueezeNet_deploy.prototxt  SqueezeNet.caffemodel  tutorials/images/space_shuttle.jpg
+Best class Index: 812
+Time taken: 0.137792
+Probability: 15.8467
+```
+
+This belongs to the class: `space shuttle`
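The classification sample prints only a class index, and the tutorial resolves it by hand in `synset_words.txt`. A small Python sketch of that lookup follows, assuming each line of the file has the form `<wordnet-id> <label text>` and that line `i` (counting from zero) corresponds to class index `i`. The helper name `label_for` is hypothetical.

```python
def label_for(synset_lines, class_idx):
    """Return the label text for class_idx from synset_words.txt-style lines."""
    line = synset_lines[class_idx].strip()
    # Split once: the first token is the WordNet id, the rest is the label.
    _wnid, _sep, label = line.partition(" ")
    return label
```

In practice `synset_lines` would come from reading the downloaded file with `open(...).readlines()`, and `class_idx` from the "Best class Index" value printed by the sample.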
--- a/modules/dnn_objdetect/tutorials/images/aero_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/aero_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/aero_thresh_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/aero_thresh_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/aeroplane.jpg
+++ b/modules/dnn_objdetect/tutorials/images/aeroplane.jpg
--- a/modules/dnn_objdetect/tutorials/images/bus.jpg
+++ b/modules/dnn_objdetect/tutorials/images/bus.jpg
--- a/modules/dnn_objdetect/tutorials/images/bus_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/bus_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/cat.jpg
+++ b/modules/dnn_objdetect/tutorials/images/cat.jpg
--- a/modules/dnn_objdetect/tutorials/images/cat_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/cat_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/multi_1_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/multi_1_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/multi_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/multi_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/person.jpg
+++ b/modules/dnn_objdetect/tutorials/images/person.jpg
--- a/modules/dnn_objdetect/tutorials/images/person_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/person_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/person_multi_det.jpg
+++ b/modules/dnn_objdetect/tutorials/images/person_multi_det.jpg
--- a/modules/dnn_objdetect/tutorials/images/persons_multi.jpg
+++ b/modules/dnn_objdetect/tutorials/images/persons_multi.jpg
--- a/modules/dnn_objdetect/tutorials/images/space_shuttle.jpg
+++ b/modules/dnn_objdetect/tutorials/images/space_shuttle.jpg