Commit 41a5a5ea authored by Kv Manohar's avatar Kv Manohar Committed by Alexander Alekhin

Merge pull request #1253 from kvmanohar22:GSoC17_dnn_objdetect

GSoC'17 Learning compact models for object detection (#1253)

* Final solver and model for SqueezeNet model

* update README

* update dependencies and CMakeLists

* add global pooling

* Add training scripts

* fix typo

* fix dependency of caffe

* fix whitespace

* Add squeezedet architecture

* Pascal pre process script

* Adding pre process scripts

* Generate the graph of the model

* more readable

* fix some bugs in the graph

* Post process class implementation

* Complete minimal post processing and standalone running

* Complete the base class

* remove c++11 features and fix bugs

* Complete example

* fix bugs

* Adding final scripts

* Classification scripts

* Update README.md

* Add example code and results

* Update README.md

* Re-order and fix some bugs

* fix build failure

* Document classes and functions

* Add instructions on how to use samples

* update instructions

* fix docs failure

* fix conversion types

* fix type conversion warning

* Change examples to sample directory

* restructure directories

* add more references

* fix whitespace

* retain aspect ratio

* Add more examples

* fix docs warnings

* update with links to trained weights

* threshold update

* png -> jpg

* fix tutorial

* model files

* precomp.hpp, fix readme links, module dependencies

* copyrights

- no copyright in samples
- use new style OpenCV copyright header
- precomp.hpp
parent c0b298c5
......@@ -22,6 +22,8 @@ $ cmake -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -D BUILD_opencv_<r
- **datasets**: Datasets Reader -- Code for reading existing computer vision databases and samples of using the readers to train, test and run using that dataset's data.
- **dnn_objdetect**: Object Detection using CNNs -- Implements compact CNN Model for object detection. Trained using Caffe but uses opencv_dnn module.
- **dnns_easily_fooled**: Subvert DNNs -- This code can use the activations in a network to fool the networks into recognizing something else.
- **dpm**: Deformable Part Model -- Felzenszwalb's Cascade with deformable parts object recognition code.
......
set(the_description "Object Detection using CNNs")
ocv_define_module(dnn_objdetect opencv_core opencv_imgproc opencv_dnn
OPTIONAL opencv_highgui opencv_imgcodecs # samples
)
# Object Detection using Convolutional Neural Networks
This module uses Convolutional Neural Networks to detect objects in an image.
## Dependencies
- opencv dnn module
- Google Protobuf
## Building this module
Run the following command to build this module:
```make
cmake -DOPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules -DBUILD_opencv_dnn_objdetect=ON <opencv_source_dir>
```
## Models
Two trained models are provided.
#### SqueezeNet model trained for Image Classification.
- This model was trained for 1500000 iterations with a batch size of 16
- Size of Model: 4.9MB
- Top-1 Accuracy on ImageNet 2012 DataSet: 56.10%
- Top-5 Accuracy on ImageNet 2012 DataSet: 79.54%
- Link to trained weights: [here](https://github.com/kvmanohar22/caffe/blob/obj_detect_loss/proto/SqueezeNet.caffemodel) ([copy](https://github.com/opencv/opencv_3rdparty/tree/dnn_objdetect_20170827))
#### SqueezeDet model trained for Object Detection
- This model was trained for 180000 iterations with a batch size of 16
- Size of the Model: 14.2MB
- Link to the trained weights: [here](https://github.com/kvmanohar22/caffe/blob/obj_detect_loss/proto/SqueezeDet.caffemodel) ([copy](https://github.com/opencv/opencv_3rdparty/tree/dnn_objdetect_20170827))
## Usage
#### With Caffe
For details pertaining to the usage of the model, have a look at [this repository](https://github.com/kvmanohar22/caffe).
In fact, you can train your own object detection models with the implemented loss function.
#### Without Caffe, using OpenCV's `dnn` module
`tutorials/core_detect.cpp` gives an example of how to use the model to predict the bounding boxes.
`tutorials/image_classification.cpp` gives an example of how to use the model to classify an image.
Here is a brief summary of the examples. For detailed usage and testing, refer to the `tutorials` directory.
## Examples:
### Image Classification
```c++
// Read the net along with its trained weights
cv::dnn::Net net = cv::dnn::readNetFromCaffe(model_defn, model_weights);
// Read an image
cv::Mat image = cv::imread(image_file);
// Convert the image into a blob
cv::Mat image_blob = cv::dnn::blobFromImage(image);
// Set the blob as the network input
net.setInput(image_blob);
// Get the output of the "predictions" layer
cv::Mat probs = net.forward("predictions");
```
`probs` is a 4-D blob of shape `[1, 1000, 1, 1]`, obtained after the application of the `softmax` activation.
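As a quick illustration (a sketch, not part of the original sample: the class id is taken along the second blob dimension, and mapping it to a human-readable label assumes an external ImageNet label file), the top-1 prediction can be extracted as follows:
```c++
// Locate the maximum probability in the [1, 1000, 1, 1] blob
double top_prob = 0.0;
int top_idx[4] = {0, 0, 0, 0};     // one index per blob dimension
cv::minMaxIdx(probs, 0, &top_prob, 0, top_idx);
int predicted_class = top_idx[1];  // class id lies along the channel dimension
```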
### Object Detection
```c++
// Reading the network and weights and converting the image to a blob
// are the same as in the Image Classification example.
// Forward through the network and collect the output blobs
std::vector<cv::Mat> slice_outs;
net.forward(slice_outs, "slice");          // tops: soft_class_reg, sig_conf_reg, delta_bbox
cv::Mat delta_bbox   = slice_outs[2];
cv::Mat class_scores = net.forward("softmax");
cv::Mat conf_scores  = net.forward("sigmoid");
```
The three blobs `delta_bbox`, `class_scores`, and `conf_scores` are post-processed by the `cv::dnn_objdetect::InferBbox` class, which predicts the final bounding boxes.
```c++
InferBbox infer(delta_bbox, class_scores, conf_scores);
infer.filter();
```
`infer.filter()` populates `infer.detections`, a vector of `cv::dnn_objdetect::object` predictions. Here `cv::dnn_objdetect::object` is a structure containing the following elements.
```c++
typedef struct {
int xmin, xmax;
int ymin, ymax;
size_t class_idx;
std::string label_name;
double class_prob;
} object;
```
For further details on post-processing, refer to this detailed [blog post](https://kvmanohar22.github.io/GSoC/).
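As a small usage sketch (not part of the original sample; it assumes `image` is the `cv::Mat` read earlier and that the label map has been loaded so `label_name` is populated), the filtered detections can be drawn like this:
```c++
// Draw each filtered detection with its label on the input image
for (size_t i = 0; i < infer.detections.size(); ++i)
{
  cv::dnn_objdetect::object det = infer.detections[i];
  cv::rectangle(image, cv::Point(det.xmin, det.ymin),
                cv::Point(det.xmax, det.ymax), cv::Scalar(0, 255, 0), 2);
  cv::putText(image, det.label_name, cv::Point(det.xmin, det.ymin - 5),
              cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0));
}
cv::imwrite("detections.jpg", image);  // output file name is illustrative
```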
## Results from Object Detection
Refer to the `tutorials` directory for results.
@article{SqueezeNet,
Author = {Forrest N. Iandola and Song Han and Matthew W. Moskewicz and Khalid Ashraf and William J. Dally and Kurt Keutzer},
Title = {SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and $<$0.5MB model size},
Journal = {arXiv:1602.07360},
Year = {2016}
}
@inproceedings{squeezedet,
Author = {Bichen Wu and Forrest Iandola and Peter H. Jin and Kurt Keutzer},
Title = {SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving},
Journal = {arXiv:1612.01051},
Year = {2016}
}
@inproceedings{imagenet_cvpr09,
AUTHOR = {Deng, J. and Dong, W. and Socher, R. and Li, L.-J. and Li, K. and Fei-Fei, L.},
TITLE = {{ImageNet: A Large-Scale Hierarchical Image Database}},
BOOKTITLE = {CVPR09},
YEAR = {2009},
BIBSOURCE = "http://www.image-net.org/papers/imagenet_cvpr09.bib"}
@Article{Everingham10,
author = "Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.",
title = "The Pascal Visual Object Classes (VOC) Challenge",
journal = "International Journal of Computer Vision",
volume = "88",
year = "2010",
number = "2",
month = jun,
pages = "303--338",
}
\ No newline at end of file
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef _OPENCV_DNN_OBJDETECT_CORE_DETECT_HPP_
#define _OPENCV_DNN_OBJDETECT_CORE_DETECT_HPP_
#include <vector>
#include <memory>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
/** @defgroup dnn_objdetect DNN used for object detection
*/
namespace cv
{
namespace dnn_objdetect
{
//! @addtogroup dnn_objdetect
//! @{
/** @brief Structure to hold the details pertaining to a single bounding box
*/
typedef struct
{
int xmin, xmax;
int ymin, ymax;
size_t class_idx;
std::string label_name;
double class_prob;
} object;
/** @brief A class to post process model predictions
*/
class CV_EXPORTS InferBbox
{
public:
/** @brief Default constructor
@param _delta_bbox Blob containing relative coordinates of bounding boxes
@param _class_scores Blob containing the probability values of each class
@param _conf_scores Blob containing the confidence scores
*/
InferBbox(Mat _delta_bbox, Mat _class_scores, Mat _conf_scores);
/** @brief Filters the bounding boxes.
*/
void filter(double thresh = 0.8);
/** @brief Vector which holds the final detections of the model
*/
std::vector<object> detections;
protected:
/** @brief Transform relative coordinates from ConvDet to bounding box coordinates
@param bboxes Vector to hold the predicted bounding boxes
*/
void transform_bboxes(std::vector<std::vector<double> > *bboxes);
/** @brief Computes final probability values of each bounding box
@param final_probs Vector to hold the probability values
*/
void final_probability_dist(std::vector<std::vector<double> > *final_probs);
/** @brief Transform bounding boxes from [x, y, h, w] to [xmin, ymin, xmax, ymax]
@param pre Vector containing initial co-ordinates
@param post Vector containing the transformed co-ordinates
*/
void transform_bboxes_inv(std::vector<std::vector<double> > *pre,
std::vector<std::vector<double> > *post);
/** @brief Ensures that the bounding box values are within image boundaries
@param min_max_boxes Vector containing bounding boxes of the form [xmin, ymin, xmax, ymax]
*/
void assert_predictions(std::vector<std::vector<double> > *min_max_boxes);
/** @brief Filter top `n` predictions
@param probs Final probability values of bounding boxes
@param boxes Predicted bounding box co-ordinates
@param top_n_boxes Contains bounding box co-ordinates of top `n` boxes
@param top_n_idxs Contains class indices of top `n` bounding boxes
@param top_n_probs Contains probability values of top `n` bounding boxes
*/
void filter_top_n(std::vector<std::vector<double> > *probs,
std::vector<std::vector<double> > *boxes,
std::vector<std::vector<double> > &top_n_boxes,
std::vector<size_t> &top_n_idxs,
std::vector<double> &top_n_probs);
/** @brief Wrapper to apply Non-Maximal Suppression
@param top_n_boxes Contains bounding box co-ordinates of top `n` boxes
@param top_n_idxs Contains class indices of top `n` bounding boxes
@param top_n_probs Contains probability values of top `n` bounding boxes
*/
void nms_wrapper(std::vector<std::vector<double> > &top_n_boxes,
std::vector<size_t> &top_n_idxs,
std::vector<double> &top_n_probs);
/** @brief Applies Non-Maximal Suppression
@param boxes Bounding box co-ordinates belonging to one class
@param probs Probability values of boxes belonging to one class
*/
std::vector<bool> non_maximal_suppression(std::vector<std::vector<double> >
*boxes, std::vector<double> *probs);
/** @brief Computes intersection over union of bounding boxes
@param boxes Vector of bounding box co-ordinates
@param base_box Base box wrt which IOU is calculated
@param iou Vector to store IOU values
*/
void intersection_over_union(std::vector<std::vector<double> > *boxes,
std::vector<double> *base_box, std::vector<double> *iou);
static inline bool comparator (std::pair<double, size_t> l1,
std::pair<double, size_t> l2)
{
return l1.first > l2.first;
}
private:
Mat delta_bbox;
Mat class_scores;
Mat conf_scores;
unsigned int image_width;
unsigned int image_height;
unsigned int W, H;
std::vector<std::vector<double> > anchors_values;
std::vector<std::pair<double, double> > anchor_center;
std::vector<std::pair<double, double> > anchor_shapes;
std::vector<std::string> label_map;
unsigned int num_classes;
unsigned int anchors_per_grid;
size_t anchors;
double intersection_thresh;
double nms_intersection_thresh;
size_t n_top_detections;
double epsilon;
};
//! @}
} // namespace dnn_objdetect
} // namespace cv
#endif
# Object Detection using Convolutional Neural Networks
- These files include the model weights, model definition files, and deploy files for two trained networks.
### Network 1
- SqueezeNet model trained on ImageNet 2012 Dataset
### Network 2
- SqueezeDet model trained on PASCAL VOC Dataset
# SqueezeDet architecture for object detection on PASCAL VOC dataset
name: "SqueezeDet"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 416
input_dim: 416
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire2_squeeze"
type: "Convolution"
bottom: "pool1"
top: "fire2_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_squeeze"
type: "ReLU"
bottom: "fire2_squeeze"
top: "fire2_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire2_expand_1x1"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_1x1"
type: "ReLU"
bottom: "fire2_expand_1x1"
top: "fire2_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire2_expand_3x3"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_3x3"
type: "ReLU"
bottom: "fire2_expand_3x3"
top: "fire2_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire2"
type: "Concat"
bottom: "fire2_expand_1x1"
bottom: "fire2_expand_3x3"
top: "fire2"
concat_param {
axis: 1
}
}
layer {
name: "fire3_squeeze"
type: "Convolution"
bottom: "fire2"
top: "fire3_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_squeeze"
type: "ReLU"
bottom: "fire3_squeeze"
top: "fire3_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire3_expand_1x1"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_1x1"
type: "ReLU"
bottom: "fire3_expand_1x1"
top: "fire3_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire3_expand_3x3"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_3x3"
type: "ReLU"
bottom: "fire3_expand_3x3"
top: "fire3_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire3"
type: "Concat"
bottom: "fire3_expand_1x1"
bottom: "fire3_expand_3x3"
top: "fire3"
concat_param {
axis: 1
}
}
layer {
name: "fire4_squeeze"
type: "Convolution"
bottom: "fire3"
top: "fire4_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_squeeze"
type: "ReLU"
bottom: "fire4_squeeze"
top: "fire4_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire4_expand_1x1"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_1x1"
type: "ReLU"
bottom: "fire4_expand_1x1"
top: "fire4_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire4_expand_3x3"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_3x3"
type: "ReLU"
bottom: "fire4_expand_3x3"
top: "fire4_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire4"
type: "Concat"
bottom: "fire4_expand_1x1"
bottom: "fire4_expand_3x3"
top: "fire4"
concat_param {
axis: 1
}
}
layer {
name: "pool4"
type: "Pooling"
bottom: "fire4"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire5_squeeze"
type: "Convolution"
bottom: "pool4"
top: "fire5_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_squeeze"
type: "ReLU"
bottom: "fire5_squeeze"
top: "fire5_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire5_expand_1x1"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_1x1"
type: "ReLU"
bottom: "fire5_expand_1x1"
top: "fire5_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire5_expand_3x3"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_3x3"
type: "ReLU"
bottom: "fire5_expand_3x3"
top: "fire5_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire5"
type: "Concat"
bottom: "fire5_expand_1x1"
bottom: "fire5_expand_3x3"
top: "fire5"
concat_param {
axis: 1
}
}
layer {
name: "fire6_squeeze"
type: "Convolution"
bottom: "fire5"
top: "fire6_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_squeeze"
type: "ReLU"
bottom: "fire6_squeeze"
top: "fire6_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire6_expand_1x1"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_1x1"
type: "ReLU"
bottom: "fire6_expand_1x1"
top: "fire6_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire6_expand_3x3"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_3x3"
type: "ReLU"
bottom: "fire6_expand_3x3"
top: "fire6_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire6"
type: "Concat"
bottom: "fire6_expand_1x1"
bottom: "fire6_expand_3x3"
top: "fire6"
concat_param {
axis: 1
}
}
layer {
name: "fire7_squeeze"
type: "Convolution"
bottom: "fire6"
top: "fire7_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_squeeze"
type: "ReLU"
bottom: "fire7_squeeze"
top: "fire7_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire7_expand_1x1"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_1x1"
type: "ReLU"
bottom: "fire7_expand_1x1"
top: "fire7_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire7_expand_3x3"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_3x3"
type: "ReLU"
bottom: "fire7_expand_3x3"
top: "fire7_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire7"
type: "Concat"
bottom: "fire7_expand_1x1"
bottom: "fire7_expand_3x3"
top: "fire7"
concat_param {
axis: 1
}
}
layer {
name: "fire8_squeeze"
type: "Convolution"
bottom: "fire7"
top: "fire8_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_squeeze"
type: "ReLU"
bottom: "fire8_squeeze"
top: "fire8_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire8_expand_1x1"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_1x1"
type: "ReLU"
bottom: "fire8_expand_1x1"
top: "fire8_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire8_expand_3x3"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_3x3"
type: "ReLU"
bottom: "fire8_expand_3x3"
top: "fire8_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire8"
type: "Concat"
bottom: "fire8_expand_1x1"
bottom: "fire8_expand_3x3"
top: "fire8"
concat_param {
axis: 1
}
}
layer {
name: "pool8"
type: "Pooling"
bottom: "fire8"
top: "pool8"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire9_squeeze"
type: "Convolution"
bottom: "pool8"
top: "fire9_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_squeeze"
type: "ReLU"
bottom: "fire9_squeeze"
top: "fire9_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire9_expand_1x1"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_1x1"
type: "ReLU"
bottom: "fire9_expand_1x1"
top: "fire9_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire9_expand_3x3"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_3x3"
type: "ReLU"
bottom: "fire9_expand_3x3"
top: "fire9_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire9"
type: "Concat"
bottom: "fire9_expand_1x1"
bottom: "fire9_expand_3x3"
top: "fire9"
concat_param {
axis: 1
}
}
layer {
name: "conv10"
type: "Convolution"
bottom: "fire9"
top: "conv10"
convolution_param {
num_output: 1000
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0.0
std: 0.01
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv10"
type: "ReLU"
bottom: "conv10"
top: "conv10"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10_squeeze"
type: "Convolution"
bottom: "conv10"
top: "fire10_squeeze"
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire10_squeeze"
type: "ReLU"
bottom: "fire10_squeeze"
top: "fire10_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10_expand_1x1"
type: "Convolution"
bottom: "fire10_squeeze"
top: "fire10_expand_1x1"
convolution_param {
num_output: 384
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire10_expand_1x1"
type: "ReLU"
bottom: "fire10_expand_1x1"
top: "fire10_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10_expand_3x3"
type: "Convolution"
bottom: "fire10_squeeze"
top: "fire10_expand_3x3"
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire10_expand_3x3"
type: "ReLU"
bottom: "fire10_expand_3x3"
top: "fire10_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10"
type: "Concat"
bottom: "fire10_expand_1x1"
bottom: "fire10_expand_3x3"
top: "fire10"
concat_param {
axis: 1
}
}
layer {
name: "fire11_squeeze"
type: "Convolution"
bottom: "fire10"
top: "fire11_squeeze"
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire11_squeeze"
type: "ReLU"
bottom: "fire11_squeeze"
top: "fire11_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire11_expand_1x1"
type: "Convolution"
bottom: "fire11_squeeze"
top: "fire11_expand_1x1"
convolution_param {
num_output: 384
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire11_expand_1x1"
type: "ReLU"
bottom: "fire11_expand_1x1"
top: "fire11_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire11_expand_3x3"
type: "Convolution"
bottom: "fire11_squeeze"
top: "fire11_expand_3x3"
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire11_expand_3x3"
type: "ReLU"
bottom: "fire11_expand_3x3"
top: "fire11_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire11"
type: "Concat"
bottom: "fire11_expand_1x1"
bottom: "fire11_expand_3x3"
top: "fire11"
concat_param {
axis: 1
}
}
layer {
name: "conv11"
type: "Convolution"
bottom: "fire11"
top: "conv11"
convolution_param {
num_output: 225
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
mean: 0.0
std: 0.0001
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "permute"
type: "Permute"
bottom: "conv11"
top: "permute_conv11"
permute_param {
order: 0 # N
order: 2 # H
order: 3 # W
order: 1 # C
}
}
layer {
name: "slice"
type: "Slice"
bottom: "permute_conv11"
top: "soft_class_reg"
top: "sig_conf_reg"
top: "delta_bbox"
slice_param {
axis: 3
slice_point: 180 # anchors_per_grid * classes_
slice_point: 189 # anchors_per_grid * (classes_ + 1)
}
}
layer {
name: "reshape"
type: "Reshape"
bottom: "soft_class_reg"
top: "reshape_soft_class_reg"
reshape_param {
shape {
dim: 0 # batch_size
dim: 4761 # H*W*anchors_per_grid
dim: 20 # No. of classes
}
}
}
layer {
name: "softmax"
type: "Softmax"
bottom: "reshape_soft_class_reg"
top: "class_scores"
softmax_param {
axis: 2
}
}
layer {
name: "sigmoid"
type: "Sigmoid"
bottom: "sig_conf_reg"
top: "conf_scores"
}
# Training and Testing protocol for Object Detection
base_lr: 0.000001
display: 1
max_iter: 100000
lr_policy: "step"
gamma: 0.5
stepsize: 100000
momentum: 0.9
weight_decay: 0.0002
snapshot: 1000
snapshot_prefix: "snapshot"
solver_mode: GPU
net: "SqueezeDet_train_test.prototxt"
# SqueezeDet architecture for object detection on PASCAL VOC dataset
name: "SqueezeDet"
layer {
name: "data"
type: "BboxData"
top: "data"
top: "bbox"
bbox_data_param {
source: "source.txt"
batch_size: 2
is_color: true
shuffle: true
root_folder: "VOC2012_Resize"
}
transform_param {
mean_value: 104
mean_value: 117
mean_value: 123
}
include {
phase: TRAIN
}
}
layer {
name: "data"
type: "BboxData"
top: "data"
top: "bbox"
bbox_data_param {
source: "source.txt"
batch_size: 2
is_color: true
shuffle: true
root_folder: "VOC2012_Resize"
}
transform_param {
mean_value: 104
mean_value: 117
mean_value: 123
}
include {
phase: TEST
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire2_squeeze"
type: "Convolution"
bottom: "pool1"
top: "fire2_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_squeeze"
type: "ReLU"
bottom: "fire2_squeeze"
top: "fire2_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire2_expand_1x1"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_1x1"
type: "ReLU"
bottom: "fire2_expand_1x1"
top: "fire2_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire2_expand_3x3"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_3x3"
type: "ReLU"
bottom: "fire2_expand_3x3"
top: "fire2_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire2"
type: "Concat"
bottom: "fire2_expand_1x1"
bottom: "fire2_expand_3x3"
top: "fire2"
concat_param {
axis: 1
}
}
layer {
name: "fire3_squeeze"
type: "Convolution"
bottom: "fire2"
top: "fire3_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_squeeze"
type: "ReLU"
bottom: "fire3_squeeze"
top: "fire3_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire3_expand_1x1"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_1x1"
type: "ReLU"
bottom: "fire3_expand_1x1"
top: "fire3_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire3_expand_3x3"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_3x3"
type: "ReLU"
bottom: "fire3_expand_3x3"
top: "fire3_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire3"
type: "Concat"
bottom: "fire3_expand_1x1"
bottom: "fire3_expand_3x3"
top: "fire3"
concat_param {
axis: 1
}
}
layer {
name: "fire4_squeeze"
type: "Convolution"
bottom: "fire3"
top: "fire4_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_squeeze"
type: "ReLU"
bottom: "fire4_squeeze"
top: "fire4_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire4_expand_1x1"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_1x1"
type: "ReLU"
bottom: "fire4_expand_1x1"
top: "fire4_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire4_expand_3x3"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_3x3"
type: "ReLU"
bottom: "fire4_expand_3x3"
top: "fire4_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire4"
type: "Concat"
bottom: "fire4_expand_1x1"
bottom: "fire4_expand_3x3"
top: "fire4"
concat_param {
axis: 1
}
}
layer {
name: "pool4"
type: "Pooling"
bottom: "fire4"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire5_squeeze"
type: "Convolution"
bottom: "pool4"
top: "fire5_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_squeeze"
type: "ReLU"
bottom: "fire5_squeeze"
top: "fire5_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire5_expand_1x1"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_1x1"
type: "ReLU"
bottom: "fire5_expand_1x1"
top: "fire5_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire5_expand_3x3"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_3x3"
type: "ReLU"
bottom: "fire5_expand_3x3"
top: "fire5_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire5"
type: "Concat"
bottom: "fire5_expand_1x1"
bottom: "fire5_expand_3x3"
top: "fire5"
concat_param {
axis: 1
}
}
layer {
name: "fire6_squeeze"
type: "Convolution"
bottom: "fire5"
top: "fire6_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_squeeze"
type: "ReLU"
bottom: "fire6_squeeze"
top: "fire6_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire6_expand_1x1"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_1x1"
type: "ReLU"
bottom: "fire6_expand_1x1"
top: "fire6_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire6_expand_3x3"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_3x3"
type: "ReLU"
bottom: "fire6_expand_3x3"
top: "fire6_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire6"
type: "Concat"
bottom: "fire6_expand_1x1"
bottom: "fire6_expand_3x3"
top: "fire6"
concat_param {
axis: 1
}
}
layer {
name: "fire7_squeeze"
type: "Convolution"
bottom: "fire6"
top: "fire7_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_squeeze"
type: "ReLU"
bottom: "fire7_squeeze"
top: "fire7_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire7_expand_1x1"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_1x1"
type: "ReLU"
bottom: "fire7_expand_1x1"
top: "fire7_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire7_expand_3x3"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_3x3"
type: "ReLU"
bottom: "fire7_expand_3x3"
top: "fire7_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire7"
type: "Concat"
bottom: "fire7_expand_1x1"
bottom: "fire7_expand_3x3"
top: "fire7"
concat_param {
axis: 1
}
}
layer {
name: "fire8_squeeze"
type: "Convolution"
bottom: "fire7"
top: "fire8_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_squeeze"
type: "ReLU"
bottom: "fire8_squeeze"
top: "fire8_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire8_expand_1x1"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_1x1"
type: "ReLU"
bottom: "fire8_expand_1x1"
top: "fire8_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire8_expand_3x3"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_3x3"
type: "ReLU"
bottom: "fire8_expand_3x3"
top: "fire8_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire8"
type: "Concat"
bottom: "fire8_expand_1x1"
bottom: "fire8_expand_3x3"
top: "fire8"
concat_param {
axis: 1
}
}
layer {
name: "pool8"
type: "Pooling"
bottom: "fire8"
top: "pool8"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire9_squeeze"
type: "Convolution"
bottom: "pool8"
top: "fire9_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_squeeze"
type: "ReLU"
bottom: "fire9_squeeze"
top: "fire9_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire9_expand_1x1"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_1x1"
type: "ReLU"
bottom: "fire9_expand_1x1"
top: "fire9_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire9_expand_3x3"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_3x3"
type: "ReLU"
bottom: "fire9_expand_3x3"
top: "fire9_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire9"
type: "Concat"
bottom: "fire9_expand_1x1"
bottom: "fire9_expand_3x3"
top: "fire9"
concat_param {
axis: 1
}
}
layer {
name: "conv10"
type: "Convolution"
bottom: "fire9"
top: "conv10"
convolution_param {
num_output: 1000
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0.0
std: 0.01
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv10"
type: "ReLU"
bottom: "conv10"
top: "conv10"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10_squeeze"
type: "Convolution"
bottom: "conv10"
top: "fire10_squeeze"
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire10_squeeze"
type: "ReLU"
bottom: "fire10_squeeze"
top: "fire10_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10_expand_1x1"
type: "Convolution"
bottom: "fire10_squeeze"
top: "fire10_expand_1x1"
convolution_param {
num_output: 384
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire10_expand_1x1"
type: "ReLU"
bottom: "fire10_expand_1x1"
top: "fire10_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10_expand_3x3"
type: "Convolution"
bottom: "fire10_squeeze"
top: "fire10_expand_3x3"
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire10_expand_3x3"
type: "ReLU"
bottom: "fire10_expand_3x3"
top: "fire10_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire10"
type: "Concat"
bottom: "fire10_expand_1x1"
bottom: "fire10_expand_3x3"
top: "fire10"
concat_param {
axis: 1
}
}
layer {
name: "fire11_squeeze"
type: "Convolution"
bottom: "fire10"
top: "fire11_squeeze"
convolution_param {
num_output: 96
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire11_squeeze"
type: "ReLU"
bottom: "fire11_squeeze"
top: "fire11_squeeze"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire11_expand_1x1"
type: "Convolution"
bottom: "fire11_squeeze"
top: "fire11_expand_1x1"
convolution_param {
num_output: 384
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire11_expand_1x1"
type: "ReLU"
bottom: "fire11_expand_1x1"
top: "fire11_expand_1x1"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire11_expand_3x3"
type: "Convolution"
bottom: "fire11_squeeze"
top: "fire11_expand_3x3"
convolution_param {
num_output: 384
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire11_expand_3x3"
type: "ReLU"
bottom: "fire11_expand_3x3"
top: "fire11_expand_3x3"
relu_param {
negative_slope: 0.01
}
}
layer {
name: "fire11"
type: "Concat"
bottom: "fire11_expand_1x1"
bottom: "fire11_expand_3x3"
top: "fire11"
concat_param {
axis: 1
}
}
layer {
name: "conv11"
type: "Convolution"
bottom: "fire11"
top: "conv11"
convolution_param {
num_output: 225
kernel_size: 3
stride: 1
weight_filler {
type: "gaussian"
mean: 0.0
std: 0.0001
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "permute"
type: "Permute"
bottom: "conv11"
top: "permute_conv11"
permute_param {
order: 0 # N
order: 2 # H
order: 3 # W
order: 1 # C
}
}
layer {
name: "slice"
type: "Slice"
bottom: "permute_conv11"
top: "soft_class_reg"
top: "sig_conf_reg"
top: "delta_bbox"
slice_param {
axis: 3
slice_point: 180 # anchors_per_grid * classes_
slice_point: 189 # anchors_per_grid * (classes_ + 1)
}
}
layer {
name: "reshape"
type: "Reshape"
bottom: "soft_class_reg"
top: "reshape_soft_class_reg"
reshape_param {
shape {
dim: 0 # batch_size
dim: 4761 # H*W*anchors_per_grid
dim: 20 # No. of classes
}
}
}
layer {
name: "softmax"
type: "Softmax"
bottom: "reshape_soft_class_reg"
top: "class_scores"
softmax_param {
axis: 2
}
}
layer {
name: "sigmoid"
type: "Sigmoid"
bottom: "sig_conf_reg"
top: "conf_scores"
}
layer {
name: "loss"
type: "SqueezeDetLoss"
bottom: "class_scores"
bottom: "conf_scores"
bottom: "delta_bbox"
bottom: "bbox"
top: "loss"
squeezedet_param {
engine: CAFFE
classes: 20
anchors_per_grid: 9
anchor_shapes: 377
anchor_shapes: 371
anchor_shapes: 64
anchor_shapes: 118
anchor_shapes: 129
anchor_shapes: 326
anchor_shapes: 172
anchor_shapes: 126
anchor_shapes: 34
anchor_shapes: 46
anchor_shapes: 353
anchor_shapes: 204
anchor_shapes: 89
anchor_shapes: 214
anchor_shapes: 249
anchor_shapes: 361
anchor_shapes: 209
anchor_shapes: 239
pos_conf: 75
neg_conf: 100
lambda_bbox: 5
lambda_conf: 1
}
}
# SqueezeNet architecture for image classification on ImageNet dataset
name: "SqueezeNet"
input: "data"
input_dim: 1
input_dim: 3
input_dim: 416
input_dim: 416
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire2_squeeze"
type: "Convolution"
bottom: "pool1"
top: "fire2_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire2_squeeze"
top: "fire2_squeeze"
}
layer {
name: "fire2_expand_1x1"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire2_expand_1x1"
top: "fire2_expand_1x1"
}
layer {
name: "fire2_expand_3x3"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire2_expand_3x3"
top: "fire2_expand_3x3"
}
layer {
name: "fire2"
type: "Concat"
bottom: "fire2_expand_1x1"
bottom: "fire2_expand_3x3"
top: "fire2"
concat_param {
axis: 1
}
}
layer {
name: "fire3_squeeze"
type: "Convolution"
bottom: "fire2"
top: "fire3_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire3_squeeze"
top: "fire3_squeeze"
}
layer {
name: "fire3_expand_1x1"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire3_expand_1x1"
top: "fire3_expand_1x1"
}
layer {
name: "fire3_expand_3x3"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire3_expand_3x3"
top: "fire3_expand_3x3"
}
layer {
name: "fire3"
type: "Concat"
bottom: "fire3_expand_1x1"
bottom: "fire3_expand_3x3"
top: "fire3"
concat_param {
axis: 1
}
}
layer {
name: "fire4_squeeze"
type: "Convolution"
bottom: "fire3"
top: "fire4_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire4_squeeze"
top: "fire4_squeeze"
}
layer {
name: "fire4_expand_1x1"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire4_expand_1x1"
top: "fire4_expand_1x1"
}
layer {
name: "fire4_expand_3x3"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire4_expand_3x3"
top: "fire4_expand_3x3"
}
layer {
name: "fire4"
type: "Concat"
bottom: "fire4_expand_1x1"
bottom: "fire4_expand_3x3"
top: "fire4"
concat_param {
axis: 1
}
}
layer {
name: "pool4"
type: "Pooling"
bottom: "fire4"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire5_squeeze"
type: "Convolution"
bottom: "pool4"
top: "fire5_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire5_squeeze"
top: "fire5_squeeze"
}
layer {
name: "fire5_expand_1x1"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire5_expand_1x1"
top: "fire5_expand_1x1"
}
layer {
name: "fire5_expand_3x3"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire5_expand_3x3"
top: "fire5_expand_3x3"
}
layer {
name: "fire5"
type: "Concat"
bottom: "fire5_expand_1x1"
bottom: "fire5_expand_3x3"
top: "fire5"
concat_param {
axis: 1
}
}
layer {
name: "fire6_squeeze"
type: "Convolution"
bottom: "fire5"
top: "fire6_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire6_squeeze"
top: "fire6_squeeze"
}
layer {
name: "fire6_expand_1x1"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire6_expand_1x1"
top: "fire6_expand_1x1"
}
layer {
name: "fire6_expand_3x3"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire6_expand_3x3"
top: "fire6_expand_3x3"
}
layer {
name: "fire6"
type: "Concat"
bottom: "fire6_expand_1x1"
bottom: "fire6_expand_3x3"
top: "fire6"
concat_param {
axis: 1
}
}
layer {
name: "fire7_squeeze"
type: "Convolution"
bottom: "fire6"
top: "fire7_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire7_squeeze"
top: "fire7_squeeze"
}
layer {
name: "fire7_expand_1x1"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire7_expand_1x1"
top: "fire7_expand_1x1"
}
layer {
name: "fire7_expand_3x3"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire7_expand_3x3"
top: "fire7_expand_3x3"
}
layer {
name: "fire7"
type: "Concat"
bottom: "fire7_expand_1x1"
bottom: "fire7_expand_3x3"
top: "fire7"
concat_param {
axis: 1
}
}
layer {
name: "fire8_squeeze"
type: "Convolution"
bottom: "fire7"
top: "fire8_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire8_squeeze"
top: "fire8_squeeze"
}
layer {
name: "fire8_expand_1x1"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire8_expand_1x1"
top: "fire8_expand_1x1"
}
layer {
name: "fire8_expand_3x3"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire8_expand_3x3"
top: "fire8_expand_3x3"
}
layer {
name: "fire8"
type: "Concat"
bottom: "fire8_expand_1x1"
bottom: "fire8_expand_3x3"
top: "fire8"
concat_param {
axis: 1
}
}
layer {
name: "pool8"
type: "Pooling"
bottom: "fire8"
top: "pool8"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire9_squeeze"
type: "Convolution"
bottom: "pool8"
top: "fire9_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire9_squeeze"
top: "fire9_squeeze"
}
layer {
name: "fire9_expand_1x1"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire9_expand_1x1"
top: "fire9_expand_1x1"
}
layer {
name: "fire9_expand_3x3"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire9_expand_3x3"
top: "fire9_expand_3x3"
}
layer {
name: "fire9"
type: "Concat"
bottom: "fire9_expand_1x1"
bottom: "fire9_expand_3x3"
top: "fire9"
concat_param {
axis: 1
}
}
layer {
name: "conv10"
type: "Convolution"
bottom: "fire9"
top: "conv10"
convolution_param {
num_output: 1000
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0.0
std: 0.01
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv10"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "conv10"
top: "conv10"
}
layer {
name: "pool10"
type: "Pooling"
bottom: "conv10"
top: "pool10"
pooling_param {
pool: AVE
global_pooling: true
}
}
layer {
name: "predictions"
type: "Softmax"
bottom: "pool10"
top: "predictions"
softmax_param {
axis: 1
}
}
# Solver for SqueezeNet Model
test_iter: 1000
test_interval: 1000
base_lr: 0.03
display: 1
max_iter: 1500000
lr_policy: "step"
gamma: 0.5
stepsize: 100000
momentum: 0.9
weight_decay: 0.0002
snapshot: 1000
snapshot_prefix: "snapshot"
solver_mode: GPU
net: "SqueezeNet_train_test.prototxt"
random_seed: 42
average_loss: 80
# SqueezeNet architecture for image classification on ImageNet dataset
name: "SqueezeNet"
layer {
name: "ImageNet"
type: "Data"
top: "data"
top: "label"
transform_param {
crop_size: 227
mean_value: 104
mean_value: 117
mean_value: 123
}
data_param {
source: "ImageNet_train_lmdb"
batch_size: 64
backend: LMDB
}
include {
phase: TRAIN
}
}
layer {
name: "ImageNet"
type: "Data"
top: "data"
top: "label"
transform_param {
crop_size: 227
mean_value: 104
mean_value: 117
mean_value: 123
}
data_param {
source: "ImageNet_val_lmdb"
batch_size: 5
backend: LMDB
}
include {
phase: TEST
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
convolution_param {
num_output: 96
kernel_size: 7
stride: 2
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire2_squeeze"
type: "Convolution"
bottom: "pool1"
top: "fire2_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire2_squeeze"
top: "fire2_squeeze"
}
layer {
name: "fire2_expand_1x1"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire2_expand_1x1"
top: "fire2_expand_1x1"
}
layer {
name: "fire2_expand_3x3"
type: "Convolution"
bottom: "fire2_squeeze"
top: "fire2_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire2_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire2_expand_3x3"
top: "fire2_expand_3x3"
}
layer {
name: "fire2"
type: "Concat"
bottom: "fire2_expand_1x1"
bottom: "fire2_expand_3x3"
top: "fire2"
concat_param {
axis: 1
}
}
layer {
name: "fire3_squeeze"
type: "Convolution"
bottom: "fire2"
top: "fire3_squeeze"
convolution_param {
num_output: 16
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire3_squeeze"
top: "fire3_squeeze"
}
layer {
name: "fire3_expand_1x1"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_1x1"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire3_expand_1x1"
top: "fire3_expand_1x1"
}
layer {
name: "fire3_expand_3x3"
type: "Convolution"
bottom: "fire3_squeeze"
top: "fire3_expand_3x3"
convolution_param {
num_output: 64
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire3_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire3_expand_3x3"
top: "fire3_expand_3x3"
}
layer {
name: "fire3"
type: "Concat"
bottom: "fire3_expand_1x1"
bottom: "fire3_expand_3x3"
top: "fire3"
concat_param {
axis: 1
}
}
layer {
name: "fire4_squeeze"
type: "Convolution"
bottom: "fire3"
top: "fire4_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire4_squeeze"
top: "fire4_squeeze"
}
layer {
name: "fire4_expand_1x1"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire4_expand_1x1"
top: "fire4_expand_1x1"
}
layer {
name: "fire4_expand_3x3"
type: "Convolution"
bottom: "fire4_squeeze"
top: "fire4_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire4_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire4_expand_3x3"
top: "fire4_expand_3x3"
}
layer {
name: "fire4"
type: "Concat"
bottom: "fire4_expand_1x1"
bottom: "fire4_expand_3x3"
top: "fire4"
concat_param {
axis: 1
}
}
layer {
name: "pool4"
type: "Pooling"
bottom: "fire4"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire5_squeeze"
type: "Convolution"
bottom: "pool4"
top: "fire5_squeeze"
convolution_param {
num_output: 32
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire5_squeeze"
top: "fire5_squeeze"
}
layer {
name: "fire5_expand_1x1"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_1x1"
convolution_param {
num_output: 128
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire5_expand_1x1"
top: "fire5_expand_1x1"
}
layer {
name: "fire5_expand_3x3"
type: "Convolution"
bottom: "fire5_squeeze"
top: "fire5_expand_3x3"
convolution_param {
num_output: 128
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire5_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire5_expand_3x3"
top: "fire5_expand_3x3"
}
layer {
name: "fire5"
type: "Concat"
bottom: "fire5_expand_1x1"
bottom: "fire5_expand_3x3"
top: "fire5"
concat_param {
axis: 1
}
}
layer {
name: "fire6_squeeze"
type: "Convolution"
bottom: "fire5"
top: "fire6_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire6_squeeze"
top: "fire6_squeeze"
}
layer {
name: "fire6_expand_1x1"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire6_expand_1x1"
top: "fire6_expand_1x1"
}
layer {
name: "fire6_expand_3x3"
type: "Convolution"
bottom: "fire6_squeeze"
top: "fire6_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire6_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire6_expand_3x3"
top: "fire6_expand_3x3"
}
layer {
name: "fire6"
type: "Concat"
bottom: "fire6_expand_1x1"
bottom: "fire6_expand_3x3"
top: "fire6"
concat_param {
axis: 1
}
}
layer {
name: "fire7_squeeze"
type: "Convolution"
bottom: "fire6"
top: "fire7_squeeze"
convolution_param {
num_output: 48
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire7_squeeze"
top: "fire7_squeeze"
}
layer {
name: "fire7_expand_1x1"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_1x1"
convolution_param {
num_output: 192
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire7_expand_1x1"
top: "fire7_expand_1x1"
}
layer {
name: "fire7_expand_3x3"
type: "Convolution"
bottom: "fire7_squeeze"
top: "fire7_expand_3x3"
convolution_param {
num_output: 192
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire7_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire7_expand_3x3"
top: "fire7_expand_3x3"
}
layer {
name: "fire7"
type: "Concat"
bottom: "fire7_expand_1x1"
bottom: "fire7_expand_3x3"
top: "fire7"
concat_param {
axis: 1
}
}
layer {
name: "fire8_squeeze"
type: "Convolution"
bottom: "fire7"
top: "fire8_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire8_squeeze"
top: "fire8_squeeze"
}
layer {
name: "fire8_expand_1x1"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire8_expand_1x1"
top: "fire8_expand_1x1"
}
layer {
name: "fire8_expand_3x3"
type: "Convolution"
bottom: "fire8_squeeze"
top: "fire8_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire8_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire8_expand_3x3"
top: "fire8_expand_3x3"
}
layer {
name: "fire8"
type: "Concat"
bottom: "fire8_expand_1x1"
bottom: "fire8_expand_3x3"
top: "fire8"
concat_param {
axis: 1
}
}
layer {
name: "pool8"
type: "Pooling"
bottom: "fire8"
top: "pool8"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fire9_squeeze"
type: "Convolution"
bottom: "pool8"
top: "fire9_squeeze"
convolution_param {
num_output: 64
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_squeeze"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire9_squeeze"
top: "fire9_squeeze"
}
layer {
name: "fire9_expand_1x1"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_1x1"
convolution_param {
num_output: 256
kernel_size: 1
stride: 1
pad: 0
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_1x1"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire9_expand_1x1"
top: "fire9_expand_1x1"
}
layer {
name: "fire9_expand_3x3"
type: "Convolution"
bottom: "fire9_squeeze"
top: "fire9_expand_3x3"
convolution_param {
num_output: 256
kernel_size: 3
stride: 1
pad: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_fire9_expand_3x3"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "fire9_expand_3x3"
top: "fire9_expand_3x3"
}
layer {
name: "fire9"
type: "Concat"
bottom: "fire9_expand_1x1"
bottom: "fire9_expand_3x3"
top: "fire9"
concat_param {
axis: 1
}
}
layer {
name: "conv10"
type: "Convolution"
bottom: "fire9"
top: "conv10"
convolution_param {
num_output: 1000
kernel_size: 1
stride: 1
weight_filler {
type: "gaussian"
mean: 0.0
std: 0.01
}
bias_filler {
type: "constant"
value: 0.01
}
}
}
layer {
name: "rect_conv10"
type: "ReLU"
relu_param {
negative_slope: 0.01
}
bottom: "conv10"
top: "conv10"
}
layer {
name: "pool10"
type: "Pooling"
bottom: "conv10"
top: "pool10"
pooling_param {
pool: AVE
global_pooling: true
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "pool10"
bottom: "label"
top: "loss"
include {
phase: TRAIN
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "pool10"
bottom: "label"
top: "accuracy"
}
layer {
name: "accuracy_top_5"
type: "Accuracy"
bottom: "pool10"
bottom: "label"
top: "accuracy_top_5"
include {
phase: TEST
}
accuracy_param {
top_k: 5
}
}
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/dnn.hpp>
#include <iostream>
#include <cstdlib>
int main(int argc, char **argv)
{
if (argc < 4)
{
std::cerr << "Usage " << argv[0] << ": "
<< "<model-definition-file> " << " "
<< "<model-weights-file> " << " "
<< "<test-image>\n";
return -1;
}
cv::String model_prototxt = argv[1];
cv::String model_binary = argv[2];
cv::String test_image = argv[3];
cv::dnn::Net net = cv::dnn::readNetFromCaffe(model_prototxt, model_binary);
if (net.empty())
{
std::cerr << "Couldn't load the model !\n";
return -2;
}
cv::Mat img = cv::imread(test_image);
if (img.empty())
{
std::cerr << "Couldn't load image: " << test_image << "\n";
return -3;
}
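  // Build a 4D input blob: resize to 416x416 and subtract the BGR mean (104, 117, 123)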
cv::Mat input_blob = cv::dnn::blobFromImage(
img, 1.0, cv::Size(416, 416), cv::Scalar(104, 117, 123), false);
cv::Mat prob;
cv::TickMeter t;
net.setInput(input_blob);
t.start();
prob = net.forward("predictions");
t.stop();
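  // View the network output as a 1000-element vector of class probabilities
  // and pick the class with the highest score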
int prob_size[3] = {1000, 1, 1};
cv::Mat prob_data(3, prob_size, CV_32F, prob.ptr<float>(0));
double max_prob = -1.0;
int class_idx = -1;
for (int idx = 0; idx < prob.size[1]; ++idx)
{
double current_prob = prob_data.at<float>(idx, 0, 0);
if (current_prob > max_prob)
{
max_prob = current_prob;
class_idx = idx;
}
}
std::cout << "Best class Index: " << class_idx << "\n";
std::cout << "Time taken: " << t.getTimeSec() << "\n";
std::cout << "Probability: " << max_prob * 100.0<< "\n";
return 0;
}
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <fstream>
#include <iostream>
#include <cstdlib>
#include <opencv2/core_detect.hpp>
using namespace cv;
using namespace std;
using namespace cv::dnn;
using namespace cv::dnn_objdetect;
int main(int argc, char **argv)
{
if (argc < 4)
{
std::cerr << "Usage " << argv[0] << ": "
<< "<model-definition-file> "
<< "<model-weights-file> "
<< "<test-image> "
<< "<threshold>(optional)\n";
return -1;
}
std::string model_prototxt = argv[1];
std::string model_binary = argv[2];
std::string test_input_image = argv[3];
double threshold = 0.7;
if (argc == 5)
{
threshold = atof(argv[4]);
if (threshold > 1.0 || threshold < 0.0)
{
std::cerr << "Threshold should belong to [0, 1]\n";
return -1;
}
}
// Load the network
std::cout << "Loading the network...\n";
Net net = dnn::readNetFromCaffe(model_prototxt, model_binary);
if (net.empty())
{
std::cerr << "Couldn't load the model !\n";
return -2;
}
else
{
std::cout << "Done loading the network !\n\n";
}
// Load the test image
Mat img = cv::imread(test_input_image);
  Mat original_img = img.clone();  // keep an unmodified copy for drawing detections at the original resolution
if (img.empty())
{
std::cerr << "Couldn't load image: " << test_input_image << "\n";
return -3;
}
cv::namedWindow("Initial Image", WINDOW_AUTOSIZE);
cv::imshow("Initial Image", img);
cv::resize(img, img, cv::Size(416, 416));
  Mat img_copy(img);  // 416x416 copy, used only to compute the rescaling ratios below
img.convertTo(img, CV_32FC3);
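  // Build the input blob from the resized image and subtract the BGR mean (104, 117, 123)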
Mat input_blob = blobFromImage(img, 1.0, Size(), cv::Scalar(104, 117, 123), false);
  // Names of the output layers whose blobs are needed for post-processing
std::cout << "Getting the output of all the three blobs...\n";
std::vector<Mat> outblobs(3);
std::vector<cv::String> out_layers;
out_layers.push_back("slice");
out_layers.push_back("softmax");
out_layers.push_back("sigmoid");
// Bbox delta blob
std::vector<Mat> temp_blob;
net.setInput(input_blob);
cv::TickMeter t;
t.start();
net.forward(temp_blob, out_layers[0]);
t.stop();
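  // The "slice" layer produces several output blobs; its third output holds the bounding-box deltas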
outblobs[0] = temp_blob[2];
// class_scores blob
net.setInput(input_blob);
t.start();
outblobs[1] = net.forward(out_layers[1]);
t.stop();
// conf_scores blob
net.setInput(input_blob);
t.start();
outblobs[2] = net.forward(out_layers[2]);
t.stop();
// Check that the blobs are valid
for (size_t i = 0; i < outblobs.size(); ++i)
{
if (outblobs[i].empty())
{
std::cerr << "Blob: " << i << " is empty !\n";
}
}
int delta_bbox_size[3] = {23, 23, 36};
Mat delta_bbox(3, delta_bbox_size, CV_32F, outblobs[0].ptr<float>());
int class_scores_size[2] = {4761, 20};
Mat class_scores(2, class_scores_size, CV_32F, outblobs[1].ptr<float>());
int conf_scores_size[3] = {23, 23, 9};
Mat conf_scores(3, conf_scores_size, CV_32F, outblobs[2].ptr<float>());
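  // The reshaped blobs correspond to a 23x23 ConvDet grid with 9 anchors per cell
  // (36 = 9 anchors * 4 deltas, 4761 = 23*23*9 anchor boxes, 20 classes)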
InferBbox inf(delta_bbox, class_scores, conf_scores);
inf.filter(threshold);
double average_time = t.getTimeSec() / t.getCounter();
std::cout << "\nTotal objects detected: " << inf.detections.size()
<< " in " << average_time << " seconds\n";
std::cout << "------\n";
float x_ratio = (float)original_img.cols / img_copy.cols;
float y_ratio = (float)original_img.rows / img_copy.rows;
for (size_t i = 0; i < inf.detections.size(); ++i)
{
int xmin = inf.detections[i].xmin;
int ymin = inf.detections[i].ymin;
int xmax = inf.detections[i].xmax;
int ymax = inf.detections[i].ymax;
cv::String class_name = inf.detections[i].label_name;
std::cout << "Class: " << class_name << "\n"
<< "Probability: " << inf.detections[i].class_prob << "\n"
<< "Co-ordinates: " << inf.detections[i].xmin << " "
<< inf.detections[i].ymin << " "
<< inf.detections[i].xmax << " "
<< inf.detections[i].ymax << "\n";
std::cout << "------\n";
// Draw the corresponding bounding box(s)
cv::rectangle(original_img, cv::Point((int)(xmin * x_ratio), (int)(ymin * y_ratio)),
cv::Point((int)(xmax * x_ratio), (int)(ymax * y_ratio)), cv::Scalar(255, 0, 0), 2);
cv::putText(original_img, class_name, cv::Point((int)(xmin * x_ratio), (int)(ymin * y_ratio)),
cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(255, 0, 0), 1);
}
try
{
cv::namedWindow("Final Detections", WINDOW_AUTOSIZE);
cv::imshow("Final Detections", original_img);
cv::imwrite("image.png", original_img);
cv::waitKey(0);
}
catch (const char* msg)
{
std::cerr << msg << "\n";
return -4;
}
return 0;
}
import argparse
import sys
import os
import time
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
def k_means(K, data, max_iter, n_jobs, image_file):
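    # Cluster the (width, height) pairs of the ground-truth boxes; the K cluster
    # centers become the anchor shapes used by the detector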
X = np.array(data)
np.random.shuffle(X)
begin = time.time()
print 'Running kmeans'
kmeans = KMeans(n_clusters=K, max_iter=max_iter, n_jobs=n_jobs, verbose=1).fit(X)
print 'K-Means took {} seconds to complete'.format(time.time()-begin)
step_size = 0.2
xmin, xmax = X[:, 0].min()-1, X[:, 0].max()+1
ymin, ymax = X[:, 1].min()-1, X[:, 1].max()+1
xx, yy = np.meshgrid(np.arange(xmin, xmax, step_size), np.arange(ymin, ymax, step_size))
preds = kmeans.predict(np.c_[xx.ravel(), yy.ravel()])
preds = preds.reshape(xx.shape)
plt.figure()
plt.clf()
plt.imshow(preds, interpolation='nearest', extent=(xx.min(), xx.max(), yy.min(), yy.max()), cmap=plt.cm.Paired, aspect='auto', origin='lower')
plt.plot(X[:, 0], X[:, 1], 'k.', markersize=2)
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=169, linewidths=5, color='r', zorder=10)
plt.title("Anchor shapes generated using K-Means")
plt.xlim(xmin, xmax)
plt.ylim(ymin, ymax)
print 'Mean centroids are:'
for i, center in enumerate(centroids):
print '{}: {}, {}'.format(i, center[0], center[1])
# plt.xticks(())
# plt.yticks(())
    plt.savefig(image_file)  # write the cluster plot to the requested output file
    plt.show()
def pre_process(directory, data_list):
if not os.path.exists(directory):
print "Path {} doesn't exist".format(directory)
return
files = os.listdir(directory)
print 'Loading data...'
for i, f in enumerate(files):
# Progress bar
sys.stdout.write('\r')
percentage = (i+1.0) / len(files)
progress = int(percentage * 30)
bar = [progress*'=', ' '*(29-progress), percentage*100]
sys.stdout.write('[{}>{}] {:.0f}%'.format(*bar))
sys.stdout.flush()
with open(directory+"/"+f, 'r') as ann:
l = ann.readline()
l = l.rstrip()
l = l.split(' ')
l = [float(i) for i in l]
if len(l) % 5 != 0:
sys.stderr.write('File {} contains incorrect number of annotations'.format(f))
return
num_objs = len(l) / 5
for obj in range(num_objs):
xmin = l[obj * 5 + 0]
ymin = l[obj * 5 + 1]
xmax = l[obj * 5 + 2]
ymax = l[obj * 5 + 3]
w = xmax - xmin
h = ymax - ymin
data_list.append([w, h])
if w > 1000 or h > 1000:
sys.stdout.write("[{}, {}]".format(w, h))
sys.stdout.write('\nProcessed {} files containing {} objects'.format(len(files), len(data_list)))
return data_list
def main():
parser = argparse.ArgumentParser("Parse hyperparameters")
parser.add_argument("clusters", help="Number of clusters", type=int)
parser.add_argument("dir", help="Directory containing annotations")
parser.add_argument("image_file", help="File to generate the final cluster of image")
parser.add_argument('-jobs', help="Number of jobs for parallel computation", default=1)
parser.add_argument('-iter', help="Max Iterations to run algorithm for", default=1000)
p = parser.parse_args(sys.argv[1:])
K = p.clusters
directory = p.dir
data_list = []
    pre_process(directory, data_list)
sys.stdout.write('\nDone collecting data\n')
k_means(K, data_list, int(p.iter), int(p.jobs), p.image_file)
print 'Done !'
if __name__=='__main__':
try:
main()
except Exception as E:
print E
from skimage import io, transform
from multiprocessing.dummy import Pool as ThreadPool
def rescale(root_new, root_old, img_path, ann_path, out_shape):
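    # Resize one image to out_shape x out_shape and scale its bounding-box
    # annotations (5 values per object: xmin, ymin, xmax, ymax, label) by the same factors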
    try:
        img = io.imread(root_old+"/"+img_path)
    except Exception as E:
        print E
        return
h, w, _ = img.shape
f_h, f_w = float(out_shape)/h, float(out_shape)/w
    trans_img = transform.resize(img, (out_shape, out_shape))  # resize to out_shape x out_shape; channels are preserved
num_objs = 0
with open(root_old+"/"+ann_path, 'r') as f:
ann = f.readline()
ann = ann.rstrip()
ann = ann.split(' ')
ann = [float(i) for i in ann]
num_objs = len(ann) / 5
for idx in xrange(num_objs):
ann[idx * 5 + 0] = int(f_w * ann[idx * 5 + 0])
ann[idx * 5 + 1] = int(f_h * ann[idx * 5 + 1])
ann[idx * 5 + 2] = int(f_w * ann[idx * 5 + 2])
ann[idx * 5 + 3] = int(f_h * ann[idx * 5 + 3])
# Write the new annotations to file
with open(root_new+"/"+ann_path, 'w') as f_new:
for val in ann:
f_new.write(str(val)+' ')
# Save the new image
    io.imsave(root_new+"/"+img_path, trans_img)
def preprocess():
source = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012_Resize/source.txt'
root_old = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012'
root_new = '/users2/Datasets/PASCAL_VOC/VOCdevkit/VOC2012_Resize'
out_shape = 416
with open(source, 'r') as src:
lines = src.readlines()
print 'Processing {} images and annotations'.format(len(lines))
for line in lines:
line = line.rstrip()
line = line.split(' ')
img_path = line[0]
ann_path = line[1]
rescale(root_new, root_old, img_path, ann_path, out_shape)
if __name__ == '__main__':
preprocess()
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#include "precomp.hpp"
#include "opencv2/core_detect.hpp"
namespace cv
{
namespace dnn_objdetect
{
InferBbox::InferBbox(Mat _delta_bbox, Mat _class_scores, Mat _conf_scores)
{
this->delta_bbox = _delta_bbox;
this->class_scores = _class_scores;
this->conf_scores = _conf_scores;
image_width = 416;
image_height = 416;
W = 23;
H = 23;
num_classes = 20;
anchors_per_grid = 9;
anchors = W * H * anchors_per_grid;
intersection_thresh = 0.65;
nms_intersection_thresh = 0.1;
n_top_detections = 64;
epsilon = 1e-7;
anchors_values.resize(anchors);
for (size_t i = 0; i < anchors; ++i)
{
anchors_values[i].resize(4);
}
// Anchor shapes predicted from kmeans clustering
double arr[9][2] = {{377, 371}, {64, 118}, {129, 326},
{172, 126}, {34, 46}, {353, 204},
{89, 214}, {249, 361}, {209, 239}};
for (size_t i = 0; i < anchors_per_grid; ++i)
{
anchor_shapes.push_back(std::make_pair(arr[i][1], arr[i][0]));
}
// Generate the anchor centers
for (size_t x = 1; x < W + 1; ++x) {
double c_x = (x * static_cast<double>(image_width)) / (W+1.0);
for (size_t y = 1; y < H + 1; ++y) {
double c_y = (y * static_cast<double>(image_height)) / (H+1.0);
anchor_center.push_back(std::make_pair(c_x, c_y));
}
}
// Generate the final anchor values
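  // Each anchor is stored as [center_x, center_y, height, width]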
for (size_t i = 0, anchor = 0, j = 0; anchor < anchors; ++anchor)
{
anchors_values[anchor][0] = anchor_center.at(i).first;
anchors_values[anchor][1] = anchor_center.at(i).second;
anchors_values[anchor][2] = anchor_shapes.at(j).first;
anchors_values[anchor][3] = anchor_shapes.at(j).second;
if ((anchor+1) % anchors_per_grid == 0)
{
i += 1;
j = 0;
}
else
{
++j;
}
}
// Map the class index to the corresponding labels
std::string arrs[20] = {"aeroplane", "bicycle", "bird", "boat",
"bottle", "bus", "car", "cat", "chair",
"cow", "diningtable", "dog", "horse",
"motorbike", "person", "pottedplant",
"sheep", "sofa", "train", "tvmonitor"};
for (size_t idx = 0; idx < num_classes; ++idx)
{
label_map.push_back(arrs[idx]);
}
}
void InferBbox::filter(double thresh)
{
this->intersection_thresh = thresh;
// Some containers
std::vector<std::vector<double> > transformed_bbox_preds(this->anchors);
std::vector<std::vector<double> > min_max_bboxes(this->anchors);
std::vector<std::vector<double> > final_probs(this->anchors);
for (size_t i = 0; i < this->anchors; ++i)
{
transformed_bbox_preds[i].resize(4);
final_probs[i].resize(num_classes);
min_max_bboxes[i].resize(4);
}
// Transform relative coordinates from ConvDet to bounding box coordinates
transform_bboxes(&transformed_bbox_preds);
// Do the inverse transformation of the predicted bboxes
transform_bboxes_inv(&transformed_bbox_preds, &min_max_bboxes);
// Ensure that the predicted bounding boxes are well within the image
// dimensions
assert_predictions(&min_max_bboxes);
// Compute the final probability values
final_probability_dist(&final_probs);
// Filter the classes of n_top_detections
std::vector<std::vector<double> > top_n_boxes(n_top_detections);
std::vector<size_t> top_n_idxs(n_top_detections);
std::vector<double> top_n_probs(n_top_detections);
for (size_t i = 0; i < n_top_detections; ++i)
{
top_n_boxes[i].resize(4);
}
filter_top_n(&final_probs, &min_max_bboxes, top_n_boxes,
top_n_idxs, top_n_probs);
// Apply Non-Maximal-Supression to the n_top_detections
nms_wrapper(top_n_boxes, top_n_idxs, top_n_probs);
}
void InferBbox::transform_bboxes(std::vector<std::vector<double> > *bboxes)
{
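  // ConvDet predicts per-anchor offsets: the anchor center is shifted by
  // (anchor size * delta) and the anchor size is scaled by exp(delta)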
for (unsigned int h = 0; h < H; ++h)
{
for (unsigned int w = 0; w < W; ++w)
{
for (unsigned int anchor = 0; anchor < anchors_per_grid; ++anchor)
{
const int anchor_idx = (h * W + w) * anchors_per_grid + anchor;
double delta_x = this->delta_bbox.at<float>(h, w, anchor * 4 + 0);
double delta_y = this->delta_bbox.at<float>(h, w, anchor * 4 + 1);
double delta_h = this->delta_bbox.at<float>(h, w, anchor * 4 + 2);
double delta_w = this->delta_bbox.at<float>(h, w, anchor * 4 + 3);
(*bboxes)[anchor_idx][0] = this->anchors_values[anchor_idx][0] +
this->anchors_values[anchor_idx][3] * delta_x;
(*bboxes)[anchor_idx][1] = this->anchors_values[anchor_idx][1] +
          this->anchors_values[anchor_idx][2] * delta_y;
(*bboxes)[anchor_idx][2] =
this->anchors_values[anchor_idx][2] * exp(delta_h);
(*bboxes)[anchor_idx][3] =
this->anchors_values[anchor_idx][3] * exp(delta_w);
}
}
}
}
void InferBbox::final_probability_dist(
std::vector<std::vector<double> > *final_probs)
{
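  // Per-class probability of an anchor = P(object) * P(class | object):
  // the confidence (sigmoid) score multiplied by the class (softmax) score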
for (unsigned int h = 0; h < H; ++h)
{
for (unsigned int w = 0; w < W; ++w)
{
for (unsigned int ch = 0; ch < anchors_per_grid * num_classes; ++ch)
{
const int anchor_idx =
(h * W + w) * anchors_per_grid + ch / num_classes;
double pr_object =
conf_scores.at<float>(h, w, ch / num_classes);
double pr_class_idx =
class_scores.at<float>(anchor_idx, ch % num_classes);
(*final_probs)[anchor_idx][ch % num_classes] =
pr_object * pr_class_idx;
}
}
}
}
void InferBbox::transform_bboxes_inv(
std::vector<std::vector<double> > *pre,
std::vector<std::vector<double> > *post)
{
for (size_t anchor = 0; anchor < anchors; ++anchor)
{
double c_x = (*pre)[anchor][0];
double c_y = (*pre)[anchor][1];
double b_h = (*pre)[anchor][2];
double b_w = (*pre)[anchor][3];
(*post)[anchor][0] = c_x - b_w / 2.0;
(*post)[anchor][1] = c_y - b_h / 2.0;
(*post)[anchor][2] = c_x + b_w / 2.0;
(*post)[anchor][3] = c_y + b_h / 2.0;
}
}
void InferBbox::assert_predictions(std::vector<std::vector<double> >
*min_max_boxes)
{
for (size_t anchor = 0; anchor < anchors; ++anchor)
{
double p_xmin = (*min_max_boxes)[anchor][0];
double p_ymin = (*min_max_boxes)[anchor][1];
double p_xmax = (*min_max_boxes)[anchor][2];
double p_ymax = (*min_max_boxes)[anchor][3];
(*min_max_boxes)[anchor][0] = std::min(std::max(
static_cast<double>(0.0), p_xmin), image_width -
static_cast<double>(1.0));
(*min_max_boxes)[anchor][1] = std::min(std::max(
static_cast<double>(0.0), p_ymin), image_height -
static_cast<double>(1.0));
(*min_max_boxes)[anchor][2] = std::max(std::min(
image_width - static_cast<double>(1.0), p_xmax),
static_cast<double>(0.0));
(*min_max_boxes)[anchor][3] = std::max(std::min(
image_height - static_cast<double>(1.0), p_ymax),
static_cast<double>(0.0));
}
}
void InferBbox::filter_top_n(std::vector<std::vector<double> >
*probs, std::vector<std::vector<double> > *boxes,
std::vector<std::vector<double> > &top_n_boxes,
std::vector<size_t> &top_n_idxs,
std::vector<double> &top_n_probs)
{
std::vector<double> max_class_probs((*probs).size());
std::vector<size_t> args((*probs).size());
for (unsigned int box = 0; box < (*boxes).size(); ++box)
{
size_t _prob_idx =
std::max_element((*probs)[box].begin(),
(*probs)[box].end()) - (*probs)[box].begin();
max_class_probs[box] = (*probs)[box][_prob_idx];
}
std::vector<std::pair<double, size_t> > temp_sort(max_class_probs.size());
for (size_t tidx = 0; tidx < max_class_probs.size(); ++tidx)
{
temp_sort[tidx] = std::make_pair(max_class_probs[tidx],
static_cast<size_t>(tidx));
}
std::sort(temp_sort.begin(), temp_sort.end(), InferBbox::comparator);
for (size_t idx = 0; idx < temp_sort.size(); ++idx)
{
args[idx] = temp_sort[idx].second;
}
// Get n_top_detections
std::vector<size_t> top_n_order(args.begin(),
args.begin() + n_top_detections);
// Have a separate copy of all the n_top_detections
for (size_t n = 0; n < n_top_detections; ++n)
{
top_n_probs[n] = max_class_probs[top_n_order[n]];
top_n_idxs[n] =
std::max_element((*probs)[top_n_order[n]].begin(),
(*probs)[top_n_order[n]].end()) -
(*probs)[top_n_order[n]].begin();
for (size_t i = 0; i < 4; ++i)
{
top_n_boxes[n][i] = (*boxes)[top_n_order[n]][i];
}
}
}
void InferBbox::nms_wrapper(std::vector<std::vector<double> >
&top_n_boxes, std::vector<size_t> &top_n_idxs,
std::vector<double> &top_n_probs)
{
for (size_t c = 0; c < this->num_classes; ++c)
{
std::vector<size_t> idxs_per_class;
for (size_t n = 0; n < n_top_detections; ++n)
{
if (top_n_idxs[n] == c)
{
idxs_per_class.push_back(n);
}
}
// Just continue in case there are no objects of this class
if (idxs_per_class.size() == 0)
{
continue;
}
// Process per class detections
std::vector<std::vector<double> > boxes_per_class(idxs_per_class.size());
std::vector<double> probs_per_class(idxs_per_class.size());
std::vector<bool> keep_per_class;
for (std::vector<size_t>::iterator itr = idxs_per_class.begin();
itr != idxs_per_class.end(); ++itr)
{
size_t idx = itr - idxs_per_class.begin();
probs_per_class[idx] = top_n_probs[*itr];
for (size_t b = 0; b < 4; ++b)
{
boxes_per_class[idx].push_back(top_n_boxes[*itr][b]);
}
}
keep_per_class =
non_maximal_suppression(&boxes_per_class, &probs_per_class);
for (std::vector<bool>::iterator itr = keep_per_class.begin();
itr != keep_per_class.end(); ++itr)
{
size_t idx = itr - keep_per_class.begin();
if (*itr && probs_per_class[idx] > this->intersection_thresh)
{
dnn_objdetect::object new_detection;
new_detection.class_idx = c;
new_detection.label_name = this->label_map[c];
new_detection.xmin = (int)boxes_per_class[idx][0];
new_detection.ymin = (int)boxes_per_class[idx][1];
new_detection.xmax = (int)boxes_per_class[idx][2];
new_detection.ymax = (int)boxes_per_class[idx][3];
new_detection.class_prob = probs_per_class[idx];
this->detections.push_back(new_detection);
}
}
}
}
std::vector<bool> InferBbox::non_maximal_suppression(
std::vector<std::vector<double> > *boxes, std::vector<double>
*probs)
{
std::vector<bool> keep(((*probs).size()));
std::fill(keep.begin(), keep.end(), true);
std::vector<size_t> prob_args_sorted((*probs).size());
std::vector<std::pair<double, size_t> > temp_sort((*probs).size());
for (size_t tidx = 0; tidx < (*probs).size(); ++tidx)
{
temp_sort[tidx] = std::make_pair((*probs)[tidx],
static_cast<size_t>(tidx));
}
std::sort(temp_sort.begin(), temp_sort.end(), InferBbox::comparator);
for (size_t idx = 0; idx < temp_sort.size(); ++idx)
{
prob_args_sorted[idx] = temp_sort[idx].second;
}
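  // Greedy NMS: visit boxes in decreasing order of confidence and suppress every
  // later box whose IoU with the current box exceeds nms_intersection_thresh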
for (std::vector<size_t>::iterator itr = prob_args_sorted.begin();
itr != prob_args_sorted.end()-1; ++itr)
{
size_t idx = itr - prob_args_sorted.begin();
std::vector<double> iou_(prob_args_sorted.size() - idx - 1);
std::vector<std::vector<double> > temp_boxes(iou_.size());
for (size_t bb = 0; bb < temp_boxes.size(); ++bb)
{
std::vector<double> temp_box(4);
for (size_t b = 0; b < 4; ++b)
{
temp_box[b] = (*boxes)[prob_args_sorted[idx + bb + 1]][b];
}
temp_boxes[bb] = temp_box;
}
intersection_over_union(&temp_boxes,
&(*boxes)[prob_args_sorted[idx]], &iou_);
for (std::vector<double>::iterator _itr = iou_.begin();
_itr != iou_.end(); ++_itr)
{
size_t iou_idx = _itr - iou_.begin();
if (*_itr > nms_intersection_thresh)
{
keep[prob_args_sorted[idx+iou_idx+1]] = false;
}
}
}
return keep;
}
void InferBbox::intersection_over_union(std::vector<std::vector<double> >
*boxes, std::vector<double> *base_box, std::vector<double> *iou)
{
double g_xmin = (*base_box)[0];
double g_ymin = (*base_box)[1];
double g_xmax = (*base_box)[2];
double g_ymax = (*base_box)[3];
double base_box_w = g_xmax - g_xmin;
double base_box_h = g_ymax - g_ymin;
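  // Clip each candidate box against the base box; non-overlapping boxes
  // end up with zero intersection and hence (near) zero IoU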
for (size_t b = 0; b < (*boxes).size(); ++b)
{
double xmin = std::max((*boxes)[b][0], g_xmin);
double ymin = std::max((*boxes)[b][1], g_ymin);
double xmax = std::min((*boxes)[b][2], g_xmax);
double ymax = std::min((*boxes)[b][3], g_ymax);
// Intersection
double w = std::max(static_cast<double>(0.0), xmax - xmin);
double h = std::max(static_cast<double>(0.0), ymax - ymin);
// Union
double test_box_w = (*boxes)[b][2] - (*boxes)[b][0];
double test_box_h = (*boxes)[b][3] - (*boxes)[b][1];
double inter_ = w * h;
double union_ = test_box_h * test_box_w + base_box_h * base_box_w - inter_;
(*iou)[b] = inter_ / (union_ + epsilon);
}
}
}
}
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.
#ifndef __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
#define __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
#include <iostream>
#include <vector>
#include <memory>
#include <string>
#include <map>
#include <numeric>
#include <algorithm>
#include "opencv2/core.hpp"
#include "opencv2/dnn.hpp"
#endif // __OPENCV_DNN_OBJDETECT_PRECOMP_HPP__
Object Detection using CNNs {#tutorial_dnn_objdetect}
===========================
# Building
Build the samples of the "dnn_objdetect" module. Refer to OpenCV build tutorials for details.
Enable `BUILD_EXAMPLES=ON` CMake option and build these targets (Linux):
- example_dnn_objdetect_image_classification
- example_dnn_objdetect_obj_detect
Download the weights file and model definition file from `opencv_extra/dnn_objdetect`.
# Object Detection
```bash
example_dnn_objdetect_obj_detect <model-definition-file> <model-weights-file> <test-image>
```
All the following examples were run on a laptop with an `Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz` (without GPU).
The model is fast, taking just `0.172091` seconds on average to predict multiple bounding boxes.
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/aeroplane.jpg
Total objects detected: 1 in 0.168792 seconds
------
Class: aeroplane
Probability: 0.845181
Co-ordinates: 41 116 415 254
------
```
![Train_Dets](images/aero_det.jpg)
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/bus.jpg
Total objects detected: 1 in 0.201276 seconds
------
Class: bus
Probability: 0.701829
Co-ordinates: 0 32 415 244
------
```
![Train_Dets](images/bus_det.jpg)
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/cat.jpg
Total objects detected: 1 in 0.190335 seconds
------
Class: cat
Probability: 0.703465
Co-ordinates: 34 0 381 282
------
```
![Train_Dets](images/cat_det.jpg)
```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/persons_mutli.jpg
Total objects detected: 2 in 0.169152 seconds
------
Class: person
Probability: 0.737349
Co-ordinates: 160 67 313 363
------
Class: person
Probability: 0.720328
Co-ordinates: 187 198 222 323
------
```
![Train_Dets](images/person_multi_det.jpg)
Go ahead and run the model on other images!
## Changing threshold
By default, this model thresholds detections at a confidence of `0.53`. Many bounding boxes are predicted during filtering; you can control which ones are kept by passing a value for the optional `threshold` argument:
```bash
<bin_path>/example_dnn_objdetect_obj_detect <model-definition-file> <model-weights-file> <test-image> <threshold>
```
Changing the threshold to, say, `0.0` produces the following:
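For instance, reusing the SqueezeDet model files and the aeroplane test image from the runs above:

```bash
<bin_path>/example_dnn_objdetect_obj_detect SqueezeDet_deploy.prototxt SqueezeDet.caffemodel tutorials/images/aeroplane.jpg 0.0
```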
![Train_Dets](images/aero_thresh_det.jpg)
That doesn't seem to be very helpful!
# Image Classification
```bash
example_dnn_objdetect_image_classification <model-definition-file> <model-weights-file> <test-image>
```
The model is only **4.9MB** in size and takes just **0.136401** seconds to classify an image.
Running the model on examples produces the following results:
```bash
<bin_path>/example_dnn_objdetect_image_classification SqueezeNet_deploy.prototxt SqueezeNet.caffemodel tutorials/images/aeroplane.jpg
Best class Index: 404
Time taken: 0.137722
Probability: 77.1757
```
Looking at [synset_words.txt](https://raw.githubusercontent.com/opencv/opencv/3.4.0/samples/data/dnn/synset_words.txt), the predicted class corresponds to `airliner`.
```bash
<bin_path>/example_dnn_objdetect_image_classification SqueezeNet_deploy.prototxt SqueezeNet.caffemodel tutorials/images/cat.jpg
Best class Index: 285
Time taken: 0.136401
Probability: 40.7111
```
This belongs to the class: `Egyptian cat`
```bash
<bin_path>/example_dnn_objdetect_image_classification SqueezeNet_deploy.prototxt SqueezeNet.caffemodel tutorials/images/space_shuttle.jpg
Best class Index: 812
Time taken: 0.137792
Probability: 15.8467
```
This belongs to the class: `space shuttle`