/*
By downloading, copying, installing or using the software you agree to this license.
If you do not agree to this license, do not download, install,
copy or use the software.


                          License Agreement
               For Open Source Computer Vision Library
                       (3-clause BSD License)

Copyright (C) 2000-2015, Intel Corporation, all rights reserved.
Copyright (C) 2009-2011, Willow Garage Inc., all rights reserved.
Copyright (C) 2009-2015, NVIDIA Corporation, all rights reserved.
Copyright (C) 2010-2013, Advanced Micro Devices, Inc., all rights reserved.
Copyright (C) 2015, OpenCV Foundation, all rights reserved.
Copyright (C) 2015, Itseez Inc., all rights reserved.
Third party copyrights are property of their respective owners.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

  * Redistributions of source code must retain the above copyright notice,
    this list of conditions and the following disclaimer.

  * Redistributions in binary form must reproduce the above copyright notice,
    this list of conditions and the following disclaimer in the documentation
    and/or other materials provided with the distribution.

  * Neither the names of the copyright holders nor the names of the contributors
    may be used to endorse or promote products derived from this software
    without specific prior written permission.

This software is provided by the copyright holders and contributors "as is" and
any express or implied warranties, including, but not limited to, the implied
warranties of merchantability and fitness for a particular purpose are disclaimed.
In no event shall copyright holders or contributors be liable for any direct,
indirect, incidental, special, exemplary, or consequential damages
(including, but not limited to, procurement of substitute goods or services;
loss of use, data, or profits; or business interruption) however caused
and on any theory of liability, whether in contract, strict liability,
or tort (including negligence or otherwise) arising in any way out of
the use of this software, even if advised of the possibility of such damage.
*/

#ifndef __OPENCV_CNN_3DOBJ_HPP__
#define __OPENCV_CNN_3DOBJ_HPP__
#ifdef __cplusplus
#include <string>
#include <fstream>
#include <vector>
#include <stdio.h>
#include <math.h>
#include <iostream>
#include <set>
#include <string.h>
#include <stdlib.h>
#include <dirent.h>
#define CPU_ONLY

#include <caffe/blob.hpp>
#include <caffe/common.hpp>
#include <caffe/net.hpp>
#include <caffe/proto/caffe.pb.h>
#include <caffe/util/io.hpp>

#include "opencv2/viz/vizcore.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/highgui/highgui_c.h"
#include "opencv2/imgproc.hpp"

/** @defgroup cnn_3dobj 3D object recognition and pose estimation API

As CNN based learning algorithms show better performance on classification issues,
richly labeled data can be all the more useful in the training stage. 3D object classification and pose estimation
is a joint mission aiming at telling differently posed objects apart in the descriptor form.

In the training stage, we prepare 2D training images generated from our module with their
class labels and pose labels. We fully exploit the information that lies in these labels
by using a triplet and pair-wise joint loss function in CNN training.

Both the class and the pose label are taken into account in the triplet loss. The loss score
is smaller when features from the same class and the same pose are more similar,
while features from different classes or different poses lead to a much larger loss score.

This loss is also joined with a pair-wise component to make sure the loss never reaches zero
and to put a restriction on the scale of the model.
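
As an illustrative sketch (the notation, margin m, and weight lambda below follow
Wohlhart and Lepetit's formulation in general terms and are assumptions, not values
taken from this module's implementation), such a joint objective can be written as:

```latex
\mathcal{L} \;=\; \sum_{(s_i,\, s_i^{+},\, s_i^{-})}
    \max\!\left(0,\; 1 - \frac{\lVert f(s_i) - f(s_i^{-}) \rVert_2^2}
                             {\lVert f(s_i) - f(s_i^{+}) \rVert_2^2 + m}\right)
\;+\; \lambda \sum_{(s_i,\, s_j)} \lVert f(s_i) - f(s_j) \rVert_2^2
```

where f is the CNN descriptor, s_i^+ shares the class and a similar pose with s_i,
s_i^- differs in class or pose, and the pair-wise sum runs over samples of the same
object under near-identical poses, which keeps the loss positive and the descriptor
scale bounded.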

The training and feature extraction process is a rough implementation, using OpenCV
and Caffe, of the idea of Paul Wohlhart. The principal purpose of this API is to construct
a well-labeled database from .ply models for CNN training with triplet loss, and to extract features
with the trained model for prediction or other pattern recognition purposes. The algorithms are split into two main classes:

**icoSphere: methods belonging to this class generate 2D images from a 3D model, together with their class labels and their pose labels derived from the camera view.

**descriptorExtractor: methods belonging to this class extract descriptors from 2D images which are
discriminative for category prediction and pose estimation.

@note This API needs the triplet version of Caffe which is designed for this module
<https://github.com/Wangyida/caffe/tree/cnn_triplet>.
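
As a usage sketch combining the two classes (the file names, the "CPU" device,
and the "feat" blob name below are illustrative assumptions, not values
prescribed by this header):

```cpp
#include <opencv2/cnn_3dobj.hpp>
#include <iostream>

int main()
{
    // Generate candidate camera positions on a sphere around the model
    // (view radius 1.0, two subdivision iterations).
    cv::cnn_3dobj::icoSphere viewSphere(1.0f, 2);
    std::cout << viewSphere.CameraPos.size() << " camera positions" << std::endl;

    // Extract a descriptor from a rendered view with a pre-trained triplet CNN.
    cv::cnn_3dobj::descriptorExtractor extractor("CPU");
    extractor.loadNet("3d_triplet.caffemodel", "3d_triplet.prototxt");

    cv::Mat img = cv::imread("view.png"), feature;
    extractor.extract(img, feature, "feat");
    return 0;
}
```

The resulting feature rows can then be matched against a database of stored
descriptors (e.g. with a nearest-neighbour search) for class and pose prediction.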

*/
namespace cv
{
namespace cnn_3dobj
{

//! @addtogroup cnn_3dobj
//! @{

/** @brief Icosahedron-based camera view data generator.
 The class creates a set of camera views on a sphere around a 3D object meshed from .ply files @cite hinterstoisser2008panter .
 */

/************************************ Data Generation Class ************************************/
    class CV_EXPORTS_W icoSphere
    {
        private:
        /** @brief X position of one base point on the initial icosahedron sphere;
          Y is set to 0 by default.
         */
        float X;

        /** @brief Z position of one base point on the initial icosahedron sphere.
         */
        float Z;

        /** @brief A threshold for the elimination of duplicated points.
         */
        float diff;

        /** @brief Temporary camera positions used for the elimination of duplicated positions.
         */
        std::vector<cv::Point3d> CameraPos_temp;

        /** @brief Normalize a vector so that all view points have the same distance from the focal point of the camera view.
         */
        CV_WRAP void norm(float v[]);

        /** @brief Add a new view point.
         */
        CV_WRAP void add(float v[]);

        /** @brief Generate new view points from all triangles.
         */
        CV_WRAP void subdivide(float v1[], float v2[], float v3[], int depth);

        public:
        /** @brief Camera position on the sphere after duplicated points elimination.
         */
        std::vector<cv::Point3d> CameraPos;

        /** @brief Generate a sphere by means of an iteration-based point selection process.
        @param radius_in Radius used for adjusting the view distance.
        @param depth_in Number of iterations for increasing the number of points on the sphere.
         */
        icoSphere(float radius_in, int depth_in);

        /** @brief Get the center of the points on the surface of the .ply model.
        @param cloud Point cloud used for computing the center point.
         */
        CV_WRAP cv::Point3d getCenter(cv::Mat cloud);

        /** @brief Get the proper camera radius from the view point to the center of the model.
        @param cloud Point cloud used for computing the center point.
        @param center Center point of the point cloud.
         */
        CV_WRAP float getRadius(cv::Mat cloud, cv::Point3d center);

        /** @brief Swap the byte order of a 4-byte value to suit the endianness of a particular system.
         */
        CV_WRAP static int swapEndian(int val);
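        /* As an illustration, a standalone sketch (a hypothetical helper, not
           part of this class) of the 4-byte byte-order swap that swapEndian
           performs, e.g. when writing big-endian item counts into MNIST-style
           binary headers:

        ```cpp
        #include <cstdint>

        // Reverse the byte order of a 32-bit value: 0xAABBCCDD -> 0xDDCCBBAA.
        static int swapEndianSketch(int val)
        {
            uint32_t v = static_cast<uint32_t>(val);
            v = ((v & 0x000000FFu) << 24) |
                ((v & 0x0000FF00u) <<  8) |
                ((v & 0x00FF0000u) >>  8) |
                ((v & 0xFF000000u) >> 24);
            return static_cast<int>(v);
        }
        ```

           Applying the swap twice restores the original value, so the same
           routine serves for both writing and reading such headers. */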

        /** @brief Create the header of binary files collecting the image data and labels.
        @param num_item Number of items.
        @param rows Rows of a single sample image.
        @param cols Columns of a single sample image.
        @param headerPath Path where the header will be stored.
         */
        CV_WRAP static void createHeader(int num_item, int rows, int cols, const char* headerPath);

        /** @brief Write binary files used for training in other open source projects, including Caffe.
        @param filenameImg Path containing a set of images.
        @param binaryPath Path where the binary file will be written.
        @param headerPath Path of the header the binary file belongs to.
        @param num_item Number of samples.
        @param label_class Class label of the sample.
        @param x Pose label of X.
        @param y Pose label of Y.
        @param z Pose label of Z.
        @param isrgb Option to choose whether to use RGB images or not.
         */
        CV_WRAP static void writeBinaryfile(String filenameImg, const char* binaryPath, const char* headerPath, int num_item, int label_class, int x, int y, int z, int isrgb);
    };

/** @brief Caffe-based 3D image descriptor.
 A class to extract features from an image. The descriptors obtained this way can be used for classification and pose estimation @cite wohlhart15.
 */

/************************************ Feature Extraction Class ************************************/
    class CV_EXPORTS_W descriptorExtractor
    {
        private:
        caffe::Net<float>* convnet;
        cv::Size input_geometry;
        int num_channels;
        bool net_set;
        int net_ready;
        cv::Mat mean_;
        String deviceType;
        int deviceId;

        /** @brief Load the mean file in binaryproto format if it is needed.
        @param mean_file Path of the mean file which stores the mean of the training images; it is usually generated by a Caffe tool.
         */
        void setMean(const String& mean_file);

        /** @brief Wrap the input layer of the network in separate cv::Mat objects (one per channel).
         This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D.
         The last preprocessing operation will write the separate channels directly to the input layer.
         */
        void wrapInput(std::vector<cv::Mat>* input_channels);

        /** @brief Convert the input image to the input image format of the network.
         */
        void preprocess(const cv::Mat& img, std::vector<cv::Mat>* input_channels);

        public:
        /** @brief Set the device for feature extraction; if a GPU is used, a device_id should be provided.
        @param device_type CPU or GPU.
        @param device_id ID of the GPU.
         */
        descriptorExtractor(const String& device_type, int device_id = 0);

        /** @brief Get device type information for feature extraction.
         */
        String getDeviceType();

        /** @brief Get device ID information for feature extraction.
         */
        int getDeviceId();

        /** @brief Set device type information for feature extraction.
         Useful to change device without the need to reload the net.
        @param device_type CPU or GPU.
         */
        void setDeviceType(const String& device_type);

        /** @brief Set device ID information for feature extraction.
         Useful to change device without the need to reload the net. Only used for GPU.
        @param device_id ID of GPU.
         */
        void setDeviceId(const int& device_id);

        /** @brief Initiate a classification structure: the network parameters are stored in model_file,
         the network structure is stored in trained_file, and you can decide whether to use a mean image or not.
        @param model_file Path of the caffemodel file including all CNN parameters.
        @param trained_file Path of the prototxt file defining the CNN structure.
        @param mean_file Path of the mean file (optional).
         */
        void loadNet(const String& model_file, const String& trained_file, const String& mean_file = "");

        /** @brief Extract features from a single image or from a vector of images.
         If loadNet was not called before, this method invocation will fail.
        @param inputimg Input images.
        @param feature Output features.
        @param feature_blob Layer which the feature is extracted from.
         */
        void extract(InputArrayOfArrays inputimg, OutputArray feature, String feature_blob);
    };
    //! @}
}
}

#endif /* __cplusplus */
#endif /* __OPENCV_CNN_3DOBJ_HPP__ */