Commit 197fba68 authored by Wangyida

modify the test of module cnn_3dobj

parent cabd5d40
#Convolutional Neural Network for 3D object classification and pose estimation.
============================================
===========================================================
#Module Description on cnn_3dobj:
The network structure and feature extraction concept are based on a Convolutional Neural Network; the main reference paper can be found at:
https://cvarlab.icg.tugraz.at/pubs/wohlhart_cvpr15.pdf
The authors of the paper provide code in Theano at:
https://cvarlab.icg.tugraz.at/projects/3d_object_detection/
I implemented the training and feature extraction code mainly on top of the CAFFE project, which is compiled as libcaffe for the cnn_3dobj OpenCV module. The code concentrates on the triplet and pair-wise joint loss layer; the arrangement of the training data is also important, as it provides the basic training information.
The code of my triplet version of Caffe is released on GitHub; you can get it with:
####The network structure and feature extraction concept are based on a Convolutional Neural Network; the main reference paper can be found at:
<https://cvarlab.icg.tugraz.at/pubs/wohlhart_cvpr15.pdf>.
####The authors of the paper provide code in Theano at:
<https://cvarlab.icg.tugraz.at/projects/3d_object_detection/>.
####I implemented the training and feature extraction code mainly on top of the CAFFE project (<http://caffe.berkeleyvision.org/>), which is compiled as libcaffe for the cnn_3dobj OpenCV module. The code concentrates on the triplet and pair-wise joint loss layer; the arrangement of the training data is also important, as it provides the basic training information.
####The code of my triplet version of Caffe is released on GitHub:
<https://github.com/Wangyida/caffe/tree/cnn_triplet>.
####You can clone it with:
```
$ git clone -b cnn_triplet https://github.com/Wangyida/caffe.git
```
============================================
===========================================================
#Module Building Process:
###Prerequisites for this module: protobuf and Caffe. Install libcaffe on a standard system path so that this OpenCV module can link against it when compiling and when running. Pass -D CMAKE_INSTALL_PREFIX=/usr/local as a build option to cmake; building Caffe on the system could look like this:
####Prerequisites for this module: protobuf and Caffe. Install libcaffe on a standard system path so that this OpenCV module can link against it when compiling and when running. Pass -D CMAKE_INSTALL_PREFIX=/usr/local as a build option to cmake; building Caffe on the system could look like this:
```
$ cd <caffe_source_directory>
$ mkdir build
......@@ -21,26 +23,26 @@ $ cmake -D CMAKE_INSTALL_PREFIX=/usr/local ..
$ make all -j4
$ sudo make install
```
###After these steps, the headers and libraries of Caffe are installed under /usr/local/. When you compile OpenCV with the opencv_contrib modules as below, protobuf and Caffe will be detected as already installed. Protobuf is required.
####After these steps, the headers and libraries of Caffe are installed under /usr/local/. When you compile OpenCV with the opencv_contrib modules as below, protobuf and Caffe will be detected as already installed. Protobuf is required.
#Compiling OpenCV
```
$ cd <opencv_source_directory>
$ mkdir build
$ cd build
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=OFF -D WITH_V4L=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D WITH_VTK=ON -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules ..
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=ON -D BUILD_NEW_PYTHON_SUPPORT=OFF -D WITH_V4L=ON -D WITH_QT=ON -D WITH_OPENGL=ON -D WITH_VTK=ON -D INSTALL_TESTS=ON -D OPENCV_EXTRA_MODULES_PATH=<opencv_contrib>/modules ..
$ make -j4
$ sudo make install
```
##Tips on compiling problems:
###If you encounter 'no declaration' errors when you run 'make', it may be because an older version of the cnn_3dobj module is installed and its header file changed in the newly released code. cmake and make cannot detect that the header should be updated, so the older header remains in /usr/local/include/opencv2. Solve this by removing the installed older header of the cnn_3dobj module:
####If you encounter 'no declaration' errors when you run 'make', it may be because an older version of the cnn_3dobj module is installed and its header file changed in the newly released code. cmake and make cannot detect that the header should be updated, so the older header remains in /usr/local/include/opencv2. Solve this by removing the installed older header of the cnn_3dobj module:
```
$ cd /
$ cd usr/local/include/opencv2/
$ sudo rm -rf cnn_3dobj.hpp
```
###Then redo the compiling steps above.
================================================
####Then redo the compiling steps above.
===========================================================
#Building samples
```
$ cd <opencv_contrib>/modules/cnn_3dobj/samples
......@@ -49,50 +51,60 @@ $ cd build
$ cmake ..
$ make
```
=============
===========================================================
#Demos
##Demo1: training data generation
###Images are generated from different poses. By default 4 models are used, giving 276 images in total with 69 images per class. If you want to use additional .ply models, change the class number parameter to the new number of classes and give each new model its own class label. If you will train the network and extract features from RGB images, set the parameter rgb_use to 1.
####Images are generated from different poses. By default 4 models are used, giving 276 images in total with 69 images per class. If you want to use additional .ply models, change the class number parameter to the new number of classes and give each new model its own class label. If you will train the network and extract features from RGB images, set the parameter rgb_use to 1.
```
$ ./sphereview_test -plymodel=../3Dmodel/ape.ply -label_class=0
$ ./sphereview_test -plymodel=../data/3Dmodel/ape.ply -label_class=0
```
###press 'Q' to start 2D image generation
####press 'Q' to start 2D image generation
```
$ ./sphereview_test -plymodel=../3Dmodel/ant.ply -label_class=1
$ ./sphereview_test -plymodel=../data/3Dmodel/ant.ply -label_class=1
```
###press 'Q' to start
####press 'Q' to start
```
$ ./sphereview_test -plymodel=../3Dmodel/cow.ply -label_class=2
$ ./sphereview_test -plymodel=../data/3Dmodel/cow.ply -label_class=2
```
###press 'Q' to start
####press 'Q' to start
```
$ ./sphereview_test -plymodel=../3Dmodel/plane.ply -label_class=3
$ ./sphereview_test -plymodel=../data/3Dmodel/plane.ply -label_class=3
```
###press 'Q' to start
####press 'Q' to start
###When all images have been created in the images_all folder, which serves as the collection of training images for network training and as the gallery of reference images for classification, proceed to the next step.
###After this demo, the binary files of images and labels are stored as 'binary_image' and 'binary_label' in the current path. Copy them into the leveldb folder used for Caffe triplet training, for example: copy these 2 files into <caffe_source_directory>/data/linemod and rename them to 'binary_image_train', 'binary_image_test' and 'binary_label_train', 'binary_label_test'. Here I use the same data for training and testing; you can use different data for training and for testing the performance in the CAFFE training process. It is important to watch the loss on the testing data to check whether the training data suits your aim. The loss should keep decreasing and settle at a value much smaller than the initial loss.
###You could start triplet training using Caffe like this:
####When all images have been created in the images_all folder, which serves as the collection of training images for network training and as the gallery of reference images for classification, proceed to the next step.
####After this demo, the binary files of images and labels are stored as 'binary_image' and 'binary_label' in the current path. Copy them into the leveldb folder used for Caffe triplet training, for example: copy these 2 files into <caffe_source_directory>/data/linemod and rename them to 'binary_image_train', 'binary_image_test' and 'binary_label_train', 'binary_label_test'. Here I use the same data for training and testing; you can use different data for training and for testing the performance in the CAFFE training process. It is important to watch the loss on the testing data to check whether the training data suits your aim. The loss should keep decreasing and settle at a value much smaller than the initial loss.
####You could start triplet training using Caffe like this:
```
$ cd
$ cd <caffe_source_directory>
$ ./examples/triplet/create_3d_triplet.sh
$ ./examples/triplet/train_3d_triplet.sh
```
###After doing this, you will get .caffemodel files holding the trained parameters of the network. I have already provided the net definition .prototxt files and the pretrained .caffemodel in the <opencv_contrib>/modules/cnn_3dobj/samples/build/data folder, so you can use them without training in Caffe.
==============
####After doing this, you will get .caffemodel files holding the trained parameters of the network. I have already provided the net definition .prototxt files and the pretrained .caffemodel in the <opencv_contrib>/modules/cnn_3dobj/testdata/cv folder, so you can use them without training in Caffe.
===========================================================
##Demo2: feature extraction and classification
```
$ cd
$ cd <opencv_contrib>/modules/cnn_3dobj/samples/build
```
###Classifier: this extracts the feature of a single image and compares it with the features of gallery samples for prediction. The demo extracts features from a set of images in a given path; these features serve as the reference for prediction on the target image. Just run:
####Classifier: this extracts the feature of a single image and compares it with the features of gallery samples for prediction. The demo extracts features from a set of images in a given path; these features serve as the reference for prediction on the target image. The Caffe model and network prototxt file are provided in <opencv_contrib>/modules/cnn_3dobj/testdata/cv. Just run:
```
$ ./classify_test
```
###If the classification and pose estimation task needs to subtract the mean computed from all training images, you can run this:
####If the classification and pose estimation task needs to subtract the mean computed from all training images, you can run this:
```
$ ./classify_test -mean_file=../data/images_mean/triplet_mean.binaryproto
```
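####The comparison against the gallery can be sketched as a nearest-neighbour search on descriptors. The demo itself uses cv::BFMatcher with NORM_L2 for this step; the following is only a minimal, self-contained illustration in plain C++, assuming descriptors are float vectors compared by Euclidean distance. The helper names l2Distance and rankGallery are hypothetical and are not part of the module API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: Euclidean (L2) distance between two descriptors of equal length.
inline float l2Distance(const std::vector<float>& a, const std::vector<float>& b)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Hypothetical helper: return (gallery index, distance) pairs for the k gallery
// descriptors closest to the query, nearest first. Ties keep gallery order.
inline std::vector<std::pair<std::size_t, float> > rankGallery(
    const std::vector<float>& query,
    const std::vector<std::vector<float> >& gallery,
    std::size_t k)
{
    std::vector<std::pair<std::size_t, float> > ranked;
    for (std::size_t i = 0; i < gallery.size(); ++i)
        ranked.push_back(std::make_pair(i, l2Distance(query, gallery[i])));

    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<std::size_t, float>& lhs,
                        const std::pair<std::size_t, float>& rhs)
                     { return lhs.second < rhs.second; });

    if (ranked.size() > k)
        ranked.resize(k);
    return ranked;
}
```

####The class and pose of the nearest gallery images then serve as the prediction for the target image, which is why the gallery file names are printed next to the distances.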
==============================================
===========================================================
##Demo3: model performance test
####This demo tests the performance of the trained CNN model on several images. If the model fails to tell apart samples from separate classes, or confuses samples with a similar pose but from different classes, it prints information for the model analysis.
```
$ ./model_test
```
===========================================================
#Test
####If you want to test the cnn_3dobj module, the path of the test data must be set in advance:
```
$ export OPENCV_TEST_DATA_PATH=<opencv_contrib>/modules/cnn_3dobj/testdata
```
......@@ -4,3 +4,10 @@
booktitle = {BMVC British Machine Vision Conference 2008},
year = {2008}
}
@inproceedings{wohlhart15,
author = {Paul Wohlhart and Vincent Lepetit},
title = {Learning Descriptors for Object Recognition and 3D Pose Estimation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year = {2015}
}
......@@ -77,7 +77,44 @@ using caffe::Blob;
using caffe::Caffe;
using caffe::Datum;
using caffe::Net;
/** @defgroup cnn_3dobj CNN based on Caffe aimming at 3D object recognition and pose estimation
/** @defgroup cnn_3dobj 3D object recognition and pose estimation API
As CNN based learning algorithm shows better performance on the classification issues,
the rich labeled data could be more useful in the training stage. 3D object classification and pose estimation
is a jointed mission aimming at seperate different posed apart in the descriptor form.
In the training stage, we prepare 2D training images generated from our module with their
class label and pose label. We fully exploit the information lies in their labels
by using a triplet and pair-wise jointed loss function in CNN training.
As CNN based learning algorithms show better performance on classification tasks,
richly labeled data can be more useful in the training stage. 3D object classification and pose estimation
is a joint task that aims to separate different poses apart in descriptor form.
In the training stage, we prepare 2D training images generated by our module with their
class label and pose label. We fully exploit the information that lies in their labels
by using a triplet and pair-wise joint loss function in CNN training.
Both class and pose labels are considered in the triplet loss. The loss score
is smaller when features from the same class and same pose are more similar,
while features from different classes or different poses lead to a much larger loss score.
This loss is also joined with a pair-wise component to make sure the loss is never zero
and to restrict the model scale.
The training and feature extraction process is a rough implementation, using OpenCV
and Caffe, of the idea of Paul Wohlhart. The principal purpose of this API is constructing
a well labeled database from .ply models for CNN training with triplet loss, and extracting features
with the constructed model for prediction or other pattern recognition purposes. The algorithms are organized into two main classes:
**icoSphere: methods belonging to this class generate 2D images from a 3D model, together with their class and pose (camera view) labels.
**descriptorExtractor: methods belonging to this class extract descriptors from 2D images which are
discriminant for category prediction and pose estimation.
@note This API needs the triplet version of Caffe, which is designed for this module
<https://github.com/Wangyida/caffe/tree/cnn_triplet>.
*/
namespace cv
{
......@@ -87,52 +124,96 @@ namespace cnn_3dobj
//! @addtogroup cnn_3dobj
//! @{
/** @brief Icosahedron based camera view generator.
/** @brief Icosahedron based camera view data generator.
The class creates sphere views of a camera towards a 3D object meshed from .ply files @cite hinterstoisser2008panter .
*/
The class creates sphere views of a camera towards a 3D object meshed from .ply files @cite hinterstoisser2008panter .
*/
/************************************ Data Generation Class ************************************/
class CV_EXPORTS_W icoSphere
{
private:
/** @brief X position of one base point on the initial Icosahedron sphere,
Y is set to be 0 as default.
*/
float X;
/** @brief Z position of one base point on the initial Icosahedron sphere.
*/
float Z;
public:
std::vector<float> vertexNormalsList;
std::vector<float> vertexList;
std::vector<cv::Point3d> CameraPos;
std::vector<cv::Point3d> CameraPos_temp;
float radius;
/** @brief A threshold for eliminating duplicated points.
*/
float diff;
icoSphere(float radius_in, int depth_in);
/** @brief Make all view points having the some distance from the focal point used by the camera view.
*/
/** @brief Temporary camera positions used for eliminating duplicated positions.
*/
std::vector<cv::Point3d> CameraPos_temp;
/** @brief Make all view points have the same distance from the focal point used by the camera view.
*/
CV_WRAP void norm(float v[]);
/** @brief Add new view point between 2 point of the previous view point.
*/
/** @brief Add a new view point.
*/
CV_WRAP void add(float v[]);
/** @brief Generating new view points from all triangles.
*/
/** @brief Generate new view points from all triangles.
*/
CV_WRAP void subdivide(float v1[], float v2[], float v3[], int depth);
/** @brief Make all view points having the some distance from the focal point used by the camera view.
*/
CV_WRAP static uint32_t swapEndian(uint32_t val);
/** @brief Suit the position of bytes in 4 byte data structure for particular system.
*/
CV_WRAP cv::Point3d getCenter(cv::Mat cloud);
public:
/** @brief Camera position on the sphere after duplicated points elimination.
*/
std::vector<cv::Point3d> CameraPos;
/** @brief Generate a sphere by means of an iterative point selection process.
@param radius_in Another radius used for adjusting the view distance.
@param depth_in Number of iterations for increasing the points on the sphere.
*/
icoSphere(float radius_in, int depth_in);
/** @brief Get the center of points on surface in .ply model.
*/
CV_WRAP float getRadius(cv::Mat cloud, cv::Point3d center);
@param cloud Point cloud used for computing the center point.
*/
CV_WRAP cv::Point3d getCenter(cv::Mat cloud);
/** @brief Get the proper camera radius from the view point to the center of model.
*/
CV_WRAP static void createHeader(int num_item, int rows, int cols, const char* headerPath);
@param cloud Point cloud used for computing the center point.
@param center center point of the point cloud.
*/
CV_WRAP float getRadius(cv::Mat cloud, cv::Point3d center);
/** @brief Swap the byte order of a 4-byte value to suit the endianness of a particular system.
*/
CV_WRAP static uint32_t swapEndian(uint32_t val);
/** @brief Create header in binary files collecting the image data and label.
*/
@param num_item Number of items.
@param rows Rows of a single sample image.
@param cols Columns of a single sample image.
@param headerPath Path where the header will be stored.
*/
CV_WRAP static void createHeader(int num_item, int rows, int cols, const char* headerPath);
/** @brief Write binary files used for training in other open source projects such as Caffe.
@param filenameImg Path containing a set of images.
@param binaryPath Path where the binary file will be written.
@param headerPath Path which the header belongs to.
@param num_item Number of samples.
@param label_class Class label of the sample.
@param x Pose label of X.
@param y Pose label of Y.
@param z Pose label of Z.
@param isrgb Option for choice of using RGB images or not.
*/
CV_WRAP static void writeBinaryfile(string filenameImg, const char* binaryPath, const char* headerPath, int num_item, int label_class, int x, int y, int z, int isrgb);
/** @brief Write binary files used for training in other open source project.
*/
};
/** @brief Caffe based 3D images descriptor.
A class to extract features from an image. The so obtained descriptors can be used for classification and pose estimation goals @cite wohlhart15.
*/
/************************************ Feature Extraction Class ************************************/
class CV_EXPORTS_W descriptorExtractor
{
private:
......@@ -142,32 +223,66 @@ The class create some sphere views of camera towards a 3D object meshed from .pl
bool net_set;
int net_ready;
cv::Mat mean_;
std::vector<string> device_info;
void setMean(const string& mean_file);
string deviceType;
int deviceId;
/** @brief Load the mean file in binaryproto format if it is needed.
*/
@param mean_file Path of mean file which stores the mean of training images, it is usually generated by Caffe tool.
*/
void setMean(const string& mean_file);
/** @brief Wrap the input layer of the network in separate cv::Mat objects(one per channel).
This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D.
The last preprocessing operation will write the separate channels directly to the input layer.
*/
void wrapInput(std::vector<cv::Mat>* input_channels);
/** @brief Wrap the input layer of the network in separate cv::Mat objects(one per channel). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer.
*/
void preprocess(const cv::Mat& img, std::vector<cv::Mat>* input_channels);
/** @brief Convert the input image to the input image format of the network.
*/
*/
void preprocess(const cv::Mat& img, std::vector<cv::Mat>* input_channels);
public:
descriptorExtractor(const string& device_type, int device_id);
/** @brief Set the device for feature extraction.
*/
std::vector<string> getDevice();
/** @brief Get device information for feature extraction.
*/
void setDevice(const string& device_type, const string& device_id = "");
/** @brief Set device information for feature extraction.
*/
void loadNet(const string& model_file, const string& trained_file, string mean_file = "");
/** @brief Initiate a classification structure.
*/
/** @brief Set the device for feature extraction, if the GPU is used, there should be a device_id.
@param device_type CPU or GPU.
@param device_id ID of GPU.
*/
descriptorExtractor(const string& device_type, int device_id = 0);
/** @brief Get device type information for feature extraction.
*/
string getDeviceType();
/** @brief Get device ID information for feature extraction.
*/
int getDeviceId();
/** @brief Set device type information for feature extraction.
Useful to change device without the need to reload the net.
@param device_type CPU or GPU.
*/
void setDeviceType(const string& device_type);
/** @brief Set device ID information for feature extraction.
Useful to change device without the need to reload the net. Only used for GPU.
@param device_id ID of GPU.
*/
void setDeviceId(const int& device_id);
/** @brief Initiate a classification structure. The network structure is read from model_file and
the network parameters are read from trained_file; you can decide whether to use a mean image or not.
@param model_file Path of the prototxt which defines the structure of the CNN.
@param trained_file Path of the caffemodel which contains all parameters of the CNN.
@param mean_file Path of the mean file (optional).
*/
void loadNet(const string& model_file, const string& trained_file, const string& mean_file = "");
/** @brief Extract features from a single image or from a vector of images.
If loadNet was not called before, this method invocation will fail.
@param inputimg Input images.
@param feature Output features.
@param feature_blob Layer which the feature is extracted from.
*/
void extract(InputArrayOfArrays inputimg, OutputArray feature, std::string feature_blob);
/** @brief Extract features from a set of images.
*/
};
//! @}
}
......
#ifndef __OPENCV_CNN_3DOBJ_CONFIG_HPP__
#define __OPENCV_CNN_3DOBJ_CONFIG_HPP__
// HAVE CAFFE
#define HAVE_CAFFE
#endif
......@@ -11,3 +11,7 @@ target_link_libraries(sphereview_test ${OpenCV_LIBS})
set(SOURCES_classifier classifyIMG_demo.cpp)
add_executable(classify_test ${SOURCES_classifier})
target_link_libraries(classify_test ${OpenCV_LIBS})
set(SOURCES_modelanalysis model_analysis_demo.cpp)
add_executable(model_test ${SOURCES_modelanalysis})
target_link_libraries(model_test ${OpenCV_LIBS})
......@@ -74,10 +74,10 @@ int main(int argc, char** argv)
{
const String keys = "{help | | this demo will convert a set of images in a particular path into leveldb database for feature extraction using Caffe. If there is little variance in the data, such as human faces, you can add a mean_file, otherwise it is not so useful}"
"{src_dir | ../data/images_all/ | Source directory of the images used as the gallery for feature extraction.}"
"{caffemodel | ../data/3d_triplet_iter_20000.caffemodel | caffe model for feature exrtaction.}"
"{network_forIMG | ../data/3d_triplet_testIMG.prototxt | Network definition file used for extracting feature from a single image and making a classification}"
"{caffemodel | ../../testdata/cv/3d_triplet_iter_30000.caffemodel | caffe model for feature extraction.}"
"{network_forIMG | ../../testdata/cv/3d_triplet_testIMG.prototxt | Network definition file used for extracting feature from a single image and making a classification}"
"{mean_file | no | The mean file generated by Caffe from all gallery images, this could be used for mean value subtraction from all images. If you want to use the mean file, you can set this as ../data/images_mean/triplet_mean.binaryproto.}"
"{target_img | ../data/images_all/3_13.png | Path of image waiting to be classified.}"
"{target_img | ../data/images_all/1_8.png | Path of image waiting to be classified.}"
"{feature_blob | feat | Name of layer which will represent as the feature, in this network, ip1 or feat is well.}"
"{num_candidate | 15 | Number of candidates in gallery as the prediction result.}"
"{device | CPU | device}"
......@@ -99,21 +99,22 @@ int main(int argc, char** argv)
string device = parser.get<string>("device");
int dev_id = parser.get<int>("dev_id");
cv::cnn_3dobj::descriptorExtractor descriptor(device, dev_id);
std::vector<string> device_info = descriptor.getter();
std::cout << "Using" << device_info[0] << std::endl;
cv::cnn_3dobj::descriptorExtractor descriptor(device);
std::cout << "Using " << descriptor.getDeviceType() << std::endl;
if (strcmp(mean_file.c_str(), "no") == 0)
descriptor.loadNet(network_forIMG, caffemodel);
else
descriptor.loadNet(network_forIMG, caffemodel, mean_file);
std::vector<string> name_gallery;
listDir(src_dir.c_str(), name_gallery, false);
for (unsigned int i = 0; i < name_gallery.size(); i++) {
for (unsigned int i = 0; i < name_gallery.size(); i++)
{
name_gallery[i] = src_dir + name_gallery[i];
}
std::vector<cv::Mat> img_gallery;
cv::Mat feature_reference;
for (unsigned int i = 0; i < name_gallery.size(); i++) {
for (unsigned int i = 0; i < name_gallery.size(); i++)
{
img_gallery.push_back(cv::imread(name_gallery[i], -1));
}
descriptor.extract(img_gallery, feature_reference, feature_blob);
......@@ -122,7 +123,7 @@ int main(int argc, char** argv)
cv::Mat img = cv::imread(target_img, -1);
// CHECK(!img.empty()) << "Unable to decode image " << target_img;
std::cout << std::endl << "---------- Featrue of gallery images ----------" << std::endl;
std::cout << std::endl << "---------- Features of gallery images ----------" << std::endl;
std::vector<std::pair<string, float> > prediction;
for (unsigned int i = 0; i < feature_reference.rows; i++)
std::cout << feature_reference.row(i) << endl;
......@@ -131,10 +132,11 @@ int main(int argc, char** argv)
cv::BFMatcher matcher(NORM_L2);
std::vector<std::vector<cv::DMatch> > matches;
matcher.knnMatch(feature_test, feature_reference, matches, num_candidate);
std::cout << std::endl << "---------- Featrue of target image: " << target_img << "----------" << endl << feature_test << std::endl;
std::cout << std::endl << "---------- Features of target image: " << target_img << "----------" << endl << feature_test << std::endl;
// Print the top N prediction.
std::cout << std::endl << "---------- Prediction result(Rank - File Name in Gallery - Distance) ----------" << std::endl;
for (size_t i = 0; i < matches[0].size(); ++i) {
for (size_t i = 0; i < matches[0].size(); ++i)
{
std::cout << i << " - " << std::fixed << std::setprecision(2) << name_gallery[matches[0][i].trainIdx] << " - \"" << matches[0][i].distance << "\"" << std::endl;
}
return 0;
......
/*
* Software License Agreement (BSD License)
*
* Copyright (c) 2009, Willow Garage, Inc.
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials provided
* with the distribution.
* * Neither the name of Willow Garage, Inc. nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
* CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
* ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*
*/
#define HAVE_CAFFE
#include <cstdio>   // printf
#include <cstring>  // strcmp
#include <iostream>
#include "opencv2/imgproc.hpp"
#include "opencv2/cnn_3dobj.hpp"
using namespace cv;
using namespace cv::cnn_3dobj;
int main(int argc, char** argv)
{
const String keys = "{help | | this demo analyses the trained model and prints whether it is suited to set different classes apart while also being discriminant on object pose.}"
"{caffemodel | ../../testdata/cv/3d_triplet_iter_30000.caffemodel | caffe model for feature extraction.}"
"{network_forIMG | ../../testdata/cv/3d_triplet_testIMG.prototxt | Network definition file used for extracting feature from a single image and making a classification}"
"{mean_file | no | The mean file generated by Caffe from all gallery images, this could be used for mean value subtraction from all images. If you want to use the mean file, you can set this as ../data/images_mean/triplet_mean.binaryproto.}"
"{target_img | ../data/images_all/1_8.png | Path of image in reference.}"
"{ref_img1 | ../data/images_all/1_23.png | Path of closest image.}"
"{ref_img2 | ../data/images_all/1_14.png | Path of a less close image in the same class as the reference image.}"
"{ref_img3 | ../data/images_all/3_8.png | Path of image with the same pose in another class.}"
"{feature_blob | feat | Name of layer which will represent as the feature, in this network, ip1 or feat is well.}"
"{device | CPU | device}"
"{dev_id | 0 | dev_id}";
cv::CommandLineParser parser(argc, argv, keys);
parser.about("Demo for object data classification and pose estimation");
if (parser.has("help"))
{
parser.printMessage();
return 0;
}
string caffemodel = parser.get<string>("caffemodel");
string network_forIMG = parser.get<string>("network_forIMG");
string mean_file = parser.get<string>("mean_file");
string target_img = parser.get<string>("target_img");
string ref_img1 = parser.get<string>("ref_img1");
string ref_img2 = parser.get<string>("ref_img2");
string ref_img3 = parser.get<string>("ref_img3");
string feature_blob = parser.get<string>("feature_blob");
string device = parser.get<string>("device");
int dev_id = parser.get<int>("dev_id");
std::vector<string> ref_img;
ref_img.push_back(ref_img1);
ref_img.push_back(ref_img2);
ref_img.push_back(ref_img3);
cv::cnn_3dobj::descriptorExtractor descriptor(device, dev_id);
if (strcmp(mean_file.c_str(), "no") == 0)
descriptor.loadNet(network_forIMG, caffemodel);
else
descriptor.loadNet(network_forIMG, caffemodel, mean_file);
cv::Mat img_base = cv::imread(target_img, -1);
// Stop early on unreadable input instead of running the network on empty data.
if (img_base.empty())
{
printf("could not read target image %s, make sure the path of images is set properly.\n", target_img.c_str());
return -1;
}
std::vector<cv::Mat> img;
for (unsigned int i = 0; i < ref_img.size(); i++)
{
img.push_back(cv::imread(ref_img[i], -1));
if (img[i].empty())
{
printf("could not read reference image %s, make sure the path of images is set properly.\n", ref_img[i].c_str());
return -1;
}
}
cv::Mat feature_test;
descriptor.extract(img_base, feature_test, feature_blob);
if (feature_test.empty())
{
printf("could not extract feature from test image which is read into cv::Mat.\n");
return -1;
}
cv::Mat feature_reference;
descriptor.extract(img, feature_reference, feature_blob);
if (feature_reference.empty())
{
printf("could not extract features from reference images which are already stored in vector<cv::Mat>.\n");
return -1;
}
std::vector<float> matches;
for (int i = 0; i < feature_reference.rows; i++)
{
cv::Mat distance = feature_test-feature_reference.row(i);
matches.push_back(cv::norm(distance));
}
bool pose_pass = false;
bool class_pass = false;
if (matches[0] < matches[1] && matches[0] < matches[2])
pose_pass = true;
if (matches[1] < matches[2])
class_pass = true;
if (!pose_pass)
{
printf("\n =========== Model %s ========== \nIs not trained properly: the reference with the closest pose could not be told apart from the other reference features.\n", caffemodel.c_str());
}
else if (!class_pass)
{
printf("\n =========== Model %s ========== \nIs not trained properly: the feature from the same class is not discriminated from the one of another class with a similar pose.\n", caffemodel.c_str());
}
else
{
printf("\n =========== Model %s ========== \nSets different classes apart and also discriminates object pose at the same time.\n", caffemodel.c_str());
}
return 0;
}
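Review note: the pass/fail logic at the end of this sample reduces to three L2 distances between the target descriptor and the reference descriptors. Below is a minimal standalone sketch of that check, assuming plain `std::vector<float>` descriptors instead of `cv::Mat` rows. The names `l2Distance`, `MatchResult` and `checkMatches` are hypothetical and not part of the module.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean (L2) distance between two descriptors of equal length.
float l2Distance(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return static_cast<float>(std::sqrt(sum));
}

// Hypothetical result type mirroring pose_pass / class_pass in the sample.
struct MatchResult
{
    bool pose_pass;
    bool class_pass;
};

// refs[0]: closest pose, same class; refs[1]: other pose, same class;
// refs[2]: same pose, another class (the order used by the sample).
MatchResult checkMatches(const std::vector<float>& target,
                         const std::vector<std::vector<float> >& refs)
{
    assert(refs.size() == 3);
    const float d0 = l2Distance(target, refs[0]);
    const float d1 = l2Distance(target, refs[1]);
    const float d2 = l2Distance(target, refs[2]);
    MatchResult result;
    result.pose_pass = d0 < d1 && d0 < d2;  // closest pose must win overall
    result.class_pass = d1 < d2;            // same class must beat other class
    return result;
}
```

Keeping the distance computation separate from the decision makes it easier to exercise the decision rule with synthetic descriptors, without loading a network.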
......@@ -42,9 +42,9 @@ using namespace std;
using namespace cv::cnn_3dobj;
int main(int argc, char *argv[])
{
const String keys = "{help | | demo :$ ./sphereview_test -ite_depth=2 -plymodel=../3Dmodel/ape.ply -imagedir=../data/images_ape/ -labeldir=../data/label_ape.txt -num_class=4 -label_class=0, then press 'q' to run the demo for images generation when you see the gray background and a coordinate.}"
const String keys = "{help | | demo :$ ./sphereview_test -ite_depth=2 -plymodel=../data/3Dmodel/ape.ply -imagedir=../data/images_all/ -labeldir=../data/label_all.txt -num_class=4 -label_class=0, then press 'q' to run the demo for images generation when you see the gray background and a coordinate.}"
"{ite_depth | 2 | Iteration of sphere generation.}"
"{plymodel | ../3Dmodel/ape.ply | path of the '.ply' file for image rendering. }"
"{plymodel | ../data/3Dmodel/ape.ply | path of the '.ply' file for image rendering. }"
"{imagedir | ../data/images_all/ | path of the generated images for one particular .ply model. }"
"{labeldir | ../data/label_all.txt | path of the label file for the generated images. }"
"{num_class | 4 | total number of classes of models}"
......@@ -84,8 +84,8 @@ int main(int argc, char *argv[])
Point3d cam_focal_point = ViewSphere.getCenter(objmesh.cloud);
float radius = ViewSphere.getRadius(objmesh.cloud, cam_focal_point);
Point3d cam_y_dir(0.0f,0.0f,1.0f);
const char* headerPath = "./header_for_";
const char* binaryPath = "./binary_";
const char* headerPath = "../data/header_for_";
const char* binaryPath = "../data/binary_";
ViewSphere.createHeader((int)campos.size(), 64, 64, headerPath);
for(int pose = 0; pose < (int)campos.size(); pose++){
char* temp = new char;
......
......@@ -8,19 +8,20 @@ namespace cnn_3dobj
{
descriptorExtractor::descriptorExtractor(const string& device_type, int device_id)
{
net_ready = 0;
if (strcmp(device_type.c_str(), "CPU") == 0 || strcmp(device_type.c_str(), "GPU") == 0)
{
if (strcmp(device_type.c_str(), "CPU") == 0)
{
caffe::Caffe::set_mode(caffe::Caffe::CPU);
device_info.push_back("CPU");
deviceType = "CPU";
std::cout << "Using CPU" << std::endl;
}
else
{
caffe::Caffe::set_mode(caffe::Caffe::GPU);
caffe::Caffe::SetDevice(device_id);
device_info.push_back("GPU");
deviceType = "GPU";
std::cout << "Using GPU" << std::endl;
std::cout << "Using Device_id=" << device_id << std::endl;
}
......@@ -33,44 +34,59 @@ namespace cnn_3dobj
}
};
std::vector<string> descriptorExtractor::getDevice()
string descriptorExtractor::getDeviceType()
{
string device_info_out;
device_info_out = deviceType;
return device_info_out;
};
int descriptorExtractor::getDeviceId()
{
std::vector<string> device_info_out;
device_info_out = device_info;
int device_info_out;
device_info_out = deviceId;
return device_info_out;
};
void descriptorExtractor::setDevice(const string& device_type, const string& device_id)
void descriptorExtractor::setDeviceType(const string& device_type)
{
if (strcmp(device_type.c_str(), "CPU") == 0 || strcmp(device_type.c_str(), "GPU") == 0)
{
if (strcmp(device_type.c_str(), "CPU") == 0)
{
caffe::Caffe::set_mode(caffe::Caffe::CPU);
device_info.push_back("CPU");
deviceType = "CPU";
std::cout << "Using CPU" << std::endl;
}
else
{
int dev_id = atoi(device_id.c_str());
caffe::Caffe::set_mode(caffe::Caffe::GPU);
caffe::Caffe::SetDevice(dev_id);
device_info.push_back("GPU");
deviceType = "GPU";
std::cout << "Using GPU" << std::endl;
std::cout << "Using Device_id=" << dev_id << std::endl;
}
net_set = true;
}
else
{
std::cout << "Error: Device name must be 'GPU' together with an device number or 'CPU'." << std::endl;
net_set = false;
std::cout << "Error: Device name must be 'GPU' or 'CPU'." << std::endl;
}
};
void descriptorExtractor::loadNet(const string& model_file, const string& trained_file, string mean_file)
void descriptorExtractor::setDeviceId(const int& device_id)
{
if (strcmp(deviceType.c_str(), "GPU") == 0)
{
caffe::Caffe::SetDevice(device_id);
deviceId = device_id;
std::cout << "Using GPU with Device ID = " << device_id << std::endl;
}
else
{
std::cout << "Error: Device ID only needs to be set when GPU is used." << std::endl;
}
};
void descriptorExtractor::loadNet(const string& model_file, const string& trained_file, const string& mean_file)
{
net_ready = 0;
if (net_set)
{
/* Load the network. */
......@@ -98,7 +114,7 @@ namespace cnn_3dobj
}
else
{
std::cout << "Error: Device must be set in advance using SetNet function" << std::endl;
std::cout << "Error: Net is not set properly in advance using the constructor." << std::endl;
}
};
......@@ -181,14 +197,14 @@ namespace cnn_3dobj
}
}
else
std::cout << "Network must be set properly using SetNet and LoadNet in advance.";
std::cout << "Device must be set properly using constructor and the net must be set in advance using loadNet.";
};
/* Wrap the input layer of the network in separate cv::Mat objects
* (one per channel). This way we save one memcpy operation and we
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
* (one per channel). This way we save one memcpy operation and we
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
void descriptorExtractor::wrapInput(std::vector<cv::Mat>* input_channels)
{
Blob<float>* input_layer = convnet->input_blobs()[0];
......@@ -233,12 +249,12 @@ namespace cnn_3dobj
else
sample_normalized = sample_float;
/* This operation will write the separate BGR planes directly to the
* input layer of the network because it is wrapped by the cv::Mat
* objects in input_channels. */
* input layer of the network because it is wrapped by the cv::Mat
* objects in input_channels. */
cv::split(sample_normalized, *input_channels);
if (reinterpret_cast<float*>(input_channels->at(0).data)
!= convnet->input_blobs()[0]->cpu_data())
std::cout << "Input channels are not wrapping the input layer of the network." << std::endl;
};
}
}
} /* namespace cnn_3dobj */
} /* namespace cv */
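Review note: this file splits the old `setDevice`/`getDevice` pair into separate type and ID accessors, where the ID is only accepted once the type is GPU. A minimal sketch of that state handling follows, assuming no Caffe dependency. The class `DeviceConfig` and its boolean return values are hypothetical and only illustrate the validation order; the module itself prints an error message instead.

```cpp
#include <string>

// Hypothetical stand-in for the device state kept by descriptorExtractor.
class DeviceConfig
{
public:
    DeviceConfig() : type_("CPU"), id_(0) {}

    // Accepts only "CPU" or "GPU"; returns false and keeps the old type otherwise.
    bool setDeviceType(const std::string& type)
    {
        if (type != "CPU" && type != "GPU")
            return false;
        type_ = type;
        return true;
    }

    // The ID is only meaningful for GPU, so it is rejected in CPU mode.
    bool setDeviceId(int id)
    {
        if (type_ != "GPU" || id < 0)
            return false;
        id_ = id;
        return true;
    }

    const std::string& deviceType() const { return type_; }
    int deviceId() const { return id_; }

private:
    std::string type_;
    int id_;
};
```

Returning a status instead of printing lets the caller decide how to report a rejected setting.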
......@@ -8,12 +8,8 @@ namespace cnn_3dobj
{
icoSphere::icoSphere(float radius_in, int depth_in)
{
X = 0.5f;
Z = 0.5f;
X *= (int)radius_in;
Z *= (int)radius_in;
diff = 0.00000005964;
float vdata[12][3] = { { -X, 0.0f, Z }, { X, 0.0f, Z },
{ -X, 0.0f, -Z }, { X, 0.0f, -Z }, { 0.0f, Z, X }, { 0.0f, Z, -X },
{ 0.0f, -Z, X }, { 0.0f, -Z, -X }, { Z, X, 0.0f }, { -Z, X, 0.0f },
......@@ -23,6 +19,9 @@ namespace cnn_3dobj
{ 5, 2, 3 }, { 2, 7, 3 }, { 7, 10, 3 }, { 7, 6, 10 }, { 7, 11, 6 },
{ 11, 0, 6 }, { 0, 1, 6 }, { 6, 1, 10 }, { 9, 0, 11 },
{ 9, 11, 2 }, { 9, 2, 5 }, { 7, 2, 11 } };
diff = 0.00000001;
X *= (int)radius_in;
Z *= (int)radius_in;
// Iterate over points
for (int i = 0; i < 20; ++i)
......@@ -31,20 +30,24 @@ namespace cnn_3dobj
vdata[tindices[i][2]], depth_in);
}
CameraPos_temp.push_back(CameraPos[0]);
for (int j = 1; j<int(CameraPos.size()); j++)
for (unsigned int j = 1; j < CameraPos.size(); ++j)
{
for (int k = 0; k<j; k++)
for (unsigned int k = 0; k < j; ++k)
{
if (CameraPos.at(k).x-CameraPos.at(j).x < diff && CameraPos.at(k).y-CameraPos.at(j).y < diff && CameraPos.at(k).z-CameraPos.at(j).z < diff)
float dist_x, dist_y, dist_z;
dist_x = (CameraPos.at(k).x-CameraPos.at(j).x) * (CameraPos.at(k).x-CameraPos.at(j).x);
dist_y = (CameraPos.at(k).y-CameraPos.at(j).y) * (CameraPos.at(k).y-CameraPos.at(j).y);
dist_z = (CameraPos.at(k).z-CameraPos.at(j).z) * (CameraPos.at(k).z-CameraPos.at(j).z);
if (dist_x < diff && dist_y < diff && dist_z < diff)
break;
if(k == j-1)
else if (k == j-1)
CameraPos_temp.push_back(CameraPos[j]);
}
}
CameraPos = CameraPos_temp;
cout << "View points in total: " << CameraPos.size() << endl;
cout << "The coordinate of view point: " << endl;
for(int i=0; i < (int)CameraPos.size(); i++)
for(unsigned int i = 0; i < CameraPos.size(); i++)
{
cout << CameraPos.at(i).x <<' '<< CameraPos.at(i).y << ' ' << CameraPos.at(i).z << endl;
}
......@@ -69,8 +72,6 @@ namespace cnn_3dobj
std::vector<float>* temp = new std::vector<float>;
for (int k = 0; k < 3; ++k)
{
vertexList.push_back(v[k]);
vertexNormalsList.push_back(v[k]);
temp->push_back(v[k]);
}
temp_Campos.x = temp->at(0);temp_Campos.y = temp->at(1);temp_Campos.z = temp->at(2);
......@@ -261,4 +262,5 @@ namespace cnn_3dobj
img_file.close();
lab_file.close();
};
}}
} /* namespace cnn_3dobj */
} /* namespace cv */
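Review note: the new duplicate check in `icoSphere` compares squared per-axis differences against `diff`, so a view point is dropped only when it is close to an earlier one on all three axes. A minimal standalone sketch of that filter follows, assuming a plain `Point3` struct in place of `cv::Point3d`. The names `Point3`, `nearlyEqual` and `removeDuplicates` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Point3d.
struct Point3
{
    float x, y, z;
};

// True when the squared difference on every axis is below eps2.
bool nearlyEqual(const Point3& a, const Point3& b, float eps2)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx < eps2 && dy * dy < eps2 && dz * dz < eps2;
}

// Keeps the first occurrence of each point, comparing against all earlier
// input points, as the loop over CameraPos does.
std::vector<Point3> removeDuplicates(const std::vector<Point3>& points, float eps2)
{
    std::vector<Point3> unique_points;
    for (std::size_t j = 0; j < points.size(); ++j)
    {
        bool duplicate = false;
        for (std::size_t k = 0; k < j; ++k)
        {
            if (nearlyEqual(points[k], points[j], eps2))
            {
                duplicate = true;
                break;
            }
        }
        if (!duplicate)
            unique_points.push_back(points[j]);
    }
    return unique_points;
}
```

Using an explicit `duplicate` flag avoids relying on the `k == j-1` test to detect that the inner loop finished without a match.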
/*
* Created on: Aug 14, 2015
* Author: yidawang
* Author: Yida Wang
*/
#include "test_precomp.hpp"
......@@ -26,33 +26,38 @@ CV_CNN_Feature_Test::CV_CNN_Feature_Test()
*/
void CV_CNN_Feature_Test::run(int)
{
string caffemodel = ts->get_data_path() + "cnn_3dobj/samples/data/3d_triplet_iter_20000.caffemodel";
string network_forIMG = ts->get_data_path() + "cnn_3dobj/samples/data/3d_triplet_testIMG.prototxt";
string mean_file = "no";
string target_img = ts->get_data_path() + "cnn_3dobj/samples/data/images_all/2_24.png";
string caffemodel = std::string(ts->get_data_path()) + "3d_triplet_iter_30000.caffemodel";
string network_forIMG = cvtest::TS::ptr()->get_data_path() + "3d_triplet_testIMG.prototxt";
string mean_file = "no";
std::vector<string> ref_img;
string target_img = std::string(ts->get_data_path()) + "1_8.png";
string feature_blob = "feat";
string device = "CPU";
int dev_id = 0;
cv::Mat img_base = cv::imread(target_img, -1);
if (img_base.empty())
{
ts->printf(cvtest::TS::LOG, "could not read target image %s, make sure the image paths are set properly.\n", target_img.c_str());
ts->set_failed_test_info(cvtest::TS::FAIL_MISSING_TEST_DATA);
return;
}
cv::cnn_3dobj::descriptorExtractor descriptor(device, dev_id);
if (strcmp(mean_file.c_str(), "no") == 0)
descriptor.loadNet(network_forIMG, caffemodel);
else
descriptor.loadNet(network_forIMG, caffemodel, mean_file);
cv::Mat img = cv::imread(target_img, -1);
if (img.empty()) {
ts->printf(cvtest::TS::LOG, "could not read image %s\n", target_img.c_str());
ts->set_failed_test_info(cvtest::TS::FAIL_MISSING_TEST_DATA);
return;
}
cv::Mat feature_test;
descriptor.extract(img, feature_test, feature_blob);
if (feature_test.empty()) {
ts->printf(cvtest::TS::LOG, "could not extract feature from image %s\n", target_img.c_str());
descriptor.extract(img_base, feature_test, feature_blob);
Mat feature_reference = (Mat_<float>(1,16) << -134.03548, -203.48265, -105.96752, 55.343075, -211.36378, 487.85968, -182.15063, 62.229042, 297.19876, 206.07578, 291.74951, -19.906454, -464.09152, 135.79895, 420.43616, 2.2887282);
printf("Reference feature is computed by the Caffe extract_features tool.\nTo generate values for different images, use extract_features\nwith an updated image list in the prototxt.\n");
float dist = norm(feature_test - feature_reference);
if (dist > 5) {
ts->printf(cvtest::TS::LOG, "Extracted feature is not the same as the one extracted from Caffe.\n");
ts->set_failed_test_info(cvtest::TS::FAIL_BAD_ACCURACY);
return;
}
}
TEST(VIDEO_BGSUBGMG, accuracy) { CV_CNN_Feature_Test test; test.safe_run(); }
TEST(CNN_FEATURE, accuracy) { CV_CNN_Feature_Test test; test.safe_run(); }
......@@ -81,6 +81,6 @@ layer {
bottom: "ip1"
top: "feat"
inner_product_param {
num_output: 4
num_output: 16
}
}