Commit 73663871 authored by Wangyida

add error solving tips and information about model in README

parent e4e374e2
##CNN for 3D object recognition and pose estimation, including a complete sphere view of 3D objects from .ply files. When the window shows the coordinate, press 'q' to continue with image generation.
#Convolutional Neural Network for 3D object classification and pose estimation.
#Building Process:
###Prerequisites for this module: protobuf, leveldb, glog, gflags and caffe. Install libcaffe on a standard system path so that this OpenCV module can link against it when compiling, using: -D CMAKE_INSTALL_PREFIX=/usr/local. The Caffe build process could look like this:
#Module Description on cnn_3dobj:
The learning structure and the feature extraction concept are based on a Convolutional Neural Network; the main reference paper can be found at:
The authors provide Theano code at:
I implemented the training and feature extraction code mainly on top of the CAFFE project, which is compiled as libcaffe for the cnn_3dobj OpenCV module. The code concentrates on a joint triplet and pair-wise loss layer; the arrangement of the training data, which carries the basic training information, is also important.
My triplet version of caffe is released on GitHub; you can clone it with:
$ git clone
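To make the idea of a joint triplet and pair-wise loss concrete, here is a minimal sketch in plain C++. It uses a generic hinge-style triplet term on squared Euclidean distances plus a squared-distance pair term. This is an assumption for illustration and not necessarily the exact formulation of the loss layer in the triplet fork of Caffe. The function names, `margin` and `pair_weight` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Squared Euclidean distance between two descriptors of equal length.
float squaredDistance(const std::vector<float>& a, const std::vector<float>& b)
{
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Generic triplet hinge term (assumed form): zero once the negative is at
// least `margin` farther from the anchor than the positive.
float tripletLoss(const std::vector<float>& anchor,
                  const std::vector<float>& positive,
                  const std::vector<float>& negative,
                  float margin)
{
    return std::max(0.0f, margin + squaredDistance(anchor, positive)
                                 - squaredDistance(anchor, negative));
}

// Pair-wise term: pulls two descriptors of the same object and pose together.
float pairLoss(const std::vector<float>& a, const std::vector<float>& b)
{
    return squaredDistance(a, b);
}

// Joint objective as a weighted sum; `pair_weight` is a hypothetical knob.
float jointLoss(const std::vector<float>& anchor,
                const std::vector<float>& positive,
                const std::vector<float>& negative,
                float margin, float pair_weight)
{
    return tripletLoss(anchor, positive, negative, margin)
         + pair_weight * pairLoss(anchor, positive);
}
```

With a well separated negative the triplet term vanishes and only the pair term keeps pulling matching samples together, which is the usual motivation for combining the two.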
#Module Building Process:
###Prerequisites for this module: protobuf and caffe. Install libcaffe on a standard system path so that this OpenCV module can link against it both when compiling and when its functions are used. Pass -D CMAKE_INSTALL_PREFIX=/usr/local as a build option to cmake; the Caffe build process could look like this:
$ cd <caffe_source_directory>
$ mkdir build
$ cd build
$ cmake -D CMAKE_INSTALL_PREFIX=/usr/local ..
$ make all
$ make install
$ make all -j4
$ sudo make install
###After all these steps, the headers and libs of caffe will be installed under /usr/local/, and when you compile opencv with the opencv_contrib modules as below, protobuf, leveldb, glog, gflags and caffe will be recognized as already installed.
###After all these steps, the headers and libs of CAFFE will be installed under /usr/local/, and when you compile opencv with the opencv_contrib modules as below, protobuf and caffe will be recognized as already installed. Protobuf is
#Compiling OpenCV
......@@ -22,7 +32,14 @@ $ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_T
$ make -j4
$ sudo make install
##Tips on compiling problems:
###If you encounter 'no declaration' errors when you run 'make', it might be because you have installed an older version of the cnn_3dobj module and the header file changed in a newly released version of the code. The problem is that cmake and make cannot detect that the header should be updated, so the older header remains in /usr/local/include/opencv2 without being updated. This error can be solved by removing the installed older version of the cnn_3dobj module:
$ cd /
$ cd usr/local/include/opencv2/
$ sudo rm -rf cnn_3dobj.hpp
###Then redo the compiling steps above.
#Building samples
......@@ -34,41 +51,43 @@ $ make
###Image generation from different poses: 4 models are used, giving 276 images in all, with 69 images per class.
##Demo1: training data generation
###Image generation from different poses: by default 4 models are used, giving 276 images in all, with 69 images per class. If you want to use additional .ply models, change the class number parameter to the new number of classes and give each new model its own class label.
$ ./sphereview_test -plymodel=../3Dmodel/ape.ply -label_class=0
###press q to start
###press 'Q' to start 2D image generation
$ ./sphereview_test -plymodel=../3Dmodel/ant.ply -label_class=1
###press q to start
###press 'Q' to start
$ ./sphereview_test -plymodel=../3Dmodel/cow.ply -label_class=2
###press q to start
###press 'Q' to start
$ ./sphereview_test -plymodel=../3Dmodel/plane.ply -label_class=3
###press q to start. When all images have been created in the images_all folder as a collection of images for network training and feature extraction, proceed.
###After this demo, the binary files of images and labels will be stored as 'binary_image' and 'binary_label' in the current path. Copy them into the leveldb folder used for Caffe triplet training, for example: copy these 2 files to <caffe_source_directory>/data/linemod and rename them as 'binary_image_train', 'binary_image_test' and 'binary_label_train', 'binary_label_test'.
###We can start triplet training using Caffe
###press 'Q' to start
###When all images have been created in the images_all folder, as a collection of training images for the network and as a gallery of reference images for the classification part, proceed.
###After this demo, the binary files of images and labels will be stored as 'binary_image' and 'binary_label' in the current path. Copy them into the leveldb folder used for Caffe triplet training, for example: copy these 2 files to <caffe_source_directory>/data/linemod and rename them as 'binary_image_train', 'binary_image_test' and 'binary_label_train', 'binary_label_test'. Here I use the same data for training and testing; you can use different data for training and testing in the CAFFE training process. It is important to watch the loss on the testing data to check whether the training data suits your aim. The loss should keep decreasing and settle at a value much smaller than the initial loss.
###You can start triplet training using Caffe like this:
$ cd
$ cd <caffe_source_directory>
$ ./examples/triplet/
$ ./examples/triplet/
###After doing this, you will get .caffemodel files as the trained network. I have already provided the net definition .prototxt files and the trained .caffemodel in the <opencv_contrib>/modules/cnn_3dobj/samples/build folder, so you can use them without training in caffe. If you are not interested in feature analysis with the help of the binary files provided in Demo2, just skip to Demo3 for feature extraction or Demo4 for the classifier.
###After doing this, you will get .caffemodel files holding the trained parameters of the network. I have already provided the net definition .prototxt files and the pretrained .caffemodel in the <opencv_contrib>/modules/cnn_3dobj/samples/build/data folder, so you can use them without training in caffe.
##Demo2: feature extraction and classification
$ cd
$ cd <opencv_contrib>/modules/cnn_3dobj/samples/build
###Classifier: this extracts the feature of a single image and compares it with the features of gallery samples for prediction. Demo2 should be used in advance to generate a file name list for the prediction list. This demo uses a set of images in a given path for feature extraction; these features serve as the reference for prediction on the target image. Just run:
###Classifier: this extracts the feature of a single image and compares it with the features of gallery samples for prediction. This demo uses a set of images in a given path for feature extraction; these features serve as the reference for prediction on the target image. Just run:
$ ./classify_test
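The comparison step described above (one target feature against a gallery of reference features) can be sketched without Caffe or OpenCV as a nearest-neighbour ranking. This is a minimal illustration under assumptions: `GalleryItem`, `squaredL2` and `rankGallery` are hypothetical names, and squared Euclidean distance is assumed as the metric. The module's own `Classify` may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical gallery entry: a reference image name and its descriptor.
struct GalleryItem
{
    std::string name;
    std::vector<float> feature;
};

// Squared Euclidean distance; both vectors must have the same length.
static float squaredL2(const std::vector<float>& a, const std::vector<float>& b)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Return up to n (name, distance) pairs, closest gallery entry first.
std::vector<std::pair<std::string, float> > rankGallery(
    const std::vector<GalleryItem>& gallery,
    const std::vector<float>& target,
    std::size_t n)
{
    std::vector<std::pair<std::string, float> > ranked;
    for (std::size_t i = 0; i < gallery.size(); ++i)
    {
        if (gallery[i].feature.size() != target.size())
            continue;  // skip descriptors of a different length
        ranked.push_back(std::make_pair(gallery[i].name,
                                        squaredL2(gallery[i].feature, target)));
    }
    std::stable_sort(ranked.begin(), ranked.end(),
        [](const std::pair<std::string, float>& lhs,
           const std::pair<std::string, float>& rhs)
        { return lhs.second < rhs.second; });
    if (ranked.size() > n)
        ranked.resize(n);
    return ranked;
}
```

The result has the same shape as the (file name, distance) pairs that the sample prints, so the first entry is the predicted gallery match.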
......@@ -90,11 +90,9 @@ namespace cnn_3dobj
/** @brief Icosahedron based camera view generator.
The class creates some sphere views of the camera towards a 3D object meshed from .ply files @cite hinterstoisser2008panter .
class CV_EXPORTS_W IcoSphere
class CV_EXPORTS_W IcoSphere
float X;
float Z;
......@@ -133,11 +131,10 @@ class CV_EXPORTS_W IcoSphere
CV_WRAP static void writeBinaryfile(string filenameImg, const char* binaryPath, const char* headerPath, int num_item, int label_class, int x, int y, int z);
/** @brief Write binary files used for training in other open source project.
class CV_EXPORTS_W Feature
class CV_EXPORTS_W DescriptorExtractor
caffe::Net<float>* net_;
cv::Size input_geometry_;
......@@ -154,11 +151,14 @@ class CV_EXPORTS_W Feature
/** @brief Convert the input image to the input image format of the network.
void list_dir(const char *path,std::vector<string>& files,bool r);
/** @brief Get the file names from a root directory.
void NetSetter(const string& model_file, const string& trained_file, const string& mean_file, const string& cpu_only, int device_id);
bool SetNet(const string& cpu_only, int device_id);
/** @brief Initiate a classification structure.
bool LoadNet(bool netsetter, const string& model_file, const string& trained_file, const string& mean_file);
/** @brief Initiate a classification structure.
void GetLabellist(const std::vector<string>& name_gallery);
......@@ -167,15 +167,16 @@ class CV_EXPORTS_W Feature
std::vector<std::pair<string, float> > Classify(const cv::Mat& reference, const cv::Mat& target, int N);
/** @brief Make a classification.
void FeatureExtract(InputArray inputimg, OutputArray feature, bool mean_subtract, std::string feature_blob);
void Extract(bool net_ready, InputArray inputimg, OutputArray feature, bool mean_subtract, std::string feature_blob);
/** @brief Extract a single feature of one image.
std::vector<int> Argmax(const std::vector<float>& v, int N);
/** @brief Find the N largest numbers.
//! @}
//! @}
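The `Argmax` declaration in this header returns the indices of the N largest values. A minimal standalone sketch of that technique, using `std::partial_sort` so that only the first N positions are ordered, could look as follows. The name `topIndices` is hypothetical, and this is not claimed to be the module's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Return the indices of the n largest values in v, largest first.
// If n exceeds v.size(), all indices are returned; n <= 0 yields an empty result.
std::vector<int> topIndices(const std::vector<float>& v, int n)
{
    std::vector<std::pair<float, int> > pairs;
    for (std::size_t i = 0; i < v.size(); ++i)
        pairs.push_back(std::make_pair(v[i], static_cast<int>(i)));

    const std::size_t wanted = n < 0 ? 0 : static_cast<std::size_t>(n);
    const std::size_t count = std::min(wanted, pairs.size());

    // Only the first `count` elements need to be in sorted order.
    std::partial_sort(pairs.begin(), pairs.begin() + count, pairs.end(),
        [](const std::pair<float, int>& lhs, const std::pair<float, int>& rhs)
        { return lhs.first > rhs.first; });

    std::vector<int> result;
    for (std::size_t i = 0; i < count; ++i)
        result.push_back(pairs[i].second);
    return result;
}
```

Clamping `count` to the vector size avoids running `partial_sort` past the end when a caller asks for more candidates than there are gallery entries.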
#define HAVE_CAFFE
......@@ -67,11 +67,12 @@ int main(int argc, char** argv)
string device = parser.get<string>("device");
int dev_id = parser.get<int>("dev_id");
cv::cnn_3dobj::Classification classifier;
classifier.NetSetter(network_forIMG, caffemodel, mean_file, device, dev_id);
cv::cnn_3dobj::DescriptorExtractor descriptor;
bool set_succeed = descriptor.SetNet(device, dev_id);
descriptor.LoadNet(set_succeed, network_forIMG, caffemodel, mean_file);
std::vector<string> name_gallery;
classifier.list_dir(src_dir.c_str(), name_gallery, false);
descriptor.list_dir(src_dir.c_str(), name_gallery, false);
for (unsigned int i = 0; i < name_gallery.size(); i++) {
name_gallery[i] = src_dir + name_gallery[i];
......@@ -80,7 +81,7 @@ int main(int argc, char** argv)
for (unsigned int i = 0; i < name_gallery.size(); i++) {
img_gallery.push_back(cv::imread(name_gallery[i], -1));
classifier.FeatureExtract(img_gallery, feature_reference, false, feature_blob);
descriptor.FeatureExtract(img_gallery, feature_reference, false, feature_blob);
std::cout << std::endl << "---------- Prediction for "
<< target_img << " ----------" << std::endl;
......@@ -92,9 +93,9 @@ int main(int argc, char** argv)
for (unsigned int i = 0; i < feature_reference.rows; i++)
std::cout << feature_reference.row(i) << endl;
cv::Mat feature_test;
classifier.FeatureExtract(img, feature_test, false, feature_blob);
descriptor.FeatureExtract(img, feature_test, false, feature_blob);
std::cout << std::endl << "---------- Feature of target image: " << target_img << "----------" << endl << feature_test << std::endl;
prediction = classifier.Classify(feature_reference, feature_test, num_candidate);
prediction = descriptor.Classify(feature_reference, feature_test, num_candidate);
// Print the top N prediction.
std::cout << std::endl << "---------- Prediction result(Distance - File Name in Gallery) ----------" << std::endl;
for (size_t i = 0; i < prediction.size(); ++i) {
......@@ -6,8 +6,8 @@ namespace cv
namespace cnn_3dobj
void Feature::list_dir(const char *path,vector<string>& files,bool r)
void DescriptorExtractor::list_dir(const char *path,vector<string>& files,bool r)
DIR *pDir;
struct dirent *ent;
......@@ -25,7 +25,7 @@ namespace cnn_3dobj
sprintf(childpath, "%s/%s", path, ent->d_name);
......@@ -36,7 +36,9 @@ namespace cnn_3dobj
void Feature::NetSetter(const string& model_file, const string& trained_file, const string& mean_file, const string& cpu_only, int device_id)
bool DescriptorExtractor::SetNet(const string& cpu_only, int device_id)
if (strcmp(cpu_only.c_str(), "CPU") == 0 || strcmp(cpu_only.c_str(), "GPU") == 0)
if (strcmp(cpu_only.c_str(), "CPU") == 0)
......@@ -46,29 +48,54 @@ namespace cnn_3dobj
std::cout << "Using Device_id=" << device_id << std::endl;
return true;
std::cout << "Error: Device name must be 'GPU' together with a device number or 'CPU'." << std::endl;
return false;
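The device check that `SetNet` performs can be isolated into a small helper that is testable without Caffe. The sketch below is an assumption about one way to structure it: `Device`, `parseDevice` and `isValidDeviceRequest` are hypothetical names, the comparison is case sensitive like the `strcmp` calls above, and the actual Caffe mode and device calls are left out.

```cpp
#include <string>

// Hypothetical device kinds accepted by the module.
enum class Device { Cpu, Gpu, Invalid };

// Map the user-supplied string onto a device kind (case sensitive).
Device parseDevice(const std::string& name)
{
    if (name == "CPU") return Device::Cpu;
    if (name == "GPU") return Device::Gpu;
    return Device::Invalid;
}

// True when the request can be honoured: "CPU" with any id,
// or "GPU" together with a non-negative device id.
bool isValidDeviceRequest(const std::string& name, int device_id)
{
    switch (parseDevice(name))
    {
    case Device::Cpu: return true;            // device_id is ignored on CPU
    case Device::Gpu: return device_id >= 0;  // GPU needs a usable id
    default:          return false;
    }
}
```

Returning a `bool` here mirrors the new `SetNet` signature, whose result the sample passes on to `LoadNet`.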
bool DescriptorExtractor::LoadNet(bool netsetter, const string& model_file, const string& trained_file, const string& mean_file)
bool net_ready = false;
if (netsetter)
/* Load the network. */
net_ = new Net<float>(model_file, TEST);
CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";
if (net_->num_inputs() != 1)
std::cout << "Network should have exactly one input." << std::endl;
if (net_->num_outputs() != 1)
std::cout << "Network should have exactly one output." << std::endl;
Blob<float>* input_layer = net_->input_blobs()[0];
num_channels_ = input_layer->channels();
CHECK(num_channels_ == 3 || num_channels_ == 1)
<< "Input layer should have 1 or 3 channels.";
if (num_channels_ != 3 && num_channels_ != 1)
std::cout << "Input layer should have 1 or 3 channels." << std::endl;
input_geometry_ = cv::Size(input_layer->width(), input_layer->height());
/* Load the binaryproto mean file. */
net_ready = true;
return net_ready;
std::cout << "Error: Device must be set in advance using SetNet function" << std::endl;
return net_ready;
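In the new `LoadNet`, the former `CHECK` macros become printed messages, and as far as the visible hunk shows, `net_ready` is still set to true afterwards. One possible alternative, sketched here as an assumption rather than as the module's code, is to collect the failed checks and report the result so the caller can refuse to continue. `NetShape` and `validateNetShape` are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of the properties LoadNet inspects.
struct NetShape
{
    int num_inputs;
    int num_outputs;
    int num_channels;
};

// Fill `errors` with one message per failed check; return true if none failed.
bool validateNetShape(const NetShape& shape, std::vector<std::string>& errors)
{
    errors.clear();
    if (shape.num_inputs != 1)
        errors.push_back("Network should have exactly one input.");
    if (shape.num_outputs != 1)
        errors.push_back("Network should have exactly one output.");
    if (shape.num_channels != 3 && shape.num_channels != 1)
        errors.push_back("Input layer should have 1 or 3 channels.");
    return errors.empty();
}
```

A caller could print every entry of `errors` and set `net_ready` from the return value, which keeps the non-aborting behaviour of the new code while still signalling failure.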
void Feature::GetLabellist(const std::vector<string>& name_gallery)
void DescriptorExtractor::GetLabellist(const std::vector<string>& name_gallery)
for (unsigned int i = 0; i < name_gallery.size(); ++i)
/* Return the indices of the top N values of vector v. */
std::vector<int> Feature::Argmax(const std::vector<float>& v, int N)
std::vector<int> DescriptorExtractor::Argmax(const std::vector<float>& v, int N)
std::vector<std::pair<float, int> > pairs;
for (size_t i = 0; i < v.size(); ++i)
......@@ -80,8 +107,7 @@ namespace cnn_3dobj
return result;
//Return the top N predictions.
std::vector<std::pair<string, float> > Feature::Classify(const cv::Mat& reference, const cv::Mat& target, int N)
std::vector<std::pair<string, float> > DescriptorExtractor::Classify(const cv::Mat& reference, const cv::Mat& target, int N)
std::vector<float> output;
for (int i = 0; i < reference.rows; i++)
......@@ -102,15 +128,15 @@ namespace cnn_3dobj
/* Load the mean file in binaryproto format. */
void Feature::SetMean(const string& mean_file)
void DescriptorExtractor::SetMean(const string& mean_file)
BlobProto blob_proto;
ReadProtoFromBinaryFileOrDie(mean_file.c_str(), &blob_proto);
/* Convert from BlobProto to Blob<float> */
Blob<float> mean_blob;
CHECK_EQ(mean_blob.channels(), num_channels_)
<< "Number of channels of mean file doesn't match input layer.";
if (mean_blob.channels() != num_channels_)
std::cout << "Number of channels of mean file doesn't match input layer." << std::endl;
/* The format of the mean file is planar 32-bit float BGR or grayscale. */
std::vector<cv::Mat> channels;
float* data = mean_blob.mutable_cpu_data();
......@@ -130,7 +156,9 @@ namespace cnn_3dobj
mean_ = cv::Mat(input_geometry_, mean.type(), channel_mean);
void Feature::FeatureExtract(InputArray inputimg, OutputArray feature, bool mean_subtract, std::string featrue_blob)
void DescriptorExtractor::Extract(bool net_ready, InputArray inputimg, OutputArray feature, bool mean_subtract, std::string featrue_blob)
if (net_ready)
Blob<float>* input_layer = net_->input_blobs()[0];
input_layer->Reshape(1, num_channels_,
......@@ -176,6 +204,9 @@ namespace cnn_3dobj
std::cout << "Network must be set properly using SetNet and LoadNet in advance.";
/* Wrap the input layer of the network in separate cv::Mat objects
......@@ -183,7 +214,7 @@ namespace cnn_3dobj
* don't need to rely on cudaMemcpy2D. The last preprocessing
* operation will write the separate channels directly to the input
* layer. */
void Feature::WrapInputLayer(std::vector<cv::Mat>* input_channels)
void DescriptorExtractor::WrapInputLayer(std::vector<cv::Mat>* input_channels)
Blob<float>* input_layer = net_->input_blobs()[0];
int width = input_layer->width();
......@@ -197,7 +228,7 @@ namespace cnn_3dobj
void Feature::Preprocess(const cv::Mat& img,
void DescriptorExtractor::Preprocess(const cv::Mat& img,
std::vector<cv::Mat>* input_channels, bool mean_subtract)
/* Convert the input image to the input image format of the network. */
......@@ -231,9 +262,9 @@ std::vector<cv::Mat>* input_channels, bool mean_subtract)
* input layer of the network because it is wrapped by the cv::Mat
* objects in input_channels. */
cv::split(sample_normalized, *input_channels);
== net_->input_blobs()[0]->cpu_data())
<< "Input channels are not wrapping the input layer of the network.";
if (reinterpret_cast<float*>(input_channels->at(0).data)
!= net_->input_blobs()[0]->cpu_data())
std::cout << "Input channels are not wrapping the input layer of the network." << std::endl;
