Commit 73663871 authored by Wangyida

add error solving tips and information about model in README

parent e4e374e2
#Convolutional Neural Network for 3D object classification and pose estimation.
============================================
#Module Description on cnn_3dobj:
The structure construction and feature extraction concept of this module is based on convolutional neural networks; the main reference paper can be found at:
https://cvarlab.icg.tugraz.at/pubs/wohlhart_cvpr15.pdf
The authors provide a Theano implementation at:
https://cvarlab.icg.tugraz.at/projects/3d_object_detection/
I implemented the training and feature extraction code based mainly on the Caffe project, which is compiled as libcaffe for the cnn_3dobj OpenCV module. The code concentrates on the jointed triplet and pair-wise loss layer; the arrangement of the training data, which carries the basic training information, is also important.
My triplet version of Caffe is released on GitHub; you can clone it with:
```
$ git clone -b cnn_triplet https://github.com/Wangyida/caffe.git
```
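For intuition, the following is a minimal sketch of the triplet loss that such a layer computes; this is not the actual Caffe layer, and the helper names and margin value are illustrative assumptions:
```
#include <algorithm>
#include <vector>

// Squared Euclidean distance between two feature vectors.
static float squaredDistance(const std::vector<float>& a,
                             const std::vector<float>& b)
{
    float d = 0.f;
    for (size_t i = 0; i < a.size(); ++i)
    {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Triplet loss: pull the anchor towards a positive sample (similar object/pose)
// and push it away from a negative sample by at least a margin:
// loss = max(0, d(anchor, positive) - d(anchor, negative) + margin)
static float tripletLoss(const std::vector<float>& anchor,
                         const std::vector<float>& positive,
                         const std::vector<float>& negative,
                         float margin = 1.f /* assumed value */)
{
    return std::max(0.f, squaredDistance(anchor, positive)
                       - squaredDistance(anchor, negative) + margin);
}
```
The pair-wise term plays a complementary role, pulling features of the same object seen under similar poses towards each other.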
============================================
#Module Building Process:
###Prerequisites for this module: protobuf and caffe. Install libcaffe on the standard system path so that it can be linked by this OpenCV module at compile time and at run time; use -D CMAKE_INSTALL_PREFIX=/usr/local as a build option when you run cmake. The build process of Caffe on the system could look like this:
```
$ cd <caffe_source_directory>
$ mkdir build
$ cd build
$ cmake -D CMAKE_INSTALL_PREFIX=/usr/local ..
$ make all -j4
$ sudo make install
```
###After all these steps, the headers and libs of Caffe will be installed under the /usr/local/ path, and when you compile OpenCV with the opencv_contrib modules as below, protobuf and caffe will be recognized as already installed during the build.
#Compiling OpenCV
```
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_T...
$ make -j4
$ sudo make install
```
##Tips on compiling problems:
###If you encounter 'no declaration' errors when you run 'make', it may be because an older version of the cnn_3dobj module is installed while the header file changed in a newly released version of the code. cmake and make cannot detect that the header should be updated, so the old header remains in /usr/local/include/opencv2 without being refreshed. This error can be solved by removing the installed older version of the cnn_3dobj module:
```
$ cd /usr/local/include/opencv2/
$ sudo rm -rf cnn_3dobj.hpp
```
###Then redo the compiling steps above.
================================================
#Building samples
```
$ make
```
=============
#Demos
##Demo1: training data generation
###Image generation from different poses: by default 4 models are used, giving 276 images in all, with 69 images per class. If you want to use additional .ply models, change the class-number parameter to the new number of classes and give each added model a new class label.
```
$ ./sphereview_test -plymodel=../3Dmodel/ape.ply -label_class=0
```
###Press 'Q' to start 2D image generation.
```
$ ./sphereview_test -plymodel=../3Dmodel/ant.ply -label_class=1
```
###Press 'Q' to start.
```
$ ./sphereview_test -plymodel=../3Dmodel/cow.ply -label_class=2
```
###Press 'Q' to start.
```
$ ./sphereview_test -plymodel=../3Dmodel/plane.ply -label_class=3
```
###Press 'Q' to start.
###When all images have been created in the images_all folder as a collection of training images for network training and as a gallery of reference images for the classification part, proceed.
###After this demo, the binary files of images and labels will be stored as 'binary_image' and 'binary_label' in the current path. You should copy them into the leveldb folder used for Caffe triplet training, for example into <caffe_source_directory>/data/linemod, and rename them as 'binary_image_train', 'binary_image_test' and 'binary_label_train', 'binary_label_test'. Here I use the same data for training and testing; you can use different data for training and for testing the performance in the Caffe training process. It's important to observe the loss on the testing data to check whether the training data is suitable for your aim: the loss should keep decreasing and settle at a number much smaller than the initial loss.
###You can start triplet training using Caffe like this:
```
$ cd <caffe_source_directory>
$ ./examples/triplet/create_3d_triplet.sh
$ ./examples/triplet/train_3d_triplet.sh
```
###After doing this, you will get .caffemodel files holding the trained network parameters. I have already provided the net-definition .prototxt files and the pretrained .caffemodel in the <opencv_contrib>/modules/cnn_3dobj/samples/build/data folder, so you can use them directly without training in Caffe.
==============
##Demo2: feature extraction and classification
```
$ cd <opencv_contrib>/modules/cnn_3dobj/samples/build
```
###Classifier: this will extract the feature of a single image and compare it with the features of gallery samples for prediction. This demo uses a set of images in a given path for feature extraction; these features serve as the reference for prediction on the target image. Just run:
```
$ ./classify_test
```
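Internally, classify_test drives the module API roughly as sketched below; this is a simplified sketch, and the file names, device string and the "feat" blob name are placeholder assumptions (see the sample source excerpted further down):
```
#include <opencv2/cnn_3dobj.hpp>
#include <opencv2/highgui.hpp>

int main()
{
    cv::cnn_3dobj::DescriptorExtractor descriptor;
    // Select the computation device; "CPU" and a device id of 0 are assumed here.
    bool set_succeed = descriptor.SetNet("CPU", 0);
    // Placeholder file names: use the .prototxt and .caffemodel provided in samples/build/data.
    bool net_ready = descriptor.LoadNet(set_succeed, "network.prototxt",
                                        "network.caffemodel", "mean.binaryproto");
    cv::Mat img = cv::imread("target.png", -1);
    cv::Mat feature;
    // "feat" is an assumed feature-blob name; check the net definition you use.
    descriptor.Extract(net_ready, img, feature, false, "feat");
    return 0;
}
```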
namespace cv
{
namespace cnn_3dobj
{
//! @{
/** @brief Icosahedron-based camera view generator.
The class creates sphere views of a camera around a 3D object meshed from .ply files @cite hinterstoisser2008panter .
*/
class CV_EXPORTS_W IcoSphere
{
  private:
    float X;
    float Z;
  public:
    std::vector<float> vertexNormalsList;
    std::vector<float> vertexList;
    std::vector<cv::Point3d> CameraPos;
    std::vector<cv::Point3d> CameraPos_temp;
    float radius;
    float diff;
    IcoSphere(float radius_in, int depth_in);
    /** @brief Make all view points have the same distance from the focal point used by the camera view.
    */
    CV_WRAP void norm(float v[]);
    /** @brief Add a new view point between 2 points of the previous view points.
    */
    CV_WRAP void add(float v[]);
    /** @brief Generate new view points from all triangles.
    */
    CV_WRAP void subdivide(float v1[], float v2[], float v3[], int depth);
    /** @brief Suit the position of bytes in a 4-byte data structure for a particular system (endianness conversion).
    */
    CV_WRAP static uint32_t swap_endian(uint32_t val);
    /** @brief Get the center of the points on the surface of the .ply model.
    */
    CV_WRAP cv::Point3d getCenter(cv::Mat cloud);
    /** @brief Get the proper camera radius from the view point to the center of the model.
    */
    CV_WRAP float getRadius(cv::Mat cloud, cv::Point3d center);
    /** @brief Create the header in binary files collecting the image data and labels.
    */
    CV_WRAP static void createHeader(int num_item, int rows, int cols, const char* headerPath);
    /** @brief Write binary files used for training in other open source projects.
    */
    CV_WRAP static void writeBinaryfile(string filenameImg, const char* binaryPath, const char* headerPath, int num_item, int label_class, int x, int y, int z);
};
class CV_EXPORTS_W DescriptorExtractor
{
  private:
    caffe::Net<float>* net_;
    cv::Size input_geometry_;
    int num_channels_;
    cv::Mat mean_;
    std::vector<string> labels_;
    /** @brief Load the mean file in binaryproto format.
    */
    void SetMean(const string& mean_file);
    /** @brief Wrap the input layer of the network in separate cv::Mat objects (one per channel). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer.
    */
    void WrapInputLayer(std::vector<cv::Mat>* input_channels);
    /** @brief Convert the input image to the input image format of the network.
    */
    void Preprocess(const cv::Mat& img, std::vector<cv::Mat>* input_channels, bool mean_subtract);
  public:
    DescriptorExtractor();
    /** @brief Get the file names from a root directory.
    */
    void list_dir(const char *path, std::vector<string>& files, bool r);
    /** @brief Initiate a classification structure: select the computation device (CPU or GPU).
    */
    bool SetNet(const string& cpu_only, int device_id);
    /** @brief Initiate a classification structure: load the network definition, the trained weights and the mean file.
    */
    bool LoadNet(bool netsetter, const string& model_file, const string& trained_file, const string& mean_file);
    /** @brief Get the labels of the gallery images for displaying results of prediction.
    */
    void GetLabellist(const std::vector<string>& name_gallery);
    /** @brief Make a classification.
    */
    std::vector<std::pair<string, float> > Classify(const cv::Mat& reference, const cv::Mat& target, int N);
    /** @brief Extract a single feature from one image.
    */
    void Extract(bool net_ready, InputArray inputimg, OutputArray feature, bool mean_subtract, std::string feature_blob);
    /** @brief Find the N largest numbers.
    */
    std::vector<int> Argmax(const std::vector<float>& v, int N);
};
//! @}
}
}
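As a rough usage illustration of the IcoSphere view generator declared above: the radius and subdivision depth below are arbitrary values, and it is an assumption based on the demo that the constructor populates CameraPos with the generated view points:
```
#include <cstdio>
#include <opencv2/cnn_3dobj.hpp>

int main()
{
    // Unit-radius icosphere with 2 levels of subdivision.
    cv::cnn_3dobj::IcoSphere viewSphere(1.0f, 2);
    // Iterate over the generated camera view points.
    for (size_t i = 0; i < viewSphere.CameraPos.size(); ++i)
    {
        const cv::Point3d& p = viewSphere.CameraPos[i];
        std::printf("view %u: (%.3f, %.3f, %.3f)\n", (unsigned)i, p.x, p.y, p.z);
    }
    return 0;
}
```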
#ifndef __OPENCV_CNN_3DOBJ_CONFIG_HPP__
#define __OPENCV_CNN_3DOBJ_CONFIG_HPP__
// HAVE CAFFE
#define HAVE_CAFFE
#endif
// In main() of the classify_test sample:
string device = parser.get<string>("device");
int dev_id = parser.get<int>("dev_id");
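// Set up the descriptor extractor: select the computation device, then load the network.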
cv::cnn_3dobj::DescriptorExtractor descriptor;
bool set_succeed = descriptor.SetNet(device, dev_id);
bool net_ready = descriptor.LoadNet(set_succeed, network_forIMG, caffemodel, mean_file);
std::vector<string> name_gallery;
descriptor.list_dir(src_dir.c_str(), name_gallery, false);
descriptor.GetLabellist(name_gallery);
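// Prepend the gallery directory to each file name.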
for (unsigned int i = 0; i < name_gallery.size(); i++) {
name_gallery[i] = src_dir + name_gallery[i];
}
// ...
for (unsigned int i = 0; i < name_gallery.size(); i++) {
img_gallery.push_back(cv::imread(name_gallery[i], -1));
}
descriptor.Extract(net_ready, img_gallery, feature_reference, false, feature_blob);
std::cout << std::endl << "---------- Prediction for "
<< target_img << " ----------" << std::endl;
// ...
for (int i = 0; i < feature_reference.rows; i++)
std::cout << feature_reference.row(i) << endl;
cv::Mat feature_test;
descriptor.Extract(net_ready, img, feature_test, false, feature_blob);
std::cout << std::endl << "---------- Featrue of target image: " << target_img << "----------" << endl << feature_test << std::endl;
prediction = descriptor.Classify(feature_reference, feature_test, num_candidate);
// Print the top N prediction.
std::cout << std::endl << "---------- Prediction result(Distance - File Name in Gallery) ----------" << std::endl;
for (size_t i = 0; i < prediction.size(); ++i) {
// ...