2. The Tesseract configure script may fail to detect leptonica, in which case you may have to edit the configure script: comment out the if statements around this message and keep only the "then" branch.
3. You are encouraged to search the Net for some better pre-trained classifiers, as well as classifiers for other languages.
Word spotting CNN
=================
Intro
-----
A word spotting CNN is a CNN that takes an image assumed to contain a single word and provides a probability distribution over a given vocabulary.
Other backends are planned, but for the moment only the Caffe backend is supported.
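
As a rough illustration of what "a probability over a given vocabulary" means, the sketch below turns one raw score per vocabulary word into probabilities and picks the most likely word. It assumes the network ends in a softmax layer, which the text above does not state; `softmax` and `mostProbableIndex` are hypothetical helpers, not functions of this module or of Caffe.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the module): converts one raw score per
// vocabulary word into probabilities with a numerically stable softmax.
std::vector<double> softmax(const std::vector<double>& scores)
{
    assert(!scores.empty());
    // Subtracting the maximum keeps std::exp from overflowing.
    const double maxScore = *std::max_element(scores.begin(), scores.end());
    std::vector<double> probs(scores.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < scores.size(); ++i)
    {
        probs[i] = std::exp(scores[i] - maxScore);
        sum += probs[i];
    }
    for (double& p : probs)
        p /= sum;
    return probs;
}

// Hypothetical helper: index of the most probable vocabulary entry.
std::size_t mostProbableIndex(const std::vector<double>& probs)
{
    assert(!probs.empty());
    return static_cast<std::size_t>(
        std::distance(probs.begin(),
                      std::max_element(probs.begin(), probs.end())));
}
```

The returned index would then be used to look up the word in the vocabulary, which must list its entries in the same order as the network's outputs.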
Installation of Caffe backend
-----------------------------
The Caffe wrapping backend has the same requirements as Caffe itself.
* Caffe can be built against OpenCV; if the Caffe backend is enabled, a circular dependency arises.
The simplest solution is to build Caffe without OpenCV support.
* Only the operating systems supported by Caffe are supported by the backend.
The scripts describing the module were developed on Ubuntu 16.04 and assume such a system.
Adapting them to other UNIX systems, including OSX, should be easy.
Takes an image and a mask (where each connected component corresponds to a segmented character)
on input and returns recognized text in the output_text parameter. Optionally
provides also the Rects for individual text elements found (e.g. words), and the list of those
text elements with their confidence values.
* Takes an image and a mask (where each connected component corresponds to a
* segmented character) on input and returns recognized text in the
* output_text parameter. Optionally provides also the Rects for individual
* text elements found (e.g. words), and the list of those text elements with
* their confidence values.
@param image Input image CV_8UC1 or CV_8UC3 with a single text line (or word).
@param mask Input binary image CV_8UC1 same size as input image. Each connected component in mask corresponds to a segmented character in the input image.
* @param image Input image CV_8UC1 or CV_8UC3 with a single text line
* (or word).
@param output_text Output text. Most likely character sequence found by the HMM decoder.
* @param mask Input binary image CV_8UC1 same size as input image. Each
* connected component in mask corresponds to a segmented character in the
* input image.
@param component_rects If provided the method will output a list of Rects for the individual
text elements found (e.g. words).
* @param output_text Output text. Most likely character sequence found by
* the HMM decoder.
@param component_texts If provided the method will output a list of text strings for the
recognition of individual text elements found (e.g. words).
* @param component_rects If provided the method will output a list of Rects
* for the individual text elements found (e.g. words).
@param component_confidences If provided the method will output a list of confidence values
for the recognition of individual text elements found (e.g. words).
* @param component_texts If provided the method will output a list of text
* strings for the recognition of individual text elements found (e.g.
* words).
@param component_level Only OCR_LEVEL_WORD is supported.
* @param component_confidences If provided the method will output a list of
* confidence values for the recognition of individual text elements found
* (e.g. words).
* @param component_level Only OCR_LEVEL_WORD is supported.
decoder_mode mode = OCR_DECODER_VITERBI); // HMM Decoding algorithm (only Viterbi for the moment)
CV_WRAP static Ptr<OCRHMMDecoder> create(const Ptr<OCRHMMDecoder::ClassifierCallback> classifier, // The character classifier with built in feature extractor
const String& vocabulary, // The language vocabulary (chars when ascii english text)
// size() must be equal to the number of classes
InputArray transition_probabilities_table, // Table with transition probabilities between character pairs
// cols == rows == vocabulary.size()
InputArray emission_probabilities_table, // Table with observation emission probabilities
// cols == rows == vocabulary.size()
CV_WRAP static Ptr<OCRHMMDecoder> create(
const Ptr<OCRHMMDecoder::ClassifierCallback> classifier, // The character classifier with built in feature extractor
const String& vocabulary, // The language vocabulary (chars when ascii english text) size() must be equal to the number of classes
InputArray transition_probabilities_table, // Table with transition probabilities between character pairs cols == rows == vocabulary.size()
/** @brief Utility function to create a tailored language model transitions table from a given list of words (lexicon).
*
/** @brief Utility function to create a tailored language model transitions
* table from a given list of words (lexicon).
* @param vocabulary The language vocabulary (chars when ascii english text).
*
* @param lexicon The list of words that are expected to be found in a particular image.
*
* @param transition_probabilities_table Output table with transition probabilities between character pairs. cols == rows == vocabulary.size().
*
* The function calculates frequency statistics of character pairs from the given lexicon and fills the output transition_probabilities_table with them. The transition_probabilities_table can be used as input in the OCRHMMDecoder::create() and OCRBeamSearchDecoder::create() methods.
* @param transition_probabilities_table Output table with transition
* probabilities between character pairs. cols == rows == vocabulary.size().
* The function calculates frequency statistics of character pairs from the given
* lexicon and fills the output transition_probabilities_table with them. The
* transition_probabilities_table can be used as input in the
* OCRHMMDecoder::create() and OCRBeamSearchDecoder::create() methods.
* @note
* - (C++) An alternative would be to load the default generic language transition table provided in the text module samples folder (created from the ispell list of 42869 English words):
@param beam_size Size of the beam in Beam Search algorithm.
*/
static Ptr<OCRBeamSearchDecoder> create(const Ptr<OCRBeamSearchDecoder::ClassifierCallback> classifier, // The character classifier with built in feature extractor
const std::string& vocabulary, // The language vocabulary (chars when ascii english text)
// size() must be equal to the number of classes
...
...
@@ -441,6 +534,44 @@ public:
int mode = OCR_DECODER_VITERBI, // HMM Decoding algorithm (only Viterbi for the moment)
int beam_size = 500); // Size of the beam in Beam Search algorithm
/** @brief This method allows plugging a classifier derived from TextImageClassifier into
* OCRBeamSearchDecoder as a ClassifierCallback.
@param classifier A pointer to a TextImageClassifier descendant.
@param alphabet The language alphabet, one char per symbol. alphabet.size() must be equal to the number of classes
of the classifier. In future editions it should be replaced with a vector of strings.
@param transition_probabilities_table Table with transition probabilities between character
pairs. cols == rows == alphabet.size().
@param emission_probabilities_table Table with observation emission probabilities. cols ==
rows == alphabet.size().
@param windowWidth The width of the sliding window. The window height is the height of the image.
The windows might be resized by the classifier's preprocessor to fit the classifier's
input.
@param windowStep The step of the sliding window.
@param mode HMM Decoding algorithm (only Viterbi for the moment).
@param beam_size Size of the beam in the Beam Search algorithm.
*/
// CV_WRAP static Ptr<OCRBeamSearchDecoder> create(const Ptr<TextImageClassifier> classifier, // The character classifier with built in feature extractor
// String alphabet, // The language alphabet one char per symbol
// // size() must be equal to the number of classes
// InputArray transition_probabilities_table, // Table with transition probabilities between character pairs
// // cols == rows == alphabet.size()
// InputArray emission_probabilities_table, // Table with observation emission probabilities
// // cols == rows == alphabet.size()
// int windowWidth, // The width of the windows to which the sliding window will be iterated.
// // The height will be the height of the image. The windows might be resized to
// // fit the classifiers input by the classifiers preprocessor
// int windowStep = 1 , // The step for the sliding window
// int mode = OCR_DECODER_VITERBI, // HMM Decoding algorithm (only Viterbi for the moment)
// int beam_size = 500); // Size of the beam in Beam Search algorithm
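
The windowWidth and windowStep parameters documented above can be illustrated with a minimal sketch that lists the horizontal offsets a sliding window would visit. It assumes that only windows lying fully inside the image are used, which the documentation does not specify; `slidingWindowOffsets` is a hypothetical helper and not part of OCRBeamSearchDecoder.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the module): x offsets of every window of
// width windowWidth that fits entirely inside an image of width imageWidth,
// advancing by windowStep pixels. Each window spans the full image height.
std::vector<int> slidingWindowOffsets(int imageWidth, int windowWidth,
                                      int windowStep = 1)
{
    assert(imageWidth > 0 && windowWidth > 0 && windowStep > 0);
    std::vector<int> offsets;
    // Assumption: a window that would extend past the right edge is skipped.
    for (int x = 0; x + windowWidth <= imageWidth; x += windowStep)
        offsets.push_back(x);
    return offsets;
}
```

Under this assumption, an image narrower than windowWidth yields no windows at all, so a caller would need to handle that case separately.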