The text module now provides text detection and recognition based on deep CNNs. The text detector is a deep CNN that takes an image which may contain multiple words and outputs a list of Rects with bounding boxes and the probability that each contains text. The text recognizer provides a probability over a given vocabulary for each of these Rects.
Two backends are supported: 1) caffe 2) opencv-dnn
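
To illustrate how the two outputs described above might be consumed, here is a minimal sketch. It does not use the module's API: `Box`, `Detection`, `filterDetections` and `bestWord` are hypothetical names introduced only for this example, and the threshold is an arbitrary choice by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the module's outputs; these are NOT OpenCV types.
struct Box { int x, y, width, height; };

struct Detection {
    Box box;         // bounding box proposed by the detector
    float textProb;  // detector's probability that the box contains text
};

// Keep only the detections whose text probability reaches minProb.
std::vector<Detection> filterDetections(const std::vector<Detection>& dets, float minProb) {
    std::vector<Detection> kept;
    for (const Detection& d : dets)
        if (d.textProb >= minProb) kept.push_back(d);
    return kept;
}

// Return the index of the most probable vocabulary entry for one box.
// wordProbs is assumed to hold one probability per vocabulary entry.
std::size_t bestWord(const std::vector<float>& wordProbs) {
    assert(!wordProbs.empty());
    return static_cast<std::size_t>(
        std::max_element(wordProbs.begin(), wordProbs.end()) - wordProbs.begin());
}
```

With a threshold of 0.5, two detections with probabilities 0.92 and 0.15 leave a single box, and recognizer probabilities of 0.1, 0.7 and 0.2 select the second vocabulary entry.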
Installation of Caffe backend
-----------------------------
* Please note that a custom Caffe based on the SSD branch is required; the link to the custom Caffe is provided below.
The Caffe wrapping backend has the same requirements as Caffe itself.
* Caffe can be built against OpenCV; if the Caffe backend is enabled, a circular dependency arises.
The simplest solution is to build Caffe without support for OpenCV.
* Only the operating systems supported by Caffe are supported by the backend.
The scripts describing the module were developed on Ubuntu 16.04 and assume such a system.
They should be easy to adapt to other UNIX systems, including OSX.
* Takes an image and a mask (where each connected component corresponds to a
* segmented character) on input and returns recognized text in the
* output_text parameter. Optionally provides also the Rects for individual
* text elements found (e.g. words), and the list of those text elements with
* their confidence values.
* @param image Input image CV_8UC1 or CV_8UC3 with a single text line
* (or word).
Takes an image and a mask (where each connected component corresponds to a segmented character)
on input and returns recognized text in the output_text parameter. Optionally
provides also the Rects for individual text elements found (e.g. words), and the list of those
text elements with their confidence values.
* @param mask Input binary image CV_8UC1 same size as input image. Each
* connected component in mask corresponds to a segmented character in the
* input image.
@param image Input image CV_8UC1 or CV_8UC3 with a single text line (or word).
@param mask Input binary image CV_8UC1 same size as input image. Each connected component in mask corresponds to a segmented character in the input image.
* @param output_text Output text. Most likely character sequence found by
* the HMM decoder.
@param output_text Output text. Most likely character sequence found by the HMM decoder.
* @param component_rects If provided the method will output a list of Rects
* for the individual text elements found (e.g. words).
@param component_rects If provided the method will output a list of Rects for the individual
text elements found (e.g. words).
* @param component_texts If provided the method will output a list of text
* strings for the recognition of individual text elements found (e.g. words).
@param component_texts If provided the method will output a list of text strings for the
recognition of individual text elements found (e.g. words).
* @param component_confidences If provided the method will output a list of
* confidence values for the recognition of individual text elements found
* (e.g. words).
@param component_confidences If provided the method will output a list of confidence values
for the recognition of individual text elements found (e.g. words).
* @param component_level Only OCR_LEVEL_WORD is supported.
/** @brief Utility function to create a tailored language model transitions table from a given list of words (lexicon).
*
* @param vocabulary The language vocabulary (chars when ASCII English text).
*
* @param lexicon The list of words that are expected to be found in a particular image.
* @param transition_probabilities_table Output table with transition
* probabilities between character pairs. cols == rows == vocabulary.size().
* The function calculates frequency statistics of character pairs from the given
* lexicon and fills the output transition_probabilities_table with them. The
* transition_probabilities_table can be used as input in the
* OCRHMMDecoder::create() and OCRBeamSearchDecoder::create() methods.
*
* @param transition_probabilities_table Output table with transition probabilities between character pairs. cols == rows == vocabulary.size().
*
* The function calculates frequency statistics of character pairs from the given lexicon and fills the output transition_probabilities_table with them. The transition_probabilities_table can be used as input in the OCRHMMDecoder::create() and OCRBeamSearchDecoder::create() methods.
* @note
* - (C++) An alternative would be to load the default generic language
* transition table provided in the text module samples folder (created
* - (C++) An alternative would be to load the default generic language transition table provided in the text module samples folder (created from ispell 42869 english words list) :
@param beam_size Size of the beam in Beam Search algorithm.
*/
static Ptr<OCRBeamSearchDecoder> create(const Ptr<OCRBeamSearchDecoder::ClassifierCallback> classifier, // The character classifier with built in feature extractor
const std::string& vocabulary, // The language vocabulary (chars when ASCII English text)
// size() must be equal to the number of classes
...
...
@@ -598,29 +502,10 @@ public:
int mode = OCR_DECODER_VITERBI, // HMM Decoding algorithm (only Viterbi for the moment)
int beam_size = 500); // Size of the beam in Beam Search algorithm
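
The transition table that these `create()` overloads accept is described earlier as frequency statistics of character pairs taken from a lexicon. The sketch below shows one way such a table could be computed. It is an assumption-laden illustration, not the library's implementation: `bigramTransitions` and `Table` are hypothetical names, plain `std::vector` stands in for the matrix type, and normalizing each row to sum to 1 is a choice made here that may differ from what `createOCRHMMTransitionsTable` actually does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical table type: table[i][j] relates vocabulary[i] to vocabulary[j].
using Table = std::vector<std::vector<double>>;

// Count how often vocabulary[j] directly follows vocabulary[i] in the lexicon,
// then normalize every non-empty row so that it sums to 1.
// Pairs containing a character outside the vocabulary are skipped.
Table bigramTransitions(const std::string& vocabulary,
                        const std::vector<std::string>& lexicon) {
    const std::size_t n = vocabulary.size();
    Table table(n, std::vector<double>(n, 0.0));

    for (const std::string& word : lexicon) {
        for (std::size_t k = 0; k + 1 < word.size(); ++k) {
            const std::size_t from = vocabulary.find(word[k]);
            const std::size_t to = vocabulary.find(word[k + 1]);
            if (from == std::string::npos || to == std::string::npos) continue;
            table[from][to] += 1.0;
        }
    }

    for (std::vector<double>& row : table) {
        double total = 0.0;
        for (double v : row) total += v;
        if (total > 0.0)
            for (double& v : row) v /= total;  // rows with no counts stay all zero
    }
    return table;
}
```

The result has `vocabulary.size()` rows and columns, matching the `cols == rows == vocabulary.size()` constraint stated for the table.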
/** @brief Creates an instance of the OCRBeamSearchDecoder class. Initializes HMMDecoder from the specified path.
@overload
@param filename path to a character classifier file
@param vocabulary The language vocabulary (chars when ASCII English text). vocabulary.size()
must be equal to the number of classes of the classifier.
@param transition_probabilities_table Table with transition probabilities between character
pairs. cols == rows == vocabulary.size().
@param emission_probabilities_table Table with observation emission probabilities. cols ==
rows == vocabulary.size().
@param mode HMM Decoding algorithm (only Viterbi for the moment)
@param beam_size Size of the beam in Beam Search algorithm
*/
CV_WRAP static Ptr<OCRBeamSearchDecoder> create(const String& filename, // The character classifier file
const String& vocabulary, // The language vocabulary (chars when ASCII English text)
...
...
@@ -631,7 +516,6 @@ public:
// cols == rows == vocabulary.size()
int mode = OCR_DECODER_VITERBI, // HMM Decoding algorithm (only Viterbi for the moment)