This topic shows how to run the speech recognition sample. The sample demonstrates acoustic model inference based on Kaldi\*-trained neural networks and speech feature vectors, and Weighted Finite State Transducer (WFST) language model decoding based on Intel® Rockhopper Trail language models.
## How It Works
The workflow is as follows:
1. The application reads command-line parameters, loads a Kaldi-trained neural network into the Inference Engine plugin, and reads speech feature vectors from a Kaldi `.ark` file.
2. The application performs inference and passes the resulting acoustic score vectors to the decoding stage, where the Intel® Rockhopper Trail decoder translates them into a text transcription.
3. The application prints the recognized text to the screen.
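For reference, here is a minimal Python sketch of steps 1 and 2 using the Inference Engine Python API. It is illustrative only: `read_ark_utterances` and `decode_scores` are hypothetical placeholders (the sample's actual `.ark` reading and Intel® Rockhopper Trail decoding happen in native code), the file names are assumed, and the network is assumed to accept one feature frame per inference request:

```python
import numpy as np
from openvino.inference_engine import IECore

# Step 1: load the IR (converted Kaldi acoustic model) into the Inference Engine.
ie = IECore()
net = ie.read_network(model="lspeech_s5_ext.xml", weights="lspeech_s5_ext.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))

# read_ark_utterances is a hypothetical helper that yields
# (utterance_name, frames) pairs from a Kaldi .ark feature file.
for name, frames in read_ark_utterances("dev-clean.ark"):
    # Step 2: score each feature frame; the network outputs acoustic score vectors.
    scores = np.vstack([
        exec_net.infer({input_name: frame[np.newaxis, :]})[output_name]
        for frame in frames
    ])
    # decode_scores is a hypothetical stand-in for the decoding stage, where the
    # Intel® Rockhopper Trail decoder turns scores into a transcription (step 3).
    transcription = decode_scores(scores)
    print(name, transcription)
```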
### Acoustic and Language Model Setup
Pretrained models are available at the [Intel® Open Source Technology Center](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi) and through the [Intel® OpenVINO™ Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader). This sample uses models from the `librispeech/s5_ext` folder.
To train models from scratch, refer to the Kaldi training recipe shell script `lspeech_s5_ext_run.sh` and the corresponding documentation `lspeech_s5_ext.md`.
To convert a Kaldi acoustic model into the Intermediate Representation (IR) format accepted by this sample, use the Model Optimizer with its Kaldi front end.
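A representative invocation is sketched below. The `.nnet` and `.counts` file names and the output path are assumptions based on the `librispeech/s5_ext` model set; `--framework kaldi`, `--counts`, and `--remove_output_softmax` are standard Model Optimizer options for Kaldi models:

```sh
python3 mo.py --framework kaldi \
    --input_model lspeech_s5_ext.nnet \
    --counts lspeech_s5_ext.counts \
    --remove_output_softmax \
    --output_dir <path_to_ir_directory>
```

The `--remove_output_softmax` option strips the final SoftMax layer, since the decoder expects raw acoustic scores rather than normalized probabilities.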