Unified Acoustic Modeling for Continuous Speech Recognition

01 January 2000


Usually the speech and silence models are trained together, depending on the type of recognition task. For example, if the recognition task covers only connected digits, then the corresponding digit models are built using only the connected-digit training corpus. Similarly, for large-vocabulary recognition tasks, the subword or phoneme models are generated using only the subword training set. Further, alphabet models are trained separately on alphabet training data for letter recognition. In certain applications, however, the developer needs to perform mixed-mode operations such as alphabet followed by digits, digits followed by keywords, or letters preceded by keywords. There is therefore a need to design a robust speech recognizer for such applications. In that context, we propose several acoustic modeling techniques to improve unified model performance for applications that require mixed-mode operations.
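To make the mixed-mode idea concrete, the sketch below expresses one such task ("alphabet followed by digits, digits followed by keywords") as a small finite-state grammar over separately trained model inventories. The mode names, keyword list, and transition structure are illustrative assumptions, not part of the proposed system.

```python
# Hypothetical sketch of a mixed-mode task grammar.
# Assumption: digit, alphabet, and keyword acoustic models exist as
# separately trained inventories; here they are just label lists.

DIGITS = [str(d) for d in range(10)]
LETTERS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
KEYWORDS = ["call", "dial", "cancel"]  # illustrative keyword set

# Each mode maps to the model inventory active in that mode and the
# modes reachable next (encoding "alphabet followed by digits", etc.).
MIXED_MODE_GRAMMAR = {
    "alphabet": {"models": LETTERS,  "next": ["digit"]},
    "digit":    {"models": DIGITS,   "next": ["keyword"]},
    "keyword":  {"models": KEYWORDS, "next": []},
}

def valid_sequence(modes):
    """Check that a sequence of modes follows the grammar's transitions."""
    for cur, nxt in zip(modes, modes[1:]):
        if nxt not in MIXED_MODE_GRAMMAR[cur]["next"]:
            return False
    return True
```

A unified recognizer would search over the combined model set while constraining the decoding path with such a grammar, rather than switching between independently built task-specific recognizers.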