Towards Knowledge-Based Features for HMM Based Large Vocabulary Automatic Speech Recognition
13 May 2002
This paper describes an attempt to design a knowledge-based large vocabulary speech recognition system. Our motivation is to replace features based on the short-term spectrum, such as Mel-frequency cepstral coefficients (MFCC), with features that explicitly represent some of the distinctive features of the speech signal. However, rather than attempting to compute acoustic correlates of these distinctive features directly, we have engineered an approach in which neural networks are trained to map short-term spectral features to the posterior probabilities of some distinctive features. These probabilities are then used as features in a large vocabulary tied-state HMM-based recognizer. Experimental results on the Wall Street Journal task show that such a system, while not outperforming an MFCC-based system, generates very different error patterns. By combining the results of a baseline MFCC system with those of several systems based on the proposed approach, we obtained word error rate reductions of 19% and 10% on the 5K and 20K tasks, respectively, over our best MFCC-based systems.
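The mapping from short-term spectral features to distinctive-feature posteriors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the single hidden layer, the independent sigmoid output per distinctive feature, the layer sizes, the three feature names, and the random, untrained weights. It shows only the shape of the data flow, in which each spectral frame becomes a vector of per-feature probabilities that could serve as the observation vector for an HMM.

```python
import math
import random


def sigmoid(x):
    """Logistic function, squashing a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an untrained placeholder layer)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an activation."""
    weights, biases = layer
    return [
        activation(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def feature_posteriors(frame, hidden_layer, output_layer):
    """Map one spectral frame to one probability per distinctive feature."""
    hidden = dense(frame, hidden_layer, math.tanh)
    return dense(hidden, output_layer, sigmoid)


# Hypothetical dimensions and feature inventory, chosen only for the example.
N_SPECTRAL = 13                              # e.g. 13 cepstral coefficients per frame
N_HIDDEN = 8
FEATURES = ["voiced", "nasal", "fricative"]  # illustrative names, not the paper's set

rng = random.Random(0)
hidden_layer = make_layer(N_SPECTRAL, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, len(FEATURES), rng)

# Stand-in for the spectral frames of a short utterance (random values here).
utterance = [[rng.gauss(0.0, 1.0) for _ in range(N_SPECTRAL)] for _ in range(5)]

# Each frame becomes a posterior vector, the observation passed on to the HMM.
observations = [
    feature_posteriors(frame, hidden_layer, output_layer) for frame in utterance
]
```

In a real system the network weights would be trained on labelled speech, and the resulting posterior vectors would replace the MFCC vectors as input to the tied-state HMM recognizer.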