Hierarchical Stochastic Feature Matching with Application to Hands-Free Speech Recognition
01 January 2001
In this paper, we improve hands-free speech recognition performance with a new "blind" feature compensation technique that uses only the hands-free test utterances and pre-trained HMMs (matched or mismatched) for compensation. Because the distortions in hands-free speech signals are complex and little data is available for compensation, we propose to compensate features with a hierarchical transformation. We investigate several methods for estimating the transformation parameters from the test utterances alone: the standard maximum likelihood (ML) estimation used in stochastic matching, and two new methods, i) sequential maximum a posteriori (MAP) estimation and ii) structural MAP (SMAP) estimation. Unlike the ML method, the two new methods exploit auxiliary information to alleviate the data-sparseness problem in transformation estimation. Hands-free speech recognition experiments on a car database demonstrate the effectiveness of the proposed techniques. Recognition performance is significantly improved, especially when serious mismatches exist between training and testing conditions; relative improvements in recognition accuracy of approximately 10% to 20% were observed in the experiments.
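To make the ML-versus-MAP distinction concrete, here is a minimal Python sketch. It is not the paper's method. It estimates a single global additive bias on the feature vectors (no hierarchy) and uses a diagonal-covariance Gaussian mixture as a stand-in for the pre-trained HMM state densities. It also models prior knowledge with a generic per-dimension Gaussian prior on the bias, rather than the paper's sequential or structural priors. The function names and the parameters `prior_mean` and `prior_var` are hypothetical. Each EM iteration computes component posteriors on the compensated features `y - b` and then solves for `b` in closed form. The MAP variant only adds the prior's contribution to the numerator and denominator.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def posteriors(frame, means, variances, weights):
    """Posterior probability of each mixture component for one frame."""
    logs = [math.log(w) + log_gauss(frame, m, v)
            for m, v, w in zip(means, variances, weights)]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_bias(frames, means, variances, weights, iters=5,
                  prior_mean=None, prior_var=None):
    """Estimate an additive bias b so that y_t - b fits the model (hypothetical helper).

    ML estimate when prior_mean is None; otherwise a MAP estimate under an
    assumed per-dimension Gaussian prior N(prior_mean, prior_var) on b.
    """
    dim = len(frames[0])
    bias = list(prior_mean) if prior_mean is not None else [0.0] * dim
    for _ in range(iters):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            # E-step: posteriors computed on the currently compensated frame.
            x = [yi - bi for yi, bi in zip(y, bias)]
            gamma = posteriors(x, means, variances, weights)
            # Accumulate the precision-weighted sufficient statistics for b.
            for g, m, v in zip(gamma, means, variances):
                for d in range(dim):
                    num[d] += g * (y[d] - m[d]) / v[d]
                    den[d] += g / v[d]
        if prior_mean is not None:
            # MAP: the Gaussian prior acts like extra pseudo-observations.
            for d in range(dim):
                num[d] += prior_mean[d] / prior_var[d]
                den[d] += 1.0 / prior_var[d]
        # M-step: closed-form solution of the (penalized) auxiliary function.
        bias = [n / dn for n, dn in zip(num, den)]
    return bias


if __name__ == "__main__":
    frames = [[2.0], [2.2], [1.8]]
    model = ([[0.0]], [[1.0]], [1.0])  # one unit-variance component at zero
    print("ML bias: ", estimate_bias(frames, *model))
    print("MAP bias:", estimate_bias(frames, *model,
                                     prior_mean=[0.0], prior_var=[1.0]))
```

With only three frames and a unit-variance prior centered at zero, the ML estimate is about 2.0 while the MAP estimate is pulled toward the prior mean (about 1.5). This illustrates why auxiliary prior information can help when compensation data are sparse.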