Use of Voicing Features in HMM-Based Speech Recognition
01 July 2002
We investigate a class of speech recognition features based on voicing parameters, which indicate whether the vocal cords are vibrating. We describe two such features, periodicity and jitter, which form a powerful class of voicing discriminators. Features describing the voicing characteristics of speech signals are integrated with an existing 38-dimensional feature vector consisting of the first and second order time derivatives of the frame energy together with the cepstral coefficients and their first and second derivatives. HMM-based connected digit and large vocabulary recognition experiments comparing the traditional and extended feature sets show that voicing features and spectral information are complementary, and that combining the two sources of information improves recognition performance. We further conclude that the difference in performance with and without voicing features is more significant under minimum string error (MSE) training than under maximum likelihood (ML) training.
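
The abstract does not give formulas for the two voicing features, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes periodicity is taken as the peak of the normalized autocorrelation over a range of candidate pitch lags, and jitter as the relative frame-to-frame change in the estimated pitch period. All function names, the lag range, and the frame layout are hypothetical. The last function shows the general idea of extending a baseline feature vector (38 dimensions in the paper) with the two voicing values.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Normalized correlation between a frame and itself shifted by `lag` samples."""
    a = frame[:-lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def periodicity_and_period(frame, min_lag, max_lag):
    """Return (peak normalized autocorrelation, lag of that peak).

    Assumption: the peak value serves as the periodicity feature and the
    peak lag as a crude pitch-period estimate (in samples).
    """
    best_val, best_lag = -1.0, min_lag
    for lag in range(min_lag, max_lag + 1):
        val = normalized_autocorrelation(frame, lag)
        if val > best_val:
            best_val, best_lag = val, lag
    return best_val, best_lag


def voicing_features(frames, min_lag, max_lag):
    """Compute [periodicity, jitter] for each frame.

    Assumption: jitter is the absolute change in pitch period from the
    previous frame, divided by the current period. The first frame has
    no predecessor, so its jitter is set to 0.0.
    """
    feats = []
    prev_lag = None
    for frame in frames:
        periodicity, lag = periodicity_and_period(frame, min_lag, max_lag)
        jitter = 0.0 if prev_lag is None else abs(lag - prev_lag) / lag
        feats.append([periodicity, jitter])
        prev_lag = lag
    return feats


def extend_features(base_vectors, voicing):
    """Append the voicing values to each baseline feature vector."""
    return [list(base) + list(v) for base, v in zip(base_vectors, voicing)]
```

With a 38-dimensional baseline vector per frame, `extend_features` yields 40-dimensional vectors. A strongly voiced frame should give a periodicity near 1 and a small jitter, while unvoiced or noisy frames should give lower periodicity and less stable period estimates. A practical front end would also need windowing and a more robust pitch tracker than this sketch provides.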