Speech Enhancement Based Upon Hidden Markov Modeling
01 January 1989
A maximum a posteriori approach is proposed for enhancing speech signals that have been degraded by statistically independent additive noise. The approach is based on statistical modeling of the clean speech signal and the noise process, using long training sequences from the two processes. Hidden Markov models (HMMs) with mixtures of Gaussian autoregressive (AR) output probability distributions are used to model the clean speech signal. The model for the noise process depends on its nature. For the Gaussian noise with a theoretically flat power spectral density considered here, a low-order Gaussian AR model is used. The parameter set of the HMM is estimated using the Baum or the EM (expectation-maximization) algorithm. The noisy speech is enhanced by reestimating the clean speech waveform using the EM algorithm. An improvement of approximately 4.0-6.0 dB in signal-to-noise ratio (SNR) is achieved at 10 dB input SNR.
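The SNR figures quoted above can be made concrete with a small sketch of how an input SNR and an SNR improvement are typically measured. This is an illustration only: the sinusoidal "clean" signal is a toy stand-in for speech, and the moving-average smoother is a hypothetical placeholder for an enhancer, not the HMM-based MAP method described in the abstract. All function names are assumptions introduced for this example.

```python
import math
import random


def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(clean, estimate):
    """SNR in dB, treating (estimate - clean) as the residual noise."""
    residual = [e - c for c, e in zip(clean, estimate)]
    return 10.0 * math.log10(power(clean) / power(residual))


def add_white_gaussian_noise(clean, target_snr_db, rng):
    """Add zero-mean white Gaussian noise scaled to the requested SNR."""
    raw = [rng.gauss(0.0, 1.0) for _ in clean]
    # Choose the gain so that power(clean) / power(gain * raw) = 10^(SNR/10).
    gain = math.sqrt(power(clean) / (power(raw) * 10.0 ** (target_snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, raw)]


def moving_average(x, width=5):
    """Placeholder enhancer (assumption): NOT the method of the abstract."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [math.sin(2.0 * math.pi * 0.01 * t) for t in range(2000)]  # toy signal
noisy = add_white_gaussian_noise(clean, 10.0, rng)
enhanced = moving_average(noisy)

input_snr = snr_db(clean, noisy)
output_snr = snr_db(clean, enhanced)
improvement = output_snr - input_snr

print(f"input SNR:  {input_snr:.2f} dB")
print(f"output SNR: {output_snr:.2f} dB")
print(f"SNR improvement: {improvement:.2f} dB")
```

The improvement is the difference between the output and input SNR, both computed against the same clean reference. The number printed for this toy signal says nothing about the performance of the proposed approach on real speech.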