Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis

15 March 1999


In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable for such a database to contain varied prosodic and spectral characteristics, variable voice quality is not desirable. We present an automatic method for voice quality assessment of large speech databases for concatenative speech synthesis, and for their correction whenever necessary. The proposed method uses a Gaussian mixture model (GMM) to model the acoustic space of the database speaker and autoregressive filters for compensation. An objective measure of the effectiveness of the database correction, based on a likelihood function for the speaker's GMM, is presented as well. Both objective and subjective results show that the proposed method detects voice quality problems and successfully corrects them, with a 14.2% improvement in the log-likelihood function after compensation.
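The following is a minimal sketch of the assessment step only, assuming MFCCs as the acoustic features and scikit-learn's GaussianMixture as the speaker model; the paper's exact feature set, model size, and the autoregressive compensation filters are not reproduced here. Utterances whose average frame log-likelihood falls well below the database mean would be flagged as having atypical voice quality.

```python
# Sketch: GMM-based voice-quality assessment of a speech database.
# Assumptions (not from the paper): 13 MFCCs via librosa, a 32-component
# diagonal-covariance GMM via scikit-learn.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def extract_features(wav_path, sr=16000, n_mfcc=13):
    """Frame-level MFCCs for one utterance, shaped (frames, coefficients)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T


def fit_speaker_gmm(wav_paths, n_components=32):
    """Model the speaker's acoustic space with a GMM over all database frames."""
    frames = np.vstack([extract_features(p) for p in wav_paths])
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    return gmm.fit(frames)


def utterance_log_likelihood(gmm, wav_path):
    """Average per-frame log-likelihood; low values flag voice-quality outliers."""
    return gmm.score(extract_features(wav_path))
```

In this reading, the same log-likelihood serves as the objective measure of correction: recomputing it after spectral compensation (in the paper, via autoregressive filters) should yield higher values for the corrected utterances.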