Automatic Generation of Phonetic Units for Continuous Speech Recognition
01 January 1989
This paper investigates several procedures for context-dependent phone modeling. The most important issue with context-dependent phone models is the amount of training data needed to obtain reliable model estimates. In our system, we first design a single model for each defined phone unit. Then, using a clustering algorithm, we increase the number of models per unit until a desired maximum is reached or the training data are no longer sufficient to give reliable model parameter estimates.
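The model-growing loop described above can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the distance measure, or the test for data sufficiency, so everything here is an assumption made for illustration: each training token of a unit is reduced to a fixed-length feature vector, clusters are split in two with a simple 2-means step, and a hypothetical `min_tokens` threshold stands in for "enough data for reliable estimates". The function names (`grow_models`, `two_means`) are likewise hypothetical.

```python
import random


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(cluster):
    """Total squared distance of a cluster's tokens to the cluster mean."""
    m = mean(cluster)
    return sum(sq_dist(v, m) for v in cluster)


def two_means(vectors, iters=20, seed=0):
    """Split vectors into two groups with a basic 2-means iteration."""
    rng = random.Random(seed)
    c0, c1 = rng.sample(vectors, 2)
    for _ in range(iters):
        left = [v for v in vectors if sq_dist(v, c0) <= sq_dist(v, c1)]
        right = [v for v in vectors if sq_dist(v, c0) > sq_dist(v, c1)]
        if not left or not right:
            break  # degenerate split; the caller rejects it by size
        c0, c1 = mean(left), mean(right)
    return left, right


def grow_models(tokens, max_models, min_tokens):
    """Start with one model per unit and split clusters until max_models
    is reached or no cluster can be split into two parts that each keep
    at least min_tokens tokens (an assumed proxy for data sufficiency).
    Returns one mean vector per resulting model."""
    splittable = [list(tokens)]
    frozen = []  # clusters whose split would leave too little data
    while splittable and len(splittable) + len(frozen) < max_models:
        # Try the cluster with the largest distortion first.
        splittable.sort(key=distortion)
        candidate = splittable.pop()
        if len(candidate) < 2:
            frozen.append(candidate)
            continue
        left, right = two_means(candidate)
        if len(left) < min_tokens or len(right) < min_tokens:
            frozen.append(candidate)
        else:
            splittable.extend([left, right])
    return [mean(c) for c in splittable + frozen]


if __name__ == "__main__":
    # Toy one-dimensional "tokens" forming two well-separated groups.
    tokens = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
    print(grow_models(tokens, max_models=4, min_tokens=3))
```

In this toy run the limit of four models is never reached: after the first split, each group holds only three tokens, so any further split would violate `min_tokens` and growth stops at two models. A real system would replace the mean vectors with trained phone models and use its own reliability criterion.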
A word learning procedure assigns weights to the different models of a unit to account for the context of the unit within individual words, thereby providing more discrimination among words. The model estimation and word learning algorithms were evaluated using a 450-sentence training database provided by one male talker and a set of test sentences spoken by the same talker, with words drawn from a 252-word vocabulary.