Unsupervised Bootstrapping of Diphone-Like Templates for Connected Speech Recognition

01 January 1987

An unsupervised template-bootstrapping procedure has been implemented for a connected speech recognition system in which words are represented as sequences of diphone-like sub-word units. In this system, the diphone-like units are regarded as phonetic labels whose "probability" or "intensity" is continually measured in the input speech signal. Each word model is a sequence of units together with the maximum allowable separation between adjacent units. An utterance is recognized by "spotting" occurrences of the units specified by the word models in the unknown input signal; spotting, in turn, is done by measuring the dissimilarity between the input and the templates corresponding to the units.
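
The spotting scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the unit names, the threshold, and the rule that a "spot" is a local minimum of the dissimilarity track below that threshold are all assumptions made for illustration, and the per-frame dissimilarities are taken as already computed against each unit's template.

```python
from typing import Dict, List, Optional, Sequence


def spot_unit(dissimilarity: Sequence[float], threshold: float) -> List[int]:
    """Return frames where the dissimilarity track has a local minimum below threshold.

    The local-minimum rule and the threshold are illustrative assumptions.
    """
    hits = []
    for t, d in enumerate(dissimilarity):
        left = dissimilarity[t - 1] if t > 0 else float("inf")
        right = dissimilarity[t + 1] if t + 1 < len(dissimilarity) else float("inf")
        if d < threshold and d <= left and d <= right:
            hits.append(t)
    return hits


def match_word(
    units: Sequence[str],
    max_separations: Sequence[int],
    hits_by_unit: Dict[str, List[int]],
) -> Optional[List[int]]:
    """Find the first chain of spotted units that satisfies the word model.

    units[i + 1] must be spotted after units[i], at most
    max_separations[i] frames later. Returns the frame of each unit,
    or None if no chain exists.
    """

    def extend(i: int, prev: int) -> Optional[List[int]]:
        if i == len(units):
            return []
        for t in hits_by_unit.get(units[i], []):
            if i == 0 or 0 < t - prev <= max_separations[i - 1]:
                rest = extend(i + 1, t)
                if rest is not None:
                    return [t] + rest
        return None

    return extend(0, -1)


# Hypothetical dissimilarity tracks (one value per input frame) for three units.
tracks = {
    "s-i": [9, 8, 1, 7, 9, 9, 9, 9],
    "i-x": [9, 9, 9, 9, 2, 8, 9, 9],
    "x-#": [9, 9, 9, 9, 9, 9, 9, 1],
}
hits = {unit: spot_unit(track, threshold=3.0) for unit, track in tracks.items()}
word = ["s-i", "i-x", "x-#"]
print(match_word(word, [3, 3], hits))
print(match_word(word, [3, 2], hits))
```

With these made-up tracks, the first call finds a chain of unit occurrences within the allowed separations, while the second fails because the last two units lie three frames apart and only two are allowed.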