Unsupervised Bootstrapping of Diphone-like Templates for Connected Speech Recognition
01 January 1987
This paper describes an unsupervised procedure for the construction of template sets for connected speech recognition. The procedure has been developed for use in a speech recognition system based on a "segment spotting" approach, where the segments are diphone-like units. The procedure makes use of both phonetic and acoustic knowledge: the former consists of a model of all the words in the task language in terms of the chosen units; the latter is implicitly represented by an initial set of "training" templates.
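
As an illustration of the phonetic knowledge mentioned above, the following minimal sketch shows one way a word could be modelled as a sequence of diphone-like units. It is not the paper's actual unit inventory or lexicon: the function `to_diphone_like_units`, the boundary symbol `#`, the naive adjacent-pair construction, and the toy transcriptions are all assumptions made for this example.

```python
from typing import Dict, List


def to_diphone_like_units(phonemes: List[str], boundary: str = "#") -> List[str]:
    """Turn a phoneme sequence into overlapping adjacent-pair units.

    Assumption: each unit spans the transition between two neighbouring
    phonemes, with a boundary symbol marking word edges.
    """
    padded = [boundary] + phonemes + [boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical toy lexicon; the transcriptions are illustrative only.
toy_lexicon: Dict[str, List[str]] = {
    "uno": ["u", "n", "o"],
    "due": ["d", "u", "e"],
}

# Word models: every word in the task language expressed in the chosen units.
word_models = {word: to_diphone_like_units(ph) for word, ph in toy_lexicon.items()}

# The set of distinct units is what a template set would need to cover.
unit_inventory = sorted({u for units in word_models.values() for u in units})

print(word_models["uno"])  # ['#-u', 'u-n', 'n-o', 'o-#']
```

In a sketch like this, the unit inventory derived from the word models indicates which templates must exist, while the acoustic side (the initial "training" templates) is left out entirely.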