The Phonetic Relevance of Temporal Decomposition


Articulatory phonetics describes speech as a sequence of overlapping articulatory gestures, each of which may be associated with a characteristic ideal target spectrum. In normal speech, the idealized target gestures for each speech sound are often not fully attained, and the speech signal exhibits only transitions between such (implicit) targets. It has been suggested that the underlying speech sounds can be recovered from the acoustic signal only by reference to detailed knowledge of the gestures by which individual speech sounds are produced. We will show that it is possible to decompose the speech signal into a sequence of overlapping 'temporal transition functions' using techniques which make no assumption about the phonetic structure of the signal or the articulatory constraints involved in speech production. Previous work has shown that these techniques can produce a large reduction in the information rate needed to represent the spectral information in speech signals [B.S. Atal, Proc. ICASSP 83, 2.6, 81-84 (1983)]. It will be shown that these methods are able to derive speech components of low bandwidth which vary on a time scale closely related to that of traditional phonetic events. Implications for speech perception, and the application of such techniques both for speech coding and as a possible front-end for speech recognition, will be discussed.
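
To make the idea concrete, the sketch below illustrates (but does not reproduce) the kind of decomposition described above: a matrix of spectral parameter trajectories is approximated as a weighted sum of a small number of slowly varying temporal functions and associated target-like spectral vectors. The synthetic data, the number of components K, and the use of a plain truncated SVD are illustrative assumptions standing in for the full temporal decomposition procedure of Atal (1983).

```python
# Minimal sketch of the temporal decomposition idea: spectral parameter
# trajectories Y (frames x parameters) are approximated by K overlapping,
# low-bandwidth temporal functions weighting K target-like spectral vectors.
# All sizes and the synthetic data below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_frames, n_params, n_targets = 200, 12, 5   # assumed sizes, not from the paper

# Synthetic "targets": one ideal spectral parameter vector per underlying event.
targets = rng.normal(size=(n_targets, n_params))

# Synthetic overlapping transition functions: smooth bumps centred on evenly
# spaced frames, mimicking gestures that blend into one another over time.
centres = np.linspace(20, n_frames - 20, n_targets)
t = np.arange(n_frames)[:, None]
phi_true = np.exp(-0.5 * ((t - centres) / 15.0) ** 2)   # (frames x targets)

# Observed trajectories: targets blended by the transition functions, plus noise.
Y = phi_true @ targets + 0.05 * rng.normal(size=(n_frames, n_params))

# Decompose: a truncated SVD stands in for the full procedure, giving K
# temporal functions and K target-like spectral vectors.
K = n_targets
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
phi_hat = U[:, :K] * s[:K]        # estimated temporal functions (frames x K)
targets_hat = Vt[:K]              # estimated target spectra     (K x parameters)

Y_hat = phi_hat @ targets_hat
err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
print(f"relative reconstruction error with {K} components: {err:.3f}")
```

Because the recovered temporal functions vary slowly relative to the frame rate, they can be sampled and quantized coarsely, which is the source of the information-rate reduction noted above.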