Nonlinear Dynamical Modeling of Speech Using Neural Networks
01 January 1988
The standard model of speech assumes that speech is generated by a stochastic or periodic excitation fed into a linear all-pole filter. Though supported by the physical structure of the vocal tract, this model ignores all possible nonlinearities in the speech production system. The relevance of such nonlinearities can be examined by directly measuring the dimensionality of an underlying attractor of the speech-generating dynamical system. By embedding the signal samples in high-dimensional Euclidean spaces, the speech signal is shown to lie on a relatively low-dimensional attractor. The dimension of this attractor, D, ranges from 2 to 5 for voiced speech sounds and from 4 to 9 for unvoiced speech. This suggests an alternative to the stochastic autoregressive models: modeling speech with a low-dimensional, time-dependent, nonlinear dynamical system in which every block of p+1 samples approximately satisfies a nonlinear functional relation.
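
The embedding and dimension measurement described above can be illustrated with a minimal sketch. The abstract does not say which dimension estimator was used, so the correlation-sum estimate below (in the style of Grassberger and Procaccia) is an assumption, as are the function names, the delay tau, the radii, and the toy sine-wave signal that stands in for real speech samples.

```python
import math

def delay_embed(x, m, tau=1):
    """Build delay vectors [x[i], x[i+tau], ..., x[i+(m-1)*tau]]."""
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this embedding")
    return [[x[i + k * tau] for k in range(m)] for i in range(n)]

def correlation_sum(vectors, r):
    """Fraction of distinct vector pairs closer than r in the max norm."""
    n = len(vectors)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
            if dist < r:
                close += 1
    return 2.0 * close / (n * (n - 1))

def dimension_estimate(x, m, r_small, r_large, tau=1):
    """Slope of log C(r) against log r between two radii (assumed estimator)."""
    vectors = delay_embed(x, m, tau)
    c_small = correlation_sum(vectors, r_small)
    c_large = correlation_sum(vectors, r_large)
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )

# Toy stand-in for speech: a sine wave, whose delay vectors trace a closed curve,
# so the estimated dimension is expected to be near 1.
signal = [math.sin(0.05 * n) for n in range(600)]
d = dimension_estimate(signal, m=3, r_small=0.05, r_large=0.2, tau=8)
print(f"estimated dimension: {d:.2f}")
```

In practice one would repeat the estimate for increasing embedding dimension m and look for the value at which the slope stops growing; the two-radius slope used here is a deliberately crude simplification of that procedure.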