Matrix Quantization for Very Low Data Rate Voice Coding
Matrix quantization extends vector quantization for voice coding to take advantage of the phonological and phonotactic properties of speech. The speech signal is first modeled by LPC spectra at 10 ms intervals. An acoustically defined (demisyllable-like) automatic segmentation procedure then groups consecutive LPC spectra into LPC coefficient matrices. A distortion measure that combines non-linear time warping with a standard LPC distance metric is developed for comparing the similarity of two matrices. The matrix codebook is generated from a speech database using a simple minimax procedure and the matrix distortion measure. During speech encoding, a nearest-neighbor full search is used. Data rates under 100 bits/s can be achieved for the LPC spectra. Results of a single-speaker experiment based on the matrix quantization method, and subsequent research on closely related techniques, are discussed.
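The encoding pipeline summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work, and it rests on several assumptions: a squared Euclidean distance between frame vectors stands in for the LPC distance metric, a basic dynamic time warping recursion stands in for the non-linear time warping, and a farthest-point selection is used as one possible reading of the minimax codebook procedure. All function names, and the codebook size and segment rate in the usage lines, are hypothetical.

```python
import math


def frame_distance(x, y):
    # Assumption: squared Euclidean distance as a stand-in for an LPC distance.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def matrix_distortion(a, b):
    """Time-warped distortion between two segments (lists of frame vectors)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Normalize by total length so segments of different duration are comparable.
    return cost[n][m] / (n + m)


def encode(segment, codebook):
    """Nearest-neighbor full search: return (index, distortion) of the best codeword."""
    best_index, best_dist = -1, float("inf")
    for k, codeword in enumerate(codebook):
        d = matrix_distortion(segment, codeword)
        if d < best_dist:
            best_index, best_dist = k, d
    return best_index, best_dist


def build_codebook(training, size):
    """Assumed minimax-style heuristic: repeatedly add the worst-covered segment."""
    codebook = [training[0]]
    nearest = [matrix_distortion(t, training[0]) for t in training]
    while len(codebook) < size:
        worst = max(range(len(training)), key=lambda i: nearest[i])
        if nearest[worst] == 0.0:
            break  # every training segment is already represented exactly
        codebook.append(training[worst])
        nearest = [
            min(nearest[i], matrix_distortion(training[i], training[worst]))
            for i in range(len(training))
        ]
    return codebook


def bit_rate(codebook_size, segments_per_second):
    # Bits per codeword index times the number of segments sent per second.
    return math.ceil(math.log2(codebook_size)) * segments_per_second


if __name__ == "__main__":
    # Toy one-dimensional "frames"; real frames would be LPC-derived vectors.
    training = [[[0.0], [1.0]], [[0.1], [1.0]], [[5.0], [5.0]]]
    codebook = build_codebook(training, size=2)
    print(encode([[4.0], [5.0], [5.0]], codebook))
    # Hypothetical operating point: 1024 codewords, 8 segments per second.
    print(bit_rate(1024, 8))
```

The rate function makes the source of the low data rate explicit: only one codeword index is sent per segment, so the rate is the index length in bits multiplied by the segment rate. With the hypothetical values above, a 10-bit index at 8 segments per second gives 80 bits/s for the spectral information.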