Pitch-Adaptive DPCM Coding of Speech With Two-Bit Quantization and Fixed Spectrum Prediction
01 March 1977
An important subclass of speech waveform encoders is characterized by the use of adaptive quantization and predictive (DPCM) encoding.¹ Time-invariant spectrum predictors are simple to implement and robust in the context of coarse quantization. The benefits of adaptive prediction are, however, well recognized and documented,²,³ and the greatest achievements in bit-rate reduction have in fact depended on adaptive short-term (spectrum) prediction as well as adaptive long-term (pitch) prediction, as seen in the paper by Atal and Schroeder.⁴

This paper is concerned with the less documented combination of adaptive pitch prediction and nonadaptive spectrum prediction. The study of this kind of prediction is motivated by the observation that speech waveforms abound in highly periodic segments, and by the conjecture that exploiting this periodicity may provide a prediction potential substantial enough to obviate the need for adaptive short-term (spectrum) prediction. The attractiveness of this approach will evidently depend on the complexity of pitch detection itself. The pitch detectors used in this paper are based on autocorrelation and the AMDF (average magnitude difference function) and are quite simple to implement; they are much simpler than the mean-squared-error-minimizing pitch detector described in Ref. 4. Moreover, as discussed in Section IV, the success of pitch-adaptive DPCM does not depend critically on accurate pitch detection in the sense in which the term is used in formal speech research.⁵

A thesis by Trottier⁶ considers the possibility of simplifying the Atal-Schroeder encoder.⁴ Among other things, this thesis discusses simple pitch-detection algorithms, the criticality of a well-designed adaptive quantizer, and the inefficiency of approaches that seek to simplify adaptive spectrum prediction by using very few predictor taps, say two.
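The combination described above (a fixed spectrum predictor cascaded with an adaptive pitch predictor, a simple AMDF pitch detector, and a two-bit adaptive quantizer in the DPCM loop) can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder studied in the paper: the first-order coefficient `A1`, block length, lag range, and step-size multipliers are hypothetical values; the pitch lag and gain are estimated per block on the input and treated as transmitted side information; the AMDF is used rather than autocorrelation; and the quantizer is a four-level, one-word-memory adaptive quantizer. The cascade predictor follows from 1 - P(z) = (1 - A1 z^-1)(1 - beta z^-k).

```python
import math

# All constants are hypothetical, illustrative choices, not values from the paper.
A1 = 0.85                    # fixed first-order spectrum predictor coefficient
BLOCK = 128                  # samples per pitch-parameter update
MIN_LAG, MAX_LAG = 20, 147   # pitch-lag search range (about 54-400 Hz at 8 kHz)
M_INNER, M_OUTER = 0.8, 1.6  # step-size multipliers (one-word-memory adaptation)
STEP0, STEP_MIN, STEP_MAX = 0.1, 1e-3, 10.0


def amdf_lag(x, start, n):
    """Lag minimizing the AMDF of x[start:start+n]; 0 if no lag fits in the history."""
    best_k, best_d = 0, math.inf
    for k in range(MIN_LAG, min(MAX_LAG, start) + 1):
        d = sum(abs(x[i] - x[i - k]) for i in range(start, start + n)) / n
        if d < best_d:  # strict '<' keeps the smallest lag among ties
            best_k, best_d = k, d
    return best_k


def pitch_gain(x, start, n, k):
    """Least-squares pitch gain for lag k over the block, clipped to [0, 1]."""
    num = sum(x[i] * x[i - k] for i in range(start, start + n))
    den = sum(x[i - k] * x[i - k] for i in range(start, start + n))
    return min(1.0, max(0.0, num / den)) if den > 0 else 0.0


def predict(y, i, k, beta):
    """Cascade prediction from past reconstructed samples y (zero before time 0)."""
    def past(j):
        return y[j] if j >= 0 else 0.0
    p = A1 * past(i - 1)                      # fixed spectrum (short-term) part
    if k:
        p += beta * (past(i - k) - A1 * past(i - k - 1))  # adaptive pitch part
    return p


def adapt(step, mag):
    """Expand the step after an outer level, shrink it after an inner level."""
    step *= M_OUTER if mag else M_INNER
    return min(max(step, STEP_MIN), STEP_MAX)


def encode(x):
    y, codes, params, step = [0.0] * len(x), [], [], STEP0
    for start in range(0, len(x), BLOCK):
        n = min(BLOCK, len(x) - start)
        k = amdf_lag(x, start, n)
        beta = pitch_gain(x, start, n, k) if k else 0.0
        params.append((k, beta))              # per-block side information
        for i in range(start, start + n):
            p = predict(y, i, k, beta)
            e = x[i] - p
            sign = 1 if e >= 0 else -1
            mag = 1 if abs(e) >= step else 0  # 2 bits: sign + inner/outer level
            y[i] = p + sign * (0.5 + mag) * step  # levels +-0.5 and +-1.5 steps
            codes.append((sign, mag))
            step = adapt(step, mag)
    return codes, params, y


def decode(codes, params, length):
    y, step = [0.0] * length, STEP0
    for b, (k, beta) in enumerate(params):
        for i in range(b * BLOCK, min((b + 1) * BLOCK, length)):
            sign, mag = codes[i]
            p = predict(y, i, k, beta)
            y[i] = p + sign * (0.5 + mag) * step
            step = adapt(step, mag)
    return y


if __name__ == "__main__":
    x = [((n % 50) - 25) / 25 for n in range(512)]  # sawtooth, 50-sample period
    codes, params, y = encode(x)
    assert decode(codes, params, len(x)) == y
    print([k for k, _ in params])  # [0, 50, 50, 50]
```

Because the decoder runs the same predictor and step-size recursion on the transmitted codes and per-block parameters, it reproduces the encoder's reconstruction exactly. The first block gets no pitch predictor here only because the sketch has no earlier history to search.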