Frequency Scaling of Speech Signals by Transform Techniques

01 November 1981


* On leave from the Electrical Engineering Department, Technion-Israel Institute of Technology, Haifa, Israel.

Frequency scaling of speech signals is a useful method for reducing the bandwidth requirements in analog and digital speech transmission systems [1-5]. In analog systems the frequency compressed signal is transmitted at reduced bandwidth. In digital systems the frequency compressed signal is waveform coded to provide reduced bit-rate transmission [4-7]. A general block diagram of such a digital system is shown in Fig. 1. In this figure, the schematic spectral representation of the input speech signal is shown to consist of a spectral envelope with pronounced resonances (formant peaks) and a fine structure caused by the pitch harmonics in voiced speech.

The spectral envelope of the compressed signal is a scaled version of the input spectral envelope. However, different frequency scaling techniques may result in different fine structures. Since we do not refer at this point to any specific technique, Fig. 1 does not show the fine structure of the compressed signal. This suggests that frequency scaling techniques can be classified according to the way the fine spectral structure is scaled. In particular, one can distinguish between narrow-band techniques, such as the phase vocoder [2] and time-domain harmonic scaling (TDHS) [4] techniques, which aim at separating and scaling the individual pitch harmonics, and wide-band techniques, such as the analytic signal rooting (ASR) technique [3] and the more recent constant Q transform (CQT) method [8, 9], which aim at directly scaling the spectral envelope.
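As a purely illustrative aid, the statement that the compressed spectral envelope is a scaled version of the input envelope can be sketched numerically. The following is a minimal sketch under stated assumptions, not any of the techniques cited above: the Lorentzian-shaped envelope, the formant frequencies (500, 1500, and 2500 Hz), the 100-Hz resonance width, the compression factor q = 2, and all function names are hypothetical choices made only for this example. The fine structure (pitch harmonics) is deliberately left out, since its treatment depends on the specific technique.

```python
import numpy as np


def envelope(f_hz, formants_hz=(500.0, 1500.0, 2500.0), width_hz=100.0):
    """Schematic spectral envelope E(f): a sum of Lorentzian-shaped resonances.

    The formant frequencies and the resonance width are illustrative
    assumptions, not values taken from the paper.
    """
    f = np.asarray(f_hz, dtype=float)[..., None]      # shape (..., 1)
    centers = np.asarray(formants_hz, dtype=float)    # shape (n_formants,)
    return np.sum(1.0 / (1.0 + ((f - centers) / width_hz) ** 2), axis=-1)


def compressed_envelope(f_hz, q=2.0):
    """Envelope after frequency compression by a factor q: E_c(f) = E(q * f)."""
    return envelope(q * np.asarray(f_hz, dtype=float))


def peak_frequencies(f_hz, values):
    """Frequencies at which a sampled curve has a strict local maximum."""
    inner = values[1:-1]
    is_peak = (inner > values[:-2]) & (inner > values[2:])
    return f_hz[1:-1][is_peak]


# Frequency grid in Hz (1-Hz resolution, an arbitrary choice for the sketch).
f = np.arange(0.0, 4000.0, 1.0)

original_peaks = peak_frequencies(f, envelope(f))
compressed_peaks = peak_frequencies(f, compressed_envelope(f, q=2.0))

print("formant peaks, input envelope (Hz):     ", original_peaks)
print("formant peaks, compressed envelope (Hz):", compressed_peaks)
```

In this sketch each formant peak of the compressed envelope lies at 1/q of its original frequency, so the band occupied by the envelope shrinks by the same factor. Whether the pitch harmonics are scaled along with it is exactly the point on which the narrow-band and wide-band techniques differ.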