Computer Synthesis of Speech by Concatenation of Formant-Coded Words

01 May 1971


The Bell System Technical Journal, May-June 1971

If computers could speak with sophisticated vocabularies, they could provide a variety of automatic information services. Machines could be interrogated from conventional Touch-Tone® telephones, and stored data could be accessed by voice. Naturally spoken speech messages can, of course, be prerecorded and stored. However, the digital storage required for sizeable amounts of natural speech is inordinate. Further, elements of natural speech in one context cannot be realistically assembled into a different message. With individual pieces of the signal waveform there is no practical way of making natural transitions from one element to the next. In certain messages of highly limited context, notably the Automatic Intercept System, individual words are adequately abutted by having more than one spoken version of each word. In general, however, sentence-length material cannot be satisfactorily produced in this manner.

For answer-back purposes, requiring sizeable vocabularies, an efficient means of storing and accessing speech information is required. This requirement implies low bit-rate representation of vocabulary elements and a flexible means for assembling the vocabulary elements into any message specified by the answer-back program. Toward this requirement, we have devised a synthesis method based upon formant-coded vocabulary elements. Formants are the resonances, or eigenfrequencies, of the vocal tract.
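To make the storage argument concrete, the following minimal sketch compares the bits needed to hold a vocabulary as a sampled waveform with the bits needed for a low bit-rate coded form. Every figure in it is an illustrative assumption and none is taken from this paper: the waveform rate is the familiar telephone PCM rate of 8000 samples per second at 8 bits per sample, while the coded rate of 600 bits per second, the vocabulary size, and the mean word duration are hypothetical placeholders.

```python
def storage_bits(num_words, mean_word_seconds, bits_per_second):
    """Total bits needed to store a vocabulary at a constant bit rate."""
    return num_words * mean_word_seconds * bits_per_second


# All values below are illustrative assumptions, not figures from the paper.
PCM_BITS_PER_SECOND = 8000 * 8      # assumed: 8 kHz sampling, 8 bits per sample
CODED_BITS_PER_SECOND = 600         # hypothetical low bit-rate formant coding
NUM_WORDS = 1000                    # hypothetical vocabulary size
MEAN_WORD_SECONDS = 0.5             # hypothetical mean word duration

pcm_bits = storage_bits(NUM_WORDS, MEAN_WORD_SECONDS, PCM_BITS_PER_SECOND)
coded_bits = storage_bits(NUM_WORDS, MEAN_WORD_SECONDS, CODED_BITS_PER_SECOND)

print(f"waveform storage: {pcm_bits:,.0f} bits")
print(f"coded storage:    {coded_bits:,.0f} bits")
print(f"ratio:            {pcm_bits / coded_bits:.1f}")
```

Under these assumed values the waveform form needs roughly a hundred times the storage of the coded form. Because both totals scale linearly with vocabulary size and word duration, the ratio depends only on the two assumed bit rates.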