Speech/Silence Segmentation for Real-Time Coding via Rule-Based Adaptive Endpoint Detection
A new algorithmic technique is presented for efficiently implementing the endpoint decisions needed to separate and segment speech from noisy background environments. The algorithm uses a set of computationally efficient production rules to generate speech and noise metrics continuously from the input speech waveform. These production rules are based on statistical assumptions about the characteristics of the speech and noise waveforms and are evaluated via time-domain processing to achieve a zero-delay decision. An end-pointer compares the speech and noise metrics using an adaptive thresholding scheme with a hysteresis characteristic that controls the switching speed of the speech/silence decision.
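The hysteresis-based decision described above can be illustrated with a minimal sketch. The frame-energy metric, the slow noise-floor adaptation, the threshold ratios, and the attack/release frame counts below are all illustrative assumptions, not the paper's actual production rules or parameters; the sketch only shows the general shape of an adaptive endpoint detector with hysteresis.

```python
def endpoint_decision(frame_energies, attack=3, release=10,
                      ratio_on=3.0, ratio_off=1.5):
    """Classify each frame as speech (True) or silence (False).

    Hypothetical sketch: a running noise-floor estimate adapts during
    silence, and hysteresis uses a high threshold (ratio_on) to enter
    speech but a lower one (ratio_off) to leave it, so the decision
    does not chatter on borderline frames.
    """
    noise = None          # running noise-floor estimate
    in_speech = False     # current state of the speech/silence decision
    run = 0               # consecutive frames supporting a state change
    decisions = []
    for energy in frame_energies:
        if noise is None:
            noise = energy            # initialize from the first frame
        if not in_speech:
            # adapt the noise floor slowly while in silence
            noise = 0.95 * noise + 0.05 * energy
            run = run + 1 if energy > ratio_on * noise else 0
            if run >= attack:         # require several loud frames to switch on
                in_speech, run = True, 0
        else:
            run = run + 1 if energy < ratio_off * noise else 0
            if run >= release:        # require sustained quiet to switch off
                in_speech, run = False, 0
        decisions.append(in_speech)
    return decisions
```

The asymmetric attack/release counts are one simple way to realize the "switching speed" control the abstract mentions: a short attack keeps onset clipping low, while a longer release avoids chopping off trailing low-energy speech.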