Brian Strope
Dynamic auditory representations and statistical speech recognition
B. Strope and A. Alwan
The most common spectral estimation algorithm used for automatic speech
recognition incorporates rough approximations of basic aspects of
auditory modeling: frequency selectivity and magnitude compression.
Attempting to improve the robustness and overall performance of ASR,
researchers have proposed more sophisticated auditory models as the
spectral estimation front end, with generally modest success at best.
One common concern throughout these efforts is that the representation
derived from an auditory model may not be a good match for typical
statistical recognition algorithms. Recently we have derived and
implemented a dynamic auditory model that emphasizes changing local
spectral peaks, and has improved recognition robustness compared to
other common front ends. The present work uses specific case examples
to show how the perceptual representation leads to a softening of the
resulting statistical models. The work also proposes a simple mechanism
to adapt dynamic spectral features into a form more suitable for
segmentally static statistical characterization. The mechanism is
based on approximating the temporal derivative of the frequency position
of local spectral peaks. We discuss the impact of our auditory model
with this processing mechanism on robust recognition performance.
[Work supported by NIH Grant No. 1 R29 DC 02033-01A1].
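The mechanism described above can be illustrated with a minimal sketch: pick the dominant local spectral peak in each analysis frame and take the frame-to-frame difference of its frequency position. All names, the toy spectra, and the parameter values below are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: approximate the temporal derivative of the
# frequency position of a local spectral peak across analysis frames.
# Function names and parameters are illustrative, not from the paper.

def local_peak_bin(frame):
    """Index of the largest spectral magnitude in one frame."""
    return max(range(len(frame)), key=lambda i: frame[i])

def peak_frequency_derivative(frames, frame_rate_hz, bin_width_hz):
    """First difference of the peak frequency, in Hz per second."""
    peaks_hz = [local_peak_bin(f) * bin_width_hz for f in frames]
    return [(b - a) * frame_rate_hz for a, b in zip(peaks_hz, peaks_hz[1:])]

# Toy spectra: the peak climbs one bin per frame (a rising formant-like track).
frames = [
    [0.1, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.1],
    [0.1, 0.2, 0.3, 1.0],
]
deriv = peak_frequency_derivative(frames, frame_rate_hz=100.0, bin_width_hz=50.0)
# One 50 Hz bin per frame at 100 frames/s -> 5000 Hz/s for each interval.
```

Such a first-difference estimate turns a moving spectral peak into a roughly constant value over the segment, which is what makes the feature better suited to segmentally static statistical models.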