Brian Strope
Dynamic auditory representations and statistical speech recognition
B. Strope and A. Alwan
The most common spectral estimation algorithm used for automatic speech
recognition incorporates rough approximations of basic aspects of
auditory modeling: frequency selectivity and magnitude compression.
Attempting to improve the robustness and overall performance of ASR,
researchers have proposed more sophisticated auditory models as the
spectral estimation front end, with generally modest success at best.
One common concern throughout these efforts is that the representation
derived from an auditory model may not be a good match for typical
statistical recognition algorithms. Recently we have derived and
implemented a dynamic auditory model that emphasizes changing local
spectral peaks, and has improved recognition robustness compared to
other common front ends. The present work uses specific case examples
to show how the perceptual representation leads to a softening of the
resulting statistical models. The work also proposes a simple mechanism
to adapt dynamic spectral features into a form more suitable for
segmentally static statistical characterization. The mechanism is
based on approximating the temporal derivative of the frequency position
of local spectral peaks. We discuss the impact of our auditory model
with this processing mechanism on robust recognition performance.
[Work supported by NIH Grant No. 1 R29 DC 02033-01A1].
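The mechanism described above can be illustrated with a minimal sketch: pick the dominant local spectral peak in each analysis frame and take the frame-to-frame difference of its frequency position. All names, the toy spectra, and the parameter values below are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: approximate the temporal derivative of the
# frequency position of a local spectral peak across analysis frames.
# Function names and parameters are illustrative, not from the paper.

def local_peak_bin(frame):
    """Index of the largest spectral magnitude in one frame."""
    return max(range(len(frame)), key=lambda i: frame[i])

def peak_frequency_derivative(frames, frame_rate_hz, bin_width_hz):
    """First difference of the peak frequency, in Hz per second."""
    peaks_hz = [local_peak_bin(f) * bin_width_hz for f in frames]
    return [(b - a) * frame_rate_hz for a, b in zip(peaks_hz, peaks_hz[1:])]

# Toy spectra: the peak climbs one bin per frame (a rising formant-like track).
frames = [
    [0.1, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.1],
    [0.1, 0.2, 0.3, 1.0],
]
deriv = peak_frequency_derivative(frames, frame_rate_hz=100.0, bin_width_hz=50.0)
# One 50 Hz bin per frame at 100 frames/s -> 5000 Hz/s for each interval.
```

Such a first-difference estimate turns a moving spectral peak into a roughly constant value over the segment, which is what makes the feature better suited to segmentally static statistical models.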