The auditory system is also constantly adapting to a changing acoustic environment. Auditory perception is therefore dynamic: how we hear sounds now is largely a function of the sounds we just heard. The goal of this work is to derive a computational model of dynamic auditory perception that predicts aspects of this dynamic sensitivity.
The structure of the model is similar to that of a multi-channel compression hearing aid: an additive adaptation stage follows each output of a critical-band-like filter bank. The filter bank models auditory frequency selectivity by separating sounds into the appropriate bands. Following each filter, an adaptation stage adjusts as a function of the changing signal level in that band, providing an increasing offset for quiet sounds and a decreasing offset for louder sounds.
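The per-channel adaptation described above can be sketched as a leaky additive offset that tracks the gap between the band's current level and a quiet reference. This is only an illustration of the idea, not the model itself; the reference level and the two rate constants below are assumed values (the actual parameters come from the masking experiments described next).

```python
import numpy as np

def adaptation_stage(band_env_db, quiet_ref_db=40.0, attack=0.5, release=0.05):
    """Additive adaptation for one filter-bank channel (illustrative sketch).

    The offset grows toward a positive value for quiet sounds and toward a
    negative value for loud sounds.  Downward adaptation (attack) is faster
    than upward adaptation (release); all constants here are assumptions.
    """
    offset = 0.0
    out = np.empty_like(band_env_db)
    for i, level in enumerate(band_env_db):
        target = quiet_ref_db - level                   # positive when quiet, negative when loud
        rate = attack if target < offset else release   # fast attack, slow recovery
        offset += rate * (target - offset)              # leaky first-order update
        out[i] = level + offset
    return out
```

One adaptation stage of this form would run on the envelope of each critical band independently, so loud energy in one band does not suppress quiet energy in another.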
We use perceptual pure-tone forward-masking data to parameterize the model. Experiments that vary masker level and probe delay yield the recovery (upward adaptation) parameters, while those that vary masker duration yield the attack (downward adaptation) parameters.
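To make the fitting procedure concrete, forward-masking data are often summarized as a threshold shift that decays with probe delay; fitting a decay constant to thresholds measured at several delays then pins down the recovery rate. The exponential form, the time constant, and the growth factor below are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np

def forward_masked_threshold(masker_db, delay_ms, tau_ms=75.0,
                             quiet_db=10.0, growth=0.5):
    """Illustrative forward-masking curve (all parameter values assumed).

    The probe threshold starts elevated just after the masker and decays
    exponentially back to the threshold in quiet as the delay grows.
    """
    shift = growth * (masker_db - quiet_db) * np.exp(-delay_ms / tau_ms)
    return quiet_db + max(shift, 0.0)
```

In this framing, sweeping `delay_ms` at fixed masker level constrains the recovery (release) parameter, while an analogous family of curves over masker duration would constrain the attack parameter.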
After the adaptation stages, we apply a peak isolation algorithm based on raised-sine cepstral liftering to isolate and identify dynamic spectral peaks.
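A minimal sketch of raised-sine cepstral liftering for peak isolation: the log spectrum is taken to the cepstral domain, the zeroth coefficient (overall level) is discarded, and the remaining low-quefrency coefficients are weighted by a raised-sine window before transforming back. The lifter length `L=22` is a conventional assumption, not necessarily the value used in this model:

```python
import numpy as np

def peak_isolate(log_spec, L=22):
    """Peak isolation via raised-sine cepstral liftering (sketch; L assumed).

    Zeroing c0 removes the overall level, and the raised-sine weights
    1 + (L/2) * sin(pi * k / L) emphasize the quefrencies that carry
    spectral peak shape, suppressing slow tilt and fine structure.
    """
    M = 2 * (len(log_spec) - 1)
    c = np.fft.irfft(log_spec, n=M)          # real cepstrum (symmetric sequence)
    w = np.zeros(M)
    k = np.arange(1, L)
    w[k] = 1.0 + (L / 2.0) * np.sin(np.pi * k / L)
    w[M - k] = w[k]                          # mirror so the result stays real
    return np.fft.rfft(c * w).real           # liftered log spectrum
```

Applied frame by frame to the adapted filter-bank output, this leaves a representation dominated by the local spectral peaks rather than by overall level.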
We present Bark-frequency-scale spectrograms of the words "nine six one three" before and after the adaptive processing of the model, along with an overview of the performance improvements obtained using this model for robust speech recognition.