We almost always listen to speech that is degraded by competing speech and non-speech signals. Fortunately, we are remarkably adept at isolating a specific speech signal from background noise and understanding what is said. This study contributes to a broad research program that aims to understand and model human perception of speech in noise.
Quantitative models of speech perception in noise can provide insight into our cognitive abilities and into the perceptual mechanisms of hearing-impaired listeners. People with hearing loss often have the greatest difficulty understanding speech in noisy environments. Quantitative models that explain why normal hearing copes so well in noise are essential for designing hearing aids that begin to restore the noise robustness lost with hearing impairment.
The study should also aid the development of noise-robust automatic speech recognition algorithms. The performance of current automatic speech recognizers deteriorates significantly at signal-to-noise ratios that are high enough for humans to hear and understand speech perfectly.
Work is progressing on two fronts: [1] developing fully parameterized models that predict human perception in noisy environments, and [2] incorporating these models, or aspects of them, into automatic speech recognition systems and speech coders to improve their performance in noise.
In a recent study, perceptual experiments were conducted to derive a fully parameterized model of dynamic auditory perception. The dynamic model predicts the saliency of different parts of changing sounds, offering one possible key to understanding how dynamic speech is perceived against static noise backgrounds. Initial evaluation of this model, incorporated into a simple speech recognition system, shows promise for improving the noise robustness of recognition.
In another set of perceptual experiments, we attempt to quantify how the masked thresholds of signals in noise depend on signal center frequency, duration, bandwidth, and signal type. These data emphasize that any perceptually based analysis of speech in noise must account for the effects of duration, bandwidth, and signal type.
Work supported by NIH-NIDCD 5 R29 DC 02033-02 and NSF.
Masking, Speech-in-Noise, Auditory Models.
