Amplitude Modulation Cues

Brian Strope

Pitch-Rate Amplitude Modulation Cues

Perceiving speech in an acoustically noisy environment requires intelligent use of redundant, multi-dimensional cues spread over wide-ranging time scales. Speech perception research has often focused on spectral cues (from approximately 400 to 8000 Hz) that are available through a critical-band filtering mechanism. This approach is currently used in most automatic speech recognition (ASR) systems, together with a hierarchy of non-stationary stochastic models operating at progressively slower rates: the speech frame, phoneme, word, phrase, and even sentence.

Most notably for the current research, ASR systems typically ignore pitch-rate (80-300 Hz) information by integrating short-term spectral estimates over multiple pitch periods. Yet the pitch-rate periodicities of voiced speech are likely to play a key role in identifying spectro-temporal regions of voiced speech against uncorrelated (or differently correlated) background noise.
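
The effect of integrating over multiple pitch periods can be illustrated with a small sketch. It is not taken from the work described here: the 8 kHz sample rate, the 100 Hz pitch rate, the 5 ms and 25 ms window lengths, and the helper names `moving_average` and `modulation_depth` are all assumptions chosen for illustration, and a time-domain moving average of a single envelope stands in for frame-level spectral integration.

```python
import math

def moving_average(signal, window_len):
    """Causal moving average; returns only the fully overlapped outputs."""
    running = sum(signal[:window_len])
    out = [running / window_len]
    for n in range(window_len, len(signal)):
        running += signal[n] - signal[n - window_len]
        out.append(running / window_len)
    return out

def modulation_depth(envelope):
    """(max - min) / (max + min) of a non-negative envelope."""
    hi, lo = max(envelope), min(envelope)
    return (hi - lo) / (hi + lo)

FS = 8000  # sample rate in Hz (assumed)
F0 = 100   # pitch rate in Hz (assumed; inside the 80-300 Hz range)

# Fully modulated envelope, as a crude stand-in for one channel of voiced speech.
envelope = [1.0 + math.cos(2 * math.pi * F0 * n / FS) for n in range(FS // 2)]

depth_raw = modulation_depth(envelope)
depth_short = modulation_depth(moving_average(envelope, FS * 5 // 1000))   # 5 ms window
depth_frame = modulation_depth(moving_average(envelope, FS * 25 // 1000))  # 25 ms window

print(f"raw: {depth_raw:.2f}  5 ms: {depth_short:.2f}  25 ms: {depth_frame:.2f}")
```

Under these assumptions the 25 ms window spans two and a half pitch periods, so most of the pitch-rate modulation is averaged away, while the 5 ms window retains much of it.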

Simulations show that a running autocorrelogram can predict both temporal modulation transfer functions (TMTFs) and the perceptual distinction between high-pass filtered [s] and [z] in noise. Other simulations, which use modulation filtering and envelope statistics, predict the TMTFs but not the [s]/[z] distinction.
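
As a rough illustration of the idea, and not the model used in these simulations, the sketch below computes a single-channel running autocorrelation of an envelope and looks for a peak at pitch-rate lags. The stimuli are assumptions: Gaussian noise amplitude-modulated at 125 Hz stands in for [z], and unmodulated noise stands in for [s]. The function names, the 8 kHz sample rate, the 2 ms envelope smoothing, and the 50 ms frames are likewise hypothetical choices. A full autocorrelogram would repeat this for each channel of a critical-band filterbank.

```python
import math
import random

FS = 8000  # sample rate in Hz (assumed)

def make_noise(n_samples, rng, mod_hz=None):
    """Gaussian noise, optionally amplitude-modulated at mod_hz (a crude stand-in for voicing)."""
    out = []
    for n in range(n_samples):
        gain = 1.0 if mod_hz is None else 1.0 + math.cos(2 * math.pi * mod_hz * n / FS)
        out.append(gain * rng.gauss(0.0, 1.0))
    return out

def envelope(signal, smooth_len=16):
    """Square-law rectification followed by a short moving average (2 ms at 8 kHz)."""
    power = [x * x for x in signal]
    running = sum(power[:smooth_len])
    env = [running / smooth_len]
    for n in range(smooth_len, len(power)):
        running += power[n] - power[n - smooth_len]
        env.append(running / smooth_len)
    return env

def running_autocorrelogram(env, max_lag, frame_len, hop):
    """One normalized autocorrelation function (lags 0..max_lag) per frame."""
    frames = []
    start = 0
    while start + max_lag + frame_len <= len(env):
        seg = env[start:start + max_lag + frame_len]
        mean = sum(seg) / len(seg)
        d = [v - mean for v in seg]
        energy = sum(d[n] * d[n] for n in range(max_lag, len(d)))
        acf = []
        for lag in range(max_lag + 1):
            r = sum(d[n] * d[n - lag] for n in range(max_lag, len(d)))
            acf.append(r / energy if energy > 0 else 0.0)
        frames.append(acf)
        start += hop
    return frames

def pitch_rate_peak(frames, lo_hz=80, hi_hz=300):
    """Average the frames, then return (rate in Hz, value) of the strongest pitch-range lag."""
    n_lags = len(frames[0])
    summary = [sum(f[k] for f in frames) / len(frames) for k in range(n_lags)]
    lo_lag = math.ceil(FS / hi_hz)
    hi_lag = FS // lo_hz
    best = max(range(lo_lag, hi_lag + 1), key=lambda k: summary[k])
    return FS / best, summary[best]

rng = random.Random(0)  # fixed seed so the run is repeatable
N = FS // 2             # 0.5 s of signal
MAX_LAG = FS // 80      # longest lag of interest (80 Hz)

z_like = make_noise(N, rng, mod_hz=125)  # modulated noise, assumed [z] stand-in
s_like = make_noise(N, rng)              # unmodulated noise, assumed [s] stand-in

z_f0, z_peak = pitch_rate_peak(running_autocorrelogram(envelope(z_like), MAX_LAG, 400, 200))
s_f0, s_peak = pitch_rate_peak(running_autocorrelogram(envelope(s_like), MAX_LAG, 400, 200))

print(f"[z]-like: peak {z_peak:.2f} near {z_f0:.0f} Hz")
print(f"[s]-like: peak {s_peak:.2f}")
```

The modulated noise should show a clear autocorrelation peak at a lag near its modulation period, while the unmodulated noise should not. That difference is the kind of cue a long-term envelope statistic could miss.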

Publications

ASA 98: B. Strope and A. Alwan, "Amplitude Modulation Cues for Perceptual Voicing Distinctions in Noise," to appear in Proc. of ASA, June 1998.

bps@ucla.edu