Brian Strope
Pitch-Rate Amplitude Modulation Cues
Perceiving speech in an acoustically noisy environment requires
intelligent use of redundant multi-dimensional cues spread over
wide-ranging time scales. Speech perception research has often focused
on spectral cues (from approximately 400-8000 Hz) that are available
through a critical-band filtering mechanism. This approach is currently
used in most automatic speech recognition (ASR) systems, together with a
hierarchy of non-stationary stochastic models operating at
progressively slower rates: the speech frame, phoneme, word, phrase,
and even sentence.
Most notably for the current research, ASR systems typically ignore
pitch-rate (80-300 Hz) information by integrating short-term spectral
estimates over multiple pitch periods. The pitch-rate periodicities of
voiced speech are likely to play a key role in identifying
spectro-temporal regions of voiced speech against uncorrelated (or
differently correlated) background noise.
Simulations show that a running autocorrelogram can predict both
temporal modulation transfer functions (TMTFs) and the perceptual
distinction of high-pass filtered [s] and [z] in noise. Other simulations
that use modulation filtering and envelope statistics predict the
TMTFs, but not the [s]/[z] distinction.
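To make the idea of a running autocorrelogram concrete, the following is a
minimal sketch in Python, not the simulation described above. It assumes a
toy stand-in for the [s] and [z] stimuli: white noise without and with
amplitude modulation at a pitch rate of 120 Hz (no high-pass filtering is
applied). The function names, frame sizes, sampling rate, and modulation
depth are all illustrative assumptions. The sketch rectifies the signal to
get a crude envelope, computes a normalized short-time autocorrelation over
lags that correspond to 80-300 Hz, and reports the strongest periodicity.

```python
import numpy as np

def running_autocorrelation(x, fs, frame_ms=40.0, hop_ms=10.0,
                            fmin=80.0, fmax=300.0):
    """Normalized short-time autocorrelation over pitch-rate lags.

    Returns (lags_s, acf), where acf[t, k] is the correlation between
    frame t and the same-length frame delayed by lags_s[k] seconds.
    """
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    rows = []
    for start in range(0, len(x) - frame - lags[-1], hop):
        seg = x[start:start + frame]
        seg = seg - seg.mean()
        row = []
        for k in lags:
            shifted = x[start + k:start + k + frame]
            shifted = shifted - shifted.mean()
            denom = np.sqrt(np.dot(seg, seg) * np.dot(shifted, shifted)) + 1e-12
            row.append(np.dot(seg, shifted) / denom)
        rows.append(row)
    return lags / fs, np.array(rows)

def pitch_rate_peak(x, fs):
    """Return (rate_hz, strength) of the strongest pitch-rate periodicity
    in the rectified envelope, averaged over all frames."""
    envelope = np.abs(x)  # crude envelope: full-wave rectification
    lags_s, acf = running_autocorrelation(envelope, fs)
    mean_acf = acf.mean(axis=0)
    best = int(np.argmax(mean_acf))
    return 1.0 / lags_s[best], float(mean_acf[best])

# Toy stimuli (assumptions): 1 s of white noise at 16 kHz.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
s_like = rng.standard_normal(fs)                        # unmodulated noise
z_like = rng.standard_normal(fs) * (1.0 + np.sin(2 * np.pi * 120.0 * t))

rate_s, strength_s = pitch_rate_peak(s_like, fs)
rate_z, strength_z = pitch_rate_peak(z_like, fs)
print(f"[s]-like: peak strength {strength_s:.2f}")
print(f"[z]-like: peak strength {strength_z:.2f} near {rate_z:.0f} Hz")
```

In this toy setting the modulated noise shows a clear autocorrelation peak
near the modulation period, while the unmodulated noise does not. A
long-term spectral estimate would treat the two signals as nearly
identical, which is the kind of pitch-rate cue the summary argues is
discarded when spectra are integrated over multiple pitch periods.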
bps@ucla.edu