Recognition Performance

Brian Strope

Word Recognition Performance

On a talker-dependent digit recognition task, the performance of a DTW-based recognizer degrades more slowly in noise when the dynamic model with peak isolation is used as the front end.

(LPCC: LPC-based cepstral coefficients; MFCC: Mel-Frequency cepstral coefficients; MFCCA: MFCC with adaptation stages on the output of the Mel-Filters; MFCCAP: MFCCA with peak isolation.)
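For readers unfamiliar with DTW (dynamic time warping) template matching, the following is a minimal sketch of how such a recognizer can pick a digit. It is not the recognizer used for these experiments: the Euclidean frame distance, the unconstrained step pattern, and the names `dtw_distance`, `recognize`, and `templates` are all assumptions made for illustration. Any of the front ends above (LPCC, MFCC, MFCCA, MFCCAP) would supply the feature vectors.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: Euclidean distance between frames and the basic
    (insert, delete, match) step pattern with no slope constraints.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # frame-to-frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch template
                                 cost[i][j - 1],      # stretch utterance
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def recognize(utterance, templates):
    """Return the word whose stored template is closest under DTW.

    `templates` is a hypothetical dict mapping each word to one
    reference sequence recorded from the same talker.
    """
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

# Toy one-dimensional "features", for illustration only.
templates = {"one": [[0.0], [1.0], [2.0]], "two": [[5.0], [5.0], [4.0]]}
utterance = [[0.0], [0.0], [1.0], [2.0]]  # a time-stretched "one"
print(recognize(utterance, templates))  # prints "one"
```

In this kind of matcher, noise enters only through the frame distances, which is why the choice of front end determines how quickly recognition degrades.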

With an HMM-based recognizer and a talker-independent digit recognition task, the dynamic model with peak isolation remains the most robust front end.

(The peak isolation algorithm was evaluated with each of the front ends: LPCC, MFCC, and MFCCA. Circles (o) indicate the front ends without peak isolation, and pluses (+) mark the front ends with peak isolation.)

When we train both a clean and a noisy HMM for each word, we have a mechanism to account for the context dependence of the dynamic model. With this system, MFCCA shows a greater improvement over MFCC, and MFCCAP remains the most robust.

(Circles (o) indicate the front ends without peak isolation, and pluses (+) mark the front ends with peak isolation.)
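One plausible way to combine a clean and a noisy model per word is to score the utterance against both and let each word keep its better score. The sketch below assumes that rule; the page does not state how the two models are combined, and the function name, the dictionary layout, and the numbers are hypothetical.

```python
def recognize_multi_model(log_likelihoods):
    """Pick the word whose best model (clean or noisy) scores highest.

    `log_likelihoods` is a hypothetical mapping of the form
    {word: {"clean": float, "noisy": float}}, holding the HMM
    log-likelihood of the utterance under each model.
    Assumption: a word's score is the maximum over its models.
    """
    best_word, best_score = None, float("-inf")
    for word, scores in log_likelihoods.items():
        score = max(scores.values())
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Made-up scores for illustration: the noisy model of "one" fits best.
scores = {
    "one": {"clean": -120.0, "noisy": -95.0},
    "two": {"clean": -100.0, "noisy": -110.0},
}
print(recognize_multi_model(scores))  # prints "one"
```

With only the clean models, the same made-up scores would select "two", which illustrates how a second model per word can change the decision in noise.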

We also compare the recognition performance with RASTA front ends. The first graph shows the performance degradation with clean models, and the second shows it with the combination of clean and noisy models. Both the dynamic model (MFCCA) and RASTA (MFCC-RASTA) are evaluated with (+) and without (o) the peak isolation algorithm.

For all systems, the dynamic model improves noise robustness over more standard front ends. The dynamic model with peak isolation reduces the error rate relative to standard MFCC by a factor of 2-3.

More about the dynamic model.


bps@ucla.edu