No accepted standard system exists for describing pathological voice qualities. Qualities are currently labeled on the basis of individual clinicians' perceptual judgments, a procedure plagued by inter- and intra-rater inconsistency and terminological confusion. Synthetic pathological voices could serve as one element of a standard protocol for quality assessment. In this project, we develop guidelines for synthesizing several kinds of severely pathological voice qualities, in the hope of turning synthesis from the subjective art it currently is into more of a science.
Speech synthesizers that can model a range of vocal qualities have many applications, including high-quality synthesis, improved vocal prostheses, and the analysis and coding of natural-sounding speech. Accordingly, voice source models have received increasing attention in the literature. However, recent studies have focused on variation in normal voice quality rather than on pathology. With the exception of studies by Childers and colleagues, no attempts to synthesize pathological voices have been reported, and synthesis techniques for such voices remain poorly developed.
An analysis-by-synthesis approach using the Klatt formant synthesizer was used to study 24 tokens of the vowel /a/ spoken by males and females with severe voice disorders. Voice qualities included rough and rough-breathy; bifurcated; rough-bifurcated; strained-rough; and strained-breathy. Both temporal and spectral features of the natural waveforms were analyzed, and the results were used to guide synthesis. Based on these results, we suggest a number of modifications to the Klatt synthesizer that would facilitate the synthesis of pathological voices.
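The analysis-by-synthesis loop described above (synthesize, compare with the natural token, adjust parameters, repeat) can be illustrated with a minimal sketch. This is not the Klatt synthesizer or the procedure used in this project. Only the second-order resonator follows the form used in Klatt's formant synthesizer. Everything else is a hypothetical choice made for illustration: the 10 kHz sampling rate, the approximate /a/ formant and bandwidth values, the impulse-train source, the `alternation` parameter (successive pulses alternate in amplitude, a crude stand-in for a bifurcated source), the RMS log-spectral distance, and the grid search.

```python
import cmath
import math

FS = 10_000  # sampling rate in Hz (assumed for this sketch)
# Illustrative (formant Hz, bandwidth Hz) pairs for a male /a/; assumed values.
VOWEL_A = [(730, 90), (1090, 110), (2440, 170)]


def resonator(x, freq, bw, fs=FS):
    """Second-order digital resonator of the form used by Klatt:
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y1, y2 = yn, y1  # shift the two-sample delay line
    return out


def pulse_source(n, f0, alternation, fs=FS):
    """Impulse train at f0 whose pulses alternate between amplitudes
    1 + alternation and 1 - alternation (hypothetical bifurcation model)."""
    x = [0.0] * n
    period = fs / f0
    k = 0
    while round(k * period) < n:
        x[round(k * period)] = 1 + alternation if k % 2 == 0 else 1 - alternation
        k += 1
    return x


def synthesize(params, n=512):
    """Pass the source through a cascade of formant resonators."""
    y = pulse_source(n, params["f0"], params["alternation"])
    for freq, bw in VOWEL_A:
        y = resonator(y, freq, bw)
    return y


def log_spectrum(x):
    """Hann-windowed DFT magnitude in dB for bins 0..N/2."""
    n = len(x)
    xw = [xi * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, xi in enumerate(x)]
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    spec = []
    for b in range(n // 2 + 1):
        s = sum(xw[i] * twiddle[(b * i) % n] for i in range(n))
        spec.append(20 * math.log10(abs(s) + 1e-12))  # epsilon avoids log(0)
    return spec


def rms_db(sx, sy):
    """RMS difference between two log-magnitude spectra, in dB."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sx, sy)) / len(sx))


def analysis_by_synthesis(target, f0, candidates):
    """Grid search: resynthesize with each candidate alternation value and
    keep the one whose spectrum lies closest to the target's spectrum."""
    target_spec = log_spectrum(target)
    best, best_d = None, float("inf")
    for alt in candidates:
        d = rms_db(target_spec, log_spectrum(synthesize({"f0": f0, "alternation": alt})))
        if d < best_d:
            best, best_d = alt, d
    return best, best_d
```

As a usage example, a "natural" token can be stood in for by a synthetic one, `target = synthesize({"f0": 100, "alternation": 0.3})`, and then searched with `analysis_by_synthesis(target, 100, [i / 10 for i in range(6)])`. With real recordings, the target would be a measured waveform, several parameters (formants, bandwidths, noise, source shape) would be adjusted jointly, and matching would rely on both temporal and spectral features rather than a single spectral distance.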
Research supported in part by NIDCD grant DC 01797.
Keywords: Analysis-by-Synthesis; Voice Quality; Pathological Voices.
B. Gabelman, J. Kreiman, B. Gerratt, N. Antonanzas-Barroso, and A. Alwan, "Perceptually motivated modeling of noise in pathological voices," JASA, May 1998, Vol. 103, No. 5, p. 2894.
B. Gerratt, J. Kreiman, N. Antonanzas-Barroso, B. Gabelman, and A. Alwan, "Source modeling of severely pathological voices," JASA, May 1998, Vol. 103, No. 5, p. 2892.
P. Bangayan, C. Long, A. Alwan, J. Kreiman, and B. Gerratt, "Analysis by synthesis of pathological voices using the Klatt synthesizer," Speech Communication, 1997, Vol. 22, No. 4, pp. 343-368.
B. Gabelman, J. Kreiman, B. Gerratt, N. Antonanzas-Barroso, and A. Alwan, "LF source model adequacy for pathological voices," JASA, Nov. 1997, Vol. 102, No. 5, p. 32.
B. Gabelman, J. Kreiman, B. Gerratt, and A. Alwan, "Optimization for source waveform synthesis of pathological voices," JASA, April 1996, Vol. 99, No. 4, p. 2549.
A. Alwan, P. Bangayan, J. Kreiman, and C. Long, "Time and frequency synthesis parameters for severe pathological voice qualities," Proc. Int. Cong. Phonet. Sci. (ICPhS), Stockholm, Sweden, August 1995, Vol. 2, pp. 250-253.
P. Bangayan, A. Alwan, J. Kreiman, and C. Long, "Synthesis of severely pathological voices," JASA, May 1994, Vol. 95, No. 5, 1pSP5.
C. Long, P. Bangayan, and A. Alwan, "Acoustic analysis and synthesis of pathological voice qualities," JASA, Oct. 1993, Vol. 93, No. 3, Pt. 2, 2aSP9.
Demonstrations of the natural and synthesized voices are included with the papers below:
Philbert Bangayan (bangayan@icsl.ucla.edu)