Quantitative models of the human speech production system are needed for a better understanding of our cognitive abilities and for the development of high-quality speech synthesizers and automatic speech recognition systems. In previous studies, information regarding the vocal tract geometry during speech production has been mainly derived from lateral x-ray data. The main limitations of x-rays include radiation risks and difficulty in accurately deducing the cross-sectional morphology from mid-sagittal profiles.
Magnetic resonance imaging (MRI) is a powerful tool for obtaining the vocal-tract geometry and does not involve any known radiation risks. The images have a good signal-to-noise ratio, are amenable to computerized 3-D modeling, and provide excellent structural differentiation. In addition, the tract (airway) area and volume can be calculated directly. The low image sampling rate, however, has restricted MRI to the study of sustained speech sounds, which correspond to 'static' tract shapes, and the high cost of MRI equipment has further limited its use in speech research. Previous MRI studies have been mostly limited to vowels and nasal consonants.
In this study, articulatory data are obtained from Magnetic Resonance Images (MRI) and Dynamic Electropalatography (EPG). MRI reveals the 3D geometry of the vocal tract while EPG is important for studying articulatory dynamics. The modeling approach is based on estimation theory, acoustics, and signal-processing techniques and uses the data obtained from the unified set of measurements described above.
We have gained access to the Medical Imaging Facilities at Cedars Sinai Hospital and have collected MR images in the sagittal, coronal, and axial planes using a GE 1.5 Tesla SIGNA machine. Four phonetically-trained, native American English speakers [2 males (MI, SC) and 2 females (AK, PK)] served as subjects. Coronal and axial scans were used to obtain area functions of the front and back regions, respectively, while sagittal scans were used for length measurements. 3D models were used to measure the volumes of the sublingual cavities, piriform sinuses, and the entire vocal tract.
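As a rough illustration of how an area function and a volume can be derived from segmented MR slices, the following minimal sketch counts airway pixels in each slice and integrates the resulting areas along the tract. It is not the procedure used in this study: the function names, the binary-mask representation, and the pixel and slice spacings are all hypothetical, and the slices are assumed to be uniformly spaced and perpendicular to the tract midline.

```python
from typing import Sequence, Tuple


def slice_area_cm2(mask: Sequence[Sequence[int]],
                   pixel_spacing_mm: Tuple[float, float]) -> float:
    """Airway area in one segmented slice.

    `mask` is a hypothetical binary segmentation (nonzero = airway pixel);
    `pixel_spacing_mm` is the assumed in-plane pixel size (dx, dy) in mm.
    """
    airway_pixels = sum(1 for row in mask for value in row if value)
    dx_mm, dy_mm = pixel_spacing_mm
    return airway_pixels * dx_mm * dy_mm / 100.0  # mm^2 -> cm^2


def tract_volume_cm3(areas_cm2: Sequence[float],
                     slice_spacing_mm: float) -> float:
    """Volume from an area function by trapezoidal integration.

    Assumes consecutive slices are separated by a constant distance
    `slice_spacing_mm` along the tract.
    """
    step_cm = slice_spacing_mm / 10.0  # mm -> cm
    return sum(0.5 * (a + b) * step_cm
               for a, b in zip(areas_cm2, areas_cm2[1:]))


if __name__ == "__main__":
    # Toy data only: three tiny masks standing in for segmented slices.
    masks = [
        [[0, 1], [1, 1]],
        [[1, 1], [1, 1]],
        [[0, 0], [1, 1]],
    ]
    areas = [slice_area_cm2(m, (1.0, 1.0)) for m in masks]
    print("area function (cm^2):", areas)
    print("volume (cm^3):", tract_volume_cm3(areas, 5.0))
```

The trapezoidal rule is only one possible choice; measuring volumes from a reconstructed 3D model, as described above, avoids the uniform-spacing assumption.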
In addition to obtaining valuable estimates of the area functions and volumes, the MRI data illustrated inter-speaker differences in tongue shapes. So far, we have obtained images while speakers sustained vowels, fricatives, and liquids.
Supported by NSF grant number IRI-9503089.
Keywords: MRI, speech production modeling.
S. Narayanan and A. Alwan, "Articulatory-acoustic models for fricative consonants," IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 3, pp. 328-344, May 2000.
A. Alwan, S. Narayanan, B. Strope, and A. Shen, "Speech production and perception models and their applications to synthesis, recognition, and coding," invited chapter in Speech Processing, Recognition, and Artificial Neural Networks, Chollet, DiBenedetto, Esposito, and Marinaro, eds., pp. 138-161, Springer-Verlag, UK, 1999.
Amit Rane, "Forward and Inverse Mapping of the Vocal Tract," M.S. thesis, EE Dept., UCLA, July 1998.
A. Rane, D. Wei, L. Falkson, and A. Alwan, "Modeling the Transitory Behavior of Speech Using a Time-Varying Transmission Line Model," Proceedings of the ICA/ASA Conference, Seattle, pp. 261-262, June 1998.
S. Narayanan, A. Alwan, and Y. Song, "New Results in Vowel Production: MRI and EPG data," Proceedings of Eurospeech, Vol. 2, pp. 1007-1009, Patras, Greece, September 1997.
S. Narayanan, A. Alwan, and K. Haker, "Towards articulatory-acoustic models for liquid consonants based on MRI and EPG data. Part I: The laterals," JASA, Vol. 101, No. 2, pp. 1064-1077, February 1997.
A. Alwan, S. Narayanan, and K. Haker, "Towards articulatory-acoustic models for liquid consonants based on MRI and EPG data. Part II: The rhotics," JASA, Vol. 101, No. 2, pp. 1078-1089, February 1997.
P. Bangayan, A. Alwan, and S. Narayanan, "A transmission-line model of the lateral approximants," Proc. of the Acous. Societies of Amer. and Japan, Vol. 100, No. 4, Dec. 1996.
P. Bangayan, A. Alwan, and S. Narayanan, "From MRI and Acoustic Data to Articulatory Synthesis: a Case Study of the Laterals," ICSLP Proc., Philadelphia, Oct. 1996 (Invited), pp. 793-796.
S. Narayanan, A. Kaun, D. Byrd, P. Ladefoged, and A. Alwan, "Liquids in Tamil," Proc. ICSLP, Philadelphia, Oct. 1996, pp. 797-800.
Philbert Bangayan, "A Transmission-line model of /l/ based on MRI-derived data," M.S. thesis, Department of Electrical Engineering, UCLA, July 1996.
S. Narayanan and A. Alwan, "Parametric Hybrid Source Models for Voiced and Voiceless Fricative Consonants," ICASSP 96 Proceedings, Vol. I, pp. 337-340, Atlanta, May 1996.
S. Narayanan and A. Alwan, "Imaging Applications in Speech Production Research," SPIE 96 Medical Imaging Proceedings, 2709, pp. 120-131, Newport Beach, Feb. 1996 (Invited).
A. Alwan, S. Narayanan, B. Strope, and A. Shen, "Speech Production and Perception Models and their Applications to Synthesis, Recognition, and Coding," Proc. ISSSE, Oct. 1995.
S. Narayanan, A. Alwan, and K. Haker, "An Articulatory Study of Fricative Consonants using MRI," Jour. Acous. Soc. Amer. (JASA), Vol. 98, No. 3, pp. 1325-1347, Sep. 1995.
S. Narayanan, A. Alwan, and K. Haker, "An Articulatory Study of Liquid Consonants in American English," Int. Con. Phon. Sci. (ICPhS) Proc., Stockholm, Sweden, Vol. 3, pp. 576-579, August 1995.
Yong Song, "Finite Time Difference Simulations of Speech Production," M.S. thesis, Department of Electrical Engineering, UCLA, July 1995.
Shrikanth Narayanan, "Fricative Consonants: An Articulatory, Acoustic, and Systems Study," PhD thesis, Department of Electrical Engineering, UCLA, June 1995.
S. Narayanan, A. Alwan, and K. Haker, "Three Dimensional Tongue Shapes of Sibilant Fricatives," JASA, Vol. 96, (5), 3342 (A), Nov. 1994.
S. Narayanan, A. Alwan, and K. Haker, "An MRI Study of Fricative Consonants," Proc. Int. Con. Spoken Lang. Proc. (ICSLP), Japan, Vol. 2, pp. 627-630, Sep. 1994.