UCLA Speech Processing and Auditory Perception Laboratory

Collaborative Research: Improving speech technology for better learning outcomes: the case of AAE child speakers


The goal of this project is to develop new spoken language processing technology that enables interactive dialog between children and a virtual agent to support literacy learning and assessment, with a focus on serving underrepresented communities. Many children who speak African American English (AAE) struggle with literacy, but the spoken language systems that could deliver effective interventions perform much worse for AAE speakers, who are seldom included in the data used to train automatic speech recognition (ASR) or text-to-speech (TTS) systems. While our focus is on one dialect (AAE), the goal is to develop methods that can be applied to other dialects, so we focus on the scenario of learning from limited data. Since studies have shown that ASR performance on adult AAE speech is much worse than on General American English (GAE), and recognizing children's speech is known to be more difficult than recognizing adults' speech, our assessment of the technology's impact on learning leverages a constrained dialog task, with initial experiments in a Wizard-of-Oz (WoZ) setting. (details)

Voice Source Project


In voiced speech, the vocal folds open and close quasi-periodically and thus convert the glottal air flow (air volume velocity) into a train of flow pulses, which is referred to as the voice source excitation signal.
Early models of the source signal used a simple impulse train to model voiced excitation. None of these models has been calibrated with direct observations of glottal area changes, which are the proximal cause of the air pressure changes that we hear as sound. The effective study of the voice source thus requires both more accurate source models and a comprehensive set of underlying observations on which to base the models. The primary goal of the proposed research is to develop and evaluate a new, more powerful source model based on direct observations of vocal fold vibrations... (details)
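
As a rough illustration of the simple impulse-train excitation mentioned above, the following Python sketch places one unit impulse at the start of each pitch period. The function name, the 100 Hz fundamental frequency, and the 16 kHz sampling rate are illustrative assumptions, not values taken from the project.

```python
def impulse_train(f0_hz, sample_rate_hz, duration_s):
    """Return a list of samples with 1.0 at the start of each pitch period.

    This is a hypothetical helper for illustration only; it models voiced
    excitation as an idealized impulse train, not as realistic glottal pulses.
    """
    if f0_hz <= 0 or sample_rate_hz <= 0 or duration_s < 0:
        raise ValueError("f0 and sample rate must be positive, duration non-negative")

    n_samples = int(round(duration_s * sample_rate_hz))
    period = sample_rate_hz / f0_hz  # pitch period in samples (may be fractional)
    signal = [0.0] * n_samples

    k = 0
    while True:
        idx = int(round(k * period))  # nearest sample to the k-th pulse instant
        if idx >= n_samples:
            break
        signal[idx] = 1.0
        k += 1
    return signal


# Assumed example values: 100 Hz voice, 16 kHz sampling, 50 ms of signal.
excitation = impulse_train(f0_hz=100.0, sample_rate_hz=16000, duration_s=0.05)
pulse_positions = [i for i, x in enumerate(excitation) if x == 1.0]
```

With these assumed values the pulses are 160 samples apart. A more realistic source model would replace each impulse with a shaped glottal flow pulse, which is the kind of refinement the project description calls for.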

The Subglottal Resonances: Research and Applications

    During the past few decades, research efforts in the area of speech processing have focused on the extraction of reliable acoustic features for applications such as automatic speech recognition, speaker identification, and speech coding, among many others. These acoustic features are related either to the vocal tract (filter) or to the glottal air flow (source) that drives it. Although the mechanics of the supraglottal (above the glottis) system are well understood, the subglottal (below the glottis) system and its properties have not been explored in great detail. Unlike the supraglottal tract, the configuration of the subglottal system remains fairly constant during speech production, which makes its properties very interesting and useful. In particular, its resonant frequencies, through subtle interactions with the speech signal, are believed to have the potential to minimize acoustic differences among speakers and also to provide valuable information about a speaker's identity... (details)

    An application to estimate your height from your voice:
    http://ucla-voice-and-height.herokuapp.com/ 

Modeling Speech Perception in Noise and Noise Robust ASR

    We almost always listen to speech that is degraded by competing speech and non-speech signals. Fortunately, we are remarkably adept at isolating a specific speech signal from the background noise and understanding what is said. The purpose of this study is to contribute to a broad research program whose aim is to understand and model human perception of speech in noise... (details)

Voice-based Depression Study

    Major Depressive Disorder (MDD) affects almost one in five women and one in twelve men in their lifetime and was recently recognized as the world's leading cause of disability. Yet current pharmacological and psychological therapies have limited efficacy... (details)

Noise Robust Bird Song Classification, Recognition, and Detection

Bird songs play an important role in communication among birds. By listening, a bird can classify other birds as conspecific or heterospecific, neighbor or stranger, mate or non-mate, kin or non-kin. It can also sing to attract mates, signal danger, or defend territory. Behavioral and ecological studies could benefit from automatic detection and identification of species from acoustic recordings.

Technologically Based Assessment of Language and Literacy (TBALL)

    The TBALL project aims to advance the state of the art in speech processing, data mining, and human-computer interface design. It integrates these technologies with a progressive understanding of the components of academic performance to develop an effective, child-friendly literacy assessment system.

    The project is studying the impact of this approach on native speakers of American English and non-native speakers of Mexican-Spanish background, longitudinally from kindergarten through second grade (K-2). It will:

    * Analyze children's speech as they grow
    * Develop speech recognition algorithms for automated assessments
    * Create a query-based, longitudinal database for each student
    * Derive instructional guidance from the analysis of an ongoing professional development program for teachers of native and non-native speakers
    * Allow teachers to make more timely and appropriate decisions about curriculum and instructional interventions

    The project fosters interdisciplinary activities at:
    UCLA - Electrical Engineering, Computer Science, Education
    USC - Electrical Engineering, Linguistics, Neuroscience, Psychology
    UCB - Education ... (details)

From MRI and Acoustic Data to Articulatory Synthesis

    Quantitative models of the human speech production system are needed for a better understanding of our cognitive abilities and for the development of high-quality speech synthesizers and automatic speech recognition systems. In previous studies, information regarding the vocal tract geometry during speech production has been mainly derived from lateral x-ray data. The main limitations of x-rays include radiation risks and difficulty in accurately deducing the cross-sectional morphology from mid-sagittal profiles... (details)

Speech Coding and Echo Cancellation for Wireless Communication

    Designing high-quality speech coders and echo cancellation schemes for wireless networks is challenging, since good quality must be maintained with low power consumption under time-varying channel conditions and limited bandwidth. The design should account for a number of parameters such as bit rate, delay, power consumption, complexity, and quality of coded speech. Available bandwidth will depend on network protocols. Depending on the application, a set of parameters is optimized... (details)

Analysis by Synthesis of Severely Pathological Voices

    No accepted standard system exists for describing pathological voice qualities. Qualities are labeled based on the perceptual judgments of individual clinicians, a procedure plagued by inter- and intra-rater inconsistencies and terminological confusions. Synthetic pathological voices could be useful as an element in a standard protocol for quality assessment. In this project, we develop guidelines for synthesizing some kinds of severely pathological voice qualities in the hope of making synthesis less of a subjective art, as it currently is, and more of a science... (details)

An Analysis of Acoustic Feedback in Hearing Aids

    Acoustic feedback can cause oscillations and instability, which lead to a howling sound produced by the hearing aid. Even motion near the hearing aid can cause changes in the acoustic feedback path. The purpose of this analysis is to quantify the acoustic path transfer function (APTF) of in-the-ear (ITE) hearing aids under both static and dynamic conditions... (details) (demo)

The Smart Kindergarten Project

    In Smart Kindergarten, we target the early childhood education environment as a testbed, where we try to give parents and teachers the ability to comprehensively investigate students’ learning processes. The kinds of questions that we hope to answer range from evaluations of students’ progress, such as “How well is student A reading the story book B?” and “Is student C spending too much time on one learning area?”, to evaluations of students’ social behavior, such as “Does student A tend to confront other students?” and “Is student B usually isolated?” The infrastructure of SmartKG was designed to collect, manage, and fuse the information from the sensors, and to interpret and present it in a logical and user-friendly manner... (details)