Collaborative Research: Improving speech technology for better learning outcomes: the case of AAE child speakers

NSF Award# 2202585



Project Description

The goal of this project is to develop new spoken language processing technology that enables interactive dialog between children and a virtual agent to support literacy learning and assessment, with a focus on serving underrepresented communities. Many children who speak African American English (AAE) struggle with literacy, but the spoken language systems that could deliver effective interventions perform much worse for AAE speakers, who are seldom included in the data used to train automatic speech recognition (ASR) or text-to-speech (TTS) systems. Although our focus is on one dialect (AAE), the goal is to develop methods that can be applied to other dialects, so we focus on the scenario of learning from limited data. Studies have shown that ASR performance on adult AAE is much worse than on General American English (GAE), and children's speech is known to be harder to recognize than adults' speech. For this reason, our assessment of the technology's impact on learning uses a constrained dialog task, with initial experiments in a Wizard-of-Oz (WoZ) setting.

Specifically, we build a dialog system around the Gray Oral Reading Tests, Fifth Edition (GORT-5), a diagnostic test of literacy in which the child reads a short passage aloud and then answers questions about what they read. The GORT-5 allows us to evaluate the use of ASR for assessing fluency and comprehension, as well as to explore strategies for response generation in the dialog interactions associated with comprehension. This scenario also has the advantage of being relatively constrained, which makes it possible to focus on acoustic modeling of AAE, including its differences from GAE in morphology, pronunciation, and prosody. Within this context, the research will explore several questions about student learning that, despite their importance, no study has yet quantified for AAE children.

The major goals of this project are as follows. 

(1) Does a gender- and/or dialect-matched voice (human and/or machine-generated) affect student performance, engagement, and/or confidence in an assessment setting? The hypothesis is that both gender matching and the use of AAE will improve student performance. However, it is not known to what extent these factors matter, or whether gender matching has similar effects on girls and boys in terms of performance, engagement, and confidence.

(2) Does translanguaging (switching between AAE and GAE, or from one AAE dialect density to another) at certain stages of the interaction affect student performance, engagement, and/or confidence? The hypothesis is that the use of AAE will improve student performance, but it is not known whether AAE is needed throughout the assessment or only in certain parts, such as the introduction, conversational extensions, or open-ended questions and prompts; whether different dialect densities are equally impactful; and whether the impact is similar for boys and girls.

We will develop ASR technology that assesses student performance in terms of reading accuracy, rate, fluency, and comprehension, as well as TTS technology that can adapt to the AAE dialect. Hence, the technology-related research questions are:

(3) What methods of adaptation (or transfer learning) and data augmentation are needed to improve ASR performance for AAE child speakers?

(4) What methods of TTS adaptation are most effective for capturing dialect differences? Standard TTS voice-building methods, which characterize differences in speaker and speaking style, provide a baseline against which we compare the added benefits of explicit pronunciation modeling and more complex models of prosody.

A final research question relates to both learning and technology:

(5) Can we leverage prosodic cues to improve the adaptation and usefulness of spoken language systems (SLS) in educational applications? The hypothesis is that prosodic cues can improve SLS in educational settings.


Students and Collaborators

UCLA

-Abeer Alwan*

-Alison Bailey**

-Alexander Johnson

-Natarajan Balaji Shankar

University of California, Irvine

-Julie Washington*

-Bryan Murray

-Alaria Long

University of Washington

-Mari Ostendorf*

-Junkai Yu

Georgia State University

-Robin Morris*

*Indicates PI
**Indicates Co-PI


Project References

A. Johnson, N. B. Shankar, M. Ostendorf, and A. Alwan, "An Exploratory Study on Dialect Density Estimation for Children and Adult's African American English," accepted for publication in the Journal of the Acoustical Society of America, 2024

Natarajan Balaji Shankar, Ruchao Fan, and Abeer Alwan, "SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR," accepted for publication at ICASSP Workshops 2024

Alexander Johnson, Christina Chance, Kaycee Stiemke, Hariram Veeramani, Natarajan Balaji Shankar, and Abeer Alwan, "An Analysis of Large Language Models for African American English Speaking Children's Oral Language Assessment," Journal of Black Excellence in Engineering, Science, and Technology, vol. 1, 2023

Ruchao Fan, Natarajan Balaji Shankar, and Abeer Alwan, "UniEnc-CASSNAT: An Encoder Only Non-autoregressive ASR with Self-supervised Pretrained Speech Models," IEEE Signal Processing Letters, vol. 31, pp. 711-715, 2024, doi: 10.1109/LSP.2024.3365036.

H. Veeramani, N. Balaji Shankar, A. Johnson, and A. Alwan, "Towards Automatically Assessing Children's Oral Picture Description Tasks," in the 9th Workshop on Speech and Language Technology in Education, 2023

A. Johnson, H. Veeramani, N. B. Shankar, and A. Alwan, "An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities," in Interspeech 2023

A. Johnson, V. Shetty, M. Ostendorf, and A. Alwan, "Towards Automatic Dialect Identification of African American English in Adults' and Children's Speech," in IEEE ICASSP 2023

A. Johnson, J. Washington, R. Morris, M. Ostendorf, A. Bailey, and A. Alwan, "Towards Effective Speech-based AI in the Classroom: The Case of AAE-Speaking Children," in Black in AI Workshop at NeurIPS 2023

A. Johnson, K. Everson, V. Ravi, A. Gladney, M. Ostendorf, and A. Alwan, "Automatic Dialect Density Estimation for African American English," in Interspeech 2022, pp. 1283-1287

A. Johnson, R. Fan, R. Morris, and A. Alwan, "LPC AUGMENT: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects," in ICASSP 2022, pp. 8577-8581, doi: 10.1109/ICASSP43922.2022.9746281



Abeer Alwan (alwan@seas.ucla.edu)