The goal of this project is to develop new spoken language processing technology that enables interactive dialog between children and a virtual agent to support literacy learning and assessment, with a focus on serving underrepresented communities. Many children who speak African American English (AAE) struggle with literacy, but the spoken language systems that could deliver effective interventions perform much worse for AAE speakers, who are seldom included in the samples used to train automatic speech recognition (ASR) or text-to-speech (TTS) systems. While our focus is on one dialect (AAE), the goal is to develop methods that can be applied to other dialects, so we focus on the scenario of learning from limited data. Studies have shown that ASR performance on adult AAE is much worse than on General American English (GAE), and recognizing children’s speech is known to be more difficult than recognizing adults’ speech. For these reasons, our assessment of the technology’s impact on learning leverages a constrained dialog task, with initial experiments in a Wizard-of-Oz (WoZ) setting.
Specifically, we build a dialog system around the GORT-5 (Gray Oral Reading Tests, Fifth Edition), a diagnostic test of literacy in which the child reads a short passage aloud and then answers questions about what they read. The GORT-5 allows us to evaluate the use of ASR for assessing fluency and comprehension, and to explore strategies for response generation in the dialog interactions associated with comprehension. This scenario also has the advantage of being relatively constrained, making it possible to focus on acoustic modeling of AAE, including morphology, pronunciation, and prosody differences relative to GAE. Within this context, the research will explore several questions related to student learning that, despite their importance, no study has previously answered quantitatively for AAE-speaking children.
The major goals of this project are as follows.
(1) Does a gender- and/or dialect-matched voice (human and/or machine-generated) impact student performance, engagement, and/or confidence in an assessment setting? The hypothesis is that both gender matching and the use of AAE dialect will improve student performance. However, it is not known how large these effects are, or whether gender matching has similar effects on performance, engagement, and confidence for girls and boys.
(2) Does translanguaging (switching from AAE to GAE and vice versa, or switching from one AAE density to another) at certain stages of the interaction impact student performance, engagement, and/or confidence? The hypothesis is that the use of AAE will improve student performance. It is not known, however, whether AAE is needed throughout the assessment or only in some parts, such as the introduction, conversational extensions, or open-ended questions/prompts; whether the different dialect densities are equally impactful; or whether the impact on boys and girls is similar.
We will develop ASR technology that assesses student performance in terms of reading accuracy, rate, fluency, and comprehension, and develop TTS technology that is able to adapt to the AAE dialect. Hence, the technology-related research questions are:
(3) What methods of adaptation (or transfer learning) and data augmentation are needed to improve ASR performance for AAE child speakers?
(4) What methods of TTS adaptation are most effective for capturing dialect differences? Standard TTS voice-building methods exist for characterizing speaker and speaking-style differences, and they provide a baseline against which we compare the added benefits of explicit pronunciation modeling and more complex models of prosody.
A final research question relates to both learning and technology:
(5) Can we leverage prosodic cues to improve the adaptation and usefulness of spoken language systems (SLS) in educational applications? The hypothesis is that prosodic cues can improve SLS in educational settings.
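As a rough illustration of how the reading accuracy and rate measures mentioned above might be derived from an ASR transcript, the sketch below compares a transcript with the passage text. It is not the GORT-5 scoring procedure and not this project's method: the function names, the tokenization, the use of a longest common subsequence to count correctly read words, and the "words correct per minute" definition are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def lcs_length(ref, hyp):
    """Length of the longest common subsequence of two token lists.

    Used here as a simple proxy (an assumption) for the number of passage
    words that appear, in order, in the ASR transcript.
    """
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            if r == h:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def reading_metrics(passage, asr_transcript, duration_seconds):
    """Return illustrative accuracy and rate measures for one oral reading.

    accuracy: fraction of passage words matched in the transcript.
    words_correct_per_minute: matched words scaled to a one-minute rate.
    Both definitions are hypothetical, chosen only for this sketch.
    """
    ref = tokenize(passage)
    hyp = tokenize(asr_transcript)
    if not ref or duration_seconds <= 0:
        raise ValueError("need a non-empty passage and a positive duration")
    correct = lcs_length(ref, hyp)
    return {
        "accuracy": correct / len(ref),
        "words_correct_per_minute": correct * 60.0 / duration_seconds,
    }


if __name__ == "__main__":
    metrics = reading_metrics(
        "The cat sat on the mat.",  # passage text
        "the cat sit on mat",       # hypothetical ASR output
        6.0,                        # reading time in seconds
    )
    print(metrics)
```

In this toy example, four of the six passage words are matched, giving an accuracy of about 0.67 and a rate of 40 words correct per minute. A scorer of this kind treats every mismatch as a reading error, so it would need dialect-aware handling before it could be applied fairly to AAE speakers; ASR errors would also be counted against the child.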
-Abeer Alwan*
-Alexander Johnson
-Natarajan Balaji Shankar
-Bryan Murray
-Alaria Long
-Junkai Yu
Anyu Ying, Natarajan Balaji Shankar, Chyi-Jiunn Lin, Mohan Shi, Pu Wang, Hye-jin Shim, Siddhant Arora, Hugo Van hamme, Abeer Alwan, and Shinji Watanabe, "Benchmarking Training Paradigms, Dataset Composition, and Model Scaling for Child ASR in ESPnet", Proc. Workshop on Child Computer Interaction - WOCCI 2025, 6-10, doi: 10.21437/WOCCI.2025-2
Natarajan Balaji Shankar, Kaiyuan Zhang, Andre Mai, Mohan Shi, Alaria Long, Julie Washington, Robin Morris, and Abeer Alwan, "Leveraging ASR and LLMs for Automated Scoring and Feedback in Children’s Spoken Language Assessments", Proc. 10th Workshop on Speech and Language Technology in Education (SLaTE), 1-5, doi: 10.21437/SLaTE.2025-1
Natarajan Balaji Shankar, Zilai Wang, Kaiyuan Zhang, Mohan Shi, and Abeer Alwan, "CHSER: A Dataset and Case Study on Generative Speech Error Correction for Child ASR", Proc. Interspeech 2025, 2895-2899, doi: 10.21437/Interspeech.2025-1019
Alison L. Bailey, Alexander Johnson, Natarajan Balaji Shankar, Hariram Veeramani, Julie A. Washington, and Abeer Alwan, "Addressing Bias in Spoken Language Systems Used in the Development and Implementation of Automated Child Language-Based Assessment", Journal of Educational Measurement, 2025. https://doi.org/10.1111/jedm.12435
Laura Wagner, Sharifa Alghowinhem, Abeer Alwan, Kristina Bowdrie, Cynthia Breazeal, Cynthia G. Clopper, Eric Fosler-Lussier, Izabela A. Jamsek, Devan Lander, Rajiv Ramnath, and Jory Ross, "The Ohio Child Speech Corpus", Speech Communication, Volume 170, 2025, 103206, ISSN 0167-6393, https://doi.org/10.1016/j.specom.2025.103206.
Natarajan Balaji Shankar, Zilai Wang, Eray Eren, and Abeer Alwan, "Selective Attention Merging for low resource tasks: A case study of Child ASR", ICASSP 2025 – IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1–5, April 2025, https://doi.org/10.1109/ICASSP49660.2025.10887889
Natarajan Balaji Shankar, Amber Afshan, Alexander Johnson, Aurosweta Mahapatra, Alejandra Martin, Haolun Ni, Hae Won Park, Marlen Quintero Perez, Gary Yeung, Alison Bailey, Cynthia Breazeal, and Abeer Alwan. "The JIBO Kids Corpus: A speech dataset of child-robot interactions in a classroom environment", JASA Express Lett. 1 November 2024; 4 (11): 115201, https://doi.org/10.1121/10.0034195
Ruchao Fan, Natarajan Balaji Shankar, and Abeer Alwan. "Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models", Proc. Interspeech 2024, 5173-5177, https://doi.org/10.21437/Interspeech.2024-1353
Hariram Veeramani, Surendrabikram Thapa, Natarajan Balaji Shankar, and Abeer Alwan. "Large Language Model-based Pipeline for Item Difficulty and Response Time Estimation for Educational Assessments". NAACL 2024 Workshops - Proceedings of the Nineteenth Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 561–566.
Alexander Johnson, Natarajan Balaji Shankar, Mari Ostendorf, and Abeer Alwan, "An Exploratory Study on Dialect Density Estimation for Children and Adult's African American English", Journal of the Acoustical Society of America, 2024, 155 (4), pp. 2836-2848, https://doi.org/10.1121/10.0025771
Natarajan Balaji Shankar, Ruchao Fan, and Abeer Alwan, "SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR" 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Seoul, Korea, Republic of, 2024, pp. 560-564, doi: 10.1109/ICASSPW62465.2024.10625884.
Alexander Johnson, Christina Chance, Kaycee Stiemke, Hariram Veeramani, Natarajan Balaji Shankar, and Abeer Alwan, "An Analysis of Large Language Models for African American English Speaking Children's Oral Language Assessment," Journal of Black Excellence in Engineering, Science, and Technology, Vol 1, 2023
Natarajan Balaji Shankar, Alexander Johnson, Christina Chance, Hariram Veeramani, and Abeer Alwan, "CORAAL QA: A Dataset and Framework for Open Domain Spontaneous Speech Question Answering from Long Audio Files," ICASSP 2024, pp. 13371-13375, doi: 10.1109/ICASSP48485.2024.10447109.
Ruchao Fan, Natarajan Balaji Shankar, and Abeer Alwan, "UniEnc-CASSNAT: An Encoder Only Non-autoregressive ASR with Self-supervised Pretrained Speech Models," IEEE Signal Processing Letters, vol. 31, pp. 711-715, 2024, doi: 10.1109/LSP.2024.3365036.
Hariram Veeramani, Natarajan Balaji Shankar, Alexander Johnson, and Abeer Alwan, "Towards Automatically Assessing Children's Oral Picture Description Tasks," Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE), 119–120.
Alexander Johnson, Hariram Veeramani, Natarajan Balaji Shankar, and Abeer Alwan, "An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities," Proc. INTERSPEECH 2023, 4608-4612, doi: 10.21437/Interspeech.2023-1257
A. Johnson, V. Shetty, M. Ostendorf, and A. Alwan, "Leveraging Multiple Sources in Automatic African American English Dialect Detection for Adults and Children," in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10096614
A. Johnson, J. Washington, R. Morris, M. Ostendorf, A. Bailey, and A. Alwan, "Towards Effective Speech-based AI in the Classroom: The Case of AAE-Speaking Children," in Black in AI Workshop at NeurIPS 2023
A. Johnson, K. Everson, V. Ravi, A. Gladney, M. Ostendorf, and A. Alwan, "Automatic Dialect Density Estimation for African American English," in Interspeech 2022, 1283-1287, doi: 10.21437/Interspeech.2022-796
A. Johnson, R. Fan, R. Morris, and A. Alwan, "LPC AUGMENT: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children’s Dialects," in ICASSP 2022, pp. 8577-8581, doi: 10.1109/ICASSP43922.2022.9746281
Abeer Alwan (alwan@seas.ucla.edu)