Comparing Automatic Speech Recognition and Teacher Assessments of Japanese Learners' Mandarin Chinese Pronunciation: Accuracy, Agreement, and Pronunciation Difficulty Detection

Huang, W.; Kashiwagi, H.; Kang, M.

A double-blind, peer-reviewed online publication with in-print supplement since 2010. ISSN: 1949-260X

JTCLT Abstract

Volume 17 Number 1, 2026

Huang, W., Kashiwagi, H., & Kang, M. (2026). Comparing Automatic Speech Recognition and Teacher Assessments of Japanese Learners' Mandarin Chinese Pronunciation: Accuracy, Agreement, and Pronunciation Difficulty Detection. Journal of Technology and Chinese Language Teaching, 17(1), 48-65.
[黄暐勋, 柏木治美, & 康敏. (2026). 自动语音识别与教师对日本汉语学习者普通话发音评估的比较：准确性、一致性及发音困难识别. 科技与中文教学 (Journal of Technology and Chinese Language Teaching), 17(1), 48-65.]

Abstract/摘要：

Computer-assisted pronunciation training (CAPT) increasingly incorporates automatic speech recognition (ASR) to provide pronunciation assessment and feedback. However, the extent to which ASR systems evaluate non-native Mandarin Chinese speech in a manner comparable to human teachers remains unclear. This study compares the assessments generated by three ASR systems—Whisper, Azure, and Gladia—with ratings provided by native Chinese-speaking teachers for the word-level Mandarin Chinese pronunciation of 31 Japanese learners. Two research questions are addressed: (1) To what extent do these ASR systems assess learner pronunciation comparably to teachers? (2) Can ASR assessments help identify learners' pronunciation difficulties? A three-point scoring scheme was developed to evaluate learners' productions of 20 Mandarin Chinese words. Comparative analyses were conducted from the perspectives of learner proficiency and pronunciation characteristics. The results showed that all three ASR systems generally underestimated learner performance relative to teacher ratings, although Whisper produced assessments that were most consistent with those of the teachers. The agreement between ASR and teacher assessments also varied according to learner proficiency. Furthermore, ASR performance was strongly influenced by initial–final combinations, suggesting that ASR assessments can help identify specific pronunciation difficulties. These findings support the potential of ASR as a complementary tool for pronunciation assessment in Mandarin Chinese CAPT.

随着计算机辅助发音训练（Computer-Assisted Pronunciation Training, CAPT）的发展，自动语音识别（Automatic Speech Recognition, ASR）系统日益广泛地应用于发音评估与反馈。然而，ASR系统对非母语者普通话发音的评估能否达到与教师相近的水平，仍缺乏充分的实证研究。本研究比较Whisper、Azure和Gladia三种ASR系统与中文母语教师对31名日本学习者20个汉语词语发音的评估结果，以探讨：（1）ASR系统在词语层面的发音评估与教师评分具有多大程度的一致性？（2）ASR评估是否有助于识别学习者的发音困难？本研究建立三级评分标准，对学习者发音分别进行ASR评分与教师评分，并从学习者水平和发音特征两个层面比较两者的评估结果。研究结果显示，三种ASR系统均倾向于低估学习者的发音表现，其中Whisper与教师评分的一致性最高。此外，ASR与教师评分的一致程度会因学习者水平而有所不同。进一步分析发现，ASR评估结果受到声母—韵母组合的显著影响，表明ASR评估有助于识别学习者的具体发音困难。本研究结果支持ASR作为普通话计算机辅助发音训练中辅助发音评估工具的应用潜力。
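
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state which agreement statistic the authors used, so the exact-agreement rate, mean signed difference, and Cohen's kappa below are assumptions chosen for illustration. The function name `compare_scores`, the 0/1/2 coding of the three-point scale, and the sample scores are likewise hypothetical and are not taken from the paper.

```python
from collections import Counter


def compare_scores(asr, teacher):
    """Compare paired three-point scores from an ASR system and a teacher.

    Assumption: scores are coded 0, 1, 2 (the paper's actual coding is not
    given in the abstract). Returns the exact-agreement rate, the mean
    signed difference (ASR minus teacher), and Cohen's kappa.
    """
    if not asr or len(asr) != len(teacher):
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(asr)
    pairs = list(zip(asr, teacher))

    # Proportion of items on which both raters gave the same score.
    observed = sum(a == t for a, t in pairs) / n

    # A negative value means the ASR system scored lower than the teacher
    # on average, i.e. it underestimated learner performance.
    mean_diff = sum(a - t for a, t in pairs) / n

    # Agreement expected by chance, from each rater's marginal distribution.
    asr_counts, teacher_counts = Counter(asr), Counter(teacher)
    categories = set(asr) | set(teacher)
    expected = sum(asr_counts[c] * teacher_counts[c] for c in categories) / n**2

    # Kappa is undefined when chance agreement is 1; report 1.0 in that case.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)

    return {"agreement": observed, "mean_diff": mean_diff, "kappa": kappa}


# Hypothetical scores for eight word productions by one learner.
asr_scores = [2, 1, 0, 1, 2, 0, 1, 1]
teacher_scores = [2, 2, 1, 1, 2, 1, 2, 1]
result = compare_scores(asr_scores, teacher_scores)
print(result)
```

Grouping the paired scores by learner proficiency or by initial–final combination before calling the function would give per-group figures of the kind the abstract refers to, again under the assumptions stated above.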

This website is supported by
Department of World Languages, Literatures, and Cultures, Middle Tennessee State University
Page last updated: 2020-12-31