Using Automatic Speech Recognition Technology to Reverse Analyze Communication Strategies between Non-Native Speakers in a Chinese Long Distance Group Discussion

Sunaoka, K.

A double-blind, peer-reviewed online publication with in-print supplement since 2010. ISSN: 1949-260X

JTCLT Abstract

Volume 9 Number 2, 2018

Sunaoka, K. (2018). Using Automatic Speech Recognition Technology to Reverse Analyze Communication Strategies between Non-Native Speakers in a Chinese Long Distance Group Discussion. Journal of Technology and Chinese Language Teaching, 9(2), 61-82.
[砂冈和子. (2018). 以语音识别技术逆向分析汉语远场群体讨论中非母语者的交互策略. 科技与中文教学 (Journal of Technology and Chinese Language Teaching), 9(2), 61-82.]

Abstract/摘要：

In recent years, the accuracy of Automatic Speech Recognition (ASR) has improved greatly, to the point of practical application. ASR was tested on a real corpus of a multi-person long-distance group discussion between non-native speakers (NNS) and native speakers (NS) of Chinese to compare its recognition accuracy for NNS and NS. The test found that the latest ASR has very high recognition accuracy for single speakers, native speakers, and standard spoken language, and has now reached a level suited to field applications. However, for both NS and NNS, the recognition rate decreased significantly for emotional and multi-channel speech. ASR is therefore difficult to apply to far-field, multi-channel, or multi-modal speech. In contrast, NNS participants who made full use of multiple information channels and modalities communicated and interacted successfully, even though their Chinese pronunciation was non-standard and their statements were fragmented. This paper also discusses trends in future ASR technology and introduces several “ASR + Chinese teaching” methods to explore how teaching may better coexist with smart language tools.

近几年，语音识别技术（Automatic Speech Recognition: ASR）的精度大幅提升，已突破了从技术走向实用的门槛。本文用ASR技术对非汉语母语者（NNS）与母语者（NS）的远场群体讨论语料进行识别精度测试，逆向验证了最新ASR对单一发言人、母语者、标准口语的识别精度非常高，已达到现场应用的水平。但不管对NS还是NNS，对含有情感及多声源等干扰的语音，ASR识别率都出现大幅度下降。因此很难应用到识别具有远场（far-field）、多通道（multi-channel）、多模态（multi-modal）特征的语音。相比之下，参加群体讨论会的NNS存在汉语口音不够标准、语句碎片化等问题，但充分利用多个信息通道、多种沟通模态来与NS进行互动共享信息。最后简介未来ASR技术的趋势，同时显示几种“ASR＋汉语教学”模式，从而探讨如何更好地与智能语言工具互补共存。

This website is supported by
Department of World Languages, Literatures, and Cultures, Middle Tennessee State University
Page last updated: 2020-12-31