A prototype of Chinese aspirated consonants pronunciation training system based on multi-resolution cochleagram
Many learners of Mandarin Chinese, especially those whose native phonological system differs significantly from that of Chinese, find Chinese phonemes difficult to pronounce, yet classroom time for pronunciation training is limited. It is therefore essential to develop computer-aided training systems that let learners practice Chinese pronunciation without a teacher's assistance. In this article I introduce a prototype of a Chinese pronunciation training system that focuses specifically on phoneme substitution errors related to the aspiration of consonants. I describe the feature extraction process, which is based on the multi-resolution cochleagram (MRCG), a psychoacoustic model of the basilar membrane excitation pattern, and the architecture of the recurrent neural network (RNN) used to detect mispronounced phonemes. The system achieves an accuracy of 96.12% in detecting phoneme substitution errors and 98.58% in determining aspiration length. The proposed system may be particularly useful for native speakers of Slavic and Romance languages, in which aspiration is not a distinctive feature.
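The abstract does not give the MRCG parameters, so the following is only a minimal sketch of how such a feature can be computed, assuming the configuration commonly described in the MRCG literature: a 64-channel gammatone filterbank, log frame energies with 20 ms and 200 ms windows at a 10 ms shift, and two smoothed copies of the fine cochleagram obtained with 11x11 and 23x23 mean filters. All function names, the frequency range, the impulse response length, and the zero padding at the edges are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def hz_to_erb_rate(f):
    # ERB-rate scale (Glasberg and Moore)
    return 21.4 * np.log10(4.37e-3 * f + 1.0)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def gammatone_filterbank(signal, fs, n_ch=64, f_lo=50.0, f_hi=7000.0, dur=0.064):
    """Filter `signal` with 4th-order gammatone filters (FIR approximation)."""
    fc = erb_rate_to_hz(np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n_ch))
    bw = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)           # bandwidth per channel
    t = np.arange(int(dur * fs)) / fs
    ir = t ** 3 * np.exp(-2 * np.pi * bw[:, None] * t) * np.cos(2 * np.pi * fc[:, None] * t)
    ir /= np.sqrt((ir ** 2).sum(axis=1, keepdims=True))  # unit energy (assumption)
    n = len(signal) + ir.shape[1] - 1                    # linear convolution length
    out = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(ir, n, axis=1), n, axis=1)
    return out[:, :len(signal)]

def frame_energy(sub, frame_len, shift):
    """Energy of each channel in frames starting every `shift` samples."""
    n_ch, n = sub.shape
    csum = np.concatenate([np.zeros((n_ch, 1)), np.cumsum(sub ** 2, axis=1)], axis=1)
    starts = np.arange(int(np.ceil(n / shift))) * shift
    ends = np.minimum(starts + frame_len, n)             # truncate frames at the end
    return csum[:, ends] - csum[:, starts]

def box_mean(img, half):
    """Mean over a (2*half+1) square window, zero-padded at the edges."""
    k = 2 * half + 1
    padded = np.pad(img, half, mode="constant")
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(0).cumsum(1)  # integral image
    total = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    return total / (k * k)

def mrcg(signal, fs=16000, n_ch=64):
    """Return an array of shape (4 * n_ch, n_frames)."""
    sub = gammatone_filterbank(signal, fs, n_ch)
    shift = int(0.010 * fs)
    cg1 = np.log10(frame_energy(sub, int(0.020 * fs), shift) + 1e-10)  # fine resolution
    cg2 = np.log10(frame_energy(sub, int(0.200 * fs), shift) + 1e-10)  # coarse resolution
    cg3 = box_mean(cg1, 5)                               # 11x11 smoothing of cg1
    cg4 = box_mean(cg1, 11)                              # 23x23 smoothing of cg1
    return np.concatenate([cg1, cg2, cg3, cg4], axis=0)
```

Calling `mrcg(x, fs=16000)` on a one-dimensional NumPy array `x` yields one 256-dimensional feature vector per 10 ms frame; a sequence of such vectors is the kind of input a recurrent classifier would consume. Delta features, which are often appended in practice, are omitted here to keep the sketch short.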
Bibliography: 24 items, color illustrations.