Quality assessment of synthetic speech

Brachmański, Stefan; Kin, Maurycy; Kozłowski, Piotr

doi:10.24425/ijet.2025.153612

Abstract

This paper presents the results of a subjective study assessing the quality of several selected speech synthesizers. The study covered logatom intelligibility and the evaluation of overall speech signal quality, using synthesizers that generate both male and female voices. An attempt was also made to apply objective quality assessment methods used to test the quality of transmission in telecommunication channels. These attempts showed, however, that the PESQ method cannot be used to assess the quality of synthetic speech, mainly because of the lack of temporal synchronization between the test signal and the reference signal.
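The three notions named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: all function names and sample data are hypothetical; logatom intelligibility is taken as the percentage of logatoms a listener records exactly as presented; overall quality is taken as a mean opinion score (MOS) on the five-point Absolute Category Rating scale of ITU-T P.800, with a normal-approximation 95% confidence interval; and temporal synchronization is reduced to the single lag that maximizes the cross-correlation between reference and test signals. PESQ (ITU-T P.862) uses its own, more elaborate delay estimation; the toy lag search only shows what "alignment" means.

```python
import math
import statistics


def logatom_intelligibility(presented, responses):
    """Percentage of logatoms the listener recorded exactly as presented.

    Assumption: whole-logatom exact string match; real test procedures
    may define scoring differently.
    """
    if len(presented) != len(responses) or not presented:
        raise ValueError("need equal-length, non-empty lists")
    correct = sum(p == r for p, r in zip(presented, responses))
    return 100.0 * correct / len(presented)


def mos_with_ci(ratings, z=1.96):
    """Mean of 1-5 ACR ratings and the half-width of an approximate 95% CI.

    Assumption: normal approximation (z = 1.96); with few listeners a
    Student-t quantile would be more appropriate.
    """
    if len(ratings) < 2 or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("need at least two ratings in the range 1..5")
    mean = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


def best_lag(reference, test, max_lag):
    """Lag (in samples) maximizing sum(reference[n] * test[n + lag]).

    A positive result means the test signal is delayed relative to the
    reference. Toy brute-force search, not PESQ's alignment algorithm.
    """
    def xcorr(lag):
        return sum(reference[n] * test[n + lag]
                   for n in range(len(reference))
                   if 0 <= n + lag < len(test))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


if __name__ == "__main__":
    # Hypothetical CVC logatoms: one of four is misheard.
    print(logatom_intelligibility(["bak", "sit", "mor", "pel"],
                                  ["bak", "sid", "mor", "pel"]))  # 75.0
    mean, hw = mos_with_ci([4, 5, 3, 4, 4])
    print(f"MOS = {mean:.2f} +/- {hw:.2f}")
    # The test signal is the reference delayed by two samples.
    print(best_lag([0, 1, 3, 1, 0, 0, 0, 0],
                   [0, 0, 0, 1, 3, 1, 0, 0], max_lag=3))  # 2
```

A single lag is enough to align a delayed copy of the reference, which is the situation intrusive methods such as PESQ are built for. A synthesizer, by contrast, produces a new signal whose timing is not derived from the reference recording, so presumably no delay estimate can bring the two into correspondence; this is one way to read the abstract's conclusion.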

Keywords

speech quality; speech synthesis; logatom intelligibility

Publisher

Polish Academy of Sciences, Committee of Electronics and Telecommunication

Journal

International Journal of Electronics and Telecommunications

Year

2025

Volume

Vol. 71, No. 3

Physical description

Bibliography: 33 items, tables, figures

Contributors

author

Brachmański Stefan

stefan.brachmanski@pwr.edu.pl

Wroclaw University of Science and Technology, Poland

author

Kin Maurycy

maurycy.kin@pwr.edu.pl

Wroclaw University of Science and Technology, Poland

author

Kozłowski Piotr

piotr.kozlowski@pwr.edu.pl

Wroclaw University of Science and Technology, Poland

References

[1] H. Dudley, “The carrier nature of speech,” Bell System Technical Journal, vol. 19, no. 4, pp. 495-515, 1940, https://doi.org/10.1121/1.1916020
[2] H. Dudley, “Fundamentals of speech synthesis,” Journal of the Audio Engineering Society, vol. 3, no. 4, pp. 170-185, 1955.
[3] J. Benesty, M. M. Sondhi, Y. Huang (Eds.), “Springer handbook of speech processing,” Berlin: Springer, 2008.
[4] R. E. Donovan, “Trainable speech synthesis,” Doctoral dissertation, University of Cambridge, 1996, https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=63011431015dd41881987e37f138c4060975442c (07.08.2024).
[5] M. Stone, C. H. Shadle, “A history of speech production research,” Acoustics Today, vol. 12, no. 4, pp. 48-55, 2016, https://www.isca-archive.org/hscr_2019/hoffmann19_hscr.pdf (27.07.2024).
[6] T. Dutoit, “High-quality text-to-speech synthesis: An overview,” Journal of Electrical and Electronics Engineering, Australia, vol. 17, no. 1, pp. 25-36, 1997.
[7] S. R. Mache, M. R. Baheti, C. N. Mahender, “Review on text-to-speech synthesizer,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 8, pp. 54-59, 2015, https://doi.org/10.17148/IJARCCE.2015.4812
[8] S. Furui, “Digital speech processing: synthesis, and recognition,” CRC Press, 2018, https://doi.org/10.1201/9781482270648
[9] F. Khanam, F. A. Munmun, N. A. Ritu, A. K. Saha, M. Firoz, “Text to speech synthesis: a systematic review, deep learning based architecture and future research direction,” Journal of Advances in Information Technology, vol. 13, no. 5, pp. 398-412, 2022, https://www.jait.us/uploadfile/2022/0831/20220831054604906.pdf (07.08.2024), https://doi.org/10.12720/jait.13.5.398-412
[10] X. Tan et al., “Natural Speech: End-to-End Text-to-Speech Synthesis With Human-Level Quality,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 6, pp. 4234-4245, 2024, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10409539 (07.08.2024), https://doi.org/10.1109/TPAMI.2024.3356232
[11] E. V. Raghavendra, P. Vijayaditya, K. Prahallad, “Speech synthesis using artificial neural networks,” in 2010 National Conference on Communications (NCC), Chennai, India, 2010, pp. 1-5, https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5430190 (07.08.2024), https://doi.org/10.1109/NCC.2010.5430190
[12] V. J. van Heuven, R. van Bezooijen, “Quality evaluation of synthesized speech,” in W. B. Kleijn, K. K. Paliwal (Eds.), Speech coding and synthesis, Amsterdam: Elsevier, 1995, pp. 707-708.
[13] N. Kitawaki, H. Nagabuchi, “Quality assessment of speech coding and speech synthesis systems,” IEEE Communications Magazine, pp. 36-44, October 1988.
[14] S. Brachmanski, “Selected problems of speech transmission quality assessment” (Wybrane zagadnienia oceny jakości transmisji sygnału mowy), Wrocław University of Science and Technology Edition, 2015 (in Polish).
[15] S. Brachmański, “Automation of subjective measurements of speech intelligibility in analogue telecommunication channels,” Archives of Acoustics, vol. 33, no. 3, pp. 341-350, 2008, https://acoustics.ippt.pan.pl/index.php/aa/article/viewFile/536/467 (26.07.2024).
[16] M. Daniluk, A. P. Pietrzak, “Comparative analysis of natural and synthesized Polish speech,” International Journal of Electronics and Telecommunications, vol. 70, no. 2, pp. 361-366, 2024, https://doi.org/10.24425/ijet.2024.149553
[17] E. Cooper, W. C. Huang, Y. Tsao, H. M. Wang, T. Toda, J. Yamagishi, “A review on subjective and objective evaluation of synthetic speech,” Acoustical Science and Technology, vol. 45, no. 4, pp. 12-24, 2024, https://doi.org/10.1250/ast.e24.12
[18] R. J. Beaton, J. G. Beerends, M. Keyhl, W. C. Treurniet, “Objective perceptual measurement of audio quality,” in Audio Engineering Society Conference: Collected Papers on Digital Audio Bit-Rate Reduction, Audio Engineering Society, 1996.
[19] F. Holly, I. Scott, O. Eunmi, “Objective Measures of Voice Quality for Mobile Handsets,” 140th AES Convention, Audio Engineering Society, Paper 9532, 2016.
[20] M. Torcoli, T. Kastner, J. Herre, “Objective measures of perceptual audio quality reviewed: An evaluation of their application domain dependence,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1530-1541, 2021, https://doi.org/10.1109/TASLP.2021.3069302
[21] ITU-T Recommendation P.800, “Methods for subjective determination of transmission quality,” Geneva, Switzerland, 1996.
[22] ITU-T Recommendation P.862, “Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow band telephone networks and speech codecs,” Geneva, Switzerland, 2001.
[23] ITU-T Recommendation P.863, “Perceptual objective listening quality assessment,” Geneva, Switzerland, 2018.
[24] ITU-R Recommendation BS.1534-1, “Method for the Subjective Assessment of Intermediate Sound Quality (MUSHRA),” Geneva, Switzerland, 2001.
[25] S. Brachmanski, “Test material used to assess speech quality in Poland,” in F. Witos (Ed.), Acoustics, acoustoelectronics and electrical engineering, Gliwice, 2021, pp. 65-79.
[26] S. Brachmanski, M. Kin, P. Zemankiewicz, “Subjective Assessment of the Speech Signal Quality Broadcasted by Local Digital Radio in Selected Locations in Wroclaw under Studio and Home Conditions,” International Journal of Electronics and Telecommunications, vol. 68, no. 4, pp. 687-693, 2022, https://doi.org/10.24425/ijet.2022.141290
[27] S. Brachmanski, M. Kin, N. Rurzyńska, “Objective Assessment of the Speech Quality Broadcasted by Local Digital Radio in Selected Locations in Wroclaw,” International Journal of Electronics and Telecommunications, vol. 70, no. 3, pp. 603-608, 2024, https://doi.org/10.24425/ijet.2024.149585
[28] F. Holly, I. Scott, O. Eunmi, “Objective Measures of Voice Quality for Mobile Handsets,” 140th AES Convention, Audio Engineering Society, Paper 9532, 2016.
[29] M. Kin, S. Brachmański, “Quality assessment of musical and speech signals broadcasted via Single Frequency Network DAB+,” International Journal of Electronics and Telecommunications, vol. 66, no. 1, pp. 139-144, 2020, https://doi.org/10.24425/ijet.2020.131855
[30] W. Myślecki, W. Majewski, “Relations between subjective and objective measures of speech transmission quality evaluation,” in Proceedings of the 6th FASE Symposium, Sopron, Budapest, 2-6 September 1986, pp. 137-141.
[31] T. Thiede, W. C. Treurniet, R. Bitto, C. Schmidmer, T. Sporer, J. G. Beerends, C. Colomes, “PEAQ - The ITU standard for objective measurement of perceived audio quality,” Journal of the Audio Engineering Society, vol. 48, no. 1/2, pp. 3-29, 2000.
[32] P. Wagner, J. Beskow, S. Betz, J. Edlund, J. Gustafson, G. E. Henter, S. LeMaguer, Z. Malisz, E. Szekely, Ch. Tannander, J. Voße, “Speech synthesis evaluation - State-of-the-art assessment and suggestion for novel research program,” in Proceedings of the 10th ISCA Speech Synthesis Workshop, Vienna, September 2019, pp. 105-110, https://doi.org/10.21437/SSW.2019-19
[33] R. E. Zezario, S.-W. Fu, F. Chen, C.-S. Fuh, H.-M. Wang, Y. Tsao, “Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 54-70, 2022, https://doi.org/10.1109/TASLP.2022.3205757

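Notes

Record prepared with funding from the Polish Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the programme "Social Responsibility of Science II" ("Społeczna odpowiedzialność nauki II"), module: Popularisation of Science (2025).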

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-c21b7f2c-b6bd-4494-a03c-b2879d4f7bb6