BIGOS - Benchmark Intended Grouping of Open Speech Corpora for Polish Automatic Speech Recognition

Junczyk, Michał

doi:10.15439/2023F1609

Artykuł - szczegóły

Tytuł artykułu

BIGOS - Benchmark Intended Grouping of Open Speech Corpora for Polish Automatic Speech Recognition

Autorzy

Junczyk Michał

Wybrane pełne teksty z tego czasopisma

http://annals-csis.org

Identyfikatory

DOI

10.15439/2023F1609

Warianty tytułu

Języki publikacji

Abstrakty

This paper presents a Benchmark Intended Grouping of Open Speech (BIGOS), a new corpus designed for Polish Automatic Speech Recognition (ASR) systems. This initial version of the benchmark leverages 1,900 audio recordings from 71 distinct speakers, sourced from 10 publicly available speech corpora. Three proprietary ASR systems and five open-source ASR systems were evaluated on a diverse set of recordings and the corresponding original transcriptions. Interestingly, it was found that the performance of the latest open-source models is on par with that of more established commercial services. Furthermore, a significant influence of the model size on system accuracy was observed, as well as a decrease in scenarios involving highly specialized or spontaneous speech. The challenges of using public datasets for ASR evaluation purposes and the limitations based on this inaugural benchmark are critically discussed, along with recommendations for future research. BIGOS corpus and associated tools that facilitate replication and customization of the benchmark are made publicly available.

Słowa kluczowe

computer science benchmark testing audio automatic speech recognition

informatyka testy porównawcze audio automatyczne rozpoznawanie mowy

Wydawca

Polskie Towarzystwo Informatyczne

Czasopismo

Annals of Computer Science and Information Systems

Rocznik

2023

Tom

Vol. 35

Strony

585--590

Opis fizyczny

Bibliogr. 33 poz., tab.

Twórcy

autor

Junczyk Michał

michal.junczyk@amu.edu.pl

Adam Mickiewicz University

Bibliografia

1. Alëna Aksënova et al. “How Might We Create Better Benchmarks for Speech Recognition?” In: Association for Computational Linguistics, 2021, pp. 22–34. http://dx.doi.org/10.18653/v1/2021.bppf-1.4.
2. Piotr Szymański et al. “WER we are and WER we think we are”. In: Association for Computational Linguistics, 2020, pp. 3290–3295. http://dx.doi.org/10.18653/v1/2020.findings-emnlp.295.
3. Johannes Wirth and Rene Peinl. “ASR in German: A Detailed Error Analysis”. In: (2022). http://dx.doi.org/10.48550/arXiv.2204.05617.
4. Miguel Del Rio et al. “Earnings-21: A Practical Benchmark for ASR in the Wild”. In: (2021).
5. Miguel Del Rio et al. “Earnings-22: A Practical Benchmark for Accents in the Wild”. In: (Mar. 2022). http://dx.doi.org/10.48550/arXiv.2203.15591.
6. Sanchit Gandhi, Patrick von Platen, and Alexander M. Rush. “ESC: A Benchmark For Multi-Domain End-to-End Speech Recognition”. In: (Oct. 2022). http://dx.doi.org/10.48550/arXiv.2210.13352.
7. Malgorzata Anna Ulasik et al. “CEASR: A corpus for evaluating automatic speech recognition”. In: 2020, pp. 6477–6485.
8. Péter Mihajlik et al. “BEA-Base: A Benchmark for ASR of Spontaneous Hungarian”. In: 2022 Language Resources and Evaluation Conference, LREC 2022 (Feb. 2022), pp. 1970–1977. DOI : 10.48550/arXiv.2202.00601.
9. Vassil Panayotov et al. LIBRISPEECH: AN ASR CORPUS BASED ON PUBLIC DOMAIN AUDIO BOOKS.
10. Vineel Pratap et al. “MLS: A Large-Scale Multilingual Dataset for Speech Research”. In: Proc. Interspeech 2020. 2020, pp. 2757–2761. http://dx.doi.org/10.21437/Interspeech.2020-2826.
11. François Hernandez et al. “TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation”. In: (2018). http://dx.doi.org/10.1007/978-3-319-99579-3_21.
12. Heidi Christensen et al. “The CHiME corpus: a resource and a challenge for computational hearing in multi-source environments”. In: ISCA, 2010, pp. 1918–1921. DOI : 10.21437/Interspeech.2010-552.
13. Rosana Ardila et al. “Common Voice: A Massively-Multilingual Speech Corpus”. In: (2020). DOI : 10.48550/arXiv.1912.06670.
14. Christian Gaida et al. “Comparing Open-Source Speech Recognition Toolkits”. In: 2014.
15. Meredith Moore et al. “Say What? A Dataset for Exploring the Error Patterns That Two ASR Engines Make”. In: 2019, pp. 2528–2532. http://dx.doi.org/10.21437/Interspeech.2019-3096.
16. Ingo Siegert et al. Recognition Performance of Selected Speech Recognition APIs – A Longitudinal Study. 2020. DOI : 10.1007/978-3-030-60276-5_50.
17. Binbin Xu et al. “A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect”. In: (2021).
18. Vered Silber Varod et al. “A cross-language study of speech recognition systems for English, German, and Hebrew”. In: Online Journal of Applied Knowledge Management (2021), pp. 1–15. DOI : 10.36965/OJAKM. 2021.9(1)1-15.
19. Morgane Riviere, Jade Copet, and Gabriel Synnaeve. “ASR4REAL: An extended benchmark for speech models”. In: (2021).
20. Martha Maria Papadopoulou, Anna Zaretskaya, and Ruslan Mitkov. “Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis”. In: INCOMA Ltd., 2021, pp. 199–207.
21. Alëna Aksënova et al. “Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data”. In: (2022). http://dx.doi.org/10.48550/arXiv.2205.08014.
22. Regis Pires Magalhães et al. “Evaluation of Automatic Speech Recognition Approaches”. In: Journal of Information and Data Management 13 (3 Sept. 2022). http://dx.doi.org/10.5753/jidm.2022.2514.
23. Marcin Pacholczyk. Przegląd I porównanie rozwiazań rozpoznawania mowy pod kątem rozpoznawania zbioru komend głosowych. 2018.
24. Danijel Koržinek. “Task 5: Automatic speech recognition PolEval 2019 competition”. In: (2019). URL: http://2019.poleval.pl/files/2019/11.pdf.
25. Nahuel Unai et al. “Development and evaluation of a Polish ASR system using the TLK toolkit”. 2019.
26. Danijel Koržinek, Krzysztof Marasek, and Łukasz Brocki. Polish Read Speech Corpus for Speech Tools and Services. 2016.
27. Piotr Pęzik. “Spokes – a search and exploration service for conversational corpus data”. In: 2015.
28. Piotr Pęzik. “Increasing the Accessibility of Time-Aligned Speech Corpora with Spokes Mix”. In: European Language Resources Association (ELRA), 2018.
29. Krzysztof Marasek, Danijel Korzinek, and Łukasz Brocki. “System for Automatic Transcription of Sessions of the Polish Senate”. In: (2014).
30. Piotr Pęzik et al. DiaBiz - an Annotated Corpus of Polish Call Center Dialogs, pp. 20–25.
31. Piotr Pęzik and Michał Adamczyk. Automatic Speech Recognition for Polish in 2022. University of Łódź, 2022. URL: https://clarin-pl.eu/dspace/bitstream/handle/11321/894/ASR_PL_report_2022.pdf.
32. Alec Radford et al. “Robust Speech Recognition via Large-Scale Weak Supervision”. In: (2022). http://dx.doi.org/10.48550/arXiv.2212.04356.
33. Piotr Kozierski et al. “Acoustic Model Training, using Kaldi, for Automatic Whispery Speech Recognition”. In: 2018. DOI : 10.15439/2018F255.

Uwagi

1. Thematic Tracks Regular Papers

2. Opracowanie rekordu ze środków MEiN, umowa nr SONP/SP/546092/2022 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2024).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-6bf96921-456d-4e9d-8a52-c9a11a1e552b