Speech sound detection employing deep learning

Polak, Cezary; Mańkowski, Jakub; Uciński, Wiktor; Schramka, Patryk; Mysiakowski, Mikołaj; Kurowski, Adam

doi:10.15439/2021F146

Article - details

Article title

Speech sound detection employing deep learning

Authors

Polak Cezary, Mańkowski Jakub, Uciński Wiktor, Schramka Patryk, Mysiakowski Mikołaj, Kurowski Adam

Selected full texts from this journal

http://annals-csis.org

Identifiers

DOI

10.15439/2021F146

Conference

Federated Conference on Computer Science and Information Systems (16th; 2-5 September 2021; online)

Abstracts

Speech is the primary means of communication between people, both in everyday conversation and as a speech signal that is transmitted and recorded in numerous ways. The latter is especially important during the global SARS-CoV-2 pandemic, when it is often not possible to meet and talk with people in person. Streaming, VoIP calls, and live podcasts are just some of the many applications that have seen a significant increase in usage due to the necessity of social distancing. In our paper, we present a method for designing, developing, and testing a deep learning-based algorithm that performs voice activity detection better than benchmark solutions such as the WebRTC VAD algorithm, an industry standard based mainly on a classic approach to speech signal processing.

Keywords

communication; speech signal; global pandemic; streaming; VoIP calls; podcast; deep learning; voice activity detection; WebRTC VAD algorithm

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2021

Volume

Vol. 26

Pages

221-222

Physical description

Bibliography: 6 items, tables.

Contributors

author

Polak Cezary

s165516@student.pg.edu.pl

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland

author

Mańkowski Jakub

s172466@student.pg.edu.pl

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland

author

Uciński Wiktor

s160299@student.pg.edu.pl

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland

author

Schramka Patryk

s168827@student.pg.edu.pl

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland

author

Mysiakowski Mikołaj

s165771@student.pg.edu.pl

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland

author

Kurowski Adam

adakurow@pg.edu.pl

Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics, Multimedia Systems Department, Gabriela Narutowicza 11/12, 80-233 Gdańsk, Poland

Bibliography

1. H. Haneche, B. Boudraa, and A. Ouahabi, “A new way to enhance speech signal based on compressed sensing,” Measurement, vol. 151, p. 107117, 2020. https://doi.org/10.1016/j.measurement.2019.107117. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0263224119309832
2. K. Paciorek. Andrzej Duda o: LGBT, TVP, koronawirusie, głosach po Bosaku i o szansach w starciu z Trzaskowskim (in Polish). YouTube (Imponderabilia channel). [Online]. Available: https://www.youtube.com/watch?v=Izxj72bg4A4
3. Freesound. Party Sounds recording from the online royalty free recordings archive. [Online]. Available: https://freesound.org/people/FreqMan/sounds/23153/
4. B. McFee, C. Raffel, D. Liang, D. P. Ellis, M. McVicar, E. Battenberg, and O. Nieto, “librosa: Audio and music signal analysis in Python,” in Proceedings of the 14th Python in Science Conference, vol. 8, 2015.
5. GitHub. Python interface to the WebRTC voice activity detector. [Online]. Available: https://github.com/wiseman/py-webrtcvad
6. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015. [Online]. Available: https://www.tensorflow.org/

Notes

Track 5: Young Researchers Workshop on Artificial Intelligence and Cybersecurity

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-ea1c30ff-71aa-4ff8-8ed2-eebc5d69a519