Results found: 3

Search results
Searched in keywords: voice activity detection
1
EN
Speech emotion recognition (SER) is a complicated and challenging task in human-computer interaction because it is difficult to find a feature set that fully discriminates emotional states. The FFT is commonly used to process the raw signal when extracting low-level descriptor features such as short-time energy, fundamental frequency, formants, and mel-frequency cepstral coefficients (MFCCs). However, these features are built in the frequency domain and ignore information from the temporal domain. In this paper, we propose a novel framework that combines a multi-layer wavelet sequence set, obtained from wavelet packet reconstruction (WPR), with a conventional feature set to form a mixed feature set for emotion recognition using recurrent neural networks (RNNs) with an attention mechanism. In addition, because silent frames adversely affect SER, we adopt voice activity detection based on the autocorrelation function to eliminate emotionally irrelevant frames. We show that the proposed algorithm significantly outperforms traditional feature sets in predicting spontaneous emotional states on the IEMOCAP corpus and the EMODB database, achieving better classification in both speaker-independent and speaker-dependent experiments. Notably, we obtain accuracies of 62.52% and 77.57% in the speaker-independent (SI) setting, and 66.90% and 82.26% in the speaker-dependent (SD) setting.
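The autocorrelation-based silence removal mentioned in the abstract can be illustrated with a minimal sketch: a frame is kept as speech when its normalized autocorrelation at a pitch-range lag stays high (periodic/voiced content), and discarded otherwise. The paper does not give its frame size, lag, or threshold, so the values below are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def autocorr_vad(x, frame_len=400, hop=160, lag=40, threshold=0.5):
    """Flag a frame as speech when its normalized autocorrelation at a
    short lag is high -- a simplified autocorrelation-function VAD."""
    flags = []
    for f in frame_signal(x, frame_len, hop):
        f = f - f.mean()
        energy = np.dot(f, f)
        if energy < 1e-8:                     # essentially silent frame
            flags.append(False)
            continue
        # lag=40 is 5 ms at 8 kHz (200 Hz), inside the typical pitch range
        r = np.dot(f[:-lag], f[lag:]) / energy
        flags.append(r > threshold)
    return np.array(flags)

# Toy check: one second of a periodic "voiced" tone, then low-level noise.
fs = 8000
t = np.arange(fs) / fs
voiced = np.sin(2 * np.pi * 200 * t)          # periodic -> high autocorrelation
noise = 0.01 * np.random.default_rng(0).standard_normal(fs)
flags = autocorr_vad(np.concatenate([voiced, noise]))
```

With these toy inputs the voiced frames at the start are flagged as speech and the trailing noise frames are not, so the noise frames would be dropped before feature extraction.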
2
Speech sound detection employing deep learning
EN
The primary means of communication between people is speech, both as everyday conversation and as a speech signal transmitted and recorded in numerous ways. The latter is especially important during the global SARS-CoV-2 pandemic, when it is often impossible to meet and talk with people in person. Streaming, VoIP calls, and live podcasts are just some of the many applications whose usage has increased significantly due to the necessity of social distancing. In this paper, we design, develop, and test a deep learning-based algorithm that performs voice activity detection better than benchmark solutions such as the WebRTC VAD algorithm, an industry standard based mainly on a classic approach to speech signal processing.
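As a toy illustration of learning-based VAD (not the authors' network, whose architecture and training data are not given in the abstract), the sketch below trains a logistic classifier on two classic frame features, log energy and zero-crossing rate, over synthetic "speech" and "silence" frames. All data and hyperparameters here are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

def features(frame):
    """Two per-frame features: log energy and zero-crossing rate."""
    log_e = np.log(np.dot(frame, frame) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return np.array([log_e, zcr])

def make_frame(speech, n=320):
    """Synthetic frame: a noisy tone for 'speech', pure noise otherwise."""
    noise = 0.05 * rng.standard_normal(n)
    if speech:
        return np.sin(2 * np.pi * 0.03 * np.arange(n)) + noise
    return noise

# Build a labeled frame set and standardize the features.
X = np.stack([features(make_frame(i % 2 == 0)) for i in range(400)])
y = np.array([1.0 if i % 2 == 0 else 0.0 for i in range(400)])
X = (X - X.mean(0)) / X.std(0)

# Logistic regression trained by plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 1.0 * (X.T @ (p - y)) / len(y)
    b -= 1.0 * np.mean(p - y)

# Accuracy on the same frames (a sketch, not a proper evaluation).
pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
accuracy = np.mean(pred == (y > 0.5))
```

A deep model replaces the hand-picked two-feature classifier with learned representations over spectrogram patches, which is what lets it beat classic-DSP baselines such as WebRTC VAD in noisy conditions.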
3
EN
In this paper a new approach is proposed to the so-called voice activity detection (VAD) and word endpoint detection (EPD), both under the assumption that the analyzed speech signal is recorded in the presence of noise. The described VAD and EPD methods contain a dedicated wavelet subband denoising stage. We demonstrate the effectiveness of the algorithm with this processing stage for the automatic recognition of isolated words by means of experimental results.
PL
This paper proposes a new approach to voice activity detection (VAD) and word endpoint detection (EPD) for speech signals recorded in the presence of noise. The described methods employ a dedicated subband denoising stage based on the wavelet transform. Experimental results demonstrate the effectiveness of algorithms that include this processing stage for the automatic recognition of isolated words.
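The wavelet subband denoising stage can be sketched with a one-level Haar transform: soft-threshold the detail subband, where broadband noise concentrates, and reconstruct. The paper does not specify its wavelet family, decomposition depth, or threshold rule, so the Haar basis and the fixed threshold below are illustrative choices only.

```python
import numpy as np

def haar_decompose(x):
    """One-level orthonormal Haar analysis: approximation and detail subbands."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def denoise(x, threshold):
    """Soft-threshold the detail subband, then reconstruct."""
    a, d = haar_decompose(x)
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
    return haar_reconstruct(a, d)

# Toy check: a slowly varying signal plus broadband noise.
rng = np.random.default_rng(2)
clean = np.linspace(0.0, 1.0, 512)
noisy = clean + 0.1 * rng.standard_normal(512)
out = denoise(noisy, threshold=0.15)
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((out - clean) ** 2)
```

In a VAD/EPD front end this denoised signal, rather than the raw noisy one, is what the energy- or correlation-based endpoint logic then operates on.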