Search results - BazTech

1

Short Utterance Speaker Recognition Based on Speech High Frequency Information Compensation and Dynamic Feature Enhancement Methods

Zi Yunfei, Xiong Shengwu

Archives of Acoustics

|

2024

|

Vol. 49, nr 1

37--48

EN

This work aims to compensate for feature sparsity and insufficiently discriminative acoustic features in existing short-duration speaker recognition. To address this issue, we propose the Bark-scaled Gauss and linear filter bank superposition cepstral coefficients (BGLCC) and the multidimensional central difference (MDCD) acoustic feature extraction method. The Bark-scaled Gauss filter bank focuses on low-frequency information, while the linear filters are uniformly distributed; superposing the two filter banks therefore yields richer and more discriminative acoustic features from short-duration audio signals. In addition, the multidimensional central difference method better captures speakers' dynamic features, improving the performance of short-utterance speaker verification. Extensive experiments are conducted on short-duration text-independent speaker verification datasets generated from the VoxCeleb, SITW, and NIST SRE corpora, which contain speech samples of diverse lengths and different scenarios. The results demonstrate that the proposed method outperforms the existing acoustic feature extraction approach by at least 10% on the test set. The ablation experiments further illustrate that the proposed approaches achieve substantial improvement over prior methods.
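As a rough illustration of what "central difference" dynamic features mean in the abstract above, the following minimal sketch computes a plain first-order central difference along the time axis of a feature sequence. It is not the authors' MDCD method, whose details are not given here; the function name, the edge handling (repeating the first and last frame) and the toy numbers are assumptions made for the example.

```python
def central_difference(frames):
    """First-order central difference of a feature sequence along time.

    frames: list of equal-length feature vectors, one per time frame.
    Edge frames are handled by repeating the first and last frame
    (an assumption of this sketch).
    """
    n = len(frames)
    deltas = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        # (x[t+1] - x[t-1]) / 2 for every coefficient in the frame
        deltas.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return deltas


# Toy cepstral sequence: 4 frames, 2 coefficients each (made-up numbers).
static = [[1.0, 0.0], [2.0, 0.5], [4.0, 1.0], [7.0, 1.5]]
dynamic = central_difference(static)
```

In a typical front end such dynamic vectors are appended to the static coefficients frame by frame before modelling.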

2

Impact of the Passage of Time on the Correct Identification of the Speaker Using the Auditory Method

Brachmanski Stefan, Hus Bartosz, Staroniewicz Piotr

Archives of Acoustics

|

2024

|

Vol. 49, nr 1

141--147

EN

Courts in Poland, as well as in most countries in the world, allow for the identification of a person on the basis of his/her voice using the so-called voice presentation method, i.e., the auditory method. This method is used in situations where there is no sound recording and the perpetrator of the criminal act was masked and the victim heard only his or her voice. However, psychologists, forensic acousticians, as well as researchers in the field of auditory perception and forensic science more broadly describe many cases in which such testimony resulted in misjudgement. This paper presents the results of an experiment designed to investigate, in a Polish language setting, the extent to which the passage of time impairs the correct identification of a person. The study showed that 31 days after the speaker’s voice was first heard, the correct identification for a female voice was 30% and for a male voice 40%.

3

Effect of sleepiness in the voice on speaker recognition performance

Staroniewicz Piotr

Vibrations in Physical Systems

|

2021

|

Vol. 32, nr 2

art. no. 2021214

EN

The issue of the influence of speaker state on voice recognition has been analysed mainly in relation to forensics and biometric security systems. Sleepiness in the voice is a rather under-researched problem, and the few works in this area focus almost exclusively on the recognition of sleepiness rather than on its influence on the speaker's voice characteristics. This paper discusses the influence of the speaker's state on voice recognition and describes the acquisition method of the acoustic database of voice drowsiness recordings used in the tests. It also discusses the subjective sleepiness scales used in the study and presents the results of the influence of sleepiness on the effectiveness of automatic speaker recognition based on a classical system using the Mel-Frequency Cepstral Coefficients parameterisation and the Gaussian Mixture Models classification.

4

Voice authentication based on the Russian-language dataset, MFCC method and the anomaly detection algorithm

Sidorova Anna, Kogos Konstantin

Annals of Computer Science and Information Systems

|

2020

|

Vol. 21

537--540

EN

Almost all of people's data is stored on their personal devices. For this reason, there is a need to protect information from unauthorized access by means of user authentication. PIN codes, passwords and tokens can be forgotten, lost, transferred or brute-forced, which is why biometric authentication is gaining in popularity. Biometric data remain unchanged for a long time, differ between users, and can be measured. This paper explores voice authentication because of its ease of use: obtaining users' voice characteristics requires no equipment other than a microphone, which is built into almost all devices. A method of voice authentication based on an anomaly detection algorithm is proposed. A software module for text-independent authentication has been developed in Python. It is based on Mozilla's new open-source voice dataset, Common Voice. Experimental results confirmed the high accuracy of authentication by the proposed method.

5

Agentowa struktura wielomodalnego interfejsu do Narodowej Platformy Cyberbezpieczeństwa, część 2

Kasprzak Włodzimierz, Szynkiewicz Wojciech, Stefańczyk Maciej, Dudek Wojciech, Figat Maksym, Węgierek Maciej, Seredyński Dawid, Zieliński Cezary

Pomiary Automatyka Robotyka

|

2019

|

R. 23, nr 4

5--18

PL

Ten dwuczęściowy artykuł przedstawia interfejs do Narodowej Platformy Cyberbezpieczeństwa (NPC). Wykorzystuje on gesty i komendy wydawane głosem do sterowania pracą platformy. Ta część artykułu przedstawia strukturę interfejsu oraz sposób jego działania, ponadto prezentuje zagadnienia związane z jego implementacją. Do specyfikacji interfejsu wykorzystano podejście oparte na agentach upostaciowionych, wykazując że podejście to może być stosowane do tworzenia nie tylko systemów robotycznych, do czego było wykorzystywane wielokrotnie uprzednio. Aby dostosować to podejście do agentów, które działają na pograniczu środowiska fizycznego i cyberprzestrzeni, należało ekran monitora potraktować jako część środowiska, natomiast okienka i kursory potraktować jako elementy agentów. W konsekwencji uzyskano bardzo przejrzystą strukturę projektowanego systemu. Część druga tego artykułu przedstawia algorytmy wykorzystane do rozpoznawania mowy i mówców oraz gestów, a także rezultaty testów tych algorytmów.

EN

This two-part paper presents an interface to the National Cybersecurity Platform that uses gestures and voice commands as the means of interaction between the operator and the platform. Cyberspace and its underlying infrastructure are vulnerable to a broad range of risks stemming from diverse cyber-threats. The main role of this interface is to support security analysts and operators controlling the visualisation of cyberspace events such as incidents or cyber-attacks, especially when manipulating graphical information. The main visualisation control modalities are gesture- and voice-based commands; thus, the design of the gesture recognition and speech recognition modules is provided. The speech module is also responsible for speaker identification in order to limit access to trusted users registered with the visualisation control system. This part of the paper focuses on the structure and the activities of the interface, while the second part concentrates on the algorithms employed for the recognition of gestures, voice commands and speakers.

6

Agentowa struktura wielomodalnego interfejsu do Narodowej Platformy Cyberbezpieczeństwa, część 1

Kasprzak Włodzimierz, Szynkiewicz Wojciech, Stefańczyk Maciej, Dudek Wojciech, Figat Maksym, Węgierek Maciej, Seredyński Dawid, Zieliński Cezary

Pomiary Automatyka Robotyka

|

2019

|

R. 23, nr 3

41--54

PL

Ten dwuczęściowy artykuł przedstawia interfejs do Narodowej Platformy Cyberbezpieczeństwa (NPC). Wykorzystuje on gesty i komendy wydawane głosem do sterowania pracą platformy. Ta część artykułu przedstawia strukturę interfejsu oraz sposób jego działania, ponadto prezentuje zagadnienia związane z jego implementacją. Do specyfikacji interfejsu wykorzystano podejście oparte na agentach upostaciowionych, wykazując że podejście to może być stosowane do tworzenia nie tylko systemów robotycznych, do czego było wykorzystywane wielokrotnie uprzednio. Aby dostosować to podejście do agentów, które działają na pograniczu środowiska fizycznego i cyberprzestrzeni, należało ekran monitora potraktować jako część środowiska, natomiast okienka i kursory potraktować jako elementy agentów. W konsekwencji uzyskano bardzo przejrzystą strukturę projektowanego systemu. Część druga tego artykułu przedstawia algorytmy wykorzystane do rozpoznawania mowy i mówców oraz gestów, a także rezultaty testów tych algorytmów.

EN

This two-part paper presents an interface to the National Cybersecurity Platform that uses gestures and voice commands as the means of interaction between the operator and the platform. Cyberspace and its underlying infrastructure are vulnerable to a broad range of risks stemming from diverse cyber-threats. The main role of this interface is to support security analysts and operators controlling the visualisation of cyberspace events such as incidents or cyber-attacks, especially when manipulating graphical information. The main visualisation control modalities are gesture- and voice-based commands; thus, the design of the gesture recognition and speech recognition modules is provided. The speech module is also responsible for speaker identification in order to limit access to trusted users registered with the visualisation control system. This part of the paper focuses on the structure and the activities of the interface, while the second part concentrates on the algorithms employed for the recognition of gestures, voice commands and speakers.

7

Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks

Kaur G., Srivastava M., Kumar A.

Journal of Telecommunications and Information Technology

|

2018

|

nr 2

23--31

EN

Huge growth is observed in the speech and speaker recognition field due to many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair helps control the chair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed. Then, feature optimization is performed using GA. In the second phase training is conducted using DNN. Evaluation and validation of the proposed work model is done by setting a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficient (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.

8

An embedded system for real-time speaker recognition using Raspberry Pi platform

Weychan R., Marciniak T., Dąbrowski A.

Elektronika : konstrukcje, technologie, zastosowania

|

2016

|

Vol. 57, nr 4

3--6

EN

The paper presents an embedded system that performs real-time speaker recognition on internet radio broadcasts. The proposed solution was developed in the open-source Python programming language. It was first tested in the Windows environment and then adapted to the Unix operating system in order to use it on the Raspberry Pi 2 platform. We analyzed the available libraries to select the most convenient solutions for the individual blocks of the speaker recognition task. We also indicate the parameters for which the algorithm exhibits the greatest efficiency. The prepared software is available in a GitHub repository.

PL

Artykuł prezentuje system realizujący rozpoznawanie mówcy z radia internetowego. Zaproponowane rozwiązanie wykorzystuje narzędzia udostępnione w ramach ogólnie dostępnego oprogramowania dla języka Python. Prezentowane oprogramowanie zostało przetestowane w środowisku Windows a następnie zostało zaadaptowane do uruchomienia na platformie Raspberry Pi 2, zarządzanej przez system Linux. W artykule przeanalizowano dostępne biblioteki, które posłużyły do implementacji algorytmów ekstrakcji cech oraz modelowania sygnału mowy. Przeprowadzone eksperymenty pozwoliły na dobranie parametrów systemu, przy których uzyskuje się najlepszą skuteczność identyfikacji i jednocześnie największą szybkość przetwarzania danych. Przygotowane oprogramowanie jest dostępne w repozytorium Github.

9

Identyfikacja głosowa w otwartym zbiorze mówców

Kamiński K., Dobrowolski A. P., Majda-Zdancewicz E.

Przegląd Elektrotechniczny

|

2015

|

R. 91, nr 10

206--210

PL

W artykule zaprezentowano wyniki badań systemu automatycznego rozpoznawania mówcy, przeprowadzane z wykorzystaniem komercyjnej bazy głosów TIMIT. Głównym celem badań było rozszerzenie funkcjonalności systemu rozpoznawania mówcy poprzez dodanie układu progowego, a tym samym umożliwienie identyfikacji w otwartym zbiorze mówców. Przedstawiono różne warianty zastosowanego układu progowego oraz dokonano próby wzbogacenia wektora cech dystynktywnych o różnicę częstotliwości podstawowej wyznaczanej dwiema różnymi metodami.

EN

The article presents the results of tests of an automatic speaker recognition system conducted using the commercial TIMIT voice database. The main purpose of the tests was to extend the functionality of the speaker recognition system by adding a thresholding stage, thereby enabling identification in an open set of speakers. Different variants of the thresholding stage are presented, and an attempt is made to enrich the vector of distinctive features with the difference between fundamental frequency values determined by two different methods.
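The idea of adding a threshold to enable open-set identification can be sketched as follows. This is a minimal illustration and not the system described in the article: the function name, the speaker labels, the score values and the threshold are all hypothetical, and the scores are assumed to be per-speaker log-likelihoods where higher means a better match.

```python
def identify_open_set(scores, threshold):
    """Return the best-scoring enrolled speaker, or None when even the
    best score stays below the acceptance threshold (unknown speaker)."""
    if not scores:
        return None
    best_speaker = max(scores, key=scores.get)
    return best_speaker if scores[best_speaker] >= threshold else None


# Hypothetical log-likelihood scores against three enrolled speaker models.
known_voice = {"spk_a": -41.2, "spk_b": -38.7, "spk_c": -45.0}
unknown_voice = {"spk_a": -60.3, "spk_b": -58.9, "spk_c": -61.4}

accepted = identify_open_set(known_voice, threshold=-50.0)
rejected = identify_open_set(unknown_voice, threshold=-50.0)
```

Without the threshold, a closed-set system would always assign the unknown voice to its nearest enrolled model.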

10

Real time recognition of speakers from internet audio stream

Weychan R., Marciniak T., Stankiewicz A., Dąbrowski A.

Foundations of Computing and Decision Sciences

|

2015

|

Vol. 40, No. 3

223--233

EN

In this paper we present an automatic speaker recognition technique that uses lossy (encoded) speech signal streams from Internet radio. We show the influence of the audio encoder (e.g., bitrate) on the speaker model quality. The model of each speaker was calculated using the Gaussian mixture model (GMM) approach. Both the speaker recognition and the further analysis were carried out on short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed using the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from the Polish radio Internet services. The presented software was developed in the MATLAB environment.

11

Zastosowanie rozpoznawania mówcy w automatycznej translacji mowy typu speech-to-speech

Kłosowski P., Dustor A., Izydorczyk J.

Studia Informatica

|

2014

|

Vol. 35, nr 3

71--81

PL

Przedstawiony artykuł dotyczy zagadnień związanych z funkcjonowaniem systemów automatycznej translacji mowy ciągłej. W systemach tych wykorzystuje się techniki przetwarzania języka naturalnego realizowane z wykorzystaniem algorytmów automatycznego rozpoznawania mowy, automatycznej translacji tekstów oraz zamiany tekstu na mowę za pomocą syntezy mowy. W artykule zaproponowano także metodę usprawnienia procesu automatycznej translacji mowy przez zastosowanie algorytmów automatycznej identyfikacji mówcy, pozwalających na automatyczną segmentację mowy pochodzącej od różnych mówców.

EN

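This paper concerns the operation of systems for the machine translation of continuous speech. These systems use natural language processing techniques implemented with algorithms for automatic speech recognition, automatic text translation and text-to-speech conversion using speech synthesis. The paper also proposes a method of improving automatic speech translation by applying automatic speaker identification algorithms, which allow automatic segmentation of speech coming from different speakers.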

12

System kontroli dostępu oparty na biometrycznej weryfikacji głosu

Gałka J., Mąsior M., Salasa S.

Przegląd Elektrotechniczny

|

2014

|

R. 90, nr 11

248--255

PL

Artykuł przedstawia koncepcję głosowego, biometrycznego systemu dostępowego zrealizowanego jako system wbudowany. Zaprezentowano najważniejsze wymagania dotyczące systemów kontroli dostępu oraz wynikające z nich założenia projektowe. Opisano architekturę utworzonego systemu, jego funkcjonalność oraz zastosowane metody weryfikacji mówcy wraz z omówieniem podstawowych metod optymalizacji czasowej implementacji. Całość poprzedzona jest zarysem zagadnienia biometrii głosu oraz automatycznego przetwarzania mowy.

EN

The paper presents the concept of an embedded solution for a voice biometric access system. The most important requirements for access control systems are presented, as well as the resulting design assumptions. The architecture of the created system, its functionality and the methods used to verify the speakers are described, along with a discussion of basic methods for optimizing the execution time of the implementation. The whole is preceded by an outline of voice biometrics and automatic speech processing.

13

Ocena funkcjonalności systemu rozpoznawania mówcy dla zdegradowanej jakości sygnału głosowego

Kamiński K., Dobrowolski A.P., Majda E.

Przegląd Elektrotechniczny

|

2014

|

R. 90, nr 8

164--167

PL

W artykule przedstawiono wyniki badań automatycznego systemu rozpoznawania mówcy (ASR – ang. Automatic Speaker Recognition), przeprowadzonych na podstawie komercyjnej bazy głosów TIMIT. Badania prowadzone były pod kątem zastosowania ASR jako systemu automatycznego rozpoznawania rozmówcy telefonicznego. Przedstawiono również wpływ liczebności bazy głosów oraz stopień oddziaływania kompresji stratnej MP3 na skuteczność rozpoznawania mówcy.

EN

The article presents the results of tests of an automatic speaker recognition system (ASR) conducted on the basis of the TIMIT commercial voice database. The research was conducted with the aim of using ASR as a system for automatic recognition of telephone callers. The impact of the number of voices in the database and the effect of lossy MP3 compression on the effectiveness of speaker recognition are also shown.

14

TEO-CFCC Characteristic Parameter Extraction Method for Speaker Recognition in Noisy Environments

Li L., An D., Zhao D., Rong C., Ma S.

Przegląd Elektrotechniczny

|

2013

|

R. 89, nr 2a

118--121

EN

This paper proposes the TEO-CFCC characteristic parameter extraction method. Signal phase matching is applied to eliminate speech noise on the basis of the CFCC characteristic parameter, and the Teager energy operator is then added to the computation of the CFCC characteristic parameter. In this way the TEO-CFCC characteristic parameter is obtained, and the energy of speech becomes one of the characteristic parameters for speaker recognition. Experimental results show that the recognition accuracy can reach 83.2% at an SNR of -5 dB in a vehicle interior noise environment using the TEO-CFCC characteristic parameter.

PL

W artykule przedstawiono metodę wyznaczania parametrów charakterystycznych filtru TEO-CFCC. Zastosowano tu dopasowywanie fazowe sygnału, dla eliminacji z mowy szumów oraz operator Teagera do wyrugowania parametrów. Badania eksperymentalne pokazują, że dokładność rozpoznania głosu wynosi 83,2% przy -5 dB SNR we wnętrzu pojazdu.
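The Teager energy operator mentioned in this entry has a well-known discrete form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below computes only that operator on a sample sequence; how the paper combines it with CFCC extraction is not described here, so nothing in the example should be read as the authors' pipeline. The function name and the test signal are illustrative assumptions.

```python
def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for the interior samples n = 1 .. len(x) - 2.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# One period of a unit-amplitude sinusoid sampled four times per period.
tone = [0, 1, 0, -1, 0]
energy = teager_energy(tone)
```

For a pure sinusoid A·sin(ωn) the operator gives the constant A²·sin²(ω), which is why it is often described as tracking both amplitude and frequency.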

15

System automatycznego rozpoznawania mówcy z wykorzystaniem techniki cepstralnej i modeli mieszanin gaussowskich

Kamiński K., Dobrowolski A.P.

Przegląd Elektrotechniczny

|

2013

|

R. 89, nr 9

83--93

PL

W niniejszym artykule zaprezentowano zaimplementowany w środowisku Matlab system automatycznego rozpoznawania mówcy, wykorzystujący do opisu głosu unikatowy wektor cech, tzw. „odcisk głosu” (VP – ang. Voice Print). System używa w procesie klasyfikacji tzw. modele mieszanin Gaussowskich (GMM – ang. Gaussian Mixture Model). W końcowej części artykułu przedstawione są badania skuteczności rozpoznawania mówców dla różnych wariantów systemu oraz w różnych konfiguracjach jego parametrów.

EN

The paper discusses the system of automatic speaker recognition, implemented in Matlab environment and using a unique vector of features, the so-called voice print (VP) for voice description. The system uses the so-called Gaussian Mixture Models (GMM) for the classification process. The final section of the paper presents the studies on the efficiency of speaker recognition for various system versions and for different system parameter configurations.

16

Speaker Recognition System Based on GMM Multivariate Probability Distributions built-in a Digital Watermarking Token

Lenarczyk P., Piotrowski Z.

Przegląd Elektrotechniczny

|

2013

|

R. 89, nr 2a

59--63

PL

Przedstawiony poniżej artykuł opisuje system rozpoznawania mówcy na podstawie mowy ciągłej, wykorzystując wielowariancyjne rozkłady prawdopodobieństwa GMM. Opisane zostały procesy ekstrakcji cech dystynktywnych głosu oraz tworzenia modeli statystycznych. Algorytm został zaimplementowany w systemie Linux w celu poprawy funkcjonalności identyfikacji użytkownika Zaufanego Osobistego Terminalu PTT.

EN

The article describes a speaker recognition system based on continuous speech using GMM multivariate probability distributions. A theoretical model of the system including the extraction of distinctive features and statistical modeling is described. The efficiency of the system implemented in the Linux operating system was determined. The system is designed to support the functionality of the Personal Trusted Terminal PTT in order to uniquely identify a subscriber using the device.

17

Combining Multiple Sound Sources Localization Hybrid Algorithm and Fuzzy Rule Based Classification for Real-time Speaker Tracking Application

Ibala C, Astapov S, Bettens F, Escobar F, Chang X, Valderrama C, Riid A

International Journal of Microelectronics and Computer Science

|

2013

|

Vol. 4, nr 1

12--25

EN

This work presents a novel approach to tracking a specific speaker among multiple speakers using Minimum Variance Distortionless Response (MVDR) beamforming and fuzzy-logic rule-based classification for speaker recognition. Sound source localization is performed with an improved delay-and-sum beamforming (DSB) computation methodology. Our proposed hybrid algorithm first computes the Generalized Cross Correlation (GCC) to create a reduced search spectrum for the DSB algorithm. This methodology reduces the DSB localization computation burden by more than 70%. Moreover, for beamforming of high-frequency sound sources, DSB is preferred to MVDR in order to reduce logic and power consumption.
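With unit weighting, generalized cross-correlation reduces to ordinary cross-correlation, and the lag of its peak estimates the delay between two microphone signals. The following minimal sketch shows only that basic step in the time domain; it is not the hybrid GCC/DSB algorithm of the paper, and the function name, the lag search range and the toy signals are assumptions.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref,
    using plain time-domain cross-correlation (unit weighting)."""
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n, r in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                value += r * sig[m]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


# The same short pulse reaches the second microphone two samples later.
mic1 = [0, 0, 1, 3, 1, 0, 0, 0]
mic2 = [0, 0, 0, 0, 1, 3, 1, 0]
delay = estimate_delay(mic1, mic2, max_lag=3)
```

A delay estimate of this kind constrains the directions a beamformer needs to scan, which is the general sense in which a cross-correlation step can shrink the search space.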

18

Układy elektroniczne jako elementy ludzkiego ciała i człowiek jako element układów elektronicznych

Dąbrowski A., Kardyś P., Portalski M., Cetnarowicz D., Drgas S., Pawłowski P.

Elektronika : konstrukcje, technologie, zastosowania

|

2013

|

Vol. 54, nr 9

53--57

PL

W artykule przedstawiono integrację układów elektronicznych z ciałem ludzkim na przykładzie badań prowadzonych w Pracowni Układów Elektronicznych i Przetwarzania Sygnałów (PUEPS) w Politechnice Poznańskiej. Omówiono poprawę zrozumiałości mowy, w tym osób laryngektomowanych, testy audiometryczne, generację wielotonów nieharmonicznych, badania bioimpedancyjne, detekcję punktów akupunkturowych, terapię dźwiękiem oraz diagnostykę akustyczną, a także automatyczne rozpoznawanie mówcy.

EN

In this article an integration of electronic systems with human body has been presented. It is based on the research conducted by the Division of Signal Processing and Electronic Systems (DSP&ES) at Poznan University of Technology. Some essential issues have been discussed such as: speech intelligibility enhancement including laryngectomees’ pseudospeech/pseudowhisper, audiometric tests, non-harmonic multitones generation, bioimpedance studies, acupuncture points detection, sound therapy and acoustic diagnostics, as well as automatic speaker recognition.

19

Subscriber authentication using GMM and TMS320C6713DSP

Piotrowski Z., Wojtuń J., Kamiński K.

Przegląd Elektrotechniczny

|

2012

|

R. 88, nr 12a

127--130

EN

The article presents the theoretical basis for the implementation of Gaussian Mixture Models and the implementation of a word recognition system on the DSK TMS320C6713 DSP development kit from Texas Instruments. The effectiveness of the algorithm based on the Gaussian Mixture Model has been demonstrated. The system was developed as a software module for voice authentication of a subscriber in a Personal Trusted Terminal (PTT). The subscriber is verified in the Personal Trusted Terminal by speaking his or her PIN.

PL

W artykule zaprezentowano teoretyczne podstawy realizacji Modeli Mikstur Gaussowskich oraz implementację systemu rozpoznawania słów z wykorzystaniem zestawu uruchomieniowego DSK TMS320C6713 DSP firmy Texas Instruments. Zobrazowano skuteczność działania algorytmu opartego na Modelach Mikstur Gaussowskich. System został opracowany jako moduł programowy na potrzeby głosowego uwierzytelniania abonenta w Osobistym Zaufanym Terminalu (PTT). Poprzez wypowiedzenie głosem swojego PIN-u abonent jest weryfikowany w Osobistym Zaufanym Terminalu.

20

Speaker recognition based on telephone quality short Polish sequences with removed silence

Marciniak T., Krzykowska A., Weychan R.

Przegląd Elektrotechniczny

|

2012

|

R. 88, nr 6

42--46

EN

This paper presents the effectiveness of speaker identification based on short Polish sequences. An impact of automatic removal of silence on the speaker recognition accuracy is considered. Several methods to detect the beginnings and ends of the voice signal have been used. Experimental research was carried out in Matlab environment with the use of a specially prepared database of short speech sequences in Polish. The construction of speaker models was realized with two techniques: Vector Quantization (VQ) and Gaussian Mixture Models (GMM). We also tested the influence of the sampling rate reduction on the speaker recognition performance.

PL

Artykuł przedstawia badania efektywności rozpoznawania mówcy opartego na krótkich wypowiedziach w języku polskim. Sprawdzono wpływ automatycznego wykrywania i usuwania ciszy na jakość rozpoznawania mówcy. Przebadano kilka różnych metod wykrywania początku i końca fragmentów mowy w wypowiadanych sekwencjach. Eksperymenty zostały przeprowadzone z użyciem środowiska Matlab i specjalnie utworzonej bazy krótkich wypowiedzi w języku polskim. Do budowy modeli mówców wykorzystano kwantyzację wektorową (VQ) oraz Gaussian Mixture Models (GMM). Podczas badań sprawdzono także wpływ obniżenia szybkości próbkowania na skuteczność identyfikacji mówcy.
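Automatic silence removal of the kind evaluated in this entry can be illustrated with a simple frame-energy threshold. The abstract does not say which detection methods were used, so the energy criterion, the function name, the frame length and the threshold below are assumptions chosen only to show the general idea.

```python
def remove_silence(samples, frame_len, threshold):
    """Keep only the frames whose mean squared amplitude reaches threshold."""
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= threshold:
            kept.extend(frame)
    return kept


# Toy signal: two near-silent frames interleaved with two louder ones.
signal = [0.0, 0.01, 0.5, -0.6, 0.02, -0.01, 0.7, 0.4]
voiced = remove_silence(signal, frame_len=2, threshold=0.01)
```

In practice the threshold is usually tuned to the recording conditions, since a fixed value that works for clean speech can discard quiet speech in telephone-quality audio.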