Search results - BazTech

1

Parametryzacja sygnału mowy w algorytmach rozpoznawania mowy [Speech signal parameterization in speech recognition algorithms]

Wojtuń J., Ośka J., Piotrowski Z., Bernat M.

Elektronika : konstrukcje, technologie, zastosowania | 2015 | Vol. 56, no. 2 | pp. 34-39

PL

The history of automatic speech recognition systems spans several decades. The first research in this field dates from the 1950s (work at Bell Laboratories and MIT). Although many research teams around the world work on this problem, automatic speech recognition has not been definitively solved. Available speech recognition systems still perform worse than humans. The article presents the scheme of a speech recognition system, using the recognition of isolated Polish words as an example. It gives a detailed description of how the distinctive features of the speech signal are computed from mel-cepstral coefficients and linear prediction cepstral coefficients. Recognition accuracy results for individual phrases are presented.

EN

The first research on automatic speech recognition systems dates back to the 1950s (work at Bell Labs and MIT). Although many research teams have worked on this problem, automatic speech recognition has not been definitively solved and remains open. Available speech recognition systems still perform worse than humans. This article presents a block diagram of a speech recognition system for isolated words of the Polish language. It describes in detail how the distinctive features of the speech signal are determined from mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC). Recognition accuracy results for individual phrases are also presented.
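
LPCC features of the kind named above are commonly obtained by fitting a linear predictor to each windowed frame and converting the predictor coefficients to cepstra. The following is a minimal sketch of one standard route (autocorrelation method, Levinson-Durbin recursion, LPC-to-cepstrum recursion), not the authors' implementation, whose details the abstract does not give. The window type, frame length, prediction order (12) and number of cepstra (16) are illustrative assumptions.

```python
import math
import random


def hamming(n):
    """Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum_k a[k] x[n-k].

    Returns (a, err); a[0] is an unused 0.0 placeholder so that indices
    match the usual textbook notation, err is the final prediction error.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection (PARCOR) coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of H(z) = G / (1 - sum_k a[k] z^-k).

    c[n] = a[n] + sum_{k=1}^{n-1} (k/n) c[k] a[n-k]      for n <= p
    c[n] =        sum_{k=n-p}^{n-1} (k/n) c[k] a[n-k]    for n > p
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]  # drop c[0] (the gain term)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic AR(1) signal standing in for one speech frame (assumption).
    x, frame = 0.0, []
    for _ in range(400):
        x = 0.9 * x + rng.gauss(0.0, 1.0)
        frame.append(x)
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    a, err = levinson_durbin(autocorrelation(windowed, 12), 12)
    print([round(c, 3) for c in lpc_to_cepstrum(a, 16)])
```

A quick sanity check of the recursion: for a first-order predictor with a[1] = a, the cepstrum is c[n] = a^n / n, which the function reproduces.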

2

Subscriber authentication using GMM and TMS320C6713DSP

Piotrowski Z., Wojtuń J., Kamiński K.

Przegląd Elektrotechniczny | 2012 | Vol. 88, no. 12a | pp. 127-130

EN

The article presents the theoretical basis of Gaussian Mixture Models (GMM) and the implementation of a word recognition system on the Texas Instruments TMS320C6713 DSK (DSP Starter Kit). The effectiveness of the GMM-based algorithm has been demonstrated. The system was developed as a software module for voice authentication of a subscriber in a Personal Trusted Terminal (PTT): the subscriber's PIN is verified from a spoken utterance.

PL

The article presents the theoretical foundations of Gaussian Mixture Models and the implementation of a word recognition system using the Texas Instruments TMS320C6713 DSK development kit. The effectiveness of the algorithm based on Gaussian Mixture Models is illustrated. The system was developed as a software module for voice authentication of a subscriber in the Personal Trusted Terminal (PTT). The subscriber is verified by the Personal Trusted Terminal by speaking their PIN aloud.
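
Recognition with GMMs usually reduces to scoring a sequence of feature vectors against one mixture model per word and picking the best-scoring model. The following minimal sketch assumes diagonal covariances, already-trained models (EM training is omitted) and a plain argmax over average per-frame log-likelihood; the model layout `(weights, means, variances)` and the function names are hypothetical, not taken from the paper or from any DSP library.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) under a GMM, combining components with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def average_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood of a feature sequence under one GMM."""
    return sum(gmm_log_likelihood(f, *gmm) for f in frames) / len(frames)


def recognize(frames, word_models):
    """Return the label whose GMM gives the highest average log-likelihood."""
    return max(word_models,
               key=lambda label: average_log_likelihood(frames, word_models[label]))


if __name__ == "__main__":
    # Hypothetical single-component models for two spoken digits (toy values).
    models = {
        "one": ([1.0], [[0.0, 0.0]], [[1.0, 1.0]]),
        "two": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    print(recognize([[0.1, -0.2], [0.0, 0.3]], models))  # prints "one"
```

In a PIN check of the kind described, each recognized digit could then be compared with the stored PIN; that composition step is likewise an assumption here.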

3

Hierarchical Classification of Environmental Noise Sources Considering the Acoustic Signature of Vehicle Pass-Bys

Valero X., Alias F.

Archives of Acoustics | 2012 | Vol. 37, no. 4 | pp. 423-434

EN

This work focuses on the automatic recognition of environmental noise sources that affect human health and quality of life, namely industrial, aircraft, railway and road traffic noise. Recognition of road traffic noise, which has the largest influence on citizens' daily lives, is still an open issue. Therefore, while considering all of these noise sources, this paper focuses especially on improving the recognition of road noise events by exploiting the perceived noise differences along a road vehicle pass-by, which may be divided into three phases: approaching, passing and receding. To that end, a hierarchical classification scheme that treats these phases independently has been implemented. The proposed scheme yields an average classification accuracy of 92.5%, which is 3 percentage points higher than the baseline (a traditional flat classification scheme without hierarchical structure). In particular, it outperforms the baseline in the classification of light and heavy vehicles, with classification accuracy 7% and 4% higher, respectively. Finally, listening tests were performed to compare the system's performance with human recognition ability. The results reveal that, although an expert listener can achieve higher recognition accuracy than the proposed system, the system outperforms untrained listeners by 10% on average.
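
The difference between a flat and a hierarchical scheme can be illustrated with a coarse-to-fine classifier tree: a first level decides the source type, and only road events are passed to a second level that separates vehicle classes. The abstract does not describe the paper's classifiers, features or how the pass-by phases are combined, so the following is only a generic sketch of the routing idea, using nearest-centroid decisions and made-up two-dimensional centroids; `Node` and all values are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hierarchical classifier (hypothetical structure).

    centroids maps each label at this level to a reference feature vector;
    children optionally maps a label to a finer-grained Node.
    """
    centroids: dict
    children: dict = field(default_factory=dict)

    def classify(self, x):
        """Return the label path from the root down to the deepest level."""
        label = min(self.centroids, key=lambda k: math.dist(x, self.centroids[k]))
        child = self.children.get(label)
        return [label] + (child.classify(x) if child else [])


# Toy 2-D centroids, invented purely for illustration.
vehicle_level = Node({"light vehicle": (1.0, 0.0), "heavy vehicle": (1.0, 1.0)})
tree = Node(
    {"road": (1.0, 0.5), "railway": (5.0, 5.0),
     "aircraft": (-5.0, 5.0), "industrial": (-5.0, -5.0)},
    children={"road": vehicle_level},
)

if __name__ == "__main__":
    print(tree.classify((1.1, 0.9)))  # ['road', 'heavy vehicle']
    print(tree.classify((4.8, 5.1)))  # ['railway']
```

A flat baseline would instead pick directly among all leaf labels at once; the tree lets each level use decision boundaries specialized for a narrower set of classes.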

4

Automatyczna weryfikacja mówcy oparta na cechach prozodycznych [Automatic speaker verification based on prosodic features]

Drgaś S., Cetnarowicz D., Dąbrowski A.

Elektronika : konstrukcje, technologie, zastosowania | 2009 | Vol. 50, no. 3 | pp. 21-24

PL

The article evaluates the effectiveness of an automatic speaker verification system based on prosodic features. Speaker recognition accuracy was examined using bigram-based models. The results showed that prosodic rhythm carries significant speaker-dependent information. In addition, a method was developed for selecting the number of quantization levels depending on segment durations.

EN

This paper evaluates the accuracy of a speaker verification system based on prosodic features. The effectiveness of bigram models for speaker recognition was assessed. The results showed that prosodic rhythm in speech carries valuable speaker-specific information. Appropriate numbers of quantization levels relative to segment durations were also determined.
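
A bigram model over quantized prosodic features scores how likely each transition between consecutive quantization levels is for a given speaker. The abstract does not specify the features, the quantizer or the scoring rule, so this minimal sketch assumes segment durations quantized with fixed bin edges, add-one smoothed bigrams, and a per-transition log-likelihood ratio against a background model. All names, bin edges and toy sequences are hypothetical.

```python
import math
from collections import Counter


def quantize(durations, edges):
    """Map each segment duration to a level index using sorted bin edges."""
    return [sum(d >= e for e in edges) for d in durations]


class BigramModel:
    """Add-one smoothed bigram model over levels 0..n_levels-1."""

    def __init__(self, sequences, n_levels):
        self.n = n_levels
        self.pairs = Counter()
        self.context = Counter()
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pairs[(prev, cur)] += 1
                self.context[prev] += 1

    def log_prob(self, seq):
        """Sum of log P(cur | prev) over consecutive level pairs."""
        return sum(
            math.log((self.pairs[(p, c)] + 1) / (self.context[p] + self.n))
            for p, c in zip(seq, seq[1:])
        )


def verification_score(seq, speaker, background):
    """Per-transition log-likelihood ratio; larger values favor the claimed speaker."""
    transitions = max(len(seq) - 1, 1)
    return (speaker.log_prob(seq) - background.log_prob(seq)) / transitions


if __name__ == "__main__":
    edges = [0.1, 0.2]  # hypothetical bin edges in seconds (3 levels)
    speaker = BigramModel([quantize([0.05, 0.15] * 4, edges)], 3)
    background = BigramModel([[0, 0, 2, 2, 1, 1, 2, 0]], 3)
    test = quantize([0.06, 0.14, 0.07, 0.16], edges)
    print(round(verification_score(test, speaker, background), 3))
```

A claim would be accepted when the score exceeds a threshold tuned on development data; the threshold itself is not part of this sketch.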

5

HFCC based recognition of bird species

Wielgat R., Zieliński T. P., Potempa T., Lisowska-Lis A., Król D.

Elektronika : konstrukcje, technologie, zastosowania | 2008 | Vol. 49, no. 4 | pp. 90-94

EN

Results of preliminary research on the recognition of Polish bird species are presented. Bird voices were recorded in a very noisy urban environment at a high sampling frequency of 96 kHz. Standard mel-frequency cepstral coefficients (MFCC) and the recently proposed human-factor cepstral coefficients (HFCC) were selected as feature sets. HFCC features performed better than MFCC features. Appropriately limiting the maximum frequency during HFCC feature extraction increases bird species recognition accuracy. The promising initial results suggest that the methods described in the paper can be applied in practice to monitoring protected bird areas.

PL

The article presents the results of preliminary research on bird voice recognition. The birds were recorded digitally at a sampling frequency of 96 kHz in a noisy urban environment. Mel-cepstral coefficients (MFCC) and the recently proposed human-factor cepstral coefficients (HFCC) were used as features. Higher recognition accuracy was observed with the latter. It was shown that appropriately limiting the maximum frequency when computing HFCC coefficients increases recognition accuracy. The promising results bode well for the planned practical application of the described methods to monitoring bird refuges.
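
One way to see why the maximum frequency matters: with a fixed number of filters spaced evenly on the mel scale, lowering f_max packs more filters into the lower part of the spectrum. The following sketch computes mel-spaced filter centers and HFCC-style bandwidths, following the usual description of HFCC (mel-spaced centers, bandwidths from the Moore-Glasberg ERB approximation scaled by an "E-factor"). It is not the authors' implementation; the filter count (24), the lowered f_max (12 kHz) and the E-factor are hypothetical values.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(n_filters, f_min, f_max):
    """Filter center frequencies (Hz) equally spaced on the mel scale.

    n_filters + 2 points span [f_min, f_max]; the outer two are band edges.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_filters + 1)]


def erb_hz(fc_hz):
    """Moore-Glasberg ERB approximation in Hz (polynomial in fc in kHz)."""
    f_khz = fc_hz / 1000.0
    return 6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52


def hfcc_bandwidths(centers_hz, e_factor=1.0):
    """HFCC-style bandwidths: ERB at each center scaled by an E-factor."""
    return [e_factor * erb_hz(fc) for fc in centers_hz]


if __name__ == "__main__":
    fs = 96_000  # sampling frequency reported in the abstract
    # Compare the full band (up to Nyquist) with a lowered, hypothetical f_max.
    for f_max in (fs / 2, 12_000.0):
        centers = mel_centers(24, 0.0, f_max)
        widths = hfcc_bandwidths(centers, e_factor=1.0)
        below = sum(c < 10_000.0 for c in centers)
        print(f"f_max={f_max:.0f} Hz: {below} of 24 centers below 10 kHz, "
              f"widest band {max(widths):.0f} Hz")
```

Because HFCC sets bandwidth independently of filter spacing, the E-factor can widen or narrow every filter without moving the centers, which is the design freedom HFCC adds over MFCC.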