Search results - Biblioteka Nauki

1

A hybrid method of person verification with use independent speech and facial asymmetry

100%

Kubanek M. , Rydzek S.

Vol. 17, no. 4

91-99

EN

In person identification or verification, the primary interest is not in recognizing the words but in determining who is speaking them. In person identification systems, a test signal from an unknown speaker is compared with all known speaker signals in the set, and the known speaker whose signal has the maximum probability is taken as the identity of the unknown speaker. In security systems based on person identification and verification, faultless identification is critical for safety. In person verification systems, a test signal from a known speaker is compared with the recorded signals in the set that are associated with the label of the person being tested. The set contains more than one recorded signal for every user. To increase safety, this work proposes the authors' own approach to person verification, based on independent speech and facial asymmetry. Audio features of a person's speech are extracted using cepstral speech analysis. Face recognition effectiveness is improved by processing information about face asymmetry in the most informative part of the face, the eye region.
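The cepstral analysis mentioned in this abstract can be illustrated with a minimal sketch of the real cepstrum, that is, the inverse DFT of the log-magnitude spectrum of one signal frame. This is not the authors' implementation: the function names `dft` and `real_cepstrum`, the `eps` floor, and the naive O(N^2) transform are assumptions chosen only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum of one frame: IDFT(log|DFT(frame)|).

    eps is an assumed floor that avoids log(0) for empty spectral bins.
    """
    n = len(frame)
    log_mag = [math.log(max(abs(v), eps)) for v in dft(frame)]
    # The log-magnitude spectrum of a real frame is real and symmetric,
    # so its inverse DFT is real; keep only the real part.
    return [
        sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]


if __name__ == "__main__":
    # A unit impulse has a flat spectrum, so its cepstrum is all zeros.
    print(real_cepstrum([1.0, 0.0, 0.0, 0.0]))
```

In practice the low-order cepstral coefficients of each frame would serve as the speech features; a real system would normally use an FFT and often a mel-scaled filter bank, neither of which is shown here.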

2

Coding effects on changes in formant frequencies in Japanese speech signals

86%

Kucharski M. , Brachmański S.

Vol. 30, no. 1

art. no. 2019131

EN

This paper presents the results of research on the effects of lossy coding on formant frequencies in Japanese speech signals. Additionally, changes in the pitch of the voice were inspected. The four most popular lossy coding standards, MP3, WMA, AAC and OGG, were chosen for this research and compared with the original WAVE files. The audio files were created by the author based on the ITU-T P.501 recommendation at two sampling frequencies, 16 kHz and 48 kHz, and converted into the chosen codecs. The open-license software Praat was used to extract the data from the audio files. Because the encoded files differed in duration from the originals, and these differences varied between individual codecs, only the OGG and WMA standards were compared directly. The MP3 and AAC files were divided into Japanese syllables, averaged, and then compared with likewise averaged WAVE files. The results were additionally compared with the lossless FLAC codec.

3

Analysis of signal of audio speech in process of speech recognition

72%

Kubanek M.

2006

Vol. 2, no. 1

55-64

EN

The purpose of this work is to explain the theoretical issues and implementation techniques related to the field of speech recognition. The discussion focuses on some of the well-established and widely used speech coding standards required for speech recognition and speaker identification. By studying the most successful standards and understanding their principles, performance and limitations, it is possible to apply a particular technique to a given situation according to the underlying constraints, with the ultimate goal of developing next-generation algorithms improved in all aspects. This document presents the author's own methods for determining the beginning and end of isolated words in audio speech. Audio features of a person's speech are extracted using cepstral speech analysis. Finally, the paper presents the results of speech coding.
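The abstract does not describe how the author determines the beginning and end of isolated words, so the following is only a generic sketch of one common approach: thresholding the short-time energy of the signal. The function name `find_endpoints`, the frame length of 160 samples, and the threshold of 10% of the peak frame energy are all assumptions made for illustration, not details taken from the paper.

```python
def find_endpoints(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of the active region, or None.

    The signal is cut into non-overlapping frames; a frame counts as
    speech when its mean energy exceeds threshold_ratio times the
    largest frame energy. Both parameters are illustrative assumptions.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)

    if not energies or max(energies) == 0:
        return None  # signal too short, or pure silence

    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    # First active frame marks the start, last active frame marks the end.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


if __name__ == "__main__":
    # Two silent frames, three "speech" frames, two silent frames.
    signal = [0.0] * 320 + [1.0] * 480 + [0.0] * 320
    print(find_endpoints(signal))
```

A fixed relative threshold is sensitive to background noise; practical detectors often add zero-crossing rate or an adaptive noise estimate, which this sketch omits.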

4

Perceptually motivated approaches to speech enhancement. Part 2, Psychoacoustic optimization of spectral weighting rules

72%

Borowicz A. , Petrovsky A. A.

Vol. 50, issue 3

395-409

EN

This paper focuses on the class of speech enhancement systems that capitalize on the psychoacoustic properties of the human ear. More advanced psychoacoustically motivated spectral weighting rules are described. The presented systems are analyzed and classified according to their similarity to a human auditory model. In particular, improvements in musical noise cancellation and in speech intelligibility are compared. Moreover, the advantages of the perceptual approaches over conventional ones are highlighted. Finally, the prospects for integrated psychoacoustically motivated speech enhancement and coding systems are discussed. The paper shows that integrating a subband coder with a speech enhancement system based on a non-uniformly spaced filter bank leads to the most promising combined scheme.

PL (English translation)

Perceptually motivated speech enhancement methods are reviewed and compared. The shortcomings of psychoacoustic solutions that rely on classical spectral weighting methods are pointed out. Drawing on the literature, various ways of psychoacoustically optimizing these methods are presented. The presented systems are classified by their degree of conformity with the human auditory model. The results of applying psychoacoustic solutions are also compared in terms of their ability to suppress environmental noise and prevent distortion of the speech signal. The comparison also covers combined echo cancellation and noise reduction systems. Finally, the prospects for integrating a speech enhancement system with a subband coding system are presented, emphasizing the use of psychoacoustic models as an element common to both systems.

5

Praktyczne aspekty wykorzystywania systemów rozpoznawania mowy opartych na HMM [Practical aspects of using HMM-based speech recognition systems]

72%

Mietła A. , Iwaniec M.

2010

Vol. 9, no. 40

171-178

PL (English translation)

The article addresses the problem of building automatic speech recognition systems based on hidden Markov models. The mathematical foundations of HMMs are presented and related to a real problem. It is shown that an appropriate choice of the number of states and distributions in the system is extremely important. Test results are also presented that show the advantage of RASTA-PLP coefficients over MFCCs and the need to use delta and delta-delta parameters.

EN

The article discusses problems associated with automatic speech recognition systems based on hidden Markov models (HMMs). The mathematical basis of HMMs is presented, and it is shown how it can be applied to a real problem. The proper selection of the number of states and Gaussian distributions is extremely important. Test results are presented that indicate the advantage of RASTA-PLP coefficients over MFCCs and the necessity of using delta and delta-delta parameters.
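The delta and delta-delta parameters mentioned in this abstract are commonly computed with a regression over neighbouring frames. The sketch below uses that widely used formula, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with edge frames replicated. The abstract does not state which formula or window the authors used, so the function name `delta` and the default window N = 2 are assumptions.

```python
def delta(features, window=2):
    """Regression-based delta coefficients.

    features: list of frames, each a list of coefficients (e.g. MFCCs).
    window:   number of frames N on each side (assumed default of 2).
    Frames beyond either end are replaced by the first or last frame.
    """
    num_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def frame(t):
        # Clamp the index so edge frames are replicated.
        return features[min(max(t, 0), num_frames - 1)]

    out = []
    for t in range(num_frames):
        row = []
        for i in range(len(features[t])):
            acc = sum(
                n * (frame(t + n)[i] - frame(t - n)[i])
                for n in range(1, window + 1)
            )
            row.append(acc / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    # One coefficient per frame, rising linearly over time.
    feats = [[float(t)] for t in range(7)]
    d = delta(feats)   # delta: first-order time derivative estimate
    dd = delta(d)      # delta-delta: delta applied to the deltas
    print(d[3], dd[3])
```

The final feature vector per frame is then typically the concatenation of the static coefficients with their delta and delta-delta values.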