Search results - BazTech

1

Coding effects on changes in formant frequencies in Japanese speech signals

Kucharski Mateusz, Brachmański Stefan

Vibrations in Physical Systems | 2019 | Vol. 30, no. 1 | art. no. 2019131

EN

This paper presents the results of research on the effects of lossy coding on formant frequencies in Japanese speech signals. Changes in voice pitch were also examined. The four most popular lossy coding standards, MP3, WMA, AAC and OGG, were chosen and compared with the original WAVE files. The audio files were created by the author according to the ITU-T P.501 recommendation at two sampling frequencies, 16 kHz and 48 kHz, and converted with the chosen codecs. The openly licensed software Praat was used to extract data from the audio files. Because the original and encoded files differed in duration, and these differences varied between codecs, only the OGG and WMA standards were compared directly. MP3 and AAC files were divided into Japanese syllables, averaged, and then compared with WAVE files averaged in the same way. The results were additionally compared with the lossless FLAC codec.

2

Support Region Determination of the Quasilogarithmic Quantizer for Laplacian Source

Aleksić D., Perić Z., Nikolić J.

Przegląd Elektrotechniczny | 2012 | R. 88, no. 7a | 130-132

EN

This paper proposes a method for determining the support region of a quasilogarithmic quantizer designed for a Laplacian source of arbitrary variance. The method is based on the minimal distortion criterion and uses approximations that yield an asymptotic formula for the optimal support region threshold of the quasilogarithmic quantizer. The formula shows which parameters influence the support region threshold, which is of great importance for the practical implementation of the quantizer in high-quality quantization of signals that, like speech signals, have statistics modeled by the Laplacian probability density function.

PL

The article proposes a method for determining the support region of a quasilogarithmic quantizer designed for Laplacian sources of arbitrary variance. The quantizer can be used, among other applications, for speech coding in communication systems.
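To make the role of the support region threshold concrete, here is a minimal sketch of a quasilogarithmic quantizer, assuming the common mu-law compandor form. It is not the article's method for choosing the threshold: the function name `mu_law_quantize` and the defaults `mu=255` and `levels=256` are illustrative assumptions, and `x_max` stands for the support region threshold discussed in the abstract.

```python
import math

def mu_law_quantize(x, x_max, mu=255.0, levels=256):
    """Quantize x with a mu-law compandor on the support region [-x_max, x_max].

    All parameter defaults are illustrative assumptions, not values from the article.
    """
    # Inputs outside the support region are clipped (overload distortion).
    x = max(-x_max, min(x_max, x))
    # Compress: logarithmic mapping of |x| from [0, x_max] onto [0, x_max].
    c = x_max * math.log1p(mu * abs(x) / x_max) / math.log1p(mu)
    c = math.copysign(c, x)
    # Uniform quantization of the compressed value into `levels` cells.
    step = 2.0 * x_max / levels
    idx = min(levels - 1, int((c + x_max) / step))
    c_hat = -x_max + (idx + 0.5) * step
    # Expand: invert the compressor to obtain the reconstruction level.
    y = x_max / mu * math.expm1(abs(c_hat) / x_max * math.log1p(mu))
    return math.copysign(y, c_hat)

# Inside the support region the error is small; outside it the input is clipped.
inside = mu_law_quantize(0.5, x_max=1.0)
clipped = mu_law_quantize(10.0, x_max=1.0)
```

A larger `x_max` reduces clipping of large-amplitude samples but widens every cell, so the threshold trades overload distortion against granular distortion. That trade-off is what a minimal-distortion criterion resolves.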

3

Analysis of differences between MFCC after multiple GSM transcodings

Weychan R., Marciniak T.

Przegląd Elektrotechniczny | 2012 | R. 88, no. 6 | 24-29

EN

This paper presents the results of studies on the effects of multiple speech transcoding operations for the GSM standard at sampling rates of 8 kSps and 16 kSps. Differences between the MFCC coefficients obtained after successive transcodings were considered. The aim of the comparison is to check whether the data can be separated and the GSM encoder used can be detected. The study used recordings from the TIMIT database, transcoded four times by GSM codecs. The possibility of detecting the encoder type was analyzed based on differences between curvilinear approximations of the MFCC coefficient errors.

PL

The article presents the results of research on the effect of multiple transcoding of an audio signal sampled at 8 kSps for the GSM standard, and at 16 kSps. The differences between the MFCC coefficients obtained after successive transcodings were analyzed. The main aim of the comparison is to check whether the data can be separated and the GSM coder used in transmission detected. The experiment used the TIMIT database of speech recordings, transcoded four times by GSM coders. The possibility of detecting the coder type was analyzed based on differences between curvilinear approximations of the MFCC coefficient errors. (Analysis of the effect of multiple GSM transcoding on the differences between MFCC coefficients).

4

Efficient coding of LSP parameters using Compressed Sensing on Approximate KLT Domain

Xiao Q., Chen L., Zhu T., Wang Y.

Przegląd Elektrotechniczny | 2011 | R. 87, no. 7 | 230-234

EN

An efficient quantization scheme for LSP parameters is proposed using compressed sensing (CS). The LSP parameters extracted from consecutive speech frames are compressed by CS in the approximate KLT domain to produce a measurement vector, which is quantized using a split vector quantizer. The original LSP parameters are then reconstructed from the quantized measurements by the orthogonal matching pursuit method. Experiments show that the scheme can achieve "transparent quality" at 5 bits/frame, a drastic reduction in bits compared with other methods.

PL

Quantization of LSP (line spectral pair) parameters using the compressed sensing (CS) method is proposed. The original LSP values can be reconstructed using the orthogonal matching pursuit method. Good quality was obtained at 5 bits/frame, with a significant reduction in bits compared with other methods.

5

Practical aspects of using HMM-based speech recognition systems (Praktyczne aspekty wykorzystywania systemów rozpoznawania mowy opartych na HMM)

Mietła A., Iwaniec M.

Modelowanie Inżynierskie | 2010 | T. 9, no. 40 | 171-178

PL

The article addresses the problem of building automatic speech recognition systems based on hidden Markov models. The mathematical foundations of HMMs are presented and related to a real problem. It is shown that an appropriate choice of the number of states and distributions in the system is extremely important. Test results are also presented that show the advantage of RASTA-PLP coefficients over MFCCs and the need to use delta and delta-delta parameters.

EN

The article discusses problems associated with automatic speech recognition systems based on hidden Markov models (HMMs). The mathematical basis of HMMs is presented, and it is shown how it can be applied to a real problem. Proper selection of the number of states and Gaussian distributions is extremely important. Test results are presented that indicate the advantage of RASTA-PLP coefficients over MFCCs and the necessity of using delta and delta-delta parameters.
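The delta and delta-delta parameters mentioned in this abstract are often computed with a regression formula over neighbouring frames. The following is a minimal sketch of that common formulation, not the authors' implementation: the function name `delta`, the window size `n_window=2`, and the replication of edge frames are assumptions made for illustration.

```python
def delta(frames, n_window=2):
    """Regression-based delta features for a list of equal-length feature vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond either end are replaced by the first or last frame (an assumption).
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, n_window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, n_window + 1):
                ahead = frames[min(t + n, last)][k]   # replicate the last frame
                behind = frames[max(t - n, 0)][k]     # replicate the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# One cepstral coefficient per frame, rising linearly over five frames.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
d = delta(static)    # first-order (delta) parameters
dd = delta(d)        # delta-delta: the same operator applied to the deltas
```

In a typical front end the static, delta, and delta-delta vectors are concatenated frame by frame before being passed to the HMM.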

6

The use of speech recognition and user verification in closed-circuit television systems

Kubanek M.

Elektronika : konstrukcje, technologie, zastosowania | 2009 | Vol. 50, no. 11 | 65-68

EN

Speech recognition systems and systems for verifying persons on the basis of independent speech are widely used. In speech recognition systems, we need to know what the examined person said. The recognized words can be used to control any computer-controlled device. In speaker verification, the prime interest is not in recognizing the words but in determining who is speaking them. In speaker verification systems, a test signal from a speaker, who also provides his or her own login, is compared with all known speaker signals in the set. If the voice matches and the user login matches, the system accepts the user. This work proposes using a speech recognition method to control the movement of a camera in a closed-circuit television system, and a user verification method to log on to this system. Audio features of a person's speech are extracted using a modified mechanism of cepstral speech analysis. Speech recognition is performed using hidden Markov models. The main aim of this work, apart from the practical implementation of both methods, is to show how to modify the MFCC for speech recognition and for user verification.

PL

Speech recognition systems and systems for verifying persons on the basis of independent speech are increasingly in common use. In speech recognition systems, we need to know what the tested person said. The recognized words can be used to control various computer-controlled devices. In identity verification, what matters is not what was said but who said it. In identity verification systems, where each registered user has a unique login, the recorded utterance of the user being verified is compared with all utterances in the database. If the login matches and the voice characteristics agree, the system accepts the person being verified. The article proposes speech recognition for controlling the movement of an industrial camera, and user verification based on independent speech for logging on to the system. Cepstral speech analysis was used to extract and encode the voice characteristics. Hidden Markov models were adopted as the recognizer. The main task of this work, apart from the practical implementation of the described methods, is to show how the cepstral analysis mechanism should be modified for speech recognition, and how for identity verification based on independent speech.

7

INMARSAT-M digital voice (Cyfrowa fonia INMARSATU-M)

Czajkowski J.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 2008 | R. 81, no. 2-3 | 69-72

PL

The system assumptions and the technical and operational characteristics of the INMARSAT-M standard are presented. The standard is intended for digital satellite radiotelephone communication between public telephone networks and users of mobile stations, both on land and at sea. The general structure of the system is discussed, focusing on the requirements for mobile earth stations (MES), land earth stations (LES) and the network coordination station (NCS). The telecommunication services enabled by the M standard are described: duplex and simplex telephony using digital speech coding, distress communication (maritime mobile terminals only), and optionally data transmission.

EN

The article presents the system assumptions and the technical and operational characteristics of the INMARSAT-M standard. The system is intended to provide simplex and duplex telephony based on digital voice coding between public telephone networks and mobile stations. The general structure of the system is described, along with the characteristics of the Mobile Earth Station (MES), Land Earth Station (LES) and Network Coordination Station (NCS).

8

Analysis of signal of audio speech in process of speech recognition

Kubanek M.

Computing, Multimedia and Intelligent Techniques | 2006 | Vol. 2, no. 1 | 55-64

EN

The purpose of this work is to explain the theoretical issues and implementation techniques related to the fascinating field of speech recognition. The discussion focuses on some of the well-established and widely used speech coding standards required for speech recognition and speaker identification. By studying the most successful standards and understanding their principles, performance and limitations, it is possible to apply a particular technique to a given situation according to the underlying constraints, with the ultimate goal being the development of next-generation algorithms with improvements in all aspects. This document presents the author's own methods for determining the beginning and end of isolated words in audio speech. The mechanism of cepstral speech analysis was applied to extract the audio features of a person's speech. Finally, the paper shows the results of speech coding.

9

Perceptually motivated approaches to speech enhancement. Part 2, Psychoacoustic optimization of spectral weighting rules

Borowicz A., Petrovsky A. A.

Kwartalnik Elektroniki i Telekomunikacji | 2004 | Vol. 50, issue 3 | 395-409

EN

This paper focuses on the class of speech enhancement systems that capitalize on the psychoacoustic properties of the human ear. More advanced psychoacoustically motivated spectral weighting rules are described. The presented systems are analyzed and classified according to their similarity to a human auditory model. In particular, improvements in musical noise cancellation and in speech intelligibility are compared. Moreover, the advantages of the perceptual approaches over conventional ones are highlighted. Finally, the prospects for integrated psychoacoustically motivated speech enhancement and coding systems are discussed. The paper shows that integrating a subband coder with a speech enhancement system based on a non-uniformly spaced filter bank leads to the most promising combined scheme.

PL

Perceptually motivated speech enhancement methods are reviewed and compared. The shortcomings of psychoacoustic solutions that use classical spectral weighting methods are pointed out. Based on the literature, various ways of psychoacoustically optimizing these methods are presented. The presented systems are classified according to their degree of agreement with the human auditory model. The results of applying psychoacoustic solutions are also compared in terms of their ability to suppress environmental noise and prevent distortion of the speech signal. The comparison also includes combined echo cancellation and noise reduction systems. Finally, the prospects for integrating a speech enhancement system with a subband coding system are presented, highlighting the use of psychoacoustic models as an element common to both systems.

10

Speech coding and transmission in third-generation mobile telephony systems (Kodowanie i transmisja mowy w systemach telefonii komórkowej trzeciej generacji)

Knapek B.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 2003 | no. 5 | 233-238

PL

Issues related to the coding and transmission of speech signals in the third-generation UMTS cellular system are described. The mechanisms responsible for ensuring the best possible call quality are presented. Selected aspects of voice call quality are discussed.

EN

This article describes issues connected with the coding and transmission of speech signals in the third-generation cellular system UMTS. Mechanisms that improve quality are shown. Some quality-related aspects are taken into account.

11

The STF 2000 transcoder for GSM cellular systems (Transkoder STF 2000 dla systemów komórkowych GSM)

Riggs V., Fairfield R., Segura J.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 1998 | no. 8 | 579-586

PL

The GSM system, its features, services and architecture are described. The STF 2000 transcoder developed by Lucent Technologies is presented. The STF 2000 transcoder encodes and decodes signals and also adapts bit rates. Its modular structure allows gradual expansion and adaptation to changing requirements. The transcoder can process speech signals coded at the so-called full bit rate as well as half-rate coded signals.

EN

This paper describes the GSM cellular system, its features and services, and its system architecture. It also details the speech transcoding frame STF 2000, one component of the Lucent Technologies GSM system. The STF 2000, which provides speech encoding and decoding and data rate adaptation, is cost-effective, with a flexible, modular architecture. It is capable of "half-rate" speech coding.