Search results - BazTech

1

Coding effects on changes in formant frequencies in Japanese speech signals

Kucharski Mateusz, Brachmański Stefan

Vibrations in Physical Systems | 2019 | Vol. 30, no. 1

art. no. 2019131

EN

This paper presents the results of research on the effects of lossy coding on formant frequencies in Japanese speech signals. Changes in voice pitch were also examined. The four most popular lossy coding standards, MP3, WMA, AAC and OGG, were chosen and compared with the original WAVE files. The audio files were created by the author according to ITU-T Recommendation P.501 at two sampling frequencies, 16 kHz and 48 kHz, and then encoded with the chosen codecs. The data were extracted from the audio files using Praat, an open-license program. The encoded files differed in duration from the originals, and these differences also varied between codecs. For this reason, only the OGG and WMA files were compared directly. The MP3 and AAC files were divided into Japanese syllables and averaged. They were then compared with the WAVE files averaged in the same way. The results were additionally compared with the lossless FLAC codec.
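The syllable-level comparison described above segments each file, averages within syllables, and then compares with the averaged WAVE reference. It can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes that formant or pitch tracks have already been exported (for example from Praat) as (time, value) pairs. It also assumes that syllable boundaries are known separately for each file, because the encoded files differ in duration. The function names and the data in the usage lines are hypothetical.

```python
from statistics import mean


def syllable_means(track, boundaries):
    """Average a formant or pitch track within each syllable interval.

    track: list of (time_s, value_hz) pairs; value may be None for frames
           where nothing was measured (e.g. unvoiced frames for pitch).
    boundaries: list of (start_s, end_s) syllable intervals for THIS file.
    """
    means = []
    for start, end in boundaries:
        values = [v for t, v in track if start <= t < end and v is not None]
        means.append(mean(values) if values else None)
    return means


def relative_deviation(reference_means, coded_means):
    """Per-syllable deviation of the coded file from the reference, in percent."""
    deviations = []
    for ref, coded in zip(reference_means, coded_means):
        if ref is None or coded is None:
            deviations.append(None)  # syllable missing in one of the files
        else:
            deviations.append(100.0 * (coded - ref) / ref)
    return deviations


# Hypothetical F1 values (Hz) for two syllables; the coded file is slightly longer.
ref = syllable_means([(0.00, 700.0), (0.01, 710.0), (0.02, 500.0), (0.03, 520.0)],
                     [(0.0, 0.02), (0.02, 0.04)])
coded = syllable_means([(0.00, 720.0), (0.01, 720.0), (0.025, 510.0), (0.035, 510.0)],
                       [(0.0, 0.02), (0.02, 0.045)])
print(relative_deviation(ref, coded))
```

Averaging per syllable makes the comparison independent of small timing shifts within a syllable. Timing shifts of this kind are the reason the direct frame-by-frame comparison was limited to OGG and WMA.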

2

A Wide Band Speech Coding Technique using Low Delay Code Excited Linear Predictive Algorithm (LD-CELP)

Joshi Swati, Purohit Hemant

Annals of Computer Science and Information Systems | 2017 | Vol. 10

45-48

EN

Mobile voice services require a fair level of speech quality in speech transmission. A high-quality speech coder needs both efficient use of bandwidth and a sufficiently high bit rate, but these two requirements are difficult to satisfy at the same time. Research on speech coder design is ongoing. In general, CELP is an algorithm for designing good-quality speech coders, and the technique has continued to advance from the 1980s to the present. This paper proposes a wideband speech coding technique based on the LD-CELP algorithm. The overall performance of 16 kbit/s LD-CELP is evaluated in MATLAB R2016a using the mean squared error (MSE) and the signal-to-noise ratio (SNR). We conclude that the SNR achieved by LD-CELP is not much better and that further enhancement is necessary.
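The two evaluation measures named above can be written down directly. The sketch below assumes time-aligned original and decoded sample sequences of equal length. It computes the mean squared error and the global SNR in dB. The abstract does not say whether a global or a segmental SNR was used, so the global form here is an assumption.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equal-length, time-aligned signals."""
    if len(original) != len(decoded) or not original:
        raise ValueError("signals must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(original, decoded)) / len(original)


def snr_db(original, decoded):
    """Global SNR in dB: energy of the original over energy of the coding error."""
    if len(original) != len(decoded):
        raise ValueError("signals must be of equal length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise_energy == 0:
        return math.inf  # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference but non-zero error
    return 10.0 * math.log10(signal_energy / noise_energy)


x = [1.0, -1.0, 1.0, -1.0]
y = [0.9, -0.9, 0.9, -0.9]  # hypothetical decoder output, error 0.1 per sample
print(mse(x, y), snr_db(x, y))
```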

3

Efficient coding of LSP parameters using Compressed Sensing on Approximate KLT Domain

Xiao Q., Chen L., Zhu T., Wang Y.

Przegląd Elektrotechniczny | 2011 | Vol. 87, no. 7

230-234

EN

An efficient quantization scheme for line spectral pair (LSP) parameters is proposed, based on compressed sensing (CS). The LSP parameters extracted from consecutive speech frames are compressed by CS in the approximate Karhunen-Loève transform (KLT) domain to produce a measurement vector. This vector is quantized with a split vector quantizer. The original LSP parameters are then reconstructed from the quantized measurements by the orthogonal matching pursuit method. Experiments show that the scheme achieves "transparent quality" at 5 bits/frame, which is a drastic reduction in bits compared with other methods.

PL

Quantization of the LSP (line spectral pair) parameters using compressed sensing (CS) is proposed. The original LSP values can be reconstructed using the orthogonal matching pursuit method. Good quality was obtained at 5 bits/frame, with a significant reduction in bits compared with other methods.

4

Practical aspects of using HMM-based speech recognition systems

Mietła A., Iwaniec M.

Modelowanie Inżynierskie | 2010 | Vol. 9, no. 40

171-178

PL

The article addresses the problem of building automatic speech recognition systems based on hidden Markov models. The mathematical foundations of HMMs are presented and related to a real problem. It is shown that an appropriate choice of the number of states and distributions in the system is extremely important. Test results are also presented that show the advantage of RASTA-PLP coefficients over MFCCs and the necessity of using delta and delta-delta parameters.

EN

The article discusses problems associated with automatic speech recognition systems based on hidden Markov models (HMMs). The mathematical basis of HMMs is presented, and its application to a real problem is shown. The proper selection of the number of states and of Gaussian distributions is extremely important. The test results presented indicate the advantage of RASTA-PLP coefficients over MFCCs and the necessity of using delta and delta-delta parameters.
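Delta and delta-delta parameters, which the paper reports as necessary, are usually computed by regression over neighboring frames: d[t] = sum over n = 1..N of n * (c[t+n] - c[t-n]), divided by 2 * sum over n = 1..N of n^2. The sketch below assumes this formula (as used, for example, in HTK), with edge frames replicated. The window size and edge handling the authors actually used are not stated in the abstract.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a sequence of feature vectors.

    frames: list of equal-length feature vectors (e.g. MFCC or RASTA-PLP),
            one per analysis frame.
    window: number of neighboring frames on each side (N in the formula).
    Frames beyond the sequence edges are replaced by the first/last frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        vector = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            vector.append(acc / denom)
        result.append(vector)
    return result


# Toy 1-D "feature" that rises linearly, so interior deltas equal the slope.
static = [[float(t)] for t in range(12)]
delta = deltas(static)
delta_delta = deltas(delta)  # delta-delta = deltas of the deltas
# Full feature vector per frame: static + delta + delta-delta.
features = [s + d + dd for s, d, dd in zip(static, delta, delta_delta)]
```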

5

The use of speech recognition and user verification in closed-circuit television systems

Kubanek M.

Elektronika : konstrukcje, technologie, zastosowania | 2009 | Vol. 50, no. 11

65-68

EN

Speech recognition systems and systems for verifying persons on the basis of text-independent speech are widely used. In speech recognition systems, we need to know what the examined person said. The recognized words can then be used to control any computer-controlled device. In speaker verification, the primary interest is not in recognizing the words but in determining who is speaking them. In speaker verification systems, a test signal from a known speaker, who gives his or her own login, is compared with all known speaker signals in the set. If both the voice and the login match, the system accepts the user. This work proposes using speech recognition to control the movement of a closed-circuit television (CCTV) camera and user verification to log on to the system. The audio features of a person's speech are extracted using a modified cepstral speech analysis mechanism. Speech recognition is performed using hidden Markov models. Apart from the practical implementation of both methods, the main aim of this work is to show how the MFCCs should be modified for speech recognition and for user verification.

PL

Systems for speech recognition and for verifying persons on the basis of text-independent speech are increasingly common. In speech recognition systems, we need to know what the tested person said. Such recognized words can be used to control various computer-controlled devices. In identity verification, what was said does not matter; what matters is who said it. In identity verification systems, each registered user has his or her own unique login. The recorded utterance of the user being verified is compared with all utterances in the database. If the login matches and the voice characteristics agree, the system accepts the person being verified. The article proposes speech recognition for controlling the movement of a CCTV camera and user verification based on text-independent speech for logging on to the system. Cepstral speech analysis is used to extract and encode the voice characteristics. Hidden Markov models are adopted as the recognizer. Besides the practical implementation of the described methods, the main aim of this work is to show two things: how the cepstral analysis mechanism should be modified for speech recognition, and how it should be modified for identity verification based on text-independent speech.
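The acceptance rule described above requires that the closest matching voice belongs to the claimed login. It can be sketched as a nearest-template check. This is an illustrative assumption, not the paper's method. Each enrolled utterance is represented by a single hypothetical feature vector (for instance, an averaged cepstral vector). Euclidean distance stands in for whatever scoring the author used. A hypothetical `max_distance` rejection threshold is also added.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(claimed_login, test_vector, enrolled, max_distance):
    """Accept only if the closest enrolled voice belongs to the claimed login
    and lies within max_distance.

    enrolled: dict mapping login -> list of reference feature vectors.
    max_distance: hypothetical rejection threshold.
    """
    if claimed_login not in enrolled:
        return False  # unknown login is rejected outright
    best_login, best_distance = None, math.inf
    # Compare the test utterance with every enrolled utterance in the set.
    for login, references in enrolled.items():
        for reference in references:
            distance = euclidean(test_vector, reference)
            if distance < best_distance:
                best_login, best_distance = login, distance
    return best_login == claimed_login and best_distance <= max_distance


enrolled = {"alice": [[1.0, 0.0], [0.8, 0.1]], "bob": [[0.0, 1.0]]}
print(verify("alice", [0.9, 0.1], enrolled, max_distance=0.5))
```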

6

A hybrid method of person verification with use independent speech and facial asymmetry

Kubanek M., Rydzek S.

Metody Informatyki Stosowanej | 2008 | no. 4 (Vol. 17)

91-99

EN

In person identification or verification, the primary interest is not in recognizing the words but in determining who is speaking them. In person identification systems, a test signal from an unknown speaker is compared with all known speaker signals in the set. The known speaker with the maximum probability is identified as the unknown speaker. In security systems based on person identification and verification, error-free identification is critical for safety. In person verification systems, a test signal from a known speaker is compared with the recorded signals in the set that are associated with the tested person's label. Each user has more than one recorded signal in the set. To increase security, this work proposes its own approach to person verification, based on text-independent speech and facial asymmetry. The audio features of a person's speech are extracted using cepstral speech analysis. The effectiveness of face recognition is improved by processing information about facial asymmetry in the most informative part of the face, the eye region.
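The abstract does not say how facial asymmetry is measured. One simple measure, shown here only as an assumption, is the mean absolute difference between an image region and its left-right mirror image. The sketch further assumes a grayscale crop of the eye region that is already centered on the face's vertical symmetry axis. The function name is hypothetical.

```python
def asymmetry_score(region):
    """Mean absolute difference between a grayscale region and its mirror image.

    region: 2-D list (rows of pixel intensities), assumed to be centered on
            the face's vertical symmetry axis, e.g. a crop around both eyes.
    A perfectly left-right symmetric region scores 0.0; larger values mean
    stronger asymmetry.
    """
    total = 0.0
    count = 0
    for row in region:
        mirrored = row[::-1]  # flip the row left-to-right
        for pixel, mirror_pixel in zip(row, mirrored):
            total += abs(pixel - mirror_pixel)
            count += 1
    return total / count if count else 0.0


symmetric_eyes = [[10, 50, 50, 10], [0, 90, 90, 0]]
print(asymmetry_score(symmetric_eyes))
```

In a hybrid system, a score like this would be combined with the speech-based score. The abstract does not describe how the two are fused.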

7

Analysis of signal of audio speech in process of speech recognition

Kubanek M.

Computing, Multimedia and Intelligent Techniques | 2006 | Vol. 2, no. 1

55-64

EN

The purpose of this work is to explain the theoretical issues and implementation techniques related to the fascinating field of speech recognition. The discussion focuses on some well-established and widely used speech coding standards that are relevant to speech recognition and speaker identification. Studying the most successful standards and understanding their principles, performance and limitations makes it possible to apply a particular technique to a given situation according to the underlying constraints. The ultimate goal is to develop next-generation algorithms that are improved in all aspects. The paper presents the author's own methods for determining the beginning and end of isolated words in speech audio. Cepstral speech analysis is applied to extract the audio features of a person's speech. Finally, the paper presents results of speech coding.
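The author's own endpoint-detection methods are not described in the abstract. As a point of reference, a common baseline marks a word as the span of frames whose short-time energy exceeds a fraction of the maximum frame energy. The sketch below implements that baseline with non-overlapping frames and a hypothetical `threshold_ratio` parameter. It is not the method proposed in the paper.

```python
def detect_endpoints(samples, frame_len, threshold_ratio=0.1):
    """Return (start_sample, end_sample) of an isolated word, or None.

    The word is taken to span from the first to the last frame whose
    short-time energy exceeds threshold_ratio * (maximum frame energy).
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame))  # short-time energy
    if not energies or max(energies) == 0:
        return None  # too short or completely silent
    threshold = threshold_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Silence, a burst of "speech", then silence again (toy signal).
signal = [0.0] * 100 + [1.0, -1.0] * 50 + [0.0] * 100
print(detect_endpoints(signal, frame_len=50))
```

Practical detectors often add a zero-crossing-rate test so that weak fricatives at word boundaries are not cut off.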

8

Perceptually motivated approaches to speech enhancement. Part 2, Psychoacoustic optimization of spectral weighting rules

Borowicz A., Petrovsky A. A.

Kwartalnik Elektroniki i Telekomunikacji | 2004 | Vol. 50, no. 3

395-409

EN

This paper focuses on the class of speech enhancement systems that exploit the psychoacoustic properties of the human ear. More advanced psychoacoustically motivated spectral weighting rules are described. The presented systems are analyzed and classified according to their similarity to a model of human hearing. In particular, the systems are compared in terms of musical noise suppression and speech intelligibility improvement. The advantages of perceptual approaches over conventional ones are also highlighted. Finally, perspectives for integrated psychoacoustically motivated speech enhancement and coding systems are discussed. The paper shows that the most promising combined scheme integrates a subband coder with a speech enhancement system based on a non-uniformly spaced filter bank.

PL

Perceptually motivated speech enhancement methods are reviewed and compared. The shortcomings of psychoacoustic solutions that rely on classical spectral weighting methods are pointed out. Based on the literature, various approaches to the psychoacoustic optimization of these methods are presented. The presented systems are classified according to their degree of agreement with the human auditory model. The results of applying psychoacoustic solutions are also compared in terms of their ability to suppress environmental noise and to prevent distortion of the speech signal. The comparison also covers combined echo cancellation and noise reduction systems. Finally, the prospects for integrating a speech enhancement system with a subband coding system are presented, highlighting psychoacoustic models as an element common to both systems.
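For contrast with the perceptual rules reviewed in the paper, the conventional power spectral subtraction rule computes a gain for each frequency bin from the noisy power spectrum and the estimated noise power spectrum. The over-subtraction factor `alpha` and the spectral floor `beta` below are the classic tools for limiting musical noise. Perceptually motivated variants instead adapt such parameters per frequency band, for example from an estimated masking threshold. The default values are illustrative only. The function is a sketch of the conventional baseline, not one of the reviewed systems.

```python
import math


def spectral_subtraction_gains(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Per-bin amplitude gains for power spectral subtraction.

    noisy_power: |X(k)|^2 of the noisy frame, one value per frequency bin.
    noise_power: estimated noise power per bin (e.g. averaged over pauses).
    alpha: over-subtraction factor; beta: spectral floor on the power gain.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    gains = []
    for x, n in zip(noisy_power, noise_power):
        if x <= 0:
            gains.append(math.sqrt(beta))  # empty bin: apply the floor
            continue
        power_gain = 1.0 - alpha * n / x
        gains.append(math.sqrt(max(power_gain, beta)))  # floor limits musical noise
    return gains


# A bin well above the noise is mildly attenuated; a noise-only bin hits the floor.
print(spectral_subtraction_gains([10.0, 1.0], [1.0, 1.0]))
```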

9

Speech coding and transmission in third-generation cellular telephony systems

Knapek B.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 2003 | no. 5

233-238

PL

Issues related to the coding and transmission of speech signals in the third-generation UMTS cellular system are described. The mechanisms responsible for ensuring the best possible connection quality are presented. Selected aspects of voice call quality are discussed.

EN

This article describes issues related to the coding and transmission of speech signals in the third-generation cellular system UMTS. Mechanisms for improving quality are presented, and selected quality-related aspects are considered.

10

The STF 2000 transcoder for GSM cellular systems

Riggs V., Fairfield R., Segura J.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 1998 | no. 8

579-586

PL

The GSM system, its features, services and architecture are described. The STF 2000 transcoder developed by Lucent Technologies is presented. The STF 2000 transcoder encodes and decodes signals and also adapts bit rates. Its modular structure allows gradual expansion and adaptation to changing requirements. The transcoder can process speech coded at the so-called full bit rate as well as half-rate coded speech.

EN

This paper describes the GSM cellular system, its features and services, and its system architecture. It also details the STF 2000 speech transcoding frame, one component of the Lucent Technologies GSM system. The STF 2000 provides speech encoding and decoding as well as data rate adaptation. It is cost-effective and has a flexible, modular architecture. It also supports "half-rate" speech coding.