Results found: 17

Search results
Searched in keywords: MFCC
help Sortuj według:

help Ogranicz wyniki do:
first rewind previous Strona / 1 next fast forward last
EN
Parkinson's disease (PD) is a progressive neurological disorder prevalent in old age. Past studies have shown that speech can be used as an early marker for the identification of PD. The disease affects a number of speech components such as phonation, speech intensity, articulation, and respiration, which degrades speech intelligibility. Speech feature extraction and classification have always been challenging tasks due to the non-stationarity and discontinuities of the speech signal. In this study, features based on empirical mode decomposition (EMD) are shown to capture the speech characteristics. A new feature, the intrinsic mode function cepstral coefficient (IMFCC), is proposed to efficiently represent the characteristics of Parkinsonian speech. The performance of the proposed features is assessed on two different datasets, dataset-1 and dataset-2, each containing 20 normal and 25 Parkinson-affected speakers. The results demonstrate that the proposed IMFCC feature provides superior classification accuracy on both datasets, with a significant increase of 10–20% in accuracy compared to standard acoustic and Mel-frequency cepstral coefficient (MFCC) features.
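The abstract does not spell out how the IMFCC is computed; one plausible reading, and only an assumption here, is that a speech frame is decomposed into intrinsic mode functions by EMD and cepstral-style coefficients are then taken as the DCT of the IMF log-energies. The sketch below follows that reading using the PyEMD package; the frame length, IMF count and DCT step are illustrative choices, not the authors' recipe.

```python
# Hypothetical IMFCC sketch: EMD -> per-IMF log-energy -> DCT.
# This is an illustrative guess at the feature, not the paper's exact recipe.
import numpy as np
from scipy.fftpack import dct
from PyEMD import EMD   # pip install EMD-signal

def imfcc_frame(frame, max_imf=8, n_coeff=6):
    """Cepstral-style coefficients from the IMF log-energies of one frame."""
    imfs = EMD().emd(frame, max_imf=max_imf)      # shape: (n_imfs, len(frame))
    energies = np.sum(imfs ** 2, axis=1) + 1e-12  # energy of each IMF
    log_e = np.log(energies)
    return dct(log_e, type=2, norm='ortho')[:n_coeff]

# Toy signal standing in for a 25 ms voiced speech frame at 16 kHz.
fs = 16000
t = np.arange(int(0.025 * fs)) / fs
frame = np.sin(2 * np.pi * 140 * t) + 0.3 * np.sin(2 * np.pi * 2300 * t)
print(imfcc_frame(frame))
```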
EN
Spectral compression is an effective and robust feature extraction technique for reducing the mismatch between training and testing data in the feature domain. In this paper we propose a new MFCC feature extraction method with non-uniform spectral compression for speech recognition in noisy environments. In this method, the energies at the outputs of the mel-scaled band-pass filters are compressed by different root values adjusted using information from the back-end of the speech recognition system. Using this new scheme of speech-recognizer-based non-uniform spectral compression (SRNSC) for mel-scaled filter-bank cepstral coefficients, substantial improvement is found for recognition in the presence of different additive noises at different SNR values on the TIMIT database, compared to standard MFCC and to features derived with cubic-root spectral compression.
PL
Spectral compression is an effective and robust feature extraction technique for reducing the mismatch between training and testing data in the feature domain. In this paper we propose a new MFCC feature extraction method with non-uniform spectral compression for speech recognition in noisy environments. In the described method, the energies at the outputs of the mel-scale band-pass filters are compressed by different root values determined from information provided by the back-end of the speech recognition system. Using this new scheme of speech-recognizer-based non-uniform spectral compression (SRNSC) for cepstral coefficients based on a mel-scale filter bank, a substantial improvement in recognition was found in the presence of different additive noises at different SNR values on the TIMIT database, compared to standard MFCC and to features derived with cubic-root spectral compression.
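A minimal sketch of the underlying idea, root rather than logarithmic compression of the mel filter-bank energies, using numpy, librosa and scipy. The per-band root values here are fixed, arbitrary numbers; in the paper they are adjusted from feedback provided by the recognition back-end, which is not reproduced.

```python
# Sketch of MFCC-style features with per-band root compression of the
# mel filter-bank energies (the paper tunes the roots from the recognizer
# back-end; here they are fixed, arbitrary values for illustration).
import numpy as np
import librosa
from scipy.fftpack import dct

def root_compressed_mfcc(y, sr, n_fft=512, hop=160, n_mels=26, n_mfcc=13, roots=None):
    power_spec = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_energy = mel_fb @ power_spec + 1e-10          # (n_mels, frames)
    if roots is None:                                  # non-uniform roots per band
        roots = np.linspace(0.08, 0.3, n_mels)
    compressed = mel_energy ** roots[:, None]          # root compression, not log
    return dct(compressed, type=2, axis=0, norm='ortho')[:n_mfcc]

# Toy input: white noise standing in for a noisy utterance.
sr = 16000
y = np.random.default_rng(0).standard_normal(sr).astype(np.float32)
print(root_compressed_mfcc(y, sr).shape)   # (13, n_frames)
```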
EN
Huge growth has been observed in the speech and speaker recognition field thanks to the many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, with classification performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed and feature optimization is then performed using the GA. In the second phase, training is conducted using the DNN. Evaluation and validation of the proposed model is done in a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. The paper also presents an evaluation of feature extraction methods such as linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
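The abstract names a GA over MFCC features with a DNN classifier but gives no configuration; the compact sketch below shows the general pattern only, with a binary mask selecting MFCC coefficients and cross-validated accuracy of a small scikit-learn MLP as the fitness. Population size, operators, the toy data and the MLP are all illustrative assumptions, not the paper's setup.

```python
# Sketch: genetic algorithm selecting MFCC coefficients, scored by a small
# neural network (generic stand-ins for the paper's GA/DNN configuration).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy data: 40-dim "MFCC feature vectors" for two classes (speakers/words).
X = rng.standard_normal((120, 40))
y = np.repeat([0, 1], 60)
X[y == 1, :5] += 1.5          # make a few coefficients informative

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(12, X.shape[1]))          # random binary masks
for generation in range(10):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]                # keep the best half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05              # mutation
        child[flip] ^= 1
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected coefficients:", np.flatnonzero(best))
```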
EN
One of the crucial aspects of environmental protection is continuous monitoring of the environment. A specific aspect is the estimation of bird species populations, which is particularly important for bird species in danger of extinction. Avian monitoring programmes are time- and money-consuming undertakings which are usually based on field expeditions. A remedy for this can be the automatic acoustic avian monitoring system described in the paper. The main components of the designed system are: a digital audio recorder for the acquisition of bird voices, a computer program that automatically recognizes bird species by the signals they emit (voices or other sounds), and an object-relational database accessed via the Internet. Optional system components are: a digital camera and camcorder, a bird-attracting device, a wireless data transmission module, a power supply with a solar panel, and a portable weather station. The system records bird voices and sends the recordings to the database. The recorded bird voices can also be provoked by the attracting device. The use of the wireless data transmission module and the solar-panel power supply allows long-term operation of the digital sound recorder in hard-to-access terrain. The recorded bird voices are analysed by the computer program and labelled with the automatically recognized bird species. The recognition accuracy of the program can optionally be enhanced by an expert system. Besides labelled sound recordings, the database can also store other information, such as: photos and films accompanying the recorded bird voices/sounds, information about the location of the observations/recordings (GPS position, description of the place of observation), information about bird features and behaviour, meteorological information, etc. On the basis of digital geographical/geological maps, the database can generate up-to-date maps of bird populations (presence, number of individuals of each species). Moreover, the database can trigger alerts in the case of a rapidly decreasing bird population. It is also possible to obtain new knowledge about bird species with data mining methods. The paper presents collected data on observed bird species (audio recordings, photos and films) as well as the results of experiments testing particular components of the automatic acoustic avian monitoring system.
EN
The real-time voice command recognition system developed for this study aims to increase situational awareness, and therefore the safety of navigation, especially during close manoeuvres of warships and the passage of commercial vessels in narrow waters. With the developed system, the safety of navigation, which has become especially important in precision manoeuvres, can be supported by voice-command-recognition-based software. The system was observed to work with 90.6% accuracy using Mel Frequency Cepstral Coefficients (MFCC) with Dynamic Time Warping (DTW), and with 85.5% accuracy using Linear Predictive Coding (LPC) with DTW.
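A minimal sketch of the MFCC-plus-DTW matching step: each stored command template and the incoming utterance are converted to MFCC sequences and compared by dynamic time warping, with the smallest normalised distance deciding the command. The command names, frame settings and toy tones are placeholders, not the study's vocabulary or data.

```python
# Sketch: isolated voice-command matching with MFCC sequences and DTW.
import numpy as np
import librosa

def mfcc_seq(y, sr):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # (frames, 13)

def dtw_distance(A, B):
    """Plain DTW between two feature sequences (rows are frames)."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

# Toy "recordings": tones standing in for recorded command templates.
sr = 16000
t = np.arange(sr // 2) / sr
templates = {"port": np.sin(2 * np.pi * 200 * t), "starboard": np.sin(2 * np.pi * 400 * t)}
query = np.sin(2 * np.pi * 210 * t)                      # unknown utterance

scores = {w: dtw_distance(mfcc_seq(query, sr), mfcc_seq(y, sr)) for w, y in templates.items()}
print(min(scores, key=scores.get), scores)
```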
EN
This paper presents the automatic genre classification of Indian Tamil music and Western music using timbral features and Fractional Fourier Transform (FrFT) based Mel Frequency Cepstral Coefficient (MFCC) features. The classifier model for the proposed system has been built using K-NN (K-Nearest Neighbours) and Support Vector Machine (SVM). In this work, the performance of various features extracted from music excerpts has been analysed to identify the appropriate feature descriptors for the two major genres of Indian Tamil music, namely Classical music (Carnatic-based devotional hymn compositions) and Folk music, and for the Western genres of Rock and Classical music from the GTZAN dataset. The results for Tamil music have shown that the feature combination of Spectral Roll-off, Spectral Flux, Spectral Skewness and Spectral Kurtosis, combined with Fractional MFCC features, outperforms all other feature combinations, yielding a higher classification accuracy of 96.05%, as compared to the accuracy of 84.21% with conventional MFCC. It has also been observed that the FrFT-based MFCC efficiently classifies the two Western genres of Rock and Classical music from the GTZAN dataset with a higher classification accuracy of 96.25%, compared to the classification accuracy of 80% with MFCC.
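The FrFT-based MFCC would need a fractional Fourier transform implementation, which is not attempted here; the sketch below only computes the conventional descriptors named in the abstract (spectral roll-off, flux, skewness and kurtosis) for one excerpt, using librosa and scipy, and summarises them by mean and standard deviation. Frame settings and the toy excerpt are placeholders.

```python
# Sketch of the conventional timbral descriptors named in the abstract
# (roll-off, flux, skewness, kurtosis); the FrFT-based MFCC is not shown.
import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def timbral_features(y, sr, n_fft=1024, hop=512):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))        # (bins, frames)
    rolloff = librosa.feature.spectral_rolloff(S=S, sr=sr)[0]
    flux = np.sqrt(np.sum(np.diff(S, axis=1) ** 2, axis=0))          # frame-to-frame change
    spec_skew = skew(S, axis=0)
    spec_kurt = kurtosis(S, axis=0)
    # Summarise each frame-level descriptor by its mean and standard deviation.
    feats = []
    for d in (rolloff, flux, spec_skew, spec_kurt):
        feats += [np.mean(d), np.std(d)]
    return np.array(feats)

sr = 22050
y = np.random.default_rng(1).standard_normal(5 * sr).astype(np.float32)  # toy excerpt
print(timbral_features(y, sr))
```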
EN
This paper presents the classification of musical instruments using Mel Frequency Cepstral Coefficients (MFCC) and Higher Order Spectral features. MFCC, cepstral, temporal, spectral, and timbral features have been widely used in the task of musical instrument classification. As a musical sound signal is generated by non-linear dynamics, the non-linearity and non-Gaussianity of musical instruments are important features which have not been considered in the past. In this paper, a hybridisation of MFCC and Higher Order Spectral (HOS) based features has been used in the task of musical instrument classification. HOS-based features have been used to provide instrument-specific information such as the non-Gaussianity and non-linearity of the musical instruments. The extracted features have been presented to a Counter Propagation Neural Network (CPNN) to identify the instruments and their family. For experimentation, isolated sounds of 19 musical instruments from the McGill University Master Sample (MUMS) sound database have been used. The proposed features show a significant improvement in the classification accuracy of the system.
8
Content available System rozpoznawania mowy z ograniczonym słownikiem (Speech recognition system with a limited vocabulary)
PL
The motivation of this work is to discuss and compare popular speech recognition algorithms on different systems. The collected information is presented in a relatively short form, without an in-depth analysis of the mathematical proofs, whose presentation would in any case require reference to separate specialist sources. Selected problems related to ASR (Automatic Speech Recognition) and the prospects for solving them are discussed. On the basis of available solutions, an application module was developed that allows collected recordings to be compared in terms of speech signal similarity and the results to be presented in tabular form. For demonstration purposes, the resulting library was used in a complete application that executes commands based on words spoken into a microphone. The results serve not so much as final conclusions on speech recognition as guidelines for further analyses and research. Despite the progress in ASR research, there are still no algorithms with an accuracy exceeding 95%. A motivation for further work may be, for example, the social exclusion of people who cannot rely on vision-based communication.
EN
The motivation of this work is to discuss popular ASR algorithms and compare them on various architectures. The collected results are presented in a relatively short form, without mathematical derivations, which would in any case require reference to separate specialist sources. Some problems associated with ASR (Automatic Speech Recognition) and the prospects for solving them are discussed. On the basis of available solutions, an application module was developed that allows collected recordings to be compared in terms of speech signal similarity and the results to be presented in tabular form. For presentation purposes, the resulting library was used in a complete application that executes commands based on words spoken into a microphone. The results serve not so much as final conclusions about ASR as clues for further analysis and research. Despite the advances in ASR research, there are still no algorithms with an accuracy exceeding 95%. A motivation for further work may be, for example, the social exclusion of people who cannot rely on vision-based communication.
PL
The subject of this article is the parameterization of the emotional speech signal using perceptual coefficients. The performance of MFCC coefficients is compared with that of HFCC coefficients and their associated dynamic parameters. The effectiveness of the selected coefficients was evaluated on an emotional speech database.
EN
The following paper presents the parameterization of emotional speech using perceptual coefficients. A comparison of MFCC with HFCC coefficients and their associated dynamic parameters is presented. The efficiency of the selected coefficients was evaluated on an emotional speech database.
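HFCC extraction is not part of common Python audio libraries, so the sketch below shows only the MFCC half with its dynamic (delta and delta-delta) parameters, which is the standard way such dynamic features are formed; the toy signal stands in for an emotional speech sample.

```python
# Sketch: MFCC with delta and delta-delta (dynamic) parameters via librosa.
# HFCC would follow the same pattern with a different filter bank (not shown).
import numpy as np
import librosa

sr = 16000
y = np.random.default_rng(2).standard_normal(sr).astype(np.float32)  # toy utterance

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
d1 = librosa.feature.delta(mfcc)            # first-order dynamics
d2 = librosa.feature.delta(mfcc, order=2)   # second-order dynamics
features = np.vstack([mfcc, d1, d2])        # (39, n_frames)
print(features.shape)
```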
EN
The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description of the development and implementation of a hybrid speech recognition algorithm using NN and HMM, together with a presentation of the correctness verification results.
PL
The aim of the article is to present hybrid algorithms combining the advantages of artificial neural networks and hidden Markov models in speech recognition applications for control purposes. The scope of the work includes a review of currently used solutions and a description and analysis of the implementation of selected neural network (NN) structures and hidden Markov models (HMM). The main part of the article consists of a description of the development of a hybrid speech recognition algorithm using NN and HMM and a presentation of the correctness verification results.
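The NN/HMM hybrid itself is not reproduced here. As a hedged baseline illustrating only the HMM half of such a recognizer, the sketch below trains one GaussianHMM (hmmlearn) per command word on MFCC frames and classifies by maximum log-likelihood; the words, signals and model sizes are toy placeholders.

```python
# Hedged baseline: per-word GaussianHMM recognizer over MFCC frames
# (the plain HMM part only; the paper's NN/HMM hybrid is not reproduced).
import numpy as np
import librosa
from hmmlearn import hmm

rng = np.random.default_rng(3)
sr = 16000

def mfcc_frames(y):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T    # (frames, 13)

def toy_word(freq):
    t = np.arange(sr // 2) / sr
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(t.size)

# Train one HMM per command word on a few toy examples.
models = {}
for word, freq in {"start": 300, "stop": 700}.items():
    examples = [mfcc_frames(toy_word(freq)) for _ in range(3)]
    X = np.vstack(examples)
    lengths = [len(e) for e in examples]
    m = hmm.GaussianHMM(n_components=3, covariance_type='diag', n_iter=20, random_state=0)
    m.fit(X, lengths)
    models[word] = m

test = mfcc_frames(toy_word(310))            # should resemble "start"
scores = {w: m.score(test) for w, m in models.items()}
print(max(scores, key=scores.get), scores)
```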
EN
The aim of this study was to assess the applicability of Mel Frequency Cepstral Coefficients (MFCC) of voice samples in diagnosing vocal nodules and polyps. Patients’ voice samples were analysed acoustically with the measurement of MFCC and values of the first three formants. Classification of mel coefficients was performed by applying the Sammon Mapping and Support Vector Machines. For the tests conducted on 95 patients, voice disorders were detected with accuracy reaching approx. 80%.
PL
The aim of this work was to assess the applicability of the analysis of Mel Frequency Cepstral Coefficients (MFCC) of patients' recorded voice samples in supporting the diagnosis of vocal nodules and polyps. The patients' speech recordings were subjected to acoustic analysis using MFCC parameters and the values of the first three formants. The Sammon mapping and Support Vector Machines were used to classify the cepstral coefficients. In tests performed on 95 patients' speech recordings, voice disorders were detected with approximately 80% accuracy.
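A minimal sketch of the classification step under simplifying assumptions: each recording is summarised by its mean MFCC vector and classified with a Support Vector Machine (scikit-learn). The Sammon mapping used in the paper has no standard scikit-learn implementation and is omitted, and the synthetic signals below are merely stand-ins for patient recordings.

```python
# Sketch: mean-MFCC vectors classified with an SVM (scikit-learn).
# The paper's Sammon mapping step is omitted; data here are synthetic.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
sr = 16000

def mean_mfcc(y):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12).mean(axis=1)

# Synthetic stand-ins: "healthy" vs "disordered" voices as tones with
# different amounts of noise (purely illustrative, not clinical data).
def toy_voice(noisy):
    t = np.arange(sr) / sr
    noise = (0.6 if noisy else 0.05) * rng.standard_normal(t.size)
    return np.sin(2 * np.pi * 180 * t) + noise

X = np.array([mean_mfcc(toy_voice(noisy)) for noisy in [False] * 20 + [True] * 20])
y = np.array([0] * 20 + [1] * 20)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = SVC(kernel='rbf', C=1.0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```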
12
Content available remote Analysis of differences between MFCC after multiple GSM transcodings
EN
This paper presents the results of studies on the effects of multiple speech transcoding operations in the case of the GSM standard with 8 kSps and 16 kSps sampling rates. The differences between the MFCC coefficients obtained by successive transcodings were considered. The aim of the comparison is to check the possibility of separating the data and detecting the GSM encoder used. In the research we used the TIMIT database recordings, transcoded four times by GSM codecs. The possibility of detecting the encoder type was analysed based on the differences between the curvilinear approximations of the MFCC coefficient errors.
PL
The article presents the results of research on the effect of multiple transcoding of an audio signal sampled at 8 kSps and 16 kSps for the GSM standard. The differences between the MFCC coefficients obtained as a result of successive transcodings were analysed. The main aim of the comparison is to check the possibility of data separation and detection of the GSM encoder used in transmission. The TIMIT speech database, transcoded four times by GSM codecs, was used in the experiment. The possibility of detecting the encoder type was analysed on the basis of the differences between curvilinear approximations of the MFCC coefficient errors. (Analysis of the effect of multiple GSM transcodings on the differences between MFCC coefficients.)
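GSM transcoding itself needs external codecs and is not reproduced; the sketch below only illustrates the analysis step described in the abstract, that is, per-coefficient differences between the MFCC of a reference signal and of a degraded copy, followed by a low-order polynomial ("curvilinear") fit of those errors. Additive noise stands in for a codec pass.

```python
# Sketch of the analysis step: MFCC differences between a reference signal
# and a degraded copy, approximated by a low-order polynomial fit.
# (Real GSM transcoding would need external codecs; noise stands in here.)
import numpy as np
import librosa

rng = np.random.default_rng(5)
sr = 8000
y_ref = rng.standard_normal(2 * sr).astype(np.float32)          # reference signal
y_deg = y_ref + 0.1 * rng.standard_normal(y_ref.size).astype(np.float32)

mfcc_ref = librosa.feature.mfcc(y=y_ref, sr=sr, n_mfcc=13)
mfcc_deg = librosa.feature.mfcc(y=y_deg, sr=sr, n_mfcc=13)

error = np.mean(np.abs(mfcc_ref - mfcc_deg), axis=1)            # mean error per coefficient
coeff_index = np.arange(len(error))
poly = np.polyfit(coeff_index, error, deg=3)                    # "curvilinear" approximation
print("fitted polynomial coefficients:", poly)
```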
EN
In this paper, a speaker recognition system and the respective experiments based on telephone-quality speech signals are presented and reported. First, the speech signals are transmitted using regular GSM or analog telephone systems. The recorded signals are used as input to the Gaussian mixture model based speaker recognition system. The results suggest that the parameters of MFCC extraction should be tailored to the signal quality.
PL
The article presents experiments with a speaker recognition system operating on telephone-quality speech signals. First, the speech signal was transmitted through a real telephone channel including both a GSM codec and the analog standard. The signal obtained in this way was recorded and used to test speaker recognition based on Gaussian mixture models. The results obtained indicate that the parameters of MFCC computation should be adjusted to the signal quality.
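A minimal sketch of GMM-based speaker recognition with scikit-learn, assuming the usual recipe: one Gaussian mixture per speaker fitted on MFCC frames, with a test utterance assigned to the model giving the highest average log-likelihood. The signals and model sizes below are toy placeholders, not telephone-quality recordings.

```python
# Sketch: GMM speaker recognition over MFCC frames (scikit-learn).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
sr = 8000

def mfcc_frames(y):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T        # (frames, 13)

def toy_speaker(pitch):
    t = np.arange(2 * sr) / sr
    return np.sin(2 * np.pi * pitch * t) + 0.1 * rng.standard_normal(t.size)

# Enrol two toy "speakers" with one GMM each.
models = {}
for name, pitch in {"speaker_A": 120, "speaker_B": 220}.items():
    gmm = GaussianMixture(n_components=4, covariance_type='diag', random_state=0)
    gmm.fit(mfcc_frames(toy_speaker(pitch)))
    models[name] = gmm

test = mfcc_frames(toy_speaker(125))                             # closer to speaker_A
scores = {name: gmm.score(test) for name, gmm in models.items()} # mean log-likelihood
print(max(scores, key=scores.get), scores)
```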
14
Content available remote Automatyczne znakowanie danych audio na platformie serwera baz danych Oracle (Automatic labelling of audio data on the Oracle database server platform)
PL
This chapter presents the design of a system for the automatic labelling of audio recordings. The system is based on dynamic time warping (DTW) algorithms operating on mel-cepstral and human-factor cepstral coefficients. The automatic labelling mechanism will use a fully configurable reference database of recordings and tag mappings. Finally, tests confirming the high quality of the proposed algorithms are presented.
EN
This chapter provides a description of an automated audio tagging system. The system is based on an optimized Dynamic Time Warping (DTW) algorithm, mel-cepstral coefficients (MFCC) and human-factor cepstral coefficients (HFCC). In addition, the tagging process relies on a fully configurable reference audio database with tag mappings. The presented test results confirm the high quality of the proposed algorithms.
EN
The paper presents a method for diagnosing imminent failure conditions of a synchronous motor. The method is based on the analysis of acoustic signals generated by the synchronous motor. The sound recognition system is based on data processing algorithms such as MFCC and GSDM. Software for recognizing the sounds of a synchronous motor was implemented. The studies were carried out for four imminent failure conditions of the synchronous motor. The results confirm that the system can be useful for detecting damage and protecting the motors.
EN
The article presents two methods of determining cepstral parameters commonly applied in digital signal processing, in particular in speech recognition systems. The presented solutions are part of a project aimed at developing applications that allow the Windows operating system to be controlled by voice using MSAA (Microsoft Active Accessibility). The analysed voice signal is presented visually at each of the crucial stages of computing the cepstral coefficients.
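The article does not spell out here which two methods are meant; a common pair, shown below purely as an assumption, is the real cepstrum obtained from the inverse FFT of the log magnitude spectrum and the mel-cepstrum (MFCC-style) obtained via a mel filter bank and DCT.

```python
# Two common ways of obtaining cepstral parameters from one speech frame
# (shown as an assumption; the article's exact pair of methods may differ):
# 1) real cepstrum via inverse FFT of the log magnitude spectrum,
# 2) mel-cepstrum (MFCC-style) via mel filter bank and DCT.
import numpy as np
import librosa
from scipy.fftpack import dct

sr = 16000
t = np.arange(int(0.032 * sr)) / sr
frame = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 1800 * t)

# Method 1: real cepstrum.
spectrum = np.fft.rfft(frame * np.hamming(frame.size))
real_cepstrum = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12))

# Method 2: mel-cepstrum.
n_fft = frame.size
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=20)
power = np.abs(np.fft.rfft(frame * np.hamming(frame.size), n=n_fft)) ** 2
mel_energy = mel_fb @ power
mfcc = dct(np.log(mel_energy + 1e-12), type=2, norm='ortho')[:13]

print(real_cepstrum[:13])
print(mfcc)
```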
17
EN
The results of intra- and interspeaker distances between MFCC vectors obtained from speech samples of eight well-known Polish personalities and their imitations performed by cabaret entertainers are presented and discussed. The intraspeaker distances between MFCC vectors representing normal and disguised speech samples of 10 speakers are also presented. The analysis of the measurement results indicated that, utilizing Euclidean distance between MFCC vectors, it is possible to differentiate the original speakers from the imitators. On the other hand, the MFCC vectors cannot be used to confirm the speaker's identity in the case of voice disguise.
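A minimal sketch of the distance measure described: each voice sample is reduced to a mean MFCC vector and intra- and interspeaker comparisons are plain Euclidean distances between those vectors. The toy tones below merely stand in for the original and imitated voice samples.

```python
# Sketch: Euclidean distances between mean MFCC vectors of voice samples.
import numpy as np
import librosa

rng = np.random.default_rng(7)
sr = 16000

def mean_mfcc(y):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

def toy_sample(pitch):
    t = np.arange(sr) / sr
    return np.sin(2 * np.pi * pitch * t) + 0.05 * rng.standard_normal(t.size)

original_1 = mean_mfcc(toy_sample(110))      # original speaker, sample 1
original_2 = mean_mfcc(toy_sample(112))      # original speaker, sample 2
imitation = mean_mfcc(toy_sample(180))       # impersonation of the same speaker

intra = np.linalg.norm(original_1 - original_2)     # intraspeaker distance
inter = np.linalg.norm(original_1 - imitation)      # original vs imitator
print(f"intraspeaker: {intra:.2f}, original vs imitation: {inter:.2f}")
```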