Search results
Searched in keywords: MFCC
Results found: 25
EN
This study aimed to differentiate individuals with Parkinson's disease (PD) from those with other neurological disorders (ND) by analyzing voice samples, considering the association between voice disorders and PD. Voice samples were collected from 76 participants using different recording devices and conditions, with participants instructed to sustain the vowel /a/ comfortably. PRAAT software was employed to extract features including autocorrelation (AC), cross-correlation (CC), and Mel frequency cepstral coefficients (MFCC) from the voice samples. Principal component analysis (PCA) was utilized to reduce the dimensionality of the features. Classification Tree (CT), Logistic Regression, Naive Bayes (NB), Support Vector Machines (SVM), and Ensemble methods were employed as supervised machine learning techniques for classification. Each method provided distinct strengths and characteristics, facilitating a comprehensive evaluation of their effectiveness in distinguishing PD patients from individuals with other neurological disorders. The Naive Bayes kernel, using seven PCA-derived components, achieved the highest accuracy rate of 86.84% among the tested classification methods. It is worth noting that classifier performance may vary based on the dataset and specific characteristics of the voice samples. In conclusion, this study demonstrated the potential of voice analysis as a diagnostic tool for distinguishing PD patients from individuals with other neurological disorders. By employing a variety of voice analysis techniques and utilizing different machine learning algorithms, including Classification Tree, Logistic Regression, Naive Bayes, Support Vector Machines, and Ensemble methods, a notable accuracy rate was attained. However, further research and validation using larger datasets are required to consolidate and generalize these findings for future clinical applications.
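A minimal sketch of the pipeline described above, assuming librosa in place of PRAAT for MFCC extraction and scikit-learn for PCA and the Naive Bayes classifier; the feature matrix and labels below are synthetic placeholders for the 76 recordings, not the study's data.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

def voice_features(path, sr=16000, n_mfcc=13):
    """Mean MFCC vector for one sustained-/a/ recording (librosa standing in for PRAAT)."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Synthetic placeholders for the 76 per-recording feature vectors and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(76, 13))
y = rng.integers(0, 2, size=76)          # 1 = PD, 0 = other neurological disorder

model = make_pipeline(PCA(n_components=7), GaussianNB())
print(cross_val_score(model, X, y, cv=5).mean())
```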
EN
Identifying and assessing Parkinson's disease in its early stages is critical to effectively monitoring the disease's progression. Methodologies based on machine-learning-enhanced speech analysis are gaining popularity as the potential of this field is revealed. Acoustic features in particular are used in a variety of machine learning algorithms and could serve as indicators of the general health of subjects' voices. In this research paper, a novel method is introduced for the automated detection of Parkinson's disease through speech signal analysis: a support vector machine (SVM) classifier and an Artificial Neural Network (ANN) are used to evaluate and classify the data based on two acoustic features, Bark Frequency Cepstral Coefficients (BFCC) and Mel Frequency Cepstral Coefficients (MFCC). These features are extracted from signals denoised using Empirical Mode Decomposition (EMD). The best results, obtained for a dataset of 38 participants, were achieved with the BFCC coefficients, with an accuracy of up to 92.10%. These results confirm that the EMD-BFCC-SVM method can contribute to the detection of Parkinson's disease.
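A rough sketch of the EMD-denoise, cepstral-feature, SVM idea, assuming the PyEMD package (pip name EMD-signal) for the decomposition and librosa MFCCs standing in for the paper's BFCC/MFCC pair; the input signal is a synthetic placeholder.

```python
import numpy as np
import librosa
from PyEMD import EMD                 # pip package "EMD-signal"
from sklearn.svm import SVC

sr = 16000
signal = np.random.default_rng(1).normal(size=sr)   # placeholder for a 1 s recording

imfs = EMD()(signal)                  # intrinsic mode functions, highest frequency first
denoised = imfs[1:].sum(axis=0)       # crude denoising: drop the noisiest IMF

mfcc = librosa.feature.mfcc(y=denoised.astype(np.float32), sr=sr, n_mfcc=13)
features = mfcc.mean(axis=1)          # one feature vector per recording

clf = SVC(kernel="rbf")               # would be fitted on vectors from all 38 participants
```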
EN
Parkinson's disease is a recognizable clinical syndrome with a variety of causes and clinical presentations; it represents a rapidly growing neurodegenerative disorder. Since about 90 percent of Parkinson's disease sufferers have some form of early speech impairment, recent studies on the telediagnosis of Parkinson's disease have focused on recognizing voice impairments from vowel phonations or the subjects' discourse. This paper presents a new approach to Parkinson's disease detection from speech sounds that is based on CNN and LSTM models and uses two categories of features: Mel Frequency Cepstral Coefficients (MFCC) and Gammatone Cepstral Coefficients (GTCC), obtained from noise-removed speech signals with comparative EMD-DWT and DWT-EMD analysis. The proposed model is divided into three stages. In the first step, noise is removed from the signals using the EMD-DWT and DWT-EMD methods. In the second step, the GTCC and MFCC are extracted from the enhanced audio signals. The classification is carried out in the third step by feeding these features into the LSTM and CNN models, which are designed to capture sequential information from the extracted features. The experiments are performed on the PC-GITA and Sakar datasets using 10-fold cross-validation. The highest classification accuracy for the Sakar dataset reached 100% for both EMD-DWT-GTCC-CNN and DWT-EMD-GTCC-CNN; for the PC-GITA dataset, the accuracy reached 100% for EMD-DWT-GTCC-CNN and 96.55% for DWT-EMD-GTCC-CNN. The results of this study indicate that GTCC features are more appropriate and accurate for the assessment of PD than MFCC.
EN
A speech recognition system extracts textual data from the speech signal. Research in the speech recognition domain is challenging due to the large variability of the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recognition accuracy. Speech is highly non-stationary in nature, and analysis is therefore carried out over short time-domain windows or frames. In the speech recognition task, cepstral features (Mel frequency cepstral coefficients, MFCC) are commonly used and are extracted for each short time frame. The effectiveness of these features depends on the duration of the chosen time window. The present study investigates the optimal time-window duration for the extraction of cepstral features in the context of a speech recognition task. A speaker-independent speech recognition system for the Kannada language was considered for the analysis. In the current work, speech utterances from a Kannada news corpus recorded from different speakers were used to create the speech database. The Hidden Markov Model Toolkit (HTK) was used to implement the speech recognition system. The MFCC, along with their first and second derivative coefficients, were used as feature vectors. The pronunciation dictionary required for the study was built manually for the monophone system. Experiments were carried out and results analysed for different time-window lengths, using an overlapping Hamming window. The best average word recognition accuracy of 61.58% was obtained for a window length of 110 ms. This recognition accuracy is comparable with similar work found in the literature. The experiments have shown that the best word recognition performance can be achieved by tuning the window length to its optimum value.
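A small sketch of sweeping the analysis-window length during MFCC (plus delta and delta-delta) extraction, assuming librosa rather than HTK; the synthetic chirp stands in for a Kannada utterance and the candidate durations are illustrative.

```python
import numpy as np
import librosa

sr = 16000
y = librosa.chirp(fmin=100, fmax=4000, sr=sr, duration=3.0)   # stand-in utterance

for win_ms in (25, 50, 110):                 # candidate window durations
    win = int(sr * win_ms / 1000)
    n_fft = int(2 ** np.ceil(np.log2(win)))  # FFT size >= window length
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=n_fft,
                                win_length=win, hop_length=win // 2,
                                window="hamming")
    feats = np.vstack([mfcc,
                       librosa.feature.delta(mfcc),
                       librosa.feature.delta(mfcc, order=2)])
    print(win_ms, "ms ->", feats.shape)      # 39-dimensional vectors per frame
```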
EN
Early and precise knowledge of asthma severity levels may help in effective precautions, proper medication, and follow-up planning for patients. Keeping this in view, we propose a telemedicine application that is capable of automatically identifying the severity level of asthma patients using machine learning techniques. Respiratory sounds of 111 asthmatic patients were collected; the dataset consisted of 34 mild, 36 moderate, and 41 severe cases. Data was collected from two auscultation locations, i.e., the trachea and the lower lung base. The first dataset was used for the testing and training (cross-validation) of classifiers, while a second database was used for the validation of the system. Mel-frequency cepstral coefficient (MFCC) features were extracted to discriminate the severity levels. Then, ensemble and k-nearest neighbor (KNN) classifiers were used for classification, both on the two auscultation locations jointly and on each individually. The developed telemedicine application, based on MFCC features and these classifiers, automatically detects wheeze and classifies it into a severity level. The extracted features showed significant differences (p < 0.05) for all severity levels. Based on the testing, training, and validation results, the performance of the ensemble and KNN classifiers was comparable. Classification based on MFCC features provides a maximum accuracy of 99%, 90%, and 89% for mild, moderate, and severe samples, respectively. The average rate of wheeze detection was observed to be 93%. The maximum accuracy of validation of the telemedicine application was found to be 57%, 72%, and 76% for mild, moderate, and severe levels, respectively.
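A minimal sketch comparing an ensemble classifier and k-NN on per-recording MFCC features for the three severity classes, assuming scikit-learn; the feature matrix is a synthetic placeholder for the 111 recordings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(111, 13))               # placeholder mean MFCCs per recording
y = np.repeat([0, 1, 2], [34, 36, 41])       # mild / moderate / severe

for clf in (RandomForestClassifier(random_state=0), KNeighborsClassifier(5)):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```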
Voice pathology assessment using x-vectors approach
EN
Voice pathology assessment using sustained vowels has proven to be effective and reliable. However, only a few studies regarding detection of pathological speech based on continuous speech are available. In this study we evaluate the usefulness of various regression models trained on continuous speech recordings from Saarbruecken Voice Database in the detection of voice pathologies. The recordings were used for extraction of speaker embeddings called x-vectors based on mel-frequency cepstral coefficients and gammatone frequency cepstral coefficients. Since the dataset used in this study is imbalanced, various over- and undersampling techniques were applied to the training set to ensure robustness of models’ decision boundaries. The models were trained on both imbalanced and resampled training sets using 5-fold cross-validation. The best results were obtained for Multi Layer Perceptron trained on GFCC-based x-vectors, achieving accuracy of 0.8184, F1-score of 0.8212, and ROC AUC score of 0.8810 for the testing set.
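A minimal sketch of oversampling inside cross-validation before training an MLP, assuming imbalanced-learn and scikit-learn; the 512-dimensional matrix below is a synthetic placeholder for precomputed x-vectors, whose extraction is not shown.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 512))                     # placeholder x-vector embeddings
y = np.r_[np.ones(160, int), np.zeros(40, int)]     # imbalanced: pathological vs healthy

# Resampling inside the pipeline keeps synthetic samples out of the validation folds.
model = make_pipeline(SMOTE(random_state=0),
                      MLPClassifier(hidden_layer_sizes=(128,), max_iter=500,
                                    random_state=0))
print(cross_val_score(model, X, y, cv=5).mean())
```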
EN
Heart diseases cause many deaths around the world every year, and this death rate makes them the leading killer diseases. Early diagnosis can help decrease these deaths and save lives. To ensure a good diagnosis, people must undergo a series of clinical examinations and analyses, which makes the diagnostic process expensive and not accessible to everyone. Speech analysis is a strong tool that can address this task and offer a new way to discriminate between healthy people and people with cardiovascular diseases. Our previous paper treated this task using dysphonia measurements to differentiate between people with cardiovascular disease and healthy people, and we were able to reach 81.5% prediction accuracy. This time we changed the method to increase the accuracy, extracting the voiceprint using 13 Mel-Frequency Cepstral Coefficients and the pitch from voices provided by a database containing 75 subjects (35 with cardiovascular diseases, 40 healthy); three recordings of sustained vowels (aaaaa…, ooooo… and iiiiiiii….) were collected from each one. We used the k-nearest-neighbour classifier to train a model and to classify the test samples. We were able to outperform the previous results, reaching 95.55% prediction accuracy.
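A minimal sketch of the 14-dimensional voiceprint (13 mean MFCCs plus mean pitch) fed to a k-NN classifier, assuming librosa for both features; the file path, training paths and labels in the comments are hypothetical placeholders.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def voiceprint(path, sr=16000):
    """13 mean MFCCs plus mean pitch for one sustained-vowel recording."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)     # frame-level pitch track
    return np.append(mfcc, f0.mean())

clf = KNeighborsClassifier(n_neighbors=5)
# In practice: clf.fit(np.vstack([voiceprint(p) for p in train_paths]), train_labels),
# where train_paths/train_labels (hypothetical names) cover the 75 subjects' recordings.
```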
EN
Measurement of vital signs of the human body such as heart rate, blood pressure, body temperature and respiratory rate is an important part of diagnosing medical conditions and these are usually measured using medical equipment. In this paper, we propose to estimate an important vital sign – heart rate from speech signals using machine learning algorithms. Existing literature, observation and experience suggest the existence of a correlation between speech characteristics and physiological, psychological as well as emotional conditions. In this work, we estimate the heart rate of individuals by applying machine learning based regression algorithms to Mel frequency cepstrum coefficients, which represent speech features in the spectral domain as well as the temporal variation of spectral features. The estimated heart rate is compared with actual measurement made using a conventional medical device at the time of recording speech. We obtain estimation accuracy close to 94% between the estimated and actual measured heart rate values. Binary classification of heart rate as ‘normal’ or ‘abnormal’ is also achieved with 100% accuracy. A comparison of machine learning algorithms in terms of heart rate estimation and classification accuracy is also presented. Heart rate measurement using speech has applications in remote monitoring of patients, professional athletes and can facilitate telemedicine.
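A minimal sketch of regressing heart rate from summary MFCC statistics, assuming scikit-learn with a random-forest regressor standing in for the paper's regression algorithms; the feature matrix and reference heart rates are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(4)
X = rng.normal(size=(120, 26))          # e.g. 13 MFCC means + 13 MFCC variances per clip
bpm = rng.uniform(55, 110, size=120)    # reference heart-rate measurements

est = cross_val_predict(RandomForestRegressor(random_state=0), X, bpm, cv=5)
print(np.mean(np.abs(est - bpm)))       # mean absolute error in beats per minute
```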
EN
Parkinson's disease (PD) is a progressive neurological disorder prevalent in old age. Past studies have shown that speech can be used as an early marker for the identification of PD. The disease affects a number of speech components, such as phonation, speech intensity, articulation, and respiration, which alters speech intelligibility. Speech feature extraction and classification have always been challenging tasks due to the non-stationarity and discontinuities of the speech signal. In this study, empirical mode decomposition (EMD) based features are demonstrated to capture the speech characteristics. A new feature, the intrinsic mode function cepstral coefficient (IMFCC), is proposed to efficiently represent the characteristics of Parkinson speech. The performance of the proposed features is assessed on two different datasets, dataset-1 and dataset-2, each containing 20 normal and 25 Parkinson-affected subjects. The results demonstrate that the proposed intrinsic mode function cepstral coefficient feature provides superior classification accuracy on both datasets, with a significant increase of 10-20% in accuracy compared to standard acoustic and Mel-frequency cepstral coefficient (MFCC) features.
EN
Spectral compression is an effective and robust feature extraction technique for reducing the mismatch between training and testing data in the feature domain. In this paper we propose a new MFCC feature extraction method with non-uniform spectral compression for speech recognition in noisy environments. In this method, the energies at the outputs of the mel-scaled band-pass filters are compressed by different root values adjusted on the basis of information from the back-end of the speech recognition system. Using this new scheme of speech-recognizer-based non-uniform spectral compression (SRNSC) for mel-scaled filter-bank-based cepstral coefficients, substantial improvement is found for recognition in the presence of different additive noises at different SNR values on the TIMIT database, compared to standard MFCC and features derived with cubic-root spectral compression.
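A rough sketch of root (rather than log) compression of mel filter-bank energies before the DCT, assuming librosa and SciPy; the per-band exponents below are arbitrary placeholders, whereas the paper adjusts them using feedback from the recognizer back-end.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

sr = 16000
y = librosa.chirp(fmin=100, fmax=4000, sr=sr, duration=2.0)         # stand-in signal
mel_energy = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=26)  # (bands, frames)

roots = np.linspace(0.05, 0.33, mel_energy.shape[0])    # hypothetical per-band roots
compressed = mel_energy ** roots[:, None]               # non-uniform root compression
cepstra = dct(compressed, type=2, axis=0, norm="ortho")[:13]   # 13 cepstral coefficients
print(cepstra.shape)
```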
EN
Huge growth has been observed in the speech and speaker recognition field as many artificial intelligence algorithms are applied. Speech conveys messages via the language being spoken as well as emotion, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, with classification performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed, and feature optimization is then performed using the GA. In the second phase, training is conducted using the DNN. Evaluation and validation of the proposed model are carried out in a real environment, and efficiency is calculated on the basis of parameters such as accuracy, precision, recall, sensitivity, and specificity. This paper also presents an evaluation of feature extraction methods such as linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
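A deliberately toy sketch of genetic-algorithm feature selection over MFCC-style coefficients, with a small scikit-learn MLP standing in for the paper's DNN; the data, mask encoding, fitness function and GA operators are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 39))          # placeholder: 13 MFCC + deltas + delta-deltas
y = rng.integers(0, 4, size=200)        # placeholder speaker/word labels

def fitness(mask):
    """Cross-validated accuracy of a small network on the selected coefficients."""
    if not mask.any():
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=200, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(8, X.shape[1])).astype(bool)   # binary feature masks
for _ in range(4):                                            # a few GA generations
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-4:]]                    # keep the fittest
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(4, size=2)]
        cut = rng.integers(1, X.shape[1])
        child = np.r_[a[:cut], b[cut:]]                       # one-point crossover
        child ^= rng.random(X.shape[1]) < 0.05                # bit-flip mutation
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(int(best.sum()), "coefficients selected")
```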
EN
One of the crucial aspects of environmental protection is continuous monitoring of the environment. A specific aspect is the estimation of bird species populations, which is particularly important for species in danger of extinction. Avian monitoring programmes are time- and money-consuming actions that are usually based on field expeditions. A certain remedy for this can be the automatic acoustic avian monitoring system described in the paper. The main components of the designed system are: a digital audio recorder for the acquisition of bird voices, a computer program that automatically recognizes bird species from the signals they emit (voices or other sounds), and an object-relational database accessed via the Internet. Optional system components include: a digital camera and camcorder, a bird-attracting device, a wireless data transmission module, a power supply with a solar panel, and a portable weather station. The system records bird voices and sends the recordings to the database. Recorded bird voices can also be provoked by the attracting device. The use of the wireless data transmission module and the solar-panel power supply allows long-term operation of the digital sound recorder in hard-to-access terrain. Recorded bird voices are analysed by the computer program and labelled with the automatically recognized bird species. The recognition accuracy of the program can optionally be enhanced by an expert system. Besides labelled sound recordings, the database can also store much other information, such as: photos and films accompanying the recorded bird voices/sounds, information about the localization of observations/recordings (GPS position, description of the place of observation), information about bird features and behaviour, meteorological information, etc. Based on geographical/geological digital maps, the database can generate up-to-date maps of bird populations (presence, number of individuals of each species). Moreover, the database can trigger alerts in case of a rapidly decreasing bird population. It is also possible to obtain new knowledge about bird species with data mining methods. The paper presents collected data on observed bird species (audio recordings, photos and films) as well as the results of experiments testing particular components of the automatic acoustic avian monitoring system.
EN
The real-time voice command recognition system used in this study aims to increase situational awareness, and therefore the safety of navigation, especially during the close manoeuvres of warships and the passage of commercial vessels in narrow waters. With the developed system, the safety of navigation, which is especially important in precision manoeuvres, can be supported by voice-command-recognition software. The system was observed to work with 90.6% accuracy using Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) parameters and with 85.5% accuracy using Linear Predictive Coding (LPC) and DTW parameters.
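A minimal sketch of MFCC plus DTW template matching for command recognition, assuming librosa; the two synthetic chirps stand in for a stored command template and an incoming utterance.

```python
import librosa

sr = 16000
# Synthetic signals standing in for a stored command template and an incoming utterance.
template_sig = librosa.chirp(fmin=200, fmax=2000, sr=sr, duration=1.0)
query_sig = librosa.chirp(fmin=200, fmax=2000, sr=sr, duration=1.3)

template = librosa.feature.mfcc(y=template_sig, sr=sr, n_mfcc=13)
query = librosa.feature.mfcc(y=query_sig, sr=sr, n_mfcc=13)

D, wp = librosa.sequence.dtw(X=template, Y=query, metric="euclidean")
cost = D[-1, -1] / len(wp)        # path-normalised alignment cost
print(cost)                       # the command template with the lowest cost wins
```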
EN
This paper presents the automatic genre classification of Indian Tamil music and Western music using timbral features and Fractional Fourier Transform (FrFT) based Mel Frequency Cepstral Coefficient (MFCC) features. The classifier model for the proposed system has been built using K-NN (K-Nearest Neighbours) and Support Vector Machine (SVM) classifiers. In this work, the performance of various features extracted from music excerpts has been analysed to identify appropriate feature descriptors for the two major genres of Indian Tamil music, namely Classical music (Carnatic-based devotional hymn compositions) and Folk music, and for the Western genres of Rock and Classical music from the GTZAN dataset. The results for Tamil music have shown that the feature combination of Spectral Roll-off, Spectral Flux, Spectral Skewness and Spectral Kurtosis, combined with Fractional MFCC features, outperforms all other feature combinations, yielding a higher classification accuracy of 96.05%, compared to an accuracy of 84.21% with conventional MFCC. It has also been observed that the FrFT-based MFCC efficiently classifies the two Western genres of Rock and Classical music from the GTZAN dataset, with a higher classification accuracy of 96.25% compared to 80% with MFCC.
EN
This paper presents the classification of musical instruments using Mel Frequency Cepstral Coefficients (MFCC) and Higher Order Spectral features. MFCC, cepstral, temporal, spectral, and timbral features have been widely used in the task of musical instrument classification. As musical sound signals are generated by non-linear dynamics, the non-linearity and non-Gaussianity of musical instruments are important features that have not been considered in the past. In this paper, a hybridisation of MFCC and Higher Order Spectral (HOS) based features has been used in the task of musical instrument classification. The HOS-based features provide instrument-specific information such as the non-Gaussianity and non-linearity of the musical instruments. The extracted features have been presented to a Counter Propagation Neural Network (CPNN) to identify the instruments and their family. For experimentation, isolated sounds of 19 musical instruments from the McGill University Master Sample (MUMS) sound database have been used. The proposed features show a significant improvement in the classification accuracy of the system.
System rozpoznawania mowy z ograniczonym słownikiem (Speech recognition system with a limited vocabulary)
EN
The motivation of this work is to discuss and compare popular speech recognition algorithms on various architectures. The collected information is presented in a relatively brief form, without in-depth mathematical proofs, which would in any case require reference to specialised sources. Some problems associated with ASR (Automatic Speech Recognition) and the prospects for solving them are discussed. On the basis of available solutions, an application module was developed that compares collected recordings in terms of speech-signal similarity and presents the results in tabular form. For demonstration purposes, the resulting library was used in a complete application that executes commands based on words spoken into a microphone. The results serve not so much as final conclusions about ASR as pointers for further analysis and research. Despite advances in ASR research, there are still no algorithms with an effectiveness exceeding 95%. One motivation for further work is, for example, the social exclusion of people who cannot rely on vision-based communication.
EN
The following paper presents the parameterization of emotional speech using perceptual coefficients. MFCC are compared with HFCC, together with their associated dynamic parameters. The effectiveness of the selected coefficients was evaluated on an emotional speech database.
EN
The aim of this paper is to present a hybrid algorithm that combines the advantages of artificial neural networks and hidden Markov models in speech recognition for control purposes. The scope of the paper includes a review of currently used solutions and a description and analysis of the implementation of selected artificial neural network (NN) structures and hidden Markov models (HMM). The main part of the paper consists of a description of the development and implementation of a hybrid speech recognition algorithm using NN and HMM, together with a presentation of the results of correctness verification.
EN
The aim of this study was to assess the applicability of Mel Frequency Cepstral Coefficients (MFCC) of voice samples in diagnosing vocal nodules and polyps. Patients’ voice samples were analysed acoustically with the measurement of MFCC and values of the first three formants. Classification of mel coefficients was performed by applying the Sammon Mapping and Support Vector Machines. For the tests conducted on 95 patients, voice disorders were detected with accuracy reaching approx. 80%.
Analysis of differences between MFCC after multiple GSM transcodings
EN
This paper presents the results of studies on the effects of multiple speech transcoding operations in the case of the GSM standard with 8 kSps and 16 kSps sampling rates. Differences between the MFCC coefficients obtained by successive transcodings were considered. The aim of the comparison is to check whether the GSM encoder used can be separated and detected. During the research we used recordings from the TIMIT database, transcoded four times by GSM codecs. The possibility of detecting the encoder type was analysed based on differences between the curvilinear approximations of the MFCC coefficient errors.