Search results - BazTech

1

Singing voice analysis on the basis of acoustic parameters

Bednarz Natalia, Madej Zdzisław

Vibrations in Physical Systems | 2023 | Vol. 34, No. 2 | art. no. 2023211

EN

Voice plays a fundamental role in human relations. In addition to its communicative function in everyday life, the voice also serves as an instrument or working tool for singers, teachers or actors. Singing is often described as an extension of speech, but performing it correctly is a complex task that requires hard work and training. This paper draws attention to the problem of insufficient training in voice emission and voice control among singers in amateur choirs, which can lead to strain and disorders of the phonatory system. A tool that can assess the quality of a person's singing on the basis of acoustic parameters may therefore prove useful. To determine parameters that could help evaluate the correctness of singing, a study was conducted on a group of 10 choir members and one professional singer. The singers' voices were recorded during singing and speech while they performed simple vocal exercises consisting mainly of upward and downward sound modulation. Portions of the recordings were analysed to determine parameters such as Maximal Phonation Time (MPT), Singing Power Ratio (SPR) and the signal integral. The parameter values obtained for the choristers were compared with those of the professional singer, which made it possible to select the parameters that may be helpful in evaluating the singing voice. The parameters whose values were shown to be related to singing correctness form a feature vector that can be used to assess the correctness of classical singing. The paper also outlines a plan for further research.
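The abstract names MPT and SPR without giving formulas. A minimal NumPy sketch of how they might be computed from a mono recording, under stated assumptions: SPR is taken as the level difference in dB between the strongest spectral peak in the 2-4 kHz band and the strongest peak in the 0-2 kHz band (a common definition, although sign conventions differ between studies), and MPT is approximated as the longest run of frames whose RMS level stays within a hypothetical -30 dB of the loudest frame. The function names, frame length and threshold are illustrative, not the authors' procedure.

```python
import numpy as np

def singing_power_ratio(frame, fs):
    """SPR in dB: peak spectral magnitude in 2-4 kHz relative to the peak in 0-2 kHz.

    Assumed convention: 20*log10(peak_2_4k / peak_0_2k); some studies report the opposite sign.
    """
    frame = np.asarray(frame, dtype=float)
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    low = mag[freqs < 2000].max()
    high = mag[(freqs >= 2000) & (freqs <= 4000)].max()
    return 20.0 * np.log10(high / low)

def max_phonation_time(signal, fs, frame_ms=20.0, threshold_db=-30.0):
    """Longest run (seconds) of frames whose RMS is within threshold_db of the loudest frame."""
    signal = np.asarray(signal, dtype=float)
    hop = int(fs * frame_ms / 1000)
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12  # small offset avoids log(0) on silence
    level_db = 20.0 * np.log10(rms / rms.max())
    voiced = level_db > threshold_db
    longest = current = 0
    for v in voiced:
        current = current + 1 if v else 0  # reset the run on every sub-threshold frame
        longest = max(longest, current)
    return longest * hop / fs
```

A simple energy threshold cannot tell phonation from loud noise, so in practice the recording conditions (or an added voicing check) matter as much as the threshold value.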

2

Classification of Parkinson’s disease and other neurological disorders using voice features extraction and reduction techniques

Majdoubi Oumaima, Benba Achraf, Hammouch Ahmed

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska | 2023 | Vol. 13, No. 3 | 16–22

EN

This study aimed to differentiate individuals with Parkinson's disease (PD) from those with other neurological disorders (ND) by analyzing voice samples, given the association between voice disorders and PD. Voice samples were collected from 76 participants using different recording devices and conditions, with participants instructed to sustain the vowel /a/ comfortably. PRAAT software was used to extract features including autocorrelation (AC), cross-correlation (CC) and Mel-frequency cepstral coefficients (MFCC) from the voice samples, and principal component analysis (PCA) was used to reduce the dimensionality of the features. Classification Tree (CT), Logistic Regression, Naive Bayes (NB), Support Vector Machines (SVM) and Ensemble methods were employed as supervised machine learning techniques for classification. Each method has distinct strengths and characteristics, which allowed a comprehensive evaluation of their effectiveness in distinguishing PD patients from individuals with other neurological disorders. The kernel Naive Bayes classifier, using seven PCA-derived components, achieved the highest accuracy among the tested methods, 86.84%. Classifier performance may, however, vary with the dataset and the specific characteristics of the voice samples. In conclusion, this study demonstrated the potential of voice analysis as a diagnostic tool for distinguishing PD patients from individuals with other neurological disorders: combining several voice analysis techniques with these machine learning algorithms yielded a notable accuracy. Further research and validation on larger datasets are nonetheless required to consolidate and generalize these findings for future clinical applications.
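The best result reported above comes from a kernel Naive Bayes classifier applied to seven PCA components. A minimal NumPy sketch of that combination, assuming the PRAAT features are already collected in a matrix `X` (one row per recording) with labels `y`. The per-feature Gaussian kernels and Silverman's rule-of-thumb bandwidth are assumptions made for illustration; this is not the authors' toolchain, and the helper names are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return (mean, components) from an SVD of the centred training matrix."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project rows of X onto the retained principal components."""
    return (X - mean) @ components.T

class KernelNaiveBayes:
    """Naive Bayes with a 1-D Gaussian kernel density estimate per feature and class."""

    def fit(self, Z, y):
        self.classes = np.unique(y)
        self.train = {c: Z[y == c] for c in self.classes}
        self.log_prior = {c: np.log(np.mean(y == c)) for c in self.classes}
        # Silverman's rule-of-thumb bandwidth for each class and feature (assumed choice)
        self.bw = {c: 1.06 * self.train[c].std(axis=0, ddof=1) * len(self.train[c]) ** -0.2 + 1e-9
                   for c in self.classes}
        return self

    def _log_likelihood(self, Z, c):
        pts, h = self.train[c], self.bw[c]
        u = (Z[:, None, :] - pts[None, :, :]) / h          # shape (n_test, n_train, n_features)
        dens = np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))
        return np.log(dens + 1e-300).sum(axis=1)           # naive independence: sum log densities

    def predict(self, Z):
        scores = np.stack([self._log_likelihood(Z, c) + self.log_prior[c] for c in self.classes], axis=1)
        return self.classes[np.argmax(scores, axis=1)]
```

Usage would be `mean, comps = pca_fit(X_train, 7)`, then `KernelNaiveBayes().fit(pca_transform(X_train, mean, comps), y_train)`. Fitting the PCA on the training folds only, and reporting cross-validated accuracy, keeps test recordings from leaking into the projection.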

3

BiLSTM with Data Augmentation using Interpolation Methods to Improve Early Detection of Parkinson Disease

Abayomi-Alli Olusola O., Damaševičius Robertas, Maskeliūnas Rytis, Abayomi-Alli Adebayo

Annals of Computer Science and Information Systems | 2020 | Vol. 21 | 371–380

EN

Parkinson's disease (PD), a degenerative disorder common among older people worldwide, is caused by a lack of dopamine in the human brain. Late detection of the disease, only at the point of the first clinical diagnosis, has led to an increased mortality rate. Research on the early detection of PD has faced challenges such as small dataset size, class imbalance, overfitting, high false detection rates and model complexity. This paper aims to improve the early detection of PD with machine learning through data augmentation for very small datasets. We propose using Spline interpolation and Piecewise Cubic Hermite Interpolating Polynomial (Pchip) interpolation to generate synthetic data instances. We further investigate reducing feature dimensionality for effective, real-time classification while considering the computational complexity of implementation on real-life mobile phones. For classification we use a Bidirectional LSTM (BiLSTM) deep learning network and compare the results with traditional machine learning algorithms such as Support Vector Machine (SVM), Decision Tree, Logistic Regression, KNN and Ensemble Bagged Tree. For experimental validation we use the Oxford Parkinson disease dataset with 195 data samples, which we augmented with 571 synthetic data samples. The results show that even with a 90% holdout, the BiLSTM model still recognized PD effectively, with average accuracies over ten experimental rounds using 22 features of 82.86%, 97.1% and 96.37% for the original, Spline-augmented and Pchip-augmented datasets, respectively. Our results show that the proposed data augmentation schemes significantly (p < 0.001) improved the accuracy of PD recognition on a small dataset for both classical machine learning models and BiLSTM.
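The abstract does not say along which axis the interpolating curves are fitted. A pure-Python sketch of Pchip-based augmentation under one assumed scheme: the samples of a single class are ordered by index, each feature column is treated as a curve over that index, and new rows are read off at the midpoints between consecutive samples. Interior slopes use the Fritsch-Carlson weighted harmonic mean (the rule behind MATLAB's `pchip`); the end slopes are simplified to one-sided differences, which differs from MATLAB and SciPy. All function names are hypothetical.

```python
import bisect

def pchip_slopes(x, y):
    """Shape-preserving node slopes for piecewise cubic Hermite interpolation."""
    n = len(x)
    h = [x[k + 1] - x[k] for k in range(n - 1)]
    delta = [(y[k + 1] - y[k]) / h[k] for k in range(n - 1)]
    d = [0.0] * n
    d[0], d[-1] = delta[0], delta[-1]  # simplified one-sided end slopes (assumption)
    for k in range(1, n - 1):
        if delta[k - 1] * delta[k] <= 0:
            d[k] = 0.0  # local extremum or flat segment: zero slope prevents overshoot
        else:
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
    return d

def pchip_eval(x, y, d, xq):
    """Evaluate the cubic Hermite interpolant at xq (clamped to the data range)."""
    k = min(max(bisect.bisect_right(x, xq) - 1, 0), len(x) - 2)
    h = x[k + 1] - x[k]
    t = (xq - x[k]) / h
    h00 = 2 * t ** 3 - 3 * t ** 2 + 1
    h10 = t ** 3 - 2 * t ** 2 + t
    h01 = -2 * t ** 3 + 3 * t ** 2
    h11 = t ** 3 - t ** 2
    return h00 * y[k] + h10 * h * d[k] + h01 * y[k + 1] + h11 * h * d[k + 1]

def augment_pchip(rows):
    """Hypothetical scheme: one synthetic row at each midpoint between consecutive rows."""
    x = list(range(len(rows)))
    cols = [list(col) for col in zip(*rows)]
    slopes = [pchip_slopes(x, col) for col in cols]
    return [[pchip_eval(x, col, d, k + 0.5) for col, d in zip(cols, slopes)]
            for k in range(len(rows) - 1)]
```

Sampling more query points per interval yields more synthetic rows. Synthetic rows should be generated from the training split only, so that held-out samples never shape the augmented data.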

4

Speech Emotion Recognition Based on Voice Fundamental Frequency

Dimitrova-Grekow Teodora, Klis Aneta, Igras-Cybulska Magdalena

Archives of Acoustics | 2019 | Vol. 44, No. 2 | 277–286

EN

The human voice is one of the basic means of communication, through which one can also easily convey one's emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used. This database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We examined phrases for all the emotions, both all together and in various combinations. The Fast Fourier Transform and magnitude spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we used different AI methods to study whether they carry information about the speaker's emotional state. The resulting data were analysed with the following classifiers from the WEKA data-mining algorithm collection: K-Nearest Neighbours with local induction, Random Forest, Bagging, JRip and the Random Subspace Method. The results show that the fundamental frequency is a promising choice for further experiments.
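The extraction step above (FFT, magnitude spectrum, then statistics of the fundamental frequency) can be sketched in NumPy as follows. The peak-picking rule is an assumption: in each frame the largest magnitude-spectrum peak inside a hypothetical 60-400 Hz search range is taken as F0. The frame length, hop size, search range and feature set are illustrative, not the paper's settings.

```python
import numpy as np

def f0_track(signal, fs, frame_len=2048, hop=512, fmin=60.0, fmax=400.0):
    """Per-frame F0 estimate (Hz): frequency of the largest spectral peak within [fmin, fmax]."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    f0 = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + frame_len] * window))
        f0.append(freqs[band][np.argmax(mag[band])])
    return np.array(f0)

def f0_features(f0):
    """Utterance-level statistics that could be passed to a classifier such as those in WEKA."""
    return {"mean": f0.mean(), "std": f0.std(), "min": f0.min(),
            "max": f0.max(), "range": f0.max() - f0.min()}
```

This rule has no voicing decision and can lock onto a harmonic when the fundamental is weak, and its resolution is limited to one FFT bin (about 7.8 Hz at 16 kHz with 2048-point frames). Autocorrelation or harmonic-sum methods are the usual remedies.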

5

Acoustic Parameters in the Evaluation of Voice Quality of Choral Singers. Prototype of Mobile Application for Voice Quality Evaluation

Szklanny Krzysztof

Archives of Acoustics | 2019 | Vol. 44, No. 3 | 439–446

EN

Choral singers are intensive voice users whose excessive vocal effort puts them at risk of developing voice disorders. The aim of the work was to assess the voice quality of the singers in the choir of the Polish-Japanese Academy of Information Technology. This evaluation was carried out using acoustic parameters from the COVAREP (A Collaborative Voice Analysis Repository For Speech Technologies) repository. A prototype mobile application that calculates these parameters was also prepared. The study group comprised 6 male and 19 female choir singers. The control group consisted of healthy non-singing individuals, 50 men and 39 women. Auditory perceptual assessment (using the RBH scale) as well as acoustic analysis were used to test the voice quality of all the participants. The voice quality of the female choir singers proved to be normal in comparison with the control group, whereas the male choir singers were found to have tense voices in comparison with the controls. The parameters that proved most effective for voice evaluation were Peak Slope and Normalized Amplitude Quotient.
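Normalized Amplitude Quotient (NAQ) is commonly defined as NAQ = f_ac / (d_peak · T), where f_ac is the peak-to-peak amplitude of the glottal flow in one cycle, d_peak is the magnitude of the negative peak of the flow derivative and T is the period length. A minimal NumPy sketch of that formula, assuming one cycle of glottal flow has already been obtained by glottal inverse filtering (a step this sketch omits, and which is not a reproduction of COVAREP's code). The function name is hypothetical.

```python
import numpy as np

def naq(flow_cycle, fs):
    """Normalized Amplitude Quotient of one glottal-flow cycle (dimensionless)."""
    flow_cycle = np.asarray(flow_cycle, dtype=float)
    period = len(flow_cycle) / fs                 # T: cycle duration in seconds
    f_ac = flow_cycle.max() - flow_cycle.min()    # AC (peak-to-peak) flow amplitude
    derivative = np.diff(flow_cycle) * fs         # first difference scaled to units per second
    d_peak = -derivative.min()                    # magnitude of the negative derivative peak
    return f_ac / (d_peak * period)
```

Because f_ac and d_peak scale together, NAQ does not depend on recording gain; lower values indicate a more abrupt glottal closure, which is consistent with its use for detecting tense voice.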

6

Connection security in PSTN telephony

Piotrowski Z., Różanowski K., Gajewski P.

Zeszyty Naukowe Warszawskiej Wyższej Szkoły Informatyki | 2012 | No. 8 | 99–104

EN

Protecting critical infrastructure systems early enough against potentially dangerous voice spoofing attacks requires effective methods and dedicated technical solutions. Methods of attack and of defence against impersonation focus on two main areas: changing the subscriber's original voice into another voice (a virtual one or that of a different person) and unauthorised editing of voice messages. New generations of attacks on telephone lines, in which the speaker's voice is changed in real time or a previously prepared message is played back, call for defence methods based on, among other things, verification of shared knowledge or of a possessed authorisation key.
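Verification of a possessed key can be illustrated with a generic HMAC challenge-response exchange from the Python standard library: the called party sends a fresh random challenge, and the caller's device answers with an HMAC of that challenge under a pre-shared key. A real-time voice converter or a replayed message cannot produce a valid answer without the key. This is an illustrative sketch, not the authors' solution; key distribution and how the challenge and response travel over the telephone channel are outside its scope, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> bytes:
    """Fresh random nonce, so a recorded earlier response is useless in a replay."""
    return secrets.token_bytes(16)

def respond(shared_key: bytes, challenge: bytes) -> str:
    """Response computed by the caller's device holding the pre-shared key."""
    return hmac.new(shared_key, challenge, hashlib.sha256).hexdigest()

def verify(shared_key: bytes, challenge: bytes, response: str) -> bool:
    """Constant-time comparison against the expected response."""
    return hmac.compare_digest(respond(shared_key, challenge), response)
```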

7

Automatic assessment of voice emission disorders resulting from neurodegenerative processes based on the analysis of isolated phones

Orzechowski T., Chmurzyńska K., Radkowski P.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2006 | Vol. 10, iss. 3 | 91–97

PL

The presented results mark the beginning of research on automatic voice classification. This paper outlines the physiological foundations of voice and the pathological speech changes caused by dysarthria, and then describes how the linguistic material was selected with respect to place and manner of articulation in the phonetic system of Polish. The paper goes on to describe the recording and preliminary analysis of the subjects' voices (changes in the realisation of phones, the intensity of phones pronounced repeatedly in isolation, and spectral analysis of continuous sounds). Phenomena heard during the subjective examination by a speech pathologist or neurologist were confirmed by precise objective measurement. The obtained parameters allow the study results to be parameterised, enabling comprehensive classification. They will also allow an accurate assessment of disease progression, which is not possible in a classic subjective examination.

EN

This paper presents the results of preliminary research on pathological voice changes caused by dysarthria. Computer analysis of the voice may lead to the identification of parameters correlated with neurological diseases. The selection of linguistic material was characterised according to the place and manner of articulation in the phonetic system of Polish. The results of the clinical examination made it possible to determine simple markers of neurodegenerative diseases, which will serve as a basis for constructing an objective examination model.
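Among the measurements listed for this study is the intensity of phones pronounced repeatedly in isolation; the abstracts do not give the formula. A NumPy sketch of one way to summarise it, assuming each repetition has already been segmented into its own array: the RMS level of every repetition in dB (relative to a full-scale amplitude of 1.0), its mean and spread, and the linear trend across repetitions. These summary values are hypothetical markers chosen for illustration, not the markers reported by the authors.

```python
import numpy as np

def repetition_intensity(segments):
    """RMS level (dB re amplitude 1.0) of each repeated token, plus summary statistics."""
    levels = np.array([20.0 * np.log10(np.sqrt(np.mean(np.square(np.asarray(s, dtype=float)))) + 1e-12)
                       for s in segments])
    slope = np.polyfit(np.arange(len(levels)), levels, 1)[0]  # dB change per repetition
    return {"levels_db": levels, "mean_db": levels.mean(),
            "std_db": levels.std(), "slope_db_per_rep": slope}
```

A steadily negative slope would indicate loudness decaying over the repetition series, while a large standard deviation would indicate unstable intensity control; whether either separates patients from controls is an empirical question the sketch does not answer.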