Article title

Learning rate interference to overcome overfitting for Audio Emotion Recognition using LSTM

Full text / Content
Identifiers
Title variants
PL
Interferencja tempa uczenia się w celu przezwyciężenia nadmiernego dopasowania do rozpoznawania emocji dźwiękowych przy użyciu LSTM
Publication languages
EN
Abstracts
EN
This paper presents a neural network architecture for recognizing human emotions from features extracted from an audio song. The features used to train the classifier are extracted using Mel Frequency Cepstral Coefficients (MFCC). The presented architecture is built on an LSTM network because of its ability to learn long-term dependencies and its simple implementation, which helps highlight the importance of the learning-rate hyper-parameter. The learning rate is tuned and tracked regularly each time the weights are updated, which proved effective in overcoming the overfitting problem and achieved an accuracy of 75.80%.
PL
The article presents an approach based on a neural network architecture that enables the recognition of human emotions from features extracted from an audio song. The features used to train the classifier are extracted using Mel Frequency Cepstral Coefficients (MFCC). The presented neural network architecture is built on an LSTM network, owing to its ability to learn long-term dependencies and its simple implementation, which helps highlight the importance of the learning-rate hyper-parameter. By tuning the learning rate, the network tracks it regularly each time the weights are updated, which proved effective in overcoming the overfitting problem and achieved an accuracy of 75.80%.
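Below is a minimal, hypothetical sketch of the pipeline the abstract describes: MFCC features extracted from an audio file feeding a small LSTM classifier whose learning rate is adjusted as the weights are updated. The layer sizes, the number of MFCC coefficients, the eight-class output (assuming RAVDESS-style emotion labels), and the exponential-decay schedule are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal, hypothetical sketch: MFCC features -> LSTM classifier, with the
# learning rate adjusted as the weights are updated. Layer sizes, optimizer,
# and decay schedule are illustrative assumptions, not the paper's settings.
import librosa
import tensorflow as tf

NUM_MFCC = 40       # assumed number of MFCC coefficients
NUM_CLASSES = 8     # assumed RAVDESS-style emotion classes

def extract_mfcc(path, num_mfcc=NUM_MFCC):
    """Load an audio file and return its MFCC matrix with shape (time, num_mfcc)."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=num_mfcc)
    return mfcc.T  # time steps first, so the LSTM iterates over frames

def build_model(time_steps, num_mfcc=NUM_MFCC, num_classes=NUM_CLASSES):
    """LSTM classifier over MFCC sequences with a decaying learning rate."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(time_steps, num_mfcc)),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    # The learning rate follows a schedule evaluated at every weight update;
    # the exponential form and its constants stand in for the paper's
    # learning-rate tuning strategy and are assumptions.
    schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then pad the MFCC sequences to a common length and call model.fit with integer emotion labels; the schedule lowers the learning rate step by step, which is one common way to curb overfitting as training progresses.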
Year
Pages
125–128
Physical description
Bibliography: 19 items, figures, tables
Authors
author
  • Signal Image and Information Technology (SITI) Laboratory, Department of Electrical Engineering, National Engineering School of Tunis, Campus Universitaire Farhat Hached el Manar BP 37, Le Belvedere 1002 TUNIS
author
  • Signal Image and Information Technology (SITI) Laboratory, Department of Electrical Engineering, National Engineering School of Tunis, Campus Universitaire Farhat Hached el Manar BP 37, Le Belvedere 1002 TUNIS
Bibliography
  • [1] J Ancilin and A Milton. Improved speech emotion recognition with mel frequency magnitude coefficient. Applied Acoustics, 179:108046, 2021.
  • [2] Muzaffer Aslan. CNN based efficient approach for emotion recognition. Journal of King Saud University - Computer and Information Sciences, 34(9):7335–7346, 2022.
  • [3] Souha Ayadi and Zied Lachiri. A combined CNN-LSTM network for audio emotion recognition using speech and song attributes. In 2022 6th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pages 1–6. IEEE, 2022.
  • [4] Souha Ayadi and Zied Lachiri. Visual emotion sensing using convolutional neural network. Przegląd Elektrotechniczny, 98(3), 2022.
  • [5] P Ashok Babu, V Siva Nagaraju, and Rajeev Ratna Vallabhuni. Speech emotion recognition system with librosa. In 2021 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), pages 421–424. IEEE, 2021.
  • [6] Mohammad Mahdi Bejani and Mehdi Ghatee. A systematic review on overfitting control in shallow and deep neural networks. Artificial Intelligence Review, pages 1–48, 2021.
  • [7] Pádraig Cunningham and Sarah Jane Delany. Underestimation bias and underfitting in machine learning. In Trustworthy AI - Integrating Learning, Optimization and Reasoning: First International Workshop, TAILOR 2020, Virtual Event, September 4–5, 2020, Revised Selected Papers 1, pages 20–31. Springer, 2021.
  • [8] Na He and Sam Ferguson. Multi-view neural networks for raw audio-based music emotion recognition. In 2020 IEEE International Symposium on Multimedia (ISM), pages 168–172. IEEE, 2020.
  • [9] Hyun-il Lim. A study on dropout techniques to reduce overfitting in deep neural networks. In Advanced Multimedia and Ubiquitous Engineering: MUE 2020, pages 133–139. Springer, 2021.
  • [10] Steven R Livingstone and Frank A Russo. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5):e0196391, 2018.
  • [11] Rezwan Matin and Damian Valles. A speech emotion recognition solution based on support vector machine for children with autism spectrum disorder to help identify human emotions. In 2020 Intermountain Engineering, Technology and Computing (IETC), pages 1–6, 2020.
  • [12] Yashon O Ouma, Lawrence Omai, et al. Flood susceptibility mapping using image-based 2D-CNN deep learning: Overview and case study application using multiparametric spatial data in data-scarce urban environments. International Journal of Intelligent Systems, 2023, 2023.
  • [13] Sylvestre-Alvise Rebuffi, Sven Gowal, Dan Andrei Calian, Florian Stimberg, Olivia Wiles, and Timothy A Mann. Data augmentation can improve robustness. Advances in Neural Information Processing Systems, 34:29935–29948, 2021.
  • [14] Panissara Thanapol, Kittichai Lavangnananda, Pascal Bouvry, Frédéric Pinel, and Franck Leprévost. Reducing overfitting and improving generalization in training convolutional neural network (CNN) under limited sample sizes in image recognition. In 2020 5th International Conference on Information Technology (InCIT), pages 300–305. IEEE, 2020.
  • [15] Jianyou Wang, Michael Xue, Ryan Culhane, Enmao Diao, Jie Ding, and Vahid Tarokh. Speech emotion recognition with dual-sequence LSTM architecture. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6474–6478. IEEE, 2020.
  • [16] Ashima Yadav and Dinesh Kumar Vishwakarma. A multilingual framework of CNN and Bi-LSTM for emotion classification. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pages 1–6, 2020.
  • [17] Satya Prakash Yadav, Subiya Zaidi, Annu Mishra, and Vibhash Yadav. Survey on machine learning in speech emotion recognition and vision systems using a recurrent neural network (RNN). Archives of Computational Methods in Engineering, 29(3):1753–1770, 2022.
  • [18] Kaichao You, Mingsheng Long, Jianmin Wang, and Michael I Jordan. How does learning rate decay help modern neural networks? arXiv preprint arXiv:1908.01878, 2019.
  • [19] S Zargar. Introduction to sequence learning models: RNN, LSTM, GRU. Department of Mechanical and Aerospace Engineering, North Carolina State University, Raleigh, North Carolina, 27606, 2021.
Notes
This record was prepared with funds from the Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the programme "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II) - module: popularisation of science and promotion of sport (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c6c6bbcd-15f8-4a67-8421-d7807ae4ddf0