Article title

Audio emotion recognition based on song modality using Conv1D vs Conv2D

Title variants
PL
Rozpoznawanie emocji dźwiękowych w oparciu o modalność utworu przy użyciu Conv1D i Conv2D
Publication languages
EN
Abstracts
EN
Audio emotion recognition is the process of detecting emotions from different forms of signals. The modality considered in this article is the audio song. The goal is to create neural network architectures capable of recognizing the emotions of a song's performer. The database used for this purpose is the RAVDESS database. We compared the performance of Conv1D and Conv2D architectures, with MFCC used as the feature extractor for both. The accuracies obtained are 83.95% and 82.47%, respectively. Conv1D is the better of the two models with respect to both the accuracy obtained and model complexity, the Conv1D model being less complex than the Conv2D model.
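A minimal sketch of the pipeline the abstract describes, assuming librosa for MFCC extraction and TensorFlow/Keras for the models. The layer sizes, the 40-coefficient/130-frame feature shape, the example file path, and the six-class output (the song modality of RAVDESS covers six emotions) are illustrative assumptions, not the architectures evaluated in the article:

import numpy as np
import librosa
from tensorflow import keras
from tensorflow.keras import layers

N_MFCC = 40     # MFCC coefficients per frame (assumed)
N_FRAMES = 130  # frames after padding/truncation (assumed)
N_CLASSES = 6   # emotions in the RAVDESS song modality

def extract_mfcc(path):
    """Return a fixed-size (N_MFCC, N_FRAMES) MFCC matrix for one audio file."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    if mfcc.shape[1] < N_FRAMES:  # pad short clips, truncate long ones
        mfcc = np.pad(mfcc, ((0, 0), (0, N_FRAMES - mfcc.shape[1])))
    return mfcc[:, :N_FRAMES]

def build_conv1d():
    # Convolve along time only; the 40 MFCC coefficients act as input channels.
    return keras.Sequential([
        layers.Input(shape=(N_FRAMES, N_MFCC)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

def build_conv2d():
    # Treat the whole MFCC matrix as a one-channel image and convolve in 2D.
    return keras.Sequential([
        layers.Input(shape=(N_MFCC, N_FRAMES, 1)),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

if __name__ == "__main__":
    # The same MFCC matrix feeds both models after reshaping, e.g. for a
    # hypothetical RAVDESS song file ("03-02-..." marks audio-only song):
    #   x = extract_mfcc("Actor_01/03-02-01-01-01-01-01.wav")
    #   conv1d_in = x.T[np.newaxis, ...]            # shape (1, 130, 40)
    #   conv2d_in = x[np.newaxis, ..., np.newaxis]  # shape (1, 40, 130, 1)
    for model in (build_conv1d(), build_conv2d()):
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.summary()

model.summary() prints each network's parameter count, which is one concrete way to compare model complexity as the abstract does; the exact counts depend on the filter sizes chosen, so this sketch only illustrates the comparison, not the article's numbers.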
Year
Pages
54–57
Physical description
Bibliography: 16 items, figures, tables
Authors
author
  • Signal, Image and Information Technology (SITI) Laboratory, Department of Electrical Engineering, National Engineering School of Tunis, Campus Universitaire Farhat Hached el Manar BP 37, Le Belvedere 1002 Tunis
author
  • Signal, Image and Information Technology (SITI) Laboratory, Department of Electrical Engineering, National Engineering School of Tunis, Campus Universitaire Farhat Hached el Manar BP 37, Le Belvedere 1002 Tunis
Bibliography
  • [1] Wejdan Ibrahim AlSurayyi, Norah Saleh Alghamdi, and Ajith Abraham. Deep learning with word embedding modeling for a sentiment analysis of online reviews. International Journal of Computer Information Systems and Industrial Management Applications, 11:227–241, 2019.
  • [2] Bagus Tris Atmaja, Akira Sasou, and Masato Akagi. Survey on bimodal speech emotion recognition from acoustic and linguistic information fusion. Speech Communication, 140:11–28, 2022.
  • [3] Souha Ayadi and Zied Lachiri. A combined CNN-LSTM network for audio emotion recognition using speech and song attributes. In 2022 6th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pages 1–6. IEEE, 2022.
  • [4] Subhajit Chatterjee and Yung-Cheol Byun. EEG-based emotion classification using stacking ensemble approach. Sensors, 22(21):8550, 2022.
  • [5] Stuart Cunningham, Harrison Ridley, Jonathan Weinel, and Richard Picking. Supervised machine learning for audio emotion recognition: Enhancing film sound design using audio features, regression models and artificial neural networks. Personal and Ubiquitous Computing, 25:637–650, 2021.
  • [6] Harshit Dolka, Arul Xavier VM, and Sujitha Juliet. Speech emotion recognition using ANN on MFCC features. In 2021 3rd International Conference on Signal Processing and Communication (ICPSC), pages 431–435. IEEE, 2021.
  • [7] Pooja Gambhir, Amita Dev, Poonam Bansal, and Deepak Kumar Sharma. End-to-end multi-modal low-resourced speech keywords recognition using sequential Conv2D nets. ACM Transactions on Asian and Low-Resource Language Information Processing, 2023.
  • [8] Utkarsh Garg, Sachin Agarwal, Shubham Gupta, Ravi Dutt, and Dinesh Singh. Prediction of emotions from the audio speech signals using MFCC, Mel and Chroma. In 2020 12th International Conference on Computational Intelligence and Communication Networks (CICN), pages 87–91. IEEE, 2020.
  • [9] Donghong Han, Yanru Kong, Jiayi Han, and Guoren Wang. A survey of music emotion recognition. Frontiers of Computer Science, 16(6):166335, 2022.
  • [10] C Hema and Fausto Pedro Garcia Marquez. Emotional speech recognition using CNN and deep learning techniques. Applied Acoustics, 211:109492, 2023.
  • [11] S Jothimani and K Premalatha. MFF-SAUG: Multi feature fusion with spectrogram augmentation of speech emotion recognition using convolution neural network. Chaos, Solitons & Fractals, 162:112512, 2022.
  • [12] Steven R Livingstone and Frank A Russo. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5):e0196391, 2018.
  • [13] Zixuan Peng, Yu Lu, Shengfeng Pan, and Yunfeng Liu. Efficient speech emotion recognition using multi-scale CNN and attention. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3020–3024. IEEE, 2021.
  • [14] R Raja Subramanian, Yalla Sireesha, Yalla Satya Praveen Kumar Reddy, Tavva Bindamrutha, Mekala Harika, and R Raja Sudharsan. Audio emotion recognition by deep neural networks and machine learning algorithms. In 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pages 1–6. IEEE, 2021.
  • [15] Jianyou Wang, Michael Xue, Ryan Culhane, Enmao Diao, Jie Ding, and Vahid Tarokh. Speech emotion recognition with dual sequence LSTM architecture. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6474–6478. IEEE, 2020.
  • [16] Satya Prakash Yadav, Subiya Zaidi, Annu Mishra, and Vibhash Yadav. Survey on machine learning in speech emotion recognition and vision systems using a recurrent neural network (RNN). Archives of Computational Methods in Engineering, 29(3):1753–1770, 2022.
Notes
Record created with funds from MNiSW, agreement no. POPUL/SP/0154/2024/02, under the "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II) programme, module: popularisation of science and promotion of sport (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-4fb24514-b1a6-49e6-946d-4d7e0374c6c7