Search results
Searched in keywords: emotion recognition
Results found: 41
EN
The paper describes the relations of speech signal representation in the layers of a convolutional neural network. Using activation maps determined by the Grad-CAM algorithm, the energy distribution in the time–frequency space and its relationship with the prosodic properties of the considered emotional utterances were analysed. After preliminary experiments with the expressive speech classification task, we selected the CQT-96 time–frequency representation and used a custom CNN architecture with three convolutional layers in the main experimental phase of the study. Based on the performed analysis, we show the relationship between activation levels and changes in the voiced parts of the fundamental frequency trajectories. As a result, the relationships between the individual activation maps, energy distribution, and fundamental frequency trajectories for six emotional states were described. The results show that, in the learning process, the convolutional neural network uses similar fragments of the time–frequency representation, which are also related to the prosodic properties of emotional speech utterances. We also analysed the relations of the obtained activation maps with time-domain envelopes, which allowed us to observe the importance of speech signal energy in classifying individual emotional states. Finally, we compared the energy distribution of the CQT representation with the energy of the regions overlapping with the masks of individual emotional states. As a result, we obtained information on the variability of energy distributions in the selected speech signal representation for particular emotions.
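As a rough illustration of the kind of pipeline described above, the sketch below computes a 96-bin constant-Q spectrogram, runs it through a small three-convolutional-layer CNN and derives a Grad-CAM map over the time–frequency plane. The architecture, hyperparameters and file name are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn
import torch.nn.functional as F

def cqt96(path, sr=22050):
    """Load a recording and compute a 96-bin constant-Q magnitude spectrogram (dB)."""
    y, sr = librosa.load(path, sr=sr)
    C = np.abs(librosa.cqt(y, sr=sr, n_bins=96, bins_per_octave=12))
    return librosa.amplitude_to_db(C, ref=np.max)          # shape (96, n_frames)

class SmallEmotionCNN(nn.Module):
    """Three convolutional layers followed by a small classification head (assumed layout)."""
    def __init__(self, n_emotions=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, n_emotions))

    def forward(self, x):
        return self.head(self.features(x))

def grad_cam(model, x, target_class):
    """Grad-CAM heatmap over the time-frequency plane for one input spectrogram."""
    acts, grads = [], []
    layer = model.features[-1]                              # output of the last conv block
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model(x)[0, target_class].backward()
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)       # channel importance
    cam = F.relu((weights * acts[0]).sum(dim=1)).squeeze(0)
    return (cam / cam.max().clamp(min=1e-8)).detach().numpy()

# Usage with a hypothetical file name:
# model = SmallEmotionCNN()
# spec = torch.tensor(cqt96("angry_utterance.wav"), dtype=torch.float32)[None, None]
# print(grad_cam(model, spec, target_class=0).shape)
```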
EN
Speech emotion recognition is an important part of human-machine interaction studies. The acoustic analysis method is used for emotion recognition through speech. An emotion does not cause changes in all acoustic parameters; rather, the acoustic parameters affected by emotion vary depending on the emotion type. In this context, the emotion-based variability of acoustic parameters is still a current field of study. The purpose of this study is to investigate which acoustic parameters fear affects and the extent of its influence. For this purpose, various acoustic parameters were obtained from speech recordings containing fear and neutral emotions. The change of these parameters according to the emotional state was analyzed using statistical methods, and the parameters affected by the fear emotion and the degree of their influence were determined. According to the results obtained, the majority of acoustic parameters that fear affects vary according to the data used. However, it has been demonstrated that formant frequencies, mel-frequency cepstral coefficients, and jitter parameters can define the fear emotion independently of the data used.
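A minimal, hedged sketch of this kind of statistical screening, using synthetic audio in place of fear and neutral recordings: mean MFCCs are extracted per utterance and compared between the two states with Welch's t-test. The feature choice and data here are illustrative only.

```python
import numpy as np
import librosa
from scipy import stats

rng = np.random.default_rng(0)
sr = 16000

def mfcc_means(y, sr=sr, n_mfcc=13):
    """Mean MFCC vector of one utterance (one row per recording)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Synthetic stand-ins for fear and neutral recordings (1 s of audio each).
fear = np.array([mfcc_means(rng.normal(scale=0.2, size=sr)) for _ in range(10)])
neutral = np.array([mfcc_means(rng.normal(scale=0.1, size=sr)) for _ in range(10)])

# Welch's t-test per coefficient: which parameters differ between the two states?
for k in range(fear.shape[1]):
    t, p = stats.ttest_ind(fear[:, k], neutral[:, k], equal_var=False)
    print(f"MFCC {k:2d}: t = {t:6.2f}, p = {p:.3f}")
```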
EN
The study investigates the use of the speech signal to recognise speakers' emotional states. The introduction includes the definition and categorization of emotions, including facial expressions, speech and physiological signals. For the purpose of this work, a proprietary resource of emotionally-marked speech recordings was created. The collected recordings come from the media, including live journalistic broadcasts, which show spontaneous emotional reactions to real-time stimuli. For the purpose of speech signal analysis, a dedicated script was written in Python. Its algorithm includes the parameterization of speech recordings and the determination of features correlated with the emotional content of speech. After the parametrization process, data clustering was performed to allow the grouping of speakers' feature vectors into larger collections which imitate specific emotional states. Using Student's t-test for dependent samples, descriptors were distinguished that identified significant differences in feature values between emotional states. Some potential applications of this research were proposed, as well as directions for future studies of the topic.
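A small sketch of the analysis idea on synthetic descriptor vectors: feature vectors are clustered into groups meant to imitate emotional states, and Student's t-test for dependent samples is applied per descriptor. All names and sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy import stats

rng = np.random.default_rng(0)
n_speakers, n_features = 20, 8

# Hypothetical per-speaker descriptor vectors for two emotional states.
neutral = rng.normal(0.0, 1.0, size=(n_speakers, n_features))
anger = neutral + rng.normal(0.4, 1.0, size=(n_speakers, n_features))

# Group all vectors into clusters meant to imitate emotional states.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    np.vstack([neutral, anger]))

# Paired (dependent-samples) t-test per descriptor.
for k in range(n_features):
    t, p = stats.ttest_rel(neutral[:, k], anger[:, k])
    print(f"feature {k}: t = {t:.2f}, p = {p:.3f}")
```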
Classifying and Visualizing Emotions with Emotional DAN
EN
Classification of human emotions remains an important and challenging task for many computer vision algorithms, especially in the era of humanoid robots which coexist with humans in their everyday life. Currently proposed methods for emotion recognition solve this task using multi-layered convolutional networks that do not explicitly infer any facial features in the classification phase. In this work, we postulate a fundamentally different approach to the emotion recognition task that relies on incorporating facial landmarks as a part of the classification loss function. To that end, we extend the recently proposed Deep Alignment Network (DAN) with a term related to facial features. Thanks to this simple modification, our model, called EmotionalDAN, is able to outperform state-of-the-art emotion classification methods on two challenging benchmark datasets by up to 5%. Furthermore, we visualize the image regions analyzed by the network when making a decision, and the results indicate that our EmotionalDAN model is able to correctly identify the facial landmarks responsible for expressing the emotions.
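A hedged sketch of the general idea of adding a landmark-related term to the classification objective; the exact loss formulation, landmark term and weighting used in EmotionalDAN may differ from this illustration.

```python
import torch
import torch.nn as nn

class EmotionLandmarkLoss(nn.Module):
    """Illustrative combined objective: emotion classification + a landmark-related term."""
    def __init__(self, landmark_weight=0.5):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()        # emotion classification term
        self.mse = nn.MSELoss()                # facial-landmark regression term
        self.landmark_weight = landmark_weight # assumed weighting, not the paper's value

    def forward(self, emotion_logits, emotion_target, landmarks_pred, landmarks_gt):
        return (self.ce(emotion_logits, emotion_target)
                + self.landmark_weight * self.mse(landmarks_pred, landmarks_gt))

# Usage with dummy tensors (batch of 4, 8 emotion classes, 68 two-dimensional landmarks):
loss_fn = EmotionLandmarkLoss()
loss = loss_fn(torch.randn(4, 8), torch.randint(0, 8, (4,)),
               torch.randn(4, 68, 2), torch.randn(4, 68, 2))
print(loss.item())
```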
EN
Communication atmosphere based on the emotional states of humans and robots is modeled using Fuzzy Atmosfield (FA), where the human emotion is estimated from bimodal communication cues (i.e., speech and gesture) using weighted fusion and fuzzy logic, and the robot emotion is generated by emotional expression synthesis. This makes it possible to quantitatively express the overall affective expression of individuals and helps to facilitate smooth communication in human-robot interaction. Experiments in a household environment were performed with four humans and five eye robots, where emotion recognition of humans based on bimodal cues achieved 84% accuracy on average, an improvement of about 10% compared with using speech alone. Experimental results from the model of communication atmosphere based on the FA were evaluated by comparison with questionnaire surveys, from which the maximum error of 0.25 and the minimum correlation coefficient of 0.72 for the three axes of the FA confirm the validity of the proposal. In ongoing work, an atmosphere representation system is being planned for casual communication between humans and robots, taking into account multiple emotional modalities such as speech, gesture, and music.
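The fuzzy-logic modelling itself is not reproduced here; the sketch below only illustrates the weighted-fusion step for bimodal (speech and gesture) emotion scores, with assumed weights, labels and scores.

```python
import numpy as np

emotions = ["happy", "sad", "angry", "neutral"]

# Hypothetical per-modality emotion scores for one observation.
speech_scores = np.array([0.6, 0.1, 0.1, 0.2])
gesture_scores = np.array([0.4, 0.2, 0.1, 0.3])

# Assumed modality weights; fusion is a normalised weighted sum.
w_speech, w_gesture = 0.7, 0.3
fused = w_speech * speech_scores + w_gesture * gesture_scores
fused /= fused.sum()

print(emotions[int(np.argmax(fused))], fused.round(2))
```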
EN
Today’s human-computer interaction systems have a broad variety of applications in which automatic human emotion recognition is of great interest. The literature contains many different, more or less successful, forms of these systems. This work emerged as an attempt to clarify which speech features are the most informative, which classification structure is the most convenient for this type of task, and the degree to which the results are influenced by database size, quality and the cultural characteristics of a language. The research is presented as a case study on Slavic languages.
EN
This article presents an approach to emotion recognition based on the facial expressions of gamers. Using selected methods, crucial features of the analyzed face were extracted: eyebrow shape and the width and height of the eyes and mouth. Afterward, a group of artificial intelligence methods was applied to classify a given feature set as one of the following emotions: happiness, sadness, anger and fear. The approach presented in this paper was verified using specialized databases and real-life situations. The obtained results are highly promising, so further work on the subject should be continued.
PL
This article presents a method of recognizing emotions based on the facial expressions of gamers. Using selected methods, the essential features of the analyzed face were extracted: eyebrow shape and the width and height of the mouth and eyes. A set of artificial intelligence tools was then applied to recognize the corresponding emotions (happiness, sadness, anger and fear) from the obtained feature sets. The solution presented in this publication was verified using images contained in specialized databases as well as images depicting everyday-life situations. The obtained results are very promising and encourage further work on this topic.
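An illustrative sketch of the geometric-feature idea, assuming the common 68-point facial landmark layout and using synthetic data; the actual feature set and classifiers used in the paper may differ.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def geometric_features(pts):
    """pts: (68, 2) array of facial landmark coordinates (assumed 68-point layout)."""
    face_w = np.linalg.norm(pts[16] - pts[0])                  # jaw span, used for normalisation
    eye_w = np.linalg.norm(pts[45] - pts[36])                  # outer eye-corner span
    mouth_w = np.linalg.norm(pts[54] - pts[48])                # mouth width
    mouth_h = np.linalg.norm(pts[57] - pts[51])                # mouth height
    brow_h = np.mean(pts[36:48, 1]) - np.mean(pts[17:27, 1])   # brow-to-eye distance
    return np.array([eye_w, mouth_w, mouth_h, brow_h]) / face_w

# Hypothetical training data: random landmark sets stand in for labelled face images.
rng = np.random.default_rng(0)
X = np.array([geometric_features(rng.uniform(0, 100, (68, 2))) for _ in range(40)])
y = rng.choice(["happiness", "sadness", "anger", "fear"], size=40)

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict(X[:1]))
```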
EN
The human voice is one of the basic means of communication, thanks to which one can also easily convey an emotional state. This paper presents experiments on emotion recognition in human speech based on the fundamental frequency. The AGH Emotional Speech Corpus was used. This database consists of audio samples of seven emotions acted by 12 different speakers (6 female and 6 male). We explored phrases of all the emotions, both all together and in various combinations. The fast Fourier transform and magnitude spectrum analysis were applied to extract the fundamental tone from the speech audio samples. After extracting several statistical features of the fundamental frequency, we studied whether they carry information on the emotional state of the speaker, applying different AI methods. Analysis of the outcome data was conducted with the following classifiers: k-Nearest Neighbours with local induction, Random Forest, Bagging, JRip, and the Random Subspace Method from the WEKA data mining algorithm collection. The results show that the fundamental frequency is a promising choice for further experiments.
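A simple sketch of frame-wise fundamental-frequency estimation from the magnitude spectrum followed by statistical descriptors, in the spirit of the experiments above; the frame sizes and F0 search range are assumed values, and a synthetic 150 Hz tone stands in for a speech sample.

```python
import numpy as np

def f0_track(y, sr, frame=2048, hop=512, fmin=60.0, fmax=400.0):
    """Frame-wise F0 estimate: peak of the magnitude spectrum in an assumed F0 range."""
    n = 1 + (len(y) - frame) // hop
    frames = np.stack([y[i * hop:i * hop + frame] for i in range(n)], axis=1)
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame)[:, None], axis=0))
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(spec[band], axis=0)]

def f0_statistics(f0):
    """Statistical descriptors of the F0 trajectory, usable as classifier input."""
    return {"mean": float(f0.mean()), "std": float(f0.std()),
            "range": float(np.ptp(f0)), "median": float(np.median(f0))}

# Synthetic check: a 150 Hz tone with noise should yield a mean F0 near 150 Hz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 150 * t) + 0.05 * np.random.default_rng(0).normal(size=sr)
print(f0_statistics(f0_track(y, sr)))
```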
Comparison of speaker dependent and speaker independent emotion recognition
EN
This paper describes a study of emotion recognition based on speech analysis. The introduction to the theory contains a review of the emotion inventories used in various studies of emotion recognition, as well as the speech corpora applied, methods of speech parametrization, and the most commonly employed classification algorithms. In the current study the EMO-DB speech corpus and three selected classifiers, the k-Nearest Neighbor (k-NN), the Artificial Neural Network (ANN) and Support Vector Machines (SVMs), were used in experiments. SVMs turned out to provide the best classification accuracy, 75.44%, in the speaker-dependent mode, that is, when speech samples from the same speaker were included in the training corpus. Various speaker-dependent and speaker-independent configurations were analyzed and compared. Emotion recognition in speaker-dependent conditions usually yielded higher accuracy than a similar but speaker-independent configuration. The improvement was especially noticeable when the base recognition ratio of a given speaker was low. Happiness and anger, as well as boredom and neutrality, proved to be the pairs of emotions most often confused.
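A minimal sketch contrasting the two evaluation regimes with scikit-learn, using synthetic features in place of EMO-DB parameters: a shuffled K-fold split mixes speakers across train and test sets (speaker-dependent), while leave-one-speaker-out keeps each test speaker unseen during training (speaker-independent).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, KFold, LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                 # hypothetical acoustic feature vectors
y = rng.integers(0, 7, size=200)               # 7 emotion classes, as in EMO-DB
speakers = rng.integers(0, 10, size=200)       # EMO-DB has 10 speakers

clf = SVC(kernel="rbf", C=1.0)

dep = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
indep = cross_val_score(clf, X, y, groups=speakers, cv=LeaveOneGroupOut())

print(f"speaker-dependent:   {dep.mean():.3f}")
print(f"speaker-independent: {indep.mean():.3f}")
```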
PL
This work attempts to measure speech signal features correlated with its emotional content (using the basic emotions as an example). A speech corpus designed to enable differential analysis independent of speaker and content is presented, and tests were carried out to assess its suitability for automating the detection of emotions in speech. Preliminary vocal emotion profiles are proposed. The article also presents proposals for medical applications based on measuring emotions in the voice.
EN
The paper presents an approach to creating new measures of the emotional content of speech signals. The results of this project constitute the basis for further research in this field. For the analysis of differences between the basic emotional states independently of the speaker and semantic content, a corpus of acted emotional speech was designed and recorded. Alternative methods for emotional speech signal acquisition are presented and discussed (Section 2). Preliminary tests were performed to evaluate the corpus's applicability to automatic emotion recognition. At the recording-labelling stage, human perceptual tests were applied (using recordings with and without semantic content). The results are presented in the form of confusion tables (Tabs. 1 and 2). Further signal processing, parametrisation and feature extraction techniques (Section 3) allowed a set of features characteristic of each emotion to be extracted and led to preliminary vocal emotion profiles (sets of acoustic features characteristic of each of the basic emotions); an example is presented in Tab. 3. Using selected feature vectors, methods for automatic classification (k nearest neighbours and a self-organizing neural network) were tested. Section 4 contains the conclusions: an analysis of the variables associated with vocal expression of emotions and the challenges in further development. The paper also discusses the use of the results of this kind of research for medical applications (Section 5).
EN
This paper concerns measurement procedures on an emotion monitoring stand designed for tracking human emotions in human-computer interaction using physiological characteristics. The paper addresses the key problem of physiological measurements being disturbed by motion typical of human-computer interaction, such as keyboard typing or mouse movements. An original experiment is described that aimed at the practical evaluation of measurement procedures performed at the emotion monitoring stand constructed at GUT. Different sensor locations were considered and evaluated for suitability and measurement precision in human-computer interaction monitoring. Alternative locations (ear lobes and forearms) for skin conductance, blood volume pulse and temperature sensors were proposed and verified. The alternative locations showed correlation with the traditional ones as well as lower sensitivity to movements such as typing or moving the mouse, and therefore they can be a better solution for monitoring human-computer interaction.
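A minimal sketch, on synthetic signals, of the kind of check used to validate an alternative sensor placement: the Pearson correlation between the signal recorded at the traditional location and at the alternative one. Signal shapes and the sampling rate are assumed.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
t = np.linspace(0, 60, 60 * 32)                        # 60 s at an assumed 32 Hz rate

# Synthetic skin-conductance traces: finger (traditional) vs. forearm (alternative).
finger_sc = 2.0 + 0.30 * np.sin(0.1 * t) + 0.05 * rng.normal(size=t.size)
forearm_sc = 1.5 + 0.25 * np.sin(0.1 * t) + 0.08 * rng.normal(size=t.size)

r, p = pearsonr(finger_sc, forearm_sc)
print(f"correlation between placements: r = {r:.2f} (p = {p:.3g})")
```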
EN
This paper describes KinectRecorder, a comprehensive tool that provides convenient and fast acquisition, indexing and storing of RGB-D video streams from the Microsoft Kinect sensor. The application is especially useful as a supporting tool for the creation of fully indexed databases of facial expressions and emotions that can be further used for learning and testing of emotion recognition algorithms for affect-aware applications. KinectRecorder was successfully exploited for the creation of the Facial Expression and Emotion Database (FEEDB), significantly reducing the time of the whole project consisting of data acquisition, indexing and validation. FEEDB has already been used as a learning and testing dataset for a few emotion recognition algorithms, which proved the utility of both the database and the KinectRecorder tool.
PL
The paper presents a comprehensive tool that allows convenient and fast acquisition, indexing and storage of RGB-D stream recordings from the Microsoft Kinect sensor. The application is particularly useful as a tool supporting the creation of fully indexed databases of facial expressions and emotions, which can then be used for training and testing user emotion recognition algorithms for affect-aware applications. KinectRecorder was successfully used to create the FEEDB facial expression and emotion database, significantly shortening the whole process of acquisition, indexing and validation of the recordings. The FEEDB database has already been successfully used as a training and testing dataset for several emotion recognition algorithms, which demonstrated the usefulness of both the database and the KinectRecorder tool.
Logistyka | 2015 | vol. 4 | pp. 9712-9721, CD3
EN
An emotion recognition system can improve customer service, especially in the case of call centers. Knowledge of the emotional state of the speaker would allow the operator to adapt better and generally improve cooperation. Research in emotion recognition focuses primarily on speech analysis. Emotion classification algorithms designed for real-world applications must be able to interpret the emotional content of an utterance or dialog beyond various limitations, i.e. speaker, context, personality or culture. This paper presents research on an emotion recognition system for spontaneous voice streams based on a multimodal classifier. Experiments were carried out on natural speech characterized by seven emotional states. The process of multimodal classification was based on Plutchik’s theory of emotion and emotional profiles.
EN
This paper is focused on automatic emotion recognition from static grayscale images. Here, we propose a new approach to this problem which combines a few other methods. The facial region is divided into small subregions, which are selected for processing based on a face relevance map. From these regions, local directional pattern histograms are extracted and concatenated into a single feature histogram, which is classified into one of seven defined emotional states using support vector machines. In our case, we distinguish: anger, disgust, fear, happiness, neutrality, sadness and surprise. In our experimental study we demonstrate that the expression recognition accuracy for the Japanese Female Facial Expression database is among the best compared with the results reported in the literature.
PL
This article addresses the problem of emotion recognition from grayscale images. We present a new approach that combines several existing methods. The facial region is divided into smaller subregions, which are selected for further processing using binary relevance maps. A local directional pattern histogram is extracted from each region, and the histograms are then concatenated into a feature vector and classified with a support vector machine. In our case, we distinguish the following emotions: anger, disgust, fear, happiness, neutrality, sadness and surprise. Our experiments show that this approach improves emotion recognition accuracy on the Japanese Female Facial Expression database compared with other existing methods.
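A hedged sketch of the pipeline only: subregion histograms of a local texture operator are concatenated and classified with an SVM. scikit-image's local binary pattern is used here as a convenient stand-in for the local directional pattern operator, and the face images are synthetic.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def subregion_histogram_features(face, grid=(7, 7), n_bins=10):
    """face: 2-D grayscale array of the cropped facial region."""
    lbp = local_binary_pattern(face, P=8, R=1, method="uniform")   # stand-in for LDP
    h, w = face.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            patch = lbp[i * h // grid[0]:(i + 1) * h // grid[0],
                        j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(patch, bins=n_bins, range=(0, n_bins))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

# Hypothetical training data: random images stand in for JAFFE face crops.
rng = np.random.default_rng(0)
faces = rng.integers(0, 256, size=(30, 96, 96), dtype=np.uint8)
X = np.array([subregion_histogram_features(f) for f in faces])
y = rng.choice(["anger", "disgust", "fear", "happiness",
                "neutrality", "sadness", "surprise"], size=30)

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:2]))
```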
PL
The article discusses methods of acquiring, processing and representing audio signals for further analyses concerning both the semantics of the utterance and the behavioural characteristics of the speaker. It is assumed that data analysis should be carried out as close as possible to where the data are stored, e.g. in commercial database servers, using the encapsulation of object classes into the programming elements of a relational server. In addition to representing the signal as vectors expressed on cepstral scales, an important element of the analysis is the use of algorithms for matching streams of data vectors (SPRING DTW). For the analysis of emotional states, committees of classifiers operating on different sets of attributes were used to strengthen the classification process, and the analysis was related to Plutchik's model.
EN
The article describes methods of acquisition, processing and representation of audio signals for the purpose of further analysis associated with both the semantics of the utterance and the behavioural characteristics of the speaker. It is assumed that the data analysis should be carried out as close as possible to the place of storage, e.g. in commercial database servers, using the encapsulation of object classes into relational server software components. In addition to representing the signal as vectors on a cepstral scale, an important part of the analysis is the application of stream-matching algorithms (SPRING DTW). To enhance the analysis of emotional states, classification committees consisting of classifiers operating on different sets of attributes were used. Emotion detection was based on Plutchik’s wheel.
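A small runnable sketch of the stream-matching idea: librosa's subsequence DTW is used here as a stand-in for the SPRING-style matcher mentioned above, locating a short cepstral query inside a longer MFCC stream built from synthetic audio.

```python
import numpy as np
import librosa

rng = np.random.default_rng(0)
sr = 16000
stream_audio = rng.normal(size=10 * sr)        # hypothetical 10 s recording
query_audio = stream_audio[3 * sr:4 * sr]      # a 1 s fragment used as the query

def mfcc_stream(y, sr=sr, n_mfcc=13):
    """Cepstral-scale representation of an audio stream: (n_mfcc, n_frames)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

D, wp = librosa.sequence.dtw(X=mfcc_stream(query_audio), Y=mfcc_stream(stream_audio),
                             subseq=True, metric="euclidean")
start, end = wp[-1, 1], wp[0, 1]               # matched frame span in the long stream
print(f"best match covers frames {start}..{end} (cost {D[-1, end]:.1f})")
```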
EN
EEG-based emotion recognition is a challenging and active research area in affective computing. We used a three-dimensional (arousal, valence and dominance) model of emotion to recognize the emotions induced by music videos. The participants watched a video (1 min long) while their EEG was recorded. The main objective of the study is to identify the features that can best discriminate the emotions. Power, entropy, fractal dimension, statistical features and wavelet energy are extracted from the EEG signals. The effects of these features are investigated and the best features are identified. The performance of two feature selection methods, a Relief-based algorithm and principal component analysis (PCA), is compared. PCA is adopted because of its improved performance, and the efficacy of the features is validated using support vector machine, K-nearest neighbors and decision tree classifiers. Our system achieves an overall best classification accuracy of 77.62%, 78.96% and 77.60% for valence, arousal and dominance, respectively. Our results demonstrate that time-domain statistical characteristics of EEG signals can efficiently discriminate different emotional states. Also, the three-dimensional emotion model is able to classify similar emotions that were not correctly classified by a two-dimensional model (e.g. anger and fear). The results of this study can be used to support the development of real-time EEG-based emotion recognition systems.
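An illustrative pipeline on synthetic EEG trials: time-domain statistical features per channel, PCA for dimensionality reduction and an SVM classifier, mirroring the feature/classifier combination described above; all sizes and labels are assumed.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def statistical_features(trial):
    """trial: (n_channels, n_samples) EEG segment -> per-channel time-domain statistics."""
    return np.concatenate([trial.mean(axis=1), trial.std(axis=1),
                           skew(trial, axis=1), kurtosis(trial, axis=1)])

rng = np.random.default_rng(0)
trials = rng.normal(size=(120, 32, 1024))      # hypothetical: 120 trials, 32 channels
valence = rng.integers(0, 2, size=120)         # binary high/low valence labels

X = np.array([statistical_features(t) for t in trials])
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
print(f"cross-validated accuracy: {cross_val_score(model, X, valence, cv=5).mean():.3f}")
```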
Polish emotional speech recognition based on the committee of classifiers
EN
This article presents a novel method for emotion recognition from Polish speech. We compared two different databases: spontaneous and acted speech. For the purpose of this research we gathered a set of audio samples with emotional information, which serves as the input database. Multiple classifier systems were used for classification, with commonly used speech descriptors and different groups of perceptual coefficients as the features extracted from the audio samples.
PL
This work concerns the recognition of emotional states on the basis of the voice. In the article we compare spontaneous speech with acted speech. For the purposes of the research, emotional audio recordings were collected to form a comprehensive input database. We present a novel approach to emotion classification using classifier committees, with commonly used speech-signal descriptors and hybrid perceptual coefficients as the emotion description.
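A hedged sketch of a classifier committee over two assumed feature groups (common speech descriptors vs. perceptual coefficients), combined by majority voting; the actual feature sets and committee members used in the paper may differ.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical feature matrix: columns 0-19 common descriptors, 20-32 perceptual coefficients.
X = rng.normal(size=(150, 33))
y = rng.choice(["anger", "joy", "sadness", "neutral"], size=150)

committee = VotingClassifier(voting="hard", estimators=[
    ("svm_descriptors", make_pipeline(FunctionTransformer(lambda Z: Z[:, :20]),
                                      StandardScaler(), SVC())),
    ("knn_perceptual", make_pipeline(FunctionTransformer(lambda Z: Z[:, 20:]),
                                     StandardScaler(), KNeighborsClassifier())),
    ("rf_all", RandomForestClassifier(n_estimators=100, random_state=0)),
])
print(committee.fit(X, y).predict(X[:3]))
```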
EN
Emotions play a significant role in product design for end-users. However, how to take emotions into account is not yet completely understood. We argue that this gap is due to a lack of methodological and technological frameworks for effective investigation of the elicitation conditions related to emotions and the corresponding emotional responses of users. Emotion-driven design should encompass a thorough assessment of users' emotional reactions in relation to certain elicitation conditions. By using Virtual Reality (VR) as a means to perform this investigation, we propose a novel methodological framework, referred to as the VR-Based Emotion-Elicitation-and-Recognition loop (VEE-loop), to close this gap.