Title variants
Maximum margin discriminant analysis in speech emotion recognition
Publication languages
Abstracts
A novel speech emotion recognition method based on generalized maximum margin discriminant analysis (GMMDA) is proposed in this paper. GMMDA is a multi-class extension of our proposed two-class dimensionality reduction method based on maximum margin discriminant analysis (MMDA), which uses the normal direction of the optimal hyperplane of a linear support vector machine (SVM) as the projection vector for feature extraction. To generate an optimal set of projection vectors from the MMDA-based dimensionality reduction method, we impose orthogonality constraints on the projection vectors and then solve the problem recursively. Moreover, to deal with the multi-class speech emotion recognition problem, we present two recognition schemes based on the proposed dimensionality reduction approach. One uses the "one-versus-one" strategy for multi-class classification; the other composes the projection vectors of each pair of classes into a transformation matrix for multi-class dimensionality reduction.
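The recursive, orthogonally constrained MMDA step and the pairwise composition of projection vectors can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It assumes a soft-margin linear SVM trained by full-batch subgradient descent (a simple stand-in for a proper SVM solver), imposes the orthogonality constraint by projecting the data onto the orthogonal complement of the directions already found before retraining (one common way to realize it), and uses hypothetical function names (`linear_svm_direction`, `mmda_directions`, `pairwise_transform`) and illustrative hyperparameters (`lam`, `lr`, `epochs`). The one-versus-one classification scheme is not shown.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def linear_svm_direction(X, y, lam=0.01, lr=0.1, epochs=300):
    """Approximately minimise lam/2*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b)))
    by full-batch subgradient descent (labels are +1/-1). This solver is an
    assumption, not the paper's. Returns only w, the normal vector of the
    separating hyperplane; the bias b is discarded."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regulariser
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violator: add hinge subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w


def project_out(x, basis):
    """Remove from x its components along the orthonormal vectors in basis."""
    r = list(x)
    for u in basis:
        c = dot(r, u)
        r = [ri - c * ui for ri, ui in zip(r, u)]
    return r


def mmda_directions(X, y, n_dirs):
    """Recursively extract up to n_dirs mutually orthogonal SVM normal directions."""
    basis = []
    for _ in range(n_dirs):
        Xp = [project_out(x, basis) for x in X]  # data restricted to the complement
        w = project_out(linear_svm_direction(Xp, y), basis)  # remove round-off drift
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no discriminative direction left
            break
        basis.append([wj / norm for wj in w])
    return basis


def pairwise_transform(X, labels, n_dirs):
    """Stack the directions found for every pair of classes into one
    transformation matrix (returned as a list of rows)."""
    classes = sorted(set(labels))
    rows = []
    for a in range(len(classes)):
        for c in range(a + 1, len(classes)):
            pairs = [(x, 1 if l == classes[a] else -1)
                     for x, l in zip(X, labels) if l in (classes[a], classes[c])]
            rows.extend(mmda_directions([p[0] for p in pairs],
                                        [p[1] for p in pairs], n_dirs))
    return rows


# Toy usage (synthetic data): two classes separated along axis 0 in R^3.
rng = random.Random(0)
X = [[3 + rng.gauss(0, 0.3), rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
X += [[-3 + rng.gauss(0, 0.3), rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(20)]
y = [1] * 20 + [-1] * 20
U = mmda_directions(X, y, n_dirs=2)      # two orthonormal projection vectors
Z = [[dot(u, x) for u in U] for x in X]  # reduced 2-D features
```

Projecting the data before retraining keeps each new normal vector inside the complement of the earlier ones, because the subgradient updates only ever add multiples of the projected samples (and of w itself) to w; the explicit re-orthogonalization merely removes floating-point drift.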
The article presents a method of analyzing speech for emotion recognition. The solution is based on generalized maximum margin discriminant analysis (GMMDA).
Journal
Year
Volume
Pages
86-91
Physical description
Bibliography: 21 items, figures, tables
Authors
author
- Southeast University, jinyun9999@gmail.com
- Jiangsu Normal University
author
- Southeast University, wenming_zheng@seu.edu.cn
author
- Southeast University
author
- Southeast University
References
- [1] R. Cowie et al., "Emotion recognition in human-computer interaction," IEEE Signal Processing Magazine, vol. 18, pp. 32-80, 2001.
- [2] T. Vogt and E. André, "Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition," in Proc. Multimedia and Expo (ICME05), Amsterdam, Netherlands, 2005.
- [3] B. Schuller et al., "Brute-forcing hierarchical functionals for paralinguistics: a waste of feature space?," in Proc. ICASSP, Las Vegas, NV, 2008.
- [4] B. Schuller et al., "Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles," in Proc. Interspeech 2006, pp. 1818-1821.
- [5] B. Schuller et al., "Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge," Speech Communication, 2011.
- [6] I. K. Fodor, "A survey of dimension reduction techniques," 2002.
- [7] P. Pudil et al., "Floating search methods in feature selection," Pattern Recognition Letters, vol. 15, pp. 1119-1125, 1994.
- [8] C. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
- [9] I. T. Jolliffe, Principal Component Analysis. Berlin, Germany: Springer, 2002.
- [10] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, 1990.
- [11] A. Kocsor et al., "Margin Maximizing Discriminant Analysis," in Proc. ECML, pp. 227-238, 2004.
- [12] K. Kovacs et al., "Maximum Margin Discriminant Analysis based Face Recognition," in Proc. Joint Hungarian-Austrian Conf. Image Processing and Pattern Recognition, 2005.
- [13] I. W.-H. Tsang et al., "Large-Scale Maximum Margin Discriminant Analysis Using Core Vector Machines," IEEE Transactions on Neural Networks, vol. 19, pp. 610-624, 2008.
- [14] I. W. Tsang et al., "Efficient kernel feature extraction for massive data sets," in Proc. 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, NY, USA, 2006.
- [15] I. W. Tsang et al., "Diversified SVM Ensembles for Large Data Sets," in Machine Learning: ECML, 2006.
- [16] H. Li et al., "Efficient and Robust Feature Extraction by Maximum Margin Criterion," IEEE Transactions on Neural Networks, vol. 17, pp. 157-165, 2006.
- [17] S. Gu et al., "Discriminant analysis via support vectors," Neurocomputing, vol. 73, pp. 1669-1675, 2010.
- [18] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
- [19] F. Burkhardt et al., "A database of German emotional speech," in Proc. Interspeech, 2005.
- [20] D. Bitouk, R. Verma, and A. Nenkova, "Class-level spectral features for emotion recognition," Speech Communication, 2010, doi:10.1016/j.specom.2010.02.010.
- [21] P. Boersma, "Praat, a system for doing phonetics by computer," Glot International, vol. 5, no. 9/10, pp. 341-345, 2001.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-3a6e5be1-38d2-46e0-b469-22e92bbd496b