

Article title

Human Face Expressions from Images

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Several computer algorithms for recognizing visible human emotions are compared in a web-camera scenario using the CNN/MMOD face detector. Recognition covers four face expressions: smile, surprise, anger, and neutral. At the feature extraction stage, three concepts of face description are compared: (a) static 2D face geometry represented by its 68 characteristic landmarks (FP68); (b) dynamic 3D geometry defined by motion parameters of eight distinguished face parts (denoted AU8) of a personalized Candide-3 model; (c) static 2D visual description as a 2D array of grayscale pixels (the raw facial image). At the classification stage, the performance of two major models is analyzed: (a) a support vector machine (SVM) with kernel options; (b) a convolutional neural network (CNN) with a variety of relevant tensor-processing layers and blocks of them. The models are trained on frontal views of human faces but tested on arbitrary head poses. For geometric features, the success rate (accuracy) of the CNN is nearly triple that of the SVM classifiers. For raw images, the CNN outperforms its best geometric counterpart (AU/CNN) in accuracy by about 30 percent, while the best SVM solutions remain inferior. A similarly high advantage of raw/CNN over geometric/CNN and geometric/SVM is observed for the F-score. We conclude that SVM-based emotion classifiers also generalize worse than CNN-based ones with respect to human head pose.
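The geometric/SVM baseline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: real FP68 feature vectors would come from a 68-point landmark detector (e.g. dlib) applied to faces found by the CNN/MMOD detector, whereas here synthetic per-class landmark templates stand in for the missing dataset, and the normalization step and RBF kernel choice are assumptions.

```python
# Sketch of the FP68-landmarks + kernel-SVM expression classifier.
# Synthetic landmark data stands in for real detector output.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

EXPRESSIONS = ["neutral", "smile", "surprise", "anger"]

def normalize_landmarks(pts):
    """Map a (68, 2) landmark array to a translation- and
    scale-invariant 136-dimensional feature vector."""
    pts = np.asarray(pts, dtype=float)
    centered = pts - pts.mean(axis=0)        # remove translation
    scale = np.linalg.norm(centered) or 1.0  # remove scale
    return (centered / scale).ravel()

rng = np.random.default_rng(0)
# One hypothetical "mean face shape" per expression, plus noise.
templates = rng.normal(size=(4, 68, 2))
X = np.vstack([
    normalize_landmarks(templates[c] + rng.normal(scale=0.05, size=(68, 2)))
    for c in range(4)
    for _ in range(50)
])
y = np.repeat(np.arange(4), 50)  # class labels into EXPRESSIONS

# Kernel SVM, as in the paper's geometric/SVM stage.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
```

In practice the trained pipeline would be applied per video frame: detect the face, extract and normalize the 68 landmarks, then call `clf.predict` and map the label through `EXPRESSIONS`.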
Publisher
Year
Pages
287--310
Physical description
Bibliography: 36 items, photographs, figures, tables, charts.
Authors
  • Institute of Radioelectronics and Multimedia Technology, Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
author
  • Institute of Radioelectronics and Multimedia Technology, Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
  • Institute of Radioelectronics and Multimedia Technology, Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland
Bibliography
  • [1] Ekman P, Friesen WV. Constants across cultures in the face and emotion. Journal of personality and social psychology, 1971. 17 2:124-9.
  • [2] Lien JJ, Kanade T, Cohn JF, Li CC. Automated facial expression recognition based on FACS action units. In: Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition. 1998 pp. 390-395. doi:10.1109/AFGR.1998.670980.
  • [3] Ekman P, Friesen W. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, 1978.
  • [4] Ahlberg J. CANDIDE-3 - An Updated Parameterised Face. Technical report, Image Coding Group, Dept. of Electrical Engineering, Linköping University, 2001.
  • [5] Yuksel K, Chang X, Skarbek W. Smile detectors correlation. In: Proc. SPIE, volume 10445. 2017 pp.10445-10445-12. doi:10.1117/12.2280760.
  • [6] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017. 60(6):84-90. doi:10.1145/3065386.
  • [7] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L. ImageNet Large Scale Visual Recognition Challenge. arXiv:1409.0575 [cs], 2014. ArXiv: 1409.0575, URL http://arxiv.org/abs/1409.0575.
  • [8] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs], 2014. ArXiv: 1409.1556, URL http://arxiv.org/abs/1409.1556.
  • [9] Szegedy C, Wei Liu, Yangqing Jia, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Boston, MA, USA. ISBN 978-1-4673-6964-0, 2015 pp. 1-9. doi:10.1109/CVPR.2015.7298594.
  • [10] Liu M, Wang R, Li S, Shan S, Huang Z, Chen X. Combining Multiple Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild. In: Proceedings of the 16th International Conference on Multimodal Interaction - ICMI ’14. ACM Press, Istanbul, Turkey. ISBN 978-1-4503-2885-2, 2014 pp.494-501. doi:10.1145/2663204.2666274.
  • [11] Sikka K, Dykstra K, Sathyanarayana S, Littlewort G, Bartlett M. Multiple kernel learning for emotion recognition in the wild. In: Proceedings of the 15th ACM on International conference on multimodal interaction - ICMI ’13. ACM Press, Sydney, Australia. ISBN 978-1-4503-2129-7, 2013 pp. 517-524. doi:10.1145/2522848.2531741.
  • [12] Liu P, Han S, Meng Z, Tong Y. Facial Expression Recognition via a Boosted Deep Belief Network. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Columbus, OH, USA. ISBN 978-1-4799-5118-5, 2014 pp. 1805-1812. doi:10.1109/CVPR.2014.233.
  • [13] Kanou SE, Ferrari RC, Mirza M, Jean S, Carrier PL, Dauphin Y, Boulanger-Lewandowski N, Aggarwal A, Zumer J, Lamblin P, Raymond JP, Pal C, Desjardins G, Pascanu R, Warde-Farley D, Torabi A, Sharma A, Bengio E, Konda KR, Wu Z, Bouthillier X, Froumenty P, Gulcehre C, Memisevic R, Vincent P, Courville A, Bengio Y. Combining modality specific deep neural networks for emotion recognition in video. In: Proceedings of the 15th ACM on International conference on multimodal interaction - ICMI ’13. ACM Press, Sydney, Australia. ISBN 978-1-4503-2129-7, 2013 pp. 543-550. doi:10.1145/2522848.2531745.
  • [14] Tang Y. Deep Learning using Linear Support Vector Machines. arXiv:1306.0239 [cs, stat], 2013. ArXiv: 1306.0239, URL http://arxiv.org/abs/1306.0239.
  • [15] Yu Z, Zhang C. Image Based Static Facial Expression Recognition with Multiple Deep Network Learning. In: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ICMI’15. ACM, New York, NY, USA. ISBN 978-1-4503-3912-4, 2015 pp. 435-442. doi:10.1145/2818346.2830595.
  • [16] King DE. Max-Margin Object detection. arXiv:1502.00046 [cs.CV], 2015. URL http://arxiv.org/abs/1502.00046.
  • [17] King DE. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research, 2009. 10:1755-1758. doi:10.1145/1577069.1755843.
  • [18] Viola P, Jones MJ. Robust Real-Time Face Detection. Int. J. Comput. Vision, 2004. 57(2):137-154. URL http://dx.doi.org/10.1023/B:VISI.0000013087.49260.fb.
  • [19] Joachims T, Finley T, Yu CN. Cutting-Plane Training of Structural SVMs. Machine Learning, 2009. 77(1):27-59. doi:10.1007/s10994-009-5108-8.
  • [20] Dalal N, Triggs B. Histograms of Oriented Gradients for Human Detection. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) – Volume 1 - Volume 01, CVPR ’05. IEEE Computer Society, Washington, DC, USA. ISBN 0-7695-2372-2, 2005 pp. 886-893. doi:10.1109/CVPR.2005.177.
  • [21] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 2011. 12:2825-2830. arXiv:1201.0490 [cs.LG].
  • [22] Pilarczyk R, Skarbek W. Tuning deep learning algorithms for face alignment and pose estimation. In: Proc. SPIE, volume 10808. 2018 pp. 10808-10808-8. doi:10.1117/12.2501682.
  • [23] Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861 [cs], 2017. ArXiv: 1704.04861, URL http://arxiv.org/abs/1704.04861.
  • [24] Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S, Pantic M. 300 Faces In-The-Wild Challenge: database and results. Image and Vision Computing, 2016. 47:3-18. doi:10.1016/j.imavis.2016.01.002.
  • [25] Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M. 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. In: 2013 IEEE International Conference on Computer Vision Workshops. IEEE, Sydney, Australia. ISBN 978-1-4799-3022-7, 2013 pp. 397-403. doi:10.1109/ICCVW.2013.59.
  • [26] Sagonas C, Tzimiropoulos G, Zafeiriou S, Pantic M. A Semi-automatic Methodology for Facial Landmark Annotation. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, OR, USA. ISBN 978-0-7695-4990-3, 2013 pp. 896-903. doi:10.1109/CVPRW.2013.132.
  • [27] Chrysos GG, Antonakos E, Zafeiriou S, Snape P. Offline Deformable Face Tracking in Arbitrary Videos. In: 2015 IEEE International Conference on Computer Vision Workshop (ICCVW). IEEE, Santiago, Chile. ISBN 978-1-4673-9711-7, 2015 pp. 954-962. doi:10.1109/ICCVW.2015.126.
  • [28] Chrysos GG, Antonakos E, Zafeiriou S, Snape P. Offline Deformable Face Tracking in Arbitrary Videos. In: The IEEE International Conference on Computer Vision (ICCV) Workshops. 2015 pp. 50-58. doi:10.1109/ICCVW.2015.126.
  • [29] Chang X, Skarbek W. Facial expressions recognition by animated motion of Candide 3D model. In: Proc. SPIE, volume 10808. 2018 pp. 10808-10808-10. doi:10.1117/12.2500175.
  • [30] Febriana P, Skarbek W. Personalization of Candide 3D model for human computer interfacing. In: Proc. SPIE, volume 10808. 2018 pp. 10808-10808-8. doi:10.1117/12.2501645.
  • [31] Lucey P, Cohn JF, Kanade T, Saragih J, Ambadar Z, Matthews I. The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops. 2010 pp. 94-101. doi:10.1109/CVPRW.2010.5543262.
  • [32] Aifanti N, Papachristou C, Delopoulos A. The MUG facial expression database. In: 11th International Workshop on Image Analysis for Multimedia Interactive Services WIAMIS 10. 2010 pp. 1-4.
  • [33] Langner O, Dotsch R, Bijlstra G, Wigboldus DHJ, Hawk ST, van Knippenberg A. Presentation and validation of the Radboud Faces Database. Cognition and Emotion, 2010. 24(8):1377-1388. doi:10.1080/02699930903485076.
  • [34] Skarbek W. Symbolic Tensor Neural Networks for Digital Media - from Tensor Processing via BNF Graph Rules to CREAMS Applications. arXiv preprint arXiv:1809.06582, 2018. URL https://arxiv.org/abs/1809.06582.
  • [35] Jung A. Imgaug - image augmentation library. URL https://github.com/aleju/imgaug, 2018.
  • [36] Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv:1610.02357 [cs], 2016. ArXiv: 1610.02357, URL http://arxiv.org/abs/1610.02357.
Notes
Record compiled under agreement 509/P-DUN/2018 from funds of the Ministry of Science and Higher Education (MNiSW) allocated to science-dissemination activities (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b4997f8c-fa54-4c6f-92b0-3dbb1d1fc15f