Article title

Laughter Classification Using Deep Rectifier Neural Networks with a Minimal Feature Subset

Publication languages
EN
Abstracts
EN
Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNNs) to laughter detection, as this technology is now considered state-of-the-art in similar tasks such as phoneme identification. We carry out our experiments on two corpora containing spontaneous speech in two languages (Hungarian and English). Furthermore, since it is reasonable to assume that not all frequency regions are required for efficient laughter detection, we also perform feature selection to find a sufficient feature subset.
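
The approach outlined in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical Python example (not the authors' code) of a deep rectifier network for frame-level laughter classification, paired with a simple greedy forward feature selection loop over frequency bands; the layer sizes, the 40-dimensional filter-bank input, and the evaluate callback are all illustrative assumptions.

# Minimal sketch, assuming frame-level filter-bank features;
# names, sizes and the selection loop are illustrative, not the paper's code.
import torch
import torch.nn as nn

class DeepRectifierNet(nn.Module):
    """Fully connected DNN with ReLU (rectifier) hidden layers."""
    def __init__(self, n_features, n_hidden=1000, n_layers=3):
        super().__init__()
        layers, dim = [], n_features
        for _ in range(n_layers):
            layers += [nn.Linear(dim, n_hidden), nn.ReLU()]
            dim = n_hidden
        layers.append(nn.Linear(dim, 2))  # laughter vs. non-laughter
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def greedy_forward_selection(bands, evaluate):
    """Add one frequency band at a time, keeping the best-scoring subset.
    evaluate(subset) is a hypothetical callback that trains and validates a
    DeepRectifierNet restricted to subset and returns a validation score."""
    selected, best = [], -float("inf")
    while True:
        scores = {b: evaluate(selected + [b]) for b in bands if b not in selected}
        if not scores:
            break
        band, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:
            break  # no remaining band improves the subset
        selected.append(band)
        best = score
    return selected

# Example: per-frame class posteriors for a batch of 40-dimensional frames.
model = DeepRectifierNet(n_features=40)
frames = torch.randn(8, 40)                # 8 hypothetical feature frames
posteriors = model(frames).softmax(dim=1)  # laughter vs. non-laughter per frame

In the paper's setting, the bands would index groups of frequency channels, and the subset found this way would define the reduced feature set; everything above is schematic.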
Year
Pages
669–682
Physical description
Bibliography: 50 items, photographs, tables, charts.
Authors
author
  • MTA-SZTE Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary
author
  • Research Institute for Linguistics of the Hungarian Academy of Sciences, Budapest, Hungary
author
  • Research Institute for Linguistics of the Hungarian Academy of Sciences, Budapest, Hungary
author
  • MTA-SZTE Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Szeged, Hungary
Bibliography
  • 1. Bachorowski J.-A., Smoski M. J., Owren M. J. (2001), The acoustic features of human laughter, Journal of the Acoustical Society of America, 110, 3, 1581–1597.
  • 2. Bickley C., Hunnicutt S. (1992), Acoustic analysis of laughter, [in:] Proceedings of ICSLP, pp. 927–930, Banff, Canada.
  • 3. Blomberg M., Elenius K. (1992), Speech recognition using artificial neural networks and dynamic programming, [in:] Proceedings of Fonetik, p. 57, Göteborg, Sweden.
  • 4. Bourlard H. A., Morgan N. (1993), Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic, Norwell.
  • 5. Brendel M., Zaccarelli R., Devillers L. (2010), A quick sequential forward floating feature selection algorithm for emotion detection from speech, [in:] Proceedings of Interspeech, pp. 1157–1160, Makuhari, Japan.
  • 6. Brueckner R., Schuller B. (2013), Hierarchical neural networks and enhanced class posteriors for social signal classification, [in:] Proceedings of ASRU, pp. 362–367.
  • 7. Bryant G. A., Aktipis C. A. (2014), The animal nature of spontaneous human laughter, Evolution and Human Behavior, 35, 4, 327–335.
  • 8. Busso C., Mariooryad S., Metallinou A., Narayanan S. (2013), Iterative feature normalization scheme for automatic emotion detection from speech, IEEE Transactions on Affective Computing, 4, 4, 386–397.
  • 9. Cai R., Lu L., Zhang H.-J., Cai L.-H. (2003), Highlight sound effects detection in audio stream, [in:] Proceedings of ICME, pp. 37–40.
  • 10. Campbell N. (2007), On the use of nonverbal speech sounds in human communication, [in:] Proceedings of COST Action 2102: Verbal and Nonverbal Communication Behaviours, pp. 117–128, Vietri sul Mare, Italy.
  • 11. Campbell N., Kashioka H., Ohara R. (2005), No laughing matter, [in:] Proceedings of Interspeech, pp. 465–468, Lisbon, Portugal.
  • 12. Chandrashekar G., Sahin F. (2014), A survey on feature selection methods, Computers & Electrical Engineering, 40, 1, 16–28.
  • 13. Glenn P. (2003), Laughter in interaction, Cambridge University Press, Cambridge, UK.
  • 14. Glorot X., Bordes A., Bengio Y. (2011), Deep sparse rectifier neural networks, [in:] Proceedings of AISTATS, pp. 315–323.
  • 15. Goldstein J. H., McGhee P. E. (1972), The psychology of humor: Theoretical perspectives and empirical issues, Academic Press, New York, USA.
  • 16. Gósy M. (2012), BEA: A multifunctional Hungarian spoken language database, The Phonetician, 105–106, 50–61.
  • 17. Gosztolya G. (2015a), Conflict intensity estimation from speech using greedy forward-backward feature selection, [in:] Proceedings of Interspeech, pp. 1339–1343, Dresden, Germany.
  • 18. Gosztolya G. (2015b), On evaluation metrics for social signal detection, [in:] Proceedings of Interspeech, pp. 2504–2508, Dresden, Germany.
  • 19. Gosztolya G., Busa-Fekete R., Tóth L. (2013), Detecting autism, emotions and social signals using AdaBoost, [in:] Proceedings of Interspeech, pp. 220–224, Lyon, France.
  • 20. Gosztolya G., Grósz T., Busa-Fekete R., Tóth L. (2014), Detecting the intensity of cognitive and physical load using AdaBoost and Deep Rectifier Neural Networks, [in:] Proceedings of Interspeech, pp. 452–456, Singapore.
  • 21. Grósz T., Tóth L. (2013), A comparison of Deep Neural Network training methods for Large Vocabulary Speech Recognition, [in:] Proceedings of TSD, pp. 36–43, Pilsen, Czech Republic.
  • 22. Grósz T., Busa-Fekete R., Gosztolya G., Tóth L. (2015), Assessing the degree of nativeness and Parkinson’s condition using Gaussian Processes and Deep Rectifier Neural Networks, [in:] Proceedings of Interspeech, pp. 1339–1343.
  • 23. Günther U. (2002), What’s in a laugh? Humour, jokes, and laughter in the conversational corpus of the BNC, Ph.D. thesis, Universität Freiburg.
  • 24. Gupta R., Audhkhasi K., Lee S., Narayanan S. S. (2013), Speech paralinguistic event detection using probabilistic time-series smoothing and masking, [in:] Proceedings of Interspeech, pp. 173–177.
  • 25. Hinton G. E., Osindero S., Teh Y.-W. (2006), A fast learning algorithm for deep belief nets, Neural Computation, 18, 7, 1527–1554.
  • 26. Holmes J., Marra M. (2002), Having a laugh at work: How humour contributes to workplace culture, Journal of Pragmatics, 34, 12, 1683–1710.
  • 27. Hudenko W., Stone W., Bachorowski J.-A. (2009), Laughter differs in children with autism: An acoustic analysis of laughs produced by children with and without the disorder, Journal of Autism and Developmental Disorders, 39, 10, 1392–1400.
  • 28. Kennedy L. S., Ellis D. P. W. (2004), Laughter detection in meetings, [in:] Proceedings of the NIST Meeting Recognition Workshop at ICASSP, pp. 118–121, Montreal, Canada.
  • 29. Knox M. T., Mirghafori N. (2007), Automatic laughter detection using neural networks, [in:] Proceedings of Interspeech, pp. 2973–2976, Antwerp, Belgium.
  • 30. Kovács Gy., Tóth L. (2015), Joint optimization of spectro-temporal features and Deep Neural Nets for robust automatic speech recognition, Acta Cybernetica, 22, 1, 117–134.
  • 31. Lockerd A., Müller F. (2002), LAFCam: Leveraging affective feedback camcorder, [in:] Proceedings of CHI EA, pp. 574–575, Minneapolis, MN, USA.
  • 32. Lukács E. (1955), A characterization of the Gamma distribution, Annals of Mathematical Statistics, 26, 2, 319–324.
  • 33. Martin R. A. (2007), The psychology of humor: An integrative approach, Elsevier, Amsterdam, NL.
  • 34. Neuberger T., Beke A. (2013a), Automatic laughter detection in Hungarian spontaneous speech using GMM/ANN hybrid method, [in:] Proceedings of SJUSK Conference on Contemporary Speech Habits, pp. 1–13.
  • 35. Neuberger T., Beke A. (2013b), Automatic laughter detection in spontaneous speech using GMM–SVM method, [in:] Proceedings of TSD, pp. 113–120.
  • 36. Neuberger T., Beke A., Gósy M. (2014), Acoustic analysis and automatic detection of laughter in Hungarian spontaneous speech, [in:] Proceedings of ISSP, pp. 281–284.
  • 37. Nwokah E. E., Davies P., Islam A., Hsu H.-C., Fogel A. (1993), Vocal affect in three-year-olds: a quantitative acoustic analysis of child laughter, Journal of the Acoustical Society of America, 94, 6, 3076–3090.
  • 38. Rothgänger H., Hauser G., Cappellini A. C., Guidotti A. (1998), Analysis of laughter and speech sounds in Italian and German students, Naturwissenschaften, 85, 8, 394–402.
  • 39. Salamin H., Polychroniou A., Vinciarelli A. (2013), Automatic detection of laughter and fillers in spontaneous mobile phone conversations, [in:] Proceedings of SMC, pp. 4282–4287.
  • 40. Schapire R., Singer Y. (1999), Improved boosting algorithms using confidence-rated predictions, Machine Learning, 37, 3, 297–336.
  • 41. Schölkopf B., Platt J., Shawe-Taylor J., Smola A., Williamson R. (2001), Estimating the support of a high-dimensional distribution, Neural Computation, 13, 7, 1443–1471.
  • 42. Schuller B., Steidl S., Batliner A., Vinciarelli A., Scherer K., Ringeval F., Chetouani M., Weninger F., Eyben F., Marchi E., Salamin H., Polychroniou A., Valente F., Kim S. (2013), The Interspeech 2013 Computational Paralinguistics Challenge: Social signals, Conflict, Emotion, Autism, [in:] Proceedings of Interspeech.
  • 43. Suarez M. T., Cu J., Maria M. S. (2012), Building a multimodal laughter database for emotion recognition, [in:] Proceedings of LREC, pp. 2347–2350.
  • 44. Tanaka H., Campbell N. (2011), Acoustic features of four types of laughter in natural conversational speech, [in:] Proceedings of ICPhS, pp. 1958–1961.
  • 45. Tóth L. (2013), Phone recognition with Deep Sparse Rectifier Neural Networks, [in:] Proceedings of ICASSP, pp. 6985–6989.
  • 46. Tóth L. (2015), Phone recognition with hierarchical Convolutional Deep Maxout Networks, EURASIP Journal on Audio, Speech, and Music Processing, 2015, 25, 1–13.
  • 47. Tóth L., Gosztolya G., Vincze V., Hoffmann I., Szatlóczki G., Biró E., Zsura F., Pákáski M., Kálmán J. (2015), Automatic detection of Mild Cognitive Impairment from spontaneous speech using ASR, [in:] Proceedings of Interspeech, pp. 2694–2698, Dresden, Germany.
  • 48. Truong K. P., van Leeuwen D. A. (2005), Automatic detection of laughter, [in:] Proceedings of Interspeech, pp. 485–488, Lisbon, Portugal.
  • 49. Truong K. P., van Leeuwen D. A. (2007), Automatic discrimination between laughter and speech, Speech Communication, 49, 2, 144–158.
  • 50. Vicsi K., Sztahó D., Kiss G. (2012), Examination of the sensitivity of acoustic-phonetic parameters of speech to depression, [in:] Proceedings of CogInfoCom, pp. 511–515, Kosice, Slovakia.
Notes
Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f747e203-4b6c-4a88-bdab-a15d0de346d0