Archives of Acoustics, 41, 4, pp. 669–682, 2016
10.1515/aoa-2016-0064

Laughter Classification Using Deep Rectifier Neural Networks with a Minimal Feature Subset

Gábor GOSZTOLYA
http://www.inf.u-szeged.hu/~ggabor/publlist/publlist.html
MTA-SZTE Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged
Hungary

András BEKE
Research Institute for Linguistics of the Hungarian Academy of Sciences
Hungary

Tilda NEUBERGER
Research Institute for Linguistics of the Hungarian Academy of Sciences
Hungary

László TÓTH
MTA-SZTE Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged
Hungary

Laughter is one of the most important paralinguistic events, and it has specific roles in human conversation. The automatic detection of laughter occurrences in human speech can aid automatic speech recognition systems as well as some paralinguistic tasks such as emotion detection. In this study we apply Deep Neural Networks (DNNs) to laughter detection, as this technology is now considered state-of-the-art in similar tasks such as phoneme identification. We carry out our experiments on two corpora of spontaneous speech in two languages (Hungarian and English). Since it is reasonable to assume that not all frequency regions are required for efficient laughter detection, we also perform feature selection to find a sufficient feature subset.
Keywords: speech recognition; speech technology; computational paralinguistics; laughter detection; deep neural networks.


Copyright © Polish Academy of Sciences & Institute of Fundamental Technological Research (IPPT PAN)