Fine-Grained Recognition of Fidgety-Type Emotions Using Multi-Scale One-Dimensional Residual Siamese Network

Authors

  • Jiu SUN School of Information Technology, Yancheng Institute of Technology, China
  • Junxin ZHU School of Information Technology, Yancheng Institute of Technology, China
  • Jun SHAO School of Information Technology, Yancheng Institute of Technology, China

Abstract

Fidgety speech emotion has important research value, and deep learning models have proven effective for feature modeling in recent years. This paper studies practical speech emotion recognition and improves the recognition of fidgety-type emotion with a novel neural network model. First, we construct a large set of phonological features for modeling emotion. Second, we study how fidgety speech differs across groups of speakers and, through the distribution of these features, examine the individual characteristics of fidgety emotion. Third, we propose a fine-grained emotion classification method that analyzes subtle differences between emotional categories using Siamese neural networks. Within the network architecture we employ multi-scale residual blocks, which alleviate the vanishing-gradient problem and allow the network to learn more meaningful representations of the fidgety speech signal. Finally, experimental results show that the proposed method provides versatile modeling and identifies fidgety emotion well, which gives it great value in practical applications.
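The core idea of the architecture described above can be sketched as follows. This is a minimal, illustrative NumPy mockup, not the authors' implementation: it assumes multi-channel acoustic feature sequences, "same"-padded 1-D convolutions at several kernel widths summed with an identity shortcut (the multi-scale residual block), global average pooling to an utterance embedding, and a Euclidean distance between the two shared-weight branches of the Siamese comparison. All names, shapes, and kernel widths are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_same(x, kernel):
    # "Same"-padded 1-D convolution applied independently to each channel
    return np.stack([np.convolve(ch, kernel, mode="same") for ch in x])

def relu(x):
    return np.maximum(x, 0.0)

def multi_scale_residual_block(x, kernels):
    # Sum branches with different kernel widths, then add the identity
    # shortcut; the shortcut is what eases gradient flow in deep stacks
    branches = sum(conv1d_same(x, k) for k in kernels)
    return relu(branches + x)

def embed(x, blocks):
    # Stack multi-scale residual blocks, then global-average-pool over time
    for kernels in blocks:
        x = multi_scale_residual_block(x, kernels)
    return x.mean(axis=1)  # one embedding vector per utterance

def siamese_distance(x1, x2, blocks):
    # Shared weights: the same blocks embed both inputs of the pair
    e1, e2 = embed(x1, blocks), embed(x2, blocks)
    return np.linalg.norm(e1 - e2)

# Hypothetical setup: 4-channel feature sequences of length 128,
# two residual blocks, each with branch kernel widths 3, 5, and 7
blocks = [[rng.normal(scale=0.1, size=k) for k in (3, 5, 7)]
          for _ in range(2)]
a = rng.normal(size=(4, 128))
b = rng.normal(size=(4, 128))

d_ab = siamese_distance(a, b, blocks)
d_aa = siamese_distance(a, a, blocks)
print(d_aa, d_ab)  # identical inputs give distance 0
```

In training, such a pairwise distance would typically feed a contrastive-style objective so that same-category pairs map close together and fine-grained differences between categories are pulled apart; the sketch only shows the forward comparison.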

Keywords:

residual convolutional neural network, multi-scale neural network, fidgety speech emotion, fine-grained emotion classification, Siamese neural networks
