Fuzzy-based voiced-unvoiced segmentation for emotion recognition using spectral feature fusions

Yusnita Mohd Ali, Alhan Farhanah Abd Rahim, Emilia Noorsal, Zuhaila Mat Yassin, Nor Fadzilah Mokhtar, Mohamad Helmy Ramlan

Abstract


Despite abundant growth in studies of automatic emotion recognition systems (ERS) using various feature extraction techniques and classifiers, few have attempted to improve such systems through pre-processing. This paper proposed a smart pre-processing stage using a fuzzy logic inference system (FIS) based on the Mamdani engine and simple time-domain features, i.e. zero-crossing rate (ZCR) and short-time energy (STE), to initially classify each frame as voiced (V) or unvoiced (UV). Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) were tested with K-nearest neighbours (KNN) classifiers to evaluate the proposed FIS V-UV segmentation. We also introduced two feature fusions of MFCC and LPC with formants to obtain better performance. Experimental results showed that the proposed system surpassed the conventional ERS, yielding accuracy improvements of 3.7% to 9.0%. The fusion of LPC and formants, named SFF LPC-fmnt, showed promising results, achieving 1.3% to 5.1% higher accuracy than its baseline features in classifying neutral, angry, happy and sad emotions. The best accuracy rates for male and female speakers were 79.1% and 79.9% respectively, using the SFF MFCC-fmnt fusion technique.
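The pre-processing stage described above rests on two frame-level measures, ZCR and STE, combined by Mamdani-style fuzzy rules into a voiced/unvoiced decision. The sketch below illustrates the general idea in Python; the triangular membership functions, their breakpoints, and the two rules are illustrative assumptions, not the paper's actual fuzzy sets or rule base.

```python
import math

def zcr(frame):
    # Zero-crossing rate: fraction of sign changes between consecutive samples.
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (len(frame) - 1)

def ste(frame):
    # Short-time energy: mean squared amplitude of the frame.
    return sum(x * x for x in frame) / len(frame)

def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def classify_vuv(frame):
    """Mamdani-style sketch: voiced frames tend to show high STE and low ZCR,
    unvoiced frames low STE and high ZCR. All breakpoints are assumed values."""
    z, e = zcr(frame), ste(frame)
    # Rule 1: IF ZCR is low AND STE is high THEN frame is voiced (min as AND).
    voiced = min(tri(z, -0.1, 0.0, 0.3), tri(e, 0.01, 0.5, 1.1))
    # Rule 2: IF ZCR is high AND STE is low THEN frame is unvoiced.
    unvoiced = min(tri(z, 0.2, 0.6, 1.1), tri(e, -0.1, 0.0, 0.1))
    return "V" if voiced >= unvoiced else "UV"

# Toy check: a low-frequency sinusoid (few zero crossings, high energy)
# should be labelled voiced.
voiced_frame = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
print(classify_vuv(voiced_frame))  # -> V
```

In a full system, each frame's V/UV label would then route it to the appropriate spectral feature extraction (MFCC/LPC), as the paper's pipeline does before classification.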

Keywords


Short-time energy; Zero-crossing rate; Fuzzy logic; Mel-frequency cepstral coefficients; Linear prediction coefficients; Emotion recognition

DOI: http://doi.org/10.11591/ijeecs.v19.i1.pp%25p


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
