Speech Emotion Recognition Using Deep Feedforward Neural Network

Muhammad Fahreza Alghifari, Teddy Surya Gunawan, Mira Kartiwi

Abstract


Speech emotion recognition (SER) is currently a research hotspot due to its challenging nature but bountiful future prospects. The objective of this research is to utilize Deep Neural Networks (DNNs) to recognize human speech emotion. First, the chosen speech feature Mel-frequency cepstral coefficient (MFCC) were extracted from raw audio data. Second, the speech features extracted were fed into the DNN to train the network. The trained network was then tested onto a set of labelled emotion speech audio and the recognition rate was evaluated. Based on the accuracy rate the MFCC, number of neurons and layers are adjusted for optimization. Moreover, a custom-made database is introduced and validated using the network optimized. The optimum configuration for SER is 13 MFCC, 12 neurons and 2 layers for 3 emotions and 25 MFCC, 21 neurons and 4 layers for 4 emotions, achieving a total recognition rate of 96.3% for 3 emotions and 97.1% for 4 emotions.Speech emotion recognition (SER) is currently a research hotspot due to its challenging nature but bountiful future prospects. The objective of this research is to utilize Deep Neural Networks (DNNs) to recognize human speech emotion. First, the chosen speech feature Mel-frequency cepstral coefficient (MFCC) were extracted from raw audio data. Second, the speech features extracted were fed into the DNN to train the network. The trained network was then tested onto a set of labelled emotion speech audio and the recognition rate was evaluated. Based on the accuracy rate the MFCC, number of neurons and layers are adjusted for optimization. Moreover, a custom-made database is introduced and validated using the network optimized.The optimum configuration for SER is 13 MFCC, 12 neurons and 2 layers for 3 emotions and 25 MFCC, 21 neurons and 4 layers for 4 emotions, achieving a total recognition rate of 96.3% for 3 emotions and 97.1% for 4 emotions.

Keywords


Speech emotion recognition (SER); Deep neural network; Mel-frequency cepstral coefficients (MFCC)

Full Text:

PDF


DOI: http://doi.org/10.11591/ijeecs.v10.i2.pp554-561

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).

shopify stats IJEECS visitor statistics