Convolutional neural network for speech emotion recognition in the Moroccan Arabic dialect language
Abstract
Extracting a speaker's emotional state has recently become an active research topic, driven by the demand for more human-interactive applications. The field has advanced significantly, especially for English, owing to the availability of massive labeled speech corpora. However, progress on analogous methods for Arabic is still in its infancy. This paper presents a new massive natural speech emotion dataset and a speech emotion recognition model for Moroccan Arabic. Four primary emotion labels were selected: happy, sad, angry, and neutral. Various spectral features, such as the mel-frequency cepstral coefficients (MFCC), were extracted and tested to determine the optimal feature combination. A convolutional neural network (CNN) model was built and trained on our dataset. The spectral features were evaluated with the CNN model both individually and in combination, leading to the selection of MFCC, root-mean-square (RMS), mel-scaled spectrogram, and spectral, as the optimal spectral features for our dataset. This selection yielded an accuracy of 99.55% for emotion recognition, outperforming existing research.
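As an illustration of one of the features named above, the following is a minimal sketch of frame-wise root-mean-square (RMS) energy in plain Python. It is not the authors' implementation: the function name `frame_rms`, the frame length, the hop length, and the synthetic test tone are all assumptions made for this example, since the abstract does not state the extraction parameters.

```python
import math


def frame_rms(signal, frame_length=2048, hop_length=512):
    """Return the RMS energy of each full frame of a mono signal.

    frame_length and hop_length are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    rms = []
    # Slide a window over the signal; a trailing partial frame is dropped.
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        mean_square = sum(x * x for x in frame) / frame_length
        rms.append(math.sqrt(mean_square))
    return rms


# Synthetic input (assumed values): one second of a 440 Hz sine at 16 kHz.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
rms_values = frame_rms(tone)
```

For a sine wave of amplitude 0.5, each frame's RMS should be close to 0.5 / sqrt(2). In a pipeline like the one described, a per-frame sequence such as this would typically be stacked with the other spectral features before being passed to the CNN, although the abstract does not specify how the features are combined.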
Keywords
Arabic SER; Convolutional neural networks; Feature extraction; Signal processing; Speech emotion recognition
DOI: http://doi.org/10.11591/ijeecs.v37.i3.pp1588-1595
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).