BanSpEmo: a Bangla audio dataset for speech emotion recognition and its baseline evaluation

Babe Sultana, Md Gulzar Hussain, Mahmuda Rahman

Abstract


Speech interfaces provide a natural and comfortable way for humans to communicate with machines. Recognizing emotions from acoustic signals is essential in audio and speech processing, and detecting emotion in speech is critical to the next generation of human-computer interaction (HCI). However, a lack of large-scale datasets has hampered progress in this area. In this study, we present BanSpEmo, a challenging Bangla speech emotion dataset consisting of 792 audio recordings totaling more than 1 hour and 23 minutes. The recordings feature 22 native speakers, each of whom uttered two sets of sentences representing six emotions: disgust, happiness, anger, sadness, surprise, and fear. The dataset covers 12 Bangla sentences, each expressed in these six emotions. Furthermore, a series of experiments is carried out to assess the baseline performance of support vector machine (SVM), logistic regression (LR), and multinomial naive Bayes models on the BanSpEmo dataset. The experiments show that SVM performs best on this dataset, with an accuracy of 87.18%.
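The dataset figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the paper: it assumes that recordings are spread evenly across speakers and emotions (a balanced design the abstract does not state explicitly), and the helper name `per_group` is hypothetical.

```python
# Figures quoted in the abstract.
NUM_RECORDINGS = 792
NUM_SPEAKERS = 22
EMOTIONS = ["disgust", "happiness", "anger", "sadness", "surprise", "fear"]

# "more than 1 hour and 23 minutes" gives only a lower bound on total duration.
TOTAL_SECONDS_LOWER_BOUND = 1 * 3600 + 23 * 60


def per_group(total: int, groups: int) -> int:
    """Split `total` evenly into `groups`; raise if the split is uneven.

    ASSUMPTION: the dataset is balanced, which the abstract does not confirm.
    """
    quotient, remainder = divmod(total, groups)
    if remainder:
        raise ValueError(f"{total} does not divide evenly into {groups} groups")
    return quotient


# Recordings contributed by each speaker under the balanced assumption.
recordings_per_speaker = per_group(NUM_RECORDINGS, NUM_SPEAKERS)

# Recordings available for each emotion class under the same assumption.
recordings_per_emotion = per_group(NUM_RECORDINGS, len(EMOTIONS))

# Recordings each speaker contributes to a single emotion class.
recordings_per_speaker_per_emotion = per_group(recordings_per_speaker, len(EMOTIONS))

# Lower bound on the mean clip length, in seconds.
mean_duration_lower_bound = TOTAL_SECONDS_LOWER_BOUND / NUM_RECORDINGS

print(recordings_per_speaker, recordings_per_emotion,
      recordings_per_speaker_per_emotion, round(mean_duration_lower_bound, 2))
```

Under the balanced assumption, each speaker contributes 36 recordings (six per emotion), each emotion class holds 132 recordings, and the average clip is at least about 6.3 seconds long.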

Keywords


Audio dataset; Bangla SER; Emotion classification; Machine learning; Speech emotion



DOI: http://doi.org/10.11591/ijeecs.v37.i3.pp2044-2057



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
