Recognizing AlMuezzin and his Maqam using deep learning approach

Nahlah Mohammad Shatnawi, Khalid M. O. Nahar, Suhad Al-Issa, Enas Ahmad Alikhashashneh

Abstract


Speech recognition is an important topic in deep learning, and it is especially challenging for Arabic because of the nature of the language, its frequent overlap, the lack of available resources, and other programming-related limitations. This paper attempts to narrow the gap between speech recognition and the Arabic language through deep learning. The focus is on the call to prayer (Aladhan: الآذان), one of the most famous Arabic utterances: its wording is fixed, but the notes and shape of its sound vary, which is known as the phonetic Maqam (Maqam: المقام الصوتي). A solution based on the VGG-16 model is presented to identify the voice of AlMuezzin (المؤذن), recognize AlMuezzin, and determine the form of the Maqam. The VGG-16 model was examined with four extracted features: chroma, LogFbank, MFCC, and spectral centroids. The best Aladhan recognition result was obtained with chroma features, with an accuracy of 96%. For Maqam classification, the highest accuracy, 95%, was obtained with the spectral centroid feature.
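As an illustration of one of the four features named above, the spectral centroid of a single audio frame can be sketched as the magnitude-weighted mean frequency of its spectrum. The abstract does not say how the features were computed, so this is only a minimal sketch under stated assumptions: the function names `magnitude_spectrum` and `spectral_centroid` are hypothetical, no windowing is applied, and a naive DFT is used instead of an FFT library to keep the example dependency-free.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of a real-valued frame.

    Naive O(N^2) DFT; acceptable for short frames in an illustration.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame.

    Returns 0.0 for a silent frame to avoid dividing by zero.
    """
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    # Centre frequency of each DFT bin in Hz.
    freqs = [k * sample_rate / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


# Assumed test signal: a 1000 Hz sine sampled at 8000 Hz (exactly bin 32 of 256).
sample_rate = 8000
frame_length = 256
tone = [
    math.sin(2 * math.pi * 1000 * t / sample_rate)
    for t in range(frame_length)
]
centroid = spectral_centroid(tone, sample_rate)
```

For this pure tone the centroid lands very close to 1000 Hz, since almost all spectral energy sits in one bin. For a real recording, the per-frame centroids would form a sequence over time that could then be arranged as input to a classifier.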


Keywords


Aladhan; AlMuezzin; Arabic language; Deep learning; Maqam; Speech recognition; VGG-16


DOI: http://doi.org/10.11591/ijeecs.v39.i2.pp1360-1372


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).