Recognizing AlMuezzin and his Maqam using deep learning approach
Abstract
Speech recognition is an important topic in deep learning, and Arabic speech recognition is particularly challenging because of the nature of the Arabic language, its frequent overlap, the lack of available resources, and some programming-related limitations. This paper attempts to narrow the gap between speech recognition and the Arabic language through deep learning. The focus is on the call to prayer (Aladhan: الآذان), one of the most famous Arabic utterances: its wording is fixed, but the notes and shape of its sound vary, a property known as the phonetic Maqam (Maqam: المقام الصوتي). A solution based on the VGG-16 model is presented to identify the voice of AlMuezzin (المؤذن), recognize AlMuezzin, and determine the form of the Maqam. The VGG-16 model was examined with four extracted features: chroma, LogFbank, MFCC, and spectral centroids. The best result for Aladhan recognition was obtained with chroma features, with an accuracy of 96%. For Maqam classification, the highest accuracy, 95%, was obtained with the spectral centroids feature.
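As an illustration of one of the four features named in the abstract, the spectral centroid of an audio frame is the magnitude-weighted mean of its frequency bins. The abstract does not state how the features were extracted or with which parameters, so the following is only a minimal pure-Python sketch of the standard definition, not the authors' pipeline. The function name `spectral_centroid`, the frame length, and the test tone are assumptions chosen for illustration.

```python
import cmath
import math


def spectral_centroid(frame, sample_rate):
    """Return the magnitude-weighted mean frequency (Hz) of one audio frame."""
    n = len(frame)
    # Naive DFT over the non-negative frequency bins (0 .. n // 2).
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(acc))
    total = sum(magnitudes)
    if total == 0.0:
        # Silent frame: the centroid is undefined, so return 0 by convention.
        return 0.0
    freqs = [k * sample_rate / n for k in range(n // 2 + 1)]
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


# Illustrative input (not data from the paper): a pure 1 kHz tone at 8 kHz sampling.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
centroid = spectral_centroid(frame, sample_rate)
```

For a pure tone that falls exactly on a frequency bin, as here, the centroid sits at the tone's frequency. In practice a sequence of such per-frame values would be computed over a whole recording with an FFT-based library before being passed to a classifier.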
DOI: http://doi.org/10.11591/ijeecs.v39.i2.pp1360-1372
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).