Integrating blind source separation and self-supervised learning for Algerian Arabic connected-digit recognition
Abstract
This paper proposes an improvement in Arabic automatic speech recognition (ASR) by combining blind source separation (BSS) with self-supervised acoustic modeling. The study concentrates on the Algerian Arabic connected-digit recognition task and reexamines the classical degenerate unmixing estimation technique (DUET) as a front-end approach for suppressing noise and interference. The output of the BSS stage is fed into a hidden Markov model (HMM) recognizer developed using the HTK toolkit. To contextualize DUET’s performance, it is compared with modern neural separation techniques (Conv-TasNet, SepFormer) paired with both traditional and self-supervised ASR back ends (Wav2Vec 2.0 and Whisper). A new corpus of 11,230 utterances from 37 speakers, representing dialectal and gender diversity, was collected. Experimental outcomes indicate that DUET enhances word accuracy under stereo mixing conditions; however, neural separation combined with self-supervised ASR results in considerably lower word-error rates and stronger robustness in noisy or overlapping-speech scenarios. The study emphasizes practical trade-offs between computational cost and accuracy for deploying low-resource Arabic ASR systems.
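The abstract reports results as word accuracy and word-error rate (WER). As a minimal sketch of how WER is conventionally computed for a connected-digit transcript, the Python function below counts the substitutions, deletions, and insertions needed to turn the recognizer output into the reference, then divides by the reference length. The function name `word_error_rate` and the English digit words are illustrative assumptions, not taken from the paper; the actual evaluation would use the Algerian Arabic digit labels and the paper's own scoring setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not the paper's scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words (rolling row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder digit strings, assumed for illustration only.
    print(word_error_rate("one two three four", "one three three"))
```

Under this definition, a lower value is better, and the measure can exceed 1.0 when the hypothesis contains many inserted words.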
Keywords
Arabic speech recognition; Blind source separation; Conv-TasNet; DUET; Low-resource ASR; SepFormer; Wav2Vec 2.0
DOI: http://doi.org/10.11591/ijeecs.v42.i1.pp71-80
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).