A three-phase model to keyword detection in Arabic corpora

Driss Namly, Karim Bouzoubaa, Ridouane Tachicart

Abstract


The rapid growth of Arabic text data in recent years has created demand for keyword detection techniques tailored to the Arabic language. This study addresses the need for tools that quickly and accurately identify keywords across multiple documents in an Arabic corpus. We present a new corpus designed for keyword detection in Arabic texts, along with an approach that integrates three distinct candidate keyword lists: a frequency-based list, a vector space model list, and a machine learning-based list. This hybrid methodology leverages the strengths of each technique for a more comprehensive keyword identification process. We conducted experimental validation to assess the performance and computational efficiency of the proposed pipeline. The results show robust performance across a variety of domains, with F1-scores consistently above 91%. Overall, this study contributes to automated keyword detection in Arabic and supports improved information retrieval and text analysis.
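
The abstract does not specify how the three candidate lists are built or combined. As a rough illustration only, the sketch below assumes pre-tokenized documents, a raw-frequency list, a TF-IDF list standing in for the vector space model, and a simple voting rule for the merge. The function names and the `min_votes` parameter are hypothetical, and the machine learning-based list is assumed to be supplied by some separate classifier.

```python
import math
from collections import Counter

def frequency_candidates(docs, k):
    """Top-k tokens by raw frequency over the whole corpus."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [tok for tok, _ in counts.most_common(k)]

def tfidf_candidates(docs, k):
    """Top-k tokens ranked by their best TF-IDF score in any document.

    TF-IDF is used here as one common vector space weighting; the
    paper's actual weighting scheme is not stated in the abstract.
    """
    n = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    best = {}
    for doc in docs:
        for tok, c in Counter(doc).items():
            score = (c / len(doc)) * math.log(n / df[tok])
            best[tok] = max(best.get(tok, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]

def merge_candidates(lists, min_votes=2):
    """Keep tokens proposed by at least min_votes candidate lists.

    Voting is an assumed merge rule, not the authors' method.
    """
    votes = Counter(tok for lst in lists for tok in set(lst))
    return sorted(tok for tok, v in votes.items() if v >= min_votes)
```

With `docs` as a list of token lists and `ml_list` as the output of a trained model, a call such as `merge_candidates([frequency_candidates(docs, 20), tfidf_candidates(docs, 20), ml_list])` would return the tokens that at least two of the three lists agree on.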

Keywords


Arabic language; Information retrieval; Keyword detection; Machine learning; Natural language processing



DOI: http://doi.org/10.11591/ijeecs.v37.i1.pp206-213



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
