A two-step feature selection method for Quranic text classification

A. Adeleke, N. A. Samsudin, Z. A. Othman, S. K. Ahmad Khalid

Abstract


Feature selection (FS) is an integral phase in text classification problems. It is primarily applied when preprocessing text data prior to labeling. However, existing FS techniques have limitations: filter-based techniques tend to yield lower classification accuracy, while wrapper-based techniques are computationally expensive. In this paper, a two-step FS method is presented. In the first step, a chi-square (CH) filter-based technique is used to reduce the dimensionality of the feature set; in the second step, a wrapper correlation-based (CFS) technique is employed to select the most relevant features from the reduced feature set. The aim is to reduce computational runtime while achieving high classification accuracy. The proposed method was then applied to labeling instances of the input data (Quranic verses) using standard classifiers: naïve Bayes (NB), support vector machine (SVM), and decision trees (J48). The results show that the proposed method achieved an accuracy of 93.6% in 4.17 seconds.
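The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary term-presence features and a two-class label, scores the correlation-based step with the CFS merit computed from absolute Pearson correlations, and uses a simple greedy forward search. The function names, the `keep` parameter, and the toy terms and labels are all hypothetical and chosen only for illustration.

```python
import math

def chi_square(column, labels):
    """Chi-square statistic between one binary feature column and the labels."""
    n = len(labels)
    score = 0.0
    for value in (0, 1):
        for cls in set(labels):
            observed = sum(1 for x, y in zip(column, labels) if x == value and y == cls)
            expected = column.count(value) * labels.count(cls) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score

def pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov / math.sqrt(var_a * var_b))

def cfs_merit(subset, columns, labels):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # r_cf: mean feature-class correlation; r_ff: mean feature-feature correlation.
    r_cf = sum(pearson(columns[f], labels) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(pearson(columns[f], columns[g]) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def two_step_select(columns, labels, keep):
    # Step 1: the chi-square filter keeps the `keep` top-scoring features.
    ranked = sorted(columns, key=lambda f: chi_square(columns[f], labels), reverse=True)
    candidates = ranked[:keep]
    # Step 2: greedy forward search over the reduced set, maximizing CFS merit.
    selected, best = [], 0.0
    while True:
        scored = [(cfs_merit(selected + [f], columns, labels), f)
                  for f in candidates if f not in selected]
        if not scored:
            break
        merit, feature = max(scored)
        if merit <= best:  # stop when no candidate improves the merit
            break
        selected.append(feature)
        best = merit
    return candidates, selected

# Hypothetical toy data: 8 documents, term presence (1) or absence (0) per document.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "prayer":  [1, 1, 1, 1, 0, 0, 0, 0],
    "salat":   [1, 1, 1, 0, 0, 0, 0, 0],
    "fasting": [0, 0, 0, 0, 1, 1, 1, 0],
    "and":     [1, 0, 1, 0, 1, 0, 1, 0],
    "the":     [1, 1, 0, 0, 1, 1, 0, 0],
}
candidates, selected = two_step_select(columns, labels, keep=3)
print(candidates, selected)
```

In this toy run the inexpensive chi-square pass discards the terms that carry no class information, so the costlier subset search only has to evaluate the few remaining candidates. That is the runtime trade-off the abstract targets.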

Keywords


Classifier, Feature selection, Holy Quran, Text classification



DOI: http://doi.org/10.11591/ijeecs.v16.i2.pp730-736



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
