Word embedding and imbalanced learning impact on Indonesian Quran ontology population

Fandy Setyo Utomo, Yuli Purwati, Mohd Sanusi Azmi, Lulu Shafira, Nikmah Trinarsih

Abstract


This research addresses limitations in Quranic instance classification, especially the high dimensionality and lack of semantic relationships of the term frequency-inverse document frequency (TF-IDF) technique, and an imbalanced data distribution that reduces prediction accuracy for minority classes. To overcome these limitations of previous research, this study investigates the impact of word embedding and imbalanced learning techniques on instance classification frameworks using Indonesian Quran translation and Tafsir datasets. Four classification frameworks were built and evaluated using accuracy and Hamming loss metrics. The results show that combining the synthetic minority oversampling technique (SMOTE), the TF-IDF model, and a logistic regression classifier gives the best accuracy of 62.74% and a Hamming loss of 0.3726 on the Quraish Shihab Tafsir dataset. This outperforms the backpropagation neural network (BPNN) and support vector machine (SVM) classifiers used in the previous framework, which achieved accuracies of 59.91% and 62.26%, respectively. Within the previous framework, logistic regression also gives the best classification results, with an accuracy of 67.92% and a Hamming loss of 0.3208, again outperforming BPNN and SVM, which achieved accuracies of 62.26% and 66.98%, respectively. TF-IDF feature extraction outperforms word2vec in instance classification because it performs better when only a limited dataset is available.
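The best-performing configuration described above (TF-IDF features, SMOTE oversampling, then a classifier scored by accuracy and Hamming loss) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenisation, the smoothed idf formula, oversampling every minority class up to the majority count, and the toy documents and labels are all assumptions. The logistic regression step is omitted; in practice it would typically be a library model such as scikit-learn's `LogisticRegression` fitted on the resampled vectors.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, vectors) of L2-normalised TF-IDF weights.

    Assumptions: lower-cased whitespace tokenisation, raw term counts as tf,
    and the smoothed idf(t) = ln((1 + n) / (1 + df(t))) + 1 that
    scikit-learn's TfidfVectorizer uses by default.
    """
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def smote(X, y, k=5, seed=0):
    """SMOTE-style oversampling of every minority class to the majority count.

    Each synthetic point is x + u * (nn - x) with u ~ U(0, 1), where x is a
    random member of the class and nn is one of x's k nearest same-class
    neighbours (Euclidean). Classes with fewer than two members are skipped.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue
        for _ in range(target - count):
            x = rng.choice(members)
            neighbours = sorted(
                (m for m in members if m is not x),
                key=lambda m: math.dist(x, m),
            )[:k]
            nn = rng.choice(neighbours)
            u = rng.random()
            # Interpolate on the segment between x and its chosen neighbour.
            X_out.append([a + u * (b - a) for a, b in zip(x, nn)])
            y_out.append(label)
    return X_out, y_out


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of label positions predicted wrongly.

    Rows may be binary indicator lists (multi-label) or plain labels
    (single-label, one position each; then the result is 1 - accuracy).
    """
    wrong = total = 0
    for t, p in zip(y_true, y_pred):
        t_row = t if isinstance(t, (list, tuple)) else [t]
        p_row = p if isinstance(p, (list, tuple)) else [p]
        wrong += sum(a != b for a, b in zip(t_row, p_row))
        total += len(t_row)
    return wrong / total


if __name__ == "__main__":
    # Toy, made-up snippets and topic labels; not data from the study.
    docs = [
        "shalat subuh di masjid",
        "shalat berjamaah di masjid",
        "waktu shalat zuhur",
        "puasa bulan ramadhan",
        "puasa sunnah hari senin",
    ]
    labels = ["shalat", "shalat", "shalat", "puasa", "puasa"]
    vocab, X = tfidf(docs)
    X_res, y_res = smote(X, labels, k=1)
    print(len(vocab), "terms;", Counter(labels), "->", Counter(y_res))
```

For single-label data, Hamming loss reduces to the misclassification rate, 1 - accuracy; the reported pairs (62.74%, 0.3726) and (67.92%, 0.3208) are consistent with that relation. Resampling is normally applied to the training split only, so that synthetic points never reach the evaluation data.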

Keywords


Indonesian Quran interpretation; Machine learning; Ontology learning; SMOTE; Word2vec

DOI: http://doi.org/10.11591/ijeecs.v39.i1.pp603-613

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).