A prediction model based machine learning algorithms with feature selection approaches over imbalanced dataset

Alaa Khalaf Hamoud, Mohammed Baqr Mohammed Kamel, Alaa Sahl Gaafar, Ali Salah Alasady, Aqeel Majeed Humadi, Wid Akeel Awadh, Jasim Mohammed Dahr


Predicting student performance with supervised and unsupervised machine learning algorithms has been widely studied in the educational sector. Most student performance datasets are imbalanced: the final classes are not equally represented. Together with the size of the dataset, this imbalance affects the model's prediction accuracy. In this paper, the Synthetic Minority Oversampling Technique (SMOTE) filter is applied to the dataset to measure its effect on the model's accuracy. Four feature selection approaches are applied to find the attributes most strongly correlated with students' performance. The SMOTE filter is evaluated before and after applying the feature selection approaches to measure the model's accuracy with both supervised and unsupervised algorithms. Three supervised and three unsupervised algorithms are examined with the feature selection approaches to predict students' performance. The findings show that the supervised algorithms (LMT, Simple Logistic, and Random Forest) achieved high accuracy after applying SMOTE without feature selection. The prediction accuracy of the unsupervised algorithms (Canopy, EM, and Farthest First) improved after applying both the feature selection approaches and the SMOTE filter.
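
To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority-class neighbours. It is an illustration only, not the filter used in the paper; the function name `smote`, its parameters, and the toy data are assumptions introduced here.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority-class points.

    Hypothetical helper for illustration: picks a minority sample, picks one
    of its k nearest minority neighbours, and interpolates between the two.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy data (made up): 4 minority samples against an assumed 10 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10
new_points = smote(minority, majority_count - len(minority), k=3)
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two existing minority samples, the oversampled class stays inside the region the original minority samples already occupy. Only the training data should be oversampled in this way, so that evaluation still reflects the original class distribution.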


Educational data mining; Feature selection; SMOTE filter; Students’ performance; Supervised algorithms; Unsupervised algorithms



DOI: http://doi.org/10.11591/ijeecs.v28.i2.pp1105-1116



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
