Comparison of ensemble hybrid sampling with bagging and boosting machine learning approach for imbalanced data

Nur Hanisah Abdul Malek, Wan Fairos Wan Yaacob, Yap Bee Wah, Syerina Azlin Md Nasir, Norshahida Shaadan, Sapto Wahyu Indratno

Abstract


Training on an imbalanced dataset can cause classifiers to overfit the majority class and increases the risk of information loss for the minority class. Moreover, accuracy alone may not give a clear picture of a classifier’s performance. This paper used decision tree (DT), support vector machine (SVM), artificial neural network (ANN), K-nearest neighbors (KNN) and Naïve Bayes (NB) classifiers, together with the ensemble models random forest (RF) and gradient boosting (GB), which use bagging and boosting respectively, along with three sampling approaches and seven performance metrics to investigate the effect of class imbalance on water quality data. Based on the results, the best model was gradient boosting without resampling for almost all metrics except balanced accuracy, sensitivity and area under the curve (AUC), followed by the random forest model without resampling in terms of specificity, precision and AUC. However, in terms of balanced accuracy and sensitivity, the highest performance was achieved by random forest trained on a randomly under-sampled dataset. Considering each performance metric separately, the results showed that for specificity and precision it is better not to resample the data for either ensemble classifier. Nevertheless, balanced accuracy and sensitivity improved for both ensemble classifiers on all of the resampled datasets.
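The kind of comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: it assumes scikit-learn, substitutes a synthetic 90/10 dataset from `make_classification` for the water quality data, uses arbitrary hyperparameters, covers only random under-sampling out of the three sampling approaches, and reports the six metrics the abstract names. The helpers `random_under_sample` and `evaluate` are hypothetical names introduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def random_under_sample(X, y, seed=0):
    """Randomly drop majority-class rows until every class has the minority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


def evaluate(model, X_test, y_test):
    """Compute the metrics named in the abstract; class 1 is the minority (positive) class."""
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_test, y_pred),
        "sensitivity": recall_score(y_test, y_pred, pos_label=1),
        "specificity": recall_score(y_test, y_pred, pos_label=0),
        "precision": precision_score(y_test, y_pred, pos_label=1, zero_division=0),
        "auc": roc_auc_score(y_test, y_score),
    }


# Synthetic stand-in for an imbalanced dataset (assumption: roughly 90% / 10%).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Resample the training split only; the test split keeps its original imbalance.
X_rus, y_rus = random_under_sample(X_train, y_train, seed=0)
train_sets = {"none": (X_train, y_train), "under-sampling": (X_rus, y_rus)}

results = {}
for sampling, (X_fit, y_fit) in train_sets.items():
    models = {
        "RF (bagging)": RandomForestClassifier(n_estimators=50, random_state=0),
        "GB (boosting)": GradientBoostingClassifier(n_estimators=50, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_fit, y_fit)
        results[(name, sampling)] = evaluate(model, X_test, y_test)

for key, metrics in results.items():
    print(key, {k: round(v, 3) for k, v in metrics.items()})
```

Resampling is applied to the training split alone so that every model is scored on the same untouched, imbalanced test set. Results on synthetic data say nothing about the paper's findings; the sketch only shows how the per-metric comparison can be laid out.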

Keywords


Bagging; Boosting; Ensemble methods; Hybrid sampling; Imbalanced data



DOI: http://doi.org/10.11591/ijeecs.v29.i1.pp598-608



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
