A Class Skew-Insensitive ACO-Based Decision Tree Algorithm for Imbalanced Data Sets

Muhamad Hasbullah Mohd Razali, Rizauddin Saian, Bee Wah Yap, Ku Ruhana Ku-Mahamud

Abstract


Ant-Tree-Miner (ATM) has an advantage over conventional decision tree algorithms in terms of feature selection. However, real-world applications commonly involve class imbalance, where the classes differ in importance. This condition prevents the entropy-based heuristic of the existing ATM algorithm from developing effective decision boundaries, because the heuristic is biased towards the dominant class. Consequently, the induced decision trees are dominated by the majority class and have poor predictive ability on the rare class. This study proposes an enhanced algorithm called Hellinger-Ant-Tree-Miner (HATM), inspired by the Ant Colony Optimization (ACO) metaheuristic, for imbalanced learning with decision tree classification. The proposed algorithm was compared with the existing ATM algorithm on nine publicly available imbalanced data sets. A simulation study reveals the superiority of HATM when the sample size increases with a skewed class distribution (imbalance ratio < 50%). Experimental results demonstrate that the performance of the existing algorithm, measured by balanced accuracy (BACC), improved owing to the class-skew insensitivity of the Hellinger distance. A statistical significance test shows that HATM has a higher mean BACC score than ATM.
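The abstract does not give formulas, so the following is only a minimal sketch of the two quantities it names, not the HATM implementation. It assumes the usual two-class Hellinger distance split value, computed from the per-class fractions that fall into each branch of a candidate split, and the usual definition of balanced accuracy as the mean of sensitivity and specificity. The function names and the example counts are hypothetical.

```python
import math


def hellinger_split_value(branch_counts):
    """Two-class Hellinger distance of a candidate split (illustrative).

    branch_counts: list of (positive_count, negative_count), one per branch.
    Each class is normalised by its own total, so class priors cancel out.
    """
    total_pos = sum(p for p, _ in branch_counts)
    total_neg = sum(n for _, n in branch_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # assumption: a node with one class has nothing to separate
    return math.sqrt(sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in branch_counts
    ))


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (rare-class recall) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical split: 80% of positives go left, 10% of negatives go left.
mild_skew = [(8, 10), (2, 90)]       # 10 positives vs 100 negatives
heavy_skew = [(8, 100), (2, 900)]    # same rates, 10 positives vs 1000 negatives

# The split value depends only on the per-class rates, not on the class ratio.
print(hellinger_split_value(mild_skew), hellinger_split_value(heavy_skew))
print(balanced_accuracy(tp=8, fn=2, tn=90, fp=10))
```

Because each class is normalised by its own size, multiplying the majority-class counts by ten leaves the split value unchanged, which is the sense in which the criterion is skew-insensitive. An entropy-based criterion, by contrast, weights branches by their overall share of instances and so shifts with the class ratio.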

Keywords


ACO; Decision Tree; Classification; Hellinger Distance; Imbalanced Learning

DOI: http://doi.org/10.11591/ijeecs.v21.i1.pp%25p


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.