HCRF: an improved random forest algorithm based on hierarchical clustering
Abstract
Random forest (RF) selects feature subsets at random, so useless and redundant features can lower the quality of the selected subsets and, in turn, the overall classification accuracy of the RF. This study proposes an improved RF algorithm based on hierarchical clustering (HCRF). HCRF uses hierarchical clustering to optimize feature selection: it establishes groups of similar features based on the Gini index and then selects features from each group proportionally to construct the feature subset, which is used to build a single classifier. This additional filtering of feature subsets reduces the negative impact of useless and redundant features on the model and improves its generalization ability and overall performance. For experimental verification, ten datasets of different sizes and domains were selected, and the accuracy, precision, recall, F1 score, and running time of HCRF, support vector machine (SVM), RF, and classification and regression tree (CART) were compared using 10-fold cross-validation. Taken together, the results show that HCRF improves significantly on all evaluation indicators and outperforms the other three classifiers. The algorithm is therefore broadly applicable and improves the overall performance of the classifier while keeping complexity comparatively low.
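The grouped feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the linkage method, the exact Gini-based similarity, or the allocation rule. The sketch assumes that features are scored by their best single-split Gini gain, clustered by single linkage on that score, and sampled in proportion to group size. The function names (`gini_gain`, `cluster_scores`, `sample_subset`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_gain(column, labels):
    """Largest impurity decrease from one threshold split on a feature."""
    parent, n, best = gini(labels), len(labels), 0.0
    for t in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def cluster_scores(scores, n_groups):
    """Agglomerative single-linkage clustering of features by score.

    Assumption: similarity is the distance between 1-D Gini scores, so the
    closest pair of clusters is always adjacent in sorted order.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    groups = [[j] for j in order]
    while len(groups) > n_groups:
        gaps = [scores[groups[i + 1][0]] - scores[groups[i][-1]]
                for i in range(len(groups) - 1)]
        i = gaps.index(min(gaps))
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    return groups

def sample_subset(groups, k, rng):
    """Draw roughly k features, allocated in proportion to group size.

    Assumption: every group contributes at least one feature, so the
    subset can be slightly larger than k.
    """
    total = sum(len(g) for g in groups)
    subset = []
    for g in groups:
        take = max(1, round(k * len(g) / total))
        subset.extend(rng.sample(g, min(take, len(g))))
    return sorted(subset)

# Toy data (hypothetical): features 0 and 1 are informative and redundant
# with each other, features 2 and 3 carry little or no signal.
y = [0, 0, 0, 1, 1, 1]
columns = [
    [1, 2, 3, 4, 5, 6],
    [1, 1, 2, 5, 6, 6],
    [3, 1, 2, 3, 1, 2],
    [0, 1, 0, 1, 0, 1],
]
scores = [gini_gain(col, y) for col in columns]
groups = cluster_scores(scores, n_groups=2)
subset = sample_subset(groups, k=2, rng=random.Random(0))
print(scores, groups, subset)
```

In a full forest, one such subset would be drawn per tree and a single tree classifier trained on it. Because each group contributes to the subset, two redundant features in the same group are less likely to crowd out features from other groups than under uniform random selection.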
Keywords
Classification algorithm; Feature selection; Hierarchical clustering; Random forest; Redundant feature
DOI: http://doi.org/10.11591/ijeecs.v38.i1.pp578-586
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).