Knowledge discovery from gene expression dataset using bagging lasso decision tree

Umu Sa'adah, Masithoh Yessi Rochayani, Ani Budi Astuti


Classifying high-dimensional data are a challenging task in data mining. Gene expression data is a type of high-dimensional data that has thousands of features. The study was proposing a method to extract knowledge from high-dimensional gene expression data by selecting features and classifying. Lasso was used for selecting features and the classification and regression tree (CART) algorithm was used to construct the decision tree model. To examine the stability of the lasso decision tree, we performed bootstrap aggregating (Bagging) with 50 replications. The gene expression data used was an ovarian tumor dataset that has 1,545 observations, 10,935 gene features, and binary class. The findings of this research showed that the lasso decision tree could produce an interpretable model that theoretically correct and had an accuracy of 89.32%. Meanwhile, the model obtained from the majority vote gave an accuracy of 90.29% which showed an increase in accuracy of 1% from the single lasso decision tree model. The slightly increasing accuracy shows that the lasso decision tree classifier is stable.


Bagging; Decision tree; Feature selection; Gene expression; High-dimensional

Full Text:




  • There are currently no refbacks.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

shopify stats IJEECS visitor statistics