Improving time efficiency in big data through progressive sampling-based classification model

Nandita Bangera, Kayarvizhy Kayarvizhy, Shubham Luharuka, Asha S. Manek

Abstract


The proposed system addresses the challenges posed by large databases, data imbalance, heterogeneity, and multidimensionality by using progressive sampling within a novel classification model. It leverages sampling techniques to enhance processing performance and overcome memory restrictions. Random forest regressor feature importance, computed with the Gini importance method, is used to identify the most important features, reducing the number of features used for classification. The system supports diverse classifiers, including random forest, ensemble learning, support vector machine (SVM), k-nearest neighbors (KNN), and logistic regression, which gives it flexibility in handling different data types while achieving high classification accuracy. By iteratively applying progressive sampling to the dataset restricted to the best features, the proposed technique aims to significantly improve performance compared with using the entire dataset. This approach focuses computational resources on the most informative subsets of the data, reducing time complexity. Results show that the system can achieve over 85% accuracy with only 5-10% of the original data, providing accurate predictions while reducing data processing requirements. In conclusion, the proposed system (RFRFI-PS) combines progressive sampling, feature selection using random forest regressor feature importance, and a range of classifiers to address the challenges of large databases and improve classification accuracy. It demonstrates promising results in both accuracy and time complexity reduction.
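
The progressive sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic two-class data, the sampling schedule in `fractions`, the stopping tolerance `tol`, and the nearest-centroid stand-in classifier are all assumptions chosen to keep the example self-contained, and the feature-selection step is omitted. In practice the stand-in would be replaced by one of the classifiers named above (random forest, SVM, KNN, or logistic regression).

```python
import random

def make_data(n, rng):
    """Synthetic two-class data: two Gaussian blobs in 2-D (illustrative only)."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 3.0
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows

def fit_centroids(train):
    """Stand-in classifier (an assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )

def accuracy(centroids, test):
    """Fraction of test rows classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)

def progressive_sampling(train, test, fractions, tol=0.005, seed=0):
    """Train on growing random samples; stop once the accuracy gain is below tol."""
    rng = random.Random(seed)
    shuffled = train[:]
    rng.shuffle(shuffled)
    history = []
    for frac in fractions:
        n = max(2, int(len(shuffled) * frac))
        model = fit_centroids(shuffled[:n])
        acc = accuracy(model, test)
        history.append((frac, n, acc))
        # Hypothetical stopping rule: halt when a larger sample no longer helps.
        if len(history) > 1 and acc - history[-2][2] < tol:
            break
    return history

rng = random.Random(42)
data = make_data(6000, rng)
train, test = data[:5000], data[5000:]
history = progressive_sampling(train, test, [0.01, 0.02, 0.05, 0.10, 0.25, 0.50, 1.0])
for frac, n, acc in history:
    print(f"{frac:>5.0%} of data ({n} rows): accuracy {acc:.3f}")
```

The loop stops as soon as a larger sample fails to improve held-out accuracy by at least `tol`, so the remaining, more expensive fractions are never trained on. This is one way to realise the time saving the abstract describes; other schedules and convergence tests are equally plausible.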

Keywords


Classification; Feature importance; Progressive sampling; Random forest regressor; Time complexity

DOI: http://doi.org/10.11591/ijeecs.v33.i1.pp248-260

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
