Prediction of chronic diseases based on ML packages using spark MLlib
Abstract
Heart disease, diabetes, and breast cancer pose significant global health challenges, and addressing these chronic diseases effectively requires a coordinated international effort. Machine learning and predictive analytics offer promising tools for this task. Our study presents a unified model that uses the random forest (RF) algorithm and Spark MLlib to predict these three diseases. We test the model on three distinct datasets and evaluate its performance using the receiver operating characteristic (ROC) curve, accuracy, precision, recall, and F1-score. We also investigate whether variations in medical data and contextual factors affect the results. The findings indicate that, while the model performs strongly overall, its effectiveness may differ for each disease due to factors such as data characteristics, disease-specific features, model behavior, and various biological and medical considerations. Understanding these factors is essential for improving model performance and ensuring its appropriate use in clinical environments.
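The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' Spark MLlib pipeline: the function names `binary_metrics` and `roc_auc`, the toy labels, and the toy scores are all hypothetical and chosen only to show how accuracy, precision, recall, F1-score, and the area under the ROC curve are computed for a binary disease/no-disease prediction.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from hard predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f1)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    Quadratic in the number of samples, so suitable only for small data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = disease present, 0 = disease absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

In a Spark setting these quantities would normally come from the library's own evaluators rather than hand-written functions; the sketch only makes the definitions concrete.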
Keywords
Apache spark; Breast cancer disease; Chronic diseases; Diabetes disease; Heart disease; Random forest algorithm; SparkMLlib
DOI: http://doi.org/10.11591/ijeecs.v37.i2.pp1121-1129
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).