Multi-model deep ensemble framework for early diagnosis of rare genetic disorders using genomic, phenotypic, and EHR data fusion
Abstract
Rare genetic disorders are difficult to diagnose because of their low prevalence, heterogeneous manifestations, and the scarcity of readily available datasets. This study systematically assesses supervised and unsupervised machine learning (ML) methods for the early diagnosis of rare genetic disorders using a multi-center pediatric dataset of 2,434 anonymized records enriched with demographic, clinical, and laboratory variables. Genomic, phenotypic, and electronic health record (EHR) variables were integrated into a unified feature matrix, so that all modalities could be analyzed jointly within each model. After pre-processing, which comprised removing non-informative identifiers, imputing and encoding categorical features, and normalizing numerical predictors, five classification frameworks were implemented: logistic regression (LR), random forest (RF), a one-dimensional convolutional neural network (CNN), a hybrid CNN and long short-term memory (CNN+LSTM) model, and a stacked ensemble of RF and XGBoost (RF+XGB). Model performance was evaluated on an independent test set using accuracy, precision, recall, and F1-score. LR and the CNN baseline achieved F1-scores of 0.9090 and 0.8572, respectively; RF reached 0.9565, and the CNN+LSTM hybrid reached 0.9611. The RF+XGB ensemble achieved the highest diagnostic accuracy (98.77%) with balanced precision (0.9879) and recall (0.9877), suggesting a stronger capacity to capture complex, non-linear feature interactions and to cope with class imbalance. These results indicate that combining bagging and boosting algorithms provides a robust and interpretable framework for efficient pre-screening of rare genetic disorders.
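The pre-processing steps listed in the abstract (dropping identifiers, imputing and encoding categorical features, normalizing numerical predictors) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the column names and the `ID_COLUMNS`, `CATEGORICAL` and `NUMERIC` groupings are hypothetical, and because the abstract does not name the imputation or normalization method, mode/median imputation, one-hot encoding and z-score standardization are assumed choices.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical column roles: the abstract does not list the actual variables.
ID_COLUMNS = {"patient_id"}
CATEGORICAL = ["sex", "symptom_1"]
NUMERIC = ["age_years", "blood_cell_count"]


def preprocess(records):
    """Map a list of dict records to (header, feature matrix).

    Assumed steps: drop identifiers, impute missing values (mode for
    categoricals, median for numerics), one-hot encode categoricals and
    z-score the numeric predictors.
    """
    # 1. Drop non-informative identifiers (copies, so the input is untouched).
    rows = [{k: v for k, v in r.items() if k not in ID_COLUMNS} for r in records]

    # 2. Impute missing values column by column, only where something is missing.
    for cols, fill_fn in ((CATEGORICAL, mode), (NUMERIC, median)):
        for col in cols:
            missing = [r for r in rows if r.get(col) is None]
            if missing:
                fill = fill_fn([r[col] for r in rows if r.get(col) is not None])
                for r in missing:
                    r[col] = fill

    # 3. Standardization statistics, estimated on the rows passed in
    #    (with a real train/test split, estimate them on training rows only).
    stats = {}
    for col in NUMERIC:
        values = [r[col] for r in rows]
        sd = pstdev(values)
        stats[col] = (mean(values), sd if sd > 0 else 1.0)

    # 4. One-hot levels per categorical column, in sorted order.
    levels = {col: sorted({r[col] for r in rows}) for col in CATEGORICAL}
    header = [f"{c}={lvl}" for c in CATEGORICAL for lvl in levels[c]] + NUMERIC

    matrix = []
    for r in rows:
        onehot = [1.0 if r[c] == lvl else 0.0 for c in CATEGORICAL for lvl in levels[c]]
        scaled = [(r[c] - stats[c][0]) / stats[c][1] for c in NUMERIC]
        matrix.append(onehot + scaled)
    return header, matrix


# Toy records with missing values (invented for illustration only).
records = [
    {"patient_id": "P1", "sex": "F", "symptom_1": "yes", "age_years": 4.0, "blood_cell_count": 4.9},
    {"patient_id": "P2", "sex": "M", "symptom_1": None, "age_years": None, "blood_cell_count": 5.1},
    {"patient_id": "P3", "sex": "F", "symptom_1": "no", "age_years": 10.0, "blood_cell_count": None},
    {"patient_id": "P4", "sex": "M", "symptom_1": "yes", "age_years": 7.0, "blood_cell_count": 4.7},
]
header, matrix = preprocess(records)
print(header)
```

In a real evaluation, the imputation values, category levels and scaling statistics would be estimated on the training split only and then reused for the independent test set, so that no test information leaks into the model.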
Such ensemble techniques could enhance clinical practice by flagging high-risk cases for verification and facilitating early therapeutic intervention.
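The abstract reports an accuracy of 98.77% alongside a recall of 0.9877 for the RF+XGB ensemble. That match is what one would expect if recall is averaged over classes weighted by their support, because the weighted recall, the sum over classes c of (n_c / N) * (TP_c / n_c), collapses to (sum of TP_c) / N, which is exactly accuracy. The abstract does not state its averaging scheme, so this is an inference, not a reported fact. The plain-Python sketch below (the helper `summarize_metrics` and the toy labels are hypothetical) computes weighted and macro averages side by side.

```python
from collections import Counter


def summarize_metrics(y_true, y_pred):
    """Return accuracy, per-class (precision, recall, F1),
    support-weighted averages and macro-averaged F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true cases per class
    n = len(y_true)

    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)

    # Weight each class by its support; support-weighted recall reduces to accuracy.
    weighted = tuple(sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3))
    # Macro average: every class counts equally, however rare.
    macro_f1 = sum(per_class[c][2] for c in labels) / len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return accuracy, per_class, weighted, macro_f1


# Toy imbalanced example: 8 cases of class "A", 2 of rare class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 6 + ["B"] * 4
accuracy, per_class, weighted, macro_f1 = summarize_metrics(y_true, y_pred)
print(accuracy, weighted, macro_f1)
```

On this toy data the weighted recall equals the accuracy (0.8), while macro F1 falls below weighted F1 because macro averaging gives the rare class "B", whose F1 is lower, the same weight as class "A". For rare disorders, reporting per-class or macro-averaged scores alongside weighted ones keeps minority-class behaviour visible.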
Keywords
Deep learning; Genetic disorder; Healthcare; Hybrid model; Machine learning
DOI: http://doi.org/10.11591/ijeecs.v42.i1.pp215-224

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).