An effective imputation scheme for handling missing values in the heterogeneous dataset

Sowmya Venkatesh, Maragal Venkatamuni Vijay Kumar, Ashoka Davanageri Virupakshappa

Abstract


Data quality has long been a concern for machine learning applications, including clinical decision support systems, weather forecasting, and traffic prediction. Very little existing work exploits the missing values themselves for effective imputation and better prediction. This paper introduces an approach to predicting and imputing missing fields in multivariate datasets that contain numerical, categorical, and unstructured data. The proposed imputation method is a multi-model scheme that combines natural language processing (NLP) encoders, machine learning-driven feature extractors, and a sequential regression imputation technique to predict missing values.
The proposed system is robust and scalable without requiring extensive engineering. The model is validated on the benchmark clinical heart disease dataset obtained from UCI. The results show that the proposed method achieves better imputation accuracy and requires significantly less time than other missing data imputation methods.
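
For readers unfamiliar with sequential regression imputation, the following is a minimal sketch of the general idea for numeric columns only. It is not the authors' implementation: the NLP encoders and feature extractors are omitted, and the function names (`solve_linear`, `fit_ols`, `sequential_regression_impute`) and the `n_iter` parameter are hypothetical choices made for illustration. Each column with missing entries is regressed in turn on the other columns, and the cycle is repeated so that later passes use the refined estimates.

```python
from typing import List, Optional

Row = List[Optional[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(xs: List[List[float]], ys: List[float]) -> List[float]:
    """Return least-squares coefficients [intercept, b1, ...] via normal equations."""
    design = [[1.0] + row for row in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def sequential_regression_impute(rows: List[Row], n_iter: int = 10) -> List[List[float]]:
    """Impute None entries by cycling linear regressions over the columns.

    Assumes every column has at least one observed value and that the
    predictors are not collinear; a real system would need to handle both.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    filled = [row[:] for row in rows]

    # Step 1: initialise each missing entry with the observed column mean.
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][j]:
                filled[i][j] = mean

    # Step 2: repeatedly regress each incomplete column on all other columns,
    # training only on rows where the target column was actually observed.
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(missing[i][j] for i in range(n_rows)):
                continue
            train = [i for i in range(n_rows) if not missing[i][j]]
            xs = [[v for c, v in enumerate(filled[i]) if c != j] for i in train]
            ys = [filled[i][j] for i in train]
            coef = fit_ols(xs, ys)
            for i in range(n_rows):
                if missing[i][j]:
                    others = [v for c, v in enumerate(filled[i]) if c != j]
                    filled[i][j] = coef[0] + sum(b * v for b, v in zip(coef[1:], others))
    return filled
```

Calling `sequential_regression_impute(rows)` on a list of equal-length rows, with `None` marking missing cells, returns a completed copy and leaves observed values untouched. Categorical and unstructured fields would first have to be encoded as numbers, which is the role the abstract assigns to the NLP encoders and feature extractors.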

Keywords


Clinical data; Heterogeneous data; Imputation; Missing values; Natural language processing


DOI: http://doi.org/10.11591/ijeecs.v32.i1.pp423-431


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
