Predictive analytics on COVID-19 data using Hive based on Hadoop cluster
Abstract
The COVID-19 pandemic has received serious attention from academia, industry, and governments seeking to stop the huge number of deaths and economic disruptions around the world. Many techniques have been used to control the spread of the pandemic by understanding its characteristics and behavior. However, because of the large volume and complex characteristics of COVID-19 data, querying and analyzing such data with conventional tools has become a challenging task. As a result, powerful distributed tools are required to query and analyze this data effectively. In this paper, a distributed system using Hive on a Hadoop cluster is used to query and analyze COVID-19 data to obtain meaningful information. Hadoop is employed as a scalable and reliable framework to accommodate such large amounts of data. Hive is used as a data warehouse that runs on the Hadoop cluster to perform querying and predictive analytics on huge COVID-19 datasets. Several experiments are performed to evaluate the performance of the proposed system. The experiments show that the proposed system outperforms a relational database management system (RDBMS) in terms of query processing time. They also show that the proposed system is more efficient in terms of data load, I/O operations, and reading and writing data.
Keywords
Big data; COVID-19; Hadoop framework; Hive; MapReduce
DOI: http://doi.org/10.11591/ijeecs.v31.i2.pp945-956
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).