Predictive analytics on COVID-19 data using Hive based on Hadoop cluster

Ali Abbood Khaleel, Ali Noori Kareem, Laith Hikmet Mahdi

Abstract


The COVID-19 pandemic has received serious attention from academia, industry, and governments seeking to stop the huge number of deaths and the economic disruption around the world. Many techniques have been used to control the spread of the pandemic by understanding its characteristics and behavior. However, because of the large volume and complex characteristics of COVID-19 data, querying and analyzing such data with conventional tools has become a challenging task. As a result, powerful distributed tools are needed to query and analyze this data effectively. In this paper, a distributed system using Hive on a Hadoop cluster is used to query and analyze COVID-19 data to obtain meaningful information. Hadoop is employed as a scalable and reliable framework to accommodate such large amounts of data. Hive is used as a data warehouse that runs on the Hadoop cluster to perform querying and predictive analytics on huge COVID-19 datasets. Several experiments are performed to evaluate the performance of the proposed system. The experiments show that the proposed system outperforms a relational database management system (RDBMS) in terms of query processing time. They also show that the proposed system is more efficient in terms of data loading, I/O operations, and reading and writing data.

Keywords


Big data; COVID-19; Hadoop framework; Hive; MapReduce



DOI: http://doi.org/10.11591/ijeecs.v31.i2.pp945-956



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
