Enhanced query performance for stored streaming data through structured streaming within Spark SQL

Benymol Jose, Rajesh N., Lumy Joseph

Abstract


Traditional database systems such as relational databases store structured data with a predefined schema, but big data arrives in different formats and is collected from diverse sources. Distributed databases such as not only structured query language (NoSQL) repositories are often used for big data analytics. Businesses, however, need continual updates because streaming data arrives in real time from stock trading, the online activity of website visitors, and mobile applications. They cannot afford to wait for a report to appear before assessing the current situation and moving on to the next business decision. Apache Spark’s structured streaming offers capabilities for handling streaming data in a batch processing mode, with faster responses than MongoDB, a document-based NoSQL database. This study runs similar queries on both systems to evaluate Spark SQL and NoSQL database performance, focusing on the advantages of Spark SQL over NoSQL databases in streaming data exploration. The queries are run on streaming data stored in batch mode.

Keywords


Bigdata; MongoDB; NoSQL databases; Spark SQL; Streaming data



DOI: http://doi.org/10.11591/ijeecs.v35.i3.pp1744-1750



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
