Twitter data analysis using hadoop ecosystems and apache zeppelin

Stanly Wilson, Sivakumar R

Abstract


The day-to-day life of the people doesn't depend only on what they think, but it is affected and influenced by what others think. The advertisements and campaigns of the favourite celebrities and mesmerizing personalities influence the way people think and see the world. People get the news and information at lightning speed than ever before. The growth of textual data on the internet is very fast. People express themselves in various ways on the web every minute. They make use of various platforms to share their views and opinions. A huge amount of data is being generated at every moment on this process. Being one of the most important and well-known social media of the present time, millions of tweets are posted on Twitter every day. These tweets are a source of very important information and it can be made use for business, small industries, creating government policies, and various studies can be performed by using it. This paper focuses on the location from where the tweets are posted and the language in which the tweets are written. These details can be effectively analysed by using Hadoop. Hadoop is a tool that is used to analyze distributed big data, streaming data, timestamp data and text data. With the help of Apache Flume, the tweets can be collected from Twitter and then sink in the HDFS (Hadoop Distributed File System). These raw data then analyzed using Apache Pig and the information available can be made use for social and commercial purposes. The result will be visualized using Apache Zeppelin.

Keywords


Flume, JSON, Hadoop, HDFS, Pig, Twitter, Tweets, Zeppelin

Full Text:

PDF


DOI: http://doi.org/10.11591/ijeecs.v16.i3.pp1490-1498
Total views : 23 times

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

shopify stats IJEECS visitor statistics