Summarizing Twitter posts regarding COVID-19 based on n-grams
Abstract
The COVID-19 pandemic, declared by the World Health Organization, has disrupted human lives on many levels, including the economy, public health, and people's emotions. Social media platforms have accumulated a huge amount of information concerning this pandemic. Twitter is considered one of the most active social media platforms, enabling users to tweet in the conversations they care about. A problem arises when users want to search for a specific topic: they can sort tweets only by recency, not by relevance. As a result, they must read through most of the tweets to understand what was originally discussed about the topic. Some strategies have been developed for summarizing tweets, but summarization of COVID-19 topics is still in its early stages. This research aims to introduce a technique that presents a short summary of COVID-19 topics with little time and effort. The summarization task starts by clustering topics using the latent Dirichlet allocation (LDA) method and K-means clustering, and then selects the important sentences to form the summary. The study also compares bigram-based and unigram-based summarization. Different metrics were used to evaluate the results and experiments at each stage, and the output of the proposed system was evaluated using ROUGE metrics.
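To make the unigram versus bigram comparison concrete, the following minimal Python sketch scores the tweets of a single topic cluster by how frequent their n-grams are within that cluster, selects the top-scoring tweets as an extractive summary, and computes a simplified ROUGE-N recall. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how important sentences are scored, so average n-gram frequency is an assumed stand-in; the LDA and K-means clustering stage is omitted and the input is assumed to be one cluster already; and the function names (`tokenize`, `ngrams`, `summarize`, `rouge_n_recall`) and the toy tweets are hypothetical.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (hashtags and mentions kept)."""
    return re.findall(r"[#@]?\w+", text.lower())


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def summarize(tweets, n=1, k=2):
    """Pick the k tweets whose n-grams are, on average, most frequent in the cluster.

    Assumption: average n-gram frequency stands in for the paper's
    (unspecified) sentence-importance score.
    """
    grams_per_tweet = [ngrams(tokenize(t), n) for t in tweets]
    freq = Counter(g for grams in grams_per_tweet for g in grams)

    def score(grams):
        # Average rather than sum, so long tweets are not favoured by length alone.
        return sum(freq[g] for g in grams) / len(grams) if grams else 0.0

    ranked = sorted(range(len(tweets)),
                    key=lambda i: score(grams_per_tweet[i]),
                    reverse=True)
    # Return the selected tweets in their original order.
    return [tweets[i] for i in sorted(ranked[:k])]


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-gram count."""
    cand = Counter(ngrams(tokenize(candidate), n))
    ref = Counter(ngrams(tokenize(reference), n))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / sum(ref.values())


# Toy cluster (invented for illustration only).
cluster = [
    "wear a mask",
    "wear a mask outside",
    "stocks fell today",
]
print("unigram summary:", summarize(cluster, n=1, k=1))
print("bigram summary:", summarize(cluster, n=2, k=1))
```

Switching `n` between 1 and 2 is the only change needed to compare unigram-based and bigram-based selection on the same cluster, and `rouge_n_recall` can then score each summary against a human-written reference.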
Keywords
COVID-19; Extractive summarization; K-means clustering; Latent Dirichlet allocation; N-grams
DOI: http://doi.org/10.11591/ijeecs.v31.i2.pp1008-1015
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).