Virality classification from Twitter data using pre-trained language model and multi-layer perceptron
Abstract
Twitter is a well-known text-based social media platform that is widely used to disseminate content. According to Katadata, Indonesia ranked fifth in the world by number of Twitter users in 2023, so many people and organizations want their tweets to go viral. This research therefore aims to develop a model that categorizes the level of virality of Indonesian-language tweets. Classifying the level of virality involves several tasks: data upsampling, sentiment and emotion prediction, and text embedding. Upsampling was carried out because the dataset is imbalanced. Data upsampling, emotion prediction, and text embedding are carried out using the bidirectional encoder representations from transformers (BERT) model, while sentiment prediction uses the robustly optimized BERT pretraining approach (RoBERTa). The text embeddings, sentiment, and emotion are combined with Twitter metadata, and all features are fed into a multi-layer perceptron (MLP) model that classifies the level of virality into three classes based on the number of retweets: low, medium, and high. The proposed method achieves an F1-score of 49% and an accuracy of 95%, and performs better than the baseline model.
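The gap between the reported accuracy and F1-score is typical of imbalanced multi-class data. The sketch below illustrates that effect only and is not the authors' pipeline. The retweet thresholds (`LOW_MAX`, `MEDIUM_MAX`), the synthetic retweet counts, and the choice of macro-averaged F1 are all assumptions made for illustration; the abstract does not state the actual cut-offs or the averaging scheme.

```python
# Hypothetical retweet thresholds; the abstract does not give the real cut-offs.
LOW_MAX = 10
MEDIUM_MAX = 100


def virality_class(retweets: int) -> str:
    """Map a retweet count to one of the three classes named in the abstract."""
    if retweets <= LOW_MAX:
        return "low"
    if retweets <= MEDIUM_MAX:
        return "medium"
    return "high"


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=("low", "medium", "high")):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Synthetic, heavily imbalanced retweet counts (made up for illustration).
retweets = [0] * 95 + [50] * 3 + [500] * 2
y_true = [virality_class(r) for r in retweets]

# A degenerate classifier that always predicts the majority class.
y_pred = ["low"] * len(y_true)

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, macro F1={f1:.3f}")
```

On this toy data the always-"low" classifier scores high on accuracy while its macro F1 stays low, because the two minority classes contribute an F1 of zero each. This is one plausible reason why work on imbalanced data reports both metrics.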
Keywords
BERT; Pre-trained language model; Twitter; Virality classification; Virality features
DOI: http://doi.org/10.11591/ijeecs.v35.i3.pp1952-1962
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).