Word embedding for contextual similarity using cosine similarity

Yessy Asri, Dwina Kuswardani, Amanda Atika Sari, Atikah Rifdah Ansyari

Abstract


Perspectives on technology often overlap in certain contexts, such as information systems and informatics engineering. The opinion data used in this research come from the Quora application and are limited to posts from the last five years. This research aims to implement Indo-bidirectional encoder representations from transformers (IndoBERT), a variant of the BERT model optimized for the Indonesian language, to classify information system (IS) and information technology (IT) topics. The dataset consists of 414 original records, which grow to 828 after augmentation with the synonym replacement method. Data augmentation aims to evaluate model performance on text that uses synonyms and rearranged wording while preserving meaning and structure. In the approach used, each opinion text is labeled based on the cosine similarity computed from the token embeddings of the IndoBERT model. The IndoBERT model is then applied to classify the opinion texts. The experimental results show that using IndoBERT to classify IS and IT topics based on contextual similarity achieves 90% accuracy, as computed from the confusion matrix. These positive results show the strong potential of transformer-based language models, such as IndoBERT, to support the analysis of comments and related topics in Indonesian.
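The labeling step described above can be illustrated with a minimal sketch. The abstract does not state how the reference embeddings for each topic are built or how ties are handled, so everything here is an assumption made for illustration: the helper names `cosine_similarity` and `label_by_similarity`, the dictionary `REFERENCE_VECS`, and the three-dimensional toy vectors that stand in for real IndoBERT embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; return 0.0 instead of dividing by zero.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_by_similarity(text_vec, reference_vecs):
    """Pick the label whose reference vector is most similar to text_vec."""
    scores = {
        label: cosine_similarity(text_vec, ref)
        for label, ref in reference_vecs.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy vectors. In practice each would be an embedding
# produced by an IndoBERT model, with hundreds of dimensions.
REFERENCE_VECS = {
    "IS": [0.9, 0.1, 0.2],
    "IT": [0.1, 0.8, 0.3],
}
opinion_vec = [0.7, 0.2, 0.1]

label, scores = label_by_similarity(opinion_vec, REFERENCE_VECS)
print(label, scores)
```

With these toy values the opinion vector points in nearly the same direction as the IS reference, so it receives the IS label. Cosine similarity depends only on direction, not magnitude, which is why it is a common choice for comparing embeddings of texts with different lengths.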

Keywords


Augmented data; Contextual similarity; Cosine similarity; IndoBERT; Word embedding


DOI: http://doi.org/10.11591/ijeecs.v38.i2.pp1170-1180


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
