Unveiling the influence of back-translation on sentiment analysis of Indonesian hotel reviews
Abstract
This study performs sentiment analysis on Indonesian-language hotel reviews using three machine learning classification algorithms: multinomial naive Bayes (MNB), support vector machine (SVM), and random forest (RF). Back-translation is used to generate synthetic variations of the reviews, which serve as additional data for building the classification models. Three scenarios are tested, distinguished by the dataset used: the original dataset, the back-translated dataset, and the combination of both. The experimental results show that the combined dataset yields better results, with the random forest algorithm performing best. Back-translation significantly improves model evaluation results in sentiment analysis for several reasons, including enriching the dataset with new variations, enhancing model robustness, and increasing dataset complexity. However, the differences in the number of word features among the scenarios indicate that back-translation also significantly influences the dataset's characteristics.
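The three dataset scenarios and the word-feature comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the lookup tables `ID_TO_EN` and `EN_TO_ID`, the helper `toy_translate`, and the two sample reviews are hypothetical stand-ins for a real translation service and the actual hotel review corpus, and the feature count is simply the number of distinct whitespace-separated tokens.

```python
from typing import Callable, Dict, List, Tuple

Review = Tuple[str, str]  # (review text, sentiment label)

# Hypothetical word-level lookup tables standing in for a real translation
# service. The return trip deliberately maps to synonyms, mimicking how
# back-translation introduces new surface forms.
ID_TO_EN: Dict[str, str] = {
    "kamar": "room", "bersih": "clean", "kotor": "dirty",
    "pelayanan": "service", "bagus": "good", "buruk": "bad",
}
EN_TO_ID: Dict[str, str] = {
    "room": "kamar", "clean": "rapi", "dirty": "jorok",
    "service": "layanan", "good": "baik", "bad": "jelek",
}


def toy_translate(text: str, table: Dict[str, str]) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.lower().split())


def back_translate(text: str,
                   forward: Callable[[str], str],
                   backward: Callable[[str], str]) -> str:
    """Translate to a pivot language and back to the source language."""
    return backward(forward(text))


def build_scenarios(original: List[Review],
                    forward: Callable[[str], str],
                    backward: Callable[[str], str]) -> Dict[str, List[Review]]:
    """Build the original, back-translated, and combined datasets.

    Labels are carried over unchanged to the synthetic reviews.
    """
    synthetic = [(back_translate(text, forward, backward), label)
                 for text, label in original]
    return {
        "original": original,
        "back_translated": synthetic,
        "combined": original + synthetic,
    }


def count_word_features(reviews: List[Review]) -> int:
    """Count distinct lowercase tokens, a proxy for bag-of-words vocabulary size."""
    return len({word for text, _ in reviews for word in text.lower().split()})


reviews: List[Review] = [
    ("kamar bersih pelayanan bagus", "positive"),
    ("kamar kotor pelayanan buruk", "negative"),
]
scenarios = build_scenarios(
    reviews,
    forward=lambda text: toy_translate(text, ID_TO_EN),
    backward=lambda text: toy_translate(text, EN_TO_ID),
)
feature_counts = {name: count_word_features(data)
                  for name, data in scenarios.items()}
```

In this toy setup the combined scenario has a larger vocabulary than either single dataset, because the back-translated reviews reuse some original words and introduce new ones. Each scenario could then be vectorized and passed to MNB, SVM, and RF classifiers for comparison.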
Keywords
Back-translation; Hotel reviews; Machine learning; Sentiment analysis; Translation
DOI: http://doi.org/10.11591/ijeecs.v40.i1.pp271-279
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).