Statistical comparison of MLP and LSTM for mobile health sentiment analysis
Abstract
This study investigates user sentiment towards the Mobile JKN public health application using deep learning text classification models. Two approaches were compared: a multi-layer perceptron (MLP) with TF-IDF features and a long short-term memory (LSTM) network with Word2Vec embeddings. The dataset consists of 114,364 Indonesian-language user reviews collected from the Google Play Store. Class imbalance was addressed with random oversampling, and each model was evaluated using 5-fold stratified shuffle-split cross-validation. The MLP models achieved higher accuracy (up to 83.90%), while the LSTM models showed better recall and precision on minority classes such as neutral sentiment. However, statistical validation using the Wilcoxon signed-rank test showed that the performance differences between the models were not statistically significant (p > 0.05). These findings suggest that both models are viable for sentiment analysis, with trade-offs that depend on the evaluation metric of interest. Future work may explore hybrid architectures and larger datasets to improve performance and statistical confidence.
Keywords
Long short-term memory; Mobile app; Multi-layer perceptron; Sentiment analysis; Wilcoxon test
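The statistical validation step named in the abstract can be illustrated with a minimal sketch of an exact two-sided Wilcoxon signed-rank test on paired per-fold scores. This is not the authors' code: the function name `wilcoxon_signed_rank_exact`, the variables `scores_a` and `scores_b`, and the score values are hypothetical placeholders, and the sketch assumes no zero or tied differences.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples.

    Simplifying assumption: no zero differences and no tied absolute
    differences, so the ranks are simply 1..n.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    abs_diffs = sorted(abs(d) for d in diffs)
    if 0 in abs_diffs or len(set(abs_diffs)) != n:
        raise ValueError("zero or tied differences are not handled in this sketch")

    # Rank the absolute differences, then sum the ranks of the positive ones.
    rank = {value: i + 1 for i, value in enumerate(abs_diffs)}
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)
    total = n * (n + 1) // 2
    stat = min(w_plus, total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely, so
    # enumerate all 2^n patterns and count those at least as extreme.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(range(1, n + 1), signs) if positive)
        if min(w, total - w) <= stat:
            extreme += 1
    return stat, extreme / 2 ** n


# Placeholder per-fold scores for two models (NOT results from the study).
scores_a = [0.71, 0.74, 0.69, 0.73, 0.75]
scores_b = [0.70, 0.72, 0.66, 0.69, 0.70]

stat, p_value = wilcoxon_signed_rank_exact(scores_a, scores_b)
print(stat, p_value)  # 0 0.0625
```

In the placeholder data, one model beats the other on every fold, yet the p-value is 0.0625. With five paired scores, 2/32 = 0.0625 is the smallest two-sided p-value this exact test can produce. Whether that bound applies to the study depends on how many paired scores entered its test, which the abstract does not state.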
DOI: http://doi.org/10.11591/ijeecs.v42.i3.pp818-826

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).