Improving the term weighting log entropy of latent Dirichlet allocation
Abstract
Topic modeling is used in text analysis to uncover latent subjects within documents. The abundance of short texts in the Indonesian language poses additional challenges for topic modeling. This study presents an enhancement to the term weighting log entropy (TWLE) approach within the latent Dirichlet allocation (LDA) framework, tailored to topic modeling of Indonesian short texts. The work emphasizes word weighting within LDA. The research aimed to improve the coherence and interpretability of an Indonesian topic model by combining local and global weights. The local weight captures the distinct characteristics of each document, whereas the global weight reflects the broader perspective of the entire corpus. The objective was to make LDA topics more effective through this combination. On short Indonesian texts, LDA with TWLE was found to be more informative and effective than LDA with TF-IDF. This work improves topic modeling of short Indonesian texts. Transfer learning for NLP and adaptation to the Indonesian language help improve the knowledge and precision of topic analysis, which could advance NLP and topic modeling in Indonesian.
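To make the combination of local and global weights concrete, the following is a minimal sketch of the textbook log-entropy weighting scheme, assuming the local weight is log(1 + tf) and the global weight is 1 plus the normalized entropy term over documents. The abstract does not give the paper's exact formulation or how the weights are passed to LDA, so the function name `log_entropy_weights` and the toy Indonesian tokens are illustrative assumptions only.

```python
import math
from collections import Counter

def log_entropy_weights(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. This follows the standard
    log-entropy scheme, which is an assumption here, not
    necessarily the exact variant used in the paper.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]

    # Total frequency of each term across the whole corpus.
    global_freq = Counter()
    for c in counts:
        global_freq.update(c)

    # Global weight: 1 + sum_j(p_ij * log p_ij) / log(n_docs),
    # where p_ij = tf_ij / gf_i. It is 1 for a term confined to one
    # document and 0 for a term spread evenly over all documents.
    global_weight = {}
    for term, gf in global_freq.items():
        if n_docs < 2:
            global_weight[term] = 1.0  # entropy is undefined for one document
            continue
        entropy_sum = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / gf
                entropy_sum += p * math.log(p)
        global_weight[term] = 1.0 + entropy_sum / math.log(n_docs)

    # Local weight log(1 + tf) scaled by the term's global weight.
    return [
        {t: math.log(1 + tf) * global_weight[t] for t, tf in c.items()}
        for c in counts
    ]

# Hypothetical toy corpus of tokenized short texts.
docs = [["harga", "bbm", "naik"], ["harga", "beras", "naik"], ["bbm", "langka"]]
weights = log_entropy_weights(docs)
```

In this toy corpus, "langka" occurs in a single document and keeps its full local weight, while "harga" is split across two of the three documents and is down-weighted. Because the resulting weights are real-valued rather than integer counts, an LDA implementation would need to accept weighted input for them to be used directly.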
DOI: http://doi.org/10.11591/ijeecs.v34.i1.pp455-462
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).