Improving the term weighting log entropy of latent Dirichlet allocation

Muhammad Muhajir, Dedi Rosadi, Danardono Danardono

Abstract


Analyzing textual data often relies on topic modeling techniques to uncover the latent subjects within documents. The abundance of short texts in the Indonesian language poses additional challenges for topic modeling. This study presents a substantial enhancement to the term weighting log entropy (TWLE) approach within the latent Dirichlet allocation (LDA) framework, tailored specifically to topic modeling of short Indonesian texts. The work focuses on how word weighting is applied within LDA. The research aimed to improve the coherence and interpretability of an Indonesian topic model by combining local and global weights. The local weight captures the characteristics of each individual document, whereas the global weight reflects the distribution of a term across the entire corpus. Combining the two was intended to make the topics produced by LDA more effective. When applied to short Indonesian texts, TWLE-weighted LDA proved more informative and effective than TF-IDF-weighted LDA. This work therefore improves topic modeling of short Indonesian texts. Transfer learning for NLP, together with adaptation to the Indonesian language, can improve both the understanding and the precision of topic analysis, which could advance NLP and topic modeling for Indonesian.
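To make the local/global split concrete, here is a minimal sketch of log-entropy weighting under the commonly used formulation: local weight l_ij = log(1 + tf_ij), global weight g_i = 1 + Σ_j p_ij log p_ij / log n with p_ij = tf_ij / gf_i, and final weight w_ij = l_ij · g_i. The abstract does not state the exact variant the authors use, so this formulation is an assumption, and the function name `log_entropy_weights` and the sample Indonesian tokens are hypothetical illustrations, not the paper's code or data.

```python
import math
from collections import Counter


def log_entropy_weights(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    docs: list of token lists. Assumes the standard log-entropy scheme
    and at least two documents, so that log(n) > 0.
    """
    n = len(docs)
    if n < 2:
        raise ValueError("need at least two documents")
    counts = [Counter(doc) for doc in docs]

    # Global frequency gf_i: total count of term i across the whole corpus.
    gf = Counter()
    for c in counts:
        gf.update(c)

    # Global weight g_i = 1 + sum_j p_ij * log(p_ij) / log(n), p_ij = tf_ij / gf_i.
    # A term concentrated in one document gets g_i = 1; a term spread evenly
    # over all n documents gets g_i = 0.
    global_w = {}
    for term, total in gf.items():
        entropy = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / total
                entropy += p * math.log(p)
        global_w[term] = 1.0 + entropy / math.log(n)

    # Local weight l_ij = log(1 + tf_ij); final weight w_ij = l_ij * g_i.
    return [
        {term: math.log(1 + tf) * global_w[term] for term, tf in c.items()}
        for c in counts
    ]


docs = [
    ["harga", "beras", "naik"],
    ["harga", "cabai", "naik"],
    ["banjir", "jakarta"],
]
for weights in log_entropy_weights(docs):
    print({t: round(w, 3) for t, w in weights.items()})
```

In this toy corpus, "banjir" occurs in only one document, so its global weight is 1 and its final weight is log 2 ≈ 0.693. "harga" is split evenly across two of three documents, so its global weight drops to 1 − log 2 / log 3 ≈ 0.369. TF-IDF would also down-weight "harga" via document frequency, but the entropy term additionally accounts for how evenly a term's occurrences are spread.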


Keywords


Global weight; Indonesian language; Latent Dirichlet allocation; Local weight; Term weighting log entropy; TF-IDF; Topic modelling



DOI: http://doi.org/10.11591/ijeecs.v34.i1.pp455-462



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
