Enhancing diagonal comprehension with advanced topic modeling technique: DIAG-LDA

Fatima-Zahrae Sifi, Wafae Sabbar, Amal El Mzabi

Abstract


As the volume of reviews and other forms of text grows rapidly, natural language remains able to convey large and complex amounts of information in relatively short communications. Latent Dirichlet allocation (LDA), a machine-learning algorithm for discovering latent topics within documents, leverages this capability. LDA can also be used to generate summaries or abstracts from a given set of documents. However, LDA can struggle to identify topics in short documents or in data with high levels of noise. This article introduces a new method for topic modeling with LDA based on the diagonal reading of sentences (DIAG-LDA). First, features are selected using the TF-IDF algorithm, and the most relevant features are extracted using the confidence value. Next, the classification step is carried out with the LDA classifier. Finally, we evaluate our model using a convolutional neural network. The experimental results show that DIAG-LDA performs well in identifying features from text data, achieving accuracies of 94.4% and 89.5% on the international economics and political economy datasets.
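
The feature-selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the "confidence value" or the diagonal reading procedure, so the code below is only an assumption: it computes plain TF-IDF scores with the standard library and keeps terms whose best score reaches a hypothetical `threshold`. The function names, the toy documents, and the threshold are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def select_features(docs, threshold):
    """Keep terms whose best tf-idf score in any document meets the threshold.

    The threshold is a hypothetical stand-in for the paper's confidence value.
    """
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    return sorted(term for term, score in best.items() if score >= threshold)


# Toy corpus (invented for illustration only).
docs = [
    "trade tariffs raise import prices".split(),
    "central bank policy moves prices".split(),
    "election policy shapes trade".split(),
]
selected = select_features(docs, threshold=0.15)
print(selected)
```

In this toy corpus, terms that occur in only one document score above the threshold, while terms shared by two documents ("trade", "prices", "policy") fall below it. The retained vocabulary could then be passed to an LDA implementation for topic discovery; that step is not shown here.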

Keywords


Convolutional neural network; Diagonal reading; Latent Dirichlet allocation; Natural language processing; Topic modeling


DOI: http://doi.org/10.11591/ijeecs.v36.i2.pp1261-1272


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
