A relational background knowledge boosting based topic model for Chinese poems

Lei Peng, Paitoon Porntrakoon

Abstract


Classical Chinese poetry has become increasingly popular in recent years, and modeling its topics is a promising area of research. Chinese poems are characteristically short, and traditional topic models perform poorly on short texts because of text sparsity; topic models therefore need to be adapted to the scenario of classical Chinese poems. In this paper, a relational background knowledge boosting based topic model (RBKBTM) was proposed to overcome the text sparsity of Chinese poems. We incorporated background information into the model, which expanded the text content from a semantic perspective. The background knowledge was combined using word embeddings and TextRank and was then fed into the core computing process, and a new sampling formula was derived. The proposed model was tested on three different tasks using three different datasets. The results demonstrate that the incorporated background knowledge effectively overcomes text sparsity, improving the performance and effectiveness of the topic model.
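The abstract describes ranking background knowledge with TextRank over word-embedding similarities and appending it to the short poems before topic inference. The paper's actual formulation is not reproduced here; the sketch below is only one plausible simplification of that expansion step, in which candidate background words are scored by TextRank on a cosine-similarity graph and the top-ranked words are appended to a poem's token list. The functions `textrank_over_embeddings` and `expand_poem`, and the embedding dictionary they expect, are hypothetical illustrations, not the authors' code.

import numpy as np

def textrank_over_embeddings(words, vectors, d=0.85, iters=50, tol=1e-6):
    """Score words by TextRank on a graph weighted by embedding cosine similarity."""
    V = np.stack([vectors[w] for w in words])
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    W = np.clip(V @ V.T, 0.0, None)            # non-negative similarity weights
    np.fill_diagonal(W, 0.0)                   # no self-loops
    out = W.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0
    P = W / out                                # row-normalized transition matrix
    r = np.full(len(words), 1.0 / len(words))  # uniform initial rank
    for _ in range(iters):
        r_new = (1 - d) / len(words) + d * (P.T @ r)
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return dict(zip(words, r))

def expand_poem(poem_tokens, background_vocab, vectors, top_k=5):
    """Append the top-k ranked background words to a short poem's tokens."""
    scores = textrank_over_embeddings(background_vocab, vectors)
    best = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return poem_tokens + best  # expanded pseudo-document fed to the topic model

In this simplified view, the expanded pseudo-documents would then be passed to a Gibbs-sampled topic model; the modified sampling formula mentioned in the abstract is specific to the paper and is not reproduced here.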

Keywords


Gibbs sampling; Latent Dirichlet allocation; Short text; TextRank; Topic model

DOI: http://doi.org/10.11591/ijeecs.v35.i2.pp1227-1243



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
