Present and absent keyphrases extraction: an approach based on sentence embedding
Abstract
The automatic keyphrases extraction (AKE) of a document is any expression by which we can learn its content without having to read it. Keyphrases are exploited in natural language processing (NLP) applications. These phrases are often mentioned in the document but there may be some keyphrases that are not mentioned. In the field of AKE, researchers have exploited many techniques, such as statistical calculation, deep learning algorithms, graph representation, and sentence embedding techniques. Approaches that exploit embedding techniques calculate the similarity between a document and a candidate keyphrase, where similar phrases to the document are considered as keyphrases. Representing the document by a single vector makes its performance poor, especially in long documents. This is in addition to the inability of these methods to generate absent keyphrases. In order to overcome these problems, our paper proposes an unsupervised approach to AKE, based on the universal sentence encoder (USE) to represent candidate keyphrases and parts of the document probably containing keyphrases. Our method also generates keyphrases not mentioned in the text. We compared the performance of the proposed approach with other methods based on embedding techniques, where the results showed the superiority of our approach especially in long documents.
Keywords
Automatic keyphrase extraction; Generate absent keyphrases; Natural language processing; Sentence embedding technique; Universal sentence encoder
Full Text:
PDFDOI: http://doi.org/10.11591/ijeecs.v28.i3.pp1601-1612
Refbacks
- There are currently no refbacks.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).