Adjusted TextRank for keyword extraction in petrochemical project correspondence documents

Indri Atmoko, Evi Yulianti, Meganingrum Arista Jiwanggi

Abstract


A large petrochemical construction project is typically executed by multiple parties, all bound by a contractual agreement. During the execution phase, issues may arise because the work details are not clearly specified in that agreement. These issues are formally communicated and documented through written correspondence letters. Identifying the important keywords in these letters makes it possible to reconstruct and analyze a comprehensive narrative of the project, including its associated issues. In this research, we introduce an adjusted TextRank algorithm that integrates external features from the Indonesian FastText language model and term frequency-inverse document frequency (TF-IDF) scores to identify important keywords in a dataset of correspondence letters from petrochemical projects. The adjustments refine phrase detection, the estimation of semantic relationships between words, and part-of-speech (POS) identification for words and phrases. Our results show that the proposed adjustments improve F1 scores by 24.1% over the standard TextRank baseline and by 25% over the standard TF-IDF baseline.
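For readers unfamiliar with the two baselines named above, the following is a minimal sketch of TextRank over a word co-occurrence graph, with an optional per-word prior that biases the random-jump term. It is not the authors' implementation: the function names `tfidf_scores` and `textrank`, the window size, the smoothed IDF formula, and the choice to inject TF-IDF as a PageRank prior are all assumptions made for illustration. The FastText similarity, phrase detection, and POS steps described in the abstract are omitted.

```python
import math
from collections import defaultdict


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1 (an assumed variant).
    """
    n_docs = len(docs)
    doc_freq = defaultdict(int)
    for doc in docs:
        for word in set(doc):
            doc_freq[word] += 1
    scores = []
    for doc in docs:
        counts = defaultdict(int)
        for word in doc:
            counts[word] += 1
        scores.append({
            word: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return scores


def textrank(tokens, prior=None, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph.

    prior: optional {word: weight} used to bias the random-jump term.
    Passing TF-IDF scores here is a hypothetical way to combine the two
    signals, not the method described in the paper.
    """
    # Connect each word to the next `window` words, counting co-occurrences.
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0

    nodes = sorted(set(tokens))
    if not nodes:
        return {}

    # Normalize the prior into a jump distribution (uniform by default).
    weights = {w: (prior.get(w, 0.0) if prior else 1.0) for w in nodes}
    total = sum(weights.values())
    if total <= 0:
        weights = {w: 1.0 for w in nodes}
        total = float(len(nodes))
    jump = {w: weights[w] / total for w in nodes}

    out_weight = {w: sum(graph[w].values()) for w in nodes}
    rank = {w: 1.0 / len(nodes) for w in nodes}
    for _ in range(iterations):
        rank = {
            w: (1 - damping) * jump[w]
            + damping * sum(
                rank[v] * edge / out_weight[v] for v, edge in graph[w].items()
            )
            for w in nodes
        }
    return rank
```

Calling `textrank(tokens)` gives the unweighted baseline ranking for one letter, while `textrank(tokens, prior=tfidf_scores(corpus)[k])` shifts the ranking toward words that are distinctive for letter `k` within the corpus.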

Keywords


Bahasa Indonesia; Keyword extraction; Phrase detection; Project management; TextRank



DOI: http://doi.org/10.11591/ijeecs.v35.i2.pp1171-1180



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
