Natural language processing for report consolidation and matching based on latent semantic analysis and cosine similarity

Jeleen M. Mangubat, Ryndel Ventura Amorado, Lovely Rose T. Hernandez, Jennifer L. Marasigan

Abstract


Consolidating reports and matching documents is challenging, especially when large volumes of textual data are involved. Organizations therefore need intelligent systems that can automate these processes and support faster, more accurate analysis and retrieval of relevant information. This study applies latent semantic indexing (LSI) and cosine similarity to automate the matching of gender-related issues, activities, and programs submitted by university offices. An intelligent web-based system was developed using Python and Django to implement these algorithms for report consolidation. Performance evaluation using accuracy, precision, recall, and F1-score showed that the model correctly classified 90% of entries. A threshold sweep experiment further showed that a similarity value of 0.51 provides the optimal decision boundary for identifying semantically similar instances. The findings confirm that LSI remains effective for low-resource institutional text analysis, enabling more efficient and accurate report consolidation.
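The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the preprocessing, the term weighting, or the number of latent dimensions. The sketch therefore assumes raw term counts, whitespace tokenization, and a rank of k = 2. All function names (build_term_document_matrix, lsi_document_vectors, cosine_similarity, match_entries) and the sample entries are hypothetical. Only the 0.51 threshold is taken from the abstract.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Return a (terms x documents) matrix of raw term counts.

    Assumption: lowercase whitespace tokenization; the study's actual
    preprocessing and weighting scheme are not given in the abstract.
    """
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    row_of = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[row_of[word], col] += 1.0
    return matrix


def lsi_document_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, singular_values, vt = np.linalg.svd(term_doc, full_matrices=False)
    k = min(k, len(singular_values))
    # Each row of the result is one document, scaled by the singular values.
    return (np.diag(singular_values[:k]) @ vt[:k, :]).T


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def match_entries(docs, k=2, threshold=0.51):
    """Return (i, j, similarity) for document pairs at or above the threshold.

    threshold=0.51 is the decision boundary reported in the abstract;
    k=2 is an assumption made only for this small example.
    """
    vectors = lsi_document_vectors(build_term_document_matrix(docs), k)
    matches = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            sim = cosine_similarity(vectors[i], vectors[j])
            if sim >= threshold:
                matches.append((i, j, sim))
    return matches


if __name__ == "__main__":
    # Hypothetical entries standing in for submissions from university offices.
    entries = [
        "gender sensitivity training for staff",
        "gender sensitivity training for faculty",
        "campus road repair budget",
        "campus road repair schedule",
    ]
    for i, j, sim in match_entries(entries):
        print(f"entries {i} and {j} match (similarity {sim:.2f})")
```

In a sketch like this, entries that share most of their vocabulary collapse onto the same latent direction and score close to 1, while entries on unrelated topics score close to 0. A threshold sweep of the kind the abstract mentions would repeat the comparison over a range of threshold values against labelled pairs and keep the value that gives the best F1-score.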

Keywords


Cosine similarity; Intelligent system; Latent semantic indexing; Natural language processing; Web system


DOI: http://doi.org/10.11591/ijeecs.v42.i2.pp609-618


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).
