Plagiarism detection using text-representing centroids techniques
Abstract
This study addresses the limitations of traditional plagiarism detection methods by introducing the text-representing centroid (TRC) technique. TRC is designed to improve the accuracy of detecting semantic similarities and sophisticated forms of plagiarism. It uses a co-occurrence graph to identify centroid terms that represent the core meaning of text documents, capturing the contextual associations between terms. Extensive experiments were conducted on a dataset of academic papers to assess TRC’s performance against traditional techniques across several categories of plagiarism, including near-copy, modified-copy, and paraphrasing. The results demonstrate the effectiveness of the TRC technique, which achieved an average precision of 0.96 and a recall of 0.71. This performance surpasses that of methods such as Jaccard and cosine similarity in detecting more complex forms of plagiarism. These findings highlight TRC’s potential as a robust tool for both academic and industry applications, helping to ensure integrity in textual content through precise and comprehensive plagiarism detection.
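The abstract does not give the construction details, so the following is only a minimal sketch of how a centroid term might be derived from a co-occurrence graph. It assumes sentence-level co-occurrence, an edge distance equal to the inverse of the co-occurrence count, and a centroid defined as the graph term with the smallest average shortest-path distance to the document's terms. These choices, and the function names `build_cooccurrence_graph`, `shortest_distances`, and `centroid_term`, are illustrative assumptions and not the authors' implementation.

```python
import heapq
import re
from collections import defaultdict
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its distinct alphabetic terms, sorted."""
    return sorted(set(re.findall(r"[a-z]+", sentence.lower())))


def build_cooccurrence_graph(sentences):
    """Build an undirected graph whose edges link terms sharing a sentence.

    Assumption: edge distance = 1 / co-occurrence count, so terms that
    appear together often are closer to each other.
    """
    counts = defaultdict(int)
    for sentence in sentences:
        for a, b in combinations(tokenize(sentence), 2):
            counts[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), count in counts.items():
        graph[a][b] = 1.0 / count
        graph[b][a] = 1.0 / count
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm: shortest distance from source to each reachable term."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def centroid_term(graph, terms):
    """Return the graph term with the smallest average distance to `terms`.

    Terms missing from the graph are ignored; candidates that cannot reach
    every remaining term are skipped. Returns None if nothing qualifies.
    """
    known = [t for t in terms if t in graph]
    if not known:
        return None
    best, best_avg = None, float("inf")
    for candidate in sorted(graph):
        dist = shortest_distances(graph, candidate)
        if any(t not in dist for t in known):
            continue
        avg = sum(dist[t] for t in known) / len(known)
        if avg < best_avg:
            best, best_avg = candidate, avg
    return best


if __name__ == "__main__":
    corpus = [
        "graph terms centroid",
        "graph terms",
        "graph similarity",
        "centroid graph",
    ]
    g = build_cooccurrence_graph(corpus)
    print(centroid_term(g, ["centroid", "graph", "terms", "similarity"]))
```

Under these assumptions, two documents could then be compared by the graph distance between their centroid terms instead of by their overlapping words, which is what set-based measures such as Jaccard similarity rely on.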
Keywords
Co-occurrence graph; Plagiarism detection; Text representing centroid; Text similarity; Text-based representation
Full Text:
PDF
DOI: http://doi.org/10.11591/ijeecs.v38.i3.pp1722-1734
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).