Main keyword comparison based on document analysis system

Jongwon Lee, Jaeseung Lee, Hoekyung Jung

Abstract


Existing document analysis systems use a morpheme analyzer to list the words in a document. A bare word list of this kind does little to help users understand the document. Understanding a document requires analyzing its keywords and extracting the paragraphs that contain them. The proposed system retrieves keywords from documents written in XML format, extracts them, and displays them to the user. In addition, it extracts the paragraphs that contain a keyword entered by the user, preserves their original sequence, and deletes duplicate paragraphs. The frequency and weight of each keyword are then calculated, and the number of paragraphs is reduced by removing those whose keyword has a lower weight than the other keywords. This method may reduce the time and effort a user needs to understand a document compared with existing document analysis systems.
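
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the XML layout (paragraphs in `<p>` elements), the function names, the weight definition (a keyword's share of all keyword occurrences), and the pruning rule (keep only paragraphs that contain a top-weighted keyword).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real system's XML schema is not given in the abstract.
SAMPLE_XML = """<doc>
<p>Keyword analysis helps readers find relevant text.</p>
<p>A morpheme analyzer lists every word in a document.</p>
<p>Keyword analysis helps readers find relevant text.</p>
<p>Paragraph extraction keeps only text that mentions a keyword.</p>
</doc>"""


def load_paragraphs(xml_text, tag="p"):
    """Return paragraph texts in document order (assumes one element per paragraph)."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()).strip() for el in root.iter(tag)]


def count_keyword(paragraph, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in paragraph."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, paragraph, flags=re.IGNORECASE))


def extract_paragraphs(paragraphs, keywords):
    """Keep paragraphs containing any keyword, preserving order and dropping duplicates."""
    seen = set()
    kept = []
    for p in paragraphs:
        if p in seen:
            continue  # exact duplicate of an earlier paragraph
        if any(count_keyword(p, k) for k in keywords):
            seen.add(p)
            kept.append(p)
    return kept


def keyword_weights(paragraphs, keywords):
    """Assumed weight: each keyword's share of all keyword occurrences."""
    freq = {k: sum(count_keyword(p, k) for p in paragraphs) for k in keywords}
    total = sum(freq.values())
    return {k: (freq[k] / total if total else 0.0) for k in keywords}


def prune_low_weight(paragraphs, weights):
    """Assumed rule: keep only paragraphs that mention a top-weighted keyword."""
    if not weights:
        return list(paragraphs)
    top = max(weights.values())
    strong = [k for k, w in weights.items() if w == top]
    return [p for p in paragraphs if any(count_keyword(p, k) for k in strong)]


if __name__ == "__main__":
    keywords = ["keyword", "morpheme"]
    kept = extract_paragraphs(load_paragraphs(SAMPLE_XML), keywords)
    weights = keyword_weights(kept, keywords)
    for paragraph in prune_low_weight(kept, weights):
        print(paragraph)
```

In this sketch the duplicate third paragraph is dropped during extraction, and the paragraph that only mentions the lower-weighted keyword is removed in the pruning step, while the surviving paragraphs keep their original order.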


Keywords


Deduplication; Document Analysis; Keyword; Paragraph Extraction; Sequence Maintenance



DOI: http://doi.org/10.11591/ijeecs.v19.i3.pp1533-1539



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
