Graph attention-driven document image classification through DualTune learning

Shilpa Shilpa, Shridevi Soma

Abstract


Document image classification is challenging because documents combine text, images, and their spatial arrangement. Deep learning has become a pivotal tool for extracting and learning such complex patterns. However, conventional methods often struggle to integrate different data modalities and to minimize redundancy in the learned features, which motivates more advanced and efficient deep learning strategies. This study presents a new approach to document image classification, named graph attention-driven with dual tune learning (GAD-DTL), which combines dual-tune learning with graph attention networks. The method creates semantic region embeddings within document images that incorporate both textual and spatial data. A key component is the adaptive fusion layer, which integrates the different modalities and uses a graph attention layer to capture context within each region. To minimize redundancy in the learned features, we implement two distinct learning techniques: relational and non-relational learning. This approach enhances document image classification through invariant representations and minimal redundancy in features.

Keywords


Deep learning; Document classification; Document image classification; Dual tune learning; GAD-DTL

DOI: http://doi.org/10.11591/ijeecs.v33.i1.pp278-289

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
