Text document clustering using mayfly optimization algorithm with k-means technique

Ratnam Dodda, Alladi Suresh Babu

Abstract


Text clustering is a subfield of machine learning (ML) and natural language processing (NLP) that groups similar sentences or documents based on their content. However, insignificant features in the documents reduce the accuracy of information retrieval, which makes it difficult for a clustering approach to group similar documents efficiently. In this research, the mayfly optimization algorithm (MOA) combined with the k-means technique is proposed for text document clustering (TDC). Initially, the data is obtained from the Reuters-21578, 20-Newsgroup, and BBC sports datasets, and pre-processing is performed with stemming and stop word removal to remove unwanted words or phrases. Data imbalance is addressed using the adaptive synthetic sampling algorithm (ADASYN), and then term frequency-inverse document frequency (TF-IDF) and WordNet features are extracted. Finally, MOA with the k-means technique is used for TDC. The proposed approach achieves accuracies of 99.75%, 99.54%, and 98.24%, which are better than those of existing techniques such as fuzzy rough set-based robust nearest neighbor-convolutional neural network (FRS-RNN-CNN), TopicStriker, Modsup-based frequent itemset and rider optimization-based moth search algorithm (Modsup-Rn-MSA), hierarchical Dirichlet-multinomial mixture, and multi-view clustering via consistent and specific non-negative matrix (MCCS).
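
As a rough illustration of the feature-extraction and clustering stages described in the abstract, the following minimal sketch builds smoothed TF-IDF vectors and clusters them with plain k-means. It is not the authors' implementation: stemming, ADASYN, WordNet features, and the mayfly optimizer itself are omitted, and every name in it (`STOP_WORDS`, `tokenize`, `tfidf_vectors`, `kmeans`, and the `init` parameter) is hypothetical. The `init` argument only marks one place where centroids proposed by an external optimizer could be supplied; how MOA is actually coupled with k-means in the paper is not stated in the abstract.

```python
import math
import random
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "on", "for"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (no stemming)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}  # smoothed IDF
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, init=None, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm).

    `init` is a hypothetical hook: a list of k starting centroids, e.g. ones
    proposed by a metaheuristic. If omitted, k documents are sampled at random.
    """
    rng = random.Random(seed)
    centroids = [list(c) for c in (init or rng.sample(vectors, k))]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid for each document.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy corpus with two obvious topics (made up for illustration).
docs = [
    "football match goal team",
    "team scored goal football",
    "stock market shares price",
    "shares price market bank",
]
vocab, vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2, init=[vectors[0], vectors[2]])
print(labels)
```

Because plain k-means is sensitive to its starting centroids, the toy run fixes them explicitly through `init` instead of relying on random sampling.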

Keywords


ADASYN; K-means algorithm; Machine learning; MOA; Text document clustering


DOI: http://doi.org/10.11591/ijeecs.v35.i2.pp1099-1109


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
