Semantic Similarity for Document Clustering Using Density-based and GloVe Algorithms

Shapol M. Mohammed, Karwan Jacksi, Subhi R. M. Zeebaree

Abstract


Semantic similarity is the task of identifying data that are related in meaning. Traditional approaches identify similarity by matching synonymous keywords and rely on syntax, whereas semantic similarity finds related data based on the meaning of words and their semantics. Clustering groups items that share the same features and properties into a cluster. In semantic document clustering, documents are clustered using semantic similarity techniques together with semantic similarity measures. Density-based clustering algorithms, which group data points according to their density, are a common technique for clustering documents. This paper presents a state-of-the-art survey that analyzes density-based algorithms for document clustering. It also investigates the similarity and evaluation measures used with the selected algorithms in order to identify the most common ones. The review shows that the density-based algorithms most used in document clustering are DBSCAN and Density Peak Clustering (DPC). It also shows that the most powerful similarity measure used with density-based algorithms, specifically DBSCAN and DPC, is cosine similarity, with the F-measure used for performance evaluation.
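As a rough illustration of the combination the survey highlights (DBSCAN with cosine similarity, evaluated with the F-measure), the following is a minimal pure-Python sketch. It is not taken from the paper. The toy vectors in `docs` stand in for document embeddings (for example, averaged GloVe vectors), and the names `cosine_distance`, `dbscan`, `f_measure`, `eps`, and `min_pts` are illustrative assumptions.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Precompute eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical 3-dimensional "document vectors": two tight groups and one outlier.
docs = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
labels = dbscan(docs, eps=0.05, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The sketch computes all pairwise distances, which is quadratic in the number of documents. That is acceptable for a toy example, but a real corpus would typically need an indexed neighbor search.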


Keywords


GloVe Word Embedding; Density-Based Algorithm; DBSCAN Algorithm; Density Peak Clustering; Similarity Measures; Evaluation Measures



DOI: http://doi.org/10.11591/ijeecs.v22.i1.pp%25p


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
