Valuing Semantic Similarity
Abstract
Similarity is a tool widely used in domains such as DNA sequence analysis, knowledge representation, natural language processing, data mining, information retrieval, and information flow. Computing the semantic similarity between two entities is a non-trivial task, and there are many ways to define it. Some proposed measures combine statistical information with lexical similarity. A measure that performs well in one domain is often difficult to apply accurately in another, and a similarity measure may perform better with one language than with another. A word should be considered similar not only to itself but also to some of its synonyms in a given context and to some words that share its root. Our approach performs query matching and computes semantic relatedness from word occurrences, and it outperforms classical measures such as TF-IDF and cosine similarity. Although the proposed similarity measure is not a metric, it can be used for a wide range of content-analysis tasks based on semantic distance, and its efficacy has been demonstrated. Because the measure is not corpus-dependent, it can directly establish the semantic relatedness of two entities.
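For reference, the classical baseline named in the abstract can be sketched as TF-IDF weighting followed by cosine similarity. The following is a minimal Python sketch of one common variant. It assumes whitespace tokenization and a smoothed IDF, both choices made for this example only. It is not the authors' proposed measure, which the abstract does not specify.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: raw term frequency times a smoothed inverse document
    frequency, idf = log((1 + N) / (1 + df)) + 1, with lowercase
    whitespace tokenization (illustrative choices, not from the paper).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["the car is fast", "the automobile is fast", "dna sequence analysis"]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # > 0, from the shared words "the", "is", "fast"
print(cosine(vecs[0], vecs[2]))  # 0.0, no shared words
```

In this baseline, "car" and "automobile" contribute nothing to the score because only exact term matches count. This illustrates the synonym limitation that the abstract argues a semantic measure should address.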
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).