Developing Corpora using Wikipedia and Word2vec for Word Sense Disambiguation

Farza Nurifan; Riyanarto Sarno; Cahyaningtyas Sekar Wahyuni

doi:10.11591/ijeecs.v12.i3.pp1239-1246

Abstract

Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence, often described as AI-hard or AI-complete. Many tasks can benefit from WSD approaches, including sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. In this paper, we address the WSD problem using two small corpora. We propose using Word2vec and Wikipedia to develop the corpora. After developing the corpora, we measure the similarity between a sentence and the corpora using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, we use Lesk algorithms and Wu-Palmer similarity to handle cases in which no word from the sentence appears in the corpora (we call this step semantic similarity). Our results show an accuracy rate of 86.94%, and semantic similarity improves the accuracy rate by 12.96% in determining the meaning of ambiguous words.
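
The cosine-similarity step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words representation, the whitespace tokenizer, the function names, and the tiny hand-written corpora for the word "bank" are all assumptions made for illustration, whereas the paper builds its corpora from Wikipedia and Word2vec.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the words the two count vectors share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(sentence, sense_corpora):
    """Return (best_sense, scores) for a sentence.

    best_sense is None when the sentence shares no word with any corpus,
    which is the case the abstract handles with a separate fallback step.
    """
    tokens = sentence.lower().split()  # assumed tokenizer: whitespace split
    scores = {
        sense: cosine_similarity(tokens, corpus)
        for sense, corpus in sense_corpora.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return None, scores
    return best, scores


# Hypothetical toy corpora for two senses of "bank" (illustration only).
TOY_CORPORA = {
    "financial": ["money", "deposit", "loan", "account"],
    "river": ["river", "water", "shore", "stream"],
}

if __name__ == "__main__":
    sense, scores = disambiguate(
        "she opened an account to deposit money at the bank", TOY_CORPORA
    )
    print(sense, scores)
```

When no sentence word appears in either corpus, every score is zero and the sketch returns `None` rather than picking a sense arbitrarily; this is the situation in which the abstract falls back on Lesk algorithms and Wu-Palmer similarity.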

Keywords

Word Sense Disambiguation; Word2vec; Wikipedia; Lesk; Wu Palmer

DOI: http://doi.org/10.11591/ijeecs.v12.i3.pp1239-1246

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).