Detecting translation borrowings in huge text collections using various methods

Adel Al-janabi; Ehsan Ali Al-Zubaidi; Baqer M. Merzah

doi:10.11591/ijeecs.v30.i3.pp1609-1616

Abstract

This work investigates the problem of detecting translated borrowings and text reuse. The article proposes reducing the task to a monolingual one: the suspicious document is translated into the language of the collection and then analyzed monolingually. A major requirement for the proposed technique is robustness against machine translation ambiguities. The subsequent document analysis is split into two stages. In the first stage, the authors retrieve candidate documents that are likely sources of text reuse. For robustness, the paper proposes retrieving texts using word clusters formed with distributional semantics. In the second stage, the authors use deep neural networks to compare the suspicious document with the candidates using phrase embeddings. Experiments are carried out for the “English-Arabic” language pair on both articles and synthetic data.
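
The two-stage analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several things in it are assumptions made only for illustration: the suspicious text is taken to be already machine-translated into the collection language, the word-to-cluster table `WORD_CLUSTER` and the toy vectors in `WORD_VECTOR` are hand-made stand-ins for resources that would normally be learned with distributional semantics, and a plain average of word vectors with cosine similarity is swapped in for the neural phrase-embedding comparison. All function names, `top_k`, and `threshold` are hypothetical.

```python
import math

# Hypothetical toy resources (not from the paper).
# Synonyms share a cluster id, so different translation choices still match.
WORD_CLUSTER = {
    "car": 0, "automobile": 0, "vehicle": 0,
    "fast": 1, "quick": 1, "rapid": 1,
    "road": 2, "street": 2,
    "cat": 3, "kitten": 3,
    "sleeps": 4, "naps": 4,
}
WORD_VECTOR = {
    "car": (1.0, 0.0, 0.0), "automobile": (0.9, 0.1, 0.0),
    "vehicle": (0.8, 0.2, 0.0),
    "fast": (0.0, 1.0, 0.0), "quick": (0.1, 0.9, 0.0),
    "rapid": (0.0, 0.9, 0.1),
    "road": (0.0, 0.0, 1.0), "street": (0.0, 0.1, 0.9),
}


def cluster_ids(text):
    """Set of cluster ids for the known words of a text (unknown words are skipped)."""
    return {WORD_CLUSTER[w] for w in text.lower().split() if w in WORD_CLUSTER}


def retrieve_candidates(suspicious, collection, top_k=2):
    """Stage 1: rank documents by Jaccard overlap of their cluster-id sets."""
    query = cluster_ids(suspicious)
    scored = []
    for index, doc in enumerate(collection):
        doc_ids = cluster_ids(doc)
        union = query | doc_ids
        score = len(query & doc_ids) / len(union) if union else 0.0
        scored.append((score, index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for score, index in scored[:top_k] if score > 0]


def phrase_embedding(text):
    """Average of known word vectors; a stand-in for a neural phrase encoder."""
    vectors = [WORD_VECTOR[w] for w in text.lower().split() if w in WORD_VECTOR]
    if not vectors:
        return None
    dim = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dim))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def detect(suspicious, collection, threshold=0.9):
    """Stage 2: keep retrieved candidates whose embedding is close to the query."""
    query_vec = phrase_embedding(suspicious)
    if query_vec is None:
        return []
    hits = []
    for index in retrieve_candidates(suspicious, collection):
        cand_vec = phrase_embedding(collection[index])
        if cand_vec is not None:
            similarity = cosine(query_vec, cand_vec)
            if similarity >= threshold:
                hits.append((index, similarity))
    return hits
```

In this sketch, matching on cluster ids rather than surface words is what gives the retrieval stage some tolerance to translation variants: "car" and "automobile" fall into the same cluster, so either translation choice retrieves the same candidate. The second stage then filters the retrieved candidates with a finer-grained similarity score.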

Keywords

Deep learning; Distributional semantics; Machine translation; Natural language processing; Text borrowings detection

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).
