Arabic authorship attribution on Twitter: what is really matters?

Anoual El kah; Aymane El airej; Imad Zeroual

doi:10.11591/ijeecs.v28.i3.pp1730-1737

Abstract

Authorship attribution (AA) of texts from online social networks has recently gained more attention. However, since the first work addressing the AA of Arabic tweets was published in 2015, little has been done on the topic. This paper therefore presents an extensive study of how various factors affect the AA of Arabic short texts, especially tweets. The study led to a proposed architecture in which AA accuracy is examined as a function of the size of the training dataset, the number of classes covered, the text processing techniques applied, the methods used for feature selection and extraction, and the classifier implemented. In total, we performed 792 different tests. The highest accuracy recorded is 97.4%, which is among the best results published so far.
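
As a rough illustration of the bag-of-n-grams idea named in the keywords, the sketch below builds character n-gram profiles per author and attributes a new short text to the author with the most similar profile. It is not the architecture evaluated in the paper: the function names, the cosine nearest-profile rule, the choice of trigrams, and the toy tweets are all assumptions made for illustration only.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return a bag (Counter) of overlapping character n-grams."""
    padded = f" {text.strip()} "  # pad so word boundaries become features
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram bags."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples, n=3):
    """Sum the n-gram bags of each author's texts into one profile."""
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(text, n))
    return profiles


def attribute(text, profiles, n=3):
    """Pick the author whose profile is most similar to the text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy data, invented for this sketch (not from the paper).
toy_samples = [
    ("author_a", "صباح الخير يا أصدقاء"),
    ("author_a", "صباح الخير للجميع"),
    ("author_b", "مباراة اليوم كانت رائعة"),
    ("author_b", "مباراة الأمس كانت مملة"),
]
profiles = train(toy_samples)
predicted = attribute("صباح الخير", profiles)
```

A real study would compare several n-gram sizes, preprocessing steps, and classifiers on held-out tweets, which is the kind of variation the abstract describes.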

Keywords

Arabic tweets; Authorship attribution; Bag-of-n-grams; Feature extraction; Stylometric features

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).
