Arabic authorship attribution on Twitter: what is really matters?

Anoual El kah, Aymane El airej, Imad Zeroual

Abstract


Authorship attribution (AA) of texts from online social networks has recently gained more attention. However, since the first work addressing the AA of Arabic tweets was published in 2015, we found that little follow-up work has appeared. The current paper therefore presents an extensive study of the effects of various factors on the AA of Arabic short texts, especially tweets. We propose an architecture in which AA accuracy is examined as a function of the size of the training dataset, the number of classes covered, the text processing techniques applied, the methods used for feature selection and extraction, and the classifier implemented. In total, we performed 792 different tests. The highest accuracy recorded is 97.4%, which is among the best results published so far.
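
To make the bag-of-n-grams idea concrete, the following is a minimal sketch, assuming character trigrams as features and a cosine-similarity nearest-profile rule as the classifier. It is not the architecture evaluated in the paper: the function names, the toy English samples, and the author labels are all hypothetical and serve only to illustrate how short texts can be mapped to n-gram counts and assigned to a candidate author.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Return a bag (Counter) of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples, n=3):
    """Aggregate one n-gram profile per author from (author, text) pairs."""
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(text, n))
    return profiles


def predict(profiles, text, n=3):
    """Return the author whose profile is most similar to the text."""
    bag = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(bag, profiles[author]))


# Hypothetical toy data; real experiments would use labeled tweets.
samples = [
    ("author_a", "good morning everyone have a great day"),
    ("author_a", "good morning to all my friends"),
    ("author_b", "match tonight will be amazing"),
    ("author_b", "what a match what a goal"),
]
profiles = train(samples)
print(predict(profiles, "good morning friends"))
```

Character n-grams are used here because they need no tokenizer, which keeps the sketch language-agnostic; varying `n`, the number of authors, or the amount of training text per author corresponds loosely to the kinds of factors the study examines.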

Keywords


Arabic tweets; Authorship attribution; Bag-of-n-grams; Feature extraction; Stylometric features


DOI: http://doi.org/10.11591/ijeecs.v28.i3.pp1730-1737

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
