Arabic authorship attribution on Twitter: what is really matters?
Abstract
Recently, authorship attribution (AA) of online social networks texts has gained more attention. However, since 2015, when the first work that addressed the AA of Arabic tweets was published, we found that nothing much has been done after that. Thus, the current paper presents an extensive study that investigates the effects of various factors on the AA of Arabic short-texts, especially tweets. This led to a proposed architecture in which the AA accuracy is examined depending on the size of the training dataset, the number of classes covered, the text processing techniques applied, the methods used for both feature selection and extraction, and finally, the classifier implemented. As a result, we performed 792 different tests. The highest accuracy recorded is 97.4%, and it is among the best results published so far.
Keywords
Arabic tweets; Authorship attribution; Bag-of-n-grams; Feature extraction; Stylometric features
Full Text:
PDFDOI: http://doi.org/10.11591/ijeecs.v28.i3.pp1730-1737
Refbacks
- There are currently no refbacks.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).