Exploration of the Best Performance Method of Emotions Classification for Arabic Tweets

Mohammed Abdullah Al-Hagery, Manar Abdullah Al-assaf, Faiza Mohammad Al-kharboush

Abstract


The number of Arab users of social media has increased significantly, expanding the opportunities for extracting knowledge from various areas of life such as trade, education, and psychological health services. The active Arab presence on Twitter has motivated many researchers to classify and analyze Arabic tweets from numerous perspectives. This study aimed to explore the best-performing scenarios for classifying the emotions conveyed through Arabic tweets. To that end, various experiments were conducted to investigate the effects of feature extraction techniques and the N-gram model on the performance of three supervised Machine Learning (ML) algorithms: Support Vector Machine (SVM), Naïve Bayes (NB), and Logistic Regression (LR). The general method of the experiments consisted of five steps: data collection, preprocessing, feature extraction, emotion classification, and evaluation of results. To implement these experiments, a real-world Twitter dataset was gathered. The SVM classifier achieved the best result when using a Bag of Words (BoW) weighting scheme (with unigrams and bigrams, or with unigrams, bigrams, and trigrams), exceeding the best results of the other algorithms.
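As an illustration of the feature extraction step only, the following minimal Python sketch shows how Bag of Words counts over unigrams and bigrams might be computed for a single tweet. It is not the authors' implementation: the function names `ngrams` and `bow_features`, the whitespace tokenization, and the sample tweet are all assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token sequences, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bow_features(text: str, n_values: Iterable[int] = (1, 2)) -> Counter:
    """Count every n-gram of the requested sizes (raw Bag of Words counts)."""
    # Assumption: the text is already preprocessed, so splitting on
    # whitespace is an adequate tokenizer for this sketch.
    tokens = text.split()
    counts: Counter = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical preprocessed tweet, not taken from the study's dataset.
features = bow_features("انا سعيد جدا اليوم", n_values=(1, 2))
```

Passing `n_values=(1, 2, 3)` would add trigrams, matching the second configuration named in the abstract. In a full pipeline, count vectors of this kind would then be passed to a classifier such as SVM, NB, or LR; that step is not shown here.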


Keywords


Arabic Tweets, Emotion Analysis, Classification, Machine Learning, Feature Extraction, N-Gram

DOI: http://doi.org/10.11591/ijeecs.v19.i2.pp%25p

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
