The impact of feature extraction techniques on the performance of text data classification models

Abdallah Maiti; Abdallah Abarda; Mohamed Hanini

doi:10.11591/ijeecs.v35.i2.pp1041-1052

Abstract

Sentiment analysis is the discipline concerned with interpreting the feelings and points of view expressed in textual data. This study assesses the impact of different feature extraction methods on the accuracy of opinion mining models. Bag-of-words (BoW), term frequency-inverse document frequency (TF-IDF), Word2Vec, global vectors (GloVe) and bidirectional encoder representations from transformers (BERT) were each used with three machine learning algorithms and three deep learning networks as classifiers. The IMDB movie review dataset was used for evaluation. The results showed that combining BERT with LSTM, CNN and RNN improved performance, achieving an accuracy of 94%, precision of 94.14%, recall of 93.27% and an F1 score of 89.33%. These results highlight the significant contribution of BERT to model performance: it outperformed the other feature extraction techniques in text classification. The study concludes that combining BERT with LSTM significantly improves model accuracy for opinion mining, and recommends BERT as the main feature extraction method for optimizing performance in NLP tasks.
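To make the two simplest techniques named above concrete, the following is a minimal sketch of BoW and TF-IDF feature extraction in plain Python. It is an illustration only, not the authors' implementation: the three toy reviews, the function names (`tokenize`, `bag_of_words`, `tf_idf`), the whitespace tokenizer and the unsmoothed `log(N / df)` weighting are all assumptions chosen for brevity, and the study itself used the IMDB dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return a sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight each count by relative term frequency times log(N / df).

    This is one common unsmoothed variant; libraries often use
    smoothed or normalized versions instead.
    """
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for row in counts:
        total = sum(row)
        vectors.append(
            [(c / total) * w if total else 0.0 for c, w in zip(row, idf)]
        )
    return vocab, vectors


# Hypothetical toy corpus standing in for movie reviews.
reviews = ["a great movie", "a dull movie", "great acting great story"]
vocab, bow = bag_of_words(reviews)
_, weighted = tf_idf(reviews)
```

In this toy corpus, "dull" appears in only one review while "a" appears in two, so TF-IDF gives "dull" the larger weight in the second review even though both have the same raw count. Either kind of vector could then be passed to a classifier; Word2Vec, GloVe and BERT replace these sparse counts with dense learned embeddings.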

Keywords

BERT; Classification; Feature extraction; LSTM; Sentiment analysis; Textual data

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES).