Analysis of named-entity effect on text classification of traffic accident data using machine learning
Abstract
As the number of traffic accidents in Indonesia rises, accident data still needs to be evaluated and analyzed. Traffic accident data has previously been categorized using word embedding; however, additional work is needed to achieve better results. Named entities are informative features that can offer details about a text, and a few informative named entities are frequently sufficient to determine whether or not a text contains information on a traffic accident. This paper examines the influence of named entities on thematic text categorization. The data was collected by crawling the Twitter social media platform. Preprocessing is done at the beginning of the process to modify the text, delete text that is not useful, and label the specified entities. Three schemes were compared on a Support Vector Machine (SVM): (i) Word Embedding, (ii) the number of occurrences of Named Entities, and (iii) a combination of the two, referred to as Hybrid. In tests conducted using 1,885 data items consisting of 788 accident data items and 1,067 non-accident data items, the Hybrid scheme improved classification accuracy to 90.27 percent compared with the Word Embedding scheme and the named-entity occurrence scheme.
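The three feature schemes compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the entity types, the toy word vectors, and the function names are all assumptions made for illustration, and averaging word vectors is only one common way to obtain a text-level embedding. The sketch shows how an embedding vector and a vector of named-entity counts might be concatenated into a single Hybrid feature vector, which could then be passed to any SVM implementation.

```python
from collections import Counter

# Hypothetical entity types; the abstract does not list the tag set used.
ENTITY_TYPES = ["LOCATION", "VEHICLE", "TIME"]

# Toy 3-dimensional word vectors, made up for illustration only.
TOY_EMBEDDINGS = {
    "kecelakaan": [0.9, 0.1, 0.0],
    "truk": [0.7, 0.2, 0.1],
    "macet": [0.2, 0.8, 0.0],
}
EMBEDDING_DIM = 3


def embedding_features(tokens):
    """Scheme (i): average the vectors of known tokens (zeros if none)."""
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def entity_count_features(entity_tags):
    """Scheme (ii): count how often each entity type occurs in one text."""
    counts = Counter(entity_tags)
    return [float(counts[t]) for t in ENTITY_TYPES]


def hybrid_features(tokens, entity_tags):
    """Scheme (iii): concatenate the embedding and entity-count vectors."""
    return embedding_features(tokens) + entity_count_features(entity_tags)


# One tokenized text with the entity tags assumed to come from preprocessing.
tokens = ["kecelakaan", "truk", "di", "tol"]
entity_tags = ["VEHICLE", "LOCATION"]
features = hybrid_features(tokens, entity_tags)
```

Here `features` has six components: three from the averaged embedding and three entity counts in the order given by `ENTITY_TYPES`. Keeping the two parts as separate functions makes it straightforward to train one classifier per scheme and compare them on the same data.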
Keywords
Classification; Machine learning; Named-entity; Social media; Traffic accident analysis
DOI: http://doi.org/10.11591/ijeecs.v25.i3.pp1672-1678
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).