Indonesian news classification using convolutional neural network

Muhammad Ali Ramdhani, Dian Sa’adillah Maylawati, Teddy Mantoro

Abstract


Every language has unique characteristics, structures, and grammar. Thus, different languages call for different processing steps and yield different results in natural language processing (NLP) research. In current NLP research, data mining (DM) and machine learning (ML) techniques are popular, especially deep learning (DL) methods. This research aims to classify text data in the Indonesian language using a convolutional neural network (CNN), one of the DL algorithms. The CNN algorithm used is adapted to the characteristics of the Indonesian language. Accordingly, in the text pre-processing phase, stopword removal and stemming are tailored to the Indonesian language. The experiment was conducted using 472 Indonesian news texts from various sources in four categories: ‘hiburan’ (entertainment), ‘olahraga’ (sport), ‘tajuk utama’ (headline news), and ‘teknologi’ (technology). In the experiment and evaluation, which used 377 training items and 95 testing items and produced five models with ten epochs each, the CNN achieved its best accuracy of around 90.74% with a loss value of around 29.05% for 300 hidden layers in classifying the Indonesian news data.
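
As a rough illustration of the pre-processing steps and the data split mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The stopword list, the suffix list, the function names, and the 80/20 split ratio are all assumptions made for this example; the abstract only states that 377 items were used for training and 95 for testing.

```python
import random

# Hypothetical, tiny stopword list for illustration only (not the authors' list).
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk"}

# Hypothetical suffixes for a naive stemmer. Real Indonesian stemming also
# handles prefixes and confixes, which this sketch ignores.
SUFFIXES = ("nya", "kan", "lah")


def naive_stem(token):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on whitespace, drop stopwords, then stem."""
    tokens = [t.strip(".,!?\"'()") for t in text.lower().split()]
    return [naive_stem(t) for t in tokens if t and t not in STOPWORDS]


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle deterministically and cut into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 472 placeholder document ids stand in for the news texts.
train, test = split_train_test(range(472))
print(len(train), len(test))
print(preprocess("Pemain itu mencetak gol dan timnya menang di final."))
```

Under the assumed 80/20 ratio, 472 items split into 377 and 95, which matches the counts reported in the abstract. The token lists produced by `preprocess` would then be mapped to integer ids or embeddings before being fed to a CNN classifier.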

Keywords


Convolutional neural network; Deep learning; Indonesian language process; Natural language processing; Text mining


DOI: http://doi.org/10.11591/ijeecs.v19.i2.pp1000-1009


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
