Design and Analysis of Parallel MapReduce based KNN-join Algorithm for Big Data Classification
Abstract
Multi-label classification is required in many modern data mining applications. Meanwhile, the k-nearest neighbour (KNN) join is a useful data mining approach that offers high accuracy but is time-consuming. With the recent explosion of big data, the conventional serial KNN join based multi-label classification algorithm needs a long time to handle high volumes of data. To address this problem, we first design a parallel MapReduce based KNN join algorithm for big data classification. We then implement the algorithm using Hadoop on a cluster of 9 virtual machines. Experimental results show that our MapReduce based KNN join achieves much higher performance than the serial one. Several interesting phenomena are observed in the experimental results.
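The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a KNN join can be split into map and reduce steps, not the authors' implementation. It assumes the training set S is divided into blocks, each mapper emits the k locally nearest candidates for every query point in R, and a reducer merges the candidates per query and takes a majority vote. It runs in a single Python process rather than on Hadoop, uses one label per training point instead of multiple labels, and all function and parameter names (`map_phase`, `reduce_phase`, `knn_join_classify`, `n_blocks`) are hypothetical.

```python
import heapq
from collections import Counter, defaultdict
from math import dist


def map_phase(r_points, s_block, k):
    """Mapper (assumed design): for one block of S, emit each query's k local nearest candidates."""
    for r_id, r_vec in r_points:
        candidates = ((dist(r_vec, s_vec), label) for s_vec, label in s_block)
        # Keep only the k closest points of this block; compare by distance only.
        for cand in heapq.nsmallest(k, candidates, key=lambda c: c[0]):
            yield r_id, cand


def reduce_phase(r_id, candidates, k):
    """Reducer (assumed design): merge candidates from all blocks and vote on the label."""
    nearest = heapq.nsmallest(k, candidates, key=lambda c: c[0])
    votes = Counter(label for _, label in nearest)
    return r_id, votes.most_common(1)[0][0]


def knn_join_classify(r_points, s_points, k=3, n_blocks=2):
    """Simulate the map, shuffle, and reduce stages in one process."""
    # Split S into n_blocks partitions, one per simulated mapper.
    blocks = [s_points[i::n_blocks] for i in range(n_blocks)]
    # Shuffle: group the emitted candidates by query id.
    shuffled = defaultdict(list)
    for block in blocks:
        for r_id, cand in map_phase(r_points, block, k):
            shuffled[r_id].append(cand)
    return dict(reduce_phase(r_id, cands, k) for r_id, cands in shuffled.items())
```

Because every mapper forwards its k best candidates, the true k nearest neighbours of a query always survive to the reducer, so this partitioned version returns the same neighbours as a serial scan over S (up to ties in distance).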
DOI: http://doi.org/10.11591/ijeecs.v12.i11.pp7927-7934
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).