Random forest for lung cancer analysis using Apache Mahout and Hadoop based on software defined networking

Ali Abbood Khaleel, Ahmed Adnan Mohammed Al-Azzawi, Aws Mohammed Alkhazraji

Abstract


Random forest is a machine learning algorithm built mainly as a classification method that makes predictions based on decision trees. Many machine learning approaches have used random forest to perform deep analysis of different cancers in order to understand their complex characteristics and behaviour. However, because of the massive and complex data generated for such diseases, it has become difficult to run random forest on a single machine, so more advanced tools are required to analyse such data. In this paper, the random forest algorithm is run using Apache Mahout and Hadoop based on software defined networking (SDN) to conduct prediction and analysis on large lung cancer datasets. Several experiments, conducted using nine virtual nodes, evaluate the proposed system. The experiments show that random forest implemented on the proposed system outperforms its implementation in a traditional environment with regard to execution time. The proposed system using Hadoop based on SDN is also compared with Hadoop only; the results show that random forest on Hadoop based on SDN has a shorter execution time than on Hadoop only. Furthermore, the experiments indicate that the implemented system is more efficient with regard to execution time, accuracy and reliability.
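For readers unfamiliar with the idea, the way a random forest predicts from many decision trees can be illustrated with a minimal single-machine sketch. It is not the paper's Mahout/Hadoop implementation: the function names, the use of depth-1 trees (stumps), the tree count and the toy data are all assumptions made for illustration. Each tree is trained on a bootstrap sample with a random subset of features, and the forest predicts by majority vote.

```python
import math
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, features):
    """Fit a depth-1 decision tree (illustrative stand-in for a full tree).

    rows is a list of (feature_tuple, label); features lists the feature
    indices this tree is allowed to split on.
    """
    best = None
    for f in features:
        for x, _ in rows:
            t = x[f]  # candidate threshold taken from the data
            left = [y for xs, y in rows if xs[f] <= t]
            right = [y for xs, y in rows if xs[f] > t]
            if not right:
                continue  # split does not separate anything
            l_label, r_label = majority(left), majority(right)
            errors = (sum(y != l_label for y in left)
                      + sum(y != r_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_label, r_label)
    if best is None:
        # No usable split: always predict the majority label.
        label = majority([y for _, y in rows])
        return (features[0], math.inf, label, label)
    return best[1:]


def train_forest(rows, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    k = max(1, math.isqrt(n_features))  # features considered per tree
    forest = []
    for _ in range(n_trees):
        sample = rng.choices(rows, k=len(rows))  # sampling with replacement
        features = rng.sample(range(n_features), k)
        forest.append(train_stump(sample, features))
    return forest


def predict(forest, x):
    """Classify x by majority vote over all trees."""
    votes = [l if x[f] <= t else r for f, t, l, r in forest]
    return majority(votes)


# Hypothetical toy data: two well-separated classes in two features.
rows = [((1, 1), 0), ((2, 1), 0), ((1, 2), 0), ((2, 2), 0),
        ((8, 8), 1), ((9, 8), 1), ((8, 9), 1), ((9, 9), 1)]
forest = train_forest(rows)
```

Because the trees are trained independently of one another, this training loop is the part that a distributed framework can spread across nodes.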

Keywords


Apache Mahout; Big data; Hadoop; Machine learning; MapReduce; Random forest algorithm; Software defined networking


DOI: http://doi.org/10.11591/ijeecs.v32.i2.pp1086-1093


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
