Machine Learning with PySpark - Review

Raswitha Bandi; J Amudhavel; R Karthik

doi:10.11591/ijeecs.v12.i1.pp102-106

Abstract

Apache Spark is a distributed, in-memory computing system well suited to machine learning, and it offers superior computing performance compared with Hadoop. Spark is fast and simple to use for handling big data, with built-in modules for machine learning, streaming, SQL, and graph processing. Machine learning algorithms can be applied to big data easily using Spark and its machine learning library, MLlib, and the Python API, PySpark, makes this simpler still. This paper presents a study of how to develop machine learning algorithms in PySpark.

Keywords

Apache Spark; Machine Learning; PySpark; Scala

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).
